
Buffer.concat silently produces invalid output when its output size is greater than 4GiB #55422

Open
rotemdan opened this issue Oct 17, 2024 · 14 comments
Labels
buffer, confirmed-bug, help wanted, regression

Comments

@rotemdan

Version

v22.9.0, v23.0.0

Platform

Windows 11 x64

Microsoft Windows NT 10.0.22631.0 x64

Subsystem

Buffer

What steps will reproduce the bug?

const largeBuffer = Buffer.alloc(2 ** 32 + 5) // 4 GiB + 5 bytes
largeBuffer.fill(111)

const result = Buffer.concat([largeBuffer])
console.log(result)

How often does it reproduce? Is there a required condition?

Consistent in v22.9.0 and v23.0.0

What is the expected behavior? Why is that the expected behavior?

All bytes of the buffer returned by Buffer.concat([largeBuffer]) should be identical to the source:

In this example:

111, 111, 111, 111, 111, 111, 111, 111, 111, 111, 111, ....

What do you see instead?

In the returned buffer, the first 5 bytes are 111 and all following bytes are 0.

111, 111, 111, 111, 111, 0, 0, 0, 0, 0, 0, ....

The console.log(result) output looks like:

<Buffer 6f 6f 6f 6f 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 4294967251
 more bytes>

Additional information

No response

@targos added the buffer label Oct 17, 2024
@rotemdan changed the title from "Buffer.concat silently produces invalid output when its output size is greater than 4GB" to "Buffer.concat silently produces invalid output when its output size is greater than 4GiB" Oct 17, 2024
@RedYetiDev added the confirmed-bug label Oct 17, 2024
@rotemdan
Author

My current workaround (tested to produce correct results with sizes greater than 4 GiB):

export function concatBuffers(buffers: Buffer[]) {
	// Compute the total size first, so the result can be allocated in one step
	let totalLength = 0

	for (const buffer of buffers) {
		totalLength += buffer.length
	}

	const resultBuffer = Buffer.alloc(totalLength)

	if (totalLength === 0) {
		return resultBuffer
	}

	let writeOffset = 0

	// TypedArray.prototype.set handles offsets beyond the uint32 range correctly
	for (const buffer of buffers) {
		resultBuffer.set(buffer, writeOffset)

		writeOffset += buffer.length
	}

	return resultBuffer
}
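
For reference, a usage sketch (hypothetical sizes; running it needs roughly 8 GiB of free memory):

const a = Buffer.alloc(2 ** 32).fill(111)
const b = Buffer.alloc(5).fill(111)

const joined = concatBuffers([a, b])
console.log(joined.length)        // 4294967301
console.log(joined[2 ** 32 + 1])  // 111 (the buggy Buffer.concat would print 0 here)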

@RedYetiDev
Member

RedYetiDev commented Oct 17, 2024

The issue started in v22.7.0. I'll start bisecting. Maybe #54087?

@RedYetiDev added the regression label Oct 17, 2024
@RedYetiDev
Member

RedYetiDev commented Oct 17, 2024

I've finished bisecting. This was indeed caused by #54087 cc @ronag.

9f8f26eb2ff36f9352dd85643073af876b9d6b46 is the first bad commit
commit 9f8f26eb2ff36f9352dd85643073af876b9d6b46 (HEAD)
Author: Robert Nagy <ronagy@icloud.com>
Date:   Fri Aug 2 11:19:41 2024 +0200

    buffer: use native copy impl
    
    PR-URL: https://github.com/nodejs/node/pull/54087
    Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
    Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
    Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
    Reviewed-By: Daniel Lemire <daniel@lemire.me>

 benchmark/buffers/buffer-copy.js |  6 ------
 lib/buffer.js                    | 11 ++++++-----
 src/node_buffer.cc               | 56 +++++++++++++++++++++++++++-----------------------------
 src/node_external_reference.h    |  9 +++++++++
 4 files changed, 42 insertions(+), 40 deletions(-)

@ronag
Member

ronag commented Oct 21, 2024

Anyone care to open a PR? I think this could be as simple as switching to .set(srcBuffer) (instead of the native method) when the total length exceeds, e.g., 2 GB.
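
A rough sketch of that idea, assuming a JS-side fallback to TypedArray.prototype.set for large totals (the function name and the 2 ** 31 threshold are illustrative, not the actual lib/buffer.js code):

// Illustrative sketch only, not the actual lib/buffer.js implementation
function concatWithFallback(buffers: Buffer[]): Buffer {
	let totalLength = 0
	for (const buffer of buffers) totalLength += buffer.length

	const result = Buffer.allocUnsafe(totalLength)
	const useSet = totalLength >= 2 ** 31 // hypothetical cutoff, safely below the uint32 limit

	let offset = 0
	for (const buffer of buffers) {
		if (useSet) {
			result.set(buffer, offset) // TypedArray path, safe for lengths beyond 4 GiB
		} else {
			buffer.copy(result, offset) // existing native fast path
		}
		offset += buffer.length
	}

	return result
}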

@ronag added the help wanted label Oct 21, 2024
@duncpro

duncpro commented Oct 21, 2024

I reproduced this on macOS.

@ronag I'd like to try and tackle this one.

@MrJithil
Member

> I reproduced this on macOS.
>
> @ronag I'd like to try and tackle this one.

Good luck.

@rotemdan
Author

This call to _copy is possibly the reason:

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;

  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;

  if (nb <= 0)
    return 0;

  _copy(source, target, targetStart, sourceStart, nb); // <--

  return nb;
}

_copy is imported from the 'buffer' internal binding:

const {
  byteLengthUtf8,
  compare: _compare,
  compareOffset,
  copy: _copy, // <--
  fill: bindingFill,
  isAscii: bindingIsAscii,
  isUtf8: bindingIsUtf8,
  indexOfBuffer,
  indexOfNumber,
  indexOfString,
  swap16: _swap16,
  swap32: _swap32,
  swap64: _swap64,
  kMaxLength,
  kStringMaxLength,
  atob: _atob,
  btoa: _btoa,
} = internalBinding('buffer');

A thorough solution would be to ensure this method correctly handles large sizes, or fails loudly.

Merely working around it by falling back to TypedArray.set would leave open the possibility of a future issue if some other code calls _copy.
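
For illustration, the "or fails" option could be a range check before the binding is called (a hypothetical guard, not actual Node.js code):

// Hypothetical guard, not actual Node.js code: reject ranges the
// uint32-based binding cannot represent, instead of silently wrapping
const kMaxUint32 = 2 ** 32 - 1

function checkedCopy(source: Buffer, target: Buffer,
                     targetStart: number, sourceStart: number, nb: number): number {
	if (targetStart > kMaxUint32 || sourceStart > kMaxUint32 || nb > kMaxUint32) {
		throw new RangeError('copy range does not fit in uint32')
	}

	// Stand-in for the native _copy call, which isn't accessible from user code
	target.set(source.subarray(sourceStart, sourceStart + nb), targetStart)
	return nb
}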

@duncpro
Copy link

duncpro commented Oct 21, 2024

So the root cause of this problem is 32-bit integer overflow in SlowCopy in node_buffer.cc:

const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

Apparently Uint32Value performs a wrapping (modulo 2^32) conversion, which is why in the example below the target buffer only gets its first 5 bytes filled.

const largeBuffer = Buffer.alloc(2 ** 32 + 5)
largeBuffer.fill(111)

const result = Buffer.concat([largeBuffer])
console.log(result); // 6f 6f 6f 6f 6f 00 00 00 ...
                     // 1  2  3  4  5

Simply replacing Uint32Value with IntegerValue will fix this, barring edge cases I've yet to fully consider.
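
The wrap is ToUint32 (modulo 2^32) semantics and can be reproduced in plain JavaScript:

// 2 ** 32 + 5 reduced modulo 2 ** 32 leaves only the 5 extra bytes
const length = 2 ** 32 + 5       // 4294967301

console.log(length >>> 0)        // 5 (>>> applies the same ToUint32 coercion)
console.log(length % 2 ** 32)    // 5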

@rotemdan
Author

rotemdan commented Oct 21, 2024

I'm not sure what exactly the binding refers to, but I found a candidate method in the C++ code (at node/src/node_buffer.cc) that treats all arguments as Uint32:

// Assume caller has properly validated args.
void SlowCopy(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  ArrayBufferViewContents<char> source(args[0]);
  SPREAD_BUFFER_ARG(args[1].As<Object>(), target);

  const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
  const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
  const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

  memmove(target_data + target_start, source.data() + source_start, to_copy);
  args.GetReturnValue().Set(to_copy);
}

Regardless of whether it's the method used in the binding, using Uint32Value to extract the arguments doesn't seem right.

The method that follows also takes uint32_t arguments:

uint32_t FastCopy(Local<Value> receiver,
                  const v8::FastApiTypedArray<uint8_t>& source,
                  const v8::FastApiTypedArray<uint8_t>& target,
                  uint32_t target_start,
                  uint32_t source_start,
                  uint32_t to_copy) {
  uint8_t* source_data;
  CHECK(source.getStorageIfAligned(&source_data));

  uint8_t* target_data;
  CHECK(target.getStorageIfAligned(&target_data));

  memmove(target_data + target_start, source_data + source_start, to_copy);

  return to_copy;
}

@duncpro

duncpro commented Oct 21, 2024

@rotemdan this is correct

@rotemdan
Author

rotemdan commented Oct 21, 2024

If you search for the string "uint32" in node/src/node_buffer.cc, you'll see that many other methods assume indices fit in uint32 (4 GiB max). Examples I've found:

  • CopyArrayBuffer
  • Fill
  • StringWrite
  • FastByteLengthUtf8
  • SlowIndexOfNumber (assumes the needle is uint32, not the index)
  • FastIndexOfNumber (assumes the needle is uint32, not the index)
  • WriteOneByteString
  • FastWriteString
  • ...

@ronag
Member

ronag commented Oct 21, 2024

I think the fast methods won't get called with anything that doesn't fit into uint32.

@ronag
Member

ronag commented Oct 21, 2024

It's the slow methods that need fixing, I guess. Should we even support 4 GiB+ Buffers? @jasnell

@rotemdan
Author

rotemdan commented Oct 21, 2024

Node.js has already supported large typed arrays (new Uint8Array of 4 GiB or more) and buffers (Buffer.alloc of 4 GiB or more) since version 22 (or earlier? not sure), which I think is great because it opened up many use cases that were limited before (in my case, processing multi-hour audio and loading large machine-learning models).

Fixing the methods in node/src/node_buffer.cc, by itself, isn't really that hard. It's more about ensuring that the code works correctly on the various 32-bit and 64-bit platforms and processor architectures that Node.js currently supports.

As an intermediate solution, you could allow large ArrayBuffers but disallow large Buffer objects, though eventually you'd want Buffer to match the capabilities of ArrayBuffer (unless Buffer is entirely deprecated at some point, or something like that).
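
A minimal sketch of that intermediate guard, with a hypothetical allocation-time check (the constant and function names are illustrative):

// Hypothetical allocation guard, purely illustrative: cap Buffer at the
// uint32 range the native bindings assume, while large ArrayBuffers stay allowed
const kMaxSafeBufferLength = 2 ** 32 - 1

function allocGuarded(size: number): Buffer {
	if (size > kMaxSafeBufferLength) {
		throw new RangeError(
			`a ${size}-byte Buffer exceeds the uint32 range assumed by the native bindings; ` +
			`use an ArrayBuffer or Uint8Array instead`)
	}

	return Buffer.alloc(size)
}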
