
SNOW-750472 Out of memory issue #43

Closed · stenou opened this issue Jun 25, 2019 · 24 comments

Labels: enhancement (The issue is a request for improvement or a new feature)

@stenou commented Jun 25, 2019

I am running into an out-of-memory issue when trying to stream rows from a table with 3 million rows. Does the Snowflake stream support the highWaterMark and backpressure functionality of Node streams? It seems like it should take very little memory to stream 10 rows at a time from the database.

const snowflake = require('snowflake-sdk');
const config = require('config');
const through2 = require('through2');

async function test() {
  var connection = snowflake.createConnection({
      account: config.snowflake.account,
      username: config.snowflake.username,
      password: config.snowflake.password
    }
  );

  const conn = await new Promise((resolve, reject) => {
    connection.connect((err, c) => (err ? reject(err) : resolve(c)));
  });

  var statement = conn.execute({
    sqlText: 'SELECT * FROM TABLE'
  });

  var stream = statement.streamRows();
  stream.pipe(through2.obj({highWaterMark: 5}, function (chunk, enc, next) {
    // console.log(chunk);
    next();
  }));
}

test();

Output

<--- Last few GCs --->

[5282:0x103801600]   119235 ms: Scavenge 1382.8 (1419.3) -> 1382.1 (1419.8) MB, 4.6 / 0.0 ms  (average mu = 0.167, current mu = 0.068) allocation failure 
[5282:0x103801600]   119244 ms: Scavenge 1383.2 (1419.8) -> 1382.5 (1420.3) MB, 4.3 / 0.0 ms  (average mu = 0.167, current mu = 0.068) allocation failure 
[5282:0x103801600]   120061 ms: Mark-sweep 1383.6 (1420.3) -> 1382.9 (1420.3) MB, 812.5 / 0.0 ms  (average mu = 0.100, current mu = 0.030) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x23988115be3d]
    1: StubFrame [pc: 0x2398811134b0]
Security context: 0x3459e929e6e1 <JSObject>
    2: extractFromRow [0x345902014769] [/Users/s/git/riversjs/node_modules/snowflake-sdk/lib/connection/result/column.js:~709] [pc=0x23988116e1f3](this=0x345945be4eb9 <Column map = 0x34590b7387b1>,row=0x3459467bd621 <Object map = 0x3459b262f599>,context=0x3459f8ee0899 <Object map = 0x34590b74b3c9>,asString=0x3...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003ae75 node::Abort() [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
 3: 0x1001a6a85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
 4: 0x100574422 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
@smtakeda (Contributor)

Needs investigation.

smtakeda added the jira label Jul 16, 2019
@thegrumpysnail

We actually encountered something similar several months ago. I did some fairly extensive testing at the time to figure out what the problem was, and it appears it was the size of the chunks being sent from Snowflake. I don't remember all of the details, but basically Snowflake gives the client a list of all of the available chunks to download, and as you stream the rows, it downloads each sequential chunk. The problem is that the chunk size (which is determined by Snowflake, as far as I can tell) seems to more or less double until it hits some threshold. So what would happen is we'd end up with something like:

Chunk 1: 1 MB
Chunk 2: 2 MB
Chunk 3: 4 MB
Chunk 4: 8 MB

... and when the Snowflake client parses these larger chunks, the Node client eventually exhausts all of its memory. The way we got around it was to use RESULT_SCAN and iterate over the original query's result with a smaller pagination size (in our case, batching by 1,000,000 rows didn't generate chunks large enough to cause any out-of-memory errors).

With that said, I don't know for certain if that's what's happening here, but the symptoms do seem to be very much related, so I wanted to post my observations here.

@balasphilip

Hello @thegrumpysnail, it seems I have the same problem in my project. I would really appreciate a more detailed example with RESULT_SCAN. Thanks in advance!

sfc-gh-jfan reopened this Jul 1, 2022
github-actions bot closed this as completed Jul 2, 2022
sfc-gh-jfan reopened this Jul 6, 2022
@kumaresansv

I am also facing similar issues. Is there any workaround for it? I would rather use streaming to access all the data instead of having to manually batch up the calls.

sfc-gh-dszmolka added the enhancement (The issue is a request for improvement or a new feature) and status-in_progress (Issue is worked on by the driver team) labels Mar 29, 2023
sfc-gh-dszmolka changed the title from "Out of memory issue" to "SNOW-750472 Out of memory issue" Apr 4, 2023
@TimShilov commented Apr 12, 2023

I'm also noticing unreasonably high memory usage when streaming.
I've tried to work around this using RESULT_SCAN, but it didn't help: to use RESULT_SCAN you still need to execute the query first, and there seems to be no way to execute a query without fetching the results.
Even if you just do connection.execute({ sqlText: "SELECT * FROM LARGE_TABLE" }) without providing any callback, it still seems to fetch the results under the hood (judging by the resource usage and OOM kills).

Am I missing something? How can I work around that?

@thegrumpysnail

@TimShilov - it's been ages, so the details are fuzzy in my memory, but basically we had to first generate a result by running the query, and then, instead of streaming the whole result, we'd select an offset/limit window from RESULT_SCAN using the query ID returned. So you're effectively doing what Snowflake tries to do behind the scenes, except in a way that doesn't kill your service. Roughly the shape sketched below.
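
A minimal sketch of that workaround, assuming an already-connected snowflake-sdk connection (untested; executeAsync is an illustrative helper, LARGE_TABLE is a placeholder, and getQueryId() may be getStatementId() on older driver versions):

// Illustrative helper: promisify connection.execute and hand back
// both the statement and the fully fetched rows.
function executeAsync(connection, sqlText) {
  return new Promise((resolve, reject) => {
    connection.execute({
      sqlText,
      complete: (err, stmt, rows) => (err ? reject(err) : resolve({ stmt, rows })),
    });
  });
}

async function paginateResult(connection) {
  // Run the heavy query once so its result lands in Snowflake's result cache.
  const { stmt } = await executeAsync(connection, 'SELECT * FROM LARGE_TABLE');
  const queryId = stmt.getQueryId(); // getStatementId() on older driver versions

  // Re-read the cached result in fixed-size windows instead of streaming it all at once.
  const batchSize = 1000000;
  for (let offset = 0; ; offset += batchSize) {
    const { rows } = await executeAsync(
      connection,
      `SELECT * FROM TABLE(RESULT_SCAN('${queryId}')) LIMIT ${batchSize} OFFSET ${offset}`
    );
    if (rows.length === 0) break;
    // ...process this window of rows...
  }
}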

@TimShilov commented Apr 13, 2023

@thegrumpysnail Thanks for the response. That's basically what I'm trying to achieve. What I don't understand is how to get the queryId in the first place.
I'm trying to:

  1. Execute a heavy query and get queryId (without fetching the result).
  2. Iterate over the results of that query using RESULT_SCAN(<queryId>)

The problem is that step 1 fails due to high memory usage, because the results of a heavy query are still fetched (even though I don't need them).

@haggholm

We have also encountered this issue. I do not at the moment know much about the conditions, except that it happened when we attempted to upgrade snowflake-sdk from 1.6.14 to 1.6.21. Since this is only known to have happened when running against a customer warehouse, it's not entirely simple to bisect (if I had a repro case against our own warehouse, I'd bisect and identify which specific version upgrade introduced the problem).

<--- Last few GCs --->
[14:0x4f816d0] 1126668 ms: Mark-sweep (reduce) 4945.1 (5472.8) -> 4944.9 (5472.8) MB, 5738.1 / 0.5 ms (average mu = 0.160, current mu = 0.017) allocation failure GC in old space requested
[14:0x4f816d0] 1132076 ms: Mark-sweep (reduce) 4945.5 (5472.8) -> 4945.1 (5473.1) MB, 5343.8 / 0.5 ms (average mu = 0.094, current mu = 0.012) allocation failure GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb06730 node::Abort() [/usr/local/bin/node]
2: 0xa1b6d0 [/usr/local/bin/node]
3: 0xce1e60 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
4: 0xce2207 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
5: 0xe99875 [/usr/local/bin/node]
6: 0xea953d v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
7: 0xeac23e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
8: 0x121bd3d v8::internal::Deserializer<v8::internal::Isolate>::Allocate(v8::internal::SnapshotSpace, int, v8::internal::AllocationAlignment) [/usr/local/bin/node]
9: 0x121ffa3 v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(v8::internal::SnapshotSpace) [/usr/local/bin/node]
10: 0x121f58c int v8::internal::Deserializer<v8::internal::Isolate>::ReadSingleBytecodeData<v8::internal::SlotAccessorForHeapObject>(unsigned char, v8::internal::SlotAccessorForHeapObject) [/usr/local/bin/node]
11: 0x121fed1 v8::internal::Deserializer<v8::internal::Isolate>::ReadData(v8::internal::Handle<v8::internal::HeapObject>, int, int) [/usr/local/bin/node]
12: 0x122002e v8::internal::Deserializer<v8::internal::Isolate>::ReadObject(v8::internal::SnapshotSpace) [/usr/local/bin/node]
13: 0x121e960 v8::internal::Deserializer<v8::internal::Isolate>::ReadObject() [/usr/local/bin/node]
14: 0x1219218 v8::internal::ContextDeserializer::Deserialize(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSGlobalProxy>, v8::DeserializeInternalFieldsCallback) [/usr/local/bin/node]
15: 0x1219396 v8::internal::ContextDeserializer::DeserializeContext(v8::internal::Isolate*, v8::internal::SnapshotData const*, bool, v8::internal::Handle<v8::internal::JSGlobalProxy>, v8::DeserializeInternalFieldsCallback) [/usr/local/bin/node]
16: 0x123ea08 v8::internal::Snapshot::NewContextFromSnapshot(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSGlobalProxy>, unsigned long, v8::DeserializeInternalFieldsCallback) [/usr/local/bin/node]
17: 0xf4d705 v8::internal::Genesis::Genesis(v8::internal::Isolate*, v8::internal::MaybeHandle<v8::internal::JSGlobalProxy>, v8::Local<v8::ObjectTemplate>, unsigned long, v8::DeserializeInternalFieldsCallback, v8::MicrotaskQueue*) [/usr/local/bin/node]
18: 0xf4e538 v8::internal::Bootstrapper::CreateEnvironment(v8::internal::MaybeHandle<v8::internal::JSGlobalProxy>, v8::Local<v8::ObjectTemplate>, v8::ExtensionConfiguration*, unsigned long, v8::DeserializeInternalFieldsCallback, v8::MicrotaskQueue*) [/usr/local/bin/node]
19: 0xcef696 v8::NewContext(v8::Isolate*, v8::ExtensionConfiguration*, v8::MaybeLocal<v8::ObjectTemplate>, v8::MaybeLocal<v8::Value>, unsigned long, v8::DeserializeInternalFieldsCallback, v8::MicrotaskQueue*) [/usr/local/bin/node]
20: 0xcefb64 v8::Context::New(v8::Isolate*, v8::ExtensionConfiguration*, v8::MaybeLocal<v8::ObjectTemplate>, v8::MaybeLocal<v8::Value>, v8::DeserializeInternalFieldsCallback, v8::MicrotaskQueue*) [/usr/local/bin/node]
21: 0xaf68f1 node::contextify::ContextifyContext::CreateV8Context(node::Environment*, v8::Local<v8::Object>, node::contextify::ContextOptions const&) [/usr/local/bin/node]
22: 0xaf76d7 node::contextify::ContextifyContext::ContextifyContext(node::Environment*, v8::Local<v8::Object>, node::contextify::ContextOptions const&) [/usr/local/bin/node]
23: 0xaf7cd9 node::contextify::ContextifyContext::MakeContext(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
24: 0x156ed0c [/usr/local/bin/node]
Child process exited with code null

@sfc-gh-dszmolka (Collaborator)

Quick update: the team is looking into implementing the highWaterMark / backpressure functionality. Will link the PR once available.

@TimShilov

It may be a separate issue, but I decided to leave it here because the symptoms are similar.
Since the update to version 1.6.21 the OOM problems got worse. In our case, processes started crashing more frequently, even on relatively small responses. Just downgrading to 1.6.20, with no other changes to the code, fixed the crashes.
I assume the latest patch introduced a memory leak or something. 😩

cc @sfc-gh-dszmolka

@sfc-gh-dszmolka (Collaborator)

There haven't been any changes implemented (yet) in the context of the request detailed in this issue #43 (highWaterMark / adding backpressure capability for result set streams).

Please open a new issue for what you're experiencing. If you can, please add more details on the reproduction; that would surely help troubleshoot faster. Thank you in advance!

@haggholm

Quick update: the team is looking into implementing the highWaterMark / backpressure functionality. Will link the PR once available.

Is there any progress on this front?

@sfc-gh-dszmolka (Collaborator)

There is. Work had already started but got reprioritized due to other, more critical bugs; those are now fixed, so we're resuming work on this one. Hope to be able to provide an update by mid-June.

@sfc-gh-dszmolka (Collaborator)

#505

sfc-gh-dszmolka added the status-pr_pending_merge label (A PR is made and is under review) and removed the jira and status-in_progress labels May 30, 2023
@jdgcj commented Jun 6, 2023

We're also running into this issue. It would be nice to have some control over it.

@sfc-gh-dszmolka (Collaborator)

The PR is now merged into main and will be part of the next release, 1.6.23. I'll comment again once the release is available; expected towards the end of June.

sfc-gh-dszmolka added the status-fixed_awaiting_release label (The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector) and removed the status-pr_pending_merge label Jun 13, 2023
@sfc-gh-dszmolka (Collaborator) commented Jun 15, 2023

Thank you all for bearing with us - the long-awaited improvement, backpressure support in the connector, is now out with release 1.6.23! Some important notes to add; we'll amend the official documentation too, but until then, here they are:

  1. Most importantly, it can be toggled with streamResult on the Connection, which is false by default. For any result set that could exceed Node's default memory, it is highly recommended to set streamResult: true, especially if you already know you'll be streaming the results anyway. With the default setting (false), the connector attempts to store all the rows in an array before streaming them. With smaller result sets this has never been an issue, but with larger result sets, storing all the results in memory can contribute to an OOM error.
  2. The backpressure mechanism is automatic and only takes effect once the stream has reached the highWaterMark threshold.
  3. Example code snippet of usage; please note the streamResult on the Connection:
var connection = snowflake.createConnection({
    account: process.env.SFACCOUNT,
    username: process.env.SFUSER,
    // ...
    streamResult: true
});
// [..rest of the code..]
connection.execute({
  sqlText: "select L_COMMENT from SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM limit 100000000;",
  streamResult: true,
  complete: function (err, stmt)
  {
    var stream = stmt.streamRows();
    stream.on('readable', function () // Read data from the stream when it is available
    {
      let row;

      // Before the change, the amount of data in the stream could exceed the highWaterMark threshold
      // After the change, the amount of data in the stream will be less than or equal to the threshold
      while ((row = this.read()) !== null)
      {
        console.log(row);
      }
    }).on('end', function ()
    {
      console.log('done');
    }).on('error', function (err)
    {
      console.log(err);
    });
  }
});
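
Since the stream now honors downstream backpressure, the pipe-based pattern from the original report should also keep memory near the watermark. A minimal sketch under that assumption (untested; LARGE_TABLE and the watermark value are illustrative):

const { pipeline, Writable } = require('stream');

connection.execute({
  sqlText: 'SELECT * FROM LARGE_TABLE',
  streamResult: true,
  complete: function (err, stmt) {
    if (err) throw err;
    pipeline(
      stmt.streamRows(),                   // object-mode readable of result rows
      new Writable({
        objectMode: true,
        highWaterMark: 10,                 // buffer at most ~10 rows before pausing the source
        write(row, enc, cb) { cb(); },     // process each row here
      }),
      (err) => console.log(err || 'done')  // called on error or completion
    );
  },
});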

sfc-gh-dszmolka removed the status-fixed_awaiting_release label Jun 15, 2023
@azriel46d

I have attempted to use this as per below. Unfortunately, at around the 49,000th record I still get JavaScript heap out of memory.

snowflakeInteractionsStream
  .on('readable', function () {
    let row;
    while ((row = this.read()) !== null) {
      count++;
      if (count % 500 === 0) {
        console.log(`Rows ${count}`);
      }
    }
  })
  .on('end', function () {
    console.log(`The End`);
    resolve();
  })
  .on('error', function (err) {
    console.log(err);
  });

@sfc-gh-dszmolka (Collaborator) commented Jun 24, 2023

Can you please include:

  • a full runnable script which leads to the issue, with all of the configuration you're using
  • the table definition
  • the actual dataset, or a mock dataset which represents the data in the table

I'm asking this to confirm whether you're still hitting the same issue that was fixed by introducing backpressure and streamResult: true, or whether you're perhaps hitting #220.

If you don't wish to share it here, please open a case with Snowflake Support and mention this issue. You can then work 1:1 with a support engineer rather than in public.

@azriel46d

@sfc-gh-dszmolka

Here is the full runnable script. First there is a function which returns the row stream as the result of a promise.
A query is then made. The tabular structure is limited in terms of the number of columns, so I do not think it's linked to #220.

I cannot supply the full results, but here is a sample; the total number of records is around 500,000.

USER_ID,EVENT_TYPE,EVENT_ID,EVENT_DATE,USERNAME,METADATA
4b116319-efcf-4674-93c6-4ed16524b331,New account,new-account-876014695332-823034494,2019-04-29,user001,"{
  ""account_no"": ""01220304055"",
  ""ccy"": ""EUR"",
  ""scheme_code"": ""CRA123""
}"
02786cf3-0d4e-43b1-b202-ba85c1bb9291,New account,new-account-876014695433-823034495,2019-04-29,user002,"{
  ""account_no"": ""998989898989"",
  ""ccy"": ""EUR"",
  ""scheme_code"": ""CRA123""
}"
bbf13761-8fa5-47e4-98b6-6f2b436f84f6,New account,new-account-876014695534-823034496,2019-04-28,user003,"{
  ""account_no"": ""8786868686868"",
  ""ccy"": ""EUR"",
  ""scheme_code"": ""CRA123""
}"

import internal from 'stream';
import { Statement } from 'snowflake-sdk';

const getStatementStream = function (query): Promise<internal.Readable> {
  return new Promise((resolve, reject) => {
    const statement = this.connection.execute({
      sqlText: query,
      streamResult: true,
      complete: function (err, statement: Statement) {
        if (err) {
          reject(err);
        } else {
          const stream = statement.streamRows();
          resolve(stream);
        }
      },
    });
  });
}

// getStatementStream returns a promise, so this runs inside an async function:
const snowflakeInteractionsStream = await getStatementStream(`SELECT
    user_id,
    event_type,
    event_id,
    event_date,
    username, 
    OBJECT_CONSTRUCT(
      'scheme_code', code, 
      'account_no',  account_id, 
      'ccy', currency_code
    ) as metadata  
FROM accounts_table`);
snowflakeInteractionsStream
  .on('readable', function () { // Read data from the stream when it is available
    let row;
    while ((row = this.read()) !== null) {
      count++;
      if (count % 500 === 0) {
        console.log(`new ${interaction} interactions: ${count}`);
      }
    }
  })
  .on('end', function () {
    console.log(`finished ${interaction} interactions: ${count}`);
    resolve();
  })
  .on('error', function (err) {
    console.log(err);
    reject();
  });

@sfc-gh-dszmolka (Collaborator) commented Jun 26, 2023

Thank you for providing the snippet and the rest of the details! I managed to reproduce the issue and realized that:

  1. In my comment above, I forgot to include the necessary configuration on the Connection; this is now included in the example.
  2. Even with streamResult set to true on the Connection, the execution still broke with JavaScript heap out of memory

until I reverted #465:

      // ...
      //ret = betterEval("(" + rawColumnValue + ")");
      ret = eval("(" + rawColumnValue + ")");

Using the exact same setup for this reproduction and test data (around 9 million rows, generated from the three sample rows in your example), it now works and doesn't break, with or without streamResult set to true on the Connection.

So I believe what you're seeing now is more likely connected to issues introduced by #465, for which we have multiple issues open (#528, #539).

@azriel46d

Thanks for the quick update. Something small I noticed: @types/snowflake-sdk has not been updated to reflect the connection settings change.
I can confirm that with betterEval reverted, it does not go out of memory for the above example. I have also tried a more complex setup, in which it sinks into a local DB, and it works flawlessly.

@sfc-gh-dszmolka (Collaborator)

I can confirm that @types/snowflake-sdk is not managed by Snowflake at this moment.

@APTy (Contributor) commented Jun 28, 2023

For others in the same boat as me, it's worth noting that before you can rely on the streaming OOM fixes in 1.6.23, you may want to wait on:
