SNOW-750472 Out of memory issue #43
Needs investigation.
We actually encountered something similar several months ago. I did some pretty extensive testing at the time to figure out what the problem was, and it appears it was the chunk size being sent from Snowflake. I don't remember all of the details, but basically Snowflake returns a list of all of the available chunks to download, so as you stream the rows, the client downloads each chunk in sequence. The problem is that the chunk size (which is determined by Snowflake, as far as I can tell) seems to more or less double until it hits some threshold. So what would happen is we'd end up with something like -
... and when the Snowflake client is parsing these larger chunks, the Node client eventually exhausts all of its memory. The way that we got around it was that we had to use [...]. With that said, I don't know for certain if that's what's happening here, but the symptoms do seem very much related, so I wanted to post my observations here.
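For anyone who wants to check whether they are hitting the same pattern, here is a minimal diagnostic sketch that logs heap usage while streaming; the query is a placeholder and `connection` is assumed to be an already-connected snowflake-sdk Connection, set up as in the examples later in this thread:

// Rough diagnostic sketch: watch heapUsed grow as larger chunks are downloaded and parsed.
connection.execute({
  sqlText: 'SELECT * FROM some_large_table', // placeholder query
  streamResult: true,
  complete: function (err, stmt) {
    if (err) {
      console.error(err);
      return;
    }
    let rows = 0;
    stmt.streamRows()
      .on('data', function () {
        rows++;
        if (rows % 10000 === 0) {
          const heapMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
          console.log(`rows: ${rows}, heapUsed: ${heapMb} MB`);
        }
      })
      .on('end', function () { console.log('done'); })
      .on('error', function (e) { console.error(e); });
  }
});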
Hello @thegrumpysnail, it seems I have the same problem in my project. I would really appreciate a more verbose example with [...]
I am also facing similar issues. Is there any workaround for it? I would rather use streaming to access all the data instead of having to manually batch up the calls.
I'm also noticing unreasonably high memory usage when streaming. Am I missing something? How can I work around that?
@TimShilov - it's been ages, so the details are very fuzzy in my memory, but basically we had to initially generate a result using [...]
@thegrumpysnail Thanks for the response. That's basically what I'm trying to achieve. What I don't understand is how to get the [...]
The problem is that step 1 fails due to high memory usage, because the results of a heavy query are still fetched (even though I don't need them).
We have also encountered this issue. I do not at the moment know very much about the conditions, except that it happened when we attempted to upgrade the version of [...]
Quick update: the team is looking into implementing the highWaterMark / backpressure functionality. Will link the PR once available.
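For context, highWaterMark and backpressure are standard Node stream concepts: an object-mode Readable buffers at most roughly highWaterMark items ahead of the consumer, and push() returning false tells the producer to pause. A small self-contained sketch, with no snowflake-sdk involved, to illustrate the behaviour being requested here:

const { Readable, Writable, pipeline } = require('stream');

let produced = 0;
const rows = new Readable({
  objectMode: true,
  highWaterMark: 16, // buffer at most ~16 rows ahead of the consumer
  read() {
    produced++;
    if (!this.push({ id: produced })) {
      console.log(`internal buffer full at row ${produced}; producer pauses`);
    }
    if (produced >= 100) {
      this.push(null); // signal end of stream
    }
  },
});

const slowSink = new Writable({
  objectMode: true,
  highWaterMark: 16,
  write(row, _encoding, callback) {
    // Simulate a slow consumer: ~10 ms of work per row.
    setTimeout(() => {
      console.log('consumed row', row.id);
      callback();
    }, 10);
  },
});

pipeline(rows, slowSink, (err) => {
  console.log(err ? `pipeline failed: ${err}` : 'done');
});

The request in this issue is essentially for the driver's result-set stream to behave like `rows` above: stop downloading and parsing further chunks while the consumer has not caught up.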
It may be a separate issue, but I decided to leave it here because the symptoms are similar.
There haven't been any changes implemented (yet) in the context of the request detailed in this issue #43 (highWaterMark / adding backpressure capability for result set streams). Please open a new Issue for what you're experiencing. If you can, please add more details on the reproduction; that would surely help troubleshoot faster. Thank you in advance!
Is there any progress on this front?
There is. Work already started but got reprioritized due to other, more critical bugs, which are now fixed, so we're resuming work on this one. Hope to be able to provide an update by mid-June.
We're also running into this issue. It would be nice to have some control over it.
PR is now merged into [...]
Thank you all for bearing with us - the long-awaited improvement adding backpressure support to the connector is now out with release 1.6.23! There are some important notes to add, and we'll amend the official documentation too, but until then, here they are:
var connection = snowflake.createConnection({
  account: process.env.SFACCOUNT,
  username: process.env.SFUSER,
  // ..
  streamResult: true
});
// [..rest of the code..]
connection.execute({
  sqlText: "select L_COMMENT from SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM limit 100000000;",
  streamResult: true,
  complete: function (err, stmt)
  {
    var stream = stmt.streamRows();
    // Read data from the stream when it is available
    stream.on('readable', function ()
    {
      let row;
      // Before the change, the amount of data buffered in the stream could be greater than the highWaterMark threshold.
      // After the change, the amount of data buffered in the stream will be less than or equal to the threshold.
      while ((row = this.read()) !== null)
      {
        console.log(row);
      }
    }).on('end', function ()
    {
      console.log('done');
    }).on('error', function (err)
    {
      console.log(err);
    });
  }
});
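A note on consumption style: since streamRows() is used above as a standard object-mode Readable ('readable' events plus read()), the same stream can presumably also be drained with for await...of, which pulls the next row only after the loop body finishes and therefore cooperates with the new backpressure handling. A hedged sketch of that pattern, reusing the same query and assuming an open `connection`:

connection.execute({
  sqlText: 'select L_COMMENT from SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM limit 100000000;',
  streamResult: true,
  complete: async function (err, stmt) {
    if (err) {
      console.error(err);
      return;
    }
    try {
      let count = 0;
      // for await...of only requests the next row once the loop body finishes,
      // so the driver's buffering stays bounded by its highWaterMark.
      for await (const row of stmt.streamRows()) {
        count++;
        if (count % 100000 === 0) {
          console.log(`processed ${count} rows`);
        }
      }
      console.log('done');
    } catch (e) {
      console.error(e);
    }
  }
});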
I have attempted to use this as per below. Unfortunately, around the 49,000th record I still get [...]
snowflakeInteractionsStream
.on(
'readable',
function ()
{
let row;
while ((row = this.read()) !== null) {
count++;
if (count % 500 === 0) {
console.log(`Rows ${count}`);
}
}
},
)
.on('end', function () {
console.log(`The End`);
resolve();
})
.on('error', function (err) {
console.log(err);
});
Can you please include a [...]?
Asking this to confirm whether you're still hitting the same issue which was fixed by introducing backpressure. If you don't wish to share it here, please open a case with Snowflake Support and mention this Issue; you can then work 1:1 with a support engineer if you don't wish to do that in public.
Here is the full runnable script. First there is a function which returns the readable stream as the result of a promise. I cannot supply the full results, but I can supply a sample; the total number of records is around 500,000.
// Note: `this.connection` is assumed to be an already-connected snowflake-sdk Connection (e.g. this function lives on a class).
const getStatementStream = function (query): Promise<internal.Readable> {
return new Promise((resolve, reject) => {
const statement = this.connection.execute({
sqlText: query,
streamResult: true,
complete: function (err, statement: Statement) {
if (err) {
reject(err);
} else {
const stream = statement.streamRows();
resolve(stream);
}
},
});
});
}
// Assumes an enclosing async function (the stream is resolved from a promise).
const snowflakeInteractionsStream = await getStatementStream(`SELECT
user_id,
event_type,
event_id,
event_date,
username,
OBJECT_CONSTRUCT(
'scheme_code', code,
'account_no', account_id,
'ccy', currency_code
) as metadata
FROM accounts_table`);
snowflakeInteractionsStream
.on(
'readable',
function () // Read data from the stream when it is available
{
let row;
while ((row = this.read()) !== null) {
count++;
if (count % 500 === 0) {
console.log(`new ${interaction} interactions: ${count}`);
}
}
},
)
.on('end', function () {
console.log(`finished ${interaction} interactions: ${count}`);
resolve();
})
.on('error', function (err) {
console.log(err);
reject();
});
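One observation on the snippet: console.log is effectively instantaneous, so it barely exercises backpressure. When the real consumer is slower (writing rows to a file, queue, or another database), piping the result stream into a Writable via stream.pipeline lets Node pause the source automatically. A hedged sketch of that pattern; `writeRowSomewhere` is a hypothetical stand-in for the actual destination:

const { Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical stand-in for the real per-row work (file, queue, another DB, ...).
const writeRowSomewhere = (row) => new Promise((resolve) => setTimeout(resolve, 5));

async function drainWithBackpressure(rowStream) {
  let count = 0;

  const sink = new Writable({
    objectMode: true,
    write(row, _encoding, callback) {
      count++;
      writeRowSomewhere(row)
        .then(() => callback()) // ready for the next row
        .catch(callback);       // surface errors to the pipeline
    },
  });

  // pipeline() pauses the source while the sink is busy, so buffered data
  // stays bounded by the streams' highWaterMark settings.
  await pipeline(rowStream, sink);
  console.log(`finished, ${count} rows written`);
}

It could be called as `await drainWithBackpressure(await getStatementStream(query))`, assuming the helper above is bound to an object holding the connection.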
Thank you for providing the snippet and the rest of the details! I managed to reproduce the issue and realized that [...]
until I reverted #465: ....
//ret = betterEval("(" + rawColumnValue + ")");
ret = eval("(" + rawColumnValue + ")");
So using the exact same setup for this reproduction and test data (which is around 9 million rows from the above 3 in your example), even with or without [...]
So I believe what you're seeing now is more likely connected to issues brought in by #465, for which we have multiple Issues open (#528, #539).
Thanks for the quick update. Something small I noticed is that [...]
Can confirm, [...]
For others in the same boat as me, it's worth noting that before you can rely on the streaming OOM fixes in [...]
I am running into an out of memory issue when trying to stream rows from a table with 3 million rows. Does the snowflake stream support the highWaterMark and back pressure functionality of streams? It seems like it should take little memory to stream 10 rows at a time from the db.