
SNOW-972142: The CLIENT_RESULT_CHUNK_SIZE parameter is not taken into account in different ways of executing the request #1563

Open
krocodl opened this issue Nov 16, 2023 · 4 comments
Labels
enhancement (The issue is a request for improvement or a new feature) · status-triage_done (Initial triage done, will be further handled by the driver team)

Comments

@krocodl

krocodl commented Nov 16, 2023

Driver net.snowflake:snowflake-jdbc:3.14.2

First, we need to prepare a table with 10,000,000 rows. We can do that with the following SQL:

```sql
create or replace table long_table as
select row_number() over (order by null) as id
from table(generator(rowcount => 10000000));
```

After that, we can implement a test that tries to download a large amount of simulated data with minimal memory usage:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Collections;

// getConnection() and LOG are provided by the surrounding test class.
@Test
public void bigDataTest() throws Exception {
    int colSize = 1024;
    int colCount = 12;

    String colValue = "'" + "x".repeat(colSize) + "'";
    String query = "select id, "
            + String.join(", ", Collections.nCopies(colCount, colValue))
            + " from long_table";

    try (Connection conn = getConnection()) {
        try (Statement statement = conn.createStatement()) {
            statement.execute("ALTER SESSION SET CLIENT_MEMORY_LIMIT=100");
            statement.execute("ALTER SESSION SET CLIENT_PREFETCH_THREADS=1");
            statement.execute("ALTER SESSION SET CLIENT_RESULT_CHUNK_SIZE=48");
            statement.execute("ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT=JSON");
        }

        PreparedStatement ps = conn.prepareStatement(
                query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        ps.setFetchSize(10);
        ps.setMaxRows(-1);
        ResultSet rs = ps.executeQuery();

        int count = 0;
        while (rs.next()) {
            if (count++ % 1_000 == 0) {
                LOG.info("Count = {}", count);
            }
        }
    }
}
```

We can observe the number of chunks to be downloaded in the variable SFStatement#executeQueryInternal#result. Its value is a JSON document whose "data.chunks" path holds the list of chunk URLs. The size of this list is what interests us: the longer the list, the smaller each individual chunk and the less memory is used while retrieving the data. The following values were obtained with different arguments to ps.setMaxRows():

  • ps.setMaxRows(Integer.MAX_VALUE); => data.chunks.size = 469
  • // ps.setMaxRows(Integer.MAX_VALUE); (call commented out) => data.chunks.size = 2453
  • ps.setMaxRows(0); => data.chunks.size = 2453
  • ps.setMaxRows(-1); => data.chunks.size = 469

At the same time, we can see that with ps.setMaxRows(Integer.MAX_VALUE) and ps.setMaxRows(-1) the CLIENT_RESULT_CHUNK_SIZE configuration parameter is not taken into account at all when the number of chunks is calculated.
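For what it's worth, a back-of-the-envelope calculation makes the 2453 figure plausible. A minimal sketch, assuming CLIENT_RESULT_CHUNK_SIZE is specified in MB (as documented) and ignoring compression and per-row overhead:

```java
// Rough sanity check of the observed chunk counts. Assumptions (not values
// read from the driver): each row carries ~12 KB of character data, and
// CLIENT_RESULT_CHUNK_SIZE=48 means 48 MB per chunk.
long rows = 10_000_000L;
long bytesPerRow = 12L * 1024;                // 12 text columns of 1024 chars each
long totalBytes = rows * bytesPerRow;         // ~123 GB of result data
long chunkBytes = 48L * 1024 * 1024;          // 48 MB chunk target
System.out.println(totalBytes / chunkBytes);  // ~2441, close to the observed 2453
```

By the same arithmetic, 469 chunks correspond to roughly 250 MB per chunk, i.e. in those cases the 48 MB setting is simply not applied.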

The following graphs illustrate the actual memory consumption in the two fundamentally different cases (data.chunks.size = 2453 and data.chunks.size = 469):

[Screenshot 2023-11-15 17:12: memory consumption graph]
[Screenshot 2023-11-15 16:55: memory consumption graph]

In the second case we occasionally get an OOM caused by chunks being re-downloaded. This is a separate problem: while an incorrectly received chunk is being re-requested, the old copy is still held in memory, so the actual consumption is CLIENT_RESULT_CHUNK_SIZE * 3 instead of the stated CLIENT_RESULT_CHUNK_SIZE * 2.
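The retry behaviour being described looks roughly like the following sketch. This is hypothetical code for illustration only, not the driver's actual implementation; download and decompress are made-up helpers:

```java
// Hypothetical sketch of the retry problem described above -- NOT the actual
// driver code. If the buffer from a failed attempt stays reachable while the
// retry downloads again, peak memory per chunk exceeds the documented 2x.
byte[] fetchChunkWithRetry(String chunkUrl, int maxAttempts) throws IOException {
    byte[] compressed = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            compressed = download(chunkUrl);  // ~1x CLIENT_RESULT_CHUNK_SIZE
            return decompress(compressed);    // ~1x more while inflating
        } catch (IOException e) {
            // 'compressed' from the failed attempt is still referenced here, so
            // the next download allocates a second chunk-sized buffer before the
            // first becomes collectible: ~3x chunk size at peak instead of 2x.
        }
    }
    throw new IOException("Failed to download chunk: " + chunkUrl);
}
```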

@krocodl added the bug label Nov 16, 2023
@github-actions bot changed the title The CLIENT_MEMORY_LIMIT parameter is not taken into account in different ways of executing the request SNOW-972142: The CLIENT_MEMORY_LIMIT parameter is not taken into account in different ways of executing the request Nov 16, 2023
@krocodl
Author

krocodl commented Nov 16, 2023

  1. Without ALTER SESSION SET JDBC_QUERY_RESULT_FORMAT=JSON the behaviour is the same.
  2. The larger a single chunk is, the more likely its download will need several attempts, which leads to additional memory consumption. It is therefore quite possible that reducing the chunk size will not increase the overall time needed to obtain the query result.

@krocodl changed the title SNOW-972142: The CLIENT_MEMORY_LIMIT parameter is not taken into account in different ways of executing the request SNOW-972142: The CLIENT_RESULT_CHUNK_SIZE parameter is not taken into account in different ways of executing the request Nov 16, 2023
@krocodl
Author

krocodl commented Nov 16, 2023

BTW, there is an error in the documentation:
CLIENT_MEMORY_LIMIT is not a global maximum memory usage limit for all queries. It is a property of the SnowflakeChunkDownloader, and each SnowflakeResultSetSerializableV1 (i.e., each query under execution) has its own instance of this class.
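A minimal sketch of the consequence, assuming (as described above) that the limit is enforced independently by each SnowflakeChunkDownloader instance, one per open result set:

```java
// Sketch: CLIENT_MEMORY_LIMIT bounds each result set's chunk downloader
// separately, so concurrent queries on the same session multiply the limit.
// (Assumption: one SnowflakeChunkDownloader per open ResultSet.)
try (Statement s1 = conn.createStatement();
     Statement s2 = conn.createStatement()) {
    ResultSet rs1 = s1.executeQuery("select * from long_table"); // downloader #1: up to ~100 MB
    ResultSet rs2 = s2.executeQuery("select * from long_table"); // downloader #2: ~100 MB more
    // Draining both concurrently can buffer ~200 MB of chunk data in this JVM,
    // despite CLIENT_MEMORY_LIMIT=100 being set on the session.
    while (rs1.next() && rs2.next()) {
        // ... consume rows ...
    }
}
```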

@sfc-gh-igarish
Collaborator

@sfc-gh-wfateem could you please triage this issue?

@sfc-gh-dszmolka added the enhancement and status-triage_done labels and removed the bug label Apr 27, 2024
@sfc-gh-dszmolka
Contributor

Hi all - unfortunately, this is a limitation across all of the Snowflake drivers and will likely need server-side changes too. The relevant teams are aware, and scoping and prioritizing this issue is in progress, but no timeline can be promised at this point.

However, if you are already a Snowflake customer, please reach out to your Account Team and explain how important this change would be for you. That might give the effort some traction and help the involved teams prioritize it. Thank you in advance!
