Backfill queries typically take 1 to 30 seconds to complete. There are exceptions which we'll get to, but in general if we're in the middle of issuing a backfill query and then don't see any data for multiple minutes (let's say 15m here, just for the sake of argument) then something is very wrong.
And usually what's wrong is that the database connection has silently dropped or something along those lines. It doesn't happen often, but when it does there's no indication of what's going on: we just sit there indefinitely, waiting for backfill result rows which will never arrive.
We should add a basic watchdog timeout to the `backfillStream()` function so that if more than 15 minutes have elapsed since we last saw any data, we fail the capture.
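To make the idea concrete, here is a minimal sketch of an idle watchdog in Go. It is not the actual `backfillStream()` code: the channel of rows, the `consumeWithWatchdog` function, the `errBackfillStalled` error, and the tiny `demoTimeout` are all hypothetical names chosen for illustration, and the real function presumably reads from a database cursor rather than a channel. The only point is the pattern: reset a timer every time a row arrives, and fail if the timer fires first.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBackfillStalled is a hypothetical sentinel error for this sketch.
var errBackfillStalled = errors.New("backfill watchdog: no data received within timeout")

// demoTimeout is deliberately tiny so the example runs quickly.
// The issue proposes 15 minutes for the real timeout.
const demoTimeout = 50 * time.Millisecond

// consumeWithWatchdog drains rows until the channel is closed and returns
// the number of rows seen. It fails if more than idleTimeout passes without
// a new row arriving (the timeout is measured since the last row, not since
// the start of the query).
func consumeWithWatchdog(rows <-chan string, idleTimeout time.Duration) (int, error) {
	watchdog := time.NewTimer(idleTimeout)
	defer watchdog.Stop()

	count := 0
	for {
		select {
		case _, ok := <-rows:
			if !ok {
				return count, nil // channel closed: query finished normally
			}
			count++
			// We saw data, so restart the idle timer. Stop and drain first
			// so a stale expiry cannot fire right after the reset.
			if !watchdog.Stop() {
				select {
				case <-watchdog.C:
				default:
				}
			}
			watchdog.Reset(idleTimeout)
		case <-watchdog.C:
			return count, fmt.Errorf("after %d rows: %w", count, errBackfillStalled)
		}
	}
}

func main() {
	// A healthy stream: rows arrive and the channel is closed.
	healthy := make(chan string, 3)
	healthy <- "row1"
	healthy <- "row2"
	healthy <- "row3"
	close(healthy)
	n, err := consumeWithWatchdog(healthy, demoTimeout)
	fmt.Println(n, err)

	// A silently dropped connection: nothing ever arrives.
	stalled := make(chan string)
	n, err = consumeWithWatchdog(stalled, demoTimeout)
	fmt.Println(n, err)
}
```

Because the timer is reset on every row, a long-running query that keeps producing data is never interrupted. Only a gap longer than the timeout between consecutive rows fails the capture.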
There is one situation where this could break something that previously "worked", however, so we should exercise a bit of care and think this part through. Sometimes (in certain tables with certain backfill modes) a backfill query will require a full-table sort. If the table is small there's no problem: the sort takes a few seconds, we issue a few backfill queries each requiring such a sort, and then we're done. But if the table is sufficiently big then we have a quadratic problem: each sort takes many minutes and we have to issue a whole bunch of backfill queries. In this case the backfill is unlikely to ever complete anyway, because every 50k rows requires another full-table sort and we end up averaging some pathetic data rate.
I'm going to arbitrarily say that 15 minutes is a reasonable dividing line between the two cases. This is probably overly generous even -- just consider the size of a dataset that takes 15 minutes to sort in memory. At 50k rows per 15 minutes (roughly 3,300 rows/minute) it's unlikely that we're ever going to finish the backfill before someone gets tired of waiting. When this sort of thing happens we usually go and either manually change the backfill mode for the table or fix the connector bug which caused it anyway.
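For a rough sense of scale, the arithmetic can be sketched as below. The 100M-row table size is purely a hypothetical figure picked for illustration (the issue doesn't say how big such a table would be), and `backfillDays` is a made-up helper. The 50k-row chunk size and the 15-minute sort time are the numbers from the discussion above.

```go
package main

import "fmt"

// backfillDays estimates the total backfill time in days when every chunk
// of chunkRows rows requires a fresh full-table sort taking sortMinutes.
func backfillDays(tableRows, chunkRows int64, sortMinutes float64) float64 {
	queries := (tableRows + chunkRows - 1) / chunkRows // ceiling division
	return float64(queries) * sortMinutes / (60 * 24)
}

func main() {
	// Hypothetical table of 100M rows, 50k rows per query, 15 minutes per sort:
	// 2000 queries * 15 minutes = 30000 minutes.
	fmt.Printf("%.1f days\n", backfillDays(100_000_000, 50_000, 15)) // prints "20.8 days"
}
```

Under those assumptions a backfill sitting right at the 15-minute line would take about three weeks, which is why treating anything slower as a failure seems unlikely to hurt a case that would otherwise have completed in practice.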
TL;DR: We should add a watchdog timeout in `backfillStream`, but in order to not break marginally-slow cases it should be fairly generous. I think 15 minutes is probably a reasonable compromise between those concerns.