In the current ABFE workflow setup, file reading and preprocessing run on a single thread, which wastes time when there are many files to read.
I think we could speed things up by running the read and preprocess steps in parallel across multiple processes.
I'm thinking of adding a new dependency, joblib, for that. I'd appreciate advice on whether the community is happy with that.
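To make the proposal concrete, here is a minimal sketch of what wrapping the per-file work in joblib could look like. The functions `read_one`, `preprocess_one`, and `load_all` are hypothetical stand-ins, not the workflow's actual API; only `joblib.Parallel` and `joblib.delayed` are real library calls. The import is guarded on the assumption that joblib might end up being optional, in which case the code falls back to a serial loop.

```python
try:
    from joblib import Parallel, delayed
    HAVE_JOBLIB = True
except ImportError:  # joblib not installed: fall back to the serial path
    HAVE_JOBLIB = False


def read_one(path):
    """Hypothetical per-file reader: one float per non-empty line."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]


def preprocess_one(values):
    """Hypothetical per-window preprocessing: keep the second half of the series."""
    return values[len(values) // 2:]


def read_and_preprocess(path):
    """Unit of work handed to each worker: read one file, then preprocess it."""
    return preprocess_one(read_one(path))


def load_all(paths, n_jobs=1):
    """Process every file; use joblib workers when available and n_jobs != 1."""
    if HAVE_JOBLIB and n_jobs != 1:
        # Parallel returns results in the same order as the input paths.
        return Parallel(n_jobs=n_jobs)(
            delayed(read_and_preprocess)(p) for p in paths
        )
    return [read_and_preprocess(p) for p in paths]
```

Calling `load_all(paths, n_jobs=-1)` would use all available cores, while `n_jobs=1` keeps today's serial behaviour, so the parallel path stays opt-in.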
joblib is not a big dependency; however, we can also think about making it an optional dep.
Perhaps a good starting point for discussion is to see how much speed-up it can bring and if it's something people will likely always want to use. Do you have some benchmark comparisons for typical data sets and how it scales?
It is quite a big speed-up. Take a case with 64 lambda windows, each with 6251 time points.
On a 64-core instance, the read goes from 7.98 s ± 114 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
to 625 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The preprocess goes from 1min 10s ± 598 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
to 3.12 s ± 86.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
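For a rough sense of scaling, the timings above can be turned into speed-up factors and a parallel efficiency. This is only arithmetic on the mean values quoted in this comment (with 1min 10s taken as 70 s); dividing by the core count assumes all 64 cores were used as workers, which the comment does not state.

```python
CORES = 64  # assumption: one worker per core on the 64-core instance

# (serial seconds, parallel seconds), mean values from the timings above
timings = {
    "read": (7.98, 0.625),
    "preprocess": (70.0, 3.12),
}


def speedup(serial_s, parallel_s):
    """Ratio of serial to parallel wall time."""
    return serial_s / parallel_s


for stage, (serial_s, parallel_s) in timings.items():
    factor = speedup(serial_s, parallel_s)
    print(f"{stage}: {factor:.1f}x speed-up, {factor / CORES:.0%} parallel efficiency")

# read: 12.8x speed-up, 20% parallel efficiency
# preprocess: 22.4x speed-up, 35% parallel efficiency
```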