Improve handling of ContinuousData, such as strings #172
base: dev
Conversation
This reverts commit 419aa8f.
Oh, also is there a reason we're leaving strings as bytestrings (b"asdf") instead of decoding them to utf-8 ("asdf")?
Related to some of what I was writing about here last night: #67

Basically the Subject class is in the middle of a complete overhaul here: https://github.com/wehr-lab/autopilot/tree/detour-datamodel that I'm trying to finish today or tomorrow. I think it might be a good idea to revisit this after that gets merged, since changing the way that data is declared in general is the goal and things will look pretty different internally (though the …). In the meantime:
I think a good way to do this is to have a mapping between Python types and tables types, so rather than relying on …
Basically this^ but for all types, yeah.
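As a hedged sketch of what that mapping could look like (the specific entries and the fixed string length are assumptions, not Autopilot's actual table): Python types map to numpy dtypes, which pytables can turn into column atoms via `tables.Atom.from_dtype` instead of inspecting a sample value.

```python
import numpy as np

# Hypothetical mapping from Python types to numpy dtypes; a pytables atom can
# then be built with tables.Atom.from_dtype(PYTHON_TO_DTYPE[pytype]).
PYTHON_TO_DTYPE = {
    int: np.dtype("int64"),
    float: np.dtype("float64"),
    bool: np.dtype("bool"),
    str: np.dtype("S128"),  # fixed-width bytestring; 128 is an arbitrary choice
}

print(PYTHON_TO_DTYPE[float].name)       # float64
print(PYTHON_TO_DTYPE[str].itemsize)     # 128
```

The advantage over dtype-sniffing is that the lookup fails loudly (`KeyError`) for an undeclared type instead of silently dropping data.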
In general no data should be dropped silently; totally agree this is a bug.
Serializing arrays is supported! When the message is serialized it uses a … which applies compression before sending. That used to be faster on the RasPi 3 because its network card was very limited, but we should do some perf testing to see if that's still the case (or at least make it optional / detect which is faster depending on the arrays being sent). I also want to switch from using json to something like msgpack, because Python's builtin json is notoriously slow... but one thing at a time.
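The kind of perf check suggested above can be sketched like this (a minimal size comparison, not Autopilot's actual wire format or compressor):

```python
import json
import zlib

import numpy as np

# Serialize an array to JSON, with and without zlib compression, and compare
# payload sizes. Whether compression wins on the wire depends on the data and
# the network, which is exactly why it should be measured, not assumed.
arr = np.arange(10_000, dtype=np.float64) * 0.1

plain = json.dumps(arr.tolist()).encode("utf-8")
compressed = zlib.compress(plain)

print(f"plain: {len(plain)} bytes, compressed: {len(compressed)} bytes")
```

Timing the round-trip (serialize + compress + decompress + parse) against raw JSON on the target hardware would answer whether the compression step still pays off.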
Yeah, there's too much assumption about the typing all the way through data handling; that's one of the major reasons I'm rewriting it to be more strictly and transparently typed.
This is sort of a tricky question: in my opinion all of this should be made abstract behind the Subject class, so the literal arrangement of data in the hdf5 file shouldn't really matter all that much, as long as an API to declare, save, and load is exposed. As it exists currently, the declaration of both trial and continuous data is flat: effectively a list of variables and types. With the shift to using more explicit data models, it should be possible to make recursive models like this:

```python
class Tetrode(Data):
    timestamp: datetime
    channel1: float
    channel2: float
    channel3: float
    channel4: float

class LickSensor(Data):
    timestamp: datetime
    state: bool

class ContinuousData(ContinuousDataModel):
    tetrode: Tetrode
    lick: LickSensor
    accelerometer: float
```

where the … To me, storing several arrays with identical timestamps, each having a single data stream, is a bit more flexible than trying to pack multiple streams of data into a particular table, but I think it should be an option if the streams are truly coupled.
This is a pytables/hdf5 thing: https://www.pytables.org/MIGRATING_TO_3.x.html
which I'm handling like this, as pandas has a pretty fast vectorized bytestring encoding method.
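A minimal sketch of that vectorized approach (the column name is made up): encode a string column to bytes before writing to pytables, and decode on the way back out, which also answers the earlier question about b"asdf" vs "asdf".

```python
import pandas as pd

# Vectorized bytestring round-trip on a pandas column: .str.encode for
# writing fixed-width bytes into hdf5, .str.decode when reading back.
df = pd.DataFrame({"name": ["asdf", "qwer"]})

encoded = df["name"].str.encode("utf-8")  # bytestrings: b"asdf", b"qwer"
decoded = encoded.str.decode("utf-8")     # back to str: "asdf", "qwer"

print(encoded.tolist(), decoded.tolist())
```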
Sorry for the delay, but I really really really need to turn to writing my dissertation, so I won't be able to get to this for a bit. :\ I want to pull in the changes to Subject as v0.5.0 before resubmitting the Autopilot manuscript though, so it won't be indefinite.
This is a PR to fix (I think) the way ContinuousData is handled.
I created an example Task called ex_ContData to illustrate the issue. This task returns both TrialData and ContinuousData.
When I first ran it, I got the following errors:
I believe this is a simple syntax error.
task_class.ContinuousData.keys() does not work; it should be task_class.ContinuousData.columns.keys(). This is fixed by 920f3ed.
After this, I got a new error:
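The reason for the `.columns` indirection can be seen with a minimal pytables description (the column names here are hypothetical): an `IsDescription` subclass collects its column definitions into a `columns` dict on the class, so names are enumerated via `.columns.keys()`, not `.keys()`.

```python
import tables

# Minimal IsDescription: the metaclass gathers the Col attributes into a
# class-level .columns dict mapping column name -> Col instance.
class ContinuousData(tables.IsDescription):
    timestamp = tables.StringCol(26)
    value = tables.Float64Col()

print(sorted(ContinuousData.columns.keys()))  # ['timestamp', 'value']
```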
Basically, when creating the ContinuousData group, it fails to properly detect the datatype of strings (when strings are returned as ContinuousData). This is because strings do not have a dtype field. Because this whole thing is wrapped in a try/except, it simply passes silently and doesn't store any continuous data.
This also interacts super weirdly with what I think is a bug in pytables: tables.Atom.from_type simply does not work on strings, as far as I can tell. You can see this even in their own docstrings, which contain an error; I guess they use code that inserts the result of running itself into the docstrings (which in this case is an error).
https://www.pytables.org/_modules/tables/atom.html#Atom.from_type
The only way I could figure around this is to detect if the returned data is a string, and then use tables.StringAtom instead.
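A hedged sketch of the behavior described above: `Atom.from_type` resolves numeric type names fine, but for `"string"` there is no default itemsize, so it raises, and string data needs an explicit `tables.StringAtom` (the itemsize of 64 here is an arbitrary placeholder).

```python
import tables

# Numeric types resolve directly to an atom...
num_atom = tables.Atom.from_type("float64")

# ...but "string" has no default itemsize and raises, so fall back to an
# explicit StringAtom with a chosen width.
try:
    tables.Atom.from_type("string")
    str_atom = None
except (ValueError, TypeError):
    str_atom = tables.StringAtom(itemsize=64)

print(type(num_atom).__name__, type(str_atom).__name__)
```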
The fix is in dbc585d
This actually works. I can now see ContinuousData!!
A remaining issue -- I don't know how to get the intended length of the string from the ContinuousData column. Instead, I just use the length of the string that is passed, but this will fail if a subsequent string is longer. Probably this is a simple fix. In general, we should probably be creating these Atoms using the ContinuousData description itself, not the datatype of the provided data.
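One possible shape for that fix, sketched under the assumption that the task declares its string columns as `StringCol`s (the column name is hypothetical): read the intended width from the description's `itemsize` instead of from whatever string happens to arrive first.

```python
import tables

# Hypothetical ContinuousData description declaring a 48-char string column.
class ContinuousData(tables.IsDescription):
    message = tables.StringCol(48)

# Build the atom from the declared column width, not from len() of the first
# value that shows up -- so later, longer strings still fit.
col = ContinuousData.columns["message"]
atom = tables.StringAtom(itemsize=col.itemsize)
print(atom.itemsize)  # 48
```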
Also, just some questions about how ContinuousData is meant to work: