
trackintel write csv function allows writing non-trackintel format csv files #451

Open
henrymartin1 opened this issue Dec 1, 2022 · 5 comments


@henrymartin1 (Member)

At the moment it is possible to write a .csv file that does not conform to the trackintel standard using the write_csv functions. This is a problem because such a file cannot be opened later with a read_csv function. Put differently, every file written with a trackintel write_csv function should be readable with the corresponding trackintel read_csv function, and this is currently not guaranteed when a write_csv function is called without the accessor (e.g., ti.io.write_staypoints_csv).

I think the cause is that the dataframe is not checked before writing. An easy solution would be to call the accessor before writing the dataframe, in order to validate it.
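A minimal, library-independent sketch of that idea: check the required columns of the staypoints model before writing, so that a file produced by the write function is always loadable by the matching read function. The function name and the column set here are illustrative, not the actual trackintel implementation.

```python
import io
import pandas as pd

# Hypothetical stand-in for the accessor's validation; trackintel's real
# model also checks the geometry column and timezone-aware timestamps.
REQUIRED_STAYPOINT_COLUMNS = {"user_id", "started_at", "finished_at"}

def write_staypoints_csv_checked(df, path_or_buf):
    """Refuse to write a CSV that the matching reader could not load."""
    missing = REQUIRED_STAYPOINT_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"not a valid staypoints dataframe; missing columns: {sorted(missing)}")
    df.to_csv(path_or_buf)
```

With such a guard, the failure happens loudly at write time instead of as an opaque KeyError at read time.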

Here is some sample code:

import datetime

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

import trackintel as ti

p1 = Point(8.5067847, 47.4)
p2 = Point(8.5067847, 47.5)
p3 = Point(8.5067847, 47.6)

t1 = pd.Timestamp("1971-01-01 00:00:00", tz="utc")
t2 = pd.Timestamp("1971-01-01 05:00:00", tz="utc")
t3 = pd.Timestamp("1971-01-02 07:00:00", tz="utc")
one_hour = datetime.timedelta(hours=1)

list_dict = [
    {"user_id": 0, "started_at": t1, "finished_at": t2, "geom": p1},
    {"user_id": 0, "started_at": t2, "finished_at": t3, "geom": p2},
    {"user_id": 1, "started_at": t3, "finished_at": t3 + one_hour, "geom": p3},
]
sp = gpd.GeoDataFrame(data=list_dict, geometry="geom", crs="EPSG:4326")
sp.index.name = "id"

# Deliberately break the staypoints model: rename the geometry column and
# drop the required "finished_at" column.
sp.rename(inplace=True, columns={"geom": "geometry"})
sp.drop(columns=["finished_at"], inplace=True)

# Writing succeeds without complaint, but reading the file back fails.
ti.io.write_staypoints_csv(sp, "test2.csv")
sp2 = ti.io.read_staypoints_csv("test2.csv", geom_col="geometry")

This produces the following error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'finished_at'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-818f93779dfc>", line 1, in <module>
    sp2 = ti.io.read_staypoints_csv("test2.csv", geom_col="geometry")
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\trackintel\io\file.py", line 30, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\trackintel\io\file.py", line 293, in read_staypoints_csv
    df["finished_at"] = pd.to_datetime(df["finished_at"])
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\ProgramData\Anaconda3\envs\graphentropy\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'finished_at'

@bifbof (Collaborator) commented Dec 4, 2022

I also think this is a problem, but it is more a problem with how the library is set up.
We have two ways to access this function:

ti.io.write_positionfixes(sp, "test2.csv")
sp.as_positionfixes.to_csv("test2.csv")

In the latter, which is the preferred way to access this function, the attributes are checked.
If we now added a check to write_positionfixes, we would incur this overhead twice.
But maybe that is not a big problem and we should add it anyway? What do you think?
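One way to avoid the double validation described above would be an internal keyword flag that the accessor path sets to skip the redundant check. This is a hypothetical sketch, not trackintel's actual API; `_check_positionfixes` here is a minimal stand-in for the accessor's validation logic.

```python
import io
import pandas as pd

def _check_positionfixes(df):
    # Minimal stand-in for the accessor's validation (hypothetical).
    missing = {"user_id", "tracked_at"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

def write_positionfixes_csv(df, path_or_buf, validate=True):
    # Direct calls keep validate=True; the accessor method would pass
    # validate=False, since it has already validated the dataframe.
    if validate:
        _check_positionfixes(df)
    df.to_csv(path_or_buf)
```

The flag keeps the check to one pass per call chain, at the cost of a parameter users could misuse to bypass validation.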

@henrymartin1 (Member, Author)

Hm... it's true that this is a problem in the architecture, and I don't really see an easy way out. Maybe there is an easy way to tell the write_positionfixes function where the call is coming from, and whether the validation can be skipped?
If you have no specific idea, I would just add it anyway.

@bifbof (Collaborator) commented Dec 5, 2022

We could use inspect.currentframe().f_back (see Stack Overflow) to get the caller's frame, but I have to say that is really hacky. :D
Has either of us ever measured how expensive that check is? If not, I'll benchmark the difference and just add the extra check if it isn't too costly.
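For illustration, the frame-inspection idea mentioned above looks like this in plain stdlib Python. As noted, it is fragile and CPython-specific, so this is a demo of the mechanism, not a recommendation.

```python
import inspect

def calling_function_name():
    # f_back is the frame of whoever called us; co_name is that
    # function's name. Returns "<module>" for top-level calls.
    caller = inspect.currentframe().f_back
    return caller.f_code.co_name

def accessor_to_csv():
    # A write function could inspect this name to decide whether
    # validation already happened in the accessor path.
    return calling_function_name()
```

Relying on caller names couples the functions invisibly, which is why a plain keyword flag is usually preferred.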

@henrymartin1 (Member, Author)

The check runs assert obj.geometry.is_valid.all(), which isn't great in terms of performance, but I think we should just add the check to the write_csv functions and optimize performance later if necessary.

@hongyeehh (Member)

Is this solved with #490? @bifbof

@bifbof bifbof removed their assignment Jun 29, 2024