Plans to migrate from GeoPackage to GeoParquet #290
Comments
The Python team supports this decision emphatically. I just recommend planning the transition carefully, given that the GeoParquet specs are not stable yet. Their current documentation expects stability at version v1.0.0, but they are still at version v0.3.0. (see text below)
For the record, GeoParquet v1.0.0 (stable) has now been released. To implement GeoParquet in geobr, we still need to investigate the best approaches and packages for reading GeoParquet into R and Python. Because this is all very recent, it might take a few months before we have stable R and Python packages to do this.
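Since the concern in these comments is which version of the spec a file was written against, here is a minimal sketch of how a loader could check that. It relies on the fact that a GeoParquet file carries a JSON document under the `geo` key of its Parquet metadata, with a `version` field. The function names `parse_version` and `is_stable_geoparquet` are hypothetical, and treating any pre-release suffix as "not stable" is an assumption of this sketch, not a rule from the spec.

```python
import json


def parse_version(text):
    """Turn a version string such as '1.0.0' into a tuple of ints.

    Any pre-release or build suffix (after '-' or '+') is ignored here.
    """
    core = text.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_stable_geoparquet(geo_json, minimum="1.0.0"):
    """Return True if the 'geo' metadata declares a release >= `minimum`.

    `geo_json` is the raw JSON text (str or bytes) stored under the 'geo' key.
    Assumption of this sketch: a version with a pre-release suffix
    (anything containing '-') is treated as not stable.
    """
    meta = json.loads(geo_json)
    version = meta.get("version")
    if version is None:
        return False
    return "-" not in version and parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    example = '{"version": "0.3.0", "primary_column": "geometry", "columns": {}}'
    print(is_stable_geoparquet(example))
```

In practice the JSON would probably be pulled from the file itself, for instance via the schema metadata that pyarrow exposes, but that step is left out so the sketch has no dependencies.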
Context
All data sets used in geobr are currently stored as GeoPackage (.gpkg) files. The choice of GeoPackage was an easy one: it is a robust, compact, open-standard format for geospatial data. A key aspect here is that .gpkg files are platform-independent, so we can make sure that geobr data is consistent for both R and Python users.
Nonetheless, we are seeing major advances in the development of GeoParquet, a new data format for storing geospatial vector data (points, lines, polygons). GeoParquet is built on top of Apache Parquet, a popular columnar storage format for tabular data. It is much (much!) more efficient than GeoPackage in terms of file storage as well as in terms of the speed of reading and saving files. I believe it's safe to say that GeoParquet has a bright future in the geospatial industry because of its flexibility and efficiency.
What to expect:
I would like to migrate all data sets available in geobr from GeoPackage to GeoParquet (.parquet) format in geobr v2.0. This should be done in 2023. I need some time to fix some issues in geobr, and it would be good to wait a little longer to see GeoParquet become a stable specification, with more robust and stable packages to manipulate GeoParquet in R and Python.
How will this affect geobr users?
How will this affect geobr developers?
There are already libraries that can read GeoParquet files in both R and Python (see below). geobr v2.0 will need to include just a couple more package dependencies to be able to read geospatial data in .parquet format. In practice, this should have minimal effects on code development.
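To illustrate how small the change could be on the Python side, here is a rough sketch of a loader that handles both formats during the transition. It assumes geopandas is the reading library (`geopandas.read_parquet` for .parquet, which needs pyarrow installed, and `geopandas.read_file` for .gpkg). The names `READERS`, `pick_reader` and `load_dataset` are hypothetical and not part of geobr.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the geopandas function to call.
READERS = {
    ".parquet": "read_parquet",  # GeoParquet, requires pyarrow
    ".gpkg": "read_file",        # GeoPackage
}


def pick_reader(path):
    """Return the name of the geopandas reader for `path`, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file format: {suffix!r}") from None


def load_dataset(path):
    """Read a local .gpkg or .parquet file into a GeoDataFrame."""
    # Imported lazily so the dispatch logic above has no hard dependency.
    import geopandas

    reader = getattr(geopandas, pick_reader(path))
    return reader(path)
```

With something like this, the rest of the code keeps receiving a GeoDataFrame regardless of the storage format, so only the download and read step would need to change.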