17.01.24 - GeoParquet: The Upcoming Open Geospatial Consortium Standard

#GeoParquet is an incubating Open Geospatial Consortium (OGC) standard that adds geospatial geometries (Point, LineString, Polygon, etc.) to Apache Parquet, an open source, column-oriented file format designed for efficient data storage and retrieval. GeoParquet is already supported by GeoPandas (Python), GDAL/OGR, QGIS, ArcGIS GeoAnalytics Engine, FME and other tools.
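Under the hood, GeoParquet 1.0 does not add new Parquet physical types. Geometries are stored as WKB (Well-Known Binary) in a binary column, and a JSON document stored under the `geo` key of the file's key/value metadata describes them. The Python sketch below builds a minimal version of that document. The function name `make_geo_metadata` is hypothetical, the sketch covers only a subset of the specification (the required `version`, `primary_column`, `columns`, `encoding` and `geometry_types` fields, plus the optional `bbox`), and in practice GeoPandas or GDAL write this metadata for you.

```python
from __future__ import annotations

import json

# Geometry type names used by GeoParquet; a " Z" suffix marks the 3D variants.
GEOMETRY_TYPES = {
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection",
}


def make_geo_metadata(column: str, geometry_types: list[str],
                      bbox: list[float] | None = None) -> str:
    """Hypothetical helper: build the JSON value for the "geo" metadata key.

    Only a subset of the GeoParquet 1.0 fields is produced here.
    """
    for t in geometry_types:
        if t.removesuffix(" Z") not in GEOMETRY_TYPES:
            raise ValueError(f"unknown geometry type: {t}")
    col_meta = {"encoding": "WKB", "geometry_types": geometry_types}
    if bbox is not None:
        col_meta["bbox"] = bbox  # [xmin, ymin, xmax, ymax]
    meta = {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {column: col_meta},
    }
    return json.dumps(meta)
```

The returned string could then be attached to an Arrow schema, for example with pyarrow's `Schema.with_metadata({b"geo": ...})`, before the file is written.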

GeoParquet is meant for vector data only, but other cloud-optimized formats exist for raster data (COG, Cloud Optimized GeoTIFF) and point cloud data (COPC, Cloud Optimized Point Cloud). All these formats follow the same deployment and usage paradigm: data providers host GeoParquet/COG/COPC files on generic HTTP file servers, and data consumers use HTTP range requests to fetch only the parts of a file they need, without downloading it entirely.
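To make the range-request paradigm concrete, here is a minimal Python sketch (standard library only) of how a client could fetch just the metadata footer of a remote Parquet file. It relies on the Parquet file layout, which ends with the Thrift-encoded file metadata, a 4-byte little-endian metadata length and the magic bytes `PAR1`; for GeoParquet, this footer is where the `geo` metadata lives. The helper names `http_suffix_fetcher` and `read_parquet_footer` are hypothetical, the sketch assumes the server honours suffix ranges (`Range: bytes=-n`), and decoding the Thrift footer itself is left to a real Parquet library such as pyarrow.

```python
import struct
import urllib.request
from typing import Callable

PARQUET_MAGIC = b"PAR1"

# A fetcher returns the last n bytes of a file (remote or in memory).
SuffixFetcher = Callable[[int], bytes]


def http_suffix_fetcher(url: str) -> SuffixFetcher:
    """Hypothetical helper: fetch the last n bytes with an HTTP suffix range request."""
    def fetch(n: int) -> bytes:
        req = urllib.request.Request(url, headers={"Range": f"bytes=-{n}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # 200 means the server ignored Range
                raise RuntimeError("server does not support range requests")
            return resp.read()
    return fetch


def read_parquet_footer(fetch: SuffixFetcher) -> bytes:
    """Hypothetical helper: return the raw (Thrift-encoded) Parquet footer.

    Needs only two small requests, whatever the size of the file.
    """
    tail = fetch(8)  # 4-byte little-endian footer length + b"PAR1"
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    chunk = fetch(footer_len + 8)  # footer + length + magic
    if len(chunk) != footer_len + 8:
        raise ValueError("truncated Parquet footer")
    return chunk[:footer_len]
```

Against a real server, this would be called as `read_parquet_footer(http_suffix_fetcher(url))`. In practice, readers such as GDAL or DuckDB perform this kind of partial fetching for you; the point is only that a couple of small requests are enough to learn a file's schema and metadata.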

As far as (Geo)Parquet is concerned, we would like to highlight a very interesting pairing with DuckDB, another emerging open source technology which, among other features, lets you query local or remote Parquet files using SQL.
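A minimal sketch of that pairing from Python, assuming the `duckdb` package is installed and using a placeholder URL. The query is built by a separate function so it can be inspected on its own, and the `httpfs` extension (which DuckDB uses for HTTP(S) access) is installed and loaded explicitly, even though recent DuckDB versions may autoload it. The function names `count_rows_query` and `count_remote_rows` are hypothetical.

```python
def count_rows_query(url: str) -> str:
    """Hypothetical helper: SQL counting the rows of a remote (Geo)Parquet file."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    literal = url.replace("'", "''")  # escape single quotes in the SQL string literal
    return f"SELECT COUNT(*) AS n FROM read_parquet('{literal}')"


def count_remote_rows(url: str) -> int:
    """Hypothetical helper: run the query in an in-memory DuckDB database."""
    import duckdb  # third-party (pip install duckdb), imported lazily

    con = duckdb.connect()  # no path given: in-memory database
    try:
        con.install_extension("httpfs")  # HTTP(S) file system support
        con.load_extension("httpfs")
        (n,) = con.execute(count_rows_query(url)).fetchone()
        return n
    finally:
        con.close()
```

Usage would look like `count_remote_rows("https://example.com/data.parquet")`, where the URL is a placeholder. Because Parquet is column-oriented, a query that selects only a few columns should only need the footer and the byte ranges of those columns, not the whole file.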

Are you already using any of these technologies? If so, what is your feedback? If not, are you looking forward to trying them? Let us know in the comments below 👇
