‘Geospatial Pandas’ provides a set of tools for working with geo-data in a pandas-like way. See: GeoPandas.org and Read the Docs.
The Python Spatial Analysis Library provides an array of tools for performing spatial analysis. Now uses geo-pandas as default data structure. See: PySAL.org and Read the Docs.
A newer library that builds on Geopandas to offer movement analysis tools.
Supported file formats:
Type | Extension | Notes |
---|---|---|
Shape | .shp (etc.) |
Maximum compatibility |
GeoPackage | .gpkg |
Good default choice |
GeoJSON | .geojson |
For web mapping |
Zip | .zip |
For use with Shapefiles |
WKT | .txt |
Plain-text & SQL |
GeoParquet | .geoparquet |
Good for large data sets & SQL |
Additionally, it is possible to read only subsets of the data using row, column, geometry, and bbox filters.
Again, depending on file size you may want to save these locally, but…
Write any OGR-supported vector drivers.
GeoPandas does all this by adding just two new classes:
In principle, a GeoSeries can contain multiple geo-data types, but in practice you’ll want to be one of the following shapely
classes:
Recall that we can ask if a particular object is an instance of any given class:
import pandas as pd
print(isinstance(world, str))
print(isinstance(world, pd.DataFrame))
print(isinstance(world, gpd.GeoDataFrame))
Prints: False
, True
, True
.
import pandas as pd
print(isinstance(world.geometry, str))
print(isinstance(world.geometry, pd.Series))
print(isinstance(world.geometry, gpd.GeoSeries))
Also prints: False
, True
, True
.
Depending on your data source, you may or may not have projection information attached to your GeoDataFrame:
outputs epsg:4326
, but:
outputs:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
You’ll have already covered this in GIS, but you can find nearly any EPSG you might need at epsg.io. By far the most commonly-used here are:
Note: recall that large territories (such as Canada, China and Russia) may well have multiple projections at the state of provincial level.
For data sets without projection information (i.e. often anything loaded from a shapefile) you must gdf.set_crs(<spec>)
. For all others you should gdf.to_crs(<spec>)
.
We can use GeoSeries’ spatial index directly to perform simple spatial queries:
A GeoSeries has attributes like any other Series, but also includes some spatially-specifc ones:
area
— if a polygonbounds
— for each featuretotal_bounds
— for each GeoSeriesgeom_type
— if you don’t already knowis_valid
— if you need to testAdditional GeoSeries methods icnlude:
distance()
— returns Series
measuring distances to some other feature (called as: <GeoSeries>.distance(<feature>)
)centroid
— returns GeoSeries
of strict centroids (called as: <GeoSeries>.centroid
)representative_point()
— returns GeoSeries
of points within featuresto_crs()
and plot()
, which you’ve already seen.Simple geographical tests:
geom_almost_equals()
— tries to deal with rounding issues when comparing two features.contains()
— is shape contained within some other features.intersects()
— does shape intersect some other features.Lat/Long and Northing/Easting benefit from a helper function gpd.points_from_xy()
:
Other feature types need to be in some kind of regular format such as Well-Known Text (WKT), GeoJSON, or something readable as a Shapely geometry.
from shapely import wkt
# Notice coordinate pairs and last point is same as first one
bbox = 'POLYGON((5000000.0 2500000.0, 5000000.0 -2500000.0, -5000000.0 -2500000.0, -5000000.0 2500000.0, 5000000.0 2500000.0))'
# Create GeoPandas from dict just like Pandas
bgdf = gpd.GeoDataFrame({'id':[0], 'coordinates':bbox})
# Turn it into a geometry
bgdf['geometry'] = bgdf.coordinates.apply(wkt.loads)
bgdf = bgdf.set_crs('ESRI:54030')
bgdf.plot() # Not very interesting but...
Geopandas • Jon Reades