Foundations of Spatial Data Science – Running Multiple Containers

Find a PostGIS Image

Search Quay.io for an image:

On a modern M-chip Mac you need to use arm64 images.
On Windows it’s usually amd64.

Tip

However, in most cases you don’t need to search for these explicitly because ‘builds’ for most images are completed for both.
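If you do need to pick an architecture by hand, a minimal sketch like the one below can tell you which one your machine reports. The lookup table and the `image_arch` helper are illustrative names, not part of any tool used in this tutorial.

```python
import platform

# Hypothetical lookup table: common platform.machine() values mapped to
# the architecture names used in container image tags.
ARCH_ALIASES = {
    "arm64": "arm64",
    "aarch64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def image_arch(machine=None):
    """Return 'arm64' or 'amd64' for a machine string, or None if unknown."""
    if machine is None:
        machine = platform.machine()
    return ARCH_ALIASES.get(machine.lower())


# Report the architecture of the machine running this code.
print(image_arch())
```

A result of None simply means the machine string is not in the table, in which case check the image's page for the builds it offers.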

Create a ‘Pod’

By default, containers can’t talk to each other for security reasons. We need to join them together in a ‘pod’ to allow this to happen.

This command creates a pod that exposes two ‘ports’ (8888 and 5432) to the wider world.

podman pod create -p 8888:8888 -p 5432:5432 myapp

Tip

This ‘maps’ port 8888 inside the pod to port 8888 outside the pod, but we could change it to -p 7777:8888 so that requests for 7777 from the outside world are ‘forwarded’ to 8888 inside the pod. That would allow copies of containers to run at the same time in different pods.
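To make the host:pod ordering concrete, here is a small sketch that assembles the pod creation command from a dictionary of port mappings. The `pod_create_command` helper, the second pod name `myapp2` and the host port 6543 are assumptions made up for illustration; only the `podman pod create -p HOST:POD NAME` form comes from the command above.

```python
def pod_create_command(name, port_map):
    """Build the argument list for `podman pod create`.

    port_map maps a port on the host to a port inside the pod.
    """
    args = ["podman", "pod", "create"]
    for host_port, pod_port in port_map.items():
        args += ["-p", f"{host_port}:{pod_port}"]
    args.append(name)
    return args


# Two pods exposing the same internal ports on different host ports.
first = pod_create_command("myapp", {8888: 8888, 5432: 5432})
second = pod_create_command("myapp2", {7777: 8888, 6543: 5432})

print(" ".join(first))
print(" ".join(second))
```

Because the two pods claim different host ports, both could run side by side without clashing.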

Attach PostGIS to the Pod

There’s a lot going on here that took quite some time to figure out¹, but the key thing turned out to be the pg_hba.conf file, which tells Postgres which clients are allowed to connect and how they must authenticate.

podman run --rm -d --name postgres --pod myapp \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=test \
-e POSTGRES_DB=test \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v "${PWD}"/data/postgres:/var/lib/postgresql/data \
-v /tmp:/tmp \
-v "${PWD}"/data/postgres/pg_hba.conf:/var/lib/postgresql/data/pg_hba.conf \
quay.io/taolu/postgis:14-3.5-alpine
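The pg_hba.conf file used here is not reproduced in this tutorial, so the sketch below only illustrates the general shape of such a file: one rule per line giving the connection type, database, user, client address and authentication method. Every rule in `RULES` is an assumption chosen for illustration and should be adapted before use.

```python
# Hypothetical rules; the tutorial's real pg_hba.conf is not shown.
# Each rule is (type, database, user, address, method); 'local' rules
# describe Unix-socket connections and so have no address.
RULES = [
    ("local", "all", "all", "", "trust"),
    ("host", "all", "all", "127.0.0.1/32", "scram-sha-256"),
    ("host", "all", "all", "::1/128", "scram-sha-256"),
    ("host", "all", "all", "all", "scram-sha-256"),
]


def render_pg_hba(rules):
    """Render rules as pg_hba.conf text, one rule per line."""
    lines = ["# TYPE  DATABASE  USER  ADDRESS  METHOD"]
    for rule in rules:
        # Skip the empty address field of 'local' rules.
        lines.append("  ".join(field for field in rule if field))
    return "\n".join(lines) + "\n"


pg_hba_text = render_pg_hba(RULES)
print(pg_hba_text)
```

Rules are matched top to bottom and the first match wins, so the broad final rule only applies to clients that the narrower ones above it did not catch.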

Attach SDS to the Pod

Only containers inside the pod can talk to other containers in the pod. So for the SDS container to talk to PostGIS, they both need to be attached to the same pod using --pod myapp.

podman run --rm -d --name sds --pod=myapp \
-v "$(pwd):/home/jovyan/work" \
docker.io/jreades/sds:2025-amd \
start.sh \
jupyter lab --LabApp.password='' --ServerApp.password='' --NotebookApp.token=''

You can now connect using your browser: http://localhost:8888/
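If the page does not load, it can help to check whether the pod's two ports are reachable from your computer. This is a minimal sketch using only the Python standard library; the `port_open` helper is a name invented here, and an open port only shows that something is listening, not that it is Jupyter or Postgres.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # The two ports published by the pod in this tutorial.
    for port in (8888, 5432):
        state = "open" if port_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```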

Tip

The rest of this short tutorial is all run on the SDS container using your browser as the interface. This is true even for bits about the command line interface: in Jupyter you pick File > New > Terminal.

Install psycopg2

If I haven’t had time to update the SDS container then you can do this in the SDS Terminal in your browser using the following command:

pip install psycopg2

This is because the sqlalchemy framework is already there but the psycopg2 driver for Postgres isn’t.
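To find out whether the install is needed at all, you could first check which of the two packages Python can already see. A minimal sketch, run in a notebook cell or the SDS Terminal; `has_module` is a helper name made up for this example.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Report on the framework and on the Postgres driver it relies on.
for name in ("sqlalchemy", "psycopg2"):
    print(name, "installed" if has_module(name) else "missing")
```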

Load Data

Run Python

Now you will start a new Notebook (File > New > Notebook) and create code cells for each of the following sections of code.

# Connect to the database
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:test@localhost:5432/test')

# Insert data into a new table
import geopandas as gpd
gdf = gpd.read_file('work/data/src/TM_WORLD_BORDERS-0.3.gpkg')
gdf.to_postgis('world', engine)
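The connection string above packs five settings into one URL: user, password, host, port and database name, matching the values passed to the PostGIS container. The sketch below assembles the same string from its parts and percent-encodes the user and password, which matters if a password contains characters such as @ or /. The `postgres_url` helper is hypothetical and uses only the standard library.

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Same values as the -e options given to the PostGIS container.
url = postgres_url("postgres", "test", "localhost", 5432, "test")
print(url)
```

The host is localhost because containers attached to the same pod share a network, so from the SDS container PostGIS appears to be running locally.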

Querying Data

# Get information about the database
from sqlalchemy import inspect
insp = inspect(engine)
insp.get_table_names()

# Get data out of the database
import geopandas as gpd

# Simple query (assumes an 'msoa' table was loaded earlier, in the same way as 'world')
gdf = gpd.read_postgis('SELECT * FROM msoa', geom_col='geometry', con=engine)

# What's in the table
gdf.head(2)

Plotting the Data

Since PostGIS is geospatial and Python can ‘talk’ to PostGIS, the query results can be plotted directly:

# Plotting the table
gdf.plot()

More Complex Queries

# Query the database
gdf = gpd.read_postgis("""
    SELECT * 
    FROM msoa 
    WHERE "MSOA21NM" 
    LIKE 'Waltham%%'
""", geom_col='geometry', con=engine)
gdf.plot()
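The LIKE pattern in the query above matches any name that starts with ‘Waltham’. To try that kind of prefix match without the pod running, the sketch below uses Python's built-in sqlite3 as a stand-in for PostGIS, which is an assumption made purely so the example runs anywhere. The three rows are invented sample names, and the prefix is passed as a bound parameter rather than written into the SQL.

```python
import sqlite3

# In-memory stand-in database; NOT the PostGIS database from the tutorial.
demo = sqlite3.connect(":memory:")
demo.execute('CREATE TABLE msoa ("MSOA21NM" TEXT)')
demo.executemany(
    "INSERT INTO msoa VALUES (?)",
    [("Waltham Forest 001",), ("Waltham Forest 002",), ("Camden 001",)],
)


def names_starting_with(con, prefix):
    """Return the names that begin with `prefix`, in alphabetical order."""
    rows = con.execute(
        'SELECT "MSOA21NM" FROM msoa '
        'WHERE "MSOA21NM" LIKE ? ORDER BY "MSOA21NM"',
        (prefix + "%",),
    )
    return [name for (name,) in rows]


print(names_starting_with(demo, "Waltham"))
```

One difference to keep in mind: LIKE in SQLite ignores case for ASCII letters by default, whereas LIKE in Postgres is case-sensitive.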

Querying the Data without Python

One final thing: if you open a Terminal on your own computer (so not the SDS Terminal any more) you can query the data directly, provided the psql client is installed there.

psql -h localhost -p 5432 -U postgres -d test

Tip

The ‘host’ computer is the only other machine that can access the pod.

SELECT * FROM world LIMIT 0;
SELECT "NAME", "ISO3", "POP2005", "REGION" FROM world LIMIT 5;
SELECT "NAME", "POP2005" FROM world WHERE "AREA" > 900000;
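The double quotes in these queries matter: Postgres folds unquoted identifiers to lower case, so a column that was created with an upper-case name such as NAME can only be reached as "NAME". The sketch below shows one way to quote identifiers when building a query string in Python; `quote_ident` and `select_sql` are hypothetical helpers written for this example.

```python
def quote_ident(name):
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_sql(table, columns, limit):
    """Build a SELECT statement with every identifier quoted."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {cols} FROM {quote_ident(table)} LIMIT {int(limit)};"


print(select_sql("world", ["NAME", "ISO3", "POP2005", "REGION"], 5))
```

Quoting the lower-case table name world is harmless, since its quoted and unquoted forms refer to the same table.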
