Spatial Data

Overview

This week we focus on using the GeoPandas library to analyse and manage spatial data and its distribution(s). GeoPandas will help to clarify how Object-Oriented design and inheritance work, while also allowing us to interrogate and map the assigned data set(s). A critical concept that should be emerging here is that spatial and numerical data analyses are, fundamentally, just two different views of the same data.
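
As a minimal sketch of the inheritance point, the snippet below checks that a GeoDataFrame is a kind of pandas DataFrame, so the same object answers both numerical and spatial questions. The listing names, prices, and coordinates are invented for illustration and are not taken from the assigned data set.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical listings: every value here is made up for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "price": [80.0, 120.0, 100.0],
    "longitude": [-0.1276, -0.0759, -0.1426],
    "latitude": [51.5072, 51.5081, 51.5390],
})

# Wrap the table in a GeoDataFrame by adding a geometry column of points.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 longitude/latitude
)

# Inheritance: GeoDataFrame is a subclass of DataFrame,
# and the geometry column (a GeoSeries) is a subclass of Series.
print(issubclass(gpd.GeoDataFrame, pd.DataFrame))  # True
print(isinstance(gdf.geometry, pd.Series))         # True

# The 'numerical view': ordinary pandas methods still work.
mean_price = gdf["price"].mean()

# The 'spatial view': attributes that GeoPandas adds on top.
geom_types = gdf.geometry.geom_type.unique()
print(mean_price, geom_types, gdf.crs)
```

Because the spatial class inherits from the tabular one, anything already learned about filtering, grouping, and summarising a DataFrame carries over unchanged.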

Learning Outcomes
  1. You develop better judgement about interpreting and representing data.
  2. You understand how GeoPandas extends Pandas with spatial functionality.
  3. You build on material covered in Weeks 1-3 and 5 of CASA0005 to extend your understanding of mapping and spatial data.
  4. You develop better practices for (spatial) data exploration.

So we’re going to be looking at both how to work with geo-data in Python and how to explore a real-world data set using Exploratory Data Analysis (EDA) and Exploratory Spatial Data Analysis (ESDA) approaches to mapping distributions, testing for NaNs, and so on.
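
A first EDA pass of the kind described here might look something like the sketch below, which counts NaNs per column and then summarises only the rows that can actually be placed on a map. The column names and values are hypothetical stand-ins, not the real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical listings with some deliberately missing values.
df = pd.DataFrame({
    "price": [80.0, np.nan, 100.0, 250.0],
    "latitude": [51.5072, 51.5081, np.nan, 51.4700],
    "longitude": [-0.1276, -0.0759, -0.1426, -0.0200],
})

# Test for NaNs: how many missing values does each column contain?
missing_per_column = df.isna().sum()
print(missing_per_column)

# A row with no coordinates cannot be mapped, so drop those rows
# before any spatial work (rows missing only a price are kept).
located = df.dropna(subset=["latitude", "longitude"])

# Summarise the distribution of what is left; describe() skips NaNs.
summary = located["price"].describe()
print(summary)
```

Whether to drop, fill, or flag missing values is a judgement call that depends on the question being asked, so treat the `dropna` step as one option rather than the rule.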

Lectures

Come to class prepared to present/discuss:

Session      Video    Presentation
Mapping      Video    Slides
GeoPandas    Video    Slides
EDA          Video    Slides
ESDA         Video    Slides

Other Prep

  • Come to class prepared to present:
    • D’Ignazio and Klein (2020), chap. 6, The Numbers Don’t Speak for Themselves <URL>
    • Elwood and Wilson (2017) <URL>
  • Complete the short Moodle quiz associated with this week’s activities.
  • Student Dialogue: Mentimeter poll to be completed by the start of class.

Connections

The really crucial insight here is to see how we are constantly looking back to earlier concepts, data structures, and tools to assemble our data and data visualisations in a computational manner: caching local data, defining the corners of a map, zooming in/out, creating coordinate pairs from a tuple, etc. The hard part is that Python’s most-used way of presenting data visually (matplotlib) works utterly differently from ggplot: matplotlib is much more like a canvas onto which things are ‘painted’ by the computer and much less like a series of data transformations using a ‘grammar of graphics’. On the whole, ggplot is easier to learn and can often yield better-looking plots, whereas matplotlib is a bit like dropping through the looking glass into a world where everything can be changed and tweaked endlessly but it’s hard to get something that looks good without a lot more work.
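
To make the ‘canvas’ idea concrete, here is a small sketch that unpacks a bounding-box tuple into corner coordinate pairs, paints points and an outline onto a matplotlib Axes, and then sets the view limits to zoom. The bounding box, the points, and the padding value are all invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical bounding box as (min_x, min_y, max_x, max_y) in degrees.
bbox = (-0.20, 51.45, 0.00, 51.55)
min_x, min_y, max_x, max_y = bbox

# Create the four corner coordinate pairs from the tuple.
corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]

# Hypothetical point locations as (x, y) pairs.
points = [(-0.1276, 51.5072), (-0.0759, 51.5081), (-0.1426, 51.5390)]
xs, ys = zip(*points)

# The Axes is the canvas: each call below paints one more layer on it.
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(xs, ys, color="red", s=20)          # layer 1: the points

cx, cy = zip(*(corners + [corners[0]]))        # repeat first corner to close the box
ax.plot(cx, cy, color="black", linewidth=1)    # layer 2: the box outline

# Zooming is just changing the view limits of the canvas.
pad = 0.01  # assumption: a small margin so the outline is not clipped
ax.set_xlim(min_x - pad, max_x + pad)
ax.set_ylim(min_y - pad, max_y + pad)
ax.set_title("Points inside a bounding box")
```

Nothing here transforms the data: the order of the drawing calls and the axis limits decide what ends up visible, which is the main contrast with a grammar-of-graphics approach.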

Practical

In the practical we will continue to work with the InsideAirbnb data, here focussing on the second ‘class’ of data in the data set: geography. We will see how to use GeoPandas and PySAL for (geo)visualisation and analysis.

Connections

The practical focusses on:

  • Creating/working with geo-data in Python.
  • Making maps with Python.
  • Exploring the data visually.
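
The three steps above can be sketched roughly as follows: build geo-data from coordinate columns, reproject it, and draw a simple map coloured by an attribute. The rows are invented stand-ins rather than real InsideAirbnb records, and the choice of EPSG:27700 (British National Grid) is an assumption that only makes sense for data in Great Britain.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a few listings; every value is invented.
listings = pd.DataFrame({
    "price": [80.0, 120.0, 100.0, 250.0],
    "longitude": [-0.1276, -0.0759, -0.1426, -0.0200],
    "latitude": [51.5072, 51.5081, 51.5390, 51.4700],
})

# 1. Create geo-data: turn the coordinate columns into point geometries.
gdf = gpd.GeoDataFrame(
    listings,
    geometry=gpd.points_from_xy(listings["longitude"], listings["latitude"]),
    crs="EPSG:4326",  # assumption: WGS84 longitude/latitude
)

# Reproject to a CRS measured in metres (assumed here: EPSG:27700).
gdf_proj = gdf.to_crs(epsg=27700)

# 2. Make a map: GeoDataFrame.plot() draws onto a matplotlib Axes.
fig, ax = plt.subplots(figsize=(6, 6))
gdf_proj.plot(column="price", cmap="viridis", legend=True, markersize=40, ax=ax)
ax.set_axis_off()
ax.set_title("Hypothetical listings coloured by price")

# 3. Explore visually: the extent of the data in projected coordinates.
print(gdf_proj.total_bounds)
```

Passing `ax=ax` is what ties the GeoPandas plot back to matplotlib, so everything about limits, titles, and layering from plain matplotlib still applies.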

To access the practical:

  1. Preview on GitHub
  2. Download the Notebook

References

D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. MIT Press. https://bookbook.pubpub.org/data-feminism.
Elwood, S., and M. Wilson. 2017. “Critical GIS Pedagogies Beyond ‘Week 10: Ethics’.” International Journal of Geographical Information Science 31 (10): 2098–2116. https://doi.org/10.1080/13658816.2017.1334892.