Textual Data

Overview

Although the direct use of textual (both structured and unstructured) data is still relatively rare in spatial analyses, the growth of crowd-sourced and user-generated content points to its increasing importance. The tools and approaches in this area are also evolving rapidly, so this week is intended primarily to familiarise you with the basic landscape in preparation for developing your skills further in your own time!

Learning Outcomes
  1. An awareness of the benefits of separating content from presentation.
  2. A basic understanding of pattern-matching in Python (you will have been exposed to this in Week 2 of CASA0005).
  3. A basic understanding of how text can be ‘cleaned’ to make it more amenable to analysis.
  4. An appreciation of parallelisation in the context of text processing.
  5. An appreciation of how text can be analysed.

The manipulation of text requires a high level of abstraction – of thinking about words as data in ways that are deeply counter-intuitive – but the ability to do so forms a critical bridge between this block and the subsequent one, while also reinforcing the idea that numerical, spatial, and textual data analyses provide alternative (and often complementary) views into the data.

Lectures

Come to class prepared to present/discuss:

Session                  Video   Presentation
Notebooks as Documents   Video   Slides
Patterns in Text         Video   Slides
Cleaning Text            Video   Slides
Analysing Text           Video   Slides

Other Prep

Connections

Conceptually, this is by far the hardest week of the entire term: there is very little to draw upon from other modules, and the processing of text with computers rarely makes it beyond simple regular expressions. However, the growth in data that is ‘accidental, open, and everywhere’ (Arribas-Bel 2014) means that much more of it is unstructured and contains free text written by humans as well as numerical and coordinate data generated by sensors or transactions. Using tutorials from the Programming Historian, we’re going to look at the foundations of text processing: how we can extract important terms from a document and, ultimately, the foundations upon which modern Large Language Models are built.
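To give a concrete sense of what ‘simple regular expressions’ can do, here is a minimal sketch in Python. The listing text and the patterns are invented for illustration only; they are not taken from the practical or the InsideAirbnb data.

```python
import re

# A hypothetical Airbnb-style listing description (illustrative only)
description = "Sunny 2-bed flat, 10 mins walk to King's Cross. £120/night, min. 3 nights."

# Find all prices expressed in pounds (the pattern is an assumption about the text)
prices = re.findall(r"£\d+", description)
print(prices)   # ['£120']

# Find any number directly followed by 'night' or 'nights'
stays = re.findall(r"(\d+)\s*nights?", description)
print(stays)    # ['3']
```

Patterns like these are brittle (they break as soon as the text uses ‘$’ or spells out ‘three nights’), which is precisely why the week moves on from pattern-matching to more general ways of cleaning and analysing text.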

Practical

In the practical we will continue to work with the InsideAirbnb data, here focussing on the third ‘class’ of data in the data set: text. We will see how working with text is more complex than working with numeric or spatial data and, consequently, why the computational costs rise accordingly. This practical should suggest some new lines of inquiry for your Group Project.

Connections

The practical focusses on:

  • Applying simple regular expressions to find patterns in text.
  • How to clean text in preparation for further analysis.
  • Simple transformations that allow you to analyse text (e.g. TF-IDF).
  • Ways of exploring groups/similarity in textual data (a short sketch of these last two steps follows this list).
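As a rough illustration of those last two steps, the sketch below uses scikit-learn to turn a handful of made-up listing descriptions into TF-IDF vectors and then compares them with cosine similarity. The documents are invented, and the practical may use different libraries and cleaning steps, so treat this purely as an indicative example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three hypothetical listing descriptions (stand-ins for the InsideAirbnb text)
docs = [
    "Bright double room in a quiet Victorian terrace near the park.",
    "Quiet double room close to the park, ideal for couples.",
    "Modern studio apartment in the city centre with fast wifi.",
]

# Basic 'cleaning' is delegated to the vectoriser here: lowercasing and
# English stop-word removal; real workflows often add lemmatisation etc.
vectoriser = TfidfVectorizer(lowercase=True, stop_words="english")
tfidf = vectoriser.fit_transform(docs)   # documents x terms sparse matrix

# Pairwise cosine similarity between the documents
sims = cosine_similarity(tfidf)
print(sims.round(2))   # the first two listings should score higher against each other
```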

To access the practical:

  1. Preview on GitHub
  2. Download the Notebook

References

Arribas-Bel, Daniel. 2014. “Accidental, Open and Everywhere: Emerging Data Sources for the Understanding of Cities.” Applied Geography 49: 45–53.
Ladd, John R. 2020. “Understanding and Using Common Similarity Measures for Text Analysis.” The Programming Historian, no. 9. https://doi.org/10.46430/phen0089.
Lavin, Matthew J. 2019. “Analyzing Documents with TF-IDF.” The Programming Historian, no. 8. https://doi.org/10.46430/phen0082.
Reades, Jonathan, and Jennie Williams. 2023. “Clustering and Visualising Documents Using Word Embeddings.” The Programming Historian. https://doi.org/10.46430/phen0111.