Spatial Data
Overview
This week we will be focussing on using the GeoPandas library to analyse and manage spatial data, paying particular attention to spatial data and its distribution(s). GeoPandas will help to clarify how object-oriented design and inheritance work, while also allowing us to interrogate and map the assigned data set(s). A critical concept that should be emerging here is that spatial and numerical data analyses are, fundamentally, just two different views of the same data.
- You develop better judgement about interpreting and representing data.
- You understand how GeoPandas extends Pandas with spatial functionality.
- You build on material covered in Weeks 1-3 and 5 of CASA0005 to extend your understanding of mapping and spatial data.
- You develop better practices for (spatial) data exploration.
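To make "GeoPandas extends Pandas" concrete, here is a minimal sketch of inheritance in action. The two points and their prices are made up purely for illustration; the point is that a `GeoDataFrame` *is* a pandas `DataFrame`, so it inherits the familiar pandas methods, while adding spatial attributes such as a geometry column and a coordinate reference system (CRS).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# A tiny, made-up GeoDataFrame: two hypothetical points in
# longitude/latitude (EPSG:4326). Values are illustrative only.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "price": [80, 120]},
    geometry=[Point(-0.13, 51.51), Point(-0.10, 51.50)],
    crs="EPSG:4326",
)

# GeoDataFrame is a subclass of pandas' DataFrame...
print(issubclass(gpd.GeoDataFrame, pd.DataFrame))  # True
print(isinstance(gdf, pd.DataFrame))               # True

# ...so ordinary pandas methods are inherited unchanged.
print(gdf["price"].mean())                         # 100.0

# Row filtering is pandas behaviour, but the result keeps its
# spatial "type": it is still a GeoDataFrame.
expensive = gdf[gdf["price"] > 90]
print(type(expensive).__name__)                    # GeoDataFrame

# GeoPandas adds spatial attributes on top of the pandas base class.
print(gdf.geometry.geom_type.tolist())             # ['Point', 'Point']
print(gdf.crs.to_epsg())                           # 4326
```

The design choice worth noticing is that nothing pandas-specific had to be re-implemented: GeoPandas only adds what is genuinely spatial.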
So we’re going to be looking at both how to work with geo-data in Python and how to explore a real-world data set using Exploratory Data Analysis (EDA) and Exploratory Spatial Data Analysis (ESDA) approaches to mapping distributions, testing for NaNs, and so on.
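As a rough illustration of "testing for NaNs" in both the EDA and ESDA sense, the sketch below checks a small GeoDataFrame for missing attribute values *and* missing locations. The four rows are hypothetical, constructed so that one has no price and one has no geometry; with real data you would inspect the counts before deciding how to handle the gaps.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical listings: row 1 has no price, row 2 has no location.
gdf = gpd.GeoDataFrame(
    {"price": [80.0, np.nan, 120.0, 95.0]},
    geometry=[Point(-0.13, 51.51), Point(-0.10, 51.50), None, Point(-0.12, 51.52)],
    crs="EPSG:4326",
)

# EDA: count missing values in each attribute (non-geometry) column.
na_counts = gdf.drop(columns="geometry").isna().sum()
print(na_counts)

# Summary statistics skip NaNs by default, which can hide the problem,
# so always check the 'count' row against the number of rows.
print(gdf["price"].describe())

# ESDA: a row with no geometry cannot be mapped, so check that too.
# GeoSeries.isna() flags missing (None) geometries.
missing_geom = gdf.geometry.isna()
print(missing_geom.sum())

# Keep only rows that have both a price and a location before mapping.
clean = gdf[gdf["price"].notna() & ~missing_geom]
print(len(clean))
```

Dropping rows is only one option; whether it is appropriate depends on why the values are missing, which is exactly the kind of judgement this week asks you to exercise.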
Preparatory Lectures
Come to class prepared to present/discuss:
Session | Video | Presentation |
---|---|---|
Mapping | Video | Slides |
GeoPandas | Video | Slides |
EDA | Video | Slides |
ESDA | Video | Slides |
Other Preparation
Readings
Come to class prepared to discuss the following readings:
Citation | Article | ChatGPT Summary |
---|---|---|
D’Ignazio and Klein (2020a) Ch.6 | URL | N/A |
Lu and Henning (2013) | URL | N/A |
Bunday (n.d.) | URL | N/A |
VanderPlas (2014) | URL | N/A |
Study Guide
Thinking about Bunday (n.d.):
- The professors in Bunday’s article seem to be searching for a data transformation that will reveal the “true” ranking of the students. How does this relate to the concept of a “data-generating process” discussed in Lu and Henning?
- Bunday’s tale suggests that any data transformation can be used to justify a particular conclusion. How does this relate to D’Ignazio and Klein (2020b, Ch.4) and warnings about the potential for bias in data analysis? Are there specific examples that resonate?
Reflecting on Lu and Henning (2013):
- Lu and Henning use the example of retail cashier salaries to illustrate the limitations of traditional population-based thinking. How does their example help us to understand how the concept of a “population” is used and potentially misused?
- What are the implications of Lu and Henning’s argument for the use of data in policy-making, and how can we connect this back to D’Ignazio and Klein (2020b, Ch.4) as part of a larger debate around ‘the numbers’?
Considering VanderPlas (2014):
- In light of the above, what can we learn from Jake’s analysis of cycling data in Seattle about exploratory data analysis?
We’re focussing this week on the links between the data you’re working with and the process you’re trying to study! You might (quite reasonably) assume that these line up nicely, but in the era of big data that is often not the case. ‘Accidental’ data (Arribas-Bel 2014), such as smartcard, mobile phone, and web traffic data, are only ever partial accounts of messy human reality, so we want you to think about the gap between the data you have and the process you want to study.
Practical
In the practical we will continue to work with the InsideAirbnb data, here focussing on the second ‘class’ of data in the data set: geography. We will see how to use GeoPandas and PySAL for (geo)visualisation and analysis.
The practical focusses on:
- Creating/working with geo-data in Python.
- Making maps with Python.
- Exploring the data visually.
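The first two practical steps, creating geo-data and making a map, can be sketched roughly as follows. This is not the practical's own code: the three rows are made up, and we *assume* the listings table stores WGS84 coordinates in columns named `latitude` and `longitude`; check the column names in your copy of the data. We also assume British National Grid (EPSG:27700) is a sensible projected CRS because the example points are in London.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

# Made-up rows standing in for a listings table. The 'latitude' and
# 'longitude' column names are an assumption about the source data.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [80, 120, 95],
    "latitude": [51.51, 51.50, 51.52],
    "longitude": [-0.13, -0.10, -0.12],
})

# 1. Create geo-data: turn the coordinate columns into Point geometries.
#    Note the order: x is longitude, y is latitude.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# 2. Reproject from degrees to a projected CRS in metres
#    (British National Grid, assumed suitable for London data).
gdf = gdf.to_crs(epsg=27700)

# 3. Make a map: colour each point by price.
ax = gdf.plot(column="price", cmap="viridis", legend=True, markersize=20)
ax.set_title("Listings by price (illustrative data)")
ax.set_axis_off()
plt.savefig("listings_map.png", dpi=150, bbox_inches="tight")
```

Swapping latitude and longitude in `points_from_xy` is a common mistake; after reprojection the points would land far from London, which is one reason mapping the data early is a useful exploratory check.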
To access the practical: