Numeric Data
Overview
This week we introduce the pandas library for data analysis and management, focusing on numeric data and its distribution(s). This marks a major shift: from working with concepts (lists, dictionaries, functions, etc.) largely in isolation to encountering them all together ‘in the wild’ as part of a full data science workflow. We are moving from the acquisition of concepts to their integration, in the same way that, over the course of these three sessions, we will move from data acquisition to data integration.
Learning Outcomes
- An appreciation of how and why this module differs from CASA0007 (Quantitative Methods).
- The beginnings of a more integrative understanding of foundational computer science concepts and the practice(s) of data science.
- A basic understanding of data acquisition and manipulation in Python.
Lectures
Come to class prepared to present/discuss:
| Session | Video | Presentation |
|---|---|---|
| Logic | Video | Slides |
| Randomness | Video | Slides |
| Data | Video | Slides |
| Pandas | Video | Slides |
| More on the Assessments | In class | Slides |
Other Prep
- Come to class prepared to present/discuss two more readings on the impact of Airbnb on cities (Wachsmuth and Weisler 2018; Harris 2018), which you are likely to find useful for developing your thinking for the Group Work, and one by D’Ignazio and Klein (2020), which highlights the importance of thinking about what a data set captures… and what it excludes. You should almost never claim that your (social) data represents the ‘universe’ of behaviours or is somehow ‘complete’.
Practical
In this practical we will begin working with the InsideAirbnb data, which you will have briefly examined in CASA0005. This week we focus on the first ‘class’ of data in the data set: simple numeric columns. We will see how to use pandas for (simple) visualisation and (the beginnings of) analysis. We hope you will see how pandas combines and builds on techniques we have already covered: while pandas is incredibly sophisticated, its underlying concepts were all introduced in the preceding three weeks! We will also begin using pandas to subset and explore the data.
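To make the claim that pandas builds on earlier weeks a little more concrete, here is a minimal sketch: a DataFrame built from a plain dictionary of lists, the same mean computed both by hand and by pandas, then a summary of the column's distribution and a simple subset. The column names (`room_type`, `price`) echo InsideAirbnb conventions, but the values are made up purely for illustration and are not drawn from the real data.

```python
import pandas as pd

# Hypothetical toy data (NOT real InsideAirbnb values): a dictionary of
# lists, the same kind of structure built in earlier weeks.
data = {
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Shared room"],
    "price": [120.0, 45.0, 60.0, 200.0, 30.0],
}

# The mean computed with plain Python on the list...
prices = data["price"]
manual_mean = sum(prices) / len(prices)

# ...and with pandas, where each dictionary key becomes a column.
df = pd.DataFrame(data)
pandas_mean = df["price"].mean()

# describe() summarises the distribution of a numeric column
# (count, mean, std, min, quartiles, max).
summary = df["price"].describe()

# Boolean indexing keeps only the rows where the condition is True.
cheap = df[df["price"] < 100]

print(manual_mean, pandas_mean)
print(summary)
print(cheap)
```

For a quick visual check of the same distribution, `df["price"].plot.hist()` draws a histogram, provided matplotlib is installed.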
The practical focusses on:
- Seeing how pandas is ‘just’ a sophisticated extension of what we’ve already done.
- Familiarising yourself with pandas functionality.
- Performing basic data cleaning and exploration tasks (including visualisation).
- Selecting and aggregating data in pandas.
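The cleaning, selecting, and aggregating steps listed above can be sketched roughly as follows. This assumes that `price` arrives as formatted text such as `"$1,200.00"`, which is typical of InsideAirbnb extracts; the rows and values here are invented for illustration, and the real file may differ in its columns and formats.

```python
import pandas as pd

# Hypothetical raw rows (invented values): prices stored as formatted
# strings, with one missing value.
raw = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Shared room"],
    "price": ["$120.00", "$45.00", None, "$1,200.00", "$30.00"],
    "accommodates": [4, 2, 1, 6, 1],
})

# Cleaning: strip the currency symbol and thousands separator, then convert
# to a numeric type; errors="coerce" turns anything unparseable into NaN.
raw["price"] = pd.to_numeric(
    raw["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Drop rows where a price could not be recovered.
clean = raw.dropna(subset=["price"])

# Selecting: .loc takes a row condition and a list of columns.
large = clean.loc[clean["accommodates"] >= 4, ["room_type", "price"]]

# Aggregating: group rows by room type and summarise price within each group.
by_type = clean.groupby("room_type")["price"].agg(["count", "median"])

print(large)
print(by_type)
```

Using `errors="coerce"` is a deliberate trade-off: it keeps the workflow running but silently turns bad values into NaN, so it is worth checking how many values are missing (for example with `raw["price"].isna().sum()`) before deciding to drop them.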
To access the practical: