Setting Up
Overview
In the first week we will be focussing on the supporting infrastructure for ‘doing data science’. That is to say, we’ll be dealing with the installation and configuration of tools such as GitHub and Docker, which support replicable, shareable, and documentable data science. As a (free) bonus, using these tools also protects you against catastrophic (or merely irritating) data loss caused by over-zealous editing of code or content. You should see this as preparing the foundation not only for your remaining CASA modules (especially those in Term 2) but also for your post-MSc career.
- A basic understanding of the data science ‘pipeline’.
- An understanding of how data scientists use a wide range of ‘tools’ to do data science.
- A completed installation/configuration of these tools.
If you missed the Induction Week ‘install fest’, please now complete as many of these activities as you can:
- Go through the computer health check.
- Install the base utilities.
- Install the programming environment.
The last of these is the stage where you’re most likely to encounter problems that need our assistance, so finding out in advance that you need help means that you can ask for it much sooner in the Week 1 practical!
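Once you have worked through the installation steps, a quick sanity check is to confirm that the command-line tools can be found on your PATH. The following is a minimal sketch rather than part of the official install instructions: it assumes that the tools of interest are invoked as `git` and `docker` (these command names are our assumption, so substitute whatever the install guides actually ask you to set up), and the helper name `find_tools` is purely illustrative.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # 'git' and 'docker' are assumed command names, not an official checklist.
    for name, location in find_tools(["git", "docker"]).items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

Anything reported as not found is worth raising early in the practical. Note that a tool being on the PATH only shows that it is installed, not that it is configured correctly.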
Readings
Please make time to read:
Citation | Article |
---|---|
Arribas-Bel and Reades (2018) | URL |
Study Guide
The following questions will help guide your reading and prepare you for class discussions:
- Drawing on Arribas-Bel and Reades (2018), compare and contrast GIS, Geocomputation, and Geographical Data Science (GDS):
- What are their core focuses and methodological approaches?
- How do they differ in their relationship to technological change?
- What are the unique contributions of GDS in the context of “big data” and the rise of data science?
- Still drawing on Arribas-Bel and Reades (2018), consider the role of technological determinism in the evolution of geographical thought:
- Do technological advancements determine the direction of geographical inquiry?
- How do the authors characterize the relationship between technological change and the development of geographical thought?
- What evidence do they provide to support their view?
In-Person Lectures
In this week’s workshop we will review the module aims, learning outcomes, and expectations with a general introduction to the course.
Session | Video | Presentation |
---|---|---|
Getting Started | In Class | Slides |
Computers in Urban Studies | In Class | Slides |
Principles | In Class | Slides |
Tools of the Trade | In Class | Slides |
Practical
This week’s practical is focussed on getting you set up with the tools and accounts that you’ll need across many of the CASA modules in Terms 1 and 2, and on familiarising you with ‘how people do data science’. Outside of academia, it’s rare to find a data scientist who works entirely on their own: most code is collaborative, as is most analysis! But collaborating effectively requires tools that: get out of the way of doing ‘stuff’; support teams in negotiating conflicts in code; make it easy to share results; and make it easy to ensure that everyone is ‘on the same page’.
The practical focusses on:
- Getting you up and running with the coding and collaboration tools.
- Providing you with hands-on experience of using these tools.
- Configuring your programming environment for the rest of the programme.
To save a copy of the notebook to your own GitHub repository: follow the GitHub link, click on Raw, and then Save File As... to save it to your own computer. Make sure to change the extension from .ipynb.txt (which will probably be the default) to .ipynb before adding the file to your GitHub repository.
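If the file has already been saved with the wrong extension, the renaming step described above can also be done in a few lines of Python. This is a minimal sketch, not a required part of the practical: the function name `fix_notebook_extension` and the example file name `Practical.ipynb.txt` are hypothetical, and the function deliberately refuses to overwrite an existing `.ipynb` file.

```python
from pathlib import Path


def fix_notebook_extension(path):
    """Rename 'name.ipynb.txt' to 'name.ipynb' and return the resulting path.

    Files that do not end in '.ipynb.txt' are returned unchanged.
    """
    path = Path(path)
    if not path.name.endswith(".ipynb.txt"):
        return path
    target = path.with_suffix("")  # strips only the final '.txt'
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting it")
    path.rename(target)
    return target
```

For example, `fix_notebook_extension("Practical.ipynb.txt")` would leave you with `Practical.ipynb` in the same folder, ready to be added to your repository.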
To access the practical: