Rationale

This is a collaborative project (worth 60%), due Tuesday, 19 December at 18:00, which you will undertake in a small group of no more than five students. The project is intended to resemble real-world data science ways of working: you will be part of a small team, you will need to work out how to collaborate effectively, and you will need to jointly produce an output in which you all have confidence. You will submit a reproducible analysis (written for Quarto+Python) that we will run on our own computers to generate a PDF output.

The focus of this assessment is your ability to use concepts and methods covered in class as part of an analytical process that supports decision-making in a non-academic context. You do not need to employ every technique covered in class. You do need to justify your choice of approach with reference to relevant academic and ‘grey’ literature, as well as to the computational, statistical, and analytical objectives of your submission. It is perfectly possible to complete this assessment without advanced analytical topics (e.g. clustering, NLP, or global and local/LISA spatial autocorrelation methods); however, you are unlikely to complete it to a high standard without graphs and maps chosen for their ability to advance your argument.

You can complete the assessment without substantial new modelling or coding by drawing on the code written in the practicals to develop an analysis based on the judicious use of descriptive statistics (see, for instance, Housing and Inequality in London and The suburbanisation of poverty in British cities, 2004-16: extent, processes and nature). However, you are likely to obtain a better mark by demonstrating the capacity to go beyond exactly what was taught, selectively deploying more advanced programming techniques.

Group Disputes

If there is an irreconcilable disagreement within a group, we will use GitHub to determine each member's contributions and to inform individual marks.

Reproducible Analysis

The reproducible analysis must be a runnable QMD (Quarto Markdown Document) file that addresses the set questions provided in class. The QMD file will be assessed on two components:

  1. Its reproducibility (40% of this assessment): do the group's analyses and outputs run fully and without errors on a different computer, and do they show evidence of careful thought about the quality of the code and outputs?
  2. Its content (60% of this assessment): do the group's answers engage with the set questions through a mix of literature, critical thinking, and data analysis?

Supporting Documents

  • You should use the template provided, though you are free to modify it as needed. You can view the template's output in both PDF and HTML, but please submit only the PDF!