Rationale

This is a collaborative project (worth 60%) due Tuesday, 17 December 2024 @ 18:00 that you will undertake in a small group of no more than five students. The project is intended to resemble real-world data science ways of working: you will be part of a small team, you will need to figure out how to work effectively together, and you will need to jointly produce an output in which you all have confidence. You will submit a reproducible analysis (written for Quarto+Python) that we will be able to run on our own computers in order to generate a PDF output.

The focus of this assessment is therefore your ability to make use of concepts and methods covered in class as part of an analytical process to support decision-making. It is not necessary that you employ every technique covered in class. It is necessary that you justify your choice of approach with reference to relevant academic and ‘grey’ literature, as well as the computational, statistical, and analytical objectives of your submission. It is perfectly possible to obtain a distinction-level grade without the use of any advanced analytical techniques (e.g. clustering, NLP, or Random Forests); however, it is unlikely that you would be able to complete this assessment to a high standard without some graphs and some maps chosen for their ability to advance your argument.

So the assessment may be completed without substantially new modelling or coding by drawing on the code written in practicals to develop an analysis based on the judicious use of descriptive statistics (see, for instance, ‘Housing and Inequality in London’ and ‘The suburbanisation of poverty in British cities, 2004-16: extent, processes and nature’), but it is likely that a better mark will be obtained by demonstrating the capacity to go beyond exactly what was covered in class.

The submission will have two parts and both are evaluated as part of your overall grade:

  1. A runnable QMD (Quarto Markdown Document) file that addresses the set questions provided in class. The QMD file allows us to evaluate reproducibility (40% of this assessment) by rendering your QMD file on our computer. So we are looking at whether the QMD file created by the group runs fully and without error on a different computer, and whether it shows evidence of thought in relation to the quality of coding and outputs.
  2. A rendered PDF file that is the output of your QMD file. The PDF file allows us to focus on your content (60% of this assessment), regardless of whether or not there are issues with its reproducibility. So we are looking at whether the answers output by the group engage with the set questions through a mix of literature, critical thinking, and data analysis.
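
Two common reasons a QMD file fails or changes its results on a different computer are absolute file paths and unseeded randomness. The following is a minimal sketch of one way to guard against both, not a required approach: the `data` folder, the helper name `data_path`, and the seed value are all assumptions made for illustration and are not part of the provided templates.

```python
from pathlib import Path
import random

# Arbitrary fixed value (an assumption) so that random results repeat on every render.
SEED = 42


def data_path(filename: str, base: Path = Path("data")) -> Path:
    """Return a path inside a project-relative folder, creating the folder if needed.

    `base` defaults to a hypothetical `data` folder, resolved relative to
    the working directory rather than to any one person's home directory.
    """
    base.mkdir(parents=True, exist_ok=True)
    return base / filename


# Seed the standard-library random number generator before any sampling.
random.seed(SEED)
sample = random.sample(range(100), k=5)
```

Because the path is relative and the seed is fixed, the same call should point at the same file and draw the same sample on any group member's machine. Libraries with their own random number generators would need seeding separately.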

Please see Resources for the templates and information about how to create and render the QMD file.