Rationale
This is group work due Tuesday, 16 December 2025 @ 10:00 that you will undertake in a small group of no more than four students. The project is intended to resemble real-world data science ways of working: you will be part of a small team thrown together at short notice, you will need to figure out how to work effectively, and you will need to jointly produce an output in which you have confidence.
Applying Your Knowledge
The focus of this assessment is therefore your ability to make use of concepts and methods covered in class as part of an analytical process to support decision-making by a non-expert. You do not need to employ every technique covered in class, but you do need to justify your choices, results, and conclusions using relevant literature as needed.
The assessment may therefore be completed by drawing on the code written in the practicals and the judicious use of descriptive statistics; see, for instance, Housing and Inequality in London and The suburbanisation of poverty in British cities, 2004-16: extent, processes and nature for examples of how much can be achieved in this way. However, it is likely that a better mark will be obtained by going beyond exactly what was covered in class: connecting concepts and demonstrating a deeper understanding of how to apply what has been learned across FSDS, QM, and GIS to the problem at hand.
Two-Part Submission
The submission will have two parts and both are evaluated as part of your overall grade:
- The runnable QMD (Quarto Markdown Document) file that produces the PDF submitted in the second part of the assessment. The QMD will be evaluated for its reproducibility and is worth 25% of your module grade. See the Code page for more guidance.
- The rendered PDF file output by your QMD file in the first part of the assessment. This second part allows us to focus on the content, regardless of whether or not there are issues with its reproducibility, and is worth 35% of your module grade. See the Content page for more guidance.
You can find both sets of guidance together in the Rubric.
Data
- Your base data set is: the InsideAirbnb 20250615 Listings.
- You will find reviews and calendar data for the same time period.
- You can also use any other relevant data available from the server, but this is not required.
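As a starting point for descriptive statistics on the listings data, the following is a minimal sketch rather than a prescribed method. It assumes the listings file has `id`, `room_type`, and `price` columns and that `price` is stored as text such as `$1,200.00`; check this against the file you actually download. The inline `SAMPLE_CSV` and the `load_listings` helper are hypothetical stand-ins so that the example runs without access to the server.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real listings file: four made-up rows
# using column names assumed to match the InsideAirbnb listings schema.
SAMPLE_CSV = io.StringIO(
    "id,room_type,price\n"
    "1,Entire home/apt,\"$1,200.00\"\n"
    "2,Private room,$45.00\n"
    "3,Private room,$55.00\n"
    "4,Entire home/apt,\n"
)


def load_listings(source):
    """Read listings and convert the text price column to floats."""
    df = pd.read_csv(source)
    # Strip the currency symbol and thousands separator, then coerce
    # anything unparseable (including missing prices) to NaN.
    df["price"] = pd.to_numeric(
        df["price"].str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    return df


listings = load_listings(SAMPLE_CSV)

# Count of non-missing prices and median price per room type.
summary = listings.groupby("room_type")["price"].agg(["count", "median"])
print(summary)
```

To use this on the real data, pass the path or URL of the listings file to `load_listings` in place of `SAMPLE_CSV`, and decide explicitly how you will treat missing prices before reporting any summary.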
Process
- Over Reading Week you should start a literature scan in order to identify relevant approaches, policies, and findings. The module bibliography might help get you started, but it is not exhaustive.
- Do not start trying to tackle the data analysis components of any of the questions until we have covered Pandas in Week 6.
- Do not start trying to work directly with Quarto until we have covered Text in Week 8. Use iPython to create exploratory notebooks and then transfer only the necessary bits of code and commentary as you develop familiarity with Quarto.
- From Week 9 you will want to be working primarily in Quarto to prepare the project submission, even as you or other members of the group continue to use notebooks to test out ideas and undertake E(S)DA.
Please also see Resources for the templates and information about how to create and render the QMD file.