Code
The Reproducible Analysis is worth 25% of your module grade.
Reproducibilty Rubric
To underpin the briefing document (the PDF), and to ensure that your advice can be updated as new information emerges, you will provide a QMD file that demonstrates your ability to produce a transparent, reproducible, and well-documented analytical workflow. It should run from start to finish with minimal intervention and clearly show how your narrative, evidence, and visuals were generated.
Markers will assess the structure, clarity, and reproducibility of your analysis, not the persuasiveness of your argument (which is evaluated in the PDF briefing).
Marking Rubric
You’ll be marked on how well your code runs, how clearly it’s written, and how easy it is for someone else to reproduce your results.
We’ll open your .qmd file and test it in the sds2025 container to see if it runs ‘end-to-end’.
| Criterion | Weight | What we will look for |
|---|---|---|
| Reproducibility & environment | 35% | Can your file run smoothly in the provided container? Are dependencies and data handled properly? |
| Code quality & documentation | 30% | Is your code clean, commented, and easy to follow? |
| Analytical soundness | 25% | Are your methods appropriate and correctly implemented? |
| Sustainable Practices | 10% | Does your workflow make effective use of the relevant tools taught this term and their embeddedness in an open source/data context? |
Tips for Success
- Test your .qmd in a fresh container before submitting.
- Use relative paths and include code to download data automatically.
- Use comments, functions, and other ‘tricks’ so that we can follow your workflow.
- Commit regularly if using version control and make use of GitHub’s collaboration and sharing features.
- Give thought to different aspects of sustainability, including data access/persistence.
Additional Information
The Reproducible Analysis must be written using Python in a Quarto Markdown Document (QMD file). You are free to draw on concepts and methods covered in both Quantitative Methods and GIS, but must still write the code in Python (e.g. adapting something from R in the GIS module to Python).
Preparing Your Submission
You are expected to use sustainable authorship tools for this submission. You may be asked to provide evidence of this. A template will be provided, and you should also look at the Quarto Guide and, in particular, the PDF options in order to customise your project.
You are strongly advised to develop and maintain the submission in GitHub so that we can review contributions if necessary. In the absence of a GitHub commit history or everything being committed by only one member of the group we will be unable to determine individual contributions when marking the submission as a whole.
How We Measure Reproducibility
You will be submitting a runnable markdown document (.qmd file) that we will run on our own computers in order to generate a PDF output. If you have made use of one or more libraries that are not part of the Docker image then you can install these using ! pip install; however, if you take this approach then you should also ‘place nice’ by checking first to see if the library is already installed using try... except code that you can find on Stack Overflow and elsewhere (you will need to look this up).
Data and Resources Used
It is also up to you to ensure that all relevant data are available via a valid URL for downloading and running. You may host your data anywhere you like, but please bear in mind that the markers will be based in the U.K. so some servers may be inaccessible. For very small data sets we’d recommend a GitHub repo, but for larger ones a Dropbox or OneDrive link would be more appropriate (you will need to check that the link you’ve created gives permission to anyone to download).
For Large/Long Workflows
If your analysis has a particularly time-consuming stage (e.g. Named-Entity Recognition or Part-of-Speech tagging) then you can provide partially-processed data via a download in the QMD file: comment out the code up to the point where you have generated the ‘expensive’ data set but leave it in the markdown document. That way we can see how you generated the data without it being part of the reproducibility stage.