Code

The Reproducible Analysis is worth 40% of the Group Assessment.

The Reproducible Analysis must be written using Python in a Quarto Markdown Document (QMD file). You are free to draw on concepts and methods covered in both Quantitative Methods and GIS, but must still write the code in Python (e.g. adapting something from R in the GIS module to Python).

Preparing Your Submission

You are expected to use sustainable authorship tools for this submission. You may be asked to provide evidence of this. A template will be provided, and you should also look at the Quarto Guide and, in particular, the PDF options in order to customise your project.

You are strongly advised to develop and maintain the submission in GitHub so that we can review contributions if necessary. In the absence of a GitHub commit history or everything being committed by only one member of the group we will be unable to determine individual contributions when marking the submission as a whole.

How We Measure Reproducibility

You will be submitting a runnable markdown document (.qmd file) that we will run on our own computers in order to generate a PDF output. If you have made use of one or more libraries that are not part of the Docker image then you can install these using ! pip install; however, if you take this approach then you should also ‘place nice’ by checking first to see if the library is already installed using try... except code that you can find on Stack Overflow and elsewhere (you will need to look this up).

Data and Resources Used

It is also up to you to ensure that all relevant data are available via a valid URL for downloading and running. You may host your data anywhere you like, but please bear in mind that the markers will be based in the U.K. so some servers may be inaccessible. For very small data sets we’d recommend a GitHub repo, but for larger ones a Dropbox or OneDrive link would be more appropriate (you will need to check that the link you’ve created gives permission to anyone to download).

For Large/Long Workflows

If your analysis has a particularly time-consuming stage (e.g. Named-Entity Recognition or Part-of-Speech tagging) then you can provide partially-processed data via a download in the QMD file: comment out the code up to the point where you have generated the ‘expensive’ data set but leave it in the markdown document. That way we can see how you generated the data without it being part of the reproducibility stage.