Using Docker

Author

Jon Reades

Published

September 27, 2024

Understanding Docker

What is Virtualisation?

Docker is a ‘virtualisation’ tool that allows you to run ‘virtual machines’ on your computer’s ‘host’ operating system. That’s a lot of new, probably meaningless words. If you’re one of those people who (understandably) likes to understand what’s going on, then here’s how some people define it:

  1. Google on What is a virtual machine?
  2. VMWare on What is a virtual machine?
  3. Microsoft on What is a virtual machine (VM)?

Docker in a Nutshell

So in order to make use of Docker (and understand what’s happening when you get errors), it’s helpful to have some sense of what’s going on behind the scenes. You can click on the image below to make it larger, or you can download and print out a PDF version.

Sketch of Docker Usage

Here’s what’s happening:

Step 1. docker pull

You issue the docker pull jreades/sds:2024 command to your computer, which turns around and asks Docker Hub for a copy of this image. Docker Hub responds by transferring a copy of the jreades/sds:2024 image to your computer. You now have a file containing all the instructions to set up and run a virtual machine on your computer [1].
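
To make the image name a little less opaque, here is a small sketch of how a reference such as jreades/sds:2024 breaks down into a repository and a tag. The variable names are ours, for illustration only, and the sketch assumes a POSIX shell such as the macOS Terminal (it will not run in PowerShell).

```shell
# An image reference has the form <repository>:<tag>.
# (This simple split assumes there is no registry host with a port number.)
image="jreades/sds:2024"

# Everything before the colon is the repository on Docker Hub...
repository="${image%%:*}"
# ...and everything after it is the tag (here, the version of the image).
tag="${image##*:}"

echo "repository: $repository"
echo "tag: $tag"
```

Once a pull has finished, running docker images should list the repository and tag of everything stored on your computer.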

Step 2. docker run

You issue the docker run ... jreades/sds:2024 ... command (which you’ll be running in a minute) from your computer, and this tells Docker to use the jreades/sds:2024 image as a template for creating a container called sds2024 [2]. sds2024 will do whatever it was told to do by its creator. This could be waiting to run Python code, starting up a database, or serving web pages; the list is practically endless. The image itself is read-only: although you can make changes to the container while it’s running, those changes are lost as soon as you shut it down. So you cannot break a Docker image, only a container. And if you do that, you delete the container and start a new one from the image… we can cover this if you ever do it.

As part of the docker run command, we also told Docker what resources the container could access. There are two main types of resources for our purposes:

  • A mount point which is a part of your computer’s hard drive that Docker can use to write things down permanently. We use $(pwd), which is short-hand for print working directory and refers to the ‘place’ on your computer where we issued the docker run command. We tell Docker to connect this to a directory called work (which resides in /home/jovyan/) on the sds2024 container. This allows you to share data between the container and your computer, and for changes to be saved when you shut down Docker.
  • One or more ports which are like channels on a radio where the container can talk to other computers (including yours). In this case, we connect port 8888 on sds2024 to port 8888 on your computer. And that is why you have to tell your browser to go to localhost:8888 to access Jupyter Lab.
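
Because $(pwd) is filled in by your shell at the moment you press Enter, the folder that gets shared depends on where you are when you issue the command. The sketch below only prints the -v option rather than running Docker; show_mount is a hypothetical helper name, and a POSIX shell (e.g. the macOS Terminal) is assumed.

```shell
# $(pwd) is replaced by the current folder *before* Docker ever sees it,
# so the same -v option points at different folders depending on where you are.
show_mount() {
  echo "-v \"$(pwd):/home/jovyan/work\""
}

# Run from your home folder: the host side of the mount is your home folder.
(cd "$HOME" && show_mount)

# Run from the root of the drive: the host side of the mount is now /.
(cd / && show_mount)
```

This is why it is worth changing into the folder that holds your work before starting the container.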

Step 4. Interacting with the Container

Now when you type things into the browser and tell code to ‘run’, what’s actually happening is that your computer is forwarding the request to the container, which does its thing, updates the web page, and this change is then forwarded back to you via the browser.

Step 5. Anatomy of docker run

In the next section you’ll see the full docker run command. Here we just want to focus on the options (each -X is an option) that matter most for most users:

  • -v: this specifies the point on your hard drive that the sds2024 container can use. By default we use $(pwd), which means ‘use the location where the docker run command was executed’. You can also ‘hard code’ this to something like /Users/<your username>/Documents/casa/fsds/ if you always want to use the same location.
  • -p: this specifies the channel (or port) on which the web browser can talk to the sds2024 container.
  • jreades/sds:2024: this specifies the image we want to use.
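
As a rough illustration of how these options fit together, the sketch below prints the -p and -v options for a chosen folder and port without running anything. The function name sds_options and the alternative port 8889 are our own choices for illustration; both options take the form <your computer>:<container>, so only the left-hand side changes.

```shell
# Print the -p and -v options for a given host folder and host port.
# Nothing is executed; the function only echoes the options.
sds_options() {
  host_dir="$1"            # folder on your computer to share with the container
  host_port="${2:-8888}"   # port on your computer; defaults to 8888
  # The container side (port 8888, /home/jovyan/work) stays the same.
  echo "-p ${host_port}:8888 -v \"${host_dir}:/home/jovyan/work\""
}

# Default behaviour: share the current folder on port 8888.
sds_options "$(pwd)"

# 'Hard-coded' folder, with a different port on your computer.
sds_options "$HOME/Documents/casa/fsds" 8889
```

If you did change the port on your computer to 8889, you would then point your browser at localhost:8889 instead of localhost:8888.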

Installing Docker

Essential Topic

This next section is essential to running the sds2024 environment.

Docker is a complex application doing very complex things. It’s not surprising that it can be a bit of a pain to install. But once installed, it’s a very powerful platform for ‘doing (spatial) data science’ that’s widely used in industry and, increasingly, academia.

One way to think of it is as a ‘library’ of ready-made virtual computers that you can copy and use free-of-charge. If you’d like to know more about what Docker is and how it works, you can read more in the Understanding Docker section.

Windows Users

Please ensure that you have installed WSL2 before installing Docker! If you cannot install WSL2 then please have a look at the ‘dealing with errors’ section.

After you’ve downloaded Docker, you need to:

  1. Install it – usually this will mean opening the image and either dragging it to your Applications folder (Mac) or running the installer (Windows)
  2. Start it up – double-click the Docker icon in your Applications folder to start Docker running.
  3. Finish setup – once Docker is finished starting up, you should see the login screen below. You do not need to create an account (notice Continue without signing in)
  4. Skip the rest – you can also Skip (upper-right corner) all of the subsequent questions.

Docker trying to trick you into creating an account

You must finish setting up before proceeding to the next step. You’ll know that you’re ready to move on when you see the ‘Docker Desktop’ window appear listing ‘downloaded images’ and ‘running containers’:

Docker Desktop

If you didn’t see this then you will need to have a look at the ‘dealing with errors’ section.

Testing Docker

To test if Docker is installed correctly, you will need either the Terminal (macOS) or PowerShell (Windows).

Copy Code to Clipboard

Whenever you see a ‘code block’ below, you will also see a ‘clipboard’ icon on the right. Click that, and the code will be copied to your computer’s ‘clipboard’ so that you can then paste it into the Terminal or PowerShell. That will save you a lot of time and effort.

Docker has provided a simple way to test if your installation is working correctly. You can run the following command in the Terminal or PowerShell:

docker run hello-world

This should output something like:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
478afc919002: Pull complete
Digest: sha256:91fb4b041da273d5a3273b6d587d62d518300a6ad268b28628f74997b93171b2
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

If you see this message, then Docker is installed correctly and you can move on to the next step. If you see an error message, then you will need to have a look at the ‘dealing with errors’ section. Notice how there are several things happening:

  1. Unable to find image... locally – this is because the hello-world image is not (yet) stored on your computer.
  2. latest: Pulling from library/hello-world – this is Docker downloading the hello-world image from the internet.
  3. Pull complete – this is Docker telling you that it has finished downloading the image.
  4. Hello from Docker! – this is the hello-world image running and telling you that Docker is working correctly.

There’s obviously a lot more to that message, but that’s the basic idea.
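
If you would like a quicker check before (or instead of) reading that output, something along the following lines may help. It is only a sketch: check_docker is a hypothetical helper name, the messages are ours, and it assumes a POSIX shell such as the macOS Terminal rather than PowerShell.

```shell
# Report whether the docker command is available and whether the engine responds.
check_docker() {
  # Is there a 'docker' command on this computer at all?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker command not found: is Docker installed?"
    return 1
  fi
  # 'docker info' fails if it cannot reach the Docker engine.
  if ! docker info >/dev/null 2>&1; then
    echo "docker found, but the engine is not responding: is Docker Desktop running?"
    return 1
  fi
  echo "docker is installed and running"
}

check_docker || true
```

The second message is the one to look out for if you have installed Docker but not yet started it.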

Installing sds2024

We now need to download the sds2024 image that we use for teaching Foundations and Quantitative Methods. The image is the ‘template’ for running virtual machines (i.e. computers) on our ‘host’ computer and it comes complete with all of the Python libraries and other tools that you’ll need to complete the module (and a good deal more besides!). Installing the image will take a while as it’s quite large (2-5GB) and so will depend on the speed of your internet connection.

If your computer is a Mac with an Intel chipset (see: Apple menu -> About This Mac -> Processor) or is running Windows:

docker pull jreades/sds:2024-intel

If your Mac has an M1 or M2 ‘Silicon’ chipset (see: Apple menu -> About This Mac -> Chip):

docker pull jreades/sds:2024-silicon
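
If you are not sure which chipset your Mac has, the Terminal can usually tell you. The sketch below maps the output of uname -m to the two tags named above; sds_tag_for_arch is a hypothetical helper, and the mapping assumes that Apple Silicon Macs report arm64 and Intel Macs report x86_64 (a Terminal running under Rosetta may report x86_64 even on an M1 or M2). Windows users can ignore this and use the intel tag as described above.

```shell
# Map the output of `uname -m` to the matching image tag.
sds_tag_for_arch() {
  case "$1" in
    arm64)  echo "2024-silicon" ;;  # Apple M1/M2 Macs
    x86_64) echo "2024-intel" ;;    # Intel Macs
    *)      echo "unknown" ;;       # anything else: check by hand
  esac
}

arch="$(uname -m)"
echo "This machine reports '$arch'; suggested tag: $(sds_tag_for_arch "$arch")"
```

If the suggested tag is unknown, fall back on the About This Mac check described above.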

Using jreades/sds:2024

Now that you have the image downloaded, you can start an sds2024 container by copying the command for your computer into the Terminal or PowerShell (this does not work in the Command Prompt/cmd):

macOS (Intel):

docker run --rm -d --name sds2024 -p 8888:8888 \
  -v "$(pwd):/home/jovyan/work" \
  jreades/sds:2024-intel start.sh jupyter lab \
  --LabApp.password='' --ServerApp.password='' --NotebookApp.token=''

macOS (Apple Silicon):

docker run --rm -d --name sds2024 -p 8888:8888 \
  -v "$(pwd):/home/jovyan/work" \
  jreades/sds:2024-silicon start.sh jupyter lab \
  --LabApp.password='' --ServerApp.password='' --NotebookApp.token=''

Windows (PowerShell):

docker run --rm -d --name sds2024 -p 8888:8888 -v "$(pwd):/home/jovyan/work" jreades/sds:2024-intel start.sh jupyter lab --LabApp.password='' --ServerApp.password='' --NotebookApp.token=''
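
Because of the -d option the container runs in the background, so the command above prints little more than a long identifier. A minimal sketch for checking on the container and stopping it follows; sds_status and sds_stop are hypothetical helper names, and the DOCKER variable is our own addition so that the commands can be previewed without being run.

```shell
# Set DOCKER=echo to preview the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# List running containers whose name matches sds2024.
sds_status() {
  "$DOCKER" ps --filter "name=sds2024"
}

# Stop the container; because it was started with --rm it is also deleted,
# but anything saved in the work folder stays on your computer.
sds_stop() {
  "$DOCKER" stop sds2024
}
```

Running sds_status after docker run should show one line for sds2024 if the container started, and sds_stop shuts it down when you have finished for the day.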

Success!

Most of you should now be able to connect to the container by pointing your browser at localhost:8888, where you should see something like this:

Jupyter Lab Success

Dealing with Problems

For more help with error messages or other challenges in installing and configuring Docker, see the Problems page.

Advanced Topics

For more about how to get the most out of Docker, see the Advanced Topics page.

Footnotes

  1. A virtual machine is just a computer that runs on your computer. So it ‘borrows’ resources like hard drive space, memory, and processor in order to behave like an independent computer that you can interact with in various ways.

  2. A container is the name Docker uses to refer to a running virtual machine. The image on its own does nothing until you tell Docker to run it, at which point it becomes a container!