Dictionaries

In this lesson, we’ll continue our exploration of more advanced data structures. Last time we took a peek at a way to represent ordered collections of items via lists. This time we’ll use dictionaries to create collections of ordered items accessed by name, instead of position.

Watch Out for Pythons

Since version 3.7, Python guarantees that dictionaries keep their items in insertion order, which is unusual amongst programming languages. Most other languages make no such promise: if you use a ‘hash’ in Perl or a ‘HashMap’ in Java, then items are stored in whatever order the programming language sees fit to optimise the speed of retrieval.

So even though asking for a dictionary’s keys or values will return a list in insertion order, we’d recommend never assuming that key/value data structures have any kind of reliable order as it will save you a lot of pain later.
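
A small sketch of why it is safer not to lean on ordering: Python itself compares dictionaries by their key/value pairs and ignores the order in which the pairs were inserted. The names first and second are made up for this illustration.

```python
# Two dictionaries holding the same pairs, inserted in a different order
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}

# Listing the keys shows the insertion order (Python 3.7 and later)
print(list(first))
print(list(second))

# ...but equality only looks at the pairs, not at their order
print(first == second)
```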

According to the Official Docs:

It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}.

Creating

In other words, dictionaries are not lists: instead of just a checklist, we now have a key and a value. We use the key to find the value. So a generic dictionary looks like this:

theDictionary = {
    key1: value1,
    key2: value2,
    key3: value3,
    ...
}

Each key/value pair is linked by a ‘:’, and each pair is separated from other pairs by a ‘,’. It doesn’t really matter if you put everything on newlines (as we do here) or all on the same line. We’re just doing it this way to make it easier to read.
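
To check the point about layout for yourself, here is a minimal sketch with two made-up dictionaries (oneLine and manyLines are hypothetical names) that differ only in how they are laid out:

```python
# The same dictionary written on one line...
oneLine = {"x": 1, "y": 2}

# ...and spread across several lines
manyLines = {
    "x": 1,
    "y": 2
}

# Python sees no difference between the two
print(oneLine == manyLines)
```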

Here’s a more useful implementation of a dictionary:

myDict = {
    "key1": "Value 1",
    3: "3rd Value",
    "key2": "2nd Value",
    "Fourth Key": [4.0, 'Jon']
}
print(myDict)
{'key1': 'Value 1', 3: '3rd Value', 'key2': '2nd Value', 'Fourth Key': [4.0, 'Jon']}

Notice that almost any type of data can go into a dictionary: strings, integers, and floats. There’s even a list in this dictionary ([4.0, 'Jon'])! The only constraint is that the key must be immutable; this means that it is a simple, static identifier that can’t change. So this will result in an error:
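
The snippet below is a reconstruction rather than the original example: it assumes a dictionary called badDict (a hypothetical name) that tries to use the list ["key1",1] as a key. It also wraps the attempt in try/except, which we haven't covered yet, purely so that the rest of the code keeps running after the error.

```python
errorName = None
try:
    # A list is mutable, so Python refuses to use it as a key
    badDict = {
        ["key1", 1]: "Value 1"
    }
except Exception as e:
    # Keep the name of the exception so that we can print it
    errorName = type(e).__name__
    print(errorName + ": " + str(e))
```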

This doesn’t work because you can’t use a list (["key1",1]) as a dictionary key, though as you saw above you can use a list as a value. For more on the subject of (im)mutability, check out this Stack Overflow answer.

Accessing

Like lists, we access an element in a dictionary using a ‘location’ marked out by a pair of square brackets ([…]). The difference is that the index is no longer an integer indicating the position of the item that we want to access, but is a key in the key:value pair:

print(myDict["key1"])
print(myDict["Fourth Key"])
Value 1
[4.0, 'Jon']

Notice how now we just jump straight to the item we want? We don’t need to think about “Was that the fourth item on the list? Or the fifth?” We just use a sensible key, and we can ask for the associated value directly.

A challenge for you!

How would you print out 2nd Value from myDict?

myDict = {
    "key1": "Value 1",
    3: "3rd Value",
    "key2": "2nd Value",
    "Fourth Key": [4.0, 'Jon']
}

print(myDict["key2"])
2nd Value

When it comes to error messages, dicts and lists behave in similar ways. If you try to access a dictionary using a key that doesn’t exist then Python raises an exception.

What is the name of the exception generated by the following piece of code? Can you find it in the Official Docs?
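
A sketch of the kind of code this question refers to, assuming it asked myDict for the key 99, which doesn't exist. The try/except wrapper is an addition for illustration so that the name of the exception gets printed instead of stopping the program.

```python
myDict = {
    "key1": "Value 1",
    3: "3rd Value",
    "key2": "2nd Value",
    "Fourth Key": [4.0, 'Jon']
}

errorName = None
try:
    # There is no key 99 in myDict, so this lookup fails
    print(myDict[99])
except Exception as e:
    # Keep the name of the exception so that we can print it
    errorName = type(e).__name__
    print(errorName + ": " + str(e))
```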

Handy, no? Again, Python’s error messages are giving you helpful clues about where the problem it’s encountering might be! Up above we had a TypeError when we tried to create a key using a list. Here, we have a KeyError that tells us something must be wrong with using 99 as a key in myDict. In this case, it’s that there is no key 99!

A challenge for you

We already found out that we can easily convert Python variables between different types. A dictionary is also a ‘type’, so we can convert lists to dictionaries just like we convert strings into integers. However, since we need to pair the elements in each list, we need a function called zip to ‘zip together’ the key-value pairs.

Can you turn these lists into a dictionary called capitalDict?

country = ['Costa Rica','Croatia','Cuba'] #keys
capital_city = ['San Jose','Zagreb','Havana'] #values

capitalDict = dict(zip(country,capital_city))

How would you print out the capital city of Croatia from capitalDict?

print(capitalDict['Croatia'])
Zagreb
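
If you're wondering what zip actually does, this sketch splits the conversion into two steps. The variable pairs is a hypothetical name used only to hold the intermediate result.

```python
country = ['Costa Rica', 'Croatia', 'Cuba']       # keys
capital_city = ['San Jose', 'Zagreb', 'Havana']   # values

# zip pairs up the items that share a position in the two lists
pairs = list(zip(country, capital_city))
print(pairs)

# dict turns each (key, value) pair into a dictionary entry
capitalDict = dict(pairs)
print(capitalDict)
```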

A Simple Phone Book

One of the simplest uses of a dictionary is as a phone book! (If you’re not sure what a phone book is here’s a handy guide and here’s an example of someone using one).

So here are some useful contact numbers:

  1. American Emergency Number: 911
  2. British Emergency Number: 999
  3. Icelandic Emergency Number: 112
  4. French Emergency Number: 112
  5. Russian Emergency Number: 102

Now, how would you create a dictionary that allowed us to look up and print out an emergency phone number based on the two-character ISO country code? It’s going to look a little like this:

eNumbers = {
    "IS": '112', # It's not very important here whether we use single- or double-quotes
    "US": '911'
}
print("The Icelandic emergency number is " + eNumbers['IS'])
print("The American emergency number is " + eNumbers['US']) 
The Icelandic emergency number is 112
The American emergency number is 911

Useful Dictionary Methods

We are going to see in the next couple of lessons how to systematically access values in a dictionary (amongst other things). For now, let’s also take in the fact that dictionaries have utility methods similar to those we saw with the list. And as with the list, these methods are functions that only make sense when you’re working with a dictionary, so they’re bundled up in a way that makes them easy to use.

Let’s say that you have forgotten what keys you put in your dictionary…

programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

print(programmers.keys())
dict_keys(['Charles', 'Ada', 'Alan'])

Or maybe you just need to access all of the values without troubling to ask for each key:

print(programmers.values())
dict_values(['Babbage', 'Lovelace', 'Turing'])

Or maybe you even need to get them as pairs:

print(programmers.items())
dict_items([('Charles', 'Babbage'), ('Ada', 'Lovelace'), ('Alan', 'Turing')])
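
The dict_keys, dict_values and dict_items wrappers in the output above are not plain lists, so you can't ask for an item by position. A minimal sketch of one way round this is to convert the result with list(); keyList is a hypothetical name.

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# Convert the keys into an ordinary list
keyList = list(programmers.keys())
print(keyList)

# Now we can use a position, just as with any other list
print(keyList[0])
```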

A challenge for you

Can you print out all the values of capitalDict from the previous challenge?

print(capitalDict.values())
dict_values(['San Jose', 'Zagreb', 'Havana'])

Are You On the List? (Part 2)

As with the list data type, you can check the presence or absence of a key in a dictionary, using the in / not in operators… but note that they only work on keys.

print("Charles" in programmers)
print("Babbage" in programmers)
print(True  not in programmers)
True
False
True
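
If you do need to check for a value rather than a key, one option (sketched here with the same programmers dictionary) is to run the in operator against the output of values():

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# Used directly on the dictionary, 'in' looks at the keys only
keyCheck = "Babbage" in programmers
print(keyCheck)

# Used on values(), 'in' looks at the values instead
valueCheck = "Babbage" in programmers.values()
print(valueCheck)
```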

What Do You Do if You’re Not On the List?

One challenge with dictionaries is that sometimes we have no real idea if a key exists or not. With a list, it’s pretty easy to figure out whether or not an index exists because we can just ask Python to tell us the length of the list. So that makes it fairly easy to avoid having the list ‘blow up’ by throwing an exception.

It’s rather harder for a dictionary though, so that’s why we have the dedicated get() method: it not only allows us to fetch the value associated with a key, it also allows us to specify a default value in case the key does not exist:

print(programmers.get("Lady Ada", "Are you sure you spelled that right?") )
Are you sure you spelled that right?

See how this works: the key doesn’t exist, but unlike what happened when we asked for myDict[99] we don’t get an exception, we get the default value specified as the second input to the method get.

So you’ve learned two things here: that functions can take more than one input (this one takes both the key that we’re looking for, and the value to return if Python can’t find the key); and that different types (or classes) of data have different methods (there’s no get for lists).
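
A short sketch of the three situations you can run into with get, using the same programmers dictionary. The variable names and the "Unknown" default are made up for this illustration.

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# The key exists: we get its value and the default is ignored
found = programmers.get("Ada", "Unknown")
print(found)

# The key doesn't exist: we get the default instead
missing = programmers.get("Lady Ada", "Unknown")
print(missing)

# The key doesn't exist and we gave no default: we get None
noDefault = programmers.get("Lady Ada")
print(noDefault)
```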

Lists of Lists, Dictionaries of Lists, Dictionaries of Dictionaries… Oh my!

OK, this is where it’s going to get a little weird but you’re also going to see how programming is a little like Lego: once you get the building blocks, you can make lots of cool/strange/useful contraptions from some pretty simple concepts.

Remember that a list or dictionary can store anything: so the first item in your list could itself be a list! For most people starting out on programming, this is the point where their brain starts hurting (it happened to us) and you might want to throw up your hands in frustration thinking “I’m never going to understand this!” But if you stick with it, you will.

And this is really the start of the power of computation.

A Data Set of City Attributes

Let’s start out with what some (annoying) people would call a ‘trivial’ example of how a list-of-lists (LoLs, though most people aren’t laughing) can be useful. Let’s think through what’s going on below: what happens if we write cityData[0]?

# Format: city, country, population, area (km^2)
cityData = [
    ['London','U.K.',8673713,1572],
    ['Paris','France',2229621,105],
    ['Washington, D.C.','U.S.A.',672228,177],
    ['Abuja','Nigeria',1235880,1769],
    ['Beijing','China',21700000,16411],
]

print(cityData[0])
['London', 'U.K.', 8673713, 1572]

So how would we access something inside the list returned from cityData[0]?

Why not try:

cityData[0][1]
'U.K.'

See if you can figure out how to retrieve and print the following from cityData:

  1. France
  2. 16411
  3. Washington, D.C.

Take the following as a starting point…
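
The starting snippet isn't reproduced here, so what follows is one possible solution rather than the original code. You might want to try it yourself first and then use this to check your answer.

```python
# Format: city, country, population, area (km^2)
cityData = [
    ['London','U.K.',8673713,1572],
    ['Paris','France',2229621,105],
    ['Washington, D.C.','U.S.A.',672228,177],
    ['Abuja','Nigeria',1235880,1769],
    ['Beijing','China',21700000,16411],
]

# The first index picks the city, the second picks the attribute
print(cityData[1][1])  # country of the second city
print(cityData[4][3])  # area of the fifth city
print(cityData[2][0])  # name of the third city
```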

A challenge for you

Now can you retrieve and print the following from cityData:

  1. Nigeria
  2. 8673713
  3. 177

print(cityData[3][1])
print(cityData[0][2])
print(cityData[2][3])
Nigeria
8673713
177

A Phonebook+

So that’s an LoL (list-of-lists). Let’s extend this idea to what we’ll call Phonebook+ which will be a DoL (dictionary-of-lists). In other words, a phonebook that can do more than just give us phone numbers! We’re going to build on the emergency phonebook example above.

# American Emergency Number: 911
# British Emergency Number: 999
# Icelandic Emergency Number: 112
# French Emergency Number: 112
# Russian Emergency Number: 102
eNumbers = {
    'IS': ['Icelandic',112],
    'US': ['American',911],
    'FR': ['French',112],
    'RU': ['Russian',102],
    'UK': ['British',999]
}
print("The " + eNumbers['IS'][0] + " emergency number is " + str(eNumbers['IS'][1]))
print("The " + eNumbers['US'][0] + " emergency number is " + str(eNumbers['US'][1]))
print("The " + eNumbers['FR'][0] + " emergency number is " + str(eNumbers['FR'][1]))
The Icelandic emergency number is 112
The American emergency number is 911
The French emergency number is 112

A Challenge for you

See if you can create the rest of the eNumbers dictionary and then print out the Russian and British emergency numbers.

print("The " + eNumbers['RU'][0] + " emergency number is " + str(eNumbers['RU'][1]))
print("The " + eNumbers['UK'][0] + " emergency number is " + str(eNumbers['UK'][1]))
The Russian emergency number is 102
The British emergency number is 999

Dictionary-of-Dictionaries

OK, this is the last thing we’re going to throw at you today – getting your head around ‘nested’ lists and dictionaries is hard. Really hard. But it’s the all-important first step to thinking about data the way that a computer ‘thinks’ about it. This is really abstract: something that you access by keys, which in turn gives you access to other keys… this is called nesting, and it’s a close cousin of an idea called recursion. And it’s probably one of the cleverest things about computing.

Here’s a bit of a complex DoD, combined with a DoL, and other nasties:

cityData2 = {
    'London' : {
        'population': 8673713,
        'area': 1572, 
        'location': [51.507222, -0.1275],
        'country': {
            'ISO2': 'UK',
            'Full': 'United Kingdom',
        },
    },
    'Paris' : {
        'population': 2229621,
        'area': 105.4,
        'location': [48.8567, 2.3508],
        'country': {
            'ISO2': 'FR',
            'Full': 'France',
        },
    }
}

Now look at the following code:

print(cityData2['Paris'])
print(cityData2['Paris']['country']['ISO2'])
print(cityData2['Paris']['location'][0])
{'population': 2229621, 'area': 105.4, 'location': [48.8567, 2.3508], 'country': {'ISO2': 'FR', 'Full': 'France'}}
FR
48.8567

Now, figure out how to print:

The population of Paris, the capital of France (FR), is 2229621.

print("The population of Paris, the capital of " + str(cityData2['Paris']['country']['Full']) + " " \
      + "(" + str(cityData2['Paris']['country']['ISO2']) + "), " + "is "+ str(cityData2['Paris']['population']) + ".")
The population of Paris, the capital of France (FR), is 2229621.

And now add It has a density of 21153.899 persons per square km.

Hint: to calculate density, divide population by area.

print("It has a density of " + "{0:.3f}".format(cityData2['Paris']['population'] / cityData2['Paris']['area']) + " persons per square km.")
It has a density of 21153.899 persons per square km.

And do the same for London.

# Note that we can tweak the formatting a bit: because everything
# sits inside the parentheses of print(), Python understands that
# a line ending in '+' continues on the next line and carries on
# concatenating the strings...
print("The population of " + 'London' + ", the capital of " + 
      cityData2['London']['country']['Full'] + " (" + cityData2['London']['country']['ISO2'] + "), is " + 
      str(cityData2['London']['population']) + ". It has a density of " + 
      str(cityData2['London']['population']/cityData2['London']['area']) + " persons per square km")

# But a _better_ way to do this might be one in which we don't
# hard-code 'London' into the output -- by changing the variable
# 'c' to Paris we can change the output completely...
c  = 'Paris'
cd = cityData2[c]
print("The population of " + c + ", the capital of " + 
      cd['country']['Full'] + " (" + cd['country']['ISO2'] + "), is " + 
      str(cd['population']) + ". It has a density of " + 
      "{0:8.1f}".format(cd['population']/cd['area']) + " persons per square km")
The population of London, the capital of United Kingdom (UK), is 8673713. It has a density of 5517.629134860051 persons per square km
The population of Paris, the capital of France (FR), is 2229621. It has a density of  21153.9 persons per square km

Applied Geo-example

Let’s continue our trips around the world! This time though, we’ll do things better, and instead of using a simple URL, we are going to use a real-world geographic data type that you can use on a web-map or in your favourite GIS software.

If you look down below at the Position variable you’ll see that we’re assigning it a complex and scary data structure. Don’t be afraid! If you look closely enough you will notice that it is just made out of the “building blocks” that we’ve seen so far: floats, lists, strings... all wrapped comfortably in a cosy dictionary!

This is simply a formalised way to represent a geographic marker (a pin on the map!) in a format called GeoJSON. According to Lizy Diamond:

GeoJSON is an open and popular geographic data format commonly used in web applications. It is an extension of a format called JSON, which stands for JavaScript Object Notation. Basically, JSON is a table turned on its side. GeoJSON extends JSON by adding a section called “geometry” such that you can define coordinates for the particular object (point, line, polygon, multi-polygon, etc). A point in a GeoJSON file might look like this:

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          -122.65335738658904,
          45.512083676585156
        ]
      },
      "properties": {
        "name": "Hungry Heart Cupcakes",
        "address": "1212 SE Hawthorne Boulevard",
        "website": "http://www.hungryheartcupcakes.com",
        "gluten free": "no"
      }
    }

GeoJSON files have to have both a "geometry" section and a "properties" section. The "geometry" section houses the geographic information of the feature (its location and type) and the "properties" section houses all of the descriptive information about the feature (like fields in an attribute table). Source
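
To see how this maps onto what we've covered so far, here is a minimal sketch that loads a shortened copy of the point above with json.loads (from Python's standard json module) and reads it like any other dictionary-of-dictionaries. The variable names text and feature are made up for this illustration.

```python
import json

# A shortened copy of the GeoJSON point above, stored as a string
text = '''
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [-122.65335738658904, 45.512083676585156]
  },
  "properties": {
    "name": "Hungry Heart Cupcakes",
    "gluten free": "no"
  }
}
'''

# json.loads turns the text into ordinary dictionaries and lists
feature = json.loads(text)
print(type(feature))

# From here on it's the nested access we practised earlier
print(feature["properties"]["name"])
print(feature["geometry"]["coordinates"][0])  # longitude
print(feature["geometry"]["coordinates"][1])  # latitude
```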

So to create a “web map”, we have to create a GeoJSON structure. In the code below there are two variables containing Longitude and Latitude coordinate positions for the UCL Quadrangle. So let’s see how this works…

# The two lines below import specified functions 
# from a coding library written by someone else. 
# When we install and import these functions we 
# gain coding 'superpowers' without having to write 
# any new code ourselves
import json
from ipyleaflet import Map, GeoJSON, basemaps

# The coordinates
# What format are they in? Does it seem appropriate?
# How would you convert them back to numbers?
longitude = -0.133956
latitude = 51.524542

# Set up the location attribute
Location = {
        "type": "Point",
        "coordinates": [longitude, latitude]
      }

# And set up the rest of the attributes for the web map
Position = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "marker-color": "#7e7e7e",
        "marker-size": "medium",
        "marker-symbol": "building",
        "name": "UCL Quadrangle"
      },
      "geometry": Location
    }
  ]
}

# OUTPUT
# -----------------------------------------------------------
# I'm just using the "imported" module to print the output
# in a nice and formatted way
print(json.dumps(Position, indent=4))
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "marker-color": "#7e7e7e",
                "marker-size": "medium",
                "marker-symbol": "building",
                "name": "UCL Quadrangle"
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -0.133956,
                    51.524542
                ]
            }
        }
    ]
}
# We can also show this in the web page directly 
# (it won't show up in the PDF version though)
m = Map(center = (51.51, -0.10), zoom=12, min_zoom=5, max_zoom=20, 
   basemap=basemaps.OpenTopoMap)
geo = GeoJSON(data=Position)
m.add_layer(geo)
m

And here we request a remote GeoJSON file (from url), convert to a dictionary, and place it in a map as a new layer.

import json
import random
import requests

from ipyleaflet import Map, GeoJSON

url = 'https://github.com/jupyter-widgets/ipyleaflet/raw/master/examples/europe_110.geo.json'
r   = requests.get(url)
d   = r.content.decode("utf-8")
j   = json.loads(d)

def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
    }

m = Map(center=(50.6252978589571, 0.34580993652344), zoom=3)

geo_json = GeoJSON(
    data=j,
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.1, 'weight': 1
    },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.5
    },
    style_callback=random_color
)
m.add_layer(geo_json)

m

As proof that behind this all is just a dictionary:

print(json.dumps(j, indent=4)[:1500] + '...')
{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "scalerank": 1,
                "featurecla": "Admin-0 country",
                "labelrank": 6,
                "sovereignt": "Albania",
                "sov_a3": "ALB",
                "adm0_dif": 0,
                "level": 2,
                "type": "Sovereign country",
                "admin": "Albania",
                "adm0_a3": "ALB",
                "geou_dif": 0,
                "geounit": "Albania",
                "gu_a3": "ALB",
                "su_dif": 0,
                "subunit": "Albania",
                "su_a3": "ALB",
                "brk_diff": 0,
                "name": "Albania",
                "name_long": "Albania",
                "brk_a3": "ALB",
                "brk_name": "Albania",
                "brk_group": null,
                "abbrev": "Alb.",
                "postal": "AL",
                "formal_en": "Republic of Albania",
                "formal_fr": null,
                "note_adm0": null,
                "note_brk": null,
                "name_sort": "Albania",
                "name_alt": null,
                "mapcolor7": 1,
                "mapcolor8": 4,
                "mapcolor9": 1,
                "mapcolor13": 6,
                "pop_est": 3639453,
                "gdp_md_est": 21810,
                "pop_year": -99,
                "lastcensus": 2001,
                "gdp_year": -99,
    ...

The code above only prints and maps the Position dictionary; it doesn’t save it anywhere. If you write it out to a file called my-first-marker.geojson in the folder where you are running the lesson (for example with json.dump), you can then try to upload it on this website (Geojson.io) and check it shows a marker somewhere in central London…
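
A minimal sketch of writing the marker to disk, assuming the same Position structure as in the UCL Quadrangle example and using json.dump from the standard library. The read-back at the end (the check variable is a hypothetical name) is only there to confirm that the file contains what we expect.

```python
import json

longitude = -0.133956
latitude = 51.524542

# The same structure as the Position variable used earlier
Position = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "marker-color": "#7e7e7e",
        "marker-size": "medium",
        "marker-symbol": "building",
        "name": "UCL Quadrangle"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [longitude, latitude]
      }
    }
  ]
}

# Write the dictionary to a file in the current folder
with open("my-first-marker.geojson", "w") as f:
    json.dump(Position, f, indent=4)

# Read the file back in to confirm that nothing was lost
with open("my-first-marker.geojson") as f:
    check = json.load(f)
print(check == Position)
```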

Further references:

General list of resources:

- Awesome list of resources
- Python Docs
- HitchHiker’s guide to Python
- Learn Python the Hard Way - Lists
- Learn Python the Hard Way - Dictionaries

Credits!

Contributors:

The following individuals have contributed to these teaching materials:

- James Millington
- Jon Reades
- Michele Ferretti
- Zahratu Shabrina

License

The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.

Acknowledgements:

Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.

Potential Dependencies:

This lesson may depend on the following libraries: ipyleaflet, requests