In this lesson, we’ll continue our exploration of more advanced data structures. Last time we took a peek at a way to represent ordered collections of items via lists. This time we’ll use dictionaries to create collections of ordered items accessed by name, instead of position.
Since version 3.7, Python has guaranteed that dictionaries keep their items in insertion order, which is unusual amongst programming languages. Most other languages make no such promise: if you use a ‘hash’ in Perl or ‘map’ in Java, then items are stored in whatever order the programming language sees fit to optimise the speed of retrieval.
So even though asking for a dictionary’s keys or values will return them in insertion order, we’d recommend never assuming that key/value data structures have any kind of reliable order, as it will save you a lot of pain later.
According to the Official Docs:
It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}.
In other words, dictionaries are not lists: instead of just a checklist, we now have a key and a value. We use the key to find the value. So a generic dictionary looks like this:
Each key/value pair is linked by a ‘:’, and each pair is separated from other pairs by a ‘,’. It doesn’t really matter if you put everything on newlines (as we do here) or all on the same line. We’re just doing it this way to make it easier to read.
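The generic shape described above can be sketched as follows. The names `theDictionary` and `sameDictionary`, along with the placeholder keys and values, are illustrative assumptions rather than part of the lesson's own code.

```python
# A generic dictionary: each key is linked to its value by ':'
# and each pair is separated from the next by ','.
# All names and values here are placeholders.
theDictionary = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3",
}

# Writing everything on one line gives exactly the same dictionary
sameDictionary = {"key1": "value1", "key2": "value2", "key3": "value3"}

print(theDictionary == sameDictionary)  # True
```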
Here’s a more useful implementation of a dictionary:
myDict = {
"key1": "Value 1",
3: "3rd Value",
"key2": "2nd Value",
"Fourth Key": [4.0, 'Jon']
}
print(myDict)
{'key1': 'Value 1', 3: '3rd Value', 'key2': '2nd Value', 'Fourth Key': [4.0, 'Jon']}
Notice that almost any type of data can go into a dictionary: strings, integers, and floats. There’s even a list in this dictionary ([4.0, 'Jon'])! The only constraint is that the key must be immutable; this means that it is a simple, static identifier that can’t change. So this will result in an error:
This doesn’t work because you can’t use a list (["key1",1]) as a dictionary key, though as you saw above you can use a list as a value. For more on the subject of (im)mutability check out this Stack Overflow answer.
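A minimal sketch of that failure, assuming hypothetical dictionaries named `badDict` and `goodDict`. The `try`/`except` wrapper is only there so that the example runs to completion and prints the message instead of stopping.

```python
try:
    # A list is mutable, so Python refuses to use it as a key
    badDict = {
        ["key1", 1]: "Value 1",
    }
except TypeError as err:
    errorMessage = str(err)
    print("TypeError:", errorMessage)

# A tuple holding the same items is immutable, so it works as a key
goodDict = {("key1", 1): "Value 1"}
print(goodDict[("key1", 1)])  # Value 1
```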
Like lists, we access an element in a dictionary using a ‘location’ marked out by a pair of square brackets ([…]). The difference is that the index is no longer an integer indicating the position of the item that we want to access, but is a key in the key:value pair:
Notice how now we just jump straight to the item we want? We don’t need to think about “Was that the fourth item on the list? Or the fifth?” We just use a sensible key, and we can ask for the associated value directly.
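As a short sketch of key-based access, reusing the `myDict` defined earlier in the lesson (repeated here so the example runs on its own):

```python
myDict = {
    "key1": "Value 1",
    3: "3rd Value",
    "key2": "2nd Value",
    "Fourth Key": [4.0, 'Jon']
}

# The thing in the square brackets is a key, not a position
print(myDict["key1"])        # Value 1
print(myDict[3])             # 3rd Value (3 is a key here, not an index)
print(myDict["Fourth Key"])  # [4.0, 'Jon']
```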
How would you print out 2nd Value from myDict?
When it comes to error messages, dicts and lists behave in similar ways. If you try to access a dictionary using a key that doesn’t exist then Python raises an exception.
What is the name of the exception generated by the following piece of code? Can you find it in the Official Docs?
Handy, no? Again, Python’s error messages are giving you helpful clues about where the problem it’s encountering might be! Up above we had a TypeError when we tried to create a key using a list. Here, we have a KeyError that tells us something must be wrong with using 99 as a key in myDict. In this case, it’s that there is no key 99!
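The situation described above can be reproduced with a shortened copy of `myDict`. The `try`/`except` and the `missingKey` variable are additions for illustration, so that the example prints the problem rather than halting.

```python
myDict = {"key1": "Value 1", 3: "3rd Value"}

try:
    print(myDict[99])  # there is no key 99 in this dictionary
except KeyError as err:
    # The exception carries the key that could not be found
    missingKey = err.args[0]
    print("KeyError: there is no key", missingKey)
```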
We already found out that we can easily convert Python variables between different types. A dictionary is also a ‘type’, so we can convert lists to dictionaries just like we convert strings into integers. However, since we need to pair the elements in each list, we need a function called zip to ‘zip together’ the key-value pairs.
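A minimal sketch of that conversion, using made-up example data. The names `countries`, `capitals` and `exampleCapitals` are assumptions for illustration and are not the lists used elsewhere in the lesson.

```python
# Hypothetical example data: two lists of equal length, in matching order
countries = ["Iceland", "France", "Nigeria"]
capitals = ["Reykjavik", "Paris", "Abuja"]

# zip pairs the first items together, then the second items, and so on;
# dict then turns each pair into a key/value entry
exampleCapitals = dict(zip(countries, capitals))

print(exampleCapitals)
# {'Iceland': 'Reykjavik', 'France': 'Paris', 'Nigeria': 'Abuja'}
print(exampleCapitals["Nigeria"])  # Abuja
```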
How would you print out the capital city of Croatia from capitalDict?
One of the simplest uses of a dictionary is as a phone book! (If you’re not sure what a phone book is here’s a handy guide and here’s an example of someone using one).
So here are some useful contact numbers:
1. American Emergency Number: 911
2. British Emergency Number: 999
3. Icelandic Emergency Number: 112
4. French Emergency Number: 112
5. Russian Emergency Number: 102
Now, how would you create a dictionary that allowed us to look up and print out an emergency phone number based on the two-character ISO country code? It’s going to look a little like this:
eNumbers = {
"IS": '112', # It's not very important here whether we use single- or double-quotes
"US": '911'
}
print("The Icelandic emergency number is " + eNumbers['IS'])
print("The American emergency number is " + eNumbers['US'])
The Icelandic emergency number is 112
The American emergency number is 911
We are going to see in the next couple of lessons how to systematically access values in a dictionary (amongst other things). For now, let’s also take in the fact that dictionaries have utility methods similar to what we saw with the list. And as with the list, these methods are functions that only make sense when you’re working with a dictionary, so they’re bundled up in a way that makes them easy to use.
Let’s say that you have forgotten what keys you put in your dictionary…
programmers = {
"Charles": "Babbage",
"Ada": "Lovelace",
"Alan": "Turing"
}
print(programmers.keys())
dict_keys(['Charles', 'Ada', 'Alan'])
Or maybe you just need to access all of the values without troubling to ask for each key:
Or maybe you even need to get them as pairs:
dict_items([('Charles', 'Babbage'), ('Ada', 'Lovelace'), ('Alan', 'Turing')])
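The two cases above rely on the `values()` and `items()` methods. A short sketch, reusing the `programmers` dictionary (repeated so the example runs on its own); the conversion with `list()` at the end is an optional extra.

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# All of the values, without asking for each key in turn
print(programmers.values())
# dict_values(['Babbage', 'Lovelace', 'Turing'])

# All of the key/value pairs, each one as a tuple
print(programmers.items())
# dict_items([('Charles', 'Babbage'), ('Ada', 'Lovelace'), ('Alan', 'Turing')])

# These are 'views'; wrap them in list() if you need an ordinary list
surnames = list(programmers.values())
print(surnames[0])  # Babbage
```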
Can you print out all the values of capitalDict from the previous challenge?
As with the list data type, you can check the presence or absence of a key in a dictionary, using the in / not in operators… but note that they only work on keys.
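A small sketch of those operators, again using the `programmers` dictionary. The name 'Grace' is just an example of a key that is not present.

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

print("Ada" in programmers)        # True: 'Ada' is a key
print("Lovelace" in programmers)   # False: 'Lovelace' is a value, not a key
print("Grace" not in programmers)  # True: there is no such key

# To test for a value you have to ask for the values explicitly
print("Lovelace" in programmers.values())  # True
```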
One challenge with dictionaries is that sometimes we have no real idea if a key exists or not. With a list, it’s pretty easy to figure out whether or not an index exists because we can just ask Python to tell us the length of the list. So that makes it fairly easy to avoid having the list ‘blow up’ by throwing an exception.
It’s rather harder for a dictionary though, so that’s why we have the dedicated get() method: it not only allows us to fetch the value associated with a key, it also allows us to specify a default value in case the key does not exist:
Are you sure you spelled that right?
See how this works: the key doesn’t exist, but unlike what happened when we asked for myDict[99] we don’t get an exception, we get the default value specified as the second input to the method get.
So you’ve learned two things here: that functions can take more than one input (this one takes both the key that we’re looking for, and the value to return if Python can’t find the key); and that different types (or classes) of data have different methods (there’s no get for lists).
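The behaviour of `get` described above can be sketched like this. The misspelled key 'Lady Ada' and the variable names `fallback`, `found` and `nothing` are assumptions made for the example.

```python
programmers = {
    "Charles": "Babbage",
    "Ada": "Lovelace",
    "Alan": "Turing"
}

# The key doesn't exist, so we get the default (the second input) back
fallback = programmers.get("Lady Ada", "Are you sure you spelled that right?")
print(fallback)  # Are you sure you spelled that right?

# The key exists, so the default is ignored
found = programmers.get("Ada", "Are you sure you spelled that right?")
print(found)  # Lovelace

# With no default at all, a missing key gives None rather than an exception
nothing = programmers.get("Grace")
print(nothing)  # None
```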
OK, this is where it’s going to get a little weird but you’re also going to see how programming is a little like Lego: once you get the building blocks, you can make lots of cool/strange/useful contraptions from some pretty simple concepts.
Remember that a list or dictionary can store anything: so the first item in your list could itself be a list! For most people starting out on programming, this is the point where their brain starts hurting (it happened to us) and you might want to throw up your hands in frustration thinking “I’m never going to understand this!” But if you stick with it, you will.
And this is really the start of the power of computation.
Let’s start out with what some (annoying) people would call a ‘trivial’ example of how a list-of-lists (LoLs, though most people aren’t laughing) can be useful. Let’s think through what’s going on below: what happens if we write cityData[0]?
# Format: city, country, population, area (km^2)
cityData = [
['London','U.K.',8673713,1572],
['Paris','France',2229621,105],
['Washington, D.C.','U.S.A.',672228,177],
['Abuja','Nigeria',1235880,1769],
['Beijing','China',21700000,16411],
]
print(cityData[0])
['London', 'U.K.', 8673713, 1572]
So how would we access something inside the list returned from cityData[0]?
Why not try:
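One thing to try, sketched with a shortened copy of `cityData` (only the first two rows, so the example stays small). The intermediate variable `london` is an addition for illustration.

```python
# Format: city, country, population, area (km^2)
cityData = [
    ['London','U.K.',8673713,1572],
    ['Paris','France',2229621,105],
]

# Step by step: first pull out the inner list, then index into it
london = cityData[0]
print(london[2])       # 8673713

# Or chain the two sets of square brackets together
print(cityData[0][2])  # 8673713
print(cityData[1][0])  # Paris
```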
See if you can figure out how to retrieve and print the following from cityData:
Take the following as a starting point…
Now can you retrieve and print the following from cityData:
So that’s an LoL (list-of-lists). Let’s extend this idea to what we’ll call Phonebook+ which will be a DoL (dictionary-of-lists). In other words, a phonebook that can do more than just give us phone numbers! We’re going to build on the emergency phonebook example above.
# American Emergency Number: 911
# British Emergency Number: 999
# Icelandic Emergency Number: 112
# French Emergency Number: 112
# Russian Emergency Number: 102
eNumbers = {
'IS': ['Icelandic',112],
'US': ['American',911],
'FR': ['French',112],
'RU': ['Russian',102],
'UK': ['British',999]
}
print("The " + eNumbers['IS'][0] + " emergency number is " + str(eNumbers['IS'][1]))
print("The " + eNumbers['US'][0] + " emergency number is " + str(eNumbers['US'][1]))
print("The " + eNumbers['FR'][0] + " emergency number is " + str(eNumbers['FR'][1]))
The Icelandic emergency number is 112
The American emergency number is 911
The French emergency number is 112
See if you can create the rest of the eNumbers dictionary and then print out the Russian and British emergency numbers.
OK, this is the last thing we’re going to throw at you today – getting your head around ‘nested’ lists and dictionaries is hard. Really hard. But it’s the all-important first step to thinking about data the way that a computer ‘thinks’ about it. This is really abstract: something that you access by keys, which in turn gives you access to other keys… it’s got a name: recursion. And it’s probably one of the cleverest things about computing.
Here’s a bit of a complex DoD, combined with a DoL, and other nasties:
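The definition of `cityData2` is not reproduced in this section, so what follows is only a reconstruction that is consistent with the output printed further down. The Paris entry is taken from that output, and London's population, country and area match the density shown later. London's coordinates, however, are an assumption added purely so that the two entries have the same shape.

```python
cityData2 = {
    'Paris': {
        'population': 2229621,
        'area': 105.4,
        'location': [48.8567, 2.3508],
        'country': {'ISO2': 'FR', 'Full': 'France'},
    },
    'London': {
        'population': 8673713,
        'area': 1572,
        'location': [51.5072, -0.1275],  # ASSUMED: not shown in the lesson output
        'country': {'ISO2': 'UK', 'Full': 'United Kingdom'},
    },
}

# A dictionary (cities) of dictionaries (attributes), some of which hold
# a list (location) or yet another dictionary (country)
print(cityData2['London']['country']['Full'])  # United Kingdom
print(cityData2['Paris']['location'][1])       # 2.3508
```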
Now look at the following code:
print(cityData2['Paris'])
print(cityData2['Paris']['country']['ISO2'])
print(cityData2['Paris']['location'][0])
{'population': 2229621, 'area': 105.4, 'location': [48.8567, 2.3508], 'country': {'ISO2': 'FR', 'Full': 'France'}}
FR
48.8567
Now, figure out how to print:
The population of Paris, the capital of France (FR), is 2229621.
And now add: It has a density of 21153.899 persons per square km.
Hint: to calculate density, divide population by area.
And do the same for London.
# Note that we can tweak the formatting a bit: because the whole
# expression sits inside the parentheses of print(), Python lets
# us break it across several lines, and each '+' at the end of a
# line carries the concatenation on to the next line...
print("The population of " + 'London' + ", the capital of " +
cityData2['London']['country']['Full'] + " (" + cityData2['London']['country']['ISO2'] + "), is " +
str(cityData2['London']['population']) + ". It has a density of " +
str(cityData2['London']['population']/cityData2['London']['area']) + " persons per square km")
# But a _better_ way to do this might be one in which we don't
# hard-code 'London' into the output -- by changing the variable
# 'c' to Paris we can change the output completely...
c = 'Paris'
cd = cityData2[c]
print("The population of " + c + ", the capital of " +
cd['country']['Full'] + " (" + cd['country']['ISO2'] + "), is " +
str(cd['population']) + ". It has a density of " +
"{0:8.1f}".format(cd['population']/cd['area']) + " persons per square km")
The population of London, the capital of United Kingdom (UK), is 8673713. It has a density of 5517.629134860051 persons per square km
The population of Paris, the capital of France (FR), is 2229621. It has a density of 21153.9 persons per square km
Let’s continue our trips around the world! This time though, we’ll do things better, and instead of using a simple URL, we are going to use a real-world geographic data type that you can use on a web-map or in your favourite GIS software.
If you look down below at the Position variable you’ll see that we’re assigning it a complex and scary data structure. Don’t be afraid! If you look closely enough you will notice that it is just made out of the “building blocks” that we’ve seen so far: floats, lists, strings... all wrapped comfortably in a cosy dictionary!
This is simply a formalised way to represent a geographic marker (a pin on the map!) in a format called GeoJSON. According to Lyzi Diamond:
GeoJSON is an open and popular geographic data format commonly used in web applications. It is an extension of a format called JSON, which stands for JavaScript Object Notation. Basically, JSON is a table turned on its side. GeoJSON extends JSON by adding a section called “geometry” such that you can define coordinates for the particular object (point, line, polygon, multi-polygon, etc). A point in a GeoJSON file might look like this:
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-122.65335738658904,
45.512083676585156
]
},
"properties": {
"name": "Hungry Heart Cupcakes",
"address": "1212 SE Hawthorne Boulevard",
"website": "http://www.hungryheartcupcakes.com",
"gluten free": "no"
}
}
GeoJSON files have to have both a "geometry" section and a "properties" section. The "geometry" section houses the geographic information of the feature (its location and type) and the "properties" section houses all of the descriptive information about the feature (like fields in an attribute table). Source
So to create a “web map”, we have to create a GeoJSON structure. In the code below there are two variables containing longitude and latitude coordinate positions for the UCL Quadrangle. So let’s see how this works…
# The two lines below import specified functions
# from a coding library written by someone else.
# When we install and import these functions we
# gain coding 'superpowers' without having to write
# any new code ourselves
import json
from ipyleaflet import Map, GeoJSON, basemaps
# The coordinates
# What format are they in? Does it seem appropriate?
# How would you convert them back to numbers?
longitude = -0.133956
latitude = 51.524542
# Set up the location attribute
Location = {
"type": "Point",
"coordinates": [longitude, latitude]
}
# And set up the rest of the attributes for the web map
Position = {
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"marker-color": "#7e7e7e",
"marker-size": "medium",
"marker-symbol": "building",
"name": "UCL Quadrangle"
},
"geometry": Location
}
]
}
# OUTPUT
# -----------------------------------------------------------
# I'm just using the "imported" module to print the output
# in a nice and formatted way
print(json.dumps(Position, indent=4))
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"marker-color": "#7e7e7e",
"marker-size": "medium",
"marker-symbol": "building",
"name": "UCL Quadrangle"
},
"geometry": {
"type": "Point",
"coordinates": [
-0.133956,
51.524542
]
}
}
]
}
# We can also show this in the web page directly
# (it won't show up in the PDF version though)
m = Map(center = (51.51, -0.10), zoom=12, min_zoom=5, max_zoom=20,
basemap=basemaps.OpenTopoMap)
geo = GeoJSON(data=Position)
m.add_layer(geo)
m
And here we request a remote GeoJSON file (from url), convert it to a dictionary, and place it in a map as a new layer.
import json
import random
import requests
from ipyleaflet import Map, GeoJSON
url = 'https://github.com/jupyter-widgets/ipyleaflet/raw/master/examples/europe_110.geo.json'
r = requests.get(url)
d = r.content.decode("utf-8")
j = json.loads(d)
def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
    }
m = Map(center=(50.6252978589571, 0.34580993652344), zoom=3)
geo_json = GeoJSON(
data=j,
style={
'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.1, 'weight': 1
},
hover_style={
'color': 'white', 'dashArray': '0', 'fillOpacity': 0.5
},
style_callback=random_color
)
m.add_layer(geo_json)
m
As proof that behind all of this is just a dictionary:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"scalerank": 1,
"featurecla": "Admin-0 country",
"labelrank": 6,
"sovereignt": "Albania",
"sov_a3": "ALB",
"adm0_dif": 0,
"level": 2,
"type": "Sovereign country",
"admin": "Albania",
"adm0_a3": "ALB",
"geou_dif": 0,
"geounit": "Albania",
"gu_a3": "ALB",
"su_dif": 0,
"subunit": "Albania",
"su_a3": "ALB",
"brk_diff": 0,
"name": "Albania",
"name_long": "Albania",
"brk_a3": "ALB",
"brk_name": "Albania",
"brk_group": null,
"abbrev": "Alb.",
"postal": "AL",
"formal_en": "Republic of Albania",
"formal_fr": null,
"note_adm0": null,
"note_brk": null,
"name_sort": "Albania",
"name_alt": null,
"mapcolor7": 1,
"mapcolor8": 4,
"mapcolor9": 1,
"mapcolor13": 6,
"pop_est": 3639453,
"gdp_md_est": 21810,
"pop_year": -99,
"lastcensus": 2001,
"gdp_year": -99,
...
After you’ve run the code, Python will have saved a file called my-first-marker.geojson in the folder where you are running the lesson. Try to upload it to this website (Geojson.io) and check it shows a marker somewhere in central London…
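The code that writes this file is not reproduced in this section. As a hedged sketch of one way it could be done (not necessarily how the lesson does it), the standard `json` module can write a dictionary straight to disk. The `Position` dictionary below is a trimmed copy of the one built earlier, and `reloaded` is a name introduced only for the check at the end.

```python
import json

# A trimmed copy of the GeoJSON-style dictionary built earlier
Position = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "UCL Quadrangle"},
            "geometry": {
                "type": "Point",
                "coordinates": [-0.133956, 51.524542]
            }
        }
    ]
}

# Write the dictionary into the current working folder as GeoJSON text
with open("my-first-marker.geojson", "w") as f:
    json.dump(Position, f, indent=4)

# Read it back to confirm that the file holds the same dictionary
with open("my-first-marker.geojson") as f:
    reloaded = json.load(f)

print(reloaded == Position)  # True
```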
General list of resources:
- Awesome list of resources
- Python Docs
- HitchHiker’s guide to Python
- Learn Python the Hard Way: Lists
- Learn Python the Hard Way: Dictionaries
The following individuals have contributed to these teaching materials:
- James Millington
- Jon Reades
- Michele Ferretti
- Zahratu Shabrina
The content and structure of this teaching project itself is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, and the contributing source code is licensed under The MIT License.
Supported by the Royal Geographical Society (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.
This lesson may depend on the following libraries: None