Data Structures

Jon Reades

It’s a very deep rabbit hole…

cities = {
  'London': [[51.5072, -0.1275], +0],
  'New York': [[40.7127, -74.0059], -5],
  'Tokyo': [[35.6833, 139.6833], +9]
}

So:

print(cities['London'][0]) # Prints [51.5072, -0.1275]
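The same indexing can be chained to reach further into the nested lists. A minimal sketch, using the same dictionary with western longitudes written as negative numbers and Tokyo at UTC+9; the names loc, lat, lon and tz are illustrative only:

```python
cities = {
    'London': [[51.5072, -0.1275], +0],
    'New York': [[40.7127, -74.0059], -5],
    'Tokyo': [[35.6833, 139.6833], +9],
}

# Each value is a two-item list: [location, time zone]
loc, tz = cities['Tokyo']
lat, lon = loc  # the location is itself a two-item list: [lat, lon]

print(lat)                     # 35.6833
print(cities['London'][0][1])  # -0.1275 (second item of the location)
print(cities['New York'][1])   # -5 (second item of the value)
```

Notice that nothing in the structure says what position 0 or position 1 means: you have to remember it.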

But Compare…

Consider how these two data structures differ:

cities = [
  {'name': 'London', 'loc': [51.5072, -0.1275], 'tz': +0},
  {'name': 'New York', 'loc': [40.7127, -74.0059], 'tz': -5},
  {'name': 'Tokyo', 'loc': [35.6833, 139.6833], 'tz': +9}
]

Or:

cities = {
  'London': {'loc': [51.5072, -0.1275], 'tz': +0},
  'New York': {'loc': [40.7127, -74.0059], 'tz': -5},
  'Tokyo': {'loc': [35.6833, 139.6833], 'tz': +9}
}
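One way to feel the difference is to ask both structures the same question. A rough sketch, with the two versions renamed to cities_list and cities_dict (hypothetical names, so that both can exist at once):

```python
cities_list = [
    {'name': 'London', 'loc': [51.5072, -0.1275], 'tz': +0},
    {'name': 'New York', 'loc': [40.7127, -74.0059], 'tz': -5},
    {'name': 'Tokyo', 'loc': [35.6833, 139.6833], 'tz': +9},
]

cities_dict = {
    'London': {'loc': [51.5072, -0.1275], 'tz': +0},
    'New York': {'loc': [40.7127, -74.0059], 'tz': -5},
    'Tokyo': {'loc': [35.6833, 139.6833], 'tz': +9},
}

# List of dictionaries: we have to search for the matching record
tz_from_list = None
for city in cities_list:
    if city['name'] == 'Tokyo':
        tz_from_list = city['tz']
        break

# Dictionary of dictionaries: the name is the key, so we go straight there
tz_from_dict = cities_dict['Tokyo']['tz']

print(tz_from_list, tz_from_dict)  # 9 9
```

The list keeps the cities in a fixed order and would allow two cities with the same name; the dictionary gives direct lookup by name but requires every name to be unique.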

Implications

So we can mix and match dictionaries and lists in whatever way we need to store… ‘data’. The question is then: what’s the right way to store our data?

One more thing…

But Compare…

How do these data structures differ?

Option 1

ds1 = [
  ['lat','lon','name','tz'],
  [51.51,-0.13,'London',+0],
  [40.71,-74.01,'New York',-5],
  [35.69,139.68,'Tokyo',+9]
]

Option 2

ds2 = {
  'lat': [51.51,40.71,35.69],
  'lon': [-0.13,-74.01,139.68],
  'tz':  [+0,-5,+9],
  'name':['London','New York','Tokyo']
}
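The two options hold the same information, just turned on its side: Option 1 is a list of rows, Option 2 is a dictionary of columns. As a sketch of how one could be converted into the other (the names header, rows and columns are illustrative):

```python
ds1 = [
    ['lat', 'lon', 'name', 'tz'],
    [51.51, -0.13, 'London', +0],
    [40.71, -74.01, 'New York', -5],
    [35.69, 139.68, 'Tokyo', +9],
]

header, *rows = ds1   # the first row holds the column names
columns = zip(*rows)  # transpose: rows of values become columns of values

# Pair each column name with its list of values
ds2 = {name: list(values) for name, values in zip(header, columns)}

print(ds2['name'])  # ['London', 'New York', 'Tokyo']
print(ds2['tz'])    # [0, -5, 9]
```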

Thinking it Through

Why does this work for both computers and people?

ds2 = {
  'lat': [51.51,40.71,35.69],
  'lon': [-0.13,-74.01,139.68],
  'tz':  [+0,-5,+9],
  'name':['London','New York','Tokyo']
}
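One possible part of the answer: each column is a single list holding one kind of value, under a name a person can read, so a question about a whole column becomes a one-liner. A small sketch using only built-in functions:

```python
ds2 = {
    'lat': [51.51, 40.71, 35.69],
    'lon': [-0.13, -74.01, 139.68],
    'tz': [+0, -5, +9],
    'name': ['London', 'New York', 'Tokyo'],
}

print(max(ds2['lat']))      # 51.51 (the northernmost latitude)
print(min(ds2['tz']))       # -5 (the furthest behind UTC)
print(sorted(ds2['name']))  # ['London', 'New York', 'Tokyo']
```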

Examples

ds2 = {
  'lat': [51.51,40.71,35.69],
  'lon': [-0.13,-74.01,139.68],
  'tz':  [+0,-5,+9],
  'name':['London','New York','Tokyo']
}

print(ds2['name'][0]) # London
print(ds2['lat'][0])  # 51.51
print(ds2['tz'][0])   # 0

So index 0 always returns information about London, and index 2 always returns information about Tokyo, whichever list you look in. Once you know that London is at index 0, it is just as easy to ask for its latitude (ds2['lat'][0]) as for its time zone (ds2['tz'][0])!
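Because one index means the same city in every list, a complete record can be rebuilt by taking that position from each column. A minimal sketch; get_row is a hypothetical helper, not something built in:

```python
ds2 = {
    'lat': [51.51, 40.71, 35.69],
    'lon': [-0.13, -74.01, 139.68],
    'tz': [+0, -5, +9],
    'name': ['London', 'New York', 'Tokyo'],
}

def get_row(ds, idx):
    """Collect the idx-th value from every column into one record."""
    return {column: values[idx] for column, values in ds.items()}

print(get_row(ds2, 0))
# {'lat': 51.51, 'lon': -0.13, 'tz': 0, 'name': 'London'}
```

This only works if every list has the same length and is kept in the same order.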

How is that easier???

Remember that we can use any immutable ‘thing’ as a key. This means…

ds2 = {
  'lat': [51.51,40.71,35.69],
  'lon': [-0.13,-74.01,139.68],
  'tz':  [+0,-5,+9],
  'name':['London','New York','Tokyo']
}

city_nm = 'Tokyo'
city_idx = ds2['name'].index(city_nm)

print(f"The time zone of {city_nm} is {ds2['tz'][city_idx]}")

We can rewrite this as a single line:

city_nm = 'New York'
print(f"The time zone of {city_nm} is {ds2['tz'][ds2['name'].index(city_nm)]}")
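The lookup pattern can also be wrapped in a small function. This is only a sketch: lookup is a hypothetical helper, and returning None for an unknown city is an assumption made here, since list.index() raises a ValueError when the value is not in the list.

```python
ds2 = {
    'lat': [51.51, 40.71, 35.69],
    'lon': [-0.13, -74.01, 139.68],
    'tz': [+0, -5, +9],
    'name': ['London', 'New York', 'Tokyo'],
}

def lookup(ds, city_nm, column):
    """Return the value in `column` for the city called `city_nm`."""
    try:
        # Find the city's position in the 'name' list
        idx = ds['name'].index(city_nm)
    except ValueError:
        # The city is not in the data (assumed behaviour: return None)
        return None
    # Use that same position in the requested column
    return ds[column][idx]

print(lookup(ds2, 'New York', 'tz'))  # -5
print(lookup(ds2, 'Tokyo', 'lat'))    # 35.69
print(lookup(ds2, 'Paris', 'tz'))     # None
```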

This is critical!

Once you get your head around this, then 🤯🤯🤯 because pandas and everything we do next will make a lot more sense.

Resources