Foundations of Spatial Data Science

Useful, But Limited?

Method	Achieves
`count()`	Number of non-null items
`first()`, `last()`	First and last item
`mean()`, `median()`	Mean and median
`min()`, `max()`	Minimum and maximum
`std()`, `var()`	Standard deviation and variance
`mad()`	Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0)
`prod()`	Product of all items
`sum()`	Sum of all items

In Pandas these follow a split / apply / combine approach:

grouped_df = df.groupby(<fields>).<function>

For instance, if we had a Local Authority (LA) field:

grouped_df = df.groupby('LA').sum()
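A minimal sketch of the split / apply / combine pattern on toy data. The LA codes, the `d1`/`d2` columns and their values are invented for illustration and are not from a real dataset.

```python
import pandas as pd

# Hypothetical toy data: codes, column names and values are illustrative only
df = pd.DataFrame({
    'LA': ['HAK', 'HAK', 'TH', 'W'],
    'd1': [1.0, 3.0, 5.0, 7.0],
    'd2': [2.0, 2.0, 5.0, 10.0],
})

# Split on LA, apply sum() to each remaining column, combine into one row per LA
grouped_df = df.groupby('LA').sum()

# agg() applies several of the methods from the table in a single pass
summary = df.groupby('LA')['d1'].agg(['count', 'mean', 'max'])

print(grouped_df)
print(summary)
```

The result is indexed by the grouping field, so `grouped_df.loc['HAK']` returns the totals for that authority.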

Using `apply`, the function could be anything:

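def norm_by_data(x): # x is the sub-DataFrame for one group (here, one LA)
    x = x.copy() # work on a copy rather than mutating the group in place
    x['d1'] = x['d1'] / x['d2'].sum() # divide d1 by the group's d2 total
    return x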

df.groupby('LA').apply(norm_by_data)
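For a simple per-group normalisation like this one, `transform` is a possible alternative to `apply`, since it returns a result aligned to the original rows. The sketch below uses invented toy data and is one way to get the same `d1` values, not the only one.

```python
import pandas as pd

# Hypothetical toy data: codes, column names and values are illustrative only
df = pd.DataFrame({
    'LA': ['HAK', 'HAK', 'TH', 'W'],
    'd1': [1.0, 3.0, 5.0, 7.0],
    'd2': [2.0, 2.0, 5.0, 10.0],
})

# transform('sum') returns a Series on df's original index,
# holding the d2 total of the group each row belongs to
group_total = df.groupby('LA')['d2'].transform('sum')

# Divide d1 by its group's d2 total without modifying df itself
normed = df.assign(d1=df['d1'] / group_total)

print(normed)
```

Because the result keeps the original index, it can be assigned straight back to a column, whereas `apply` may return a differently shaped index depending on the function.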

# Group by a dict that maps index labels to group names
mapping = {'HAK':'Inner', 'TH':'Outer', 'W':'Inner'}
df = df.set_index('LA') # the dict is matched against the index
df.groupby(mapping).sum()
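A small sketch of grouping by a mapping, again on invented data. The extra `CAM` code is hypothetical and is included only to show what happens to index labels that the mapping does not cover.

```python
import pandas as pd

# Hypothetical toy data: 'CAM' deliberately has no entry in the mapping
df = pd.DataFrame({
    'LA': ['HAK', 'TH', 'W', 'CAM'],
    'd1': [1, 2, 3, 4],
})
mapping = {'HAK': 'Inner', 'TH': 'Outer', 'W': 'Inner'}

# Each index label is looked up in the dict and the result is used as the group key
by_zone = df.set_index('LA').groupby(mapping).sum()

print(by_zone)
```

With the default settings, rows whose label is missing from the mapping get no group key and are left out of the result, so it is worth checking that the mapping is complete.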

Pivot tables are a ‘special case’ of Group By, with these features:

Commonly-used in business to summarise data for reporting.
Grouping (summarisation) happens along both axes (Group By operates only on one).
`pandas.cut(<series>, <bins>)` can be useful here since it chops a continuous variable into bins suitable for grouping.

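# titanic is assumed to be a DataFrame with 'age', 'sex', 'class' and 'survived' columns
age = pd.cut(titanic['age'], [0, 18, 80]) # two bins: (0, 18] and (18, 80]
titanic.pivot_table(values='survived', index=['sex', age], columns='class') # mean of 'survived' per cell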
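The same pattern can be tried without the Titanic data. The sketch below uses a small invented `people` table as a stand-in; its rows are made up and only mirror the column names used above.

```python
import pandas as pd

# Hypothetical stand-in for the titanic data: all values are invented
people = pd.DataFrame({
    'sex': ['female', 'female', 'male', 'male', 'male', 'female'],
    'age': [10, 30, 12, 40, 50, 35],
    'class': ['First', 'Third', 'First', 'Third', 'Third', 'First'],
    'survived': [1, 0, 1, 0, 1, 1],
})

# Bins are right-closed by default, giving (0, 18] and (18, 80]
age = pd.cut(people['age'], [0, 18, 80])

# Rows are grouped by sex and age band, columns by class;
# each cell holds the mean of 'survived' for that combination
table = people.pivot_table(
    values='survived',
    index=['sex', age],
    columns='class',
    aggfunc='mean',
    observed=False,
)

print(table)
```

Combinations with no rows appear as `NaN`, which is one way to spot sparse cells before reading too much into a rate.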