Jon Reades - j.reades@ucl.ac.uk
1st October 2025
Pros:
Cons:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
We can compare load performance by profiling each approach with `timeit`:
pd_execution_time = timeit.timeit(load_pandas, number=num_reps)
print(f"Pandas execution time: {pd_execution_time:.6f} seconds")
Pandas execution time: 0.784750 seconds
pl_execution_time = timeit.timeit(load_polars, number=num_reps)
print(f"Polars execution time: {pl_execution_time:.6f} seconds")
Polars execution time: 0.161258 seconds
Polars took 79.45% less time than Pandas.
db_execution_time = timeit.timeit(load_duck, number=num_reps)
print(f"DuckDB execution time: {db_execution_time:.6f} seconds")
DuckDB execution time: 0.100967 seconds
DuckDB took 87.13% less time than Pandas and 37.39% less time than Polars.
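The percentages above can be reproduced from the three reported timings. The sketch below assumes that each figure is the reduction in run time relative to the slower library. The helper `pct_less_time` and the two stand-in loaders are hypothetical names used only for illustration; they are not the `load_pandas`, `load_polars` or `load_duck` functions timed above.

```python
import timeit


def pct_less_time(baseline: float, candidate: float) -> float:
    """Percentage reduction in run time of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100


# Timings reported above, in seconds
pd_time, pl_time, db_time = 0.784750, 0.161258, 0.100967

print(f"Polars vs Pandas: {pct_less_time(pd_time, pl_time):.2f}% less time")
print(f"DuckDB vs Pandas: {pct_less_time(pd_time, db_time):.2f}% less time")
print(f"DuckDB vs Polars: {pct_less_time(pl_time, db_time):.2f}% less time")


# timeit.timeit accepts any zero-argument callable, so the same pattern
# works with stand-in workloads (hypothetical, not the real loaders).
def slow_loader() -> int:
    return sum(i * i for i in range(20_000))


def fast_loader() -> int:
    return sum(range(20_000))


num_reps = 10  # assumed repetition count for this sketch
slow = timeit.timeit(slow_loader, number=num_reps)
fast = timeit.timeit(fast_loader, number=num_reps)
print(f"Stand-in comparison: {pct_less_time(slow, fast):.2f}% less time")
```

Timings from a single machine and a small number of repetitions are noisy, so differences of a few percent should not be over-interpreted.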
Method | Achieves |
---|---|
`count()` | Total number of items |
`first()`, `last()` | First and last item |
`mean()`, `median()` | Mean and median |
`min()`, `max()` | Minimum and maximum |
`std()`, `var()` | Standard deviation and variance |
`mad()` | Mean absolute deviation (removed in pandas 2.0) |
`prod()` | Product of all items |
`sum()` | Sum of all items |
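As a minimal sketch of these methods, the block below applies a few of them to the five `fare` values shown in the Titanic sample earlier. The variable names `fares` and `summary` are assumptions for illustration.

```python
import pandas as pd

# Fares copied from the five sample rows shown earlier
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name="fare")

# Each aggregate can be called directly on the Series...
print(fares.count())   # number of non-missing items
print(fares.median())  # middle value once sorted

# ...or several can be requested at once with agg()
summary = fares.agg(["count", "min", "median", "max", "sum"])
print(summary)
```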
In Pandas, grouped aggregations follow a split / apply / combine approach:
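To make the three steps concrete, here is a sketch that performs them by hand and then with `groupby`, which does all three in one call. The toy data frame and its column names (`key`, `value`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Split: one sub-frame per distinct key
parts = {key: group for key, group in df.groupby("key")}

# Apply: run the same aggregate on each sub-frame
sums = {key: group["value"].sum() for key, group in parts.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(sums, name="value")

# The usual shortcut performs all three steps at once
shortcut = df.groupby("key")["value"].sum()

print(manual)
print(shortcut)
```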
For instance, if we had a Local Authority (`LA`) field:
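A sketch of what grouping on such a field might look like. Only the `LA` column name comes from the text; the `price` column, the authority names and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: only the 'LA' column name comes from the text
df = pd.DataFrame({
    "LA": ["Camden", "Camden", "Hackney", "Hackney", "Islington"],
    "price": [100, 140, 90, 110, 120],
})

# One row per Local Authority, one column per aggregate
summary = df.groupby("LA")["price"].agg(["count", "mean", "max"])
print(summary)
```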
Using `apply`, the function could be anything:
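As an illustration, the sketch below passes a user-defined function to `apply` on a grouped column. The function `value_range`, the `price` column and the data are hypothetical; any function that accepts a group and returns a value would work the same way.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "LA": ["Camden", "Camden", "Hackney", "Hackney", "Islington"],
    "price": [100, 140, 90, 110, 120],
})


def value_range(values: pd.Series) -> float:
    """Spread between the largest and smallest value in a group."""
    return values.max() - values.min()


# apply() calls value_range once per group and combines the results
ranges = df.groupby("LA")["price"].apply(value_range)
print(ranges)
```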
A ‘special case’ of Group By features:
`pandas.cut(<series>, <bins>)` can be a useful function here, since it chops a continuous feature into bins suitable for grouping.
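A short sketch of binning followed by grouping, using the `age` and `fare` values from the five Titanic sample rows shown earlier. The bin edges, the labels and the `age_band` column name are assumptions chosen for illustration.

```python
import pandas as pd

# age and fare copied from the five sample rows shown earlier
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# right=False makes each bin include its lower edge: [20, 30) and [30, 40)
df["age_band"] = pd.cut(
    df["age"], bins=[20, 30, 40], right=False, labels=["20-29", "30-39"]
)

# The binned column is categorical, so it can be used as a grouping key;
# observed=True keeps only the bands that actually occur in the data
mean_fare = df.groupby("age_band", observed=True)["fare"].mean()
print(mean_fare)
```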