SensorID | Place | Owner | |
---|---|---|---|
0 | 1 | LHR | BAA |
1 | 2 | LGA | GIP |
2 | 3 | STA | MAG |
3 | 4 | LUT | Luton LA |
4 | 5 | SEN | Stobart |
Jon Reades - j.reades@ucl.ac.uk
1st October 2025
A join refers to the merging of two (or more) data tables using one (or more) matching columns:
SensorID | Place | Owner | Parameter | Value | |
---|---|---|---|---|---|
0 | 1 | LHR | BAA | Temperature | 5ºC |
1 | 1 | LHR | BAA | Humidity | 15% |
2 | 3 | STA | MAG | Temperature | 7ºC |
3 | 4 | LUT | Luton LA | Temperature | 7ºC |
This can also be written:
SensorID | Place | Owner |
---|---|---|
1 ⇒ | LHR | BAA |
2 ✘ | LGA | GIP |
3 ⇒ | STA | MAG |
4 ⇒ | LUT | Luton LA |
5 ✘ | SEN | Stobart |
SensorID | Parameter | Value |
---|---|---|
1 ⇐ | Temperature | 5ºC |
1 ⇐ | Humidity | 15% |
3 ⇐ | Temperature | 7ºC |
4 ⇐ | Temperature | 7ºC |
6 ✘ | Humidity | 18% |
If Data Set 2 had a SensorKey
instead of a SensorID
then:
On an Outer Join all rows are retained, including ones with no match:
On a Left Join all rows on the left table are retained, including ones with no match, but unmatched right rows are dropped:
Pandas has two additional join-like functions:
SensorID | Place | Owner | |
---|---|---|---|
0 | 1 | LHR | BAA |
1 | 2 | LGA | GIP |
2 | 3 | STA | MAG |
3 | 4 | LUT | Luton LA |
4 | 5 | SEN | Stobart |
5 | 2 | STA | BAA |
6 | 3 | LUT | Luton LA |
7 | 8 | BHX | ??? |
8 | 9 | MAN | ??? |
9 | 10 | INV | ??? |
to_append = [
{'SensorID': 0, 'Parameter': 'Humidity', 'Value': 0.45},
{'SensorID': 1, 'Parameter': 'Humidity', 'Value': 0.31},
{'SensorID': 2, 'Parameter': 'Temperature', 'Value': 2},
{'SensorID': 3, 'Parameter': 'Temperature', 'Value': 3}]
pd.concat([df2, pd.DataFrame(to_append)], ignore_index=True)
SensorID | Parameter | Value | |
---|---|---|---|
0 | 1 | Temperature | 5ºC |
1 | 1 | Humidity | 15% |
2 | 3 | Temperature | 7ºC |
3 | 4 | Temperature | 7ºC |
4 | 6 | Humidity | 18% |
5 | 0 | Humidity | 0.45 |
6 | 1 | Humidity | 0.31 |
7 | 2 | Temperature | 2.0 |
8 | 3 | Temperature | 3.0 |
As usual, Stack Overflow to the rescue:
A very high level difference is that
merge()
is used to combine two (or more) dataframes on the basis of values of common columns (indices can also be used, useleft_index=True
and/orright_index=True
), andconcat()
is used to append one (or more) dataframes one below the other (or sideways, depending on whether the axis option is set to 0 or 1).
join()
is used to merge 2 dataframes on the basis of the index; instead of usingmerge()
with the optionleft_index=True
we can usejoin()
.
Hint: axis=0
refers to the row index & axis=1
to the column index.
These achieve the same thing, but they are not always equivalent or what you want:
SensorID | Place | Owner | SensorID_r | Parameter | Value | |
---|---|---|---|---|---|---|
0 | 1 | LHR | BAA | 1 | Temperature | 5ºC |
1 | 2 | LGA | GIP | 1 | Humidity | 15% |
2 | 3 | STA | MAG | 3 | Temperature | 7ºC |
3 | 4 | LUT | Luton LA | 4 | Temperature | 7ºC |
4 | 5 | SEN | Stobart | 6 | Humidity | 18% |
Generally: