| SensorID | Place | Owner | |
|---|---|---|---|
| 0 | 1 | LHR | BAA |
| 1 | 2 | LGA | GIP |
| 2 | 3 | STA | MAG |
| 3 | 4 | LUT | Luton LA |
| 4 | 5 | SEN | Stobart |
Jon Reades - j.reades@ucl.ac.uk
1st October 2025
A join refers to the merging of two (or more) data tables using one (or more) matching columns:
| SensorID | Place | Owner | Parameter | Value | |
|---|---|---|---|---|---|
| 0 | 1 | LHR | BAA | Temperature | 5ºC |
| 1 | 1 | LHR | BAA | Humidity | 15% |
| 2 | 3 | STA | MAG | Temperature | 7ºC |
| 3 | 4 | LUT | Luton LA | Temperature | 7ºC |
This can also be written:
| SensorID | Place | Owner |
|---|---|---|
| 1 ⇒ | LHR | BAA |
| 2 ✘ | LGA | GIP |
| 3 ⇒ | STA | MAG |
| 4 ⇒ | LUT | Luton LA |
| 5 ✘ | SEN | Stobart |
| SensorID | Parameter | Value |
|---|---|---|
| 1 ⇐ | Temperature | 5ºC |
| 1 ⇐ | Humidity | 15% |
| 3 ⇐ | Temperature | 7ºC |
| 4 ⇐ | Temperature | 7ºC |
| 6 ✘ | Humidity | 18% |
If Data Set 2 had a SensorKey instead of a SensorID then:
On an Outer Join all rows are retained, including ones with no match:
On a Left Join all rows on the left table are retained, including ones with no match, but unmatched right rows are dropped:
Pandas has two additional join-like functions:
| SensorID | Place | Owner | |
|---|---|---|---|
| 0 | 1 | LHR | BAA |
| 1 | 2 | LGA | GIP |
| 2 | 3 | STA | MAG |
| 3 | 4 | LUT | Luton LA |
| 4 | 5 | SEN | Stobart |
| 5 | 2 | STA | BAA |
| 6 | 3 | LUT | Luton LA |
| 7 | 8 | BHX | ??? |
| 8 | 9 | MAN | ??? |
| 9 | 10 | INV | ??? |
to_append = [
{'SensorID': 0, 'Parameter': 'Humidity', 'Value': 0.45},
{'SensorID': 1, 'Parameter': 'Humidity', 'Value': 0.31},
{'SensorID': 2, 'Parameter': 'Temperature', 'Value': 2},
{'SensorID': 3, 'Parameter': 'Temperature', 'Value': 3}]
pd.concat([df2, pd.DataFrame(to_append)], ignore_index=True)| SensorID | Parameter | Value | |
|---|---|---|---|
| 0 | 1 | Temperature | 5ºC |
| 1 | 1 | Humidity | 15% |
| 2 | 3 | Temperature | 7ºC |
| 3 | 4 | Temperature | 7ºC |
| 4 | 6 | Humidity | 18% |
| 5 | 0 | Humidity | 0.45 |
| 6 | 1 | Humidity | 0.31 |
| 7 | 2 | Temperature | 2.0 |
| 8 | 3 | Temperature | 3.0 |
As usual, Stack Overflow to the rescue:
A very high level difference is that
merge()is used to combine two (or more) dataframes on the basis of values of common columns (indices can also be used, useleft_index=Trueand/orright_index=True), andconcat()is used to append one (or more) dataframes one below the other (or sideways, depending on whether the axis option is set to 0 or 1).
join()is used to merge 2 dataframes on the basis of the index; instead of usingmerge()with the optionleft_index=Truewe can usejoin().
Hint: axis=0 refers to the row index & axis=1 to the column index.
These achieve the same thing, but they are not always equivalent or what you want:
| SensorID | Place | Owner | SensorID_r | Parameter | Value | |
|---|---|---|---|---|---|---|
| 0 | 1 | LHR | BAA | 1 | Temperature | 5ºC |
| 1 | 2 | LGA | GIP | 1 | Humidity | 15% |
| 2 | 3 | STA | MAG | 3 | Temperature | 7ºC |
| 3 | 4 | LUT | Luton LA | 4 | Temperature | 7ºC |
| 4 | 5 | SEN | Stobart | 6 | Humidity | 18% |
Generally: