Connect to Google Drive to import files
from google.colab import drive
drive.mount('/content/drive')
Install modules
!apt install proj-bin libproj-dev libgeos-dev
!pip install https://github.com/matplotlib/basemap/archive/master.zip
!pip install netCDF4
Import libraries and modules
from netCDF4 import Dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import matplotlib
from mpl_toolkits.basemap import Basemap
from google.colab import files
The Emissions Database for Global Atmospheric Research (EDGAR) provides global past and present-day anthropogenic emissions of greenhouse gases and air pollutants, by country and on a spatial grid.
EDGARv5.0 provides emissions of CO2 per sector and country.
Source: https://edgar.jrc.ec.europa.eu/overview.php?v=50_GHG
CO2 emissions are provided separately:
- CO2_excl_short-cycle_org_C includes all fossil CO2 sources, such as fossil fuel combustion, non-metallic mineral processes (e.g. cement production), metal (ferrous and non-ferrous) production processes, urea production, agricultural liming and solvent use. Large-scale biomass burning (savannah burning, forest fires) and sources and sinks from land-use, land-use change and forestry (LULUCF) are excluded.
- CO2_short-cycle_org_C represents short-cycle biomass burning (such as agricultural waste burning and savannah burning)
For the energy-related sectors, the activity data are mainly based on the energy balance statistics of IEA (2017) (http://www.oecd-ilibrary.org/energy/co2-emissions-from-fuel-combustion-2017_co2_fuel-2017-en), whereas the activity data for the agricultural sectors originate mainly from FAO (2018) (http://www.fao.org/faostat/en/#home). Additional information can be found in Crippa et al. (2019).
The methodology for spatial allocation of emissions on 0.1° x 0.1° grid cells is the following: a geographical database was built from spatial proxy datasets with the location of energy and manufacturing facilities, road networks, shipping routes, human and animal population density and agricultural land use, all of which vary over time. The input datasets were point, line and area data at various resolutions; using GIS techniques for conversion, resampling and aggregation, they were brought onto a 0.1° x 0.1° grid. National sector totals are then distributed over each country's area according to the percentages given by the spatial proxies.
Notes: Emission gridmaps are expressed in kg substance /m2 /s.
CO2 emissions excluding short-cycle biomass burning (such as agricultural waste burning and Savannah burning) but including other biomass burning (such as forest fires, post-burn decay, peat fires and decay of drained peatlands).
NB: Short-cycle biomass burning emissions are, by convention, excluded from inventories because of the assumed carbon rotation speed of these emissions. However, this is a convention, not a physical reality.
Emissions are expressed in kg substance /m2 /s.
Annual data from 2013 to 2018.
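Since the gridmaps are fluxes, a small helper (a sketch, not part of the original pipeline; it assumes a spherical Earth of radius 6 371 km) can convert a cell value into tonnes per year, which is often easier to reason about:
# Hypothetical helper: convert an EDGAR flux (kg substance / m2 / s) into
# tonnes per year for one 0.1 deg x 0.1 deg grid cell at a given latitude
def flux_to_tonnes_per_year(flux_kg_m2_s, lat_deg, earth_radius_m=6.371e6):
    cell_rad = np.deg2rad(0.1)
    # Cell area on a sphere shrinks with the cosine of latitude
    area_m2 = (earth_radius_m ** 2) * cell_rad ** 2 * np.cos(np.deg2rad(lat_deg))
    seconds_per_year = 365.25 * 24 * 3600
    return flux_kg_m2_s * area_m2 * seconds_per_year / 1000.0  # kg -> tonnes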
path_Edgar_2015_v1 = "/content/drive/My Drive/Data For Good/Inputs/EDGAR data/v50_CO2_excl_short-cycle_org_C_2015.0.1x0.1.nc"
path_Edgar_2016_v1 = "/content/drive/My Drive/Data For Good/Inputs/EDGAR data/v50_CO2_excl_short-cycle_org_C_2016.0.1x0.1.nc"
path_Edgar_2017_v1 = "/content/drive/My Drive/Data For Good/Inputs/EDGAR data/v50_CO2_excl_short-cycle_org_C_2017.0.1x0.1.nc"
path_Edgar_2018_v1 = "/content/drive/My Drive/Data For Good/Inputs/EDGAR data/v50_CO2_excl_short-cycle_org_C_2018.0.1x0.1.nc"
dataset_Edgar_2015_v1 = Dataset(path_Edgar_2015_v1, 'r')
dataset_Edgar_2016_v1 = Dataset(path_Edgar_2016_v1, 'r')
dataset_Edgar_2017_v1 = Dataset(path_Edgar_2017_v1, 'r')
dataset_Edgar_2018_v1 = Dataset(path_Edgar_2018_v1, 'r')
dataset_Edgar_2015_v1.variables.keys()
dataset_Edgar_2015_v1.variables['lat'].shape
dataset_Edgar_2015_v1.variables['lon'].shape
dataset_Edgar_2015_v1.variables['emi_co2'].shape
dataset_list_v1 = [dataset_Edgar_2015_v1, dataset_Edgar_2016_v1, dataset_Edgar_2017_v1, dataset_Edgar_2018_v1]
dataframe_list_v1 = list()
for dataset in dataset_list_v1:
    arr_CO2 = np.array(dataset.variables['emi_co2'])
    arr_lat = np.array(dataset.variables['lat'])
    arr_lon = np.array(dataset.variables['lon'])
    preparatory_table = np.zeros((arr_lat.size*arr_lon.size, 3))
    # Flatten the (lat, lon) grid into a long (latitude, longitude, emission) table,
    # covering the full grid including the last row and column
    for j in range(arr_lon.size):
        for i in range(arr_lat.size):
            preparatory_table[i+(arr_lat.size*j), 0] = arr_lat[i]
            preparatory_table[i+(arr_lat.size*j), 1] = arr_lon[j]
            preparatory_table[i+(arr_lat.size*j), 2] = arr_CO2[i, j]
    dataframe_list_v1.append(pd.DataFrame(preparatory_table))
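For a 1800 x 3600 grid (6.48 million cells) the nested Python loops are slow; an equivalent vectorized construction (a sketch producing the same three columns) would be:
# Column-major (Fortran) ravel matches the i + arr_lat.size * j indexing above
lon_grid, lat_grid = np.meshgrid(arr_lon, arr_lat)
flat = np.column_stack([lat_grid.ravel(order='F'),
                        lon_grid.ravel(order='F'),
                        arr_CO2.ravel(order='F')])
df_fast = pd.DataFrame(flat, columns=['latitude', 'longitude', 'CO2 emissions'])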
def decale(a):
    # Shift a longitude from the [0, 360] convention to [-180, 180]
    if a > 180:
        return a - 360
    return a

for dataframe in dataframe_list_v1:
    dataframe.columns = ['latitude', 'longitude', 'CO2 emissions']
    dataframe['longitude'] = dataframe['longitude'].apply(decale)
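The same shift can also be done without a per-row apply; a vectorized sketch for a given dataframe:
# Vectorized equivalent of decale for one dataframe
lon = dataframe['longitude'].to_numpy()
dataframe['longitude'] = np.where(lon > 180, lon - 360, lon)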
dataframe_list_v1[0]
Creation of the dataframes
CO2_emissions_Edgar_2015_v1 = dataframe_list_v1[0]
CO2_emissions_Edgar_2016_v1 = dataframe_list_v1[1]
CO2_emissions_Edgar_2017_v1 = dataframe_list_v1[2]
CO2_emissions_Edgar_2018_v1 = dataframe_list_v1[3]
Check that the data change from one year to the next - only the 'CO2 emissions' column is supposed to change
CO2_emissions_Edgar_2015_v1.describe()
CO2_emissions_Edgar_2018_v1.describe()
Reminder: the unit is kg substance /m2 /s
def draw_map_Edgar(data, titre, lon_min=-180, lon_max=180, lat_min=-90, lat_max=90, size_point=1, frontier=False):
    plt.figure(figsize=(15, 10), edgecolor='w')
    m = Basemap(llcrnrlat=lat_min, urcrnrlat=lat_max, llcrnrlon=lon_min, urcrnrlon=lon_max)
    m.shadedrelief()
    parallels = np.arange(-80., 81, 10.)
    m.drawparallels(parallels, labels=[False, True, True, False])
    meridians = np.arange(10., 351., 20.)
    m.drawmeridians(meridians, labels=[True, False, False, True])
    normal = matplotlib.colors.LogNorm(vmax=data['CO2 emissions'].max())
    m.scatter(data['longitude'], data['latitude'], c=data['CO2 emissions'], cmap=plt.cm.jet, s=size_point, norm=normal)
    plt.colorbar()
    plt.title(titre)
    if frontier:
        m.drawcountries(linewidth=0.5)
        m.drawcoastlines(linewidth=0.7)
    plt.show()
#dataframe_list_final_v1 = (CO2_emissions_Edgar_2015_v1,CO2_emissions_Edgar_2016_v1,CO2_emissions_Edgar_2017_v1,CO2_emissions_Edgar_2018_v1)
#title_list_Edgar_v1 = ('CO2 emissions of year 2015 v1','CO2 emissions of year 2016 v1','CO2 emissions of year 2017 v1','CO2 emissions of year 2018 v1')
draw_map_Edgar(CO2_emissions_Edgar_2018_v1, 'CO2 emissions excluding short-cycle biomass burning (2018)', frontier=True)
#CO2_emissions_Edgar_2015_v1.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2015_v1.csv")
#CO2_emissions_Edgar_2016_v1.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2016_v1.csv")
#CO2_emissions_Edgar_2017_v1.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2017_v1.csv")
#CO2_emissions_Edgar_2018_v1.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2018_v1.csv")
CO2 emissions from large-scale biomass burning: savannah burning, forest fires, and sources and sinks from land-use, land-use change and forestry (LULUCF).
Annual data from 2013 to 2015 (no data available for subsequent years)
Emissions are expressed in kg substance /m2 /s.
path_Edgar_2015_v2 = "/content/drive/My Drive/Data For Good/Inputs/EDGAR data/v50_CO2_org_short-cycle_C_2015.0.1x0.1.nc"
dataset_Edgar_2015_v2 = Dataset(path_Edgar_2015_v2, 'r')
dataset_list_v2 = [dataset_Edgar_2015_v2]
dataframe_list_v2 = list()
for dataset in dataset_list_v2:
    arr_CO2 = np.array(dataset.variables['emi_co2'])
    arr_lat = np.array(dataset.variables['lat'])
    arr_lon = np.array(dataset.variables['lon'])
    preparatory_table = np.zeros((arr_lat.size*arr_lon.size, 3))
    # Same flattening as for v1, covering the full grid
    for j in range(arr_lon.size):
        for i in range(arr_lat.size):
            preparatory_table[i+(arr_lat.size*j), 0] = arr_lat[i]
            preparatory_table[i+(arr_lat.size*j), 1] = arr_lon[j]
            preparatory_table[i+(arr_lat.size*j), 2] = arr_CO2[i, j]
    dataframe_list_v2.append(pd.DataFrame(preparatory_table))
for dataframe in dataframe_list_v2:
    dataframe.columns = ['latitude', 'longitude', 'CO2 emissions']
    dataframe['longitude'] = dataframe['longitude'].apply(decale)
Creation of the dataframes
CO2_emissions_Edgar_2015_v2 = dataframe_list_v2[0]
CO2_emissions_Edgar_2015_v2.describe()
Comparison with v1
CO2_emissions_Edgar_2015_v1.describe()
We observe that the CO2 values are much lower in v2 than in v1 (a factor of 10 for the mean, a factor of 100 for the maximum)
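A quick ratio check (a sketch) makes the comparison explicit:
# Compare mean and max CO2 emissions between v1 and v2 for 2015
mean_ratio = (CO2_emissions_Edgar_2015_v1['CO2 emissions'].mean()
              / CO2_emissions_Edgar_2015_v2['CO2 emissions'].mean())
max_ratio = (CO2_emissions_Edgar_2015_v1['CO2 emissions'].max()
             / CO2_emissions_Edgar_2015_v2['CO2 emissions'].max())
print("mean ratio v1/v2:", mean_ratio, "- max ratio v1/v2:", max_ratio)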
Reminder: the unit is kg substance /m2 /s
draw_map_Edgar(CO2_emissions_Edgar_2015_v2, 'CO2 emissions from short-cycle biomass burning (2015)', frontier=True)
#CO2_emissions_Edgar_2015_v2.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2015_v2.csv")
This version of the data is the sum of CO2 emissions excluding short-cycle biomass burning (CO2_excl_short-cycle_org_C) and CO2 emissions from short-cycle biomass burning (CO2_short-cycle_org_C), giving the total CO2 emissions.
NB: For years 2016 to 2018, CO2 emissions from short-cycle biomass burning are not available, so we can only provide CO2 emissions excluding short-cycle biomass burning for those years. This does not have a big influence, as the data excluding short-cycle burning (v1) are on average 10 times larger, and 100 times larger in terms of maximum CO2 emissions, than the short-cycle biomass burning data (v2). Therefore, even when v2 has data, v3 ≈ v1
Emissions are expressed in kg substance /m2 /s.
For 2015 we sum the CO2 emissions excluding short-cycle biomass burning + CO2 emissions due to biomass burning.
For 2016, 2017 and 2018 we only have CO2 emissions excluding short-cycle biomass.
# 2015
CO2_emissions_Edgar_2015_v3 = pd.DataFrame()
CO2_emissions_Edgar_2015_v3['latitude'] = CO2_emissions_Edgar_2015_v1['latitude']
CO2_emissions_Edgar_2015_v3['longitude'] = CO2_emissions_Edgar_2015_v1['longitude']
CO2_emissions_Edgar_2015_v3['CO2 emissions'] = CO2_emissions_Edgar_2015_v1['CO2 emissions'] + CO2_emissions_Edgar_2015_v2['CO2 emissions']
# 2016
CO2_emissions_Edgar_2016_v3 = CO2_emissions_Edgar_2016_v1.copy()  # copy so the classification below does not alter v1
# 2017
CO2_emissions_Edgar_2017_v3 = CO2_emissions_Edgar_2017_v1.copy()
# 2018
CO2_emissions_Edgar_2018_v3 = CO2_emissions_Edgar_2018_v1.copy()
# Check the 'CO2 emissions' columns through their means
# The difference should be very small, which is the case
print(CO2_emissions_Edgar_2015_v3['CO2 emissions'].mean()
      - (CO2_emissions_Edgar_2015_v1['CO2 emissions'].mean() + CO2_emissions_Edgar_2015_v2['CO2 emissions'].mean()))
draw_map_Edgar(CO2_emissions_Edgar_2015_v3,'Total CO2 emissions including short-cycle biomass burning (2015)', frontier=True)
Visualisation of the data
plt.figure(figsize=(15, 15))
plt.scatter(np.arange(len(CO2_emissions_Edgar_2015_v3)), CO2_emissions_Edgar_2015_v3['CO2 emissions'], s=1)
We observe that many data points have very small values. Let's take a closer look at the data distribution
# Percentiles 1..100 of the 2015 emissions (np.percentile accepts an array of percentiles)
list_percentile_2015 = list(np.percentile(CO2_emissions_Edgar_2015_v3['CO2 emissions'], np.arange(1, 101)))
list_percentile_2015
This gives us a better idea of how to classify our data
Let's check whether the same holds for 2016
list_percentile_2016 = list(np.percentile(CO2_emissions_Edgar_2016_v3['CO2 emissions'], np.arange(1, 101)))
list_percentile_2016
Creation of the classification
We chose the following classification after observing the 100 percentiles of the data:
- CO2 emissions = 0 (encoded 'Null' or 0)
- 0 < CO2 emissions <= 10^-10 (encoded 'Very low' or 1)
- 10^-10 < CO2 emissions <= 10^-9 (encoded 'Low' or 2)
- 10^-9 < CO2 emissions <= 10^-8 (encoded 'Medium' or 3)
- CO2 emissions > 10^-8 (encoded 'High' or 4)
# Apply the same classification to each year (2015 to 2018)
for df in (CO2_emissions_Edgar_2015_v3, CO2_emissions_Edgar_2016_v3,
           CO2_emissions_Edgar_2017_v3, CO2_emissions_Edgar_2018_v3):
    emissions = df['CO2 emissions']
    df.loc[emissions == 0, 'CO2 classification'] = 0
    df.loc[(emissions > 0) & (emissions <= 1.0e-10), 'CO2 classification'] = 1
    df.loc[(emissions > 1.0e-10) & (emissions <= 1.0e-9), 'CO2 classification'] = 2
    df.loc[(emissions > 1.0e-9) & (emissions <= 1.0e-8), 'CO2 classification'] = 3
    df.loc[emissions > 1.0e-8, 'CO2 classification'] = 4
Test
CO2_emissions_Edgar_2017_v3[CO2_emissions_Edgar_2017_v3['CO2 classification'] == 3]
Check if all data points have been encoded
for df in (CO2_emissions_Edgar_2015_v3, CO2_emissions_Edgar_2016_v3,
           CO2_emissions_Edgar_2017_v3, CO2_emissions_Edgar_2018_v3):
    # The per-class counts must add up to the total number of grid cells
    print(sum(df[df['CO2 classification'] == k].count() for k in range(5)))
Add a non-numerical label for each class for better readability
classification_Edgar = pd.DataFrame({'CO2 classification': [0, 1, 2, 3, 4], 'CO2 classification mapping': ['Null', 'Very low', 'Low', 'Medium', 'High']})
CO2_emissions_Edgar_2015_v3 = CO2_emissions_Edgar_2015_v3.merge(classification_Edgar, on='CO2 classification', how='left')
CO2_emissions_Edgar_2016_v3 = CO2_emissions_Edgar_2016_v3.merge(classification_Edgar, on='CO2 classification', how='left')
CO2_emissions_Edgar_2017_v3 = CO2_emissions_Edgar_2017_v3.merge(classification_Edgar, on='CO2 classification', how='left')
CO2_emissions_Edgar_2018_v3 = CO2_emissions_Edgar_2018_v3.merge(classification_Edgar, on='CO2 classification', how='left')
Number of data points in the 'High' category
2015 has more data points because the v2 EDGAR data (biomass burning) are available for that year
print("The category 'High' has", CO2_emissions_Edgar_2015_v3[CO2_emissions_Edgar_2015_v3['CO2 classification'] == 4].count()[0], "data points in 2015")
print("The category 'High' has", CO2_emissions_Edgar_2016_v3[CO2_emissions_Edgar_2016_v3['CO2 classification'] == 4].count()[0], "data points in 2016")
print("The category 'High' has", CO2_emissions_Edgar_2017_v3[CO2_emissions_Edgar_2017_v3['CO2 classification'] == 4].count()[0], "data points in 2017")
print("The catagory 'High' has", CO2_emissions_Edgar_2018_v3[CO2_emissions_Edgar_2018_v3['CO2 classification'] == 4].count()[0], "data points in 2018")
CO2_emissions_Edgar_2015_v3
CO2_emissions_Edgar_peaks_2018 = CO2_emissions_Edgar_2018_v3[CO2_emissions_Edgar_2018_v3['CO2 classification']==4]
draw_map_Edgar(CO2_emissions_Edgar_peaks_2018,'CO2 emissions peaks (2018)', frontier=True)
CO2_emissions_Edgar_2015_v3.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2015_v3.csv")
CO2_emissions_Edgar_2016_v3.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2016_v3.csv")
CO2_emissions_Edgar_2017_v3.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2017_v3.csv")
CO2_emissions_Edgar_2018_v3.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_Edgar_2018_v3.csv")
2.1 Power plants
Data from power plants burning fossil fuels.
Link: http://datasets.wri.org/dataset/globalpowerplantdatabase
The World Resources Institute provides a list of power plants producing electricity from different primary energy sources. We filtered this list to keep only the fossil primary energies (gas, oil and coal), which release CO2 during combustion.
The dataset gives the latitude and longitude of these power plants, as well as the electricity produced, in GWh.
We will cross these data with the emission factors (in gCO2/KWh) of the different types of power plants in order to estimate their CO2 emissions. We use the emission factors from ADEME (https://www.bilans-ges.ademe.fr/docutheque/docs/Documentation%20g%C3%A9n%C3%A9rale%20anglaise%20v17.0.pdf, page 111).
These emission factors include the upstream of the fuels (extraction, processing, refining, transportation and distribution) as well as the combustion. Our estimate of power plant CO2 emissions will therefore be larger than the actual on-site emissions, because it attributes to each plant an upstream share of emissions that did not take place at the plant itself. However, since our goal here is to localize CO2 emission peaks, we leave this question aside.
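As a sanity check on the units: multiplying GWh by gCO2/KWh yields tonnes of CO2 directly, since 1 GWh = 10^6 KWh and 10^6 g = 1 tonne. An illustrative example (the 1050 gCO2/KWh coal factor is the ADEME value used below; the generation figure is hypothetical):
generation_gwh = 1000       # hypothetical annual output of a coal plant
factor_g_per_kwh = 1050     # ADEME emission factor for coal power plants
tonnes_co2 = generation_gwh * factor_g_per_kwh  # the 1e6 factors cancel out
print(tonnes_co2)           # 1 050 000 tonnes of CO2 per year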
path_power_plant1 = "/content/drive/My Drive/Data For Good/Inputs/WRI data/global_power_plant_database.csv"
df_source1 = pd.read_csv(path_power_plant1, sep=",")
df_source1.head()
Work on the format of the dataframe
centrale_preparation = df_source1.copy()
# Removal of non-useful columns
centrale_preparation = centrale_preparation.drop(['country','commissioning_year', 'owner', 'source','gppd_idnr', 'other_fuel1','other_fuel2','other_fuel3', 'url','geolocation_source', 'wepp_id','year_of_capacity_data'],axis=1)
# Renaming the source of the CO2 emissions for more clarity
centrale_preparation.replace('Gas', 'Gas power plant', inplace=True)
centrale_preparation.replace('Oil', 'Oil power plant', inplace=True)
centrale_preparation.replace('Coal', 'Coal power plant', inplace=True)
# Observation of all power plant types
centrale_preparation['primary_fuel'].unique()
# We keep only fossil fuel power plants as they emit CO2
Drop_liste = ('Hydro', 'Other', 'Biomass', 'Petcoke', 'Wind', 'Nuclear', 'Solar', 'Waste', 'Wave and Tidal', 'Geothermal', 'Cogeneration', 'Storage')
centrale_preparation = centrale_preparation[~centrale_preparation['primary_fuel'].isin(Drop_liste)]
# Test
centrale_preparation['primary_fuel'].unique()
# Check for missing data (count of non-null entries per column)
centrale_preparation.count()
Creation of the dataframe to add emissions factors (source: ADEME), and merge of both dataframes
gCO2_KWh = pd.DataFrame ({'primary_fuel': ['Gas power plant', 'Oil power plant', 'Coal power plant'], 'gCO2/KWh': [443, 778, 1050]})
# Merge of the dataframes
centrale_preparation = centrale_preparation.merge(gCO2_KWh, on='primary_fuel', how='left')
centrale_preparation.head()
centrale_preparation.describe()
Negative values are observed for the 'generation_gwh' columns. We will deal with them later
Data processing
CO2_emissions_centrale = centrale_preparation.copy()
What do the "estimated_generation_gwh" data represent? Below is what the source indicates (https://wriorg.s3.amazonaws.com/s3fs-public/global-power-plant-database-technical-note-v1.0.pdf): "When (annual) generation is reported, we include it directly in the database. When generation information is not available for a specific plant, we estimate it, although estimating annual generation at the plant level is challenging. Electricity generated by each plant varies by time period and depends on factors such as the regulatory environment, level of demand, cost of fuels, and extent of planned and unplanned maintenance in addition to plant-level characteristics. We devised two potential ways to estimate annual plant-level electricity generation: scaling information on aggregate generation by plant size and a machine-learning approach"
For each year, the 'generation_gwh_year' column contains the values that were actually measured.
For 2017, we also have estimated values where nothing was measured. We therefore have two types of data:
- 'generation_gwh_2017' (measured values only)
- 'generation_gwh_2017_with_estimated_data', which takes the measured value in priority and falls back to the estimated value when no measurement is available
# Take the measured 2017 generation when available, otherwise fall back to the estimate
CO2_emissions_centrale['generation_gwh_2017_with_estimated_data'] = (
    CO2_emissions_centrale['generation_gwh_2017']
    .fillna(CO2_emissions_centrale['estimated_generation_gwh'])
)
count = CO2_emissions_centrale['generation_gwh_2017_with_estimated_data'].isna().sum()
print("The number of missing GWh electricity production values for fossil fuel power plants (with estimated data) is", count, "out of the", CO2_emissions_centrale.shape[0], "power plants in the dataset")
Creation of the CO2 emissions data. If the GWh of electricity production is negative, the corresponding CO2 emissions will be NaN
# GWh x gCO2/KWh yields tonnes of CO2 directly (1 GWh = 1e6 KWh and 1e6 g = 1 t)
generation_to_emissions = [
    ('generation_gwh_2013', 'tCO2_emitted_in_2013'),
    ('generation_gwh_2014', 'tCO2_emitted_in_2014'),
    ('generation_gwh_2015', 'tCO2_emitted_in_2015'),
    ('generation_gwh_2016', 'tCO2_emitted_in_2016'),
    ('generation_gwh_2017', 'tCO2_emitted_in_2017'),
    ('generation_gwh_2017_with_estimated_data', 'tCO2_emitted_in_2017_with estimated_data'),
]
for gen_col, co2_col in generation_to_emissions:
    gen = CO2_emissions_centrale[gen_col]
    # Negative or zero generation (and missing values) become NaN emissions
    CO2_emissions_centrale[co2_col] = np.where(gen > 0, gen * CO2_emissions_centrale['gCO2/KWh'], np.nan)
Extrapolation to get data for 2018 and 2019.
The (rough) hypothesis made is that these years have the same values as 2017 with estimated data. The rationale is that these power plants have a long lifetime (e.g. > 20 years for coal power plants and often > 40 years, source: https://www.gem.wiki/Estimating_carbon_dioxide_emissions_from_coal_plants), and the years 2018 & 2019 could be important for the CO2 emissions peak algorithm
CO2_emissions_centrale['tCO2_emitted_in_2018_with estimated_data_extrapolated'] = CO2_emissions_centrale['tCO2_emitted_in_2017_with estimated_data']
CO2_emissions_centrale['tCO2_emitted_in_2019_with estimated_data_extrapolated'] = CO2_emissions_centrale['tCO2_emitted_in_2017_with estimated_data']
CO2_emissions_centrale.describe()
plt.figure(figsize = (10,10))
plt.scatter(np.arange(0,CO2_emissions_centrale.shape[0],1),CO2_emissions_centrale['tCO2_emitted_in_2017_with estimated_data'].sort_values(ascending = True), s=1)
plt.xlabel('Plant ID')
plt.ylabel('Annual CO2 emissions in 2017 (tonnes) with estimated data')
The maximum is due to a single outlier. We will not remove this plant, as that would also remove its non-estimated data.
This outlier is in fact anomalous: the largest power plants emit on the order of 10^7 tonnes of CO2 per year (sources: https://www.eurekalert.org/pub_releases/2007-11/cfgd-crc111207.php; https://www.theguardian.com/environment/2012/jan/12/america-top-polluting-power-stations)
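To inspect that outlier (a sketch; it assumes the WRI 'name' column is still present, which it should be since it was not dropped above):
# Row with the largest 2017 (with estimated data) emissions
outlier = CO2_emissions_centrale.loc[
    CO2_emissions_centrale['tCO2_emitted_in_2017_with estimated_data'].idxmax()]
print(outlier[['name', 'primary_fuel', 'tCO2_emitted_in_2017_with estimated_data']])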
def draw_map_centrale(data, year, titre, lon_min=-180, lon_max=180, lat_min=-90, lat_max=90, size_point=1, frontier=False):
    plt.figure(figsize=(15, 10), edgecolor='w')
    m = Basemap(llcrnrlat=lat_min, urcrnrlat=lat_max, llcrnrlon=lon_min, urcrnrlon=lon_max)
    m.shadedrelief()
    parallels = np.arange(-80., 81, 10.)
    m.drawparallels(parallels, labels=[False, True, True, False])
    meridians = np.arange(10., 351., 20.)
    m.drawmeridians(meridians, labels=[True, False, False, True])
    normal = matplotlib.colors.LogNorm(vmax=data[year].max())
    m.scatter(data['longitude'], data['latitude'], c=data[year], cmap=plt.cm.jet, s=size_point, norm=normal)
    plt.colorbar()
    plt.title(titre)
    if frontier:
        m.drawcountries(linewidth=0.5)
        m.drawcoastlines(linewidth=0.7)
    plt.show()
draw_map_centrale(CO2_emissions_centrale, 'tCO2_emitted_in_2017_with estimated_data', 'Tonnes of CO2 emitted per power plant (2017)', frontier=True)
CO2_emissions_centrale.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_centrale.csv")
2.2 Data from Global coal plant tracker
This dataset is provided by the Global Energy Monitor (GEM). It lists coal power plants and estimates the tonnes of CO2 emitted per year. The methodology can be found here: https://www.gem.wiki/Estimating_carbon_dioxide_emissions_from_coal_plants.
To summarize, four factors are used to estimate the CO2 emissions from coal plants (see the sketch after this list):
- Plant capacity
- Plant capacity factor
- Heat rate of plant (an expression of efficiency)
- Emissions factor of the type of coal used in the plant
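A hedged sketch of such an estimate (the unit conventions and the example constants are illustrative, not GEM's exact figures):
def coal_plant_co2_mt_per_year(capacity_mw, capacity_factor,
                               heat_rate_btu_per_kwh, ef_kg_co2_per_mmbtu):
    kwh_per_year = capacity_mw * 1000 * capacity_factor * 8760   # MW -> KWh/year
    mmbtu_per_year = kwh_per_year * heat_rate_btu_per_kwh / 1e6  # fuel energy input
    return mmbtu_per_year * ef_kg_co2_per_mmbtu / 1e9            # kg -> million tonnes

# Example: 1000 MW plant, 60% capacity factor, 10 000 Btu/KWh,
# ~95 kg CO2 per mmBtu (typical for bituminous coal)
print(coal_plant_co2_mt_per_year(1000, 0.6, 10000, 95))  # ~5 Mt CO2 / year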
We added some columns to the provided csv file - 'Status 2016 (done by Data4Good)' for example - indicating whether the power plant was operating in the corresponding year (the status was only available as of January 2020). The methodology was the following:
- If the 'Status' column indicates 'Mothballed', set to 'Not operating'
- If the 'Year' column is less than or equal to the corresponding year and this value is not empty, set to 'Not operating'
- If the 'RETIRED' column is strictly greater than the corresponding year, or the value is empty (when this value is empty, this is also the case for the 'RETIRED' column, meaning that the power plant will never operate, e.g. 'Cancelled' status), set to 'Not operating'
- Otherwise set to 'Operating'
We also added a 'Total annual CO2 (million tonnes / annum)' column that sums the 'Annual CO2 (million tonnes / annum)' values of all units of each coal power plant - when the plant is operating - so as not to lose information when removing duplicates to keep only one location per plant. Many plants indeed have several units, and all units of a plant share the same coordinates
CO2_emissions_coal_plant = pd.read_csv(r"/content/drive/My Drive/Data For Good/Inputs/Global coal plant tracker/January 2020 Global Coal Plant Tracker.csv", sep=";", encoding='latin1')
CO2_emissions_coal_plant.head()
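The pre-computed 'Total annual CO2 (million tonnes / annum)' column could in principle be reproduced with a groupby over coordinates (a sketch that ignores the operating-status filter applied in the original file):
# Sum unit-level emissions per plant location (one plant = one coordinate pair)
total_per_plant = (CO2_emissions_coal_plant
                   .groupby(['Latitude', 'Longitude'])['Annual CO2 (million tonnes / annum)']
                   .transform('sum'))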
Removal of useless columns
CO2_emissions_coal_plant = CO2_emissions_coal_plant.drop(['Tracker ID', 'TrackerLOC', 'Wiki page', 'Subnational unit (province, state)','Unit', 'Chinese Name','Other names',
'Sponsor', 'Parent','Suspension policy', 'Pre-suspension status','Year','RETIRED', 'Planned Retire', 'Coal source',
'Location', 'Local area (taluk, county)', 'Major area (prefecture, district)', 'Region', 'Accuracy',
'Lifetime CO2 (million tonnes)', 'Permits', 'Coordinates', 'Annual CO2 (million tonnes / annum) for calculation'],axis=1)
CO2_emissions_coal_plant.head()
Drop the duplicates corresponding to the multiple units of each power plant (if any)
CO2_emissions_coal_plant.drop_duplicates(['Latitude','Longitude'], keep = 'first', inplace=True)
Creation of new columns calculating the CO2 emitted for a specific year
for year in ('2016', '2017', '2018', '2019'):
    co2_col = 'Annual CO2 emissions (millions of tonnes) in ' + year
    status_col = 'Status ' + year + ' (done by Data4Good)'
    CO2_emissions_coal_plant[co2_col] = CO2_emissions_coal_plant['Total annual CO2 (million tonnes / annum)']
    # Plants not operating that year get NaN emissions
    CO2_emissions_coal_plant.loc[CO2_emissions_coal_plant[status_col] == 'Not operating', co2_col] = np.nan
CO2_emissions_coal_plant.head()
Visualisation of the CO2 emissions data - example of 2016
plt.figure(figsize = (10,10))
plt.scatter(np.arange(0,CO2_emissions_coal_plant.shape[0],1),CO2_emissions_coal_plant['Annual CO2 emissions (millions of tonnes) in 2016'].sort_values(ascending = True), s=1)
plt.xlabel('Coal plant ID')
plt.ylabel('Annual CO2 emissions (millions of tonnes) in 2016')
CO2_emissions_coal_plant.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_coal_plant.csv")
2.3 Cities v1 - few cities/CO2 data available
The source of the data is: https://public.opendatasoft.com/explore/dataset/co2-emissions-cities/export/?sort=study_year_wb
It covers the major cities of the world
In this dataset, we have annual tCO2-eq emissions data for Scope 1 and Scope 2. We will only use the Scope 1 data, as it concerns direct emissions occurring within the cities (e.g. direct emissions from fossil fuel burning generated by road traffic), whereas Scope 2 emissions are indirect emissions coming from the use of electricity, heat, steam or cooling (the CO2 is therefore not emitted where the energy is consumed, but where it was produced, which is not the city).
Moreover, some of the data are in CO2-eq, which comprises not only CO2 but also CH4 and N2O for instance. Therefore, for some cities the actual CO2 emissions are lower than the values in this dataset. We will find below that 'There are 44 cities with tCO2 data - and not tCO2-eq data - out of 187 cities in total'
Finally, we do not have a comprehensive split by year. We will therefore assume there are no major changes from 2016 to 2019, which seems reasonable as major cities do not 'evolve' much within a few years
path_cities_v1 = "/content/drive/My Drive/Data For Good/Inputs/Cities/co2-emissions-cities.csv"
cities_v1_preparation = pd.read_csv(path_cities_v1, sep=";")
cities_v1_preparation.head()
Selection of useful columns
cities_v1 = cities_v1_preparation[['City name', 'Country', 'Scope-1 GHG emissions [tCO2 or tCO2-eq]', 'Scope-1 source dataset', 'Scope-1 GHG emissions units', 'Year of emission',
'City location (CDP) [degrees]', 'Population (CDP)', 'Population year (CDP)']]
cities_v1.head()
Split of the column 'City location (CDP) [degrees]' to get latitudes and longitudes
# Create two lists for the loop results
lat = []
lon = []
# Split each 'City location (CDP) [degrees]' value into latitude and longitude
for row in cities_v1['City location (CDP) [degrees]']:
    try:
        parts = row.split(',')
        lat_value, lon_value = parts[0], parts[1]
    except (AttributeError, IndexError):
        # Missing (NaN) or malformed location
        lat_value, lon_value = np.NaN, np.NaN
    # Append once per row so the two lists stay aligned
    lat.append(lat_value)
    lon.append(lon_value)
# Create two new columns from lat and lon
cities_v1['latitude'] = lat
cities_v1['longitude'] = lon
# Convert into numeric values
cities_v1[['latitude', 'longitude']] = cities_v1[['latitude', 'longitude']].apply(pd.to_numeric)
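An equivalent vectorized split (a sketch; pandas' str accessor propagates NaN by itself):
# Split at the comma and coerce both halves to numeric values
latlon = cities_v1['City location (CDP) [degrees]'].str.split(',', expand=True)
cities_v1['latitude'] = pd.to_numeric(latlon[0], errors='coerce')
cities_v1['longitude'] = pd.to_numeric(latlon[1], errors='coerce')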
cities_v1.head()
print("They are", cities_v1.shape[0], "cities in the dataset")
Removal of the cities whose latitude and longitude are NaN
cities_v1.dropna(subset = ['latitude', 'longitude'], inplace=True)
print("They are", cities_v1.shape[0], "cities after having removed cities without coordinate")
count = cities_v1.loc[cities_v1['Scope-1 GHG emissions units'] == 'tCO2', 'Scope-1 GHG emissions units'].count()
print("They are", count, "cities with tCO2 data - and not tCO2-eq data - out of", cities_v1.shape[0], "cities with coordinates")
draw_map_centrale(cities_v1, 'Scope-1 GHG emissions [tCO2 or tCO2-eq]', 'Scope 1 CO2e cities emissions')
cities_v1.to_csv("/content/drive/My Drive/Data For Good/Outputs/cities_v1.csv")
2.4 Cities v2 - lots of cities/no CO2 data
The source of the data is https://simplemaps.com/data/world-cities ("We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA")
"It was last refreshed in December of 2019". We will assume no major changes from 2016 to 2019 when later creating datasets for these years (as for cities_v1)
Here we have around 15 000 cities with population data, but no CO2 data, unlike cities_v1
path_cities_v2 = "/content/drive/My Drive/Data For Good/Inputs/Cities/worldcities.csv"
cities_v2_preparation = pd.read_csv(path_cities_v2, sep=";", encoding = 'latin1')
cities_v2_preparation['lat'] = cities_v2_preparation['lat'].apply(lambda x: x.replace(',','.')).astype('float')
cities_v2_preparation['lng'] = cities_v2_preparation['lng'].apply(lambda x: x.replace(',','.')).astype('float')
cities_v2_preparation.head()
cities_v2 = pd.DataFrame()
cities_v2['city'] = cities_v2_preparation['city']
cities_v2['country'] = cities_v2_preparation['country']
cities_v2['latitude'] = cities_v2_preparation['lat']
cities_v2['longitude'] = cities_v2_preparation['lng']
cities_v2['pop'] = cities_v2_preparation['population']
cities_v2.head()
draw_map_centrale(cities_v2, 'pop', 'Population of main cities')
cities_v2.to_csv("/content/drive/My Drive/Data For Good/Outputs/cities_v2.csv")
2.6 ETS data
Data representing the declared CO2-equivalent emissions of all installations subject to the European carbon market (ETS). These data were provided by the think tank Sandbag (https://sandbag.be/) for use in this project.
The location of the infrastructure was found using OpenStreetMap (more details in the script: https://github.com/dataforgoodfr/batch7_satellite_ges/blob/master/notebooks/Process%20Sandbag%20data%20-%20Find%20infrastructure%20location.ipynb)
path_sandbag = "/content/drive/My Drive/Data For Good/Inputs/SANDBAG data/sandbag_data_infrastructure_location.csv"
sandbag_data = pd.read_csv(path_sandbag, sep=";", encoding='latin1')
sandbag_data.head()
sandbag_data.shape
# Remove installations from the aviation sector because their emissions are not declared at the emission site
sandbag_data = sandbag_data[sandbag_data['SandbagSector']!='Aviation']
# Replace the decimal comma in latitudes and longitudes
sandbag_data['latitude'] = sandbag_data['latitude'].astype('str').apply(lambda x: x.replace(',','.')).astype('float')
sandbag_data['longitude'] = sandbag_data['longitude'].astype('str').apply(lambda x: x.replace(',','.')).astype('float')
sandbag_data.shape
# Same as draw_map_centrale, but hard-wired to the 'tCO2e' column
def draw_map_peaks(data, titre, lon_min=-180, lon_max=180, lat_min=-90, lat_max=90, size_point=1, frontier=False):
    plt.figure(figsize=(15, 10), edgecolor='w')
    m = Basemap(llcrnrlat=lat_min, urcrnrlat=lat_max, llcrnrlon=lon_min, urcrnrlon=lon_max)
    m.shadedrelief()
    parallels = np.arange(-80., 81, 10.)
    m.drawparallels(parallels, labels=[False, True, True, False])
    meridians = np.arange(10., 351., 20.)
    m.drawmeridians(meridians, labels=[True, False, False, True])
    normal = matplotlib.colors.LogNorm(vmax=data['tCO2e'].max())
    m.scatter(data['longitude'], data['latitude'], c=data['tCO2e'], cmap=plt.cm.jet, s=size_point, norm=normal)
    plt.colorbar()
    plt.title(titre)
    if frontier:
        m.drawcountries(linewidth=0.5)
        m.drawcoastlines(linewidth=0.7)
    plt.show()
draw_map_peaks(sandbag_data, 'CO2 emissions of European infrastructure (Sandbag 2019)')
draw_map_peaks(sandbag_data, 'CO2 emissions of European infrastructure (Sandbag 2019)', lon_min=-30, lon_max=30, lat_min=30, lat_max=60)
2.6 Merge of the CO2 emissions peaks data
Note that in the 'CO2/CO2e emissions source' column we kept the coal / gas / oil segmentation for power plants from Global Energy Monitor and the World Resources Institute. In Sandbag, the corresponding category is 'power and heat', without the granularity of the type of fossil fuel burnt
Data from power plants (2.1)
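The four yearly blocks below differ only in the emissions column they read, so a small helper (a sketch) could build each one:
def centrale_for_merge(co2_column):
    # Build one yearly merge table from a tCO2 column of CO2_emissions_centrale
    out = pd.DataFrame()
    out['latitude'] = CO2_emissions_centrale['latitude']
    out['longitude'] = CO2_emissions_centrale['longitude']
    out['Data source'] = 'World Resources Institute and ADEME for the emissions factors'
    out['CO2/CO2e emissions source'] = CO2_emissions_centrale['primary_fuel']
    out['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_centrale[co2_column]
    out['CO2 or CO2e'] = 'CO2'
    return out.dropna(subset=['CO2/CO2e emissions (in tonnes per year)'])

# e.g. centrale_all_for_merge_2016 = centrale_for_merge('tCO2_emitted_in_2016')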
2016
centrale_all_for_merge_2016 = pd.DataFrame()
centrale_all_for_merge_2016['latitude'] = CO2_emissions_centrale['latitude']
centrale_all_for_merge_2016['longitude'] = CO2_emissions_centrale['longitude']
centrale_all_for_merge_2016['Data source'] = 'World Resources Institute and ADEME for the emissions factors'
centrale_all_for_merge_2016['CO2/CO2e emissions source'] = CO2_emissions_centrale['primary_fuel']
centrale_all_for_merge_2016['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_centrale['tCO2_emitted_in_2016']
centrale_all_for_merge_2016['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
centrale_all_for_merge_2016.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2017 (the CO2 emissions data selected here is 'with estimated data')
centrale_all_for_merge_2017 = pd.DataFrame()
centrale_all_for_merge_2017['latitude'] = CO2_emissions_centrale['latitude']
centrale_all_for_merge_2017['longitude'] = CO2_emissions_centrale['longitude']
centrale_all_for_merge_2017['Data source'] = 'World Resources Institute and ADEME for the emissions factors'
centrale_all_for_merge_2017['CO2/CO2e emissions source'] = CO2_emissions_centrale['primary_fuel']
centrale_all_for_merge_2017['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_centrale['tCO2_emitted_in_2017_with estimated_data']
centrale_all_for_merge_2017['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
centrale_all_for_merge_2017.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2018
centrale_all_for_merge_2018 = pd.DataFrame()
centrale_all_for_merge_2018['latitude'] = CO2_emissions_centrale['latitude']
centrale_all_for_merge_2018['longitude'] = CO2_emissions_centrale['longitude']
centrale_all_for_merge_2018['Data source'] = 'World Resources Institute and ADEME for the emissions factors'
centrale_all_for_merge_2018['CO2/CO2e emissions source'] = CO2_emissions_centrale['primary_fuel']
centrale_all_for_merge_2018['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_centrale['tCO2_emitted_in_2018_with estimated_data_extrapolated']
centrale_all_for_merge_2018['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
centrale_all_for_merge_2018.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2019
centrale_all_for_merge_2019 = pd.DataFrame()
centrale_all_for_merge_2019['latitude'] = CO2_emissions_centrale['latitude']
centrale_all_for_merge_2019['longitude'] = CO2_emissions_centrale['longitude']
centrale_all_for_merge_2019['Data source'] = 'World Resources Institute and ADEME for the emissions factors'
centrale_all_for_merge_2019['CO2/CO2e emissions source'] = CO2_emissions_centrale['primary_fuel']
centrale_all_for_merge_2019['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_centrale['tCO2_emitted_in_2019_with estimated_data_extrapolated']
centrale_all_for_merge_2019['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
centrale_all_for_merge_2019.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
centrale_all_for_merge_2019.head()
Data from coal power plants (2.2)
2016
coal_plant_for_merge_2016 = pd.DataFrame()
coal_plant_for_merge_2016['latitude'] = CO2_emissions_coal_plant['Latitude']
coal_plant_for_merge_2016['longitude'] = CO2_emissions_coal_plant['Longitude']
coal_plant_for_merge_2016['Data source'] = 'Global Energy Monitor'
coal_plant_for_merge_2016['CO2/CO2e emissions source'] = 'Coal power plant'
coal_plant_for_merge_2016['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_coal_plant['Annual CO2 emissions (millions of tonnes) in 2016']*1000000
coal_plant_for_merge_2016['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
coal_plant_for_merge_2016.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2017
coal_plant_for_merge_2017 = pd.DataFrame()
coal_plant_for_merge_2017['latitude'] = CO2_emissions_coal_plant['Latitude']
coal_plant_for_merge_2017['longitude'] = CO2_emissions_coal_plant['Longitude']
coal_plant_for_merge_2017['Data source'] = 'Global Energy Monitor'
coal_plant_for_merge_2017['CO2/CO2e emissions source'] = 'Coal power plant'
coal_plant_for_merge_2017['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_coal_plant['Annual CO2 emissions (millions of tonnes) in 2017']*1000000
coal_plant_for_merge_2017['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
coal_plant_for_merge_2017.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2018
coal_plant_for_merge_2018 = pd.DataFrame()
coal_plant_for_merge_2018['latitude'] = CO2_emissions_coal_plant['Latitude']
coal_plant_for_merge_2018['longitude'] = CO2_emissions_coal_plant['Longitude']
coal_plant_for_merge_2018['Data source'] = 'Global Energy Monitor'
coal_plant_for_merge_2018['CO2/CO2e emissions source'] = 'Coal power plant'
coal_plant_for_merge_2018['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_coal_plant['Annual CO2 emissions (millions of tonnes) in 2018']*1000000
coal_plant_for_merge_2018['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
coal_plant_for_merge_2018.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
2019
coal_plant_for_merge_2019 = pd.DataFrame()
coal_plant_for_merge_2019['latitude'] = CO2_emissions_coal_plant['Latitude']
coal_plant_for_merge_2019['longitude'] = CO2_emissions_coal_plant['Longitude']
coal_plant_for_merge_2019['Data source'] = 'Global Energy Monitor'
coal_plant_for_merge_2019['CO2/CO2e emissions source'] = 'Coal power plant'
coal_plant_for_merge_2019['CO2/CO2e emissions (in tonnes per year)'] = CO2_emissions_coal_plant['Annual CO2 emissions (millions of tonnes) in 2019']*1000000
coal_plant_for_merge_2019['CO2 or CO2e'] = 'CO2'
# Drop plants whose CO2 emissions are NaN
coal_plant_for_merge_2019.dropna(subset = ['CO2/CO2e emissions (in tonnes per year)'], inplace=True)
coal_plant_for_merge_2019.head()
Data from cities (2.3)
cities_v1_for_merge = pd.DataFrame()
cities_v1_for_merge['city'] = cities_v1['City name']
cities_v1_for_merge['latitude'] = cities_v1['latitude']
cities_v1_for_merge['longitude'] = cities_v1['longitude']
cities_v1_for_merge['Data source'] = 'Opendatasoft'
cities_v1_for_merge['CO2/CO2e emissions source'] = 'City'
cities_v1_for_merge['CO2/CO2e emissions (in tonnes per year)'] = cities_v1['Scope-1 GHG emissions [tCO2 or tCO2-eq]']
cities_v1_for_merge['CO2 or CO2e'] = 'CO2e'
cities_v1_for_merge.loc[cities_v1['Scope-1 GHG emissions units'] == 'tCO2', 'CO2 or CO2e'] = 'CO2'
cities_v1_for_merge.head()
Data from cities (2.4)
cities_v2_for_merge = pd.DataFrame()
# Keep only cities with a significant population (> 500 000 inhabitants here, giving around 1 000 cities)
cities_v2_copy = cities_v2[cities_v2['pop'] > 500000]
cities_v2_for_merge['city'] = cities_v2_copy['city']
cities_v2_for_merge['latitude'] = cities_v2_copy['latitude']
cities_v2_for_merge['longitude'] = cities_v2_copy['longitude']
cities_v2_for_merge['Data source'] = 'Simplemaps'
cities_v2_for_merge['CO2/CO2e emissions source'] = 'City'
cities_v2_for_merge['CO2/CO2e emissions (in tonnes per year)'] = np.nan
cities_v2_for_merge['CO2 or CO2e'] = np.nan
cities_v2_for_merge.head()
Data from Sandbag (2.5)
sandbag_for_merge_preparation = sandbag_data.copy()
sandbag_for_merge_preparation.head()
# Drop all rows whose SandbagSector has not been categorized
sandbag_for_merge_preparation.dropna(subset = ['SandbagSector'], inplace=True)
sandbag_for_merge_preparation['SandbagSector'].unique()
sandbag_for_merge = pd.DataFrame()
sandbag_for_merge['latitude'] = sandbag_for_merge_preparation['latitude']
sandbag_for_merge['longitude'] = sandbag_for_merge_preparation['longitude']
sandbag_for_merge['Data source'] = 'Sandbag'
sandbag_for_merge['CO2/CO2e emissions source'] = sandbag_for_merge_preparation['SandbagSector']
sandbag_for_merge['CO2/CO2e emissions (in tonnes per year)'] = sandbag_for_merge_preparation['tCO2e']
sandbag_for_merge['CO2 or CO2e'] = 'CO2e'
Removal of the duplicates
- For cities: we keep cities_v1 data over cities_v2 in priority, as it provides CO2 data (reminder: most of these data are not in CO2 but in CO2-eq, i.e. CO2e). Moreover, cities_v2 has been filtered to cities with more than 500 000 inhabitants
- For plants: this concerns the coal power plants present in both datasets. When there are duplicates, the 'Global power plant database' source is selected in priority over 'Global Energy Monitor' and over Sandbag. Sandbag is selected last in order to keep a large number of data points from different sources, since Sandbag also provides a lot of data for cement, glass, etc.
- In case some plants are located at the headquarters of their company - i.e. at the city location - the 'cities_merged' data are selected in priority when there are duplicates, to mitigate this issue
Concerning the 'CO2/CO2e emissions (in tonnes per year)' column, we are mixing data from different sources that use different methodologies to calculate CO2 or CO2e. This column has therefore been kept for a better understanding of the CO2/CO2e emission peaks, but should be used with caution
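Note that the deduplication below matches on exact floating-point coordinates, so the same site reported with slightly different coordinates by two sources will not be caught. A sketch of a coarser match (assuming rounding to ~0.01° is acceptable):
def drop_near_duplicates(df, decimals=2):
    # Deduplicate on rounded coordinates to catch near-identical locations
    keys = df[['latitude', 'longitude']].round(decimals)
    return df.loc[~keys.duplicated(keep='first')]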
Merge of the cities datasets
cities_merged = pd.concat([cities_v1_for_merge, cities_v2_for_merge])
cities_merged.drop_duplicates(['city'], keep = 'first', inplace=True)
cities_merged = cities_merged.drop(columns = 'city')
nb_cities_duplicates = (cities_v1_for_merge.shape[0] + cities_v2_for_merge.shape[0]) - cities_merged.shape[0]
print("There are", nb_cities_duplicates, "duplicates")
cities_merged.head()
Merge of all peaks datasets
2016
CO2_emissions_peaks_merged_2016 = pd.concat([cities_merged, centrale_all_for_merge_2016, coal_plant_for_merge_2016, sandbag_for_merge])
CO2_emissions_peaks_merged_2016.drop_duplicates(['latitude','longitude'], keep = 'first', inplace=True)
nb_power_plant_duplicates = (centrale_all_for_merge_2016.shape[0] + coal_plant_for_merge_2016.shape[0] + cities_merged.shape[0] + sandbag_for_merge.shape[0]) - CO2_emissions_peaks_merged_2016.shape[0]
print("There are", nb_power_plant_duplicates, "duplicates in 2016")
2017
CO2_emissions_peaks_merged_2017 = pd.concat([cities_merged, centrale_all_for_merge_2017, coal_plant_for_merge_2017, sandbag_for_merge])
CO2_emissions_peaks_merged_2017.drop_duplicates(['latitude','longitude'], keep = 'first', inplace=True)
nb_power_plant_duplicates = (centrale_all_for_merge_2017.shape[0] + coal_plant_for_merge_2017.shape[0] + cities_merged.shape[0] + sandbag_for_merge.shape[0]) - CO2_emissions_peaks_merged_2017.shape[0]
print("There are", nb_power_plant_duplicates, "duplicates in 2017")
2018
CO2_emissions_peaks_merged_2018 = pd.concat([cities_merged, centrale_all_for_merge_2018, coal_plant_for_merge_2018, sandbag_for_merge])
CO2_emissions_peaks_merged_2018.drop_duplicates(['latitude','longitude'], keep = 'first', inplace=True)
nb_power_plant_duplicates = (centrale_all_for_merge_2018.shape[0] + coal_plant_for_merge_2018.shape[0] + cities_merged.shape[0] + sandbag_for_merge.shape[0]) - CO2_emissions_peaks_merged_2018.shape[0]
print("There are", nb_power_plant_duplicates, "duplicates in 2018")
2019
CO2_emissions_peaks_merged_2019 = pd.concat([cities_merged, centrale_all_for_merge_2019, coal_plant_for_merge_2019, sandbag_for_merge])
CO2_emissions_peaks_merged_2019.drop_duplicates(['latitude','longitude'], keep = 'first', inplace=True)
nb_power_plant_duplicates = (centrale_all_for_merge_2019.shape[0] + coal_plant_for_merge_2019.shape[0] + cities_merged.shape[0] + sandbag_for_merge.shape[0]) - CO2_emissions_peaks_merged_2019.shape[0]
print("There are", nb_power_plant_duplicates, "duplicates in 2019")
CO2_emissions_peaks_merged_2019
CO2_emissions_peaks_merged_2016.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_peaks_merged_2016.csv")
CO2_emissions_peaks_merged_2017.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_peaks_merged_2017.csv")
CO2_emissions_peaks_merged_2018.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_peaks_merged_2018.csv")
CO2_emissions_peaks_merged_2019.to_csv("/content/drive/My Drive/Data For Good/Outputs/CO2_emissions_peaks_merged_2019.csv")