Predicting the eolian energy production#

This notebook aims to predict the energy produced by wind turbines.

It uses weather data extracted from the MeteoFrance numerical models, together with historical production data provided by RTE.

[1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[2]:

filename_wind_regions = "../../data/silver/mean_daily_wind_j0.csv"

filename_energy_preduction = "../../data/rte_agg_daily_2014_2024.csv"

[3]:

df_ssrd_regions = pd.read_csv(filename_wind_regions, parse_dates=["time"]).set_index(
    "time"
)
# sanitise the column names
region_names = [
    col.replace(" ", "_").replace("'", "_").replace("-", "_").lower()
    for col in df_ssrd_regions.columns
]
df_ssrd_regions.columns = region_names
region_names = df_ssrd_regions.columns
df_ssrd_regions.plot(figsize=(15, 10))
df_ssrd_regions["days_from_start"] = [
    (date - df_ssrd_regions.index[0]).days for date in df_ssrd_regions.index
]
df_ssrd_regions.head()

[3]:

	auvergne_rhône_alpes	bourgogne_franche_comté	bretagne	centre_val_de_loire	corse	grand_est	hauts_de_france	normandie	nouvelle_aquitaine	occitanie	pays_de_la_loire	provence_alpes_côte_d_azur	île_de_france	days_from_start
time
2022-02-01	4.205852	4.060221	5.199642	4.778342	3.561165	5.454614	6.432882	6.416779	3.152589	6.048412	4.748808	5.744050	4.977423	0
2022-02-02	3.156469	3.178754	3.410363	3.398772	2.383473	4.102714	4.987001	4.406740	2.294659	4.454043	3.110102	5.584663	3.663799	1
2022-02-03	2.234994	2.016168	3.373047	2.801909	2.217947	2.946621	4.726071	4.218961	2.206917	2.502523	3.111282	2.187303	3.396305	2
2022-02-04	2.453711	3.737635	4.975356	4.658530	2.061253	5.098637	6.986221	6.097625	3.139985	2.935683	4.232595	2.280711	5.306362	3
2022-02-05	2.609659	2.231746	4.131737	3.149658	1.859197	3.742508	5.961911	5.229400	2.279602	3.726291	3.058013	3.321954	3.946324	4

../../_images/user_guide_notebooks_2_eolen_prediction_3_1.png
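
The column-name clean-up in cell [3] matters later, because the region names are pasted directly into a model formula, where spaces, apostrophes and hyphens would be misread. Here is a minimal sketch of the same replacement chain applied to one name. The helper `sanitise_name` and the input string are illustrative and do not appear in the notebook.

```python
def sanitise_name(name: str) -> str:
    """Apply the same replacements as the notebook's list comprehension."""
    # Spaces, apostrophes and hyphens become underscores, then everything is lower-cased.
    return name.replace(" ", "_").replace("'", "_").replace("-", "_").lower()


# Hypothetical raw column name, written the way a region is usually spelled.
raw_name = "Provence-Alpes-Côte d'Azur"
clean_name = sanitise_name(raw_name)
print(clean_name)
```

Accented letters are left untouched, which is consistent with the column names shown in the output above.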

[4]:

df_energy_preduction = pd.read_csv(filename_energy_preduction, index_col=0)[
    ["Eolien", "Solaire"]
]
df_energy_preduction.index = pd.to_datetime(df_energy_preduction.index)
df_energy_preduction.head(), df_energy_preduction.tail()

[4]:

(              Eolien  Solaire
 Date
 2015-01-01   51127.0  11370.5
 2015-01-02   78933.0   8297.5
 2015-01-03  105299.0   5860.5
 2015-01-04   30061.0   6926.0
 2015-01-05   16004.0   9786.5,
               Eolien  Solaire
 Date
 2024-04-04  285321.0  76581.5
 2024-04-05  232208.5  72847.5
 2024-04-06  225106.0  61577.5
 2024-04-07  138049.5  46718.5
 2024-04-08   53990.0  26677.0)

[5]:

df_energy_preduction.index

[5]:

DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
               '2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08',
               '2015-01-09', '2015-01-10',
               ...
               '2024-03-30', '2024-03-31', '2024-04-01', '2024-04-02',
               '2024-04-03', '2024-04-04', '2024-04-05', '2024-04-06',
               '2024-04-07', '2024-04-08'],
              dtype='datetime64[ns]', name='Date', length=3386, freq=None)

[6]:

# align the indexes of the two dataframes
data = pd.concat([df_ssrd_regions, df_energy_preduction], join="inner", axis=1)
data.head()

[6]:

	auvergne_rhône_alpes	bourgogne_franche_comté	bretagne	centre_val_de_loire	corse	grand_est	hauts_de_france	normandie	nouvelle_aquitaine	occitanie	pays_de_la_loire	provence_alpes_côte_d_azur	île_de_france	days_from_start	Eolien	Solaire
2022-02-01	4.205852	4.060221	5.199642	4.778342	3.561165	5.454614	6.432882	6.416779	3.152589	6.048412	4.748808	5.744050	4.977423	0	227954.0	21938.5
2022-02-02	3.156469	3.178754	3.410363	3.398772	2.383473	4.102714	4.987001	4.406740	2.294659	4.454043	3.110102	5.584663	3.663799	1	138768.0	21271.0
2022-02-03	2.234994	2.016168	3.373047	2.801909	2.217947	2.946621	4.726071	4.218961	2.206917	2.502523	3.111282	2.187303	3.396305	2	63557.5	20527.5
2022-02-04	2.453711	3.737635	4.975356	4.658530	2.061253	5.098637	6.986221	6.097625	3.139985	2.935683	4.232595	2.280711	5.306362	3	178764.0	19051.0
2022-02-05	2.609659	2.231746	4.131737	3.149658	1.859197	3.742508	5.961911	5.229400	2.279602	3.726291	3.058013	3.321954	3.946324	4	145138.0	41271.5
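
The alignment step relies on `pd.concat(..., join="inner", axis=1)` keeping only the dates present in both frames, which is why the merged table starts on 2022-02-01 even though the production history goes back to 2015. The sketch below shows that behaviour on toy frames. The values are made up for illustration and are not taken from the datasets.

```python
import pandas as pd

# Toy wind frame covering three days (values are illustrative only).
wind = pd.DataFrame(
    {"bretagne": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2022-02-01", "2022-02-02", "2022-02-03"]),
)
# Toy production frame that starts one day earlier and ends one day earlier.
energy = pd.DataFrame(
    {"Eolien": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2022-01-31", "2022-02-01", "2022-02-02"]),
)

# join="inner" keeps only the index labels shared by both frames.
aligned = pd.concat([wind, energy], join="inner", axis=1)
print(aligned)
```

Only the two shared dates survive, so no row of the merged frame contains missing values introduced by the join.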

[8]:

from statsmodels.formula.api import ols

# split test for time series
from sklearn.model_selection import TimeSeriesSplit

Modeling#

Four models are tested:

1. Total (mean) wind speed only, with no regional detail
2. Regional wind speeds only
3. Total wind speed + time trend
4. Regional wind speeds + time trend

[9]:

exo_vars = region_names
data["mean_wind"] = data[exo_vars].mean(axis=1)
endog_var = "Eolien"

[10]:

tscv = TimeSeriesSplit(n_splits=60, test_size=1)  # 60 folds, each testing a one-day forecast
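
To make the splitting scheme concrete, here is a small sketch of what `TimeSeriesSplit` yields when `test_size=1`. It uses a toy array of 10 "days" and 3 splits instead of the notebook's 60. The names `toy_days`, `splitter` and `folds` are illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

toy_days = np.arange(10)  # stand-in for 10 consecutive daily rows
splitter = TimeSeriesSplit(n_splits=3, test_size=1)

# Each fold trains on every day before the test day (expanding window).
folds = [(train.tolist(), test.tolist()) for train, test in splitter.split(toy_days)]
for train, test in folds:
    print(f"train on days 0..{train[-1]}, test on day {test[0]}")
```

The test days are the last `n_splits` rows of the data, and the training window grows by one day per fold, so no fold ever trains on data that comes after its test day.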

[11]:

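def test_model(formula="Eolien ~ mean_wind -1"):
    """Fit `formula` on each training window of `tscv` and score the held-out day.

    Relies on the module-level `data` and `tscv`. Returns the list of per-fold
    MAPE values, plus the test index and fitted model of the first and last folds.
    """
    mod_1_mape = []
    for i, (train_index, test_index) in enumerate(tscv.split(data)):
        # fit on past data only, then predict the held-out rows
        model_1 = ols(formula, data=data.iloc[train_index]).fit()
        if i == 0:
            first_test_index = test_index
            first_model_1 = model_1
        predictions = model_1.predict(data.iloc[test_index])
        error = data.iloc[test_index]["Eolien"] - predictions
        # absolute percentage error, averaged over the rows of the test fold
        mape = (error.abs() / data.iloc[test_index]["Eolien"]).mean()
        mod_1_mape.append(mape)
    # after the loop, these variables hold the values of the last fold
    last_test_index = test_index
    last_model_1 = model_1
    return mod_1_mape, first_test_index, first_model_1, last_test_index, last_model_1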


formula_1 = "Eolien ~ mean_wind -1"
mod_1_mape, first_test_index, first_model_1, last_test_index, last_model_1 = test_model(
    formula=formula_1
)
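
The score computed inside `test_model` is the mean absolute percentage error (MAPE), that is, the absolute error divided by the true value, averaged over the test rows. The sketch below works through it on three made-up values. The `mape` helper is illustrative and not part of the notebook.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.10 means 10%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Per-row absolute error relative to the true value, then the plain average.
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only: relative errors of 10%, 10% and 0%.
actual = [100.0, 200.0, 50.0]
predicted = [110.0, 180.0, 50.0]
score = mape(actual, predicted)
print(f"{score:.2%}")
```

With `test_size=1`, each fold contains a single day, so each entry of the returned list is simply that day's absolute percentage error.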

[12]:

ax = data.plot(y="Eolien", label="True")
first_model_1.predict(data.iloc[first_test_index]).plot(
    ax=ax, label="First Test Predicted"
)
last_model_1.predict(data.iloc[last_test_index]).plot(
    ax=ax, label="Last Test Predicted"
)
ax.legend()

[12]:

<matplotlib.legend.Legend at 0x705b33af1a80>

../../_images/user_guide_notebooks_2_eolen_prediction_12_1.png

[13]:

fig, ax = plt.subplots()
ax.hist(mod_1_mape, bins=20)
ax.set_title("MAPE distribution for model 1")
ax.set_xlabel("MAPE")

[13]:

Text(0.5, 0, 'MAPE')

../../_images/user_guide_notebooks_2_eolen_prediction_13_1.png

[14]:

formula_2 = f"Eolien ~ {' + '.join(exo_vars)} -1"
print(formula_2)
mod_2_mape, first_test_index, first_model_2, last_test_index, last_model_2 = test_model(
    formula_2
)

Eolien ~ auvergne_rhône_alpes + bourgogne_franche_comté + bretagne + centre_val_de_loire + corse + grand_est + hauts_de_france + normandie + nouvelle_aquitaine + occitanie + pays_de_la_loire + provence_alpes_côte_d_azur + île_de_france -1

[15]:

fig, ax = plt.subplots()
ax.hist(mod_2_mape, bins=20)
ax.set_title("MAPE distribution for model 2")

[15]:

Text(0.5, 1.0, 'MAPE distribution for model 2')

../../_images/user_guide_notebooks_2_eolen_prediction_15_1.png

[16]:

formula_3 = formula_1 + " + days_from_start"
mod_3_mape, first_test_index, first_model_3, last_test_index, last_model_3 = test_model(
    formula_3
)
formula_4 = formula_2 + " + days_from_start"
mod_4_mape, first_test_index, first_model_4, last_test_index, last_model_4 = test_model(
    formula_4
)

[17]:

# display the MAPE distribution for all models (KDE)
fig, ax = plt.subplots()
for i, mape in enumerate([mod_1_mape, mod_2_mape, mod_3_mape, mod_4_mape]):
    pd.Series(mape).plot.kde(ax=ax, label=f"Model {i+1}")
ax.set_title("MAPE distribution for all models")
ax.legend()

[17]:

<matplotlib.legend.Legend at 0x705b1e782e00>

../../_images/user_guide_notebooks_2_eolen_prediction_17_1.png

[18]:

# print mean MAPE for all models
for i, mape in enumerate([mod_1_mape, mod_2_mape, mod_3_mape, mod_4_mape]):
    print(f"Model {i+1} mean MAPE: {np.mean(mape):.2%}")

Model 1 mean MAPE: 27.37%
Model 2 mean MAPE: 18.63%
Model 3 mean MAPE: 24.38%
Model 4 mean MAPE: 19.03%
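
The printed figures can be compared directly. The short sketch below copies the four mean MAPE values from the output above, finds the lowest one, and measures the effect of adding `days_from_start` to each base model. The dictionary and variable names are illustrative.

```python
# Mean MAPE values copied from the printed output above.
mean_mape = {
    "Model 1": 0.2737,  # mean wind only
    "Model 2": 0.1863,  # regional wind only
    "Model 3": 0.2438,  # mean wind + time trend
    "Model 4": 0.1903,  # regional wind + time trend
}

best_model = min(mean_mape, key=mean_mape.get)

# Change in mean MAPE when the time trend is added (negative means improvement).
trend_effect_global = mean_mape["Model 3"] - mean_mape["Model 1"]
trend_effect_regional = mean_mape["Model 4"] - mean_mape["Model 2"]

print(f"Lowest mean MAPE: {best_model} ({mean_mape[best_model]:.2%})")
print(f"Time trend effect, mean-wind model: {trend_effect_global:+.2%}")
print(f"Time trend effect, regional model: {trend_effect_regional:+.2%}")
```

These are differences of mean values over 60 single-day folds, so small gaps such as the one between models 2 and 4 should be read with caution.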

Conclusion#

In contrast with the photovoltaic power prediction, the wind results are more consistent with expectations:

using regional wind features is better than using the global mean wind speed (even with the time trend added to the global value)
adding the time trend improves the global model (mean MAPE from 27.37% to 24.38%) but not the regional one (18.63% to 19.03%)

The mean performance of model 4 (19.03% error) is quite good, and model 2 does slightly better (18.63% error).

[19]:

data[["Eolien", "Solaire"]].mean()

[19]:

Eolien     121818.205577
Solaire     55192.725032
dtype: float64

As mean wind production is roughly twice the mean solar production over this period, the accuracy of the wind energy prediction model matters more for the overall performance of the project.
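
The ratio can be checked from the two means printed in cell [19]. The sketch below copies them by hand and also derives the share of wind in the combined mean production. The variable names are illustrative.

```python
# Mean daily production values copied from the output of cell [19].
mean_eolien = 121818.205577
mean_solaire = 55192.725032

ratio = mean_eolien / mean_solaire
wind_share = mean_eolien / (mean_eolien + mean_solaire)

print(f"wind / solar ratio: {ratio:.2f}")
print(f"wind share of combined production: {wind_share:.1%}")
```

The ratio comes out slightly above 2, so a given relative error on the wind model translates into roughly twice the absolute error of the same relative error on the solar model.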

[20]:

last_model_2.params.to_csv("wind_model_2_params.csv")
