ERA5 Analysis Process

This Jupyter notebook provides a brief overview of how to use the geodata package to download ERA5 data from the Copernicus Climate Data Store (CDS), create geographic-temporal subsets called cutouts, and use those cutouts to generate standalone datasets for separate analysis.

The following guide assumes you have installed and configured geodata and all required dependencies.

Step 1 - Setup

Import the package first.

import geodata

Notifications in geodata are implemented using loggers from the logging library. Configure logging at the start of every session so that geodata can report what it is doing. The example below uses level=logging.INFO; for debugging, switch to the more verbose level=logging.DEBUG:

import logging

logging.basicConfig(level=logging.INFO)
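For finer control, the standard logging module also lets you set a message format and raise the verbosity of a single package. The sketch below assumes geodata's loggers are registered under the name "geodata" (the usual result when a package calls logging.getLogger(__name__)); that logger name is an assumption, not something this guide states.

```python
import logging

# Root logger: INFO and above, with timestamps and the emitting logger's name.
# force=True (Python 3.8+) replaces any handlers a notebook has already installed.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

# Assumption: geodata's loggers are named "geodata" or "geodata.<module>".
geodata_logger = logging.getLogger("geodata")
geodata_logger.setLevel(logging.DEBUG)
```

With this configuration, other libraries stay at INFO while messages from the assumed "geodata" loggers are shown down to DEBUG.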

Step 2 - Download Data

Assuming you have previously created a CDS account and set up the CDS API credentials, you can download ERA5 data from the CDS API as follows.

First, define a dataset object for the data you wish to download:

## For ERA5, pass geographic bounds in array as follows:
## bounds = [North, West, South, East]
## Omitting bounds will default to global file of 20+ GB per month
DS = geodata.Dataset(
    module="era5",
    weather_data_config="wind_solar_hourly",
    years=slice(2005, 2005),
    months=slice(1, 2),
    bounds=[50, -3, 45, 3],
)

  • Use module to specify the data source. In this example, it is "era5".

  • Use weather_data_config to specify the dataset. In this example, hourly data is used, as specified by the "wind_solar_hourly" value.

  • Use years=slice() and months=slice() to specify the years and months for download. In each parameter, the first value indicates the start period, and the second value the end period.

  • Use bounds to specify the geographic bounds to which you wish to limit your download data. bounds should be set as follows: bounds = [North, West, South, East]. Omitting bounds will default to downloading a global file of 20+ GB per month.
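Because the order [North, West, South, East] is easy to get wrong, a small sanity check before creating the dataset object can help. The helper below is hypothetical (check_bounds is not part of geodata) and assumes longitudes are given in the -180 to 180 range, as in the example above.

```python
def check_bounds(bounds):
    """Validate a [North, West, South, East] list given in decimal degrees.

    Hypothetical helper for illustration; not a geodata function.
    """
    if len(bounds) != 4:
        raise ValueError("bounds needs exactly four values: [North, West, South, East]")
    north, west, south, east = bounds
    if not (-90 <= south < north <= 90):
        raise ValueError("expected -90 <= South < North <= 90")
    if not (-180 <= west < east <= 180):
        raise ValueError("expected -180 <= West < East <= 180")
    return {"north": north, "west": west, "south": south, "east": east}


# The bounds used in this guide pass the check.
box = check_bounds([50, -3, 45, 3])
```

Passing the values in a different order, for example [45, -3, 50, 3], raises a ValueError instead of silently requesting the wrong region.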

When a dataset object is created, geodata checks whether the specified data has already been downloaded by looking for ERA5 data files in the era5 directory configured in src/geodata/config.py. Downloaded data is placed into subdirectories by year and then, for daily files, by month (i.e. 2011/01, 2011/02, 2012/01, etc.). Monthly files are simply placed in the month’s folder. If downloaded data is found, the prepared attribute is set to True when the dataset object is declared.

Accordingly, the snippet below begins the download only if the data is not already present in the correct subdirectories, which saves you from accidentally redownloading it.

if not DS.prepared:
    DS.get_data()
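To make the year/month layout concrete, the sketch below expands a pair of slices into the periods they cover and the subdirectory names that would correspond to them. It assumes both ends of each slice are inclusive, which is consistent with slice(2005, 2005) selecting a single year above; expand_periods is a hypothetical helper, not a geodata function.

```python
def expand_periods(years, months):
    """List (year, month) pairs, treating both slice ends as inclusive (assumption)."""
    return [
        (year, month)
        for year in range(years.start, years.stop + 1)
        for month in range(months.start, months.stop + 1)
    ]


# Same values as the dataset definition in this step.
periods = expand_periods(slice(2005, 2005), slice(1, 2))

# Year/month subdirectory names in the style described above.
subdirs = [f"{year}/{month:02d}" for year, month in periods]
print(subdirs)  # ['2005/01', '2005/02']
```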

Finally, in order to use the downloaded ERA5 data with geodata, run:

DS.trim_variables()

trim_variables() subsets and resaves the downloaded files so that only those variables needed to generate geodata outputs are kept.

Step 3 - Create Cutout

A cutout is a subset of downloaded data based on specified time periods and geographic coordinates. Cutouts are saved to the cutout directory specified in src/geodata/config.py and can be used to generate multiple outputs.

cutout = geodata.Cutout(
    name="era5-europe-test-2011-02",
    module="era5",
    weather_data_config="wind_solar_hourly",
    xs=slice(1, 2),
    ys=slice(48, 46),
    years=slice(2005, 2005),
    months=slice(1, 1),
)

The above code creates a cutout for January 2005 for a geographic area corresponding to a portion of Europe. Walking through the parameters:

  • name will be the name of the directory created in the cutouts folder where geodata will place the data files corresponding to the cutout.

  • module indicates the source for the data from which the cutout is created.

  • Use xs=slice() and ys=slice() to define a geographical range for the cutout. These para

  • Use years=slice() and months=slice() to define a temporal range for the cutout.
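As a rough illustration of what the geographic slices cover, the sketch below lists the grid coordinates that fall inside xs=slice(1, 2) and ys=slice(48, 46). It assumes the 0.25° regular latitude-longitude grid on which ERA5 is commonly distributed and inclusive slice ends; grid_points is a hypothetical helper, and geodata's own selection logic may differ.

```python
def grid_points(start, stop, step=0.25):
    """Return grid values from start to stop inclusive, ascending or descending."""
    n = int(round(abs(stop - start) / step))
    direction = 1 if stop >= start else -1
    return [round(start + direction * i * step, 2) for i in range(n + 1)]


lons = grid_points(1, 2)    # west to east
lats = grid_points(48, 46)  # north to south, matching the descending ys slice

print(lons)       # [1.0, 1.25, 1.5, 1.75, 2.0]
print(len(lats))  # 9
```

Under these assumptions the example cutout spans 5 x 9 grid cells per time step.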

geodata.Cutout() only defines the cutout object in memory. To actually create the cutout files, run prepare():

cutout.prepare()

Running cutout.prepare() as above will create the cutout by downloading and then subsetting the ERA5 data. Accordingly, the above code block could take a while to finish processing.

prepare() first checks whether a cutout has already been created at the specified directory, and exits the download/creation process if one already exists. To override this behavior and force a redownload and recalculation of the cutout, run prepare(overwrite=True).
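The check-then-create behavior can be pictured with a small stand-alone sketch. This is not geodata's implementation: prepare_dir and the placeholder file it writes are hypothetical, and the example works in a temporary directory rather than the configured cutout folder.

```python
import tempfile
from pathlib import Path


def prepare_dir(path, overwrite=False):
    """Create a placeholder cutout directory unless one already exists.

    Returns True when files were (re)created, False when creation was skipped.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        return False  # existing cutout found: skip the work
    path.mkdir(parents=True, exist_ok=True)
    (path / "meta.txt").write_text("placeholder\n")
    return True


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "demo-cutout"
    first = prepare_dir(target)                   # nothing there yet
    second = prepare_dir(target)                  # already exists, skipped
    forced = prepare_dir(target, overwrite=True)  # recreated on request

print(first, second, forced)  # True False True
```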

To verify the results of the cutout, you can print some attributes to the console as follows.

Basic information:

cutout

Name:

cutout.name

Coordinates:

cutout.coords

All metadata:

cutout.meta

Step 4 - Generate Outputs

geodata currently supports the following outputs using ERA5 data from the Copernicus Climate Data Store (CDS).

Wind

  • Wind generation time-series (wind)

  • Wind speed time-series (windspd)

Solar

  • Solar photovoltaic generation time-series (pv)

Wind Generation Time-series

Convert wind speeds to wind energy generation for a given turbine using the following code:

ds_wind = geodata.convert.wind(cutout, turbine="Suzlon_S82_1.5_MW", smooth=True)

Going over the parameters:

  • cutout - geodata.Cutout - A cutout created by geodata.Cutout()

  • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

  • smooth - bool or dict - If True, smooth the power curve with a Gaussian kernel as determined for the Danish wind fleet, with Delta_v = 1.27 and sigma = 2.29. Passing a dict lets you tune these values.

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').
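To give a feel for what smoothing a power curve with a Gaussian kernel does, the sketch below convolves a toy power curve with a kernel centered at Delta_v = 1.27 with sigma = 2.29. It is a generic numerical convolution written for illustration, not geodata's implementation; the toy curve (cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s) and all function names are hypothetical.

```python
import math

DELTA_V = 1.27  # kernel offset quoted above
SIGMA = 2.29    # kernel width quoted above


def toy_power_curve(v):
    """Hypothetical normalized power curve, not a real turbine."""
    if v < 3.0 or v > 25.0:
        return 0.0
    if v >= 12.0:
        return 1.0
    return ((v - 3.0) / 9.0) ** 3


def gaussian(x, mean, sigma):
    """Normal probability density evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def smoothed_power(v, dv=0.1, v_max=40.0):
    """Riemann-sum approximation of the convolution of P(u) with kernel(v - u)."""
    steps = round(v_max / dv)
    total = 0.0
    for i in range(steps + 1):
        u = i * dv
        total += toy_power_curve(u) * gaussian(v - u, DELTA_V, SIGMA) * dv
    return total


for v in (2.0, 8.0, 12.0):
    print(v, toy_power_curve(v), smoothed_power(v))
```

The smoothed curve produces some output below the cut-in speed and slightly less than rated output at the rated speed, which is the qualitative effect of blurring the sharp edges of a single-turbine curve.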

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_wind

To convert this array to a more conventional dataframe, run:

df_wind = ds_wind.to_dataframe(name="wind")

which converts the xarray dataset into a pandas dataframe:

df_wind

To output the data to a csv for separate analysis:

df_wind.to_csv("era5_wind_data.csv")
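Once exported, the CSV can be processed with any tool. The sketch below uses only the Python standard library to average values per timestamp. The column names (time, y, x, wind) and all numbers are made-up sample data for illustration; check the header of your own file before reusing this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for an exported file; values are invented.
sample = io.StringIO(
    "time,y,x,wind\n"
    "2005-01-01 00:00:00,48.0,1.0,0.20\n"
    "2005-01-01 00:00:00,48.0,1.25,0.40\n"
    "2005-01-01 01:00:00,48.0,1.0,0.10\n"
    "2005-01-01 01:00:00,48.0,1.25,0.30\n"
)

totals = defaultdict(float)
counts = defaultdict(int)
for row in csv.DictReader(sample):
    totals[row["time"]] += float(row["wind"])
    counts[row["time"]] += 1

# Mean over all grid cells for each timestamp.
hourly_mean = {t: totals[t] / counts[t] for t in totals}
```

To run it against a real export, replace the StringIO object with open("era5_wind_data.csv", newline="").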

Wind Speed Time-series

Extract wind speeds at a given height (m/s):

ds_windspd = geodata.convert.windspd(cutout, turbine="Vestas_V66_1750kW")

Going over the parameters:

  • cutout - geodata.Cutout - A cutout created by geodata.Cutout()

  • **params - Must have 1 of the following:

    • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

    • hub-height - num - Extrapolation height (m)

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').
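Extrapolating a wind speed to hub height is often done with a logarithmic wind profile. The sketch below shows that common technique only; this guide does not state which method geodata uses, and the function name, the 10 m reference height, and the 0.03 m roughness length are assumptions chosen for illustration.

```python
import math


def extrapolate_log(speed_ref, height_ref, height_hub, roughness):
    """Scale a wind speed from height_ref to height_hub with the log wind profile.

    All heights and the surface roughness length are in metres.
    """
    return speed_ref * math.log(height_hub / roughness) / math.log(height_ref / roughness)


# Hypothetical inputs: 6 m/s measured at 10 m, extrapolated to an 80 m hub.
speed_80m = extrapolate_log(speed_ref=6.0, height_ref=10.0, height_hub=80.0, roughness=0.03)
print(round(speed_80m, 2))
```

Under this profile the speed increases with height, and extrapolating to the reference height itself returns the input speed unchanged.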

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_windspd

To convert this array to a more conventional dataframe, run:

df_windspd = ds_windspd.to_dataframe(name="windspd")

which converts the xarray dataset into a pandas dataframe:

df_windspd

To output the data to a csv for separate analysis:

df_windspd.to_csv("era5_windspd_data.csv")

Solar Photovoltaic Generation Time-series

Convert downward-shortwave radiation flux, upward-shortwave radiation flux, and ambient temperature into a PV generation time-series:

ds_pv = geodata.convert.pv(cutout, panel="KANEKA", orientation="latitude_optimal")

Going over the parameters:

  • cutout - geodata.Cutout - A cutout created by geodata.Cutout()

  • panel - string - Specify a solar panel type on which to base the calculation. geodata contains an internal solar panel dictionary with keys defining several solar panel characteristics used for the time-series calculation. For a complete list of included panel types, see the list of panel types here.

  • orientation - str, dict or callback - Panel orientation can be chosen from either latitude_optimal, a constant orientation such as {'slope': 0.0,'azimuth': 0.0}, or a callback function with the same signature as the callbacks generated by the geodata.pv.orientation.make_* functions.

  • (optional) clearsky_model - string or None - Either the simple or the enhanced Reindl clearsky model. The default choice of None will choose depending on data availability, since the enhanced model also incorporates ambient air temperature and relative humidity.

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_pv

To convert this array to a more conventional dataframe, run:

df_pv = ds_pv.to_dataframe(name="pv")

which converts the xarray dataset into a pandas dataframe:

df_pv

To output the data to a csv for separate analysis:

df_pv.to_csv("era5_pv_data.csv")