MERRA2 Analysis Process#

This Jupyter notebook provides a brief overview of how to use the geodata package to download MERRA2 climate data, create geographic-temporal subsets called cutouts, and use those cutouts to generate standalone datasets for separate analysis.

The following guide assumes you have installed and configured geodata and all required dependencies.

Step 1 - Setup#

Import the package first.

import geodata

geodata reports its progress through loggers from Python's logging library, so configure a logger before running anything else to see what the package is doing. The example below uses the INFO level; for debugging, switch to the more verbose level=logging.DEBUG.

import logging

logging.basicConfig(level=logging.INFO)
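As a sketch of a more targeted debugging setup, the snippet below keeps the root logger at INFO and raises only the package's own logger to DEBUG. The logger name "geodata" is an assumption (it presumes the package names its loggers after its modules); if the extra messages do not appear, fall back to level=logging.DEBUG on the root logger.

```python
import logging

# Root logger stays at INFO so other libraries remain quiet.
# force=True (Python 3.8+) replaces handlers configured earlier in the session.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

# Assumption: geodata's loggers live under the "geodata" namespace.
geodata_logger = logging.getLogger("geodata")
geodata_logger.setLevel(logging.DEBUG)
```

Child loggers such as "geodata.cutout" (a hypothetical name) inherit the DEBUG level from their parent unless they set their own.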

Step 2 - Download#

Assuming you have previously created an Earthdata Login profile and approved the GES DISC app, you can download MERRA2 data from the source as follows.

First, define a dataset object for the data you wish to download:

DS = geodata.Dataset(
    module="merra2",
    weather_data_config="surface_flux_monthly",
    years=slice(2010, 2010),
    months=slice(1, 7),
)

Use the code block below to begin the download.

When a dataset object is created, geodata checks whether the specified data has already been downloaded by looking for MERRA2 data files in the merra2 directory configured in src/geodata/config.py. Downloaded data is placed into subdirectories by year and then, for daily files, by month (i.e. 2011/01, 2011/02, 2012/01, etc.). Monthly files are simply placed in the month’s folder. If downloaded data is found, the prepared attribute is set to True when the dataset object is created.

Accordingly, the snippet below only starts a download when the data is not already present in the correct subdirectories.

if not DS.prepared:
    DS.get_data()
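To picture the check described above, here is a minimal sketch of how an "already downloaded" test over year/month subdirectories could look. It is not geodata's implementation: the helper names expected_dirs and looks_prepared, the placeholder file name, and the rule "every expected folder holds at least one file" are assumptions made for illustration, and only the daily-file layout (year/month) is modeled.

```python
from pathlib import Path
import tempfile


def expected_dirs(root, years, months):
    """List the year/month folders a daily download would occupy.

    Both slices are treated as inclusive, matching slice(2010, 2010)
    meaning "the year 2010" in the Dataset call above.
    """
    return [
        Path(root) / f"{year}" / f"{month:02d}"
        for year in range(years.start, years.stop + 1)
        for month in range(months.start, months.stop + 1)
    ]


def looks_prepared(root, years, months):
    """Hypothetical check: every expected folder exists and is non-empty."""
    return all(
        folder.is_dir() and any(folder.iterdir())
        for folder in expected_dirs(root, years, months)
    )


# Demonstration in a throwaway directory standing in for the merra2 folder.
with tempfile.TemporaryDirectory() as root:
    years, months = slice(2010, 2010), slice(1, 2)
    before = looks_prepared(root, years, months)
    for folder in expected_dirs(root, years, months):
        folder.mkdir(parents=True)
        (folder / "placeholder.nc4").touch()  # stand-in for a real data file
    after = looks_prepared(root, years, months)

print(before, after)  # False True
```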

Finally, in order to use the downloaded MERRA2 data with geodata, run:

DS.trim_variables()

trim_variables() subsets and resaves the downloaded files so that only those variables needed to generate geodata outputs are kept.

Step 3 - Create Cutout#

A cutout is a subset of downloaded data based on specified time periods and geographic coordinates. Cutouts are saved to the cutout directory specified in src/geodata/config.py and can be used to generate multiple outputs.

Note (04/02/2020): There is a known issue with MERRA2-based cutouts where running cutout.prepare(overwrite=True) on an existing cutout prevents the cutout from being used to generate outputs. A workaround is to manually delete the problem cutout and recreate it from scratch. A fix is planned pending investigation.
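The manual deletion can be scripted. The sketch below assumes that a cutout lives in a folder named after it inside the cutout directory, as the description of the name parameter in this guide indicates. remove_cutout is a hypothetical helper, not part of geodata, and in real use cutout_dir must point at the cutout directory configured in src/geodata/config.py. Here it is demonstrated on a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cutout(cutout_dir, name):
    """Delete the folder <cutout_dir>/<name>; return True if something was removed."""
    target = Path(cutout_dir) / name
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration on a temporary stand-in for the configured cutout directory.
with tempfile.TemporaryDirectory() as cutout_dir:
    stale = Path(cutout_dir) / "tokyo-2010-test"
    stale.mkdir()
    (stale / "201007.nc").touch()  # placeholder file, not real cutout data
    removed = remove_cutout(cutout_dir, "tokyo-2010-test")
    removed_again = remove_cutout(cutout_dir, "tokyo-2010-test")
    still_there = stale.exists()
```

After the folder is gone, define the cutout again and call prepare() without overwrite=True.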

To create a cutout, run the following:

cutout = geodata.Cutout(
    name="tokyo-2010-test",
    module="merra2",
    weather_data_config="surface_flux_monthly",
    xs=slice(138.5, 139.5),
    ys=slice(35, 36),
    years=slice(2010, 2010),
    months=slice(7, 7),
)
cutout.prepare()

The above code creates a cutout for July 2010 for a geographic area roughly corresponding to the Tokyo metropolitan area. Walking through the parameters:

  • name will be the name of the directory created in the cutouts folder where geodata will place the data files corresponding to the cutout.

  • module indicates the source for the data from which the cutout is created.

  • weather_data_config indicates the specific dataset from the source. For MERRA2, the available options are surface_flux_hourly and surface_flux_monthly.

  • Use xs=slice() and ys=slice() to define a geographical range for the cutout.

  • Use years=slice() and months=slice() to define a temporal range for the cutout. Naturally, the indicated time range must be present within the source data.
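To make the slice semantics concrete, the sketch below expands the temporal slices and tests whether a point falls inside the spatial bounds. It assumes both ends of each slice are inclusive, which is how slice(2010, 2010) and slice(7, 7) select July 2010 in the example above, and that xs holds longitudes and ys latitudes. The helpers expand and in_bounds are illustrative names, not part of geodata.

```python
def expand(bounds):
    """Expand an inclusive integer slice, e.g. slice(1, 7) -> [1, ..., 7]."""
    return list(range(bounds.start, bounds.stop + 1))


def in_bounds(x, y, xs, ys):
    """True if the point (x = longitude, y = latitude) lies inside the box."""
    return xs.start <= x <= xs.stop and ys.start <= y <= ys.stop


xs, ys = slice(138.5, 139.5), slice(35, 36)
periods = [
    (year, month)
    for year in expand(slice(2010, 2010))
    for month in expand(slice(7, 7))
]

print(periods)  # [(2010, 7)]
print(in_bounds(139.0, 35.5, xs, ys))  # True: a point inside the cutout box
print(in_bounds(135.5, 34.7, xs, ys))  # False: a point west and south of the box
```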

geodata.Cutout() only defines the cutout object in memory. To actually create the cutout files, run prepare(). As with get_data(), prepare() first checks whether a cutout with the same name has already been created, and exits the creation process if one exists. To override this behavior and force a recalculation of the cutout, run prepare(overwrite=True).

To verify the results of the cutout, you can print some attributes to the console as follows.

Basic information:

cutout

Name:

cutout.name

Coordinates:

cutout.coords

All metadata:

cutout.meta

Information about the variable config used to download the data:

cutout.dataset_module.weather_data_config

For MERRA2, you can confirm which variables were downloaded as follows:

cutout.dataset_module.weather_data_config["surface_flux_monthly"]["variables"]

Step 4 - Generate Outputs#

geodata currently supports the following wind outputs using MERRA2 surface flux diagnostic data.

  • Wind generation time-series (wind)

  • Wind speed time-series (windspd)

  • Wind power density time-series (windwpd)

Wind Generation Time-series#

Convert wind speeds to wind energy generation for a given turbine using the following code:

ds_wind = geodata.convert.wind(cutout, turbine="Suzlon_S82_1.5_MW", smooth=True, var_height="lml")

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • turbine - string or dict - Name of a turbine known by the reatlas client, or a turbineconfig dictionary with the key ‘hub_height’ for the hub height and the keys ‘V’ and ‘POW’ defining the power curve. For the currently supported turbines, see geodata's list of turbines.

  • smooth - bool or dict - If True, smooth the power curve with a Gaussian kernel using the values determined for the Danish wind fleet, Delta_v = 1.27 and sigma = 2.29. Pass a dict to tune these values.
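One reading of the smooth option is a convolution of the power curve with a normal kernel whose mean is Delta_v and whose standard deviation is sigma. The sketch below implements that reading in plain Python. It is not taken from geodata's source, the helper names are hypothetical, and the power curve values are invented purely for illustration.

```python
import math


def gaussian_kernel(v0, delta_v=1.27, sigma=2.29):
    """Normal density centred on delta_v with standard deviation sigma."""
    norm = math.sqrt(2 * math.pi * sigma**2)
    return math.exp(-((v0 - delta_v) ** 2) / (2 * sigma**2)) / norm


def interpolate(v, speeds, powers):
    """Linear interpolation of the power curve; zero outside its speed range.

    Assumes speeds is strictly increasing.
    """
    if v < speeds[0] or v > speeds[-1]:
        return 0.0
    points = list(zip(speeds, powers))
    for (v_lo, p_lo), (v_hi, p_hi) in zip(points, points[1:]):
        if v_lo <= v <= v_hi:
            return p_lo + (p_hi - p_lo) * (v - v_lo) / (v_hi - v_lo)
    return 0.0


def smooth_power_curve(speeds, powers, delta_v=1.27, sigma=2.29, step=0.1):
    """Convolve the power curve with the Gaussian kernel via a Riemann sum."""
    half_width = 5 * sigma  # the kernel is negligible beyond five sigma
    n = int(round(2 * half_width / step))
    offsets = [delta_v - half_width + i * step for i in range(n + 1)]
    return [
        sum(
            interpolate(v - v0, speeds, powers)
            * gaussian_kernel(v0, delta_v, sigma)
            * step
            for v0 in offsets
        )
        for v in speeds
    ]


# Made-up, idealised power curve (wind speed in m/s, power in MW).
speeds = [0, 3, 6, 9, 12, 25]
powers = [0.0, 0.0, 0.4, 1.2, 1.5, 1.5]
smoothed = smooth_power_curve(speeds, powers)
```

The smoothed curve produces some power below the original cut-in speed and less than rated power near the top of the curve, which is the qualitative effect one would expect from averaging over a fleet of turbines.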

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_wind

To convert this array to a more conventional dataframe, run:

df_wind = ds_wind.to_dataframe(name="wind")

which converts the xarray dataset into a pandas dataframe:

df_wind

To output the data to a csv for separate analysis:

df_wind.to_csv("merra2_wind_data.csv")

Wind Speed Time-series#

Extract wind speeds at a given height (m/s):

ds_windspd = geodata.convert.windspd(cutout, turbine="Vestas_V66_1750kW", var_height="lml")

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • **params - Must include one of the following:

    • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys ‘hub_height’ for the hub height and ‘V’, ‘POW’ defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

    • hub-height - num - Extrapolation height (m)

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_windspd

To convert this array to a more conventional dataframe, run:

df_windspd = ds_windspd.to_dataframe(name="windspd")

which converts the xarray dataset into a pandas dataframe:

df_windspd

To output the data to a csv for separate analysis:

df_windspd.to_csv("merra2_windspd_data.csv")

Wind Power Density Time-series#

Extract wind power density at a given height, according to: WPD = 0.5 * Density * Windspd^3

ds_windwpd = geodata.convert.windwpd(cutout, turbine="Vestas_V66_1750kW", var_height="lml")
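As a worked example of the formula above, the sketch below evaluates WPD for a single pair of values. The function name wind_power_density is hypothetical, and the inputs are assumptions chosen for illustration: 1.225 kg/m^3 is the standard sea-level air density, and 8 m/s is an arbitrary wind speed.

```python
def wind_power_density(density, wind_speed):
    """Wind power density in W/m^2 from air density (kg/m^3) and wind speed (m/s)."""
    return 0.5 * density * wind_speed**3


# Assumed inputs: standard sea-level air density and an arbitrary example speed.
wpd = wind_power_density(1.225, 8.0)
print(round(wpd, 1))  # 313.6
```

Because the speed enters cubed, doubling the wind speed at the same density multiplies the power density by eight.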

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • **params - Must include one of the following:

    • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys ‘hub_height’ for the hub height and ‘V’, ‘POW’ defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

    • hub-height - num - Extrapolation height (m)

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

ds_windwpd

To convert this array to a more conventional dataframe, run:

df_windwpd = ds_windwpd.to_dataframe(name="windwpd")

which converts the xarray dataset into a pandas dataframe:

df_windwpd

To output the data to a csv for separate analysis:

df_windwpd.to_csv("merra2_windwpd_data.csv")