MERRA2 Analysis Process
This Jupyter notebook provides a brief overview of how to use the geodata package to download MERRA2 climate data, create geographic-temporal subsets called cutouts, and use those cutouts to generate standalone datasets for separate analysis.
The following guide assumes you have installed and configured geodata and all required dependencies.
Step 1 - Setup
Import the package first.
import geodata
Notifications in geodata are implemented using loggers from Python's standard logging library.
It is recommended to always configure logging so you can see what geodata is doing. The example below uses level=logging.INFO; for debugging, you can switch to the more verbose level=logging.DEBUG:
import logging
logging.basicConfig(level=logging.INFO)
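If DEBUG output from every library is too noisy, one option is to keep the root logger at INFO and lower the level of geodata's logger only. This is a minimal sketch that assumes geodata's modules log under the "geodata" logger name (the common logging.getLogger(__name__) convention); if it has no effect, check the logger names that appear in your output.

```python
import logging

# Root logger at INFO: INFO and above from every library is shown.
# force=True (Python 3.8+) replaces handlers a notebook kernel may already
# have installed, which would otherwise make basicConfig a silent no-op.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,
)

# Assumption: geodata logs under the "geodata" logger hierarchy.
# Lowering only this logger's level lets its DEBUG records through; they
# propagate to the root handler, which has no level of its own and emits them.
logging.getLogger("geodata").setLevel(logging.DEBUG)
```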
Step 2 - Download
Assuming you have previously created an Earthdata Login profile and approved the GES DISC app, you can download MERRA2 data from the source as follows.
First, define a dataset object for the data you wish to download:
DS = geodata.Dataset(
    module="merra2",
    weather_data_config="surface_flux_monthly",
    years=slice(2010, 2010),
    months=slice(1, 7),
)
- Use module to specify the data source. In this example, it is "merra2".
- Use weather_data_config to specify the dataset. In this example, it is "surface_flux_monthly", the MERRA2 monthly mean, single-level surface flux diagnostics. To download the MERRA2 hourly, single-level surface flux diagnostics instead, specify weather_data_config="surface_flux_hourly".
- Use years=slice() and months=slice() to specify the years and months to download. In each parameter, the first value is the start period and the second value is the end period; both are inclusive, so years=slice(2010, 2010) selects 2010 only.
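Because both ends of each slice are inclusive, the months covered by a request can be listed directly. The helper below is hypothetical (it is not part of geodata) and assumes that the months range applies within every year of the years range.

```python
from typing import List, Tuple


def months_covered(years: slice, months: slice) -> List[Tuple[int, int]]:
    """Hypothetical helper: list the (year, month) pairs selected by
    inclusive start/end slices such as years=slice(2010, 2010)."""
    # Assumption: the months range is applied to each year in the years range.
    return [
        (year, month)
        for year in range(years.start, years.stop + 1)  # +1: stop is inclusive
        for month in range(months.start, months.stop + 1)
    ]


# The request made with DS above: January through July 2010.
request = months_covered(slice(2010, 2010), slice(1, 7))
print(len(request), request[0], request[-1])  # 7 (2010, 1) (2010, 7)
```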
Use the code block below to begin the download.
When a dataset object is created, geodata checks whether the specified data has already been downloaded by looking for MERRA2 data files in the merra2 directory configured in src/geodata/config.py. Downloaded data is placed into subdirectories by year and then, for daily files, by month (i.e. 2011/01, 2011/02, 2012/01, etc.). Monthly files are simply placed in the year's folder. If downloaded data is found, the dataset's prepared attribute is set to True when the object is created.
Accordingly, the snippet below saves you the trouble of accidentally redownloading data if it is already present in the correct subdirectories.
if not DS.prepared:
    DS.get_data()
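The existence check described above can be pictured as follows for the daily-file layout (year, then month subdirectories). This is a minimal sketch, not geodata's implementation: looks_downloaded is a hypothetical function, the temporary directory stands in for the merra2 path from src/geodata/config.py, and the *.nc4 pattern assumes MERRA2's usual NetCDF4 file extension.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def looks_downloaded(root, year_months: Iterable[Tuple[int, int]], pattern: str = "*.nc4") -> bool:
    """Hypothetical check for the daily-file layout root/YYYY/MM/.

    Returns True only if every requested month directory exists and holds
    at least one file matching `pattern`. A stricter check would also
    confirm that every day of the month is present.
    """
    root = Path(root)
    for year, month in year_months:
        month_dir = root / str(year) / f"{month:02d}"  # e.g. root/2011/01
        if not month_dir.is_dir() or not any(month_dir.glob(pattern)):
            return False
    return True


# Demo on a throwaway directory standing in for the configured merra2 path.
with tempfile.TemporaryDirectory() as tmp:
    wanted = [(2011, 1), (2011, 2)]
    before = looks_downloaded(tmp, wanted)  # False: nothing there yet
    for year, month in wanted:
        month_dir = Path(tmp) / str(year) / f"{month:02d}"
        month_dir.mkdir(parents=True)
        (month_dir / "placeholder.nc4").touch()  # stand-in for a real file
    after = looks_downloaded(tmp, wanted)  # True
```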
Finally, in order to use the downloaded MERRA2 data with geodata, run:
DS.trim_variables()
trim_variables() subsets and resaves the downloaded files so that only those variables needed to generate geodata outputs are kept.
Step 3 - Create Cutout
A cutout is a subset of downloaded data based on specified time periods and geographic coordinates. Cutouts are saved to the cutout directory specified in src/geodata/config.py and can be used to generate multiple outputs.
Note (04/02/2020): MERRA2-based cutouts have a known issue where running cutout.prepare(overwrite=True) on an existing cutout prevents that cutout from being used to generate outputs. As a workaround, manually delete the problem cutout and recreate it from scratch. A fix is planned pending investigation.
To create a cutout, run the following:
cutout = geodata.Cutout(
    name="tokyo-2010-test",
    module="merra2",
    weather_data_config="surface_flux_monthly",
    xs=slice(138.5, 139.5),
    ys=slice(35, 36),
    years=slice(2010, 2010),
    months=slice(7, 7),
)
cutout.prepare()
The above code creates a cutout for July 2010 for a geographic area roughly corresponding to the Tokyo metropolitan area. Walking through the parameters:
- name is the name of the directory that geodata creates in the cutouts folder to hold the cutout's data files.
- module indicates the source of the data from which the cutout is created.
- weather_data_config indicates the specific dataset from that source. For MERRA2, the available options are surface_flux_hourly and surface_flux_monthly.
- Use xs=slice() and ys=slice() to define the geographic range of the cutout (longitude and latitude, respectively).
- Use years=slice() and months=slice() to define the temporal range of the cutout. Naturally, the indicated time range must be present in the downloaded source data.
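To get a feel for how many grid cells a small cutout like this contains, the sketch below selects the MERRA2 grid points that fall inside xs and ys. It assumes the standard MERRA2 0.625° x 0.5° longitude/latitude grid (576 x 361 points starting at -180°, -90°) and inclusive, label-based selection in the style of xarray's .sel(); geodata's exact selection (for example any padding at the edges) may differ.

```python
# Standard MERRA2 grid: 576 longitudes x 361 latitudes.
# 0.625 and 0.5 are exact in binary floating point, so these values are exact.
lons = [-180.0 + 0.625 * i for i in range(576)]
lats = [-90.0 + 0.5 * j for j in range(361)]


def select(values, bounds: slice):
    """Inclusive label-based selection, mimicking xarray's .sel(slice(a, b))."""
    return [v for v in values if bounds.start <= v <= bounds.stop]


cut_lons = select(lons, slice(138.5, 139.5))  # xs from the Tokyo cutout
cut_lats = select(lats, slice(35, 36))  # ys from the Tokyo cutout
print(cut_lons, cut_lats, len(cut_lons) * len(cut_lats))
# [138.75, 139.375] [35.0, 35.5, 36.0] 6
```

Under these assumptions the Tokyo cutout spans only a handful of MERRA2 cells, which is worth keeping in mind when interpreting results for an area the size of a city.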
geodata.Cutout() only defines the cutout object in memory. To actually create the cutout files, run prepare().
As with the download step, prepare() first checks whether a cutout has already been created at the specified location and, if one exists, exits without recreating it. To override this behavior and force a recalculation of the cutout, run prepare(overwrite=True).
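Given the known issue with prepare(overwrite=True) noted at the start of this step, the manual workaround is to delete the cutout's directory and run prepare() again. A minimal sketch, assuming the cutout lives in a directory named after name inside the cutout directory configured in src/geodata/config.py; remove_cutout_dir is a hypothetical helper, and the temporary directory stands in for that configured path.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cutout_dir(cutout_root, name: str) -> bool:
    """Hypothetical helper: delete <cutout_root>/<name> so the cutout can be
    recreated from scratch. Returns True if a directory was removed."""
    path = Path(cutout_root) / name
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demo on a throwaway directory standing in for the configured cutout folder.
with tempfile.TemporaryDirectory() as cutout_root:
    cutout_dir = Path(cutout_root) / "tokyo-2010-test"
    cutout_dir.mkdir()
    removed = remove_cutout_dir(cutout_root, "tokyo-2010-test")  # True
    gone = not cutout_dir.exists()  # True
    removed_again = remove_cutout_dir(cutout_root, "tokyo-2010-test")  # False

# With geodata, you would then rebuild the cutout with the same arguments:
# cutout = geodata.Cutout(name="tokyo-2010-test", ...)
# cutout.prepare()
```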
To verify the results of the cutout, you can print some attributes to the console as follows.
Basic information:
cutout
Name:
cutout.name
Coordinates:
cutout.coords
All metadata:
cutout.meta
Information about the variable config used to download the data:
cutout.dataset_module.weather_data_config
For MERRA2, you can confirm which variables were downloaded as follows:
cutout.dataset_module.weather_data_config["surface_flux_monthly"]["variables"]
Step 4 - Generate Outputs
geodata currently supports the following wind outputs using MERRA2 surface flux diagnostic data.
- Wind generation time-series (wind)
- Wind speed time-series (windspd)
- Wind power density time-series (windwpd)
Wind Generation Time-series
Convert wind speeds to wind energy generation for a given turbine using the following code:
ds_wind = geodata.convert.wind(cutout, turbine="Suzlon_S82_1.5_MW", smooth=True, var_height="lml")
Going over the parameters:
- cutout - Cutout object - A cutout created by geodata.Cutout().
- turbine - string or dict - Name of a turbine known by the reatlas client, or a turbine config dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve. For a full list of currently supported turbines, see the list of Turbines in the geodata documentation.
- smooth - bool or dict - If True, smooth the power curve with a Gaussian kernel as determined for the Danish wind fleet, with Delta_v = 1.27 and sigma = 2.29. A dict allows these values to be tuned.
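To illustrate the dict form of turbine, the sketch below builds a turbine config with the hub_height, V and POW keys described above and looks up output by linear interpolation on the power curve. The numbers are invented for illustration and describe no real turbine, POW is assumed to be in MW, and the lookup is a simplified stand-in for geodata's conversion: it applies no smoothing.

```python
from bisect import bisect_right

# Illustrative turbine config using the keys described above.
# The values are invented; POW is assumed here to be in MW.
toy_turbine = {
    "hub_height": 80.0,  # m
    "V": [0.0, 3.0, 6.0, 9.0, 12.0, 25.0, 25.1],  # wind speed (m/s), increasing
    "POW": [0.0, 0.0, 0.4, 1.0, 1.5, 1.5, 0.0],  # output (MW)
}


def power_from_curve(speed: float, turbine: dict) -> float:
    """Linearly interpolate output on the V/POW curve; 0 beyond its ends."""
    v, p = turbine["V"], turbine["POW"]
    if speed < v[0] or speed > v[-1]:
        return 0.0  # outside the tabulated curve
    hi = bisect_right(v, speed)  # first index with v[hi] > speed
    if hi == len(v):  # speed equals the last tabulated point
        return p[-1]
    lo = hi - 1
    frac = (speed - v[lo]) / (v[hi] - v[lo])
    return p[lo] + frac * (p[hi] - p[lo])


hourly_speeds = [2.0, 7.5, 20.0, 30.0]  # m/s at hub height
generation = [power_from_curve(s, toy_turbine) for s in hourly_speeds]
```

Here 7.5 m/s sits halfway between the 6 and 9 m/s points, so it maps to roughly 0.7 MW, while 30 m/s lies beyond the last point and yields nothing.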
Note: you can also specify any of the general conversion arguments documented for the convert_and_aggregate function (e.g. var_height='lml').
The convert function returns an xarray DataArray, an in-memory labeled array comparable to a variable in a NetCDF file.
ds_wind
To convert this array to a more conventional dataframe, run:
df_wind = ds_wind.to_dataframe(name="wind")
This produces a pandas DataFrame indexed by the array's coordinates:
df_wind
To output the data to a csv for separate analysis:
df_wind.to_csv("merra2_wind_data.csv")
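The DataFrame returned by to_dataframe is in long format, with one row per combination of coordinates. For analysis it is often handier to pivot it to one column per grid cell, or to summarize each cell over time. The sketch below uses a small synthetic stand-in, df_demo, rather than real output; the index level names "time", "y" and "x" are an assumption, so check df_wind.index.names for the names your output actually uses.

```python
import pandas as pd

# Synthetic stand-in for df_wind: 2 time steps x 1 latitude x 2 longitudes.
# Assumption: the index levels are named "time", "y" and "x".
index = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2010-07-01 00:00", "2010-07-01 01:00"]),
        [35.0],
        [138.75, 139.375],
    ],
    names=["time", "y", "x"],
)
df_demo = pd.DataFrame({"wind": [0.10, 0.30, 0.20, 0.40]}, index=index)

# Long -> wide: one row per time step, one column per (y, x) grid cell.
wide = df_demo["wind"].unstack(["y", "x"])

# Time-averaged value for each grid cell.
cell_mean = df_demo.groupby(level=["y", "x"])["wind"].mean()
```

Either result can be written out with to_csv in the same way as df_wind.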
Wind Speed Time-series
Extract wind speeds (m/s) at a given height:
ds_windspd = geodata.convert.windspd(cutout, turbine="Vestas_V66_1750kW", var_height="lml")
Going over the parameters:
- cutout - Cutout object - A cutout created by geodata.Cutout().
- **params - Must include one of the following:
  - turbine - string or dict - Name of a turbine known by the reatlas client, or a turbine config dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve. For a full list of currently supported turbines, see the list of Turbines in the geodata documentation.
  - hub_height - number - Extrapolation height (m).
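As an illustration of extrapolating wind speed to a hub height, the sketch below uses the standard logarithmic wind profile, v(h) = v_ref * ln(h / z0) / ln(h_ref / z0), where z0 is the surface roughness length. This profile is an assumption chosen for illustration: geodata's own extrapolation method, its reference height (var_height="lml" presumably refers to MERRA2's lowest model level) and the roughness values it uses may differ.

```python
import math


def log_law_speed(v_ref: float, h_ref: float, h_target: float, z0: float) -> float:
    """Extrapolate wind speed from height h_ref to h_target (both in m) with
    the logarithmic wind profile; z0 is the surface roughness length (m)."""
    if min(h_ref, h_target) <= z0:
        raise ValueError("both heights must exceed the roughness length z0")
    return v_ref * math.log(h_target / z0) / math.log(h_ref / z0)


# Illustrative values: 5 m/s at 10 m over terrain with z0 = 0.1 m,
# extrapolated to a 100 m hub height.
v_hub = log_law_speed(5.0, 10.0, 100.0, 0.1)  # about 7.5 m/s, as ln(1000) / ln(100) = 1.5
```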
Note: you can also specify any of the general conversion arguments documented for the convert_and_aggregate function (e.g. var_height='lml').
The convert function returns an xarray DataArray, an in-memory labeled array comparable to a variable in a NetCDF file.
ds_windspd
To convert this array to a more conventional dataframe, run:
df_windspd = ds_windspd.to_dataframe(name="windspd")
This produces a pandas DataFrame indexed by the array's coordinates:
df_windspd
To output the data to a csv for separate analysis:
df_windspd.to_csv("merra2_windspd_data.csv")
Wind Power Density Time-series
Extract wind power density (W/m^2) at a given height, according to WPD = 0.5 * Density * Windspd^3, where Density is the air density (kg/m^3) and Windspd is the wind speed (m/s).
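A quick worked example of the formula in plain Python. It also shows why WPD is best computed per time step rather than from an average wind speed: because speed is cubed, the mean of WPD exceeds WPD evaluated at the mean speed whenever speeds vary. The density 1.225 kg/m^3 (standard sea-level air) is used for illustration only.

```python
def wind_power_density(density: float, windspd: float) -> float:
    """WPD = 0.5 * Density * Windspd^3, in W/m^2 for kg/m^3 and m/s inputs."""
    return 0.5 * density * windspd ** 3


RHO = 1.225  # kg/m^3, standard sea-level air density (illustrative)

wpd_10 = wind_power_density(RHO, 10.0)  # about 612.5 W/m^2

# Two hours at 5 m/s and 15 m/s: the mean speed is 10 m/s, but the mean WPD
# is much higher than the WPD of the mean speed because of the cube.
speeds = [5.0, 15.0]
mean_of_wpd = sum(wind_power_density(RHO, v) for v in speeds) / len(speeds)  # about 1071.9
wpd_of_mean = wind_power_density(RHO, sum(speeds) / len(speeds))  # about 612.5
```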
ds_windwpd = geodata.convert.windwpd(cutout, turbine="Vestas_V66_1750kW", var_height="lml")
Going over the parameters:
- cutout - Cutout object - A cutout created by geodata.Cutout().
- **params - Must include one of the following:
  - turbine - string or dict - Name of a turbine known by the reatlas client, or a turbine config dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve. For a full list of currently supported turbines, see the list of Turbines in the geodata documentation.
  - hub_height - number - Extrapolation height (m).
Note: you can also specify any of the general conversion arguments documented for the convert_and_aggregate function (e.g. var_height='lml').
The convert function returns an xarray DataArray, an in-memory labeled array comparable to a variable in a NetCDF file.
ds_windwpd
To convert this array to a more conventional dataframe, run:
df_windwpd = ds_windwpd.to_dataframe(name="windwpd")
This produces a pandas DataFrame indexed by the array's coordinates:
df_windwpd
To output the data to a csv for separate analysis:
df_windwpd.to_csv("merra2_windwpd_data.csv")