MERRA2 Analysis Process

This Jupyter notebook provides a brief overview of how to use the geodata package to download MERRA2 climate data, create geographic-temporal subsets called cutouts, and use those cutouts to generate standalone datasets for separate analysis.

The following guide assumes you have installed and configured geodata and all required dependencies.

Step 1 - Setup

Import the package first.

import geodata

Notifications in geodata are implemented using loggers from Python's logging library. Launching a logger is recommended so that you can see what geodata is doing. The example below uses level=logging.INFO; for debugging, switch to the more verbose level=logging.DEBUG:

import logging

logging.basicConfig(level=logging.INFO)
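The log lines shown throughout this guide follow a "timestamp - logger name - level - message" layout. As a minimal sketch, the configuration below produces a similar layout at DEBUG verbosity. The format string is inferred from that output rather than taken from geodata, and the logger name geodata.example is hypothetical. The in-memory buffer is only there so the result can be inspected; in a notebook you would normally omit the stream argument.

```python
import io
import logging

# Format inferred from the log lines in this guide (an assumption, not geodata's own setting).
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

buffer = io.StringIO()  # stand-in for the console so the output can be inspected

# force=True (Python 3.8+) replaces any handlers already attached to the root logger.
logging.basicConfig(level=logging.DEBUG, format=LOG_FORMAT, stream=buffer, force=True)

# Hypothetical logger name used only for this illustration.
logging.getLogger("geodata.example").debug("checking for downloaded files")

log_line = buffer.getvalue().strip()
print(log_line)
```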

Step 2 - Download

Assuming you have previously created an Earthdata Login profile and approved the GES DISC app, you can download MERRA2 data from the source as follows.

First, define a dataset object for the data you wish to download:

DS = geodata.Dataset(
    module="merra2",
    weather_data_config="surface_flux_monthly",
    years=slice(2010, 2010),
    months=slice(1, 7),
)
2024-11-06 15:34:49,730 - geodata.dataset - INFO - Bounds was not specified, default to global bounds.
2024-11-06 15:34:49,732 - geodata.dataset - INFO - Directory /Users/geodata/.local/geodata/merra2 found, checking for completeness.
2024-11-06 15:34:49,733 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201001.nc4` not found!
2024-11-06 15:34:49,733 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201002.nc4` not found!
2024-11-06 15:34:49,734 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201003.nc4` not found!
2024-11-06 15:34:49,735 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201004.nc4` not found!
2024-11-06 15:34:49,735 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201005.nc4` not found!
2024-11-06 15:34:49,736 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201006.nc4` not found!
2024-11-06 15:34:49,736 - geodata.dataset - INFO - File `/Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201007.nc4` not found!
2024-11-06 15:34:49,737 - geodata.dataset - WARNING - Arguments `xs` and `ys` not used in preparing dataset. Defaulting to global.
2024-11-06 15:34:49,738 - geodata.dataset - INFO - 7 files not completed.
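The log above reports seven missing files for months=slice(1, 7), which suggests that both slice bounds are treated as inclusive (unlike ordinary Python slicing). The sketch below illustrates that reading; expand_year_months is a hypothetical helper, not part of geodata, and applying the same month range to every year in a multi-year request is an assumption.

```python
def expand_year_months(years, months):
    """List (year, month) pairs, treating both slice bounds as inclusive."""
    return [
        (year, month)
        for year in range(years.start, years.stop + 1)
        for month in range(months.start, months.stop + 1)
    ]


pairs = expand_year_months(slice(2010, 2010), slice(1, 7))
print(len(pairs))            # number of monthly files implied by the request
print(pairs[0], pairs[-1])   # first and last (year, month) pair
```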

Use the code block below to begin the download.

When a dataset object is created, geodata checks whether the specified data has already been downloaded by looking for MERRA2 data files in the merra2 directory configured in src/geodata/config.py. Downloaded data is placed into subdirectories by year and then, for daily files, by month (i.e. 2011/01, 2011/02, 2012/01, etc.). Monthly files are placed directly in the year's folder (for example, 2010/). If the downloaded data is found, the prepared attribute is set to True when the dataset object is created.

Accordingly, the snippet below saves you the trouble of accidentally redownloading data if it is already present in the correct subdirectories.

if not DS.prepared:
    DS.get_data()
2024-11-06 15:34:53,326 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201001.nc4
2024-11-06 15:34:53,328 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201001.nc4
2024-11-06 15:35:18,885 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201001.nc4
2024-11-06 15:35:18,888 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201002.nc4
2024-11-06 15:35:18,889 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201002.nc4
2024-11-06 15:35:58,417 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201002.nc4
2024-11-06 15:35:58,419 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201003.nc4
2024-11-06 15:35:58,421 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201003.nc4
2024-11-06 15:36:35,820 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201003.nc4
2024-11-06 15:36:35,822 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201004.nc4
2024-11-06 15:36:35,823 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201004.nc4
2024-11-06 15:37:13,549 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201004.nc4
2024-11-06 15:37:13,552 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201005.nc4
2024-11-06 15:37:13,553 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201005.nc4
2024-11-06 15:37:58,141 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201005.nc4
2024-11-06 15:37:58,142 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201006.nc4
2024-11-06 15:37:58,144 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201006.nc4
2024-11-06 15:38:38,069 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201006.nc4
2024-11-06 15:38:38,071 - geodata - INFO - Preparing API calls for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201007.nc4
2024-11-06 15:38:38,072 - geodata - INFO - Making request to https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/2010/MERRA2_300.tavgM_2d_flx_Nx.201007.nc4
2024-11-06 15:39:17,346 - geodata - INFO - Successfully downloaded data for /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201007.nc4
100%|██████████| 7/7 [04:24<00:00, 37.72s/it]
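To make the completeness check more concrete, here is a rough sketch of how one might list the monthly files that are still missing. It is not geodata's implementation: missing_monthly_files is a hypothetical helper, the file name pattern is copied from the paths in the log above, and the stream number 300 is simply the one that appears in those 2010 file names. The demonstration runs against a temporary directory rather than a real data folder.

```python
import tempfile
from pathlib import Path

# File name pattern copied from the monthly paths shown in the log output.
MONTHLY_NAME = "MERRA2_{spinup}.tavgM_2d_flx_Nx.{year}{month:0>2}.nc4"


def missing_monthly_files(root, year, months, spinup="300"):
    """Return the expected monthly files that do not yet exist under root/<year>/."""
    expected = [
        Path(root) / str(year) / MONTHLY_NAME.format(spinup=spinup, year=year, month=m)
        for m in months
    ]
    return [path for path in expected if not path.exists()]


# Demonstration against a throwaway directory holding only the January file.
with tempfile.TemporaryDirectory() as root:
    year_dir = Path(root) / "2010"
    year_dir.mkdir()
    (year_dir / "MERRA2_300.tavgM_2d_flx_Nx.201001.nc4").touch()
    missing_names = [p.name for p in missing_monthly_files(root, 2010, range(1, 8))]

print(len(missing_names), "files still missing")
```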

Finally, in order to use the downloaded MERRA2 data with geodata, run:

DS.trim_variables()

trim_variables() subsets and resaves the downloaded files so that only those variables needed to generate geodata outputs are kept.

Step 3 - Create Cutout

A cutout is a subset of downloaded data based on specified time periods and geographic coordinates. Cutouts are saved to the cutouts directory in the GEODATA_ROOT directory specified in src/geodata/config.py and can be used to generate multiple outputs.

Note (04/02/2020): There is a known issue with MERRA2-based cutouts where running cutout.prepare(overwrite=True) on an existing cutout prevents the cutout from being used to generate outputs. A workaround is to manually delete the problem cutout and recreate it from scratch. A fix is planned pending investigation.

To create a cutout, run the following:

cutout = geodata.Cutout(
    name="tokyo-2010-test",
    module="merra2",
    weather_data_config="surface_flux_monthly",
    xs=slice(138.5, 139.5),
    ys=slice(35, 36),
    years=slice(2010, 2010),
    months=slice(7, 7),
)
cutout.prepare()
2024-11-06 15:41:50,342 - geodata.cutout - INFO - Cutout (tokyo-2010-test, /Users/geodata/.local/geodata/cutouts) not found or incomplete.
2024-11-06 15:41:50,457 - geodata.preparation - INFO - Starting preparation of cutout 'tokyo-2010-test'
2024-11-06 15:41:50,457 - geodata - INFO - MultiIndex([(2010, 7)],
           names=['year', 'month'])
2024-11-06 15:41:50,458 - geodata - INFO - [(2010, 7)]
2024-11-06 15:41:52,063 - geodata - INFO - Opening /Users/geodata/.local/geodata/merra2/2010/MERRA2_300.tavgM_2d_flx_Nx.201007.nc4
2024-11-06 15:41:52,086 - geodata.preparation - INFO - Merging variables into monthly compound files
2024-11-06 15:41:52,087 - geodata.preparation - INFO - Cutout 'tokyo-2010-test' has been successfully prepared

The above code creates a cutout for July 2010 for a geographic area roughly corresponding to the Tokyo metropolitan area. Walking through the parameters:

  • name will be the name of the directory created in the cutouts folder where geodata will place the data files corresponding to the cutout.

  • module indicates the source for the data from which the cutout is created.

  • weather_data_config indicates the specific dataset from the source. For MERRA2, the options include surface_flux_hourly and surface_flux_monthly; the weather_data_config output shown later in this step lists the full set.

  • Use xs=slice() and ys=slice() to define a geographical range for the cutout.

  • Use years=slice() and months=slice() to define a temporal range for the cutout. Naturally, the indicated time range must be present within the source data.

geodata.Cutout() only defines the cutout object in memory. To actually create the cutout files, run prepare().

As with the download step, prepare() first checks whether a cutout with the same name already exists, and skips creation if it does. To override this behavior and force the cutout to be rebuilt, run prepare(overwrite=True) (but see the known issue noted above).
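The note earlier in this step suggests deleting a problem cutout by hand and recreating it. A small sketch of that workaround follows, assuming the cutout lives in a directory named after the cutout inside the cutouts folder, as described in the parameter list. remove_cutout is a hypothetical helper, not a geodata function, and the demonstration uses a temporary directory rather than a real cutouts folder.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cutout(cutouts_dir, name):
    """Delete the directory for cutout `name` inside `cutouts_dir`.

    Returns True if a directory was removed, False if none existed.
    """
    target = Path(cutouts_dir) / name
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration against a throwaway directory standing in for the cutouts folder.
with tempfile.TemporaryDirectory() as cutouts_dir:
    (Path(cutouts_dir) / "tokyo-2010-test").mkdir()
    removed = remove_cutout(cutouts_dir, "tokyo-2010-test")
    removed_again = remove_cutout(cutouts_dir, "tokyo-2010-test")

print(removed, removed_again)
```

After removing the directory, defining the cutout again and calling prepare() should rebuild it from the downloaded data.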

To verify the results of the cutout, you can print some attributes to the console as follows.

Basic information:

cutout
<Cutout tokyo-2010-test x=138.75-139.38 y=35.00-36.00 time=2010/7-2010/7 prepared>

Name:

cutout.name
'tokyo-2010-test'

Coordinates:

cutout.coords
Coordinates:
  * x           (x) float64 16B 138.8 139.4
  * y           (y) float64 24B 35.0 35.5 36.0
    lon         (x) float64 16B 138.8 139.4
    lat         (y) float64 24B 35.0 35.5 36.0
  * time        (time) datetime64[ns] 8B 2010-07-01
  * year-month  (year-month) object 8B MultiIndex
  * year        (year-month) int64 8B 2010
  * month       (year-month) int64 8B 7
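The reported x range (138.75 to 139.38) is narrower than the requested xs=slice(138.5, 139.5), presumably because only grid points that fall inside the requested bounds are kept. The sketch below reproduces the coordinates above from the grid described in the file metadata shown later in this step (0.625 degree longitude steps starting at -180, 0.5 degree latitude steps starting at -90). grid_points is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
import math


def grid_points(lower, upper, origin, step):
    """Return grid coordinates origin + k * step that lie within [lower, upper]."""
    first = math.ceil((lower - origin) / step)
    last = math.floor((upper - origin) / step)
    return [origin + k * step for k in range(first, last + 1)]


# Grid origin and resolution taken from the file metadata shown in this guide.
xs = grid_points(138.5, 139.5, origin=-180.0, step=0.625)
ys = grid_points(35.0, 36.0, origin=-90.0, step=0.5)

print(xs)
print(ys)
```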

All metadata:

cutout.meta
<xarray.Dataset> Size: 112B
Dimensions:     (x: 2, y: 3, time: 1, year-month: 1)
Coordinates:
  * x           (x) float64 16B 138.8 139.4
  * y           (y) float64 24B 35.0 35.5 36.0
    lon         (x) float64 16B 138.8 139.4
    lat         (y) float64 24B 35.0 35.5 36.0
  * time        (time) datetime64[ns] 8B 2010-07-01
  * year-month  (year-month) object 8B MultiIndex
  * year        (year-month) int64 8B 2010
  * month       (year-month) int64 8B 7
Data variables:
    *empty*
Attributes: (12/31)
    History:                           Original file generated: Fri Jul  3 01...
    Filename:                          MERRA2_300.tavgM_2d_flx_Nx.201001.nc4
    Comment:                           GMAO filename: d5124_m2_jan00.tavg1_2d...
    Conventions:                       CF-1
    Institution:                       NASA Global Modeling and Assimilation ...
    References:                        http://gmao.gsfc.nasa.gov
    ...                                ...
    Source:                            CVS tag: GEOSadas-5_12_4
    Contact:                           http://gmao.gsfc.nasa.gov
    identifier_product_doi:            10.5067/0JRLVL8YV2Y4
    RangeBeginningTime:                00:00:00.000000
    RangeEndingTime:                   23:59:59.000000
    module:                            merra2

Information about the variable config used to download the data:

cutout.dataset_module.weather_data_config
{'surface_flux_hourly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'daily',
  'tasks_func': <function geodata.datasets.merra2.tasks_daily_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_month_surface_flux(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_*.tavg1_2d_flx_Nx.*.nc4',
  'url': 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXFLX.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_flx_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'url_opendap': 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXFLX.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_flx_Nx.{year}{month:0>2}{day:0>2}.nc4.nc4',
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_flx_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'variables': ['ustar',
   'z0m',
   'disph',
   'rhoa',
   'ulml',
   'vlml',
   'tstar',
   'hlml',
   'tlml',
   'pblh',
   'hflux',
   'eflux']},
 'slv_flux_hourly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'daily_multiple',
  'tasks_func': <function geodata.datasets.merra2.tasks_daily_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_month_surface_flux(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_*.tavg1_2d_slv_flx_Nx.*.nc4',
  'url': ['https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXFLX.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_flx_Nx.{year}{month:0>2}{day:0>2}.nc4',
   'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4'],
  'url_opendap': ['https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXFLX.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_flx_Nx.{year}{month:0>2}{day:0>2}.nc4',
   'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4'],
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_flx_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'variables': ['ustar',
   'z0m',
   'disph',
   'rhoa',
   'ulml',
   'vlml',
   'tstar',
   'hlml',
   'tlml',
   'pblh',
   'hflux',
   'eflux',
   'u2m',
   'v2m',
   'u10m',
   'v10m',
   'u50m',
   'v50m'],
  'variables_list': [['ustar',
    'z0m',
    'disph',
    'rhoa',
    'ulml',
    'vlml',
    'tstar',
    'hlml',
    'tlml',
    'pblh',
    'hflux',
    'eflux'],
   ['u2m', 'v2m', 'u10m', 'v10m', 'u50m', 'v50m']]},
 'surface_flux_monthly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'monthly',
  'tasks_func': <function geodata.datasets.merra2.tasks_monthly_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_month_surface_flux(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/MERRA2_*.tavgM_2d_flx_Nx.*.nc4',
  'url': 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXFLX.5.12.4/{year}/MERRA2_{spinup}.tavgM_2d_flx_Nx.{year}{month:0>2}.nc4',
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/MERRA2_{spinup}.tavgM_2d_flx_Nx.{year}{month:0>2}.nc4',
  'variables': ['ustar',
   'z0m',
   'disph',
   'rhoa',
   'ulml',
   'vlml',
   'tstar',
   'hlml',
   'tlml',
   'pblh',
   'hflux',
   'eflux'],
  'meta_attrs': {'History': 'Original file generated: Fri Jul  3 01:41:08 2015 GMT',
   'Filename': 'MERRA2_300.tavgM_2d_flx_Nx.201001.nc4',
   'Comment': 'GMAO filename: d5124_m2_jan00.tavg1_2d_flx_Nx.monthly.201001.nc4',
   'Conventions': 'CF-1',
   'Institution': 'NASA Global Modeling and Assimilation Office',
   'References': 'http://gmao.gsfc.nasa.gov',
   'Format': 'NetCDF-4/HDF-5',
   'SpatialCoverage': 'global',
   'VersionID': '5.12.4',
   'TemporalRange': '1980-01-01 -> 2016-12-31',
   'identifier_product_doi_authority': 'http://dx.doi.org/',
   'ShortName': 'M2TMNXFLX',
   'RangeBeginningDate': '2010-01-01',
   'RangeEndingDate': '2010-01-31',
   'GranuleID': 'MERRA2_300.tavgM_2d_flx_Nx.201001.nc4',
   'ProductionDateTime': 'Original file generated: Fri Jul  3 01:41:08 2015 GMT',
   'LongName': 'MERRA2 tavg1_2d_flx_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Surface Flux Diagnostics Monthly Mean',
   'Title': 'MERRA2 tavg1_2d_flx_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Surface Flux Diagnostics Monthly Mean',
   'SouthernmostLatitude': '-90.0',
   'NorthernmostLatitude': '90.0',
   'WesternmostLongitude': '-180.0',
   'EasternmostLongitude': '179.375',
   'LatitudeResolution': '0.5',
   'LongitudeResolution': '0.625',
   'DataResolution': '0.5 x 0.625',
   'Source': 'CVS tag: GEOSadas-5_12_4',
   'Contact': 'http://gmao.gsfc.nasa.gov',
   'identifier_product_doi': '10.5067/0JRLVL8YV2Y4',
   'RangeBeginningTime': '00:00:00.000000',
   'RangeEndingTime': '23:59:59.000000',
   'module': 'merra2'}},
 'surface_flux_dailymeans': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'dailymeans',
  'tasks_func': <function geodata.datasets.merra2.tasks_daily_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_dailymeans_surface_flux(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_*.statD_2d_slv_Nx.*.nc4',
  'url': 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2SDNXSLV.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.statD_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_{spinup}.statD_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'variables': ['hournorain', 'tprecmax', 't2mmax', 't2mmean', 't2mmin']},
 'slv_radiation_hourly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'daily_multiple',
  'tasks_func': <function geodata.datasets.merra2.tasks_daily_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_slv_radiation(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_*.tavg1_2d_slv_rad_Nx.*.nc4',
  'url': ['https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4',
   'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXRAD.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_rad_Nx.{year}{month:0>2}{day:0>2}.nc4'],
  'url_opendap': ['https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXSLV.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_Nx.{year}{month:0>2}{day:0>2}.nc4.nc4',
   'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/M2T1NXRAD.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_rad_Nx.{year}{month:0>2}{day:0>2}.nc4.nc4'],
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_slv_rad_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'variables': ['albedo', 'swgdn', 'swtdn', 't2m'],
  'variables_list': [['t2m'], ['albedo', 'swgdn', 'swtdn']]},
 'slv_radiation_monthly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'monthly_multiple',
  'tasks_func': <function geodata.datasets.merra2.tasks_monthly_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_slv_radiation(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/MERRA2_*.tavgM_2d_slv_rad_Nx.*.nc4',
  'url': ['https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/{year}/MERRA2_{spinup}.tavgM_2d_slv_Nx.{year}{month:0>2}.nc4',
   'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2_MONTHLY/M2TMNXRAD.5.12.4/{year}/MERRA2_{spinup}.tavgM_2d_rad_Nx.{year}{month:0>2}.nc4'],
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/MERRA2_{spinup}.tavgM_2d_slv_rad_Nx.{year}{month:0>2}.nc4',
  'variables': ['albedo', 'swgdn', 'swtdn', 't2m']},
 'surface_aerosol_hourly': {'api_func': <function geodata.datasets.merra2.api_merra2(toDownload, fileGranularity, downloadedFiles)>,
  'file_granularity': 'daily',
  'tasks_func': <function geodata.datasets.merra2.tasks_daily_merra2(xs, ys, yearmonths, prepare_func, **meta_attrs)>,
  'meta_prepare_func': <function geodata.datasets.merra2.prepare_meta_merra2(xs, ys, year, month, template, module, **params)>,
  'prepare_func': <function geodata.datasets.merra2.prepare_month_aerosol(fn, year, month, xs, ys)>,
  'template': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_*.tavg1_2d_aer_Nx.*.nc4',
  'url': 'https://goldsmr4.gesdisc.eosdis.nasa.gov/data/MERRA2/M2T1NXAER.5.12.4/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_aer_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'fn': '/Users/geodata/.local/geodata/merra2/{year}/{month:0>2}/MERRA2_{spinup}.tavg1_2d_aer_Nx.{year}{month:0>2}{day:0>2}.nc4',
  'variables': ['bcsmass', 'dusmass25', 'ocsmass', 'so4smass', 'sssmass25']}}

For MERRA2, you can confirm the downloaded variables this way:

cutout.dataset_module.weather_data_config["surface_flux_monthly"]["variables"]
['ustar',
 'z0m',
 'disph',
 'rhoa',
 'ulml',
 'vlml',
 'tstar',
 'hlml',
 'tlml',
 'pblh',
 'hflux',
 'eflux']
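Because the configuration is a plain nested dictionary, it can be summarized with ordinary dictionary operations. The sketch below works on a trimmed stand-in that copies only the file_granularity and variables entries for three of the configurations shown above; the helper names are hypothetical.

```python
# Trimmed stand-in for cutout.dataset_module.weather_data_config (values copied from the output above).
weather_data_config = {
    "surface_flux_monthly": {
        "file_granularity": "monthly",
        "variables": ["ustar", "z0m", "disph", "rhoa", "ulml", "vlml",
                      "tstar", "hlml", "tlml", "pblh", "hflux", "eflux"],
    },
    "surface_flux_dailymeans": {
        "file_granularity": "dailymeans",
        "variables": ["hournorain", "tprecmax", "t2mmax", "t2mmean", "t2mmin"],
    },
    "slv_radiation_monthly": {
        "file_granularity": "monthly_multiple",
        "variables": ["albedo", "swgdn", "swtdn", "t2m"],
    },
}


def summarize(config):
    """Map each configuration name to (file granularity, number of variables)."""
    return {
        name: (entry["file_granularity"], len(entry["variables"]))
        for name, entry in config.items()
    }


def configs_providing(config, variable):
    """Return the sorted names of configurations whose variable list contains `variable`."""
    return sorted(name for name, entry in config.items() if variable in entry["variables"])


summary = summarize(weather_data_config)
print(summary["surface_flux_monthly"])
print(configs_providing(weather_data_config, "t2m"))
```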

Step 4 - Generate Outputs

geodata currently supports the following wind outputs using MERRA2 surface flux diagnostic data.

  • Wind generation time-series (wind)

  • Wind speed time-series (windspd)

  • Wind power density time-series (windwpd)

Wind Generation Time-series

Convert wind speeds to wind energy generation for a given turbine using the following code:

ds_wind = cutout.wind(turbine="Suzlon_S82_1.5_MW", smooth=True, var_height="lml")

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys ‘hub_height’ for the hub height and ‘V’, ‘POW’ defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

  • smooth - bool or dict - If True, smooth the power curve with a Gaussian kernel, using the values determined for the Danish wind fleet (Delta_v = 1.27 and sigma = 2.29). A dict allows these values to be tuned.

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').
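As an illustration of the dictionary form of the turbine parameter, the sketch below builds a configuration with the three keys named above and interpolates its power curve. All numbers are hypothetical and do not describe a real turbine; the units (m/s for V, MW for POW) and the linear interpolation are assumptions, and geodata's own handling of the curve may differ.

```python
from bisect import bisect_right

# Hypothetical turbine configuration using the keys named in the parameter list.
custom_turbine = {
    "hub_height": 80,                        # metres (illustrative value)
    "V": [0, 3, 6, 9, 12, 25],               # wind speeds, assumed to be in m/s
    "POW": [0.0, 0.0, 0.3, 1.0, 1.5, 1.5],   # power at each speed, assumed to be in MW
}


def power_at(turbine, speed):
    """Linearly interpolate the power curve; return 0 outside the tabulated speeds."""
    speeds, powers = turbine["V"], turbine["POW"]
    if speed < speeds[0] or speed > speeds[-1]:
        return 0.0
    if speed == speeds[-1]:
        return float(powers[-1])
    i = bisect_right(speeds, speed) - 1      # index of the segment containing `speed`
    fraction = (speed - speeds[i]) / (speeds[i + 1] - speeds[i])
    return powers[i] + fraction * (powers[i + 1] - powers[i])


print(power_at(custom_turbine, 7.5))   # halfway between the 6 and 9 entries
print(power_at(custom_turbine, 30.0))  # beyond the last tabulated speed
```

Since the turbine parameter is described as accepting a dict, such a configuration would be passed as cutout.wind(turbine=custom_turbine).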

The convert function returns an xarray DataArray, an in-memory labeled array of the kind stored in NetCDF files.

ds_wind
<xarray.DataArray (time: 1, y: 3, x: 2)> Size: 48B
array([[[0.08232222, 0.20032592],
        [0.01580369, 0.07699983],
        [0.00229215, 0.01120574]]])
Coordinates:
  * x        (x) float64 16B 138.8 139.4
  * y        (y) float64 24B 35.0 35.5 36.0
  * time     (time) datetime64[ns] 8B 2010-07-01T00:30:00
    lon      (x) float64 16B 138.8 139.4
    lat      (y) float64 24B 35.0 35.5 36.0

To convert this array to a more conventional dataframe, run:

df_wind = ds_wind.to_dataframe(name="wind")

which converts the xarray DataArray into a pandas dataframe:

df_wind
                                      lon   lat      wind
time                y    x
2010-07-01 00:30:00 35.0 138.750  138.750  35.0  0.082322
                         139.375  139.375  35.0  0.200326
                    35.5 138.750  138.750  35.5  0.015804
                         139.375  139.375  35.5  0.077000
                    36.0 138.750  138.750  36.0  0.002292
                         139.375  139.375  36.0  0.011206

To output the data to a csv for separate analysis:

df_wind.to_csv("merra2_wind_data.csv")

Wind Speed Time-series

Extract wind speeds at a given height (m/s):

ds_windspd = cutout.windspd(turbine="Vestas_V66_1750kW", var_height="lml")

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • **params - Must have 1 of the following:

    • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys ‘hub_height’ for the hub height and ‘V’, ‘POW’ defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

    • hub_height - num - Extrapolation height (m)

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').

The convert function returns an xarray DataArray, an in-memory labeled array of the kind stored in NetCDF files.

ds_windspd
<xarray.DataArray 'wnd67m' (time: 1, y: 3, x: 2)> Size: 24B
array([[[4.89272  , 6.517427 ],
        [2.7624094, 4.864064 ],
        [0.9258263, 2.434551 ]]], dtype=float32)
Coordinates:
  * x        (x) float64 16B 138.8 139.4
  * y        (y) float64 24B 35.0 35.5 36.0
  * time     (time) datetime64[ns] 8B 2010-07-01T00:30:00
    lon      (x) float64 16B 138.8 139.4
    lat      (y) float64 24B 35.0 35.5 36.0
Attributes:
    long_name:  extrapolated 67 m wind speed using log ratio, from variable h...
    units:      m s**-1
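The long_name attribute above says the wind speed is extrapolated "using log ratio". As a sketch of what that typically means, the code below applies the standard logarithmic wind profile, v(h) = v_ref * ln((h - d) / z0) / ln((h_ref - d) / z0), where z0 is the surface roughness length and d the displacement height. The exact formula and inputs geodata uses are not shown in this guide, so treat this as an assumption; the function name and sample values are hypothetical.

```python
import math


def extrapolate_wind_speed(v_ref, h_ref, h_target, roughness, displacement=0.0):
    """Scale a wind speed from h_ref to h_target with the logarithmic wind profile.

    Heights, roughness length and displacement height are in metres.
    """
    if h_ref - displacement <= roughness or h_target - displacement <= roughness:
        raise ValueError("heights must lie above displacement + roughness length")
    ratio = math.log((h_target - displacement) / roughness) / math.log(
        (h_ref - displacement) / roughness
    )
    return v_ref * ratio


# Illustrative values only: 5 m/s at 60 m, extrapolated to 67 m over terrain with z0 = 0.1 m.
speed_67m = extrapolate_wind_speed(v_ref=5.0, h_ref=60.0, h_target=67.0, roughness=0.1)
print(round(speed_67m, 3))
```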

To convert this array to a more conventional dataframe, run:

df_windspd = ds_windspd.to_dataframe(name="windspd")

which converts the xarray DataArray into a pandas dataframe:

df_windspd
                                      lon   lat   windspd
time                y    x
2010-07-01 00:30:00 35.0 138.750  138.750  35.0  4.892720
                         139.375  139.375  35.0  6.517427
                    35.5 138.750  138.750  35.5  2.762409
                         139.375  139.375  35.5  4.864064
                    36.0 138.750  138.750  36.0  0.925826
                         139.375  139.375  36.0  2.434551

To output the data to a csv for separate analysis:

df_windspd.to_csv("merra2_windspd_data.csv")

Wind Power Density Time-series

Extract wind power density at a given height, according to: WPD = 0.5 * Density * Windspd^3

ds_windwpd = cutout.windwpd(turbine="Vestas_V66_1750kW", var_height="lml")

Going over the parameters:

  • cutout - Cutout - A cutout created by geodata.Cutout()

  • **params - Must have 1 of the following:

    • turbine - string or dict - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys ‘hub_height’ for the hub height and ‘V’, ‘POW’ defining the power curve. For a full list of currently supported turbines, see the list of Turbines here.

    • hub_height - num - Extrapolation height (m)

Note - You can also specify all of the general conversion arguments documented in the convert_and_aggregate function (e.g. var_height='lml').

The convert function returns an xarray DataArray, an in-memory labeled array of the kind stored in NetCDF files.

ds_windwpd
<xarray.DataArray (time: 1, y: 3, x: 2)> Size: 24B
array([[[ 66.65265   , 160.12375   ],
        [ 11.40101   ,  65.33937   ],
        [  0.42476463,   8.158394  ]]], dtype=float32)
Coordinates:
  * x        (x) float64 16B 138.8 139.4
  * y        (y) float64 24B 35.0 35.5 36.0
  * time     (time) datetime64[ns] 8B 2010-07-01T00:30:00
    lon      (x) float64 16B 138.8 139.4
    lat      (y) float64 24B 35.0 35.5 36.0
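To make the formula WPD = 0.5 * Density * Windspd^3 concrete, the sketch below evaluates it for a single pair of values. With density in kg/m^3 and wind speed in m/s, the result is in W/m^2. The air density of 1.2 kg/m^3 and the wind speed are illustrative assumptions, not values taken from the cutout, and wind_power_density is a hypothetical helper.

```python
def wind_power_density(air_density, wind_speed):
    """Return 0.5 * density * speed**3 (W/m^2 for inputs in kg/m^3 and m/s)."""
    return 0.5 * air_density * wind_speed ** 3


# Illustrative inputs: air density 1.2 kg/m^3, wind speed 6.5 m/s.
wpd = wind_power_density(air_density=1.2, wind_speed=6.5)
print(round(wpd, 3))
```

Because the wind speed enters cubed, doubling it multiplies the power density by eight.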

To convert this array to a more conventional dataframe, run:

df_windwpd = ds_windwpd.to_dataframe(name="windwpd")

which converts the xarray DataArray into a pandas dataframe:

df_windwpd
                                      lon   lat     windwpd
time                y    x
2010-07-01 00:30:00 35.0 138.750  138.750  35.0   66.652649
                         139.375  139.375  35.0  160.123749
                    35.5 138.750  138.750  35.5   11.401010
                         139.375  139.375  35.5   65.339371
                    36.0 138.750  138.750  36.0    0.424765
                         139.375  139.375  36.0    8.158394

To output the data to a csv for separate analysis:

df_windwpd.to_csv("merra2_windwpd_data.csv")