Tutorial: Incorporating Mask into Cutout Workflow#
Introduction#
Geodata is able to process geospatial data to extract cutouts over specified geographies. Built off the rasterio library, the mask module imports rasters and shapefiles, merges and flattens multiple layers together, and extracts subsetted cutout data from merged masks and shapefiles.
After we create a mask, we can incorporate the suitability mask object/file into the Cutout. The cutouts are subsets of data based on specific time and geographic ranges. For more information on the creation of cutout, refer to these tutorials: Creating Cutouts with MERRA2 Data, Downloading and Creating Cutouts with ERA5 Data
Setup#
To start, import the geodata package and required libraries.
import matplotlib.pyplot as plt
import xarray as xr
import geodata
Download Data#
We will use a Cutout object created from a downloaded dataset. If you have already created a cutout, load it here and skip to step 3.
We first download the dataset through geodata.Dataset(). In get_data(), if we specify testing=True, the program downloads only first file in download list (e.g., first day of month)
dataset_test = geodata.Dataset(
module="merra2",
years=slice(2011, 2011),
months=slice(1, 1),
weather_data_config="slv_radiation_hourly",
)
if not dataset_test.prepared:
dataset_test.get_data(testing=True)
Extract the cutout from the trimmed dataset.
cutout = geodata.Cutout(
name="china-2011-slv-hourly-test",
module="merra2",
weather_data_config="slv_radiation_hourly",
xs=slice(73, 136),
ys=slice(18, 54),
years=slice(2011, 2011),
months=slice(1, 1),
)
cutout.prepare()
Load Mask#
In this tutorial, we use the china mask, created in this documentation: mask_creation_workflow
# View the contents of the china mask
geodata.mask.load_mask("china")
Adding Mask Variables to a Cutout#
Adding Masking Variables#
The add_mask method will add attribute merged_mask and shape_mask from the Mask object to the Cutout object. Once the mask is added to the Cutout object, the merged_mask or shape_mask from the Mask object will be stored in the format of xarray.DataArray in the Cutout object, and their dimensions will be coarsened to the same dimension with the Cutout metadata.
The add_mask method will look for both merged_mask and shape_mask attribute saved for the loaded mask, unless the user set the parameter merged_mask=False, or shape_mask=False.
cutout.add_mask("china")
Plot the merged mask, coarsened to cutout resolution
cutout.merged_mask.plot()
Adding Area Variable#
To calculate and add the variation of grid cell areas by latitude to the cutout, use the add_grid_area method. Keeping track of the area for each grid cell is necessary for analyses such as calculating the weighted sum of the grid cells based on their area.
cutout.add_grid_area()
Creating PV Data Through Cutout Conversion#
The code block below will use the geodata.convert.pv method to generate ds_cutout, an xarray Dataset that contains the pv variable for the cutout.
We transform the xarray DataArray into a xarray DataSet (which can contain multiple DataArray).
ds_cutout = geodata.convert.pv(cutout, panel="KANEKA", orientation="latitude_optimal").to_dataset(
name="solar"
)
len(ds_cutout.time)
We also need to remove the time dimension by calculating daily means via ds_cutout.coarsen(time=24, boundary="exact").mean(), which aggregates the values over its 24 timestamps.
ds_cutout_mean = ds_cutout.coarsen(time=24, boundary="exact").mean()
Combining PV Data with Mask#
The mask method for the Cutout will mask converted xarray.Dataset variable, such as ds_cutout and ds_cutout_mean created above, by combining it with merged_mask or shape_mask in the Cutout object. It will return a dictionary of xarray Dataset. Each key in the dictionary is one unique mask from either the merged_mask or shape_mask variable from the Cutout object, and each value is an xarray dataset containing the dataSet variable (ds_cutout or ds_cutout_mean) with the mask and area values.
The program will automatically search for merged_mask and shape_mask to combine with the xarray.Dataset, unless the user specify merged_mask=False or shape_mask=False. The masks in shape_mask will have the same key as it has in the shape_mask attribute, and the mask for merged_mask will have the same key name merged_mask, as merged_mask is unique to each mask.
Daily averaged PV values#
ds_mask_mean = cutout.mask(dataset=ds_cutout_mean)
ds_mask_mean.keys()
From the output variable ds_mask_mean, check out the combined xarray.Dataset for the Jiangsu province, and plot each of its xarray.DataArray.
ds_mask_mean["Jiangsu"]
Visualize the averaged PV value for each grid cell in the Cutout. Note that the data is the aggregated value for the date.
ds_mask_mean["Jiangsu"]["solar"].plot()
Visualize the masking value for each grid cell in the Cutout.
ds_mask_mean["Jiangsu"]["mask"].plot()
Area and Mask-Weighted Hourly PV Values#
We use the raw hourly output generated by cutout to create time-series PV plots weighted by the mask and area. Note that we transposed ds_cutout so that time is set as the first dimension, which ease the following calculation since we want to aggregate the array spatially from each grid cell.
ds_mask = cutout.mask(ds_cutout)
ds_mask.keys()
Calculate the aggregated mean solar PV for each provinces, at each time point. We will apply this equation below to calculate the area-weighted average. We save the result into a dictionary PV_dict, where its keys are the provinces, and the corresponding values are the PV series.
PV_dict = {}
for prov_name in list(ds_mask)[1:]:
PV_dict[prov_name] = (
(ds_mask[prov_name]["solar"] * ds_mask[prov_name]["mask"] * ds_mask[prov_name]["area"])
.sum(axis=1)
.sum(axis=1)
) / (ds_mask[prov_name]["mask"] * ds_mask[prov_name]["area"]).sum()
The aggregated PV time-series for Zhejiang province.
PV_dict["Zhejiang"]
Finally, for each province, plot the solar series weighted by mask * area.
for prov_name, series in PV_dict.items():
plt.plot(series, label=prov_name)
plt.title(f"Solar series weighted by area for Chinese provinces.")
plt.grid()
plt.legend()
plt.xlabel("2011-01-01 Hour")
plt.ylabel("Aggregated weighted PV value for suitable area")