Hands-on - Virtualize NetCDF from ESGF
This example uses data from the Earth System Grid Federation THREDDS Data Server.
This is a quicker example for hands-on experience. For a full walkthrough, see the previous example on the NASA-NEX-GDDP CMIP6 data.
Thank you to Raphael Hagen for contributing this example!
Step 1: Import necessary functions and classes
import icechunk
from obstore.store import HTTPStore
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
Step 2: Define data location
bucket = "https://esgf-data.ucar.edu"
path = "thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/day/tas/gn/v20190308/tas_day_CESM2_historical_r3i1p1f1_gn_19200101-19291231.nc"
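The file name packs most of the dataset's identity into underscore-separated fields. The sketch below uses only the standard library to join the two strings into the full URL and split the file name into facets. The field names are an assumption based on the usual CMIP6 file name template (variable, table, source, experiment, member, grid, time range); they are not read from the file itself.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

bucket = "https://esgf-data.ucar.edu"
path = "thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/day/tas/gn/v20190308/tas_day_CESM2_historical_r3i1p1f1_gn_19200101-19291231.nc"

# The full URL is the host prefix joined to the path on the server.
url = f"{bucket}/{path}"
parsed = urlparse(url)

# Take the last path component and drop the ".nc" extension.
filename = PurePosixPath(parsed.path).name
stem = filename[: -len(".nc")]

# Assumed field order, following the common CMIP6 file name template.
fields = [
    "variable_id",
    "table_id",
    "source_id",
    "experiment_id",
    "member_id",
    "grid_label",
    "time_range",
]
facets = dict(zip(fields, stem.split("_")))

for name, value in facets.items():
    print(f"{name}: {value}")
```

Reading the facets this way is a quick sanity check that the URL points at the variable (`tas`), frequency (`day`) and period you expect before opening anything over the network.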
Step 3: Create an ObjectStore and an ObjectStoreRegistry
store = HTTPStore.from_url(bucket)
registry = ObjectStoreRegistry({bucket: store})
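Conceptually, the registry ties a URL prefix to the store that can serve it, so a full URL can later be routed to the right store. The toy class below illustrates that idea with plain Python. `ToyRegistry`, its `resolve` method, the placeholder store string, and the example file path are all hypothetical; this is not VirtualiZarr's implementation or API.

```python
class ToyRegistry:
    """Hypothetical stand-in that maps URL prefixes to store objects."""

    def __init__(self, stores):
        self._stores = dict(stores)

    def resolve(self, url):
        # Collect every registered prefix that the URL starts with.
        matches = [prefix for prefix in self._stores if url.startswith(prefix)]
        if not matches:
            raise KeyError(f"no store registered for {url}")
        # Prefer the most specific (longest) prefix.
        prefix = max(matches, key=len)
        # Return the store plus the path relative to that prefix.
        return self._stores[prefix], url[len(prefix):].lstrip("/")


bucket = "https://esgf-data.ucar.edu"
toy_store = "placeholder for a real store object"  # hypothetical
toy_registry = ToyRegistry({bucket: toy_store})

# Hypothetical path, used only to show the prefix split.
resolved_store, relative_path = toy_registry.resolve(f"{bucket}/some/dir/file.nc")
print(relative_path)
```

This is why the same `bucket` string appears both as the registry key and as the prefix of the URL passed in Step 5: the URL has to start with a registered prefix for a store to be found.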
Step 4: Create an instance of the HDFParser
parser = HDFParser()
Step 5: Create a virtual dataset via open_virtual_dataset
vds = open_virtual_dataset(
    url=f"{bucket}/{path}",
    parser=parser,
    registry=registry,
    loadable_variables=["lat", "lon", "time", "time_bnds", "lat_bnds", "lon_bnds"],
)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/numcodecs/zarr3.py:145: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
super().__init__(**codec_config)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'tas' has multiple fill values {np.float64(1.0000000200408773e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
vds
<xarray.Dataset> Size: 807MB
Dimensions:    (lat: 192, lon: 288, time: 3650, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 29kB 1920-01-01 00:00:00 ... 1929-12-31 00:00:00
Dimensions without coordinates: nbnd
Data variables:
    time_bnds  (time, nbnd) object 58kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    tas        (time, lat, lon) float32 807MB ManifestArray<shape=(3650, 192,...
Attributes: (12/45)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                17
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.003
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-18T16:46:02Z
    ...                     ...
    sub_experiment:         none
    sub_experiment_id:      none
    branch_time_in_parent:  240900.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
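The 807MB figure in the repr can be reproduced from the shape and dtype of `tas` alone. The short calculation below takes the dimension sizes and the `float32` dtype from the output above. Since `tas` is shown as a `ManifestArray`, the working assumption is that this number is the logical size of the referenced data rather than memory actually held by the notebook.

```python
import math

# Dimension sizes of tas, copied from the dataset repr.
shape = {"time": 3650, "lat": 192, "lon": 288}
itemsize = 4  # bytes per float32 value

n_elements = math.prod(shape.values())
nbytes = n_elements * itemsize

print(f"{n_elements} elements, {nbytes / 1e6:.0f} MB")
```

The bounds and coordinate variables listed in `loadable_variables` add only a few tens of kilobytes, which is why the dataset total rounds to the same value as `tas`.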
Step 6: Serialize as an (in-memory) Icechunk store
Usually you’d want to use a persistent storage format, but let’s not waste disk space on an example.
icechunk_store = icechunk.in_memory_storage()
repo = icechunk.Repository.create(icechunk_store)
session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Create virtual store")
'Y308PPAWNJ1Q054MB8JG'