Hands-on - Virtualize NetCDF from ESGF

This example uses data from the Earth System Grid Federation THREDDS Data Server.

This is a shorter example intended for hands-on practice. For a full walkthrough, see the previous example on the NASA-NEX-GDDP CMIP6 data.

Thank you to Raphael Hagen for contributing this example!

Step 1: Import necessary functions and classes

import icechunk
from obstore.store import HTTPStore
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

Step 2: Define data location

bucket = "https://esgf-data.ucar.edu"
path = "thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/day/tas/gn/v20190308/tas_day_CESM2_historical_r3i1p1f1_gn_19200101-19291231.nc"
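
The file name at the end of `path` follows the CMIP6 file naming template (variable, table, model, experiment, member, grid, time range), so its pieces can be read off programmatically. The following is a minimal sketch using only the standard library. `parse_cmip6_filename` and `FIELDS` are hypothetical names introduced for illustration and are not part of VirtualiZarr.

```python
from pathlib import PurePosixPath

# Facet names in the order they appear in a CMIP6 file name.
# Time-invariant files omit the trailing time range; this sketch does not handle them.
FIELDS = (
    "variable_id",
    "table_id",
    "source_id",
    "experiment_id",
    "member_id",
    "grid_label",
    "time_range",
)


def parse_cmip6_filename(path: str) -> dict[str, str]:
    """Split the file name at the end of `path` into its CMIP6 facets (hypothetical helper)."""
    stem = PurePosixPath(path).stem  # drop directories and the ".nc" suffix
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} facets, got {len(parts)}: {stem}")
    return dict(zip(FIELDS, parts))


path = "thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/day/tas/gn/v20190308/tas_day_CESM2_historical_r3i1p1f1_gn_19200101-19291231.nc"
facets = parse_cmip6_filename(path)
print(facets)
```

Here `tas` is the variable that ends up as a virtual `ManifestArray` in Step 5, and the time range covers 1920-01-01 through 1929-12-31.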

Step 3: Create an ObjectStore and an ObjectStoreRegistry

store = HTTPStore.from_url(bucket)
registry = ObjectStoreRegistry({bucket: store})
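
The registry is built from a mapping of URL prefix to store, and Step 5 passes a full URL that starts with that prefix. To make the idea concrete, here is a toy, pure-Python sketch of prefix-based lookup. It is not how `ObjectStoreRegistry` is implemented; `resolve`, the placeholder store value, and the short example path are all assumptions made for illustration.

```python
def resolve(stores: dict[str, object], url: str) -> tuple[object, str]:
    """Return the store registered under the longest matching prefix,
    together with the part of `url` that follows that prefix (toy helper)."""
    matches = [prefix for prefix in stores if url.startswith(prefix)]
    if not matches:
        raise KeyError(f"no store registered for {url}")
    prefix = max(matches, key=len)  # prefer the most specific prefix
    return stores[prefix], url[len(prefix):].lstrip("/")


bucket = "https://esgf-data.ucar.edu"
# A string stands in for the real HTTPStore so the sketch has no dependencies.
stores = {bucket: "placeholder-store"}

# Hypothetical short path, used only to keep the example readable.
store, relative_path = resolve(stores, f"{bucket}/thredds/fileServer/example.nc")
print(store, relative_path)
```

The point of the sketch is that the key used when building the registry must be a prefix of the URL passed to `open_virtual_dataset`, which is why both are derived from the same `bucket` variable.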

Step 4: Create an instance of the HDFParser

parser = HDFParser()

Step 5: Create a virtual dataset via open_virtual_dataset

vds = open_virtual_dataset(
    url=f"{bucket}/{path}",
    parser=parser,
    registry=registry,
    loadable_variables=["lat", "lon", "time", "time_bnds", "lat_bnds", "lon_bnds"],
)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/numcodecs/zarr3.py:145: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'tas' has multiple fill values {np.float64(1.0000000200408773e+20), np.float64(1e+20)} defined, decoding all values to NaN.
  var = coder.decode(var, name=name)
vds
<xarray.Dataset> Size: 807MB
Dimensions:    (lat: 192, lon: 288, time: 3650, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 29kB 1920-01-01 00:00:00 ... 1929-12-31 00:00:00
Dimensions without coordinates: nbnd
Data variables:
    time_bnds  (time, nbnd) object 58kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    tas        (time, lat, lon) float32 807MB ManifestArray<shape=(3650, 192,...
Attributes: (12/45)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                17
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.003
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-18T16:46:02Z
    ...                     ...
    sub_experiment:         none
    sub_experiment_id:      none
    branch_time_in_parent:  240900.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
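
The 807MB reported for `tas` is the size the array would have if it were fully loaded; the virtual dataset holds only references to the chunks in the original file. The figure can be checked by hand from the dimensions in the repr. The sketch below is plain arithmetic and touches no data. It also compares the length of the time axis with a Gregorian day count, which suggests the file uses a 365-day calendar (an inference from the numbers, not something stated in the output above).

```python
from datetime import date

# Dimensions copied from the dataset repr above.
n_time, n_lat, n_lon = 3650, 192, 288
itemsize = 4  # bytes per float32 value

nbytes = n_time * n_lat * n_lon * itemsize
print(f"tas would occupy {nbytes:,} bytes, about {nbytes / 1e6:.0f} MB")

# Daily data for 1920-01-01 through 1929-12-31.
gregorian_days = (date(1930, 1, 1) - date(1920, 1, 1)).days
no_leap_days = 10 * 365
print(f"Gregorian: {gregorian_days} days, 365-day calendar: {no_leap_days} days")
```

The product is 807,321,600 bytes, which matches the repr. The time axis has 3650 steps rather than the 3653 days a Gregorian calendar would give for this decade, which is consistent with `time` being decoded to `object` dtype rather than `datetime64`.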

Step 6: Serialize as an (in-memory) Icechunk store

In practice you would usually write to persistent storage, such as local disk or object storage, but an in-memory store avoids using disk space for this example.

icechunk_store = icechunk.in_memory_storage()
repo = icechunk.Repository.create(icechunk_store)
session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Create virtual store")
'Y308PPAWNJ1Q054MB8JG'