Hands-on - Virtualize NetCDF from ESGF
This example uses data from the Earth System Grid Federation THREDDS Data Server.
This is a quicker example for hands-on experience. For a full walkthrough, see the previous example on the NASA-NEX-GDDP CMIP6 data.
Thank you to Raphael Hagen for contributing this example!
Step 1: Import necessary functions and classes
import icechunk
from obstore.store import HTTPStore
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
Step 2: Define data location
bucket = "https://esgf-data.ucar.edu"
path = "thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/day/tas/gn/v20190308/tas_day_CESM2_historical_r3i1p1f1_gn_19200101-19291231.nc"
Step 3: Create an ObjectStore and an ObjectStoreRegistry
store = HTTPStore.from_url(bucket)
registry = ObjectStoreRegistry({bucket: store})
Step 4: Create an instance of the HDFParser
parser = HDFParser()
Step 5: Create a virtual dataset via open_virtual_dataset
vds = open_virtual_dataset(
    url=f"{bucket}/{path}",
    parser=parser,
    registry=registry,
    loadable_variables=["lat", "lon", "time", "time_bnds", "lat_bnds", "lon_bnds"],
)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/numcodecs/zarr3.py:145: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)
/home/jovyan/Code/virtual-zarr/esip-2025/.pixi/envs/default/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'tas' has multiple fill values {np.float64(1.0000000200408773e+20), np.float64(1e+20)} defined, decoding all values to NaN.
  var = coder.decode(var, name=name)
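The two fill values in the SerializationWarning above look like the same number at two precisions. One plausible reading (an assumption; the source does not say which attribute holds which value) is that one fill value was stored as a 32-bit float, matching the float32 dtype of tas, and the other as a 64-bit float. The sketch below uses only the Python standard library to show that rounding 1e20 to single precision and widening it back yields exactly the second value in the warning.

```python
import struct

# Fill value as it would appear in a 64-bit float attribute.
fill_f64 = 1e20

# Round the same number to IEEE 754 single precision (the dtype of `tas`)
# and widen it back to a Python float (double precision).
fill_f32_widened = struct.unpack("<f", struct.pack("<f", fill_f64))[0]

print(fill_f32_widened)              # 1.0000000200408773e+20
print(fill_f32_widened == fill_f64)  # False
```

If that reading is right, both values mark the same missing-data sentinel, and decoding both to NaN, as the warning says xarray does, is harmless here.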
vds
<xarray.Dataset> Size: 807MB
Dimensions:    (lat: 192, lon: 288, time: 3650, nbnd: 2)
Coordinates:
  * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
  * time       (time) object 29kB 1920-01-01 00:00:00 ... 1929-12-31 00:00:00
Dimensions without coordinates: nbnd
Data variables:
    time_bnds  (time, nbnd) object 58kB ...
    lat_bnds   (lat, nbnd) float32 2kB ...
    lon_bnds   (lon, nbnd) float32 2kB ...
    tas        (time, lat, lon) float32 807MB ManifestArray<shape=(3650, 192,...
Attributes: (12/45)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    case_id:                17
    cesm_casename:          b.e21.BHIST.f09_g17.CMIP6-historical.003
    contact:                cesm_cmip6@ucar.edu
    creation_date:          2019-01-18T16:46:02Z
    ...                     ...
    sub_experiment:         none
    sub_experiment_id:      none
    branch_time_in_parent:  240900.0
    branch_time_in_child:   674885.0
    branch_method:          standard
    further_info_url:       https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
Step 6: Serialize as an (in-memory) Icechunk store
Usually you’d want to use a persistent storage format, but let’s not waste disk space on an example.
icechunk_store = icechunk.in_memory_storage()
repo = icechunk.Repository.create(icechunk_store)
session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
session.commit("Create virtual store")
'Y308PPAWNJ1Q054MB8JG'
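For a sense of scale, the 807MB that the dataset repr reports for tas can be reproduced from its shape and dtype alone. In the repr tas appears as a ManifestArray, so the store committed above should hold chunk references for it rather than the array values themselves. The sketch below is plain arithmetic using only numbers from the repr. It assumes 4 bytes per float32 element and decimal megabytes, and it says nothing about the compressed size of the NetCDF file on the server.

```python
from math import prod

# Shape of `tas` as shown in the repr: (time, lat, lon).
shape = (3650, 192, 288)
itemsize = 4  # bytes per float32 element

n_elements = prod(shape)
nbytes = n_elements * itemsize

print(n_elements)               # 201830400
print(nbytes)                   # 807321600
print(f"{nbytes / 1e6:.0f}MB")  # 807MB
```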