Virtualize NISAR GUNW to Icechunk¶
NISAR measures ice sheet velocity and surface deformation using synthetic aperture radar. Each GUNW (Geocoded Unwrapped Interferogram) granule is a large HDF5 file containing chunked, zlib-compressed arrays.
This notebook creates virtual Zarr references to a NISAR granule on NASA’s S3 storage and persists them in an Icechunk store.
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="earthaccess")
warnings.filterwarnings("ignore", message="Numcodecs codecs are not in the Zarr")
warnings.filterwarnings(
"ignore", category=UserWarning, message=".*does not have a Zarr V3 specification.*"
)

Find a NISAR GUNW granule¶
import earthaccess
earthaccess.login()
results = earthaccess.search_data(
short_name="NISAR_L2_GUNW_BETA_V1",
point=(174.1192, -39.3379), # lon, lat
temporal=("2026-01-01", "2026-01-04"),
)
print(f"Found {len(results)} granule(s)")
granule = results[0]
s3_links = granule.data_links(access="direct")
s3_url = [link for link in s3_links if link.endswith(".h5")][0]
print(f"S3 URL: {s3_url}")

Create virtual references with VirtualiZarr¶
open_virtual_dataset reads the HDF5 metadata for a specific group and builds virtual Zarr references. Only metadata is transferred; no array data is read.
from urllib.parse import urlparse
import obstore
from obstore.auth.earthdata import NasaEarthdataCredentialProvider
import virtualizarr as vz
from obspec_utils.registry import ObjectStoreRegistry
# Parse S3 URL
parsed = urlparse(s3_url)
bucket = f"s3://{parsed.netloc}"
# Get credential endpoint for this collection
credential_url = granule.get_s3_credentials_endpoint()
# Create authenticated S3 store
cp = NasaEarthdataCredentialProvider(credential_url)
store = obstore.store.S3Store(
bucket=parsed.netloc,
region="us-west-2",
credential_provider=cp,
)
registry = ObjectStoreRegistry({bucket: store})
# Virtualize the unwrapped interferogram group
parser = vz.parsers.HDFParser(
group="science/LSAR/GUNW/grids/frequencyA/unwrappedInterferogram/HH",
)
vds = vz.open_virtual_dataset(
url=s3_url,
parser=parser,
registry=registry,
)
vds

Persist to Icechunk¶
Write the virtual references to a local Icechunk repository. This
creates a versioned store that anyone can open with xr.open_zarr.
import icechunk
# Create local Icechunk repository
storage = icechunk.local_filesystem_storage("nisar-icechunk")
config = icechunk.RepositoryConfig.default()
# Tell Icechunk where the actual data lives (NASA S3)
config.set_virtual_chunk_container(
icechunk.VirtualChunkContainer(
bucket + "/",
icechunk.s3_store(region="us-west-2"),
),
)
# Get S3 credentials for Icechunk to read virtual chunks at read-time
s3_creds = earthaccess.get_s3_credentials(results=results)
virtual_credentials = icechunk.containers_credentials(
{
bucket + "/": icechunk.s3_credentials(
access_key_id=s3_creds["accessKeyId"],
secret_access_key=s3_creds["secretAccessKey"],
session_token=s3_creds["sessionToken"],
)
}
)
repo = icechunk.Repository.create(
storage,
config,
authorize_virtual_chunk_access=virtual_credentials,
)
session = repo.writable_session("main")
vds.vz.to_icechunk(session.store)
snapshot = session.commit("Virtualize NISAR GUNW unwrapped interferogram")
print(f"Committed snapshot: {snapshot}")

The Icechunk store now contains only virtual references: byte offsets and lengths of chunks in the original HDF5 file on NASA's S3. The actual data never moves. When someone reads from this store, Icechunk fetches only the requested chunks on demand.
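To read the data back, reopen the repository with the same virtual-chunk configuration and credentials (a sketch reusing the `storage`, `config`, and `virtual_credentials` objects defined above; the variable name in the commented line is hypothetical, so substitute one present in your store):

```python
import xarray as xr
import icechunk

# Reopen the repository; virtual chunk access must be re-authorized
repo = icechunk.Repository.open(
    storage,
    config,
    authorize_virtual_chunk_access=virtual_credentials,
)
session = repo.readonly_session("main")

# Opening is lazy: only Icechunk metadata is read here
ds = xr.open_zarr(session.store, consolidated=False)

# Accessing values triggers on-demand fetches of just those chunks
# from NASA S3 (variable name is illustrative):
# subset = ds["unwrappedPhase"][:100, :100].compute()
```

Note that the temporary Earthdata S3 credentials baked in above expire after about an hour, so long-lived readers need to refresh them and reopen the repository.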