
Query NISAR GUNW via Icechunk

Open the virtual Icechunk store created in the previous notebook and extract a spatial subset. Only the chunks overlapping the subset are fetched from NASA’s S3 storage.

import warnings

warnings.filterwarnings("ignore", message="Numcodecs codecs are not in the Zarr")
warnings.filterwarnings(
    "ignore", category=UserWarning, message=".*does not have a Zarr V3 specification.*"
)
warnings.filterwarnings("ignore", category=FutureWarning, module="earthaccess")

Open the Icechunk store

import time

import earthaccess
import icechunk
import xarray as xr
import zarr

zarr.config.set({"async.concurrency": 128})

earthaccess.login()

# Search for the same granule to get credentials and bucket
results = earthaccess.search_data(
    short_name="NISAR_L2_GUNW_BETA_V1",
    point=(174.1192, -39.3379),
    temporal=("2026-01-01", "2026-01-04"),
)
s3_url = [
    link for link in results[0].data_links(access="direct") if link.endswith(".h5")
][0]
bucket = "/".join(s3_url.split("/")[:3])
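The bucket prefix used for the virtual chunk container is recovered from the direct-access URL with plain string splitting. A minimal sketch on a made-up URL (the path below is illustrative, not a real NISAR object key):

```python
# Illustrative only: a made-up direct-access URL with the same shape as the real one
s3_url = "s3://example-bucket/NISAR/L2_GUNW/granule.h5"

# split("/") yields ["s3:", "", "example-bucket", ...]; rejoining the
# first three pieces gives the scheme plus bucket name
bucket = "/".join(s3_url.split("/")[:3])
print(bucket)  # s3://example-bucket
```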

storage = icechunk.local_filesystem_storage("nisar-icechunk")
config = icechunk.RepositoryConfig.default()

config.set_virtual_chunk_container(
    icechunk.VirtualChunkContainer(
        bucket + "/",
        icechunk.s3_store(region="us-west-2"),
    ),
)

s3_creds = earthaccess.get_s3_credentials(results=results)
virtual_credentials = icechunk.containers_credentials(
    {
        bucket + "/": icechunk.s3_credentials(
            access_key_id=s3_creds["accessKeyId"],
            secret_access_key=s3_creds["secretAccessKey"],
            session_token=s3_creds["sessionToken"],
        )
    }
)

repo = icechunk.Repository.open(
    storage=storage,
    config=config,
    authorize_virtual_chunk_access=virtual_credentials,
)

session = repo.readonly_session("main")
t0 = time.perf_counter()
ds = xr.open_zarr(session.store, consolidated=False, zarr_format=3)
print(f"Opened in {time.perf_counter() - t0:.2f}s")
ds

Extract a spatial subset

Instead of downloading the full file, we select a 500x500 pixel region from the center of the scene. Icechunk fetches only the chunks that overlap this window.

import numpy as np

ny, nx = ds.sizes["yCoordinates"], ds.sizes["xCoordinates"]

# 500x500 pixel window from the center of the scene
cy, cx = ny // 2, nx // 2
subset_slice = dict(
    yCoordinates=slice(cy - 250, cy + 250), xCoordinates=slice(cx - 250, cx + 250)
)

t0 = time.perf_counter()
subset = ds.isel(**subset_slice).load()
elapsed = time.perf_counter() - t0

print(f"Loaded 500x500 subset in {elapsed:.2f}s (from {ny}x{nx} full scene)")
subset

Visualize the subset

import pygmt

fig = pygmt.Figure()
pygmt.makecpt(
    cmap="SCM/romaO", series=[0, 2 * np.pi, np.pi / 2], cyclic=True, continuous=True
)
fig.grdimage(grid=subset.unwrappedPhase, cmap=True, frame=True)
fig.colorbar()
fig.show()

Visualize coherence magnitude

fig = pygmt.Figure()
fig.grdimage(grid=subset.coherenceMagnitude, cmap="gray", frame=True)
fig.colorbar()
fig.show()

How does this compare?

The h5netcdf baseline notebook loads the same 500x500 subset from the same granule. With h5netcdf, whole HDF5 chunks must be read even when only a sliver of each overlaps the window, so a spatial subset can pull far more data than it needs. With Icechunk, only the chunks overlapping the subset are fetched from S3.
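The read amplification from whole-chunk fetches can be estimated with simple chunk-overlap arithmetic. A sketch assuming a hypothetical 512x512 float32 chunk layout and window position (the real GUNW chunking may differ):

```python
import math

def chunks_overlapping(start, stop, chunk):
    """Count chunks a half-open index window [start, stop) touches along one axis."""
    return math.ceil(stop / chunk) - start // chunk

# Hypothetical layout: 512x512 float32 chunks, 500x500 window starting at (1000, 1000)
chunk = 512
per_axis = chunks_overlapping(1000, 1500, chunk)
n_chunks = per_axis * per_axis           # chunks fetched for the 2-D window
fetched = n_chunks * chunk * chunk * 4   # bytes read when whole chunks are fetched
needed = 500 * 500 * 4                   # bytes the subset actually uses
print(n_chunks, f"{fetched / needed:.1f}x read amplification")
```

Under these assumed numbers the 500x500 window touches 4 chunks, so roughly 4x the needed bytes cross the wire either way; the larger h5netcdf gap in practice comes from the extra metadata requests and lower request concurrency, which Icechunk avoids.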

Run the h5netcdf baseline notebook and compare the “Loaded 500x500 subset in ...” times.