arviz.dict_to_dataset
- arviz.dict_to_dataset(data, *, attrs=None, library=None, coords=None, dims=None, default_dims=None, index_origin=None, skip_event_dims=None)
Convert a dictionary or pytree of numpy arrays to an xarray.Dataset.
ArviZ itself supports conversion of flat dictionaries. Support for pytrees requires
dm-tree, which is an optional dependency. See https://jax.readthedocs.io/en/latest/pytrees.html for what a pytree is; this includes at least dictionaries and tuple types.
- Parameters:
  - data : dict of {str : array_like or dict} or pytree
    Data to convert. Keys are variable names.
  - attrs : dict, optional
    JSON-serializable metadata to attach to the dataset, in addition to defaults.
  - library : module, optional
    Library used for performing inference. Will be attached to the attrs metadata.
  - coords : dict of {str : ndarray}, optional
    Coordinates for the dataset.
  - dims : dict of {str : list of str}, optional
    Dimensions of each variable. The keys are variable names, values are lists of dimension names, which can match keys in coords.
  - default_dims : list of str, optional
    Passed to numpy_to_data_array().
  - index_origin : int, optional
    Passed to numpy_to_data_array().
  - skip_event_dims : bool, optional
    If True, cut extra dims whenever present to match the shape of the data. Necessary for PPLs which have the same name in both observed data and log likelihood groups, to account for their different shapes when observations are multivariate.
- Returns:
xarray.Dataset
In case of nested pytrees, the variable name will be a tuple of individual names.
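The parameters above can be combined in a single call. The following is a minimal sketch, not taken from the ArviZ documentation: the variable names, the "group" coordinate, and the "note" attribute are hypothetical, and numpy is passed as library only as a stand-in for a real inference library.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# Posterior-like draws: 4 chains, 100 draws, plus one extra dimension of length 3.
posterior = az.dict_to_dataset(
    {"theta": rng.normal(size=(4, 100, 3))},
    coords={"group": np.array(["a", "b", "c"])},  # hypothetical coordinate labels
    dims={"theta": ["group"]},                    # names the dimension after (chain, draw)
    attrs={"note": "illustrative run"},           # hypothetical extra metadata
    library=np,                                   # stand-in module for an inference library
)

# With default_dims=[] no chain/draw dimensions are prepended, which suits
# data that was not sampled (for example, observations).
observed = az.dict_to_dataset({"y": rng.normal(size=10)}, default_dims=[])

print(posterior["theta"].dims)
print(posterior.attrs["note"])
print(observed["y"].shape)
```

Only the dimensions that follow the default ones need to be listed in dims; the default dimension names are added by ArviZ.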
Notes
This function is available through two aliases: dict_to_dataset or pytree_to_dataset.
Examples
Convert a dictionary with two 2D variables to a Dataset.
In [1]: import arviz as az
   ...: import numpy as np
   ...: az.dict_to_dataset({'x': np.random.randn(4, 100), 'y': np.random.rand(4, 100)})
   ...:
Out[1]:
<xarray.Dataset> Size: 7kB
Dimensions:  (chain: 4, draw: 100)
Coordinates:
  * chain    (chain) int64 32B 0 1 2 3
  * draw     (draw) int64 800B 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99
Data variables:
    x        (chain, draw) float64 3kB -1.318 -0.9759 0.1915 ... -0.1124 0.3929
    y        (chain, draw) float64 3kB 0.6283 0.7501 0.1688 ... 0.528 0.8412
Attributes:
    created_at:     2025-03-06T15:41:49.734915+00:00
    arviz_version:  0.21.0
Note that unlike the xarray.Dataset constructor, ArviZ has added extra information to the generated Dataset, such as default dimension names for sampled dimensions and some attributes.
The function is also general enough to work on pytrees such as nested dictionaries:
In [2]: az.pytree_to_dataset({'top': {'second': 1.}, 'top2': 1.})
Out[2]:
<xarray.Dataset> Size: 32B
Dimensions:            (chain: 1, draw: 1)
Coordinates:
  * chain              (chain) int64 8B 0
  * draw               (draw) int64 8B 0
Data variables:
    ('top', 'second')  (chain, draw) float64 8B 1.0
    top2               (chain, draw) float64 8B 1.0
Attributes:
    created_at:     2025-03-06T15:41:49.753432+00:00
    arviz_version:  0.21.0
which has two variables (as many as leaves) named ('top', 'second') and top2.
Dimensions and coordinates can be defined as usual:
In [3]: datadict = {
   ...:     "top": {"a": np.random.randn(100), "b": np.random.randn(1, 100, 10)},
   ...:     "d": np.random.randn(100),
   ...: }
   ...: az.dict_to_dataset(
   ...:     datadict,
   ...:     coords={"c": np.arange(10)},
   ...:     dims={("top", "b"): ["c"]}
   ...: )
   ...:
Out[3]:
<xarray.Dataset> Size: 10kB
Dimensions:       (chain: 1, draw: 100, c: 10)
Coordinates:
  * chain         (chain) int64 8B 0
  * draw          (draw) int64 800B 0 1 2 3 4 5 6 7 ... 92 93 94 95 96 97 98 99
  * c             (c) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
    d             (chain, draw) float64 800B 0.3705 0.5437 ... 1.189 0.6675
    ('top', 'a')  (chain, draw) float64 800B 0.02016 -0.3345 ... 0.09341 0.07866
    ('top', 'b')  (chain, draw, c) float64 8kB -1.584 -1.414 ... -0.204 0.1507
Attributes:
    created_at:     2025-03-06T15:41:49.763960+00:00
    arviz_version:  0.21.0
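The naming rule seen in the nested-dictionary examples (plain keys for top-level leaves, tuples of keys for nested leaves) can be illustrated in pure Python. This is a sketch of the rule only, assuming nested dictionaries as the sole container type; leaf_names is a hypothetical helper, not part of ArviZ and not how ArviZ implements the conversion, and it makes no attempt to reproduce the variable order of the resulting Dataset.

```python
def leaf_names(tree, prefix=()):
    """Yield (name, leaf) pairs for a nested dictionary.

    Top-level leaves keep their plain key; nested leaves are named by the
    tuple of keys leading to them. Illustrative only, not ArviZ's code.
    """
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key path.
            yield from leaf_names(value, path)
        else:
            # A single-element path collapses to the plain key.
            yield (path[0] if len(path) == 1 else path), value


names = dict(leaf_names({"top": {"second": 1.0}, "top2": 1.0}))
print(names)
```

The keys produced here, ('top', 'second') and 'top2', are the same kind of key the dims argument expects when assigning dimensions to a nested variable.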