arviz.summary
- arviz.summary(data, var_names=None, filter_vars=None, group=None, fmt='wide', kind='all', round_to=None, circ_var_names=None, stat_focus='mean', stat_funcs=None, extend=True, hdi_prob=None, skipna=False, labeller=None, coords=None, index_origin=None, order=None)
Create a data frame with summary statistics.
- Parameters
- data: obj
Any object that can be converted to an arviz.InferenceData object. Refer to the documentation of arviz.convert_to_dataset() for details.
- var_names: list
Names of variables to include in summary. Prefix a variable with ~ to exclude it from the summary: ["~beta"] instead of ["beta"] (see examples below).
- filter_vars: {None, “like”, “regex”}, optional, default=None
If None (default), interpret var_names as the real variable names. If “like”, interpret var_names as substrings of the real variable names. If “regex”, interpret var_names as regular expressions on the real variable names. A la pandas.filter.
.- coords: Dict[str, List[Any]], optional
Coordinate subset for which to calculate the summary.
- group: str
Select a group for summary. Defaults to “posterior”, “prior”, or the first group, in that order, depending on which groups exist.
- fmt: {‘wide’, ‘long’, ‘xarray’}
Return format is either pandas.DataFrame {‘wide’, ‘long’} or xarray.Dataset {‘xarray’} (see examples below).
- kind: {‘all’, ‘stats’, ‘diagnostics’}
Whether to include the stats (mean, sd, hdi_3%, hdi_97%), the diagnostics (mcse_mean, mcse_sd, ess_bulk, ess_tail, and r_hat), or all of them. Defaults to “all” (see examples below).
- round_to: int
Number of decimals used to round results. Defaults to 2. Use “none” to return raw numbers (see examples below).
- circ_var_names: list
A list of circular variables to compute circular stats for (see examples below).
- stat_focus: str, default “mean”
Select the focus for summary.
- stat_funcs: dict
A list of functions, or a dict of functions with function names as keys, used to calculate statistics. By default, the mean, standard deviation, simulation standard error, and highest posterior density intervals are included.
The functions will be given one argument, the samples for a variable as an nD array. The functions should be in the style of a ufunc and return a single number. For example, numpy.mean() or scipy.stats.var would both work.
- extend: boolean
If True, use the statistics returned by stat_funcs in addition to, rather than in place of, the default statistics. This is only meaningful when stat_funcs is not None (see examples below).
- hdi_prob: float, optional
Highest density interval to compute. Defaults to 0.94. This is only meaningful when stat_funcs is None (see examples below).
- skipna: bool
If true, ignores NaN values when computing the summary statistics. It does not affect the behaviour of the functions passed to stat_funcs. Defaults to false.
- labeller: labeller instance, optional
Class providing the method make_label_flat to generate the labels in the plot titles. For more details on labeller usage see the Label guide.
- credible_interval: float, optional
Deprecated: please see hdi_prob.
- order
Deprecated: order is now ignored.
- index_origin
Deprecated: index_origin is now ignored; modify the coordinate values to change the value used in summary.
- Returns
pandas.DataFrame or xarray.Dataset
Return type dictated by the fmt argument.
The return value will contain summary statistics for each variable. Default statistics depend on the value of stat_focus:
- stat_focus="mean": mean, sd, hdi_3%, hdi_97%, mcse_mean, mcse_sd, ess_bulk, ess_tail, and r_hat
- stat_focus="median": median, mad, eti_3%, eti_97%, mcse_median, ess_median, ess_tail, and r_hat
r_hat is only computed for traces with 2 or more chains.
See also
waic
Compute the widely applicable information criterion.
loo
Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).
ess
Calculate estimate of the effective sample size (ess).
rhat
Compute estimate of rank normalized split R-hat for a set of traces.
mcse
Calculate Markov Chain Standard Error statistic.
Examples
In [1]: import arviz as az
   ...: data = az.load_arviz_data("centered_eight")
   ...: az.summary(data, var_names=["mu", "tau"])

Out[1]:
      mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu   4.486  3.487  -1.623   10.693  ...    0.160     241.0     659.0   1.02
tau  4.124  3.102   0.896    9.668  ...    0.186      67.0      38.0   1.06

[2 rows x 9 columns]
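The kind and round_to arguments combine with any such call; a minimal sketch (output omitted), reusing the data object loaded above:

# Keep only the diagnostics columns (mcse_mean, mcse_sd, ess_bulk, ess_tail, r_hat)
az.summary(data, var_names=["mu", "tau"], kind="diagnostics")

# Return raw, unrounded numbers
az.summary(data, var_names=["mu", "tau"], round_to="none")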
You can use filter_vars to select variables without having to specify all the exact names. Use filter_vars="like" to select based on partial naming:

In [2]: az.summary(data, var_names=["the"], filter_vars="like")
Out[2]:
                          mean     sd  hdi_3%  ...  ess_bulk  ess_tail  r_hat
theta[Choate]            6.460  5.868  -4.564  ...     365.0     710.0   1.01
theta[Deerfield]         5.028  4.883  -4.311  ...     427.0     851.0   1.01
theta[Phillips Andover]  3.938  5.688  -7.769  ...     515.0     730.0   1.01
theta[Phillips Exeter]   4.872  5.012  -4.490  ...     337.0     869.0   1.01
theta[Hotchkiss]         3.667  4.956  -6.470  ...     365.0    1034.0   1.01
theta[Lawrenceville]     3.975  5.187  -7.041  ...     521.0    1031.0   1.01
theta[St. Paul's]        6.581  5.105  -3.093  ...     276.0     586.0   1.01
theta[Mt. Hermon]        4.772  5.737  -5.858  ...     452.0     754.0   1.01

[8 rows x 9 columns]
Use filter_vars="regex" to select based on regular expressions, and prefix the variables you want to exclude with ~. Here, we exclude from the summary all the variables starting with the letter t:

In [3]: az.summary(data, var_names=["~^t"], filter_vars="regex")
Out[3]:
     mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
mu  4.486  3.487  -1.623   10.693  ...     0.16     241.0     659.0   1.02

[1 rows x 9 columns]
Other statistics can be calculated by passing a list of functions, or a dictionary with function names as keys and functions as values.
In [4]: import numpy as np
   ...: def median_sd(x):
   ...:     median = np.percentile(x, 50)
   ...:     sd = np.sqrt(np.mean((x - median)**2))
   ...:     return sd
   ...:
   ...: func_dict = {
   ...:     "std": np.std,
   ...:     "median_std": median_sd,
   ...:     "5%": lambda x: np.percentile(x, 5),
   ...:     "median": lambda x: np.percentile(x, 50),
   ...:     "95%": lambda x: np.percentile(x, 95),
   ...: }
   ...: az.summary(
   ...:     data,
   ...:     var_names=["mu", "tau"],
   ...:     stat_funcs=func_dict,
   ...:     extend=False,
   ...: )

Out[4]:
       std  median_std     5%  median     95%
mu   3.486       3.486 -1.152   4.548  10.020
tau  3.101       3.217  1.054   3.269  10.106
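Because extend defaults to True, dropping extend=False appends these custom statistics to the default columns rather than replacing them; a minimal sketch (output omitted):

# func_dict statistics reported alongside the default mean, sd, hdi_*, etc.
az.summary(data, var_names=["mu", "tau"], stat_funcs=func_dict)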
Use stat_focus to change the focus of the summary statistics obtained to the median:

In [5]: az.summary(data, stat_focus="median")
Out[5]:
                         median    mad  eti_3%  ...  ess_median  ess_tail  r_hat
mu                        4.548  2.283  -1.968  ...     199.205     659.0   1.02
theta[Choate]             6.082  3.130  -3.511  ...     383.402     710.0   1.01
theta[Deerfield]          5.011  3.347  -4.066  ...     320.345     851.0   1.01
theta[Phillips Andover]   4.227  3.150  -7.771  ...     258.296     730.0   1.01
theta[Phillips Exeter]    5.022  3.251  -4.777  ...     197.764     869.0   1.01
theta[Hotchkiss]          3.892  3.171  -6.187  ...     272.506    1034.0   1.01
theta[Lawrenceville]      4.136  3.133  -6.549  ...     321.125    1031.0   1.01
theta[St. Paul's]         6.065  3.014  -2.254  ...     278.395     586.0   1.01
theta[Mt. Hermon]         4.706  3.340  -6.348  ...     245.548     754.0   1.01
tau                       3.269  1.600   0.927  ...     119.695      38.0   1.06

[10 rows x 8 columns]
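The interval columns follow hdi_prob; a minimal sketch (output omitted) requesting a 90% highest density interval, which should relabel the interval columns hdi_5% and hdi_95%:

# 90% HDI instead of the default 94%
az.summary(data, var_names=["mu", "tau"], hdi_prob=0.90)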
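Circular statistics can be requested through circ_var_names. The centered_eight dataset has no circular parameter, so the sketch below builds a synthetic one; the "angle" variable and the von Mises draws are illustrative assumptions, not part of the example data:

import numpy as np

# Hypothetical circular variable, in radians, with 4 chains of 500 draws each
angle_draws = np.random.vonmises(mu=0.0, kappa=2.0, size=(4, 500))
circ_data = az.from_dict(posterior={"angle": angle_draws})

# Treat "angle" as circular when computing its summary statistics
az.summary(circ_data, circ_var_names=["angle"])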