arviz.plot_bpv
arviz.plot_bpv(data, kind='u_value', t_stat='median', bpv=True, plot_mean=True, reference='analytical', mse=False, n_ref=100, hdi_prob=0.94, color='C0', grid=None, figsize=None, textsize=None, labeller=None, data_pairs=None, var_names=None, filter_vars=None, coords=None, flatten=None, flatten_pp=None, ax=None, backend=None, plot_ref_kwargs=None, backend_kwargs=None, group='posterior', show=None)
Plot Bayesian p-value for observed data and Posterior/Prior predictive.
- Parameters
- data : az.InferenceData object
InferenceData object containing the observed and posterior/prior predictive data.
- kind : str
Type of plot to display ("p_value", "u_value", "t_stat"). Defaults to "u_value".
For "p_value" we compute $p := p(y^* \le y \mid y)$, the probability that the predicted data $y^*$ is less than or equal to the observed data $y$. The ideal value is 0.5 (half the predictions below and half above the data).
For "u_value" we compute $p_i := p(y_i^* \le y_i \mid y)$, i.e. a p-value computed per observation $y_i$, also known as the marginal p-value. The ideal distribution is uniform. This is similar to the LOO-PIT calculation and plot; the difference is that LOO-PIT computes $p_i := p(y_i^* \le y_i \mid y_{-i})$, where $y_{-i}$ is all the data except $y_i$.
For "t_stat" we compute $p := p(T(y)^* \le T(y) \mid y)$, where $T$ is any test statistic. See the t_stat argument below for the available options.
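The three kinds can be illustrated with a minimal NumPy sketch. It is not ArviZ's implementation: the synthetic arrays `y_obs` and `y_pp` are assumptions standing in for observed data and posterior predictive draws, and reading $p(y^* \le y \mid y)$ for "p_value" as a per-draw fraction of observations is an assumption of this sketch.

```python
import numpy as np

# Synthetic stand-ins (assumed, not from ArviZ): 2 chains x 500 draws of
# posterior predictive data for 50 observations.
rng = np.random.default_rng(0)
y_obs = rng.normal(size=50)
y_pp = rng.normal(size=(2, 500, 50))

# Merge chains and draws into one sample axis: shape (n_samples, n_obs).
pp = y_pp.reshape(-1, y_obs.size)

# "p_value" (assumed reading): for each predictive draw, the fraction of
# observations with y*_i <= y_i; values concentrated near 0.5 are ideal.
p_values = np.mean(pp <= y_obs, axis=1)

# "u_value": for each observation i, the fraction of draws with y*_i <= y_i;
# ideally these are uniformly distributed on [0, 1].
u_values = np.mean(pp <= y_obs, axis=0)

# "t_stat": compare a test statistic T (here the median) of the observed data
# with T computed on every predictive draw; bpv = P(T(y*) <= T(y)).
t_obs = np.median(y_obs)
t_pp = np.median(pp, axis=1)
bpv = np.mean(t_pp <= t_obs)
```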
- t_stat : str, float, or callable
Test statistic (T statistic) to compute from the observations and predictive distributions. Allowed strings are "mean", "median" or "std". Defaults to "median". Alternatively, a quantile can be passed as a float (or str) in the interval (0, 1). Finally, a user-defined function is also accepted; see the Examples section for details.
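One way the accepted t_stat values could be turned into a single reduction function is sketched below. `resolve_t_stat` and `bayesian_p_value` are hypothetical helpers, not part of ArviZ; the sketch assumes predictive draws are already flattened to shape (n_samples, n_obs) and that a user-supplied callable reduces the last axis, as the lambda in the Examples section does.

```python
import numpy as np

def resolve_t_stat(t_stat):
    """Hypothetical helper: turn a t_stat argument into a function that
    reduces the last axis of an array to one statistic."""
    named = {"mean": np.mean, "median": np.median, "std": np.std}
    if callable(t_stat):
        return t_stat  # user-defined function, assumed to reduce axis=-1
    if isinstance(t_stat, str) and t_stat in named:
        func = named[t_stat]
        return lambda x: func(x, axis=-1)
    # Anything else is treated as a quantile in (0, 1), given as float or str.
    q = float(t_stat)
    if not 0 < q < 1:
        raise ValueError(f"quantile must be in (0, 1), got {q}")
    return lambda x: np.quantile(x, q, axis=-1)

def bayesian_p_value(y_obs, pp, t_stat="median"):
    """P(T(y*) <= T(y) | y), with pp shaped (n_samples, n_obs)."""
    stat = resolve_t_stat(t_stat)
    return np.mean(stat(pp) <= stat(y_obs))

# Usage with synthetic data (assumed).
rng = np.random.default_rng(1)
y_obs = rng.normal(size=30)
pp = rng.normal(size=(400, 30))
results = {
    "mean": bayesian_p_value(y_obs, pp, "mean"),
    "std": bayesian_p_value(y_obs, pp, "std"),
    "q90": bayesian_p_value(y_obs, pp, 0.9),
    "q25_str": bayesian_p_value(y_obs, pp, "0.25"),
    "custom": bayesian_p_value(y_obs, pp, lambda x: np.percentile(x, q=50, axis=-1)),
}
```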
- bpv : bool
If True (default), add the Bayesian p-value to the legend when kind="t_stat".
- plot_mean : bool
Whether or not to plot the mean T statistic. Defaults to True.
- reference : str
How to compute the distributions used as reference for u_values or p_values. Allowed values are "analytical" (default) and "samples". Use None to plot no reference.
- mse : bool
Show the scaled mean squared error between the uniform distribution and the distribution of marginal p-values (u_values). Defaults to False.
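The idea behind mse can be sketched as the mean squared difference between a density estimate of the u-values and the uniform density, which equals 1 on [0, 1]. The histogram estimate, the number of bins, and the helper name `uniformity_mse` are assumptions; ArviZ's own density estimate and the scaling it applies to the reported value are not reproduced here.

```python
import numpy as np

def uniformity_mse(u_values, bins=20):
    """Hypothetical helper: mean squared difference between a histogram
    density of u_values on [0, 1] and the uniform density (constant 1)."""
    density, _ = np.histogram(u_values, bins=bins, range=(0, 1), density=True)
    return float(np.mean((density - 1.0) ** 2))

rng = np.random.default_rng(2)
well_calibrated = rng.uniform(size=2000)
# Beta(5, 5) piles u-values up around 0.5, the pattern produced when the
# predictive distribution is too wide (overdispersed).
too_wide = rng.beta(5, 5, size=2000)
```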
- n_ref : int, optional
Number of reference distributions to sample when reference="samples". Defaults to 100.
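For reference="samples" with u-values, the reference can be pictured as n_ref simulated datasets of uniform values, each the same size as the observed data, whose density estimates show how much a perfectly calibrated model would still fluctuate. The sketch below assumes a histogram density and a hypothetical helper name; ArviZ may estimate the reference densities differently.

```python
import numpy as np

def sampled_uniform_references(n_obs, n_ref=100, bins=20, seed=0):
    """Hypothetical sketch: histogram densities of n_ref uniform samples,
    each of size n_obs. Returns an array of shape (n_ref, bins)."""
    rng = np.random.default_rng(seed)
    refs = rng.uniform(size=(n_ref, n_obs))
    return np.stack(
        [np.histogram(r, bins=bins, range=(0, 1), density=True)[0] for r in refs]
    )

# One reference density curve per simulated dataset of 50 "observations".
ref_densities = sampled_uniform_references(n_obs=50, n_ref=100)
```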
- hdi_prob : float, optional
Probability for the highest density interval of the analytical reference distribution when computing u_values. Should be in the interval (0, 1]. Defaults to 0.94.
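An analytical counterpart can be derived for a histogram of u-values: if the u-values are uniform, each bin count is Binomial(n_obs, 1/bins). The sketch below turns that into an interval on the density scale. It uses an equal-tailed interval as a stand-in for the highest density interval and a histogram as the density estimate; both, and the helper names, are assumptions rather than ArviZ's exact construction.

```python
import math

def binomial_cdf_quantile(n, p, q):
    """Smallest k with P(K <= k) >= q for K ~ Binomial(n, p).
    The pmf is evaluated in log space to avoid overflow for large n."""
    cdf = 0.0
    for k in range(n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return k
    return n

def uniform_density_band(n_obs, bins=20, hdi_prob=0.94):
    """Equal-tailed hdi_prob interval for one bin of a histogram density
    built from n_obs uniform values."""
    p = 1.0 / bins
    tail = (1.0 - hdi_prob) / 2.0
    lower = binomial_cdf_quantile(n_obs, p, tail)
    upper = binomial_cdf_quantile(n_obs, p, 1.0 - tail)
    # A bin holding k values has density k / (n_obs * bin_width) = k / (n_obs * p).
    scale = n_obs * p
    return lower / scale, upper / scale

band_small = uniform_density_band(100)
band_large = uniform_density_band(1000)
```

As expected, the band narrows as the number of observations grows.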
- color : str
Matplotlib color. Defaults to "C0".
- grid : tuple
Number of rows and columns. Defaults to None, in which case the rows and columns are inferred automatically.
- figsize : tuple
Figure size. If None it will be defined automatically.
- textsize : float
Text size scaling factor for labels, titles and lines. If None it will be autoscaled based on figsize.
- data_pairs : dict
Dictionary containing relations between observed data and posterior/prior predictive data. Dictionary structure:
key = data var_name
value = posterior/prior predictive var_name
For example, data_pairs = {"y": "y_hat"}. If None, it will assume that the observed data and the posterior/prior predictive data have the same variable name.
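The pairing rule amounts to a dictionary lookup that falls back to the same name. `pair_variables` below is a hypothetical helper used only for illustration.

```python
def pair_variables(observed_names, data_pairs=None):
    """Hypothetical helper: map each observed variable name to the name of
    its posterior/prior predictive counterpart."""
    if data_pairs is None:
        data_pairs = {}
    # Names missing from data_pairs are assumed to be identical in both groups.
    return {name: data_pairs.get(name, name) for name in observed_names}

pairs = pair_variables(["y", "z"], data_pairs={"y": "y_hat"})
```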
- labeller : labeller instance, optional
Class providing the method make_pp_label to generate the labels in the plot titles. Read the Label guide for more details and usage examples.
- var_names : list of variable names
Variables to be plotted. If None, all variables are plotted. Prefix a variable name with ~ to exclude it from the plot.
- filter_vars : {None, "like", "regex"}, optional, default=None
If None (default), interpret var_names as the real variable names. If "like", interpret var_names as substrings of the real variable names. If "regex", interpret var_names as regular expressions on the real variable names. À la pandas.filter.
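The selection rules for var_names and filter_vars can be sketched in plain Python. `select_vars` is a hypothetical helper, and how it treats a list that mixes included and ~-excluded names is an assumption of the sketch, not documented behavior.

```python
import re

def select_vars(all_names, var_names=None, filter_vars=None):
    """Hypothetical helper mimicking the documented var_names / filter_vars rules."""
    if var_names is None:
        return list(all_names)
    if isinstance(var_names, str):
        var_names = [var_names]

    def matches(name, pattern):
        if filter_vars is None:
            return name == pattern  # exact variable names
        if filter_vars == "like":
            return pattern in name  # substring match
        if filter_vars == "regex":
            return re.search(pattern, name) is not None  # regular expression
        raise ValueError(f"unknown filter_vars: {filter_vars!r}")

    include = [v for v in var_names if not v.startswith("~")]
    exclude = [v[1:] for v in var_names if v.startswith("~")]
    # Assumption: with only ~-prefixed names, start from every variable.
    if include:
        selected = [n for n in all_names if any(matches(n, p) for p in include)]
    else:
        selected = list(all_names)
    return [n for n in selected if not any(matches(n, p) for p in exclude)]

names = ["y", "y_hat", "obs_1", "obs_2"]
```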
- coords : dict
Dictionary mapping dimensions to selected coordinates to be plotted. Dimensions without a mapping specified will include all coordinates for that dimension. Defaults to including all coordinates for all dimensions if None.
- flatten : list
List of dimensions to flatten in observed_data. Only flattens across the coordinates specified in the coords argument. Defaults to flattening all of the dimensions.
- flatten_pp : list
List of dimensions to flatten in posterior_predictive/prior_predictive. Only flattens across the coordinates specified in the coords argument. Defaults to flattening all of the dimensions. Dimensions should match flatten excluding dimensions for data_pairs parameters. If flatten is defined and flatten_pp is None, then flatten_pp=flatten.
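What selecting coordinates and flattening amount to can be sketched with plain NumPy arrays. The dimension sizes, the hypothetical dimension names in the comments, and the choice of two coordinates are assumptions; ArviZ itself works with labeled xarray objects rather than positional indices.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical dims: observed data is (group, time); predictive draws are
# (chain, draw, group, time).
obs = rng.normal(size=(8, 3))
pp = rng.normal(size=(4, 250, 8, 3))

# Keep only time coordinates 0 and 2, similar in spirit to coords={"time": [0, 2]}.
obs_sel = obs[:, [0, 2]]
pp_sel = pp[..., [0, 2]]

# Flatten all observed dims into one axis of n_obs values, and merge chain
# and draw into one sample axis so each row lines up with obs_flat.
obs_flat = obs_sel.reshape(-1)
pp_flat = pp_sel.reshape(-1, obs_flat.size)
```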
- legend : bool
Add a legend to the figure. Defaults to True.
- ax : numpy array-like of matplotlib axes or bokeh figures, optional
A 2D array of locations into which to plot the densities. If not supplied, ArviZ will create its own array of plot areas (and return it).
- backend : str, optional
Select plotting backend {"matplotlib", "bokeh"}. Default "matplotlib".
- plot_ref_kwargs : dict, optional
Extra keyword arguments to control how the reference is represented. Passed to plt.plot, or to plt.axhspan when kind="u_value" and reference="analytical".
- backend_kwargs : dict, optional
These are kwargs specific to the backend being used. For additional documentation check the plotting method of the backend.
- group : {"prior", "posterior"}, optional
Specifies which InferenceData group should be plotted. Defaults to "posterior"; the other allowed value is "prior".
- show : bool, optional
Call backend show function.
- Returns
- axes : matplotlib axes or bokeh figures
References
Gelman et al. (2013), Bayesian Data Analysis, third edition, pages 151-153. See http://www.stat.columbia.edu/~gelman/book/ for details.
Examples
Plot Bayesian p_values.
>>> import arviz as az
>>> data = az.load_arviz_data("regression1d")
>>> az.plot_bpv(data, kind="p_value")
Plot custom t statistic comparison.
>>> import arviz as az
>>> import numpy as np
>>> data = az.load_arviz_data("regression1d")
>>> az.plot_bpv(data, kind="t_stat", t_stat=lambda x: np.percentile(x, q=50, axis=-1))