arviz.plot_ppc#
- arviz.plot_ppc(data, kind='kde', alpha=None, mean=True, observed=None, observed_rug=False, color=None, colors=None, grid=None, figsize=None, textsize=None, data_pairs=None, var_names=None, filter_vars=None, coords=None, flatten=None, flatten_pp=None, num_pp_samples=None, random_seed=None, jitter=None, animated=False, animation_kwargs=None, legend=True, labeller=None, ax=None, backend=None, backend_kwargs=None, group='posterior', show=None)[source]#
Plot for posterior/prior predictive checks.
- Parameters:
- data
InferenceData
arviz.InferenceData
object containing the observed and posterior/prior predictive data.- kind
str
, default “kde” Type of plot to display (“kde”, “cumulative”, or “scatter”).
- alpha
float
, optional Opacity of posterior/prior predictive density curves. Defaults to 0.2 for
kind = kde
and cumulative, for scatter defaults to 0.7.- meanbool, default
True
Whether or not to plot the mean posterior/prior predictive distribution.
- observedbool, optional
Whether or not to plot the observed data. Defaults to True for
group = posterior
and False forgroup = prior
.- observed_rugbool, default
False
Whether or not to plot a rug plot for the observed data. Only valid if
observed
isTrue
and for kindkde
orcumulative
.- color
list
, optional List with valid matplotlib colors corresponding to the posterior/prior predictive distribution, observed data and mean of the posterior/prior predictive distribution. Defaults to [“C0”, “k”, “C1”].
- grid
tuple
, optional Number of rows and columns. Defaults to None, the rows and columns are automatically inferred.
- figsize
tuple
, optional Figure size. If None, it will be defined automatically.
- textsize
float
, optional Text size scaling factor for labels, titles and lines. If None, it will be autoscaled based on
figsize
.- data_pairs
dict
, optional Dictionary containing relations between observed data and posterior/prior predictive data. Dictionary structure:
key = data var_name
value = posterior/prior predictive var_name
For example,
data_pairs = {'y' : 'y_hat'}
If None, it will assume that the observed data and the posterior/prior predictive data have the same variable name.- var_names
list
ofstr
, optional Variables to be plotted, if
None
all variable are plotted. Prefix the variables by~
when you want to exclude them from the plot.- filter_vars{
None
, “like”, “regex”}, defaultNone
If
None
(default), interpret var_names as the real variables names. If “like”, interpret var_names as substrings of the real variables names. If “regex”, interpret var_names as regular expressions on the real variables names. A lapandas.filter
.- coords
dict
, optional Dictionary mapping dimensions to selected coordinates to be plotted. Dimensions without a mapping specified will include all coordinates for that dimension. Defaults to including all coordinates for all dimensions if None.
- flatten
list
List of dimensions to flatten in
observed_data
. Only flattens across the coordinates specified in thecoords
argument. Defaults to flattening all of the dimensions.- flatten_pp
list
List of dimensions to flatten in posterior_predictive/prior_predictive. Only flattens across the coordinates specified in the
coords
argument. Defaults to flattening all of the dimensions. Dimensions should match flatten excluding dimensions fordata_pairs
parameters. Ifflatten
is defined andflatten_pp
is None, thenflatten_pp = flatten
.- num_pp_samples
int
The number of posterior/prior predictive samples to plot. For
kind
= ‘scatter’ andanimation = False
if defaults to a maximum of 5 samples and will set jitter to 0.7. unless defined. Otherwise it defaults to all provided samples.- random_seed
int
Random number generator seed passed to
numpy.random.seed
to allow reproducibility of the plot. By default, no seed will be provided and the plot will change each call if a random sample is specified bynum_pp_samples
.- jitter
float
, default 0 If
kind
is “scatter”, jitter will add random uniform noise to the height of the ppc samples and observed data.- animatedbool, default
False
Create an animation of one posterior/prior predictive sample per frame. Only works with matploblib backend. To run animations inside a notebook you have to use the
nbAgg
matplotlib’s backend. Try with%matplotlib notebook
or%matplotlib nbAgg
. You can switch back to the default matplotlib’s backend with%matplotlib inline
or%matplotlib auto
. If switching back and forth between matplotlib’s backend, you may need to run twice the cell with the animation. If you experience problems rendering the animation try settinganimation_kwargs({'blit':False})
or changing the matplotlib’s backend (e.g. to TkAgg) If you run the animation from a script writeax, ani = az.plot_ppc(.)
- animation_kwargs
dict
Keywords passed to
matplotlib.animation.FuncAnimation
. Ignored with matplotlib backend.- legendbool, default
True
Add legend to figure.
- labeller
labeller
, optional Class providing the method
make_pp_label
to generate the labels in the plot titles. Read the Label guide for more details and usage examples.- ax
numpy
array_like ofmatplotlib Axes
orbokeh
figures
, optional A 2D array of locations into which to plot the densities. If not supplied, Arviz will create its own array of plot areas (and return it).
- backend
str
, optional Select plotting backend {“matplotlib”,”bokeh”}. Default to “matplotlib”.
- backend_kwargs
dict
, optional These are kwargs specific to the backend being used, passed to
matplotlib.pyplot.subplots()
orbokeh.plotting.figure()
. For additional documentation check the plotting method of the backend.- group{“prior”, “posterior”}, optional
Specifies which InferenceData group should be plotted. Defaults to ‘posterior’. Other value can be ‘prior’.
- showbool, optional
Call backend show function.
- data
- Returns:
- axes
matplotlib Axes
orbokeh_figures
- ani
matplotlib.animation.FuncAnimation
, optional Only provided if
animated
isTrue
.
- axes
See also
plot_bpv
Plot Bayesian p-value for observed data and Posterior/Prior predictive.
plot_loo_pit
Plot for posterior predictive checks using cross validation.
plot_lm
Posterior predictive and mean plots for regression-like data.
plot_ts
Plot timeseries data.
Examples
Plot the observed data KDE overlaid on posterior predictive KDEs.
>>> import arviz as az >>> data = az.load_arviz_data('radon') >>> az.plot_ppc(data, data_pairs={"y":"y"})
Plot the overlay with empirical CDFs.
>>> az.plot_ppc(data, kind='cumulative')
Use the
coords
andflatten
parameters to plot selected variable dimensions across multiple plots. We will now modify the dimensionobs_id
to contain indicate the name of the county where the measure was taken. The change has to be done on bothposterior_predictive
andobserved_data
groups, which is why we will usemap()
to apply the same function to both groups. Afterwards, we will select the counties to be plotted with thecoords
arg.>>> obs_county = data.posterior["County"][data.constant_data["county_idx"]] >>> data = data.assign_coords(obs_id=obs_county, groups="observed_vars") >>> az.plot_ppc(data, coords={'obs_id': ['ANOKA', 'BELTRAMI']}, flatten=[])
Plot the overlay using a stacked scatter plot that is particularly useful when the sample sizes are small.
>>> az.plot_ppc(data, kind='scatter', flatten=[], >>> coords={'obs_id': ['AITKIN', 'BELTRAMI']})
Plot random posterior predictive sub-samples.
>>> az.plot_ppc(data, num_pp_samples=30, random_seed=7)