Bayesian Methods

v1.1.0 · Bayesian Methods

Bayesian methods engine pack covering core concepts (priors, posteriors, credible intervals, conjugacy), inference algorithms (MCMC, HMC, NUTS, variational inference, ABC), hierarchical/multilevel models, and model checking (posterior predictive checks, Bayes factors, effective sample size). Bundles authoritative references from Gelfand & Smith (1990) through Stan (Carpenter et al. 2017) and the modern Bayesian workflow (van de Schoot et al. 2021). Includes a playbook for the killer demo: using KB findings from other packs as informative priors in Bayesian regression.

// domain

Bayesian Methods

macro

// top findings

12 empirical claims

F001 ↑ strong

Gibbs sampling and related sampling-based approaches provide a general, computationally feasible route to marginal and conditional posterior distributions in Bayesian hierarchical models where analytical integration is intractable. Enabled the wide adoption of MCMC for applied Bayesian inference in the 1990s.

F002 ↑ strong

Inverse-Gamma(ε, ε) priors on variance parameters — a historically common 'non-informative' default in hierarchical Bayesian models — are in fact strongly informative with influence that does not vanish as ε → 0. Half-Cauchy and half-Normal priors on the standard deviation are recommended weakly-informative alternatives that avoid this pathology.

F003 ↑ strong

The No-U-Turn Sampler matches or exceeds the efficiency of hand-tuned HMC across a range of target distributions while eliminating the need to specify trajectory length, and outperforms random-walk Metropolis by one to three orders of magnitude in effective sample size per gradient evaluation on correlated Gaussian and logistic regression targets.

// 1-3 orders of magnitude ESS/grad-eval improvement vs. random-walk Metropolis

// top constructs

14 vocabulary terms

C method

Bayesian Linear Regression

Linear regression in which coefficients and error variance are treated as random variables with…

C concept

Conjugate Prior

A prior distribution that, when combined with a given likelihood, yields a posterior in the same…

C concept

Informative Prior

A prior distribution that encodes substantive knowledge from prior literature, expert elicitation,…

C concept

Weakly Informative Prior

A prior designed to rule out implausible parameter values without strongly favoring any specific…

// abstract

Abstract

Domain: Bayesian Methods

Bayesian statistical methods — priors and posteriors, conjugate families, credible intervals, MCMC/HMC/NUTS/variational inference, hierarchical and multilevel models, posterior predictive checks, Bayes factors, and the modern Bayesian workflow. Engine pack focus: turning KB findings from other packs into informative priors for Bayesian regression.

Key Findings

Gibbs sampling and related sampling-based approaches provide a general, computationally feasible route to marginal and conditional posterior distributions in Bayesian hierarchical models where analytical integration is intractable. Enabled the wide adoption of MCMC for applied Bayesian inference in the 1990s. (positive, strong)
Inverse-Gamma(ε, ε) priors on variance parameters — a historically common ’non-informative’ default in hierarchical Bayesian models — are in fact strongly informative with influence that does not vanish as ε → 0. Half-Cauchy and half-Normal priors on the standard deviation are recommended weakly-informative alternatives that avoid this pathology. (positive, strong)
The No-U-Turn Sampler matches or exceeds the efficiency of hand-tuned HMC across a range of target distributions while eliminating the need to specify trajectory length, and outperforms random-walk Metropolis by one to three orders of magnitude in effective sample size per gradient evaluation on correlated Gaussian and logistic regression targets. (positive, strong)
Mean-field variational inference provides dramatically faster approximate posterior inference than MCMC on large-scale latent-variable models (e.g. LDA topic models with millions of documents), at the cost of underestimating posterior variance, because its reverse-KL objective KL(q||p) is mode-seeking rather than mass-covering. (mixed, strong)
Partial pooling via hierarchical models produces group-level estimates with lower mean squared error than either complete pooling (ignoring group structure) or no pooling (fitting separate models per group), particularly for groups with small sample sizes. The shrinkage is automatic and adapts to the estimated between-group variance. (positive, strong)
Posterior predictive checks using realized discrepancy measures T(y, θ) — test quantities that may depend on both data and parameters — provide a principled Bayesian analogue to goodness-of-fit testing. Systematic divergence between T applied to observed vs. replicated data reveals specific modes of model misspecification that point estimates of fit cannot. (positive, strong)
Bayes factors are sensitive to the choice of prior on model parameters in a way that p-values are not; improper priors on parameters that are not shared by the models being compared leave the Bayes factor determined only up to an arbitrary constant, so it is effectively undefined. Proper, weakly-informative priors chosen with the comparison in mind are required for interpretable model comparison. (negative, strong)
Penalised Complexity (PC) priors constructed by penalizing KL divergence from a simpler base model (e.g. zero variance, independence) produce interpretable, weakly-informative defaults for random-effect variances, autoregressive correlation parameters, and overdispersion terms — with a single user-chosen scale that has clear probabilistic meaning. (positive, strong)

…and 4 more findings

// dependencies

Engines

engine.bayesian_linear_regression

// tags

bayesian mcmc hmc nuts variational-inference hierarchical-models priors posterior-predictive-checks probabilistic-programming stan

// analytical

Playbooks

B kb_findings_as_priors 7 steps

Formalize the "killer demo" of Bayesian + PAX — extract quantitative effect sizes from a knowledge-base pack's findings.json, convert them into informative priors on corresponding regression coefficients, and fit Bayesian linear regression on new data. Produces a posterior that formally blends prior literature with current evidence, with posterior predictive checks for model adequacy.
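One step of this recipe, turning a reported effect into a prior and updating it with new evidence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the playbook's actual code: the record keys `effect_size` and `standard_error` and both helper names are hypothetical (the findings.json schema is not shown here), and the update uses the Normal-Normal conjugate rule on a single coefficient rather than a full regression fit.

```python
import math
from typing import Mapping, Tuple


def prior_from_finding(finding: Mapping[str, float], inflation: float = 1.0) -> Tuple[float, float]:
    """Return (mean, sd) of a Normal prior built from a KB finding.

    `finding` is a hypothetical record with 'effect_size' and 'standard_error' keys;
    `inflation` >= 1 widens the prior to discount the prior literature.
    """
    if inflation < 1.0:
        raise ValueError("inflation should be >= 1")
    return finding["effect_size"], finding["standard_error"] * inflation


def normal_normal_update(prior_mean: float, prior_sd: float,
                         estimate: float, estimate_se: float) -> Tuple[float, float]:
    """Conjugate Normal-Normal update: precision-weighted average of prior and new estimate."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / estimate_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical finding and new-data estimate, for illustration only.
finding = {"effect_size": 0.30, "standard_error": 0.10}
m0, s0 = prior_from_finding(finding, inflation=2.0)  # Normal(0.30, 0.20) prior
post_mean, post_sd = normal_normal_update(m0, s0, estimate=0.10, estimate_se=0.20)
print(round(post_mean, 3), round(post_sd, 3))
```

The `inflation` factor is one simple way to discount prior literature that may not transfer cleanly to the new population; setting it to 1 uses the reported standard error as-is.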

// registry meta

domain: Bayesian Methods

level: macro

pax type: engine

version: 1.1.0

published by: Praxis Agent

archive: 12.8 KB

// key constructs

Vocabulary

Bayesian Linear Regression method

Linear regression in which coefficients and error variance are treated as random variables…

Conjugate Prior concept

A prior distribution that, when combined with a given likelihood, yields a posterior in…

Informative Prior concept

A prior distribution that encodes substantive knowledge from prior literature, expert…

Weakly Informative Prior concept

A prior designed to rule out implausible parameter values without strongly favoring any…

Posterior Distribution concept

The conditional distribution p(θ | data) of parameters given observed data, obtained by…

Credible Interval quantifiable

An interval containing a specified probability mass of the posterior distribution (e.g.…

// constructs.yaml

14 variables in the pax vocabulary

Each construct names a thing the field measures, with a kind and an authoritative definition.

C bayesian_linear_regression

method

Bayesian Linear Regression

Linear regression in which coefficients and error variance are treated as random variables with prior distributions, yielding full posterior distributions over parameters and predictions rather than point estimates. With conjugate Normal-Inverse-Gamma priors the posterior is available in closed form; otherwise it is obtained via MCMC or variational inference.
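For the conjugate case mentioned above, here is a minimal sketch for a single predictor with no intercept, assuming the parameterization β | σ² ~ Normal(m0, σ²/λ0) and σ² ~ Inverse-Gamma(a0, b0). The function and class names are illustrative only and are not the interface of `engine.bayesian_linear_regression`.

```python
from dataclasses import dataclass


@dataclass
class NIGPosterior:
    mean: float  # posterior mean of the slope beta
    prec: float  # Lambda_n: beta | sigma^2 ~ Normal(mean, sigma^2 / prec)
    a: float     # Inverse-Gamma shape for sigma^2
    b: float     # Inverse-Gamma scale for sigma^2


def nig_update(x, y, m0=0.0, lam0=1.0, a0=1.0, b0=1.0):
    """Conjugate Normal-Inverse-Gamma update for y_i = beta * x_i + noise (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    lam_n = lam0 + sxx                       # prior precision plus data information
    m_n = (lam0 * m0 + sxy) / lam_n          # precision-weighted slope estimate
    a_n = a0 + len(y) / 2.0
    b_n = b0 + 0.5 * (syy + lam0 * m0 ** 2 - lam_n * m_n ** 2)
    return NIGPosterior(m_n, lam_n, a_n, b_n)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
post = nig_update(x, y)
# E[sigma^2 | data] = b / (a - 1), which exists because a > 1 here.
print(round(post.mean, 4), round(post.b / (post.a - 1), 4))
```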

C conjugate_prior

concept

Conjugate Prior

A prior distribution that, when combined with a given likelihood, yields a posterior in the same parametric family. Enables closed-form posterior updates (e.g. Beta-Binomial, Normal-Normal, Gamma-Poisson) and is the basis for analytically tractable Bayesian models.
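A minimal sketch of two of the conjugate pairs named above; the function names are illustrative. In each case the posterior is obtained by adding sufficient statistics to the prior's parameters, with no integration required.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Beta(alpha, beta) prior + Binomial data -> Beta posterior parameters."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + successes, beta + (trials - successes)


def gamma_poisson_update(shape, rate, counts):
    """Gamma(shape, rate) prior + i.i.d. Poisson counts -> Gamma posterior parameters."""
    return shape + sum(counts), rate + len(counts)


a, b = beta_binomial_update(2, 2, successes=7, trials=10)  # Beta(9, 5)
print(a, b, a / (a + b))  # posterior mean is 9/14
print(gamma_poisson_update(1.0, 1.0, [2, 3, 4]))
```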

C informative_prior

concept

Informative Prior

A prior distribution that encodes substantive knowledge from prior literature, expert elicitation, or previous studies — typically with concentrated mass around a non-default value. Used to formally incorporate accumulated evidence into new analyses; the core mechanism by which PAX knowledge-base findings can shape engine inference.

C weakly_informative_prior

concept

Weakly Informative Prior

A prior designed to rule out implausible parameter values without strongly favoring any specific value — typically a wide but proper distribution (e.g. Normal(0, 2.5) on standardized coefficients, half-Cauchy on a standard deviation). Provides regularization and numerical stability without imposing strong subjective beliefs. Recommended default for applied Bayesian workflow.

C posterior_distribution

concept

Posterior Distribution

The conditional distribution p(θ | data) of parameters given observed data, obtained by combining the prior and likelihood via Bayes' rule. Represents the full state of parameter uncertainty after observing evidence and is the primary output of Bayesian inference.

C credible_interval

quantifiable

Credible Interval

An interval containing a specified probability mass of the posterior distribution (e.g. the central 95% credible interval). Unlike a frequentist confidence interval, it admits the direct probabilistic interpretation that the parameter lies within the interval with the stated probability, conditional on the model and data.
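A sketch of how an equal-tailed (central) credible interval is read off posterior draws, assuming the draws come from a Beta(9, 5) posterior (for instance a Beta(2, 2) prior updated with 7 successes in 10 trials). The nearest-rank quantile rule used here is a simple approximation. For a Normal posterior the interval is available in closed form, shown for comparison with an assumed mean of 0.2 and sd of 0.1.

```python
import random
import statistics


def central_interval(draws, mass=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank quantiles)."""
    s = sorted(draws)
    tail = (1.0 - mass) / 2.0
    lo = s[int(tail * (len(s) - 1))]
    hi = s[int((1.0 - tail) * (len(s) - 1))]
    return lo, hi


rng = random.Random(42)
draws = [rng.betavariate(9, 5) for _ in range(20_000)]  # assumed Beta(9, 5) posterior
lo, hi = central_interval(draws)
print(f"95% interval from draws: [{lo:.3f}, {hi:.3f}]")

# Closed-form check for an assumed Normal(0.2, 0.1^2) posterior.
normal_post = statistics.NormalDist(mu=0.2, sigma=0.1)
print(normal_post.inv_cdf(0.025), normal_post.inv_cdf(0.975))
```

Read conditionally on the model and data, the parameter lies in `[lo, hi]` with about 95% posterior probability.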

C markov_chain_monte_carlo

method

Markov Chain Monte Carlo (MCMC)

A class of algorithms (Metropolis-Hastings, Gibbs sampling) that generate samples from an arbitrary target distribution — typically a Bayesian posterior — by constructing a Markov chain whose stationary distribution is the target. The dominant approach to approximate posterior inference when the posterior lacks a closed form.
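A minimal random-walk Metropolis sampler, the simplest Metropolis-Hastings case: the proposal is symmetric, so the Hastings ratio reduces to the ratio of target densities. The target here is an assumed Normal(3, 2²) known only up to a constant; the step size and warm-up length are illustrative, not tuned recommendations.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: Gaussian proposals, accept with prob min(1, p(x') / p(x))."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        logp_prop = log_target(prop)
        # Compare on the log scale to avoid overflow; 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = prop, logp_prop
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, accepted / n_steps


def log_target(x):
    """Unnormalized log-density of Normal(3, 2^2); the normalizing constant is not needed."""
    return -0.5 * ((x - 3.0) / 2.0) ** 2


samples, acc_rate = random_walk_metropolis(log_target, x0=0.0, n_steps=20_000, step=2.5)
kept = samples[2_000:]  # discard warm-up
print(sum(kept) / len(kept), acc_rate)
```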

C hamiltonian_monte_carlo

method

Hamiltonian Monte Carlo (HMC)

An MCMC algorithm that uses gradient information from the log-posterior to simulate Hamiltonian dynamics, producing long-range proposals with high acceptance rates. Scales substantially better than random-walk Metropolis in moderate-to-high dimensions and is the inference engine underlying Stan and PyMC.
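A sketch of basic HMC with a fixed number of leapfrog steps and unit mass, on an assumed one-dimensional standard Normal target. The step size and path length are hand-picked for this target, which is exactly the tuning burden NUTS removes; Stan and PyMC add considerably more machinery (adaptation, multivariate mass matrices, divergence checks).

```python
import math
import random


def hmc(log_p, grad_log_p, x0, n_samples, step_size=0.1, n_leapfrog=15, seed=0):
    """Fixed-length HMC for a 1-D target with unit mass and a leapfrog integrator."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)                             # fresh momentum each iteration
        x_new = x
        p_new = p0 + 0.5 * step_size * grad_log_p(x_new)     # initial half step for momentum
        for i in range(n_leapfrog):
            x_new += step_size * p_new                       # full step for position
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_p(x_new)       # full step for momentum
        p_new += 0.5 * step_size * grad_log_p(x_new)         # final half step for momentum
        # Metropolis correction using H(x, p) = -log p(x) + p^2 / 2
        h_old = -log_p(x) + 0.5 * p0 * p0
        h_new = -log_p(x_new) + 0.5 * p_new * p_new
        if math.log(1.0 - rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples


def log_p(x):
    return -0.5 * x * x  # standard Normal, up to a constant


def grad_log_p(x):
    return -x


draws = hmc(log_p, grad_log_p, x0=3.0, n_samples=5_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 3), round(var, 3))
```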

C no_u_turn_sampler

method

No-U-Turn Sampler (NUTS)

An adaptive extension of HMC that automatically selects the trajectory length by detecting when the simulated path starts to double back on itself, eliminating the need to hand-tune the number of leapfrog steps. The default sampler in Stan and a core driver of the modern Bayesian workflow's practicality.
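The stopping rule at the heart of NUTS fits in a few lines. This sketch implements only the criterion from Hoffman & Gelman (2014): stop when (θ⁺ − θ⁻)·r⁻ < 0 or (θ⁺ − θ⁻)·r⁺ < 0 for trajectory endpoints (θ⁻, r⁻) and (θ⁺, r⁺). The tree doubling and sample selection around it are omitted, and production samplers may use refined variants of the rule.

```python
def is_u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """No-U-Turn criterion for a trajectory with endpoints (theta_minus, r_minus), (theta_plus, r_plus).

    Returns True when extending the trajectory further at either end would start
    to shrink the distance between its endpoints.
    """
    span = [tp - tm for tp, tm in zip(theta_plus, theta_minus)]
    dot_minus = sum(s * r for s, r in zip(span, r_minus))
    dot_plus = sum(s * r for s, r in zip(span, r_plus))
    return dot_minus < 0 or dot_plus < 0


# Both ends still moving apart: keep extending.
print(is_u_turn([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]))
# The forward end's momentum now points back toward the start: stop.
print(is_u_turn([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]))
```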

C variational_inference

method

Variational Inference

An approximate inference approach that recasts posterior inference as optimization: a tractable family of distributions is fit to the true posterior by minimizing the KL divergence KL(q||p) from the approximation q to the posterior p (equivalently, maximizing the ELBO). Trades exactness for scalability; the dominant approach for very large datasets and high-dimensional latent-variable models where MCMC is prohibitive.

C hierarchical_model

method

Hierarchical (Multilevel) Model

A Bayesian model in which parameters are nested within higher-level distributions — e.g. group-level coefficients drawn from a shared population distribution. Enables partial pooling: estimates for data-poor groups are shrunk toward the population mean, reducing variance with controlled bias. Foundational for analyses with clustered, longitudinal, or cross-nested data structures.
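A sketch of partial pooling in the Normal hierarchical model with the between-group standard deviation τ held fixed (an assumption; a full Bayesian analysis would place a prior on τ and integrate over it). The data are the classic eight-schools estimates and standard errors. Each group estimate moves toward the precision-weighted mean by the factor σⱼ²/(σⱼ² + τ²), so noisier groups shrink more; τ → 0 recovers complete pooling and τ → ∞ recovers no pooling.

```python
def partial_pool(y, se, tau):
    """Normal hierarchical model with a known between-group sd `tau` (> 0).

    Returns (mu_hat, shrunken group estimates). mu_hat is the precision-weighted
    population mean (posterior mean of mu under a flat prior, given tau).
    """
    w = [1.0 / (s * s + tau * tau) for s in se]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    pooled = []
    for yi, s in zip(y, se):
        shrink = s * s / (s * s + tau * tau)  # weight placed on the population mean
        pooled.append(shrink * mu_hat + (1.0 - shrink) * yi)
    return mu_hat, pooled


y = [28, 8, -3, 7, -1, 1, 18, 12]     # eight-schools treatment-effect estimates
se = [15, 10, 16, 11, 9, 11, 10, 18]  # their standard errors
mu_hat, pooled = partial_pool(y, se, tau=5.0)
print(round(mu_hat, 2), [round(p, 1) for p in pooled])
```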

C posterior_predictive_check

diagnostic

Posterior Predictive Check

A model-adequacy diagnostic that simulates replicated datasets from the posterior predictive distribution and compares summary statistics (mean, variance, quantiles, test quantities) of the replications against the observed data. Systematic discrepancies indicate model misspecification. The standard goodness-of-fit procedure in the Bayesian workflow.
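A minimal posterior predictive check, assuming a Normal model with known σ = 1 and a flat prior on the mean, so the posterior for the mean is Normal(ȳ, σ²/n). The test quantity is the sample maximum, which depends only on the data (the realized discrepancies of Gelman, Meng & Stern (1996) may also depend on the parameters). The data values are made up for illustration, with one deliberately extreme point.

```python
import random
import statistics


def ppc_pvalue(y, sigma, test_stat, n_rep=2000, seed=1):
    """Posterior predictive p-value for y_i ~ Normal(mu, sigma), sigma known, flat prior on mu."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    t_obs = test_stat(y)
    exceed = 0
    for _ in range(n_rep):
        mu = rng.gauss(ybar, sigma / n ** 0.5)              # draw mu from its posterior
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]    # replicated dataset
        exceed += test_stat(y_rep) >= t_obs                 # does T(y_rep) reach T(y)?
    return exceed / n_rep


y = [0.1, -0.4, 0.3, 0.8, -0.2, 0.5, -0.6, 0.0, 0.2, 6.0]  # one suspicious value
print(ppc_pvalue(y, sigma=1.0, test_stat=max))
```

A p-value near 0 or 1 flags that the model cannot reproduce this feature of the data; here the Normal model with σ = 1 cannot generate a maximum as large as 6.0.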

C bayes_factor

quantifiable

Bayes Factor

The ratio of marginal likelihoods of two competing models, BF12 = p(data | M1) / p(data | M2). Quantifies the evidence the data provide for one model over another, updating prior model odds to posterior model odds. The conventional thresholds (BF > 3 substantial, > 10 strong, > 100 decisive) follow Jeffreys (1961); Kass & Raftery (1995) propose a revised scale (3 to 20 positive, 20 to 150 strong, > 150 very strong). Sensitive to prior specification; used cautiously in modern workflow.
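A sketch of a Bayes factor for a case with closed-form marginal likelihoods: k successes in n Bernoulli trials, comparing H1: θ ~ Beta(a, b) against the point null H0: θ = 0.5. Looping over the prior parameters illustrates the prior sensitivity discussed here; the function names and data values are illustrative.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf10_binomial(k, n, a=1.0, b=1.0, theta0=0.5):
    """BF10 for H1: theta ~ Beta(a, b) vs H0: theta = theta0, given k successes in n trials.

    The binomial coefficient appears in both marginal likelihoods and cancels,
    so it is omitted from both.
    """
    log_m1 = log_beta(k + a, n - k + b) - log_beta(a, b)
    log_m0 = k * log(theta0) + (n - k) * log(1.0 - theta0)
    return exp(log_m1 - log_m0)


# Same data, different proper priors under H1: the Bayes factor moves with the prior.
for a in (0.5, 1.0, 10.0):
    print(a, round(bf10_binomial(k=62, n=100, a=a, b=a), 3))
```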

C effective_sample_size

quantifiable

Effective Sample Size (ESS)

An estimate of the number of independent samples an autocorrelated MCMC chain is equivalent to, computed from the chain's autocorrelation function. The primary diagnostic for determining whether a chain has sampled the posterior adequately; ESS < ~400 for quantities of interest typically indicates insufficient sampling.
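A simple single-chain ESS estimator, ESS = n / (1 + 2 Σ ρₖ), truncating the autocorrelation sum at the first non-positive lag. This is a simplified sketch; Stan's estimator combines multiple chains and uses Geyer's initial-sequence rules, so its numbers will differ. The AR(1) series stands in for an autocorrelated MCMC trace.

```python
import random


def ess(x):
    """Single-chain ESS, truncating the autocorrelation sum at the first non-positive lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d) / n
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = sum(d[i] * d[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ar = [0.0]
for _ in range(4999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))  # AR(1) chain with phi = 0.9
print(round(ess(iid)), round(ess(ar)))
```

For an AR(1) chain with φ = 0.9 the theoretical ESS is about n(1 − φ)/(1 + φ), roughly n/19, so 5000 correlated draws carry the information of only a few hundred independent ones.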

// findings.yaml

12 empirical claims

Each finding cites a source and reports effect size, standard error, p-value, and sample size where available.

F001 ↑ strong

markov_chain_monte_carlo,posterior_distribution

// method: theoretical derivation; simulation studies on hierarchical Normal and binomial models
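For F001, a minimal Gibbs sampler shows the core idea on a toy target rather than a hierarchical model: when every full conditional is easy to sample, alternating exact conditional draws yields a Markov chain whose stationary distribution is the joint distribution. A standard bivariate Normal with correlation ρ is assumed here because its conditionals are Normal(ρ·y, 1 − ρ²).

```python
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate Normal with correlation rho.

    Alternates exact draws from the full conditionals x | y and y | x.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # x | y ~ Normal(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)  # y | x ~ Normal(rho * x, 1 - rho^2)
        draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20_000)
xs = [d[0] for d in draws]
print(sum(xs) / len(xs), sum(x * y for x, y in draws) / len(draws))
```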

F002 ↑ strong

weakly_informative_prior,hierarchical_model

// method: analytical analysis of prior influence; simulation comparison on 8-schools-style hierarchical model

F003 ↑ strong

no_u_turn_sampler,hamiltonian_monte_carlo,effective_sample_size

// effect: 1-3 orders of magnitude ESS/grad-eval improvement vs. random-walk Metropolis

// method: benchmark on 250-dim correlated Gaussian, logistic regression, hierarchical Bayesian logistic regression

F004 → strong

variational_inference,posterior_distribution

Mean-field variational inference provides dramatically faster approximate posterior inference than MCMC on large-scale latent-variable models (e.g. LDA topic models with millions of documents), at the cost of underestimating posterior variance, because its reverse-KL objective KL(q||p) is mode-seeking rather than mass-covering.

// method: review of applications; comparison to MCMC on LDA, Gaussian mixtures, Bayesian nonparametrics
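For F004, the variance underestimation can be checked exactly on a bivariate Gaussian target, where the optimal factorized q under KL(q||p) is a standard textbook result: each factor has the true mean and variance 1/Λᵢᵢ, with Λ the precision matrix. For unit marginal variances and correlation ρ this gives 1 − ρ², so the approximation grows more overconfident as the correlation increases. The function name is illustrative.

```python
def mean_field_gaussian_2d(cov):
    """Optimal factorized q for a 2-D Gaussian target under KL(q || p).

    Each factor's variance is 1 / Lambda_ii, where Lambda is the inverse of `cov`.
    Returns (mean-field variances, true marginal variances).
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    lam11, lam22 = s22 / det, s11 / det  # diagonal of the precision matrix
    return (1.0 / lam11, 1.0 / lam22), (s11, s22)


for rho in (0.0, 0.5, 0.9):
    mf, true = mean_field_gaussian_2d([[1.0, rho], [rho, 1.0]])
    print(rho, round(mf[0], 3), true[0])  # mean-field variance equals 1 - rho^2
```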

F005 ↑ strong

hierarchical_model,bayesian_linear_regression

Partial pooling via hierarchical models produces group-level estimates with lower mean squared error than either complete pooling (ignoring group structure) or no pooling (fitting separate models per group), particularly for groups with small sample sizes. The shrinkage is automatic and adapts to the estimated between-group variance.

// method: textbook demonstration across Radon, 8-schools, roaches, and election forecasting case studies

F006 ↑ strong

posterior_predictive_check,posterior_distribution

Posterior predictive checks using realized discrepancy measures T(y, θ) — test quantities that may depend on both data and parameters — provide a principled Bayesian analogue to goodness-of-fit testing. Systematic divergence between T applied to observed vs. replicated data reveals specific modes of model misspecification that point estimates of fit cannot.

// method: theoretical framework with applications to hierarchical models and binomial/normal data

F007 ↓ strong

bayes_factor,weakly_informative_prior,informative_prior

Bayes factors are sensitive to the choice of prior on model parameters in a way that p-values are not; improper priors on parameters that are not shared by the models being compared leave the Bayes factor determined only up to an arbitrary constant, so it is effectively undefined. Proper, weakly-informative priors chosen with the comparison in mind are required for interpretable model comparison.

// method: review of theoretical and applied literature on Bayesian model comparison

F008 ↑ strong

weakly_informative_prior,hierarchical_model

Penalised Complexity (PC) priors constructed by penalizing KL divergence from a simpler base model (e.g. zero variance, independence) produce interpretable, weakly-informative defaults for random-effect variances, autoregressive correlation parameters, and overdispersion terms — with a single user-chosen scale that has clear probabilistic meaning.

// method: derivation from KL-divergence base-model penalization; applications to BYM spatial models, AR(1) parameters
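For F008, the simplest PC prior, on the standard deviation σ of a Gaussian random effect with base model σ = 0, is an exponential distribution on σ (Simpson et al. 2017). Its single scale comes from a user statement P(σ > U) = α, which fixes the rate at λ = −ln(α)/U. The function names and the values U = 1, α = 0.01 are illustrative.

```python
import math


def pc_prior_sd_rate(upper, alpha):
    """Rate of the exponential PC prior on a random-effect sd, set so P(sigma > upper) = alpha."""
    if upper <= 0 or not 0 < alpha < 1:
        raise ValueError("need upper > 0 and 0 < alpha < 1")
    return -math.log(alpha) / upper


def pc_prior_sd_logpdf(sigma, rate):
    """Log density of the Exponential(rate) PC prior on sigma >= 0."""
    return math.log(rate) - rate * sigma if sigma >= 0 else -math.inf


lam = pc_prior_sd_rate(upper=1.0, alpha=0.01)  # encodes "P(sigma > 1) = 0.01"
print(lam, math.exp(-lam * 1.0))               # the tail probability recovers alpha
```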

F009 ↑ strong

no_u_turn_sampler,hamiltonian_monte_carlo,bayesian_linear_regression

Stan's implementation of NUTS with dynamic trajectory doubling, dual-averaging step-size adaptation, and diagonal or dense mass-matrix adaptation produces reliable posterior samples for a wide class of continuous-parameter Bayesian models without user tuning. Combined with automatic differentiation and a C++ backend, it enables Bayesian inference at scales previously requiring bespoke Gibbs samplers.

// method: description of Stan implementation; benchmarks on hierarchical GLMs, IRT, Gaussian processes

F010 ↑ strong

posterior_distribution,markov_chain_monte_carlo

Approximate Bayesian Computation enables posterior inference for models with intractable likelihoods (e.g. coalescent simulators, agent-based models) by replacing likelihood evaluation with simulation and distance-based acceptance of parameters. ABC-SMC and regression-adjustment variants substantially improve efficiency over rejection ABC for moderate-dimensional parameter spaces.

// method: review of rejection ABC, ABC-MCMC, ABC-SMC with applications in population genetics and epidemiology
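For F010, a minimal rejection-ABC sketch: it never evaluates a likelihood, only simulates data from prior draws and keeps parameters whose simulated summary falls within ε of the observed one. The model (Normal with unknown mean and unit sd), the Uniform(−10, 10) prior, the sample-mean summary, ε = 0.1, and the data values are all assumptions chosen for illustration; ABC-SMC and regression adjustment refine this basic scheme.

```python
import random
import statistics


def rejection_abc(observed, simulate, prior_draw, summary, eps, n_draws, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= eps:  # distance-based acceptance, no likelihood needed
            accepted.append(theta)
    return accepted


observed = [2.3, 1.7, 2.9, 2.2, 1.5, 2.6, 2.0, 2.4]


def simulate(mu, rng):
    return [rng.gauss(mu, 1.0) for _ in range(len(observed))]


def prior_draw(rng):
    return rng.uniform(-10.0, 10.0)


post = rejection_abc(observed, simulate, prior_draw, statistics.fmean, eps=0.1, n_draws=50_000)
print(len(post), statistics.fmean(post))
```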

F011 ↑ strong

weakly_informative_prior,posterior_predictive_check,no_u_turn_sampler,hierarchical_model

The modern applied Bayesian workflow centers on iterative model building: start with weakly-informative priors and a simple model, use HMC/NUTS for posterior computation, apply posterior predictive checks to detect misspecification, refine priors and model structure, and validate via cross-validation (e.g. PSIS-LOO). Prior sensitivity analysis — refitting with varied priors — is recommended for any published Bayesian analysis.

// method: expert consensus review across Bayesian methodology community

F012 ↑ moderate

credible_interval,posterior_distribution

Bayesian credible intervals admit the direct probabilistic interpretation that practitioners often incorrectly ascribe to frequentist confidence intervals. This interpretive clarity, combined with modeling flexibility for hierarchical and non-standard likelihoods, is a primary reason behavioral and social sciences have increasingly adopted Bayesian methods.

// method: pedagogical exposition with worked examples across t-tests, ANOVA, regression, and hierarchical models

// propositions.yaml

0 theoretical claims

Propositions are the field's reusable rules of thumb — they span findings without being tied to a single study.

// no propositions

This pax does not declare propositions. Propositions capture theoretical claims linking constructs.

// sources.yaml

13 citations

The evidentiary backing (papers, datasets, reports). Every finding can be traced to one of these.

S001

Gelfand, A.E., Smith, A.F.M. (1990). Sampling-Based Approaches to Calculating Marginal Densities.

doi:10.1080/01621459.1990.10476213

S002

Kass, R.E., Raftery, A.E. (1995). Bayes Factors.

doi:10.1080/01621459.1995.10476572

S003

Gelman, A., Meng, X.-L., Stern, H. (1996). Posterior Predictive Assessment of Model Fitness via Realized Discrepancies.

S004

Gelman, A. (2006). Prior Distributions for Variance Parameters in Hierarchical Models.

doi:10.1214/06-ba117a

S005

Gelman, A., Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models.

doi:10.1017/cbo9780511790942

S006

Beaumont, M.A. (2010). Approximate Bayesian Computation in Evolution and Ecology.

doi:10.1146/annurev-ecolsys-102209-144621

S007

Brooks, S., Gelman, A., Jones, G.L., Meng, X.-L. (eds.) (2011). Handbook of Markov Chain Monte Carlo.

doi:10.1201/b10905

S008

Hoffman, M.D., Gelman, A. (2014). The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.

S009

Kruschke, J.K. (2015). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.).

doi:10.1016/c2012-0-00477-2

S010

Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A. (2017). Stan: A Probabilistic Programming Language.

doi:10.18637/jss.v076.i01

S011

Simpson, D., Rue, H., Riebler, A., Martins, T.G., Sørbye, S.H. (2017). Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.

doi:10.1214/16-sts576

S012

Blei, D.M., Kucukelbir, A., McAuliffe, J.D. (2017). Variational Inference: A Review for Statisticians.

doi:10.1080/01621459.2017.1285773

S013

van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M.G., Vannucci, M., Gelman, A., Veen, D., Willemsen, J., Yau, C. (2021). Bayesian Statistics and Modelling.

doi:10.1038/s43586-020-00001-2

// playbooks/

1 analytical recipe

Step-by-step recipes that wire constructs to engines. An MCP-aware agent runs them end-to-end.

B KB Findings as Informative Priors

7 steps

engine.bayesian_linear_regression

// playbook step bodies live in the .pax archive; download to inspect.

// relationships.yaml

0 construct edges

The pax's causal graph — which constructs are claimed to drive which others, and how strongly.

// no construct relationships

This pax does not declare causal or correlational links between constructs.

// pax.yaml manifest

name: bayesian-methods
version: 1.1.0
pax_type: engine
author: Josh Lambert
license: CC-BY-4.0
published_by: Praxis Agent
domain: bayesian_methods
constructs:
  - bayesian_linear_regression
  - conjugate_prior
  - informative_prior
  - weakly_informative_prior
  - posterior_distribution
  - credible_interval
  - markov_chain_monte_carlo
  - hamiltonian_monte_carlo
  - no_u_turn_sampler
  - variational_inference
  - hierarchical_model
  - posterior_predictive_check
  - bayes_factor
  - effective_sample_size
engines:
  - bayesian_linear_regression
counts:
  constructs: 14
  findings: 12
  propositions: 0
  playbooks: 1
  sources: 13