Statistical Inference

Contingency tables organize categorical data, showing relationships between variables through frequencies. They're crucial for analyzing associations in fields like market research and epidemiology, helping us understand patterns and dependencies in complex datasets.

Log-linear models take contingency table analysis further, modeling cell frequencies as functions of variable effects. These powerful tools allow us to examine intricate relationships in categorical data, test hypotheses, and make predictions about complex interactions between variables.

Contingency Tables

Construction of contingency tables

  • Contingency table structure organizes categorical data into rows and columns representing categories, with cell frequencies showing the count for each combination (see the sketch after this list)
  • Types include two-way tables for two variables and multi-way tables for three or more variables
  • Marginal frequencies are the row and column totals, giving the overall distribution of each variable
  • Conditional frequencies show the distribution of one variable given a specific category of another
  • Expected frequencies are the theoretical cell counts under independence: $E_{ij} = (\text{row total}_i \times \text{column total}_j) / n$
  • Relative frequencies express cell counts as percentages (row percentages, column percentages) for easier comparison
  • Independence vs. association asks whether the variables are related or vary independently
  • Simpson's paradox shows how an association between variables can reverse when data are aggregated or split into subgroups (e.g., apparent medical treatment efficacy)
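
A minimal sketch of these steps in Python, using pandas and scipy; the smoking/exercise survey data are hypothetical, invented purely for illustration:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical categorical data: smoking status vs. exercise level
data = pd.DataFrame({
    "smoker":   ["yes", "no", "no", "yes", "no", "no", "yes", "no"],
    "exercise": ["low", "high", "high", "low", "low", "high", "high", "low"],
})

# Two-way table with marginal frequencies (row and column totals)
table = pd.crosstab(data["smoker"], data["exercise"], margins=True)
print(table)

# Conditional (row) percentages: distribution of exercise given smoking status
row_pct = pd.crosstab(data["smoker"], data["exercise"], normalize="index")
print(row_pct)

# Expected cell counts under independence: E_ij = row_i * col_j / n
observed = pd.crosstab(data["smoker"], data["exercise"])
chi2, p, dof, expected = chi2_contingency(observed)
print(expected)
```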

Concept of log-linear models

  • Log-linear models analyze relationships in categorical data by modeling cell frequencies as a function of variable effects
  • Applied to complex categorical data analysis (market segmentation, epidemiology)
  • Related to logistic regression, but they model cell counts rather than probabilities
  • Advantages include handling multi-way tables and testing complex hypotheses
  • Model components incorporate main effects (individual variable impacts) and interaction effects (combined variable impacts)
  • Hierarchical structure means a model containing a higher-order interaction also includes all of its lower-order effects
  • Saturated models include all possible effects and reproduce the observed counts exactly, while unsaturated models omit some effects for parsimony (see the sketch after this list)
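
To make the saturated vs. unsaturated distinction concrete, here is a minimal sketch using statsmodels' Poisson GLM; the 2×2 treatment/outcome counts are hypothetical:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per cell of a 2x2 table: treatment x outcome, with observed counts
cells = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "outcome":   ["success", "failure", "success", "failure"],
    "count":     [30, 10, 20, 25],
})

# Independence model: log(mu) = intercept + treatment + outcome (no interaction)
indep = smf.glm("count ~ treatment + outcome", data=cells,
                family=sm.families.Poisson()).fit()

# Saturated model: adds the treatment:outcome interaction, one parameter per cell
sat = smf.glm("count ~ treatment * outcome", data=cells,
              family=sm.families.Poisson()).fit()

print(indep.deviance)  # G^2 of the independence model: its lack of fit
print(sat.deviance)    # ~0: the saturated model reproduces the observed counts
```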

Fitting and interpretation of models

  • Model specification uses a Poisson regression framework with a log link function to relate predictors to cell counts
  • Parameter estimation uses maximum likelihood, often via the iterative proportional fitting (IPF) algorithm
  • Model notation uses a design matrix to represent variable effects and log-linear equations to express relationships
  • Interpretation of model parameters reveals the strength and direction of main effects and interaction effects on cell frequencies
  • Odds ratios and relative risks derived from the parameters quantify association between variables; in a 2×2 table, the interaction parameter equals the log odds ratio (see the sketch after this list)
  • Contrast coding for categorical variables allows comparison of specific categories or groups
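
Continuing the hypothetical 2×2 example, a minimal sketch of parameter interpretation: in the saturated model the interaction coefficient is the log odds ratio, so exponentiating it recovers the table's cross-product ratio. The parameter name below is the patsy-style label statsmodels generates for dummy-coded factors:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

cells = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "outcome":   ["success", "failure", "success", "failure"],
    "count":     [30, 10, 20, 25],
})

sat = smf.glm("count ~ treatment * outcome", data=cells,
              family=sm.families.Poisson()).fit()
print(sat.summary())

# The treatment:outcome coefficient is the log odds ratio; exponentiate it
log_or = sat.params["treatment[T.B]:outcome[T.success]"]
print(np.exp(log_or))

# Check against the cross-product ratio computed directly from the counts:
# (B,success) * (A,failure) / [(B,failure) * (A,success)]
print((20 * 10) / (25 * 30))
```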

Model selection and goodness-of-fit

  • Goodness-of-fit statistics assess model fit: the likelihood ratio statistic $G^2 = 2 \sum O \ln(O/E)$ and the Pearson chi-square statistic $X^2 = \sum (O - E)^2 / E$, where $O$ and $E$ are observed and fitted cell counts
  • Degrees of freedom equal the number of cells minus the number of estimated parameters
  • P-values test whether deviations between observed and fitted counts exceed chance; a small p-value signals lack of fit
  • Residual analysis examines standardized and adjusted residuals to identify poorly fitted cells
  • Model comparison techniques include nested model testing and information criteria (AIC, BIC) to balance fit and complexity
  • Stepwise model selection uses forward selection or backward elimination to build a parsimonious model
  • Parsimony principle favors simpler models with fewer parameters when fit is comparable
  • Assumption checks include adequate sample size and guarding against sparse contingency tables with many zero cells (see the sketch after this list)
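
A minimal sketch of these diagnostics for the same hypothetical independence model: deviance ($G^2$), Pearson $X^2$, residual degrees of freedom, p-value, information criteria, and residuals:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

cells = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "outcome":   ["success", "failure", "success", "failure"],
    "count":     [30, 10, 20, 25],
})

indep = smf.glm("count ~ treatment + outcome", data=cells,
                family=sm.families.Poisson()).fit()

# G^2 (deviance) and Pearson X^2 for the independence model
g2 = indep.deviance
x2 = indep.pearson_chi2
df = indep.df_resid          # cells minus estimated parameters: 4 - 3 = 1
p_value = chi2.sf(g2, df)    # small p => the independence model fits poorly
print(g2, x2, df, p_value)

# Information criteria trade fit against complexity (lower is better)
print(indep.aic, indep.bic)

# Standardized (Pearson) residuals flag poorly fitted cells
print(indep.resid_pearson)
```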