Understanding common statistical tests is essential in biostatistics and statistical prediction. These tests help analyze data, compare groups, and identify relationships, guiding decisions in research and public health. Mastering them sharpens your ability to interpret results and predict outcomes.
t-test (independent and paired)
- Compares the means of two groups to determine if they are statistically different from each other.
- Independent t-test is used when comparing two separate groups (e.g., treatment vs. control).
- Paired t-test is used when comparing two related groups (e.g., measurements before and after treatment).
- Assumes approximately normally distributed data; the independent t-test additionally assumes equal variances (Welch's variant relaxes this assumption).
- Provides a p-value to assess the significance of the difference between means.
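A minimal sketch of both t-test variants with SciPy, using simulated data (the group means, spreads, and sample sizes are made-up values for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=5.2, scale=1.0, size=30)   # hypothetical treatment group
control = rng.normal(loc=4.8, scale=1.0, size=30)     # hypothetical control group

# Independent t-test; set equal_var=False for Welch's test if variances differ
t_ind, p_ind = stats.ttest_ind(treatment, control, equal_var=True)

# Paired t-test: before vs. after measurements on the same subjects
before = rng.normal(loc=120, scale=10, size=25)
after = before - rng.normal(loc=3, scale=5, size=25)
t_rel, p_rel = stats.ttest_rel(before, after)

print(f"Independent: t={t_ind:.2f}, p={p_ind:.3f}")
print(f"Paired:      t={t_rel:.2f}, p={p_rel:.3f}")
```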
Chi-square test
- Assesses the association between categorical variables.
- Compares observed frequencies in each category to expected frequencies under the null hypothesis.
- Requires an adequate sample size; a common rule of thumb is an expected frequency of at least 5 in each cell.
- Can be used for goodness-of-fit tests or tests of independence.
- Results in a chi-square statistic and a p-value to determine significance.
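As a rough illustration, a test of independence on a hypothetical 2x2 exposure-by-disease table, plus a goodness-of-fit test, using SciPy (all counts are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: exposure (rows) vs. disease status (columns)
observed = np.array([[30, 70],
                     [10, 90]])

# Test of independence; expected counts are computed from the row/column margins
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
print("expected counts:\n", expected)

# Goodness-of-fit: do observed counts match a hypothesized (here uniform) distribution?
obs = [18, 22, 20, 40]
exp = [25, 25, 25, 25]
chi2_gof, p_gof = stats.chisquare(obs, f_exp=exp)
print(f"goodness-of-fit: chi2={chi2_gof:.2f}, p={p_gof:.4f}")
```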
ANOVA (one-way and two-way)
- One-way ANOVA compares means across three or more groups based on one independent variable.
- Two-way ANOVA examines the effect of two independent variables on a dependent variable and their interaction.
- Assumes normality, homogeneity of variances, and independence of observations.
- Provides an F-statistic to assess the overall significance of group differences.
- Post-hoc tests (e.g., Tukey's HSD) are often needed to identify specific group differences.
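A brief SciPy sketch of a one-way ANOVA on three simulated dose groups, followed by Tukey's HSD (`tukey_hsd` needs a recent SciPy, roughly 1.8+); a two-way ANOVA is more commonly fit with statsmodels' `ols` formula interface and `anova_lm`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three hypothetical dose groups measured on the same continuous outcome
low = rng.normal(10.0, 2.0, 20)
medium = rng.normal(11.0, 2.0, 20)
high = rng.normal(13.0, 2.0, 20)

# One-way ANOVA: overall F statistic and p-value for the group effect
f_stat, p_value = stats.f_oneway(low, medium, high)
print(f"F={f_stat:.2f}, p={p_value:.4f}")

# Tukey's HSD post-hoc pairwise comparisons
tukey = stats.tukey_hsd(low, medium, high)
print(tukey)
```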
Linear regression
- Models the relationship between a continuous dependent variable and one or more independent variables.
- Assumes a linear relationship between the variables.
- Provides coefficients that indicate the strength and direction of the relationship.
- Assesses model fit using R-squared and p-values for individual predictors.
- Can be used for prediction and understanding the influence of predictors.
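An illustrative ordinary least squares fit with statsmodels on simulated data; the predictors (age, BMI) and outcome (systolic blood pressure) are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
# Hypothetical predictors and outcome
age = rng.uniform(20, 70, 100)
bmi = rng.normal(27, 4, 100)
sbp = 90 + 0.5 * age + 1.2 * bmi + rng.normal(0, 8, 100)

X = sm.add_constant(np.column_stack([age, bmi]))  # adds the intercept column
model = sm.OLS(sbp, X).fit()

print(model.params)     # intercept and slope coefficients
print(model.pvalues)    # p-value for each predictor
print(model.rsquared)   # overall model fit
```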
Logistic regression
- Used when the dependent variable is binary (e.g., success/failure).
- Models the probability of the outcome occurring based on one or more independent variables.
- Provides odds ratios to interpret the effect of predictors on the likelihood of the outcome.
- Assumes the log-odds (logit) of the outcome is a linear function of the predictors; the outcome itself is binary, so no normality assumption applies to it.
- Evaluates model fit using measures like the likelihood ratio test and pseudo R-squared.
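A small statsmodels sketch on simulated binary data; the exposure variable and its effect size are assumptions chosen for illustration, and exponentiating the coefficients gives odds ratios:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
# Hypothetical continuous exposure and binary disease outcome (1 = disease)
exposure = rng.normal(0, 1, 200)
log_odds = -0.5 + 1.1 * exposure
prob = 1 / (1 + np.exp(-log_odds))
disease = rng.binomial(1, prob)

X = sm.add_constant(exposure)          # intercept plus the exposure term
model = sm.Logit(disease, X).fit()

print(model.summary())
print("Odds ratios:", np.exp(model.params))  # exponentiated coefficients
```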
Correlation analysis
- Measures the strength and direction of the linear relationship between two continuous variables.
- The correlation coefficient (e.g., Pearson's r) ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).
- Assumes linearity, normality, and homoscedasticity of the data.
- Does not imply causation; correlated variables may be linked through confounding rather than a direct causal effect.
- Can be visualized using scatter plots.
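A short SciPy example on simulated data, computing Pearson's r alongside the rank-based Spearman alternative for comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 50)
y = 0.6 * x + rng.normal(0, 1, 50)   # hypothetical linearly related variable

r, p = stats.pearsonr(x, y)          # parametric; assumes linearity and normality
rho, p_s = stats.spearmanr(x, y)     # rank-based; suits monotonic relationships

print(f"Pearson r={r:.2f} (p={p:.4f}), Spearman rho={rho:.2f} (p={p_s:.4f})")
```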
Mann-Whitney U test
- A non-parametric test used to compare differences between two independent groups.
- Does not assume normal distribution and is used when data are ordinal or not normally distributed.
- Ranks all data points and compares the sum of ranks between groups.
- Provides a U statistic and a p-value to assess significance.
- Useful for small sample sizes or when data violate t-test assumptions.
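A minimal SciPy sketch using hypothetical ordinal pain scores (0-10) for two independent groups:

```python
from scipy import stats

# Hypothetical pain scores for two independent groups
drug = [2, 3, 3, 4, 5, 5, 6, 7]
placebo = [4, 5, 5, 6, 6, 7, 8, 8]

u_stat, p_value = stats.mannwhitneyu(drug, placebo, alternative="two-sided")
print(f"U={u_stat}, p={p_value:.4f}")
```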
Wilcoxon signed-rank test
- A non-parametric test for comparing two related samples or matched observations.
- Used when the differences between pairs are not normally distributed.
- Ranks the absolute differences and considers the sign of the differences.
- Provides a W statistic and a p-value to assess significance.
- Suitable for small sample sizes or ordinal data.
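A minimal SciPy sketch with hypothetical paired before/after scores measured on the same subjects:

```python
from scipy import stats

# Hypothetical paired scores before and after an intervention
before = [12, 15, 14, 10, 18, 16, 13, 14]
after = [10, 14, 12, 9, 15, 15, 11, 13]

w_stat, p_value = stats.wilcoxon(before, after)
print(f"W={w_stat}, p={p_value:.4f}")
```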
Kruskal-Wallis test
- A non-parametric alternative to one-way ANOVA for comparing three or more independent groups.
- Does not assume normal distribution and is used for ordinal or non-normally distributed data.
- Ranks all data points and compares the sum of ranks across groups.
- Provides an H statistic and a p-value to assess significance.
- Useful when ANOVA assumptions are violated.
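A quick SciPy illustration with hypothetical severity scores from three independent clinics:

```python
from scipy import stats

# Hypothetical symptom-severity scores in three independent clinics
clinic_a = [3, 4, 4, 5, 6, 7]
clinic_b = [5, 6, 6, 7, 8, 8]
clinic_c = [2, 2, 3, 3, 4, 5]

h_stat, p_value = stats.kruskal(clinic_a, clinic_b, clinic_c)
print(f"H={h_stat:.2f}, p={p_value:.4f}")
```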
F-test
- Compares two variances by taking their ratio, which follows an F distribution under the null hypothesis of equal variances.
- Also underlies ANOVA, where the F statistic is the ratio of between-group to within-group variance and tests the equality of group means.
- Assumes normal distribution and independence of observations.
- Provides an F statistic and a p-value to determine significance.
- Helps in deciding whether to use parametric tests that assume equal variances.
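A sketch of the two-sample variance-ratio F-test computed directly from the F distribution in SciPy (the data are simulated), with Levene's test shown as a more robust alternative when normality is doubtful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group1 = rng.normal(0, 1.0, 30)   # hypothetical group with smaller spread
group2 = rng.normal(0, 1.5, 30)   # hypothetical group with larger spread

# Variance-ratio F-test: F = s1^2 / s2^2 with (n1-1, n2-1) degrees of freedom
f_stat = np.var(group1, ddof=1) / np.var(group2, ddof=1)
dfn, dfd = len(group1) - 1, len(group2) - 1
p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd),
                  stats.f.sf(f_stat, dfn, dfd))   # two-sided p-value

print(f"F={f_stat:.2f}, p={p_value:.4f}")

# Levene's test: less sensitive to departures from normality
print(stats.levene(group1, group2))
```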