Engineering Applications of Statistics

5.4 Non-parametric hypothesis tests (chi-square, Wilcoxon rank-sum)

Non-parametric tests are statistical methods used when data doesn't meet the assumptions of traditional parametric tests. They're handy for analyzing ordinal or nominal data, dealing with small sample sizes, and handling outliers more effectively than their parametric counterparts.

While non-parametric tests are versatile, they have some drawbacks. They're generally less powerful than parametric tests when assumptions are met, and their results can be trickier to interpret. Still, they're invaluable tools when dealing with non-normal distributions or categorical data.

Assumptions and limitations of non-parametric tests

When to use non-parametric tests

  • Non-parametric tests are used when the assumptions of parametric tests, such as normality and equal variances, are not met or when the data is ordinal (Likert scales) or nominal (categories)
  • The sample size requirements for non-parametric tests are generally less stringent than those for parametric tests, making them suitable for smaller datasets
  • Non-parametric tests are more robust to outliers and extreme values compared to parametric tests, as they rely on ranks or frequencies rather than actual values
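The robustness point above can be demonstrated in a few lines of Python: replacing one value with an extreme outlier leaves the ranks (which non-parametric tests use) untouched, while it drags the mean far from the rest of the data. The sample values are made up for illustration:

```python
def ranks(data):
    """1-based ranks of the values in data (assumes no ties)."""
    order = sorted(data)
    return [order.index(v) + 1 for v in data]

clean   = [10, 12, 14, 16, 18]
outlier = [10, 12, 14, 16, 180]   # last value replaced by an extreme outlier

assert ranks(clean) == ranks(outlier)   # rank-based view: identical
mean_clean = sum(clean) / len(clean)    # 14.0
mean_out = sum(outlier) / len(outlier)  # 46.4 -- dragged far away by one point
```

A test based on ranks would give the same result for both samples, while a mean-based statistic would change drastically.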

Limitations of non-parametric tests

  • Non-parametric tests are less powerful than parametric tests when the assumptions of parametric tests are met, meaning they have a lower probability of rejecting the null hypothesis when it is false (that is, a higher Type II error rate)
  • Non-parametric tests may be less efficient than parametric tests when the assumptions of parametric tests are met, requiring larger sample sizes to achieve the same level of power
  • The results of non-parametric tests are often more difficult to interpret and communicate than those of parametric tests, as they do not directly provide familiar effect-size estimates or confidence intervals

Chi-square test for goodness-of-fit and independence

Chi-square goodness-of-fit test

  • The chi-square goodness-of-fit test is used to determine whether an observed distribution of categorical data fits a specified theoretical distribution (uniform, binomial, Poisson)
  • The test statistic for the chi-square goodness-of-fit test is calculated as the sum of the squared differences between the observed and expected frequencies, divided by the expected frequencies: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
  • The degrees of freedom for the chi-square goodness-of-fit test are calculated as (number of categories - 1), as the expected frequencies are determined by the theoretical distribution
  • The p-value for the chi-square goodness-of-fit test is determined by comparing the calculated test statistic to the chi-square distribution with the appropriate degrees of freedom
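The goodness-of-fit calculation above can be sketched in plain Python. The die-roll counts are hypothetical, chosen only to illustrate testing against a uniform expectation:

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die, tested against a uniform distribution
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6                 # uniform: 60 rolls / 6 faces
stat = chi_square_gof(observed, expected)   # 1.0
df = len(observed) - 1                      # categories - 1 = 5
```

The p-value would then come from the upper tail of the chi-square distribution with `df` degrees of freedom (e.g. `scipy.stats.chisquare` returns both the statistic and the p-value in one call).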

Chi-square test for independence

  • The chi-square test for independence is used to determine whether two categorical variables are independent or associated (gender and preference for a product)
  • The test statistic for the chi-square test for independence is calculated using the same formula as the goodness-of-fit test, but the expected frequencies are determined by the row and column totals
  • The degrees of freedom for the chi-square test for independence are calculated as (number of rows - 1) × (number of columns - 1), as both variables contribute to the determination of expected frequencies
  • The p-value for the chi-square test for independence is determined by comparing the calculated test statistic to the chi-square distribution with the appropriate degrees of freedom
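The independence test can be sketched the same way; the key difference from the goodness-of-fit case is that each expected frequency comes from the row and column totals, $E_{ij} = (\text{row}_i \times \text{col}_j) / n$. The 2×2 table below is hypothetical:

```python
def chi2_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency
    table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # E_ij from the margins
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)       # (rows-1) x (cols-1)
    return stat, df

# Hypothetical 2x2 table, e.g. gender vs. product preference
stat, df = chi2_independence([[20, 30], [30, 20]])    # stat = 4.0, df = 1
```

`scipy.stats.chi2_contingency` performs the same computation and also returns the p-value and the table of expected frequencies.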

Assumptions of the chi-square test

  • The assumptions of the chi-square test include independence of observations, adequate sample size (expected frequencies ≥ 5 for each cell), and mutually exclusive and exhaustive categories
  • If the assumptions are violated, alternative tests (Fisher's exact test for small sample sizes) or data transformations (combining categories with low frequencies) may be necessary
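As one illustration of the small-sample alternative, a one-sided Fisher's exact p-value for a 2×2 table can be computed from the hypergeometric distribution using only the standard library. The table values are hypothetical, and real analyses usually report the two-sided version (e.g. via `scipy.stats.fisher_exact`):

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of a top-left count at least as large as `a`, given the
    fixed margins (a hypergeometric tail probability)."""
    r1, r2 = a + b, c + d   # row totals
    c1 = a + c              # first column total
    n = r1 + r2
    k_max = min(r1, c1)     # largest feasible top-left count
    return sum(comb(r1, k) * comb(r2, c1 - k)
               for k in range(a, k_max + 1)) / comb(n, c1)

p = fisher_exact_one_sided(3, 1, 1, 3)   # 17/70, about 0.243
```

Because the p-value is exact rather than based on a chi-square approximation, it remains valid even when expected cell frequencies fall below 5.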

Wilcoxon rank-sum test for comparing two samples

Overview of the Wilcoxon rank-sum test

  • The Wilcoxon rank-sum test, also known as the Mann-Whitney U test, is a non-parametric alternative to the independent samples t-test
  • The test is used to compare the medians of two independent samples when the assumptions of the t-test are not met or when the data is ordinal (Likert scales)
  • The Wilcoxon rank-sum test assumes that the two samples are independent, and the distributions of the two populations have the same shape and spread

Test procedure and calculation

  • The test procedure involves combining the observations from both samples, ranking them from smallest to largest, and calculating the sum of the ranks for each sample
  • The rank sums $R_1$ and $R_2$ for the two samples give the Mann-Whitney statistics $U_i = R_i - n_i(n_i + 1)/2$, where $n_i$ is the size of sample $i$; the test statistic is the smaller of the two: $U = \min(U_1, U_2)$
  • The p-value is determined by comparing the test statistic to the appropriate reference distribution (exact distribution for small sample sizes or normal approximation for larger sample sizes)
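The rank-and-sum procedure above can be sketched in plain Python, using midranks for tied values. The statistic $U$ is computed by subtracting each sample's minimum possible rank sum, $n_i(n_i+1)/2$, from its observed rank sum (the sample values below are made up):

```python
def rank_sum_test(x, y):
    """Rank sum R1 for sample x and the Mann-Whitney U statistic,
    using average (mid) ranks for ties."""
    pooled = sorted(x + y)
    rank_of = {}                               # value -> average rank
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of positions i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(rank_of[v] for v in x)
    u1 = r1 - n1 * (n1 + 1) / 2                # U for sample x
    u2 = n1 * n2 - u1                          # U for sample y
    return r1, min(u1, u2)

r1, u = rank_sum_test([1, 3, 5], [2, 4, 6])    # r1 = 9.0, u = 3.0
```

For the p-value, `scipy.stats.mannwhitneyu` uses the exact distribution for small samples and a normal approximation otherwise, matching the bullet above.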

Interpreting the results

  • A small p-value (typically < 0.05) indicates a significant difference between the medians of the two populations, while a large p-value means the data do not provide sufficient evidence of a difference (it does not prove the medians are equal)
  • The Wilcoxon rank-sum test does not provide an estimate of the magnitude of the difference between the medians, so additional measures (Hodges-Lehmann estimator) may be used to quantify the effect size
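The Hodges-Lehmann estimator mentioned above is simply the median of all pairwise differences between the two samples, which makes it a one-liner with the standard library (sample values are hypothetical):

```python
from statistics import median

def hodges_lehmann(x, y):
    """Hodges-Lehmann estimate of the location shift between two samples:
    the median of all n1*n2 pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)

shift = hodges_lehmann([5, 6, 7], [1, 2, 3])   # 4
```

Like the rank-sum test itself, this estimate is robust to outliers because it depends on the median of differences rather than their mean.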

Interpreting non-parametric hypothesis test results

Statistical significance and practical significance

  • When interpreting the results of non-parametric tests, it is essential to consider the practical significance of the findings in addition to the statistical significance
  • A statistically significant result (small p-value) may not always be practically meaningful, especially if the effect size is small or the sample size is large
  • The context of the study and the domain knowledge should be used to determine the practical implications of the findings

Drawing conclusions and reporting results

  • The conclusions drawn from non-parametric tests should be stated in terms of the specific hypotheses being tested and should not be generalized beyond the scope of the study
  • When reporting the results of non-parametric tests, include the test statistic, degrees of freedom (if applicable), p-value, and a clear interpretation of the findings
  • Consider using visualizations (bar charts for chi-square tests, boxplots for Wilcoxon rank-sum test) to help communicate the results effectively to the target audience