Probability and Statistics

10.2 Spearman rank correlation

Spearman rank correlation measures the strength and direction of association between two ranked variables. It's a non-parametric method that assesses monotonic relationships, making it useful for ordinal data or when assumptions for Pearson correlation aren't met.

The coefficient ranges from -1 to +1, with values closer to these extremes indicating stronger relationships. It's less sensitive to outliers than Pearson correlation and can detect non-linear monotonic relationships, making it versatile for various fields of study.

Definition of Spearman rank correlation

Spearman rank correlation is a non-parametric measure of the strength and direction of association between two ranked variables
Assesses the monotonic relationship between two variables, where the variables tend to change together but not necessarily at a constant rate
Calculates a correlation coefficient ranging from -1 to +1, denoted by the Greek letter ρ (rho) for a population or rs for a sample

Assumptions for Spearman rank correlation

The data must be at least ordinal, meaning that the variables can be ranked in a meaningful order
The relationship between the two variables should be monotonic, either increasing or decreasing consistently
The observations must be paired, with the pairs drawn independently from the same population
There are no specific assumptions about the distribution of the data or the presence of outliers

Calculating Spearman rank correlation coefficient

Ranking data values

Assign ranks to each observation within each variable separately, starting with 1 for the smallest value
If there are tied values, assign the average rank to each tied observation
As a quick check, the ranks for each variable should sum to the same total, n(n + 1)/2
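The ranking steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `average_ranks` and the sample scores are made up for the example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; take their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


scores = [10, 20, 20, 30, 5]          # hypothetical data with one tie
ranks = average_ranks(scores)
print(ranks)                           # [2.0, 3.5, 3.5, 5.0, 1.0]

n = len(scores)
print(sum(ranks) == n * (n + 1) / 2)   # True: ranks always sum to n(n + 1)/2
```

The two tied scores of 20 would have occupied ranks 3 and 4, so each receives 3.5, and the rank total is unchanged by the tie.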

Differences between ranks

Calculate the difference between the ranks (di) for each pair of observations
Square each difference to obtain di²
Sum the squared differences to obtain Σdi²

Formula for Spearman rank correlation

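When there are no tied ranks, the Spearman rank correlation coefficient is calculated using the following formula: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
With tied ranks this shortcut is only approximate; the coefficient is then computed as the Pearson correlation of the two sets of ranks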
Where:
- rs is the Spearman rank correlation coefficient
- di is the difference between the ranks of the ith pair of observations
- n is the number of pairs of observations
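As a worked illustration of the formula, the sketch below ranks two small samples and applies it directly. It assumes there are no tied values; the helper names and the data (study hours and exam scores) are hypothetical.

```python
def ranks_no_ties(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


def spearman_simple(x, y):
    """Spearman coefficient via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); assumes no ties."""
    n = len(x)
    rank_x, rank_y = ranks_no_ties(x), ranks_no_ties(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


hours = [2, 4, 6, 8, 10]        # hypothetical study hours
score = [50, 65, 60, 80, 90]    # hypothetical exam scores

# Ranks: hours -> 1,2,3,4,5 and score -> 1,3,2,4,5, so d = 0,-1,1,0,0
# and sum(d^2) = 2, giving 1 - 12/120.
print(round(spearman_simple(hours, score), 3))   # 0.9
```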

Interpreting Spearman rank correlation

Strength of monotonic relationship

The absolute value of the Spearman rank correlation coefficient indicates the strength of the monotonic relationship between the two variables
A value close to 1 suggests a strong monotonic relationship, while a value close to 0 indicates a weak or no monotonic relationship

Positive vs negative correlation

A positive Spearman rank correlation coefficient (0 < rs ≤ 1) indicates a monotonically increasing relationship, where both variables tend to increase together
A negative Spearman rank correlation coefficient (-1 ≤ rs < 0) indicates a monotonically decreasing relationship, where one variable tends to decrease as the other increases

No correlation

A Spearman rank correlation coefficient close to 0 suggests no monotonic relationship between the variables
However, this does not necessarily imply that there is no relationship at all, as there could be a non-monotonic relationship

Hypothesis testing with Spearman rank correlation

Null and alternative hypotheses

The null hypothesis (H0) states that there is no monotonic relationship between the two variables in the population
The alternative hypothesis (Ha) states that there is a monotonic relationship between the two variables in the population

Test statistic and p-value

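The test statistic for the Spearman rank correlation is the sample correlation coefficient (rs); for larger samples it is often converted to $t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}}$, which approximately follows a t distribution with n - 2 degrees of freedom under the null hypothesis
The p-value is the probability of obtaining a sample correlation coefficient at least as extreme as the observed value, assuming the null hypothesis is true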

Significance level and decision rule

Choose a significance level (α) for the hypothesis test (common choices are 0.01, 0.05, or 0.10)
If the p-value is less than the chosen significance level, reject the null hypothesis in favor of the alternative hypothesis
If the p-value is greater than or equal to the significance level, fail to reject the null hypothesis
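One way to obtain a p-value for a very small sample without ties is an exact permutation test: under the null hypothesis every ordering of one variable's ranks is equally likely, so the p-value is the share of orderings whose coefficient is at least as extreme as the observed one. The sketch below takes that approach as an assumption for illustration; statistical software may use approximations instead, and the function names and ranks are hypothetical.

```python
from itertools import permutations


def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two tie-free rank sequences."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def exact_two_sided_p(rank_x, rank_y):
    """Share of all orderings of rank_y with |r_s| at least as large as observed."""
    observed = abs(spearman_from_ranks(rank_x, rank_y))
    extreme = total = 0
    for shuffled in permutations(rank_y):   # n! orderings: small n only
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(spearman_from_ranks(rank_x, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


rank_x = [1, 2, 3, 4, 5]
rank_y = [1, 3, 2, 4, 5]   # observed r_s = 0.9

alpha = 0.05
p_value = exact_two_sided_p(rank_x, rank_y)
print(round(p_value, 4))   # 0.0833 (10 of the 120 orderings)
print("reject H0" if p_value < alpha else "fail to reject H0")   # fail to reject H0
```

Even with rs = 0.9, five pairs are too few to reject the null hypothesis at α = 0.05, which shows how much the decision depends on sample size.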

Comparing Spearman and Pearson correlation

Similarities in interpretation

Both Spearman and Pearson correlation coefficients range from -1 to +1
The sign of the correlation coefficient indicates the direction of the relationship (positive or negative)
The absolute value of the correlation coefficient indicates the strength of the relationship

Differences in assumptions

Pearson correlation assumes a linear relationship between the variables and requires interval or ratio data
Spearman rank correlation assumes a monotonic relationship and requires at least ordinal data
Pearson correlation is parametric: its standard significance tests assume the data are approximately bivariate normal, while Spearman rank correlation is non-parametric and does not make assumptions about the distribution

Robustness to outliers

Spearman rank correlation is less sensitive to outliers than Pearson correlation because it is based on ranks rather than actual values
Outliers can have a significant impact on the Pearson correlation coefficient, potentially leading to misleading results
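The effect of a single outlier can be illustrated with a small, made-up data set in which nine points lie on a perfect increasing line and the tenth is a wildly low value. The sketch computes Spearman as the Pearson correlation of the ranks; the function names and data are hypothetical, and the data contain no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks_no_ties(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(ranks_no_ties(x), ranks_no_ties(y))


x = list(range(1, 11))             # 1, 2, ..., 10
y = list(range(1, 10)) + [-100]    # increasing trend, then one extreme outlier

print(round(pearson(x, y), 3))     # -0.455: the outlier flips the sign
print(round(spearman(x, y), 3))    # 0.455: the rank-based measure stays positive
```

The outlier still lowers the Spearman coefficient, because its rank moves from last to first, but its magnitude (-100 rather than, say, 0) has no further influence once the values are replaced by ranks.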

Applications of Spearman rank correlation

Ordinal or ranked data

Spearman rank correlation is particularly useful when dealing with ordinal or ranked data, such as survey responses on a Likert scale (strongly disagree to strongly agree)
It can be applied to data that does not meet the assumptions of Pearson correlation, such as non-normally distributed data or data with outliers

Non-linear monotonic relationships

Spearman rank correlation can detect monotonic relationships that are not necessarily linear
This makes it suitable for assessing relationships between variables that have a consistent increasing or decreasing trend, even if the rate of change is not constant
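A short sketch makes the contrast concrete, using a made-up cubic relationship that always increases but at a changing rate. The function names are hypothetical and the data are assumed to be free of ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_no_ties(x, y):
    """Spearman coefficient via the sum of squared rank differences; assumes no ties."""
    n = len(x)
    rank_of_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_of_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    sum_d2 = sum((rank_of_x[a] - rank_of_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


x = list(range(1, 9))        # 1, 2, ..., 8
y = [v ** 3 for v in x]      # strictly increasing, but not linear

print(spearman_no_ties(x, y))      # 1.0: the ranks agree perfectly
print(round(pearson(x, y), 3))     # 0.932: below 1 because the trend is curved
```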

Examples in various fields

In psychology, Spearman rank correlation can be used to study the relationship between participants' rankings of different stimuli (for example, preferences for various colors)
In environmental science, it can be used to assess the relationship between the ranks of different pollutants in various locations (for example, air quality rankings in different cities)
In finance, Spearman rank correlation can be used to analyze the relationship between the performance rankings of different stocks or investment funds

Limitations of Spearman rank correlation

Sensitivity to tied ranks

When there are many tied values in the data, the shortcut formula based on Σdi² no longer gives the exact coefficient, so the Pearson correlation of the ranks should be used instead
Tied observations share an average rank, so heavily tied data contain fewer distinct ranks and less information about the ordering, which limits how precisely the coefficient can reflect the true relationship
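The effect of ties can be seen by computing the coefficient two ways on the same tied data: with the Σdi² shortcut, and as the Pearson correlation of the average ranks. The sketch below uses hypothetical helper names and made-up data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 2, 3, 4]   # one tie in x
y = [1, 2, 3, 3, 5]   # one tie in y
rank_x, rank_y = average_ranks(x), average_ranks(y)
n = len(x)

# Shortcut formula: only approximate once ties are present.
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson correlation of the ranks: remains valid with ties.
tie_aware = pearson(rank_x, rank_y)

print(round(shortcut, 4))    # 0.925
print(round(tie_aware, 4))   # 0.9211
```

With only two ties the gap is small, but it grows as ties become more frequent, for example with Likert-scale responses.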

Inability to detect non-monotonic relationships

Spearman rank correlation only assesses monotonic relationships and cannot detect non-monotonic relationships, such as U-shaped or inverted U-shaped relationships
If the relationship between the variables is non-monotonic, the Spearman rank correlation coefficient may not accurately represent the true nature of the relationship
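A U-shaped example shows how the coefficient can miss a perfectly deterministic relationship. In the sketch below y is exactly x squared, yet the rank correlation comes out as zero. The helper names are hypothetical; Spearman is computed as the Pearson correlation of average ranks because the y values contain ties.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # U-shaped: y falls, then rises

print(spearman(x, y))     # 0.0, even though y is fully determined by x
```

Plotting the data before computing the coefficient is the usual safeguard against this kind of pattern.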

Interpretation challenges with small samples

When working with small sample sizes, the interpretation of the Spearman rank correlation coefficient can be challenging
Small samples may not provide a reliable representation of the population, leading to increased uncertainty in the results
It is important to consider the sample size and the potential impact of sampling variability when interpreting Spearman rank correlation results
