Probability and Statistics

Spearman rank correlation measures the strength and direction of association between two ranked variables. It's a non-parametric method that assesses monotonic relationships, making it useful for ordinal data or when assumptions for Pearson correlation aren't met.

The coefficient ranges from -1 to +1, with values closer to these extremes indicating stronger relationships. It's less sensitive to outliers than Pearson correlation and can detect non-linear monotonic relationships, making it versatile for various fields of study.

Definition of Spearman rank correlation

  • Spearman rank correlation is a non-parametric measure of the strength and direction of association between two ranked variables
  • Assesses the monotonic relationship between two variables, where the variables tend to change together but not necessarily at a constant rate
  • Calculates a correlation coefficient, usually written ρ (rho) for the population value and rs for the sample value, which ranges from -1 to +1

Assumptions for Spearman rank correlation

  • The data must be at least ordinal, meaning that the variables can be ranked in a meaningful order
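  • The relationship between the two variables should be monotonic: as one variable increases, the other tends either to increase or to decrease throughout, without reversing direction
  • The observations must be paired, with each case contributing one value per variable, and the pairs should be independent draws from the same population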
  • There are no specific assumptions about the distribution of the data or the presence of outliers

Calculating Spearman rank correlation coefficient

Ranking data values

  • Assign ranks to each observation within each variable separately, starting with 1 for the smallest value
  • If there are tied values, assign the average rank to each tied observation
  • The ranks for each variable should sum to the same total, n(n + 1)/2 for n observations, which gives a quick check that the ranking was done correctly
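
The ranking rule above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `average_ranks` and the sample scores are made up for the example. It gives tied values the mean of the rank positions they occupy, then checks that the ranks sum to n(n + 1)/2, the total expected for n observations.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


scores = [10, 20, 20, 30]  # made-up data with one tie
ranks = average_ranks(scores)
n = len(scores)
print(ranks)                          # [1.0, 2.5, 2.5, 4.0]
print(sum(ranks) == n * (n + 1) / 2)  # True
```

The two tied scores would have taken ranks 2 and 3, so each receives 2.5.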

Differences between ranks

  • Calculate the difference between the ranks (di) for each pair of observations
  • Square each difference to obtain di²
  • Sum the squared differences to obtain Σdi²

Formula for Spearman rank correlation

  • The Spearman rank correlation coefficient is calculated using the following formula: $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
  • Where:
    • rs is the Spearman rank correlation coefficient
    • di is the difference between the ranks of the ith pair of observations
    • n is the number of pairs of observations
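
As a worked example of the formula, the sketch below ranks two small invented data sets, sums the squared rank differences, and applies the expression for rs. The helper name `spearman_no_ties` and the data are assumptions for illustration, and the code assumes there are no tied values.

```python
def spearman_no_ties(x, y):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two observations")
    # With no ties, a value's rank is its 1-based position in sorted order.
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    # Squared rank difference for each pair, summed over all pairs.
    sum_d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


hours = [2, 4, 5, 7, 9]       # made-up study hours
marks = [50, 65, 60, 80, 90]  # made-up exam marks
print(spearman_no_ties(hours, marks))  # 0.9
```

Here the rank differences are 0, -1, 1, 0, 0, so Σdi² = 2 and rs = 1 - 12/120 = 0.9.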

Interpreting Spearman rank correlation

Strength of monotonic relationship

  • The absolute value of the Spearman rank correlation coefficient indicates the strength of the monotonic relationship between the two variables
  • A value close to 1 suggests a strong monotonic relationship, while a value close to 0 indicates a weak or no monotonic relationship

Positive vs negative correlation

  • A positive Spearman rank correlation coefficient (0 < rs ≤ 1) indicates a monotonically increasing relationship, where both variables tend to increase together
  • A negative Spearman rank correlation coefficient (-1 ≤ rs < 0) indicates a monotonically decreasing relationship, where one variable tends to decrease as the other increases

No correlation

  • A Spearman rank correlation coefficient close to 0 suggests no monotonic relationship between the variables
  • However, this does not necessarily imply that there is no relationship at all, as there could be a non-monotonic relationship

Hypothesis testing with Spearman rank correlation

Null and alternative hypotheses

  • The null hypothesis (H0) states that there is no monotonic relationship between the two variables in the population
  • The alternative hypothesis (Ha) states that there is a monotonic relationship between the two variables in the population

Test statistic and p-value

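  • The test statistic for the Spearman rank correlation is the sample correlation coefficient (rs); for larger samples it is commonly converted to t = rs·√((n − 2) / (1 − rs²)), which approximately follows a t-distribution with n − 2 degrees of freedom when the null hypothesis is true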
  • The p-value is the probability of obtaining a sample correlation coefficient at least as extreme as the observed value, assuming the null hypothesis is true

Significance level and decision rule

  • Choose a significance level (α) for the hypothesis test (common choices are 0.01, 0.05, or 0.10)
  • If the p-value is less than the chosen significance level, reject the null hypothesis in favor of the alternative hypothesis
  • If the p-value is greater than or equal to the significance level, fail to reject the null hypothesis
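
For a small sample, the test can be sketched by brute force: under the null hypothesis every pairing of the two sets of ranks is equally likely, so the p-value is the share of pairings whose rs is at least as extreme as the observed one. The code below is an illustrative sketch only. The function names and the ranks are invented, it assumes no ties, and enumerating all n! pairings is practical only for small n.

```python
from itertools import permutations


def spearman_from_ranks(rank_x, rank_y):
    """r_s from two sets of untied ranks, using the sum of squared differences."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def exact_p_value(rank_x, rank_y):
    """Two-sided exact p-value: share of all pairings with |r_s| >= observed |r_s|."""
    observed = abs(spearman_from_ranks(rank_x, rank_y))
    total = extreme = 0
    for shuffled in permutations(rank_y):
        total += 1
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(spearman_from_ranks(rank_x, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


rank_x = [1, 2, 3, 4, 5]  # made-up ranks
rank_y = [1, 3, 2, 4, 5]
alpha = 0.05
p = exact_p_value(rank_x, rank_y)
# 10 of the 120 pairings are at least as extreme, so p = 10/120.
print(p, "reject H0" if p < alpha else "fail to reject H0")
```

With these ranks rs = 0.9, yet the p-value of about 0.083 exceeds α = 0.05, so the null hypothesis is not rejected.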

Comparing Spearman and Pearson correlation

Similarities in interpretation

  • Both Spearman and Pearson correlation coefficients range from -1 to +1
  • The sign of the correlation coefficient indicates the direction of the relationship (positive or negative)
  • The absolute value of the correlation coefficient indicates the strength of the relationship

Differences in assumptions

  • Pearson correlation assumes a linear relationship between the variables and requires interval or ratio data
  • Spearman rank correlation assumes a monotonic relationship and requires at least ordinal data
  • Pearson correlation is parametric: its standard significance test assumes the data are approximately bivariate normal, while Spearman rank correlation is non-parametric and makes no assumption about the shape of the distribution

Robustness to outliers

  • Spearman rank correlation is less sensitive to outliers than Pearson correlation because it is based on ranks rather than actual values
  • Outliers can have a significant impact on the Pearson correlation coefficient, potentially leading to misleading results
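
A small experiment makes the contrast concrete. The sketch below uses invented data and hypothetical helper names, and its simple ranking assumes no ties. It computes both coefficients for a perfectly linear data set and again after the last value is replaced by an extreme one.

```python
import math


def pearson(x, y):
    """Pearson's r from sums of deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; this sketch assumes no tied values."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


def spearman(x, y):
    # Spearman's coefficient is Pearson's r applied to the ranks.
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 6, 8, 10, 12]     # exactly linear
y_outlier = [2, 4, 6, 8, 10, 120]  # last value replaced by an outlier

print(pearson(x, y_clean), spearman(x, y_clean))
# Pearson drops to roughly 0.70 while Spearman stays at 1.0,
# because the outlier is still simply the largest value once ranked.
print(pearson(x, y_outlier), spearman(x, y_outlier))
```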

Applications of Spearman rank correlation

Ordinal or ranked data

  • Spearman rank correlation is particularly useful when dealing with ordinal or ranked data, such as survey responses on a Likert scale (strongly disagree to strongly agree)
  • It can be applied to data that does not meet the assumptions of Pearson correlation, such as non-normally distributed data or data with outliers

Non-linear monotonic relationships

  • Spearman rank correlation can detect monotonic relationships that are not necessarily linear
  • This makes it suitable for assessing relationships between variables that have a consistent increasing or decreasing trend, even if the rate of change is not constant
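
To illustrate, the sketch below compares the two coefficients on a relationship that always increases but is far from a straight line, y = x³. The data and helper names are assumptions chosen for the example, and the ranking assumes no ties.

```python
import math


def pearson(x, y):
    """Pearson's r from sums of deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; this sketch assumes no tied values."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # strictly increasing, but curved

pearson_r = pearson(x, y)
spearman_rs = pearson(ranks(x), ranks(y))  # Pearson's r on the ranks
print(round(pearson_r, 2), spearman_rs)    # 0.94 1.0
```

Cubing changes the spacing of the values but not their order, so the ranks of y match the ranks of x exactly and rs is 1, while Pearson's r falls short of 1 because the points do not lie on a line.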

Examples in various fields

  • In psychology, Spearman rank correlation can be used to study the relationship between participants' rankings of different stimuli, such as preferences for various colors
  • In environmental science, it can be used to assess the relationship between the ranks of different pollutants across locations, such as air quality rankings in different cities
  • In finance, Spearman rank correlation can be used to analyze the relationship between the performance rankings of different stocks or investment funds

Limitations of Spearman rank correlation

Sensitivity to tied ranks

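  • When the data contain many tied values, the shortcut formula based on Σdi² is no longer exact, and the error grows with the number of ties
  • Tied observations receive the average rank, and with ties present the coefficient should instead be computed as the Pearson correlation of those average ranks; heavy ties also leave fewer distinct ranks, so the coefficient carries less information about the ordering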
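
To see how much ties can matter, the sketch below compares the shortcut formula based on Σdi² with the Pearson correlation of the average ranks on a small, heavily tied data set. The data and function names are invented for illustration, and the compact ranking helper takes quadratic time, which is fine only for small inputs.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    # A value first found at 0-based index f with c copies spans ranks f+1 .. f+c.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's r from sums of deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def shortcut_formula(rank_x, rank_y):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when there are no ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


dose = [1, 2, 3, 4, 5, 6]      # made-up values, no ties
response = [1, 1, 1, 2, 2, 2]  # made-up values, heavily tied

rank_dose = average_ranks(dose)
rank_response = average_ranks(response)
shortcut = shortcut_formula(rank_dose, rank_response)
on_ranks = pearson(rank_dose, rank_response)
print(round(shortcut, 4), round(on_ranks, 4))  # 0.8857 0.8783
```

The gap is small here, but it widens as ties become more frequent, which is why the rank-based Pearson calculation is the safer default when ties are present.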

Inability to detect non-monotonic relationships

  • Spearman rank correlation only assesses monotonic relationships and cannot detect non-monotonic relationships, such as U-shaped or inverted U-shaped relationships
  • If the relationship between the variables is non-monotonic, the Spearman rank correlation coefficient may not accurately represent the true nature of the relationship
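
A U-shaped example shows the limitation directly. In the sketch below, y is completely determined by x through y = x², yet the rank correlation is zero. The data and helper names are assumptions for illustration; rs is computed as the Pearson correlation of average ranks because y contains ties.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's r from sums of deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # U-shaped: falls, then rises

rs = pearson(average_ranks(x), average_ranks(y))
print(rs)  # 0.0
```

The decreasing left half and the increasing right half cancel exactly, so a coefficient near zero should prompt a look at a scatter plot before concluding that the variables are unrelated.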

Interpretation challenges with small samples

  • With small samples, rs is an imprecise estimate: sampling variability is large, so the sample coefficient can differ substantially from the population value
  • Large-sample approximations to the p-value are unreliable for small n, so exact (permutation-based) p-values or published critical-value tables are preferable
  • Report the sample size alongside rs, and treat a large coefficient based on only a handful of pairs with caution
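
One way to see the small-sample problem is to ask how small a p-value an exact test can produce at all. Under the null hypothesis all n! pairings of the ranks are equally likely, and, assuming no ties, only two of them give |rs| = 1. The sketch below (with a hypothetical function name) tabulates the resulting lower bound on the two-sided p-value.

```python
from math import factorial


def smallest_two_sided_p(n):
    """Smallest exact two-sided p-value attainable with n untied pairs.

    Only perfect agreement and perfect reversal of the ranks give |r_s| = 1,
    so the most extreme outcome has probability 2 / n! under H0.
    """
    return 2 / factorial(n)


for n in range(3, 8):
    print(n, smallest_two_sided_p(n))
```

With n = 4 the bound is 2/24 ≈ 0.083, so even a perfect rank correlation cannot be significant at α = 0.05 in a two-sided exact test; with n = 5 the bound falls to 2/120 ≈ 0.017.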