Intro to Business Statistics

Data distributions come in different shapes, and the shape affects how we interpret their central tendencies. Symmetrical distributions have tails of equal length, while skewed ones have a longer tail on the left or the right. Understanding these shapes helps us choose the right measure of central tendency.

Mean, median, and mode are key measures of central tendency, each with unique properties. The mean is sensitive to outliers, the median is robust, and the mode shows the most common value. Skewness values help quantify distribution asymmetry, guiding our interpretation of data.

Measures of Central Tendency and Distribution Shape

Symmetrical vs skewed distributions

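  • Symmetrical distributions have histograms that are mirror-symmetric about the center; when there is a single mode, the mean, median, and mode are equal or very close to each other (normal distribution). A uniform distribution is also symmetrical, but every value is equally likely, so it has no single mode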
  • Skewed distributions have asymmetric histograms, with one tail longer than the other and the mean, median, and mode not equal and potentially far apart
    • Right-skewed (positively skewed) distributions have a longer tail on the right side of the histogram, typically with the mean > median > mode (income distribution)
    • Left-skewed (negatively skewed) distributions have a longer tail on the left side of the histogram, typically with the mode > median > mean (exam scores on an easy test, where most students score high and a few score low)
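
The ordering of the three measures under skew can be checked on small datasets. The sketch below uses Python's standard `statistics` module; the two datasets and the helper name `center_summary` are made up for illustration and do not come from the text.

```python
from statistics import mean, median, mode

# Hypothetical illustrative data (not from the text).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]          # long tail on the right
left_skewed = [16 - x for x in right_skewed]            # mirror image: long tail on the left


def center_summary(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


right = center_summary(right_skewed)
left = center_summary(left_skewed)

# Right skew: the long right tail pulls the mean above the median and mode.
print(right["mean"] > right["median"] > right["mode"])   # True
# Left skew: the long left tail pulls the mean below the median and mode.
print(left["mode"] > left["median"] > left["mean"])      # True
```

These orderings hold for the data above; they are the usual pattern rather than a guarantee for every skewed dataset.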

Measures of central tendency

  • Mean represents the arithmetic average of all values in a dataset, calculated using the formula $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, and is sensitive to extreme values (outliers)
    • In a symmetrical distribution, the mean equals the median and mode
    • In a right-skewed distribution, the mean is greater than the median and mode
    • In a left-skewed distribution, the mean is less than the median and mode
  • Median is the middle value when the dataset is ordered from smallest to largest: the $\frac{n+1}{2}$th value for odd n, or the average of the $\frac{n}{2}$th and $(\frac{n}{2}+1)$th values for even n. It is a robust measure that is largely unaffected by extreme values
    • In a symmetrical distribution, the median equals the mean and mode
    • In a skewed distribution, the median typically lies between the mean and mode
  • Mode is the most frequently occurring value in a dataset; a dataset can have no mode, one mode (unimodal), or multiple modes (bimodal, multimodal)
    • In a symmetrical distribution, the mode equals the mean and median
    • In a right-skewed distribution, the mode is less than the median and mean
    • In a left-skewed distribution, the mode is greater than the median and mean
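
As a minimal sketch of these definitions, the code below implements the positional rule for the median and compares how the mean and median react to one extreme value. The salary figures and the function name `median_by_position` are hypothetical, chosen only for illustration; `mean`, `median`, and `multimode` are from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import mean, median, multimode


def median_by_position(values):
    """Median via the positional rule: the (n+1)/2-th ordered value for odd n,
    or the average of the n/2-th and (n/2 + 1)-th ordered values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                  # 1-based position -> 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical salaries in thousands (not from the text).
salaries = [40, 42, 45, 45, 50]
with_outlier = salaries + [400]                           # add one extreme value

print(mean(salaries), mean(with_outlier))                 # mean jumps from 44.4 to about 103.7
print(median_by_position(salaries), median_by_position(with_outlier))  # median stays at 45
print(multimode(with_outlier))                            # [45]: the mode is unchanged too
print(multimode([1, 1, 2, 2, 3]))                         # [1, 2]: a bimodal dataset
```

Note that `multimode` returns every value when all values occur equally often, which is how a dataset with no distinct mode shows up in this sketch.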

Interpreting skewness values

  • Skewness measures the asymmetry of a distribution. One common measure is Pearson's (median-based) coefficient of skewness: $SK = \frac{3(\bar{x} - \text{Median})}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation
  • Interpretation of skewness values:
    1. If SK = 0, the distribution is symmetrical
    2. If SK > 0, the distribution is right-skewed (positively skewed), with larger positive values indicating a more severe right skew
    3. If SK < 0, the distribution is left-skewed (negatively skewed), with more negative values indicating a more severe left skew
  • Rule of thumb for interpreting skewness values:
    • If |SK| < 0.5, the distribution is approximately symmetrical
    • If 0.5 ≤ |SK| < 1, the distribution is moderately skewed
    • If |SK| ≥ 1, the distribution is highly skewed
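
The formula and rule of thumb above can be sketched in a few lines of Python. The function names `pearson_skewness` and `describe_skew` and the sample data are assumptions made for this illustration; `stdev` from the standard `statistics` module computes the sample standard deviation (n - 1 denominator), matching $s$ in the formula.

```python
from statistics import mean, median, stdev


def pearson_skewness(values):
    """Pearson's coefficient of skewness: SK = 3 * (mean - median) / s."""
    s = stdev(values)                       # sample standard deviation
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mean(values) - median(values)) / s


def describe_skew(sk):
    """Label a skewness value using the 0.5 and 1 thresholds from the rule of thumb."""
    size = abs(sk)
    if size < 0.5:
        return "approximately symmetrical"
    direction = "right" if sk > 0 else "left"
    strength = "moderately" if size < 1 else "highly"
    return f"{strength} {direction}-skewed"


# Hypothetical data (not from the text): a few large values form a right tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
sk = pearson_skewness(data)
print(round(sk, 2), describe_skew(sk))      # SK is a little above 1, so "highly right-skewed"
print(describe_skew(pearson_skewness([1, 2, 3, 4, 5])))   # mean == median, so SK = 0
```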

Additional measures and visualizations

  • Standard deviation measures the spread of data around the mean, providing insight into the variability of a distribution
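  • Quartiles are the three cut points that divide the ordered dataset into four parts, each holding about a quarter of the observations, with the second quartile being the median
  • Percentiles indicate the relative position of a data point within the distribution: the pth percentile is the value at or below which about p percent of the observations fall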
  • Box plots visually represent the quartiles, median, and potential outliers of a dataset
  • Histograms provide a graphical representation of the frequency distribution of a dataset, helping to visualize its shape and skewness
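
These summaries can be computed with Python's standard library, as in the rough sketch below. The data and the helper `text_histogram` are hypothetical. The 1.5 × IQR fences used to flag potential outliers are a common box-plot convention assumed here, not something stated in the text, and `statistics.quantiles` uses its default "exclusive" method, so other software may report slightly different quartiles.

```python
from collections import Counter
from statistics import quantiles, stdev

# Hypothetical data (not from the text).
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

spread = stdev(data)                        # sample standard deviation around the mean
q1, q2, q3 = quantiles(data, n=4)           # three cut points; q2 is the median
iqr = q3 - q1                               # interquartile range (width of the box)

# Assumed convention: points beyond 1.5 * IQR from the quartiles are potential outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]


def text_histogram(values):
    """Draw one row of asterisks per integer value, from the minimum to the maximum."""
    counts = Counter(values)
    rows = range(min(values), max(values) + 1)
    return "\n".join(f"{v:>3} | {'*' * counts[v]}" for v in rows)


print(q1, q2, q3, outliers)
print(text_histogram(data))                 # the long run of sparse rows is the right tail
```

Printing the histogram makes the right skew of this dataset visible: most asterisks cluster at small values, and the single point at 15 sits far out in the tail and is the only value flagged by the fences.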