8.6 The Normal Distribution

3 min read · June 18, 2024

The normal distribution is a key concept in statistics, showing how data spreads around a central point. It's known for its bell-shaped curve and the 68-95-99.7 rule, which tells us where most data falls.

Z-scores help compare data points across different datasets by showing how far they are from the mean. Tools like Google Sheets make it easy to work with normal distributions, calculating percentiles and data values quickly.

The Normal Distribution

68-95-99.7 rule for percentiles

  • Describes the distribution of data in a normal distribution
    • 68% of data falls within one standard deviation $\pm 1\sigma$ of the mean $\mu$ (majority of data)
    • 95% of data falls within two standard deviations $\pm 2\sigma$ of the mean $\mu$ (nearly all data)
    • 99.7% of data falls within three standard deviations $\pm 3\sigma$ of the mean $\mu$ (almost the entire dataset)
  • Calculate percentiles using the 68-95-99.7 rule by determining the mean $\mu$ and standard deviation $\sigma$ of the dataset
    • Find the desired percentile range using the rule (the middle 68% of data lies within $\mu \pm 1\sigma$)
  • Calculate data values using the 68-95-99.7 rule by determining the mean $\mu$ and standard deviation $\sigma$ of the dataset
    • Find the data values corresponding to the desired percentile range using the rule (the middle 95% of data lies within $\mu \pm 2\sigma$); see the worked sketch after this list
  • The area under the curve of a normal distribution represents the probability of observing a value within a specific range
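
For example, here is a minimal Python sketch of the rule; the mean of 70 and standard deviation of 10 are made-up illustrative values, not figures from this guide:

```python
from statistics import NormalDist

# Hypothetical dataset summary: mean 70, standard deviation 10 (illustrative values)
mu, sigma = 70, 10

# Ranges predicted by the 68-95-99.7 rule
for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = mu - k * sigma, mu + k * sigma
    print(f"About {pct}% of data falls between {low} and {high}")

# Check the rule against the exact normal CDF
dist = NormalDist(mu, sigma)
for k in (1, 2, 3):
    exact = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"Exact probability within ±{k}σ: {exact:.4f}")
```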

Standardized scores in normal distributions

  • A standardized score (z-score) represents the number of standard deviations a data point is from the mean
  • Calculate the z-score using the formula $z = \frac{x - \mu}{\sigma}$ (a short code sketch follows this list)
    • $x$: individual data point
    • $\mu$: mean of the dataset
    • $\sigma$: standard deviation of the dataset
  • Interpret z-scores
    • Positive z-score indicates data point above mean (right side of distribution)
    • Negative z-score indicates data point below mean (left side of distribution)
    • Z-score of 0 indicates data point equal to mean (center of distribution)
    • Magnitude of z-score represents distance from mean in standard deviations (z-score 1.5 means 1.5 standard deviations above mean)
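
A minimal sketch of the calculation; the score of 85, mean of 70, and standard deviation of 10 are hypothetical values chosen for illustration:

```python
from statistics import NormalDist

# Hypothetical values: a score of 85 in a dataset with mean 70 and standard deviation 10
x, mu, sigma = 85, 70, 10

# z = (x - mu) / sigma: number of standard deviations from the mean
z = (x - mu) / sigma
print(f"z-score: {z}")  # 1.5, i.e. 1.5 standard deviations above the mean

# A positive z-score places the point on the right side of the distribution;
# the standard normal CDF converts it to a percentile
percentile = NormalDist().cdf(z)
print(f"Roughly {percentile:.1%} of values fall below this point")
```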

Technology for normal distribution calculations

  • Google Sheets provides functions for normal distributions
  • Find percentiles for a normal distribution using the NORM.DIST function
    • Syntax: =NORM.DIST(x, mean, standard_deviation, cumulative)
      • x: data value
      • mean: mean of the distribution
      • standard_deviation: standard deviation of the distribution
      • cumulative: boolean value (TRUE or FALSE) indicating whether to use the cumulative distribution function (CDF)
    • Set cumulative to TRUE to find the percentile for a given data value
  • Find data values for a given percentile using the NORM.INV function
    • Syntax: =NORM.INV(probability, mean, standard_deviation)
      • probability: desired percentile as a decimal between 0 and 1
      • mean: mean of the distribution
      • standard_deviation: standard deviation of the distribution
    • The function returns the data value corresponding to the given percentile (a Python sketch mirroring both functions follows below)
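
Outside of Google Sheets, the same calculations can be reproduced in Python. This sketch mirrors NORM.DIST (with cumulative set to TRUE) and NORM.INV using the standard library's NormalDist; the mean of 70, standard deviation of 10, and the inputs 85 and 0.95 are hypothetical examples:

```python
from statistics import NormalDist

# Hypothetical distribution: mean 70, standard deviation 10
dist = NormalDist(mu=70, sigma=10)

# Equivalent of =NORM.DIST(85, 70, 10, TRUE): percentile of the value 85
percentile = dist.cdf(85)
print(f"85 sits at about the {percentile:.1%} mark of the distribution")

# Equivalent of =NORM.INV(0.95, 70, 10): value at the 95th percentile
value = dist.inv_cdf(0.95)
print(f"The 95th percentile corresponds to a value of about {value:.1f}")
```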

Characteristics of Normal Distributions

  • Normal distributions are a type of continuous probability distribution
  • The shape of a normal distribution can be described using measures such as:
    • Skewness: indicates the symmetry of the distribution
    • Kurtosis: measures the thickness of the tails relative to the center of the distribution
  • The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the underlying population distribution (a brief simulation sketch follows this list)
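
A brief simulation sketch of this idea using Python's standard library; the uniform population and the sample sizes are arbitrary choices for illustration:

```python
import random
from statistics import mean, stdev

random.seed(0)

# Population that is clearly not normal: uniform on [0, 1)
def sample_mean(n):
    return mean(random.random() for _ in range(n))

# Look at the distribution of sample means for increasing sample sizes
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(5000)]
    print(f"n={n:>3}: mean of sample means ≈ {mean(means):.3f}, spread ≈ {stdev(means):.3f}")

# As n grows, the sample means cluster around 0.5 and their spread shrinks;
# a histogram of `means` would look increasingly bell-shaped (normal)
```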

Key Terms to Review (22)

68-95-99.7 rule: The 68-95-99.7 rule, also known as the empirical rule, describes how data is distributed in a normal distribution. Specifically, it states that approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and around 99.7% falls within three standard deviations. This rule is fundamental in understanding how probabilities relate to the normal distribution and helps in making predictions about data sets.
Area Under the Curve: The area under the curve refers to the region enclosed between a curve and the horizontal axis, often representing the accumulation of a quantity over an interval. In the context of probability and statistics, particularly with the normal distribution, this area corresponds to the likelihood of a random variable falling within a specific range of values, serving as a fundamental concept for understanding probabilities and distributions.
Bell curve: A bell curve, also known as a normal distribution, is a graphical representation of data that shows how values are distributed around a central mean. It is characterized by its symmetrical shape, where most values cluster around the mean, and the probabilities for values taper off equally in both directions from the mean. This concept is crucial in understanding statistical measures like mean, median, and mode, as well as in determining variability through range and standard deviation.
Carl Friedrich Gauss: Carl Friedrich Gauss was a German mathematician and scientist known for his contributions to many fields, including number theory, statistics, and astronomy. He is best recognized for developing the concept of the normal distribution, a fundamental statistical tool that describes how data points are distributed around a mean value, and for his role in formulating the Central Limit Theorem, which explains the significance of the normal distribution in real-world applications.
Central Limit Theorem: The Central Limit Theorem states that when independent random variables are added, their normalized sum tends to a normal distribution, even if the original variables themselves are not normally distributed. This powerful concept is foundational in statistics as it allows for the use of the normal distribution to approximate the behavior of sums of random variables, particularly when considering large sample sizes.
Continuous random variable: A continuous random variable is a type of variable that can take on an infinite number of possible values within a given range. Unlike discrete random variables, which can only assume specific values, continuous random variables can represent measurements such as height, weight, or time, making them crucial in statistical analysis and probability distributions like the normal distribution.
Cumulative Distribution Function: A cumulative distribution function (CDF) describes the probability that a random variable takes on a value less than or equal to a specific value. It provides a complete description of the distribution of a random variable, and is crucial in understanding both discrete and continuous probability distributions, showing how probabilities accumulate over the range of possible outcomes.
Empirical Rule: The empirical rule is a statistical guideline that states that for a normal distribution, approximately 68% of the data falls within one standard deviation from the mean, about 95% falls within two standard deviations, and around 99.7% falls within three standard deviations. This concept helps to understand how data is spread out and gives insights into the distribution of values within a dataset.
Kurtosis: Kurtosis is a statistical measure that describes the shape of a probability distribution's tails in relation to its overall shape. It provides insights into the presence of outliers in the data, distinguishing between distributions that have heavier or lighter tails compared to a normal distribution. This concept is particularly important when examining the characteristics of a normal distribution, as it helps to assess how much data deviates from the expected values.
NORM.DIST: NORM.DIST is a function used in statistics to compute the probability of a given value occurring within a normal distribution. It calculates the area under the normal curve for a specified mean and standard deviation, providing insights into how likely a certain outcome is in relation to the overall distribution. This function is essential for understanding the behavior of normally distributed data, making it valuable for various applications in fields like psychology, finance, and natural sciences.
NORM.INV: NORM.INV is a statistical function used in various software, like Excel, to calculate the inverse of the normal cumulative distribution function for a given probability. This function is crucial for understanding probabilities and distributions within the context of the normal distribution, allowing one to determine the specific value in a normally distributed dataset that corresponds to a given percentile or probability level. By connecting probabilities back to actual data values, it enhances decision-making in fields like finance, science, and engineering.
Normal Distribution: Normal distribution is a statistical concept that describes how data points are spread out around the mean, forming a symmetric, bell-shaped curve. This curve illustrates that most observations cluster around the central peak, with probabilities tapering off symmetrically on either side, making it essential for understanding probability and variability in data analysis.
Percentiles: Percentiles are statistical measures that indicate the relative standing of a value within a data set, representing the percentage of observations that fall below a certain value. Understanding percentiles helps in interpreting data distribution, particularly when comparing scores or values within a population. They play a crucial role in summarizing data by providing insights into its distribution and variability.
Probability Density Function: A probability density function (PDF) is a statistical function that describes the likelihood of a continuous random variable taking on a particular value. The PDF provides the probabilities of the random variable falling within a particular range of values, which can be determined by calculating the area under the curve of the function within that range. This concept is crucial in understanding distributions such as binomial and normal distributions, where it helps illustrate how probabilities are distributed across different outcomes.
Probability Distribution: A probability distribution is a mathematical function that describes the likelihood of different outcomes in a random experiment. It provides a comprehensive overview of how probabilities are assigned to each possible value of a random variable, and can be represented either as a discrete or continuous distribution. Understanding probability distributions is essential for analyzing uncertainty and making informed predictions based on data.
Skewness: Skewness is a statistical measure that describes the asymmetry of a probability distribution around its mean. It indicates whether the data points tend to be more concentrated on one side of the mean, giving insight into the shape and behavior of the distribution. A positive skewness indicates that the tail on the right side is longer or fatter than the left side, while a negative skewness indicates the opposite. Understanding skewness helps in analyzing data distributions and their percentiles, as well as comparing them to the normal distribution.
Standard Deviation: Standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a set of data values. It indicates how much individual data points deviate from the mean, helping to understand the distribution and spread of data. A low standard deviation means that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range. This concept is crucial for interpreting expected values, analyzing central tendencies like the mean, median, and mode, and assessing data distributions, including normal distributions.
Standard normal distribution: The standard normal distribution is a special type of normal distribution that has a mean of 0 and a standard deviation of 1. This distribution allows for the comparison of different data sets by transforming any normal distribution into this standardized form, making it easier to calculate probabilities and z-scores. It serves as a critical tool in statistics, particularly in hypothesis testing and confidence intervals.
Standardized score: A standardized score, also known as a z-score, indicates how many standard deviations an element is from the mean of its distribution. It allows for comparison between different sets of data by converting them into a common scale.
Z-score: A z-score is a statistical measure that indicates how many standard deviations a data point is from the mean of a dataset. It helps to understand the relative position of an individual score within a distribution, making it essential for comparing scores from different datasets and analyzing their distributions.
μ: The symbol μ represents the mean or average of a set of values in statistics, particularly in the context of the normal distribution. It serves as a measure of central tendency, indicating where the center of the data lies, and is crucial in understanding the behavior of data distributions. In a normal distribution, μ is located at the peak of the bell curve, which illustrates that most data points cluster around this central value.
Σ: The symbol Σ, known as sigma, represents the mathematical concept of summation, which is the process of adding a sequence of numbers. In various mathematical contexts, Σ is used to denote the sum of a series of terms, making it essential for understanding series and distributions, among other applications. Its significance extends to different areas like calculating total values in geometric sequences, determining variability in statistics, and analyzing probabilities in distributions.