📈 Preparatory Statistics Unit 7 – Discrete Variables & Probability Distributions

Discrete variables and probability distributions form the backbone of statistical analysis for events with distinct outcomes. This unit explores how to quantify and predict the likelihood of specific events occurring, from coin flips to customer behavior. Understanding these concepts is crucial for decision-making in various fields. By mastering probability calculations and distribution types, you'll gain valuable tools for analyzing data, making predictions, and solving real-world problems in areas like quality control, marketing, and reliability engineering.

Key Concepts

  • Discrete variables take on a finite or countably infinite number of distinct values
  • Probability is the likelihood of an event occurring, expressed as a number between 0 and 1
    • 0 indicates an impossible event, while 1 represents a certain event
  • Probability distributions describe the likelihood of each possible outcome for a discrete variable
  • Expected value is the average value of a discrete random variable over many trials, calculated as the sum of each outcome multiplied by its probability
  • Variance and standard deviation measure the spread or dispersion of a probability distribution
    • Variance is the average squared deviation from the mean, while standard deviation is the square root of variance
  • Independence means that the occurrence of one event does not affect the probability of another event
  • Mutually exclusive events cannot occur simultaneously, so the probability of their union is the sum of their individual probabilities (the probabilities sum to 1 only if the events are also collectively exhaustive); a quick numerical check of these definitions follows this list
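
The definitions above can be checked with a few lines of code. The sketch below is a minimal illustration using a fair six-sided die (an example of our choosing, not from the text): it computes the expected value, variance, and standard deviation from the PMF and applies the complement and mutual-exclusivity rules.

```python
from math import sqrt

# PMF of a fair six-sided die: each outcome 1..6 has probability 1/6
pmf = {x: 1 / 6 for x in range(1, 7)}

# Expected value: sum of each outcome times its probability
mean = sum(x * p for x, p in pmf.items())                     # 3.5

# Variance: average squared deviation from the mean; SD is its square root
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # about 2.92
std_dev = sqrt(variance)

# Complement rule: P(not rolling a 6) = 1 - P(rolling a 6)
p_not_six = 1 - pmf[6]

# Mutually exclusive outcomes: P(roll a 1 or a 2) = P(1) + P(2)
p_one_or_two = pmf[1] + pmf[2]

print(mean, variance, std_dev, p_not_six, p_one_or_two)
```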

Types of Discrete Variables

  • Bernoulli variables have only two possible outcomes, typically labeled as success (1) or failure (0)
    • Examples include flipping a coin (heads or tails) or a yes/no survey question
  • Binomial variables count the number of successes in a fixed number of independent Bernoulli trials
    • Parameters are the number of trials ($n$) and the probability of success ($p$)
  • Poisson variables count the number of events occurring in a fixed interval of time or space
    • The parameter is the average rate of occurrence ($\lambda$)
  • Geometric variables count the number of trials until the first success in a series of independent Bernoulli trials
    • The parameter is the probability of success ($p$)
  • Hypergeometric variables count the number of successes in a fixed number of trials without replacement from a finite population
    • Parameters include the population size, the number of successes in the population, and the number of trials; the sketch after this list contrasts this sampling-without-replacement count with the binomial case
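
To see how the sampling scheme changes the distribution, the sketch below compares a binomial count (sampling with replacement, or from an effectively infinite population) with a hypergeometric count (sampling without replacement from a finite population). It assumes SciPy is installed, and the parameter values are arbitrary.

```python
from scipy.stats import binom, hypergeom  # assumes SciPy is available

N, K, n = 100, 10, 20  # population size, successes in population, number of draws

# With replacement (or a very large population): Binomial(n, p) with p = K/N
p_binomial = binom.pmf(3, n, K / N)

# Without replacement from a finite population: hypergeometric
# SciPy's argument order is (k, population size, successes in population, draws)
p_hypergeometric = hypergeom.pmf(3, N, K, n)

print(p_binomial, p_hypergeometric)  # close, but not identical
```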

Probability Basics

  • The probability of an event $A$ is denoted as $P(A)$
  • The complement of an event $A$, denoted as $A'$ or $\bar{A}$, is the event that $A$ does not occur
    • $P(A') = 1 - P(A)$
  • The union of two events $A$ and $B$, denoted as $A \cup B$, is the event that either $A$ or $B$ occurs
    • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • The intersection of two events $A$ and $B$, denoted as $A \cap B$, is the event that both $A$ and $B$ occur
    • For independent events, $P(A \cap B) = P(A) \times P(B)$
  • Conditional probability is the probability of an event $A$ occurring given that event $B$ has occurred, denoted as $P(A|B)$
    • $P(A|B) = \frac{P(A \cap B)}{P(B)}$
  • Bayes' theorem relates conditional probabilities and can be used to update probabilities based on new information, as the sketch after this list shows
    • $P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$
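
As a concrete illustration of Bayes' theorem, the sketch below uses a hypothetical screening-test scenario; all of the input probabilities are invented for the example.

```python
# Hypothetical inputs (not from the text), chosen only to exercise the formula:
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): positive test given the condition
p_b_given_not_a = 0.05  # P(B|A'): positive test without the condition

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))  # roughly 0.16, far smaller than P(B|A)
```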

Discrete Probability Distributions

  • A probability mass function (PMF) gives the probability of each possible outcome for a discrete random variable
    • $f(x) = P(X = x)$, where $X$ is the random variable and $x$ is a specific value
  • The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a specific value
    • $F(x) = P(X \leq x) = \sum_{y \leq x} f(y)$
  • The expected value (mean) of a discrete random variable $X$ is $E(X) = \sum_{x} x \times f(x)$
  • The variance of a discrete random variable $X$ is $Var(X) = E((X - \mu)^2) = \sum_{x} (x - \mu)^2 \times f(x)$, where $\mu = E(X)$
    • The standard deviation is the square root of the variance, $\sigma = \sqrt{Var(X)}$
  • The moment-generating function (MGF) is a tool for finding moments of a distribution, such as the mean and variance
    • $M_X(t) = E(e^{tX}) = \sum_{x} e^{tx} \times f(x)$
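
The sketch below works these definitions out for a small, made-up PMF: it builds the CDF as a running sum, computes the mean and variance directly, and checks that the slope of the MGF at $t = 0$ approximates the mean.

```python
from math import exp, sqrt

# A small illustrative PMF (values and probabilities are made up)
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

def cdf(x):
    """F(x) = P(X <= x): running sum of the PMF."""
    return sum(p for v, p in pmf.items() if v <= x)

mean = sum(v * p for v, p in pmf.items())                    # E(X)
variance = sum((v - mean) ** 2 * p for v, p in pmf.items())  # Var(X)
std_dev = sqrt(variance)

def mgf(t):
    """M_X(t) = E(e^{tX})."""
    return sum(exp(t * v) * p for v, p in pmf.items())

# The derivative of the MGF at t = 0 equals E(X); check it numerically
h = 1e-6
mgf_slope_at_zero = (mgf(h) - mgf(-h)) / (2 * h)

print(cdf(1), mean, variance, std_dev, mgf_slope_at_zero)
```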

Common Discrete Distributions

  • Bernoulli distribution: $P(X = 1) = p$, $P(X = 0) = 1 - p$
    • $E(X) = p$, $Var(X) = p(1 - p)$
  • Binomial distribution: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
    • $E(X) = np$, $Var(X) = np(1 - p)$
  • Poisson distribution: $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$
    • $E(X) = \lambda$, $Var(X) = \lambda$
  • Geometric distribution: $P(X = k) = (1 - p)^{k-1} p$
    • $E(X) = \frac{1}{p}$, $Var(X) = \frac{1 - p}{p^2}$
  • Hypergeometric distribution: $P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
    • $E(X) = n \frac{K}{N}$, $Var(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N-n}{N-1}$
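
These formulas can be spot-checked numerically. The sketch below compares hand-coded PMFs against SciPy's implementations and verifies the binomial mean and variance formulas; it assumes SciPy is available, and the parameter values are arbitrary.

```python
from math import comb, exp, factorial
from scipy.stats import binom, poisson, geom  # assumes SciPy is available

n, p, lam, k = 10, 0.3, 4.0, 2  # arbitrary parameters for the check

# Binomial PMF from the closed form vs SciPy
manual_binomial = comb(n, k) * p**k * (1 - p) ** (n - k)
assert abs(manual_binomial - binom.pmf(k, n, p)) < 1e-12

# Poisson PMF
manual_poisson = exp(-lam) * lam**k / factorial(k)
assert abs(manual_poisson - poisson.pmf(k, lam)) < 1e-12

# Geometric PMF (number of trials until the first success)
manual_geometric = (1 - p) ** (k - 1) * p
assert abs(manual_geometric - geom.pmf(k, p)) < 1e-12

# Binomial mean and variance match E(X) = np and Var(X) = np(1 - p)
assert abs(binom.mean(n, p) - n * p) < 1e-12
assert abs(binom.var(n, p) - n * p * (1 - p)) < 1e-12
print("all checks passed")
```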

Calculating Probabilities

  • To find the probability of a specific outcome, use the PMF for the appropriate distribution
    • Substitute the given parameters and the desired outcome into the formula
  • To find the probability of a range of outcomes, sum the probabilities of each individual outcome in the range
    • Alternatively, use the CDF: for an integer-valued variable, $P(a \leq X \leq b) = F(b) - F(a - 1)$, so the lower endpoint stays included; both approaches appear in the sketch after this list
  • For independent events, multiply the probabilities of each event to find the probability of their intersection
  • For mutually exclusive events, add the probabilities of each event to find the probability of their union
  • Use conditional probability and Bayes' theorem when the probability of an event depends on the occurrence of another event
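
The sketch below carries out the range calculation both ways for a Binomial(10, 0.5) variable (parameters chosen arbitrarily): summing the PMF over the range, and using the CDF with the lower endpoint handled correctly. It assumes SciPy is available.

```python
from scipy.stats import binom  # assumes SciPy is available

n, p = 10, 0.5

# P(3 <= X <= 6) by summing the PMF over the outcomes in the range
p_by_sum = sum(binom.pmf(k, n, p) for k in range(3, 7))

# The same probability via the CDF: F(6) - F(2)
# (subtract F(2), not F(3), so that the outcome X = 3 stays included)
p_by_cdf = binom.cdf(6, n, p) - binom.cdf(2, n, p)

print(round(p_by_sum, 6), round(p_by_cdf, 6))  # the two values agree
```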

Applications in Real-World Scenarios

  • Quality control: Model the number of defective items in a batch using a binomial distribution
    • Determine the probability of accepting or rejecting a batch based on the number of defects found in a sample
  • Insurance claims: Use a Poisson distribution to model the number of claims filed within a given time period
    • Calculate the probability of exceeding a certain number of claims to set premiums and reserves
  • Genetics: Apply the hypergeometric distribution to calculate probabilities in population genetics
    • For example, find the probability of observing a specific number of individuals with a rare allele in a sample from a larger population
  • Marketing: Employ the geometric distribution to model the number of ads a customer sees before making a purchase
    • Estimate the expected number of ads needed to generate a sale and optimize ad placement strategies
  • Reliability engineering: Utilize the negative binomial distribution to model the number of failures before a specified number of successes
    • Assess the reliability of systems or components and plan maintenance schedules accordingly
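
As one worked application, the sketch below models a hypothetical acceptance-sampling plan with a binomial distribution: inspect a sample from a large lot and accept the lot only if the number of defectives stays at or below an acceptance number. All parameter values are invented for illustration, and SciPy is assumed to be available.

```python
from scipy.stats import binom  # assumes SciPy is available

# Hypothetical plan: inspect 50 items; accept the lot if at most 2 are defective
sample_size = 50
acceptance_number = 2

def probability_of_acceptance(defect_rate):
    """P(accept) = P(X <= c) for X ~ Binomial(sample_size, defect_rate)."""
    return binom.cdf(acceptance_number, sample_size, defect_rate)

# Probability of acceptance at a few assumed lot quality levels
for rate in (0.01, 0.05, 0.10):
    print(rate, round(probability_of_acceptance(rate), 4))
```

Plotting probability_of_acceptance against the assumed defect rate traces out the plan's operating characteristic curve, which is how such plans are usually compared.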

Practice Problems and Examples

  • A fair coin is flipped 10 times. What is the probability of observing exactly 7 heads?
    • $P(X = 7) = \binom{10}{7} (0.5)^7 (0.5)^3 \approx 0.1172$
  • The average number of customers entering a store per hour is 30. What is the probability that more than 35 customers enter the store in a given hour?
    • Using the Poisson distribution with $\lambda = 30$, $P(X > 35) = 1 - P(X \leq 35) \approx 0.1574$
  • In a batch of 100 items, 10 are known to be defective. If 20 items are randomly selected without replacement, what is the probability that exactly 3 of them are defective?
    • Using the hypergeometric distribution, $P(X = 3) = \frac{\binom{10}{3} \binom{90}{17}}{\binom{100}{20}} \approx 0.2092$
  • A machine produces defective items with a probability of 0.02. What is the probability that the 5th defective item is produced on the 200th item?
    • Using the negative binomial distribution, $P(X = 200) = \binom{199}{4} (0.02)^5 (0.98)^{195} \approx 0.0039$
  • Two fair dice are rolled. What is the probability that the sum of the numbers rolled is either 7 or 11?
    • $P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}$ and $P(\text{sum} = 11) = \frac{2}{36} = \frac{1}{18}$
    • Since these events are mutually exclusive, $P(\text{sum} = 7 \text{ or } 11) = \frac{1}{6} + \frac{1}{18} = \frac{2}{9}$; a short script that recomputes all of these answers follows
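
For reference, the following sketch recomputes each answer above with SciPy (plus direct enumeration for the dice problem); it assumes SciPy is available.

```python
from fractions import Fraction
from scipy.stats import binom, poisson, hypergeom, nbinom  # assumes SciPy

# 1. Exactly 7 heads in 10 flips of a fair coin
print(binom.pmf(7, 10, 0.5))

# 2. More than 35 customers in an hour when the average rate is 30 per hour
print(poisson.sf(35, 30))  # sf(35) = P(X > 35)

# 3. Exactly 3 defectives when drawing 20 items from 100 containing 10 defectives
print(hypergeom.pmf(3, 100, 10, 20))  # (k, population, successes, draws)

# 4. 5th defective occurring on the 200th item, defect probability 0.02
#    SciPy's nbinom counts failures before the r-th success: 195 failures here
print(nbinom.pmf(195, 5, 0.02))

# 5. Sum of two fair dice equal to 7 or 11, by direct enumeration
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
favourable = sum(1 for a, b in outcomes if a + b in (7, 11))
print(Fraction(favourable, len(outcomes)))  # 2/9
```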


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
