📈 Preparatory Statistics Unit 7 – Discrete Variables & Probability Distributions
Discrete variables and probability distributions form the backbone of statistical analysis for events with distinct outcomes. This unit explores how to quantify and predict the likelihood of specific events occurring, from coin flips to customer behavior.
Understanding these concepts is crucial for decision-making in various fields. By mastering probability calculations and distribution types, you'll gain valuable tools for analyzing data, making predictions, and solving real-world problems in areas like quality control, marketing, and reliability engineering.
Discrete variables take on a finite or countably infinite number of distinct values
Probability is the likelihood of an event occurring, expressed as a number between 0 and 1
0 indicates an impossible event, while 1 represents a certain event
Probability distributions describe the likelihood of each possible outcome for a discrete variable
Expected value is the average value of a discrete random variable over many trials, calculated as the sum of each outcome multiplied by its probability
Variance and standard deviation measure the spread or dispersion of a probability distribution
Variance is the average squared deviation from the mean, while standard deviation is the square root of variance
Independence means that the occurrence of one event does not affect the probability of another event
Mutually exclusive events cannot occur simultaneously, so the probability that either one occurs is the sum of their individual probabilities
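As a quick illustration of these terms, here is a minimal sketch in plain Python (a fair six-sided die is used purely as an example distribution) that checks that the probabilities form a valid distribution and computes the expected value, variance, and standard deviation.

```python
import math

# PMF of a fair six-sided die: each face 1..6 has probability 1/6
pmf = {x: 1 / 6 for x in range(1, 7)}

# Every probability lies between 0 and 1, and the probabilities sum to 1
assert all(0 <= p <= 1 for p in pmf.values())
assert math.isclose(sum(pmf.values()), 1.0)

# Expected value: sum of each outcome times its probability
mean = sum(x * p for x, p in pmf.items())                    # 3.5

# Variance: average squared deviation from the mean; std dev is its square root
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # ~2.9167
std_dev = math.sqrt(variance)                                # ~1.7078

print(mean, variance, std_dev)
```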
Types of Discrete Variables
Bernoulli variables have only two possible outcomes, typically labeled as success (1) or failure (0)
Examples include flipping a coin (heads or tails) or a yes/no survey question
Binomial variables count the number of successes in a fixed number of independent Bernoulli trials
Parameters are the number of trials (n) and the probability of success (p)
Poisson variables count the number of events occurring in a fixed interval of time or space
The parameter is the average rate of occurrence (λ)
Geometric variables count the number of trials until the first success in a series of independent Bernoulli trials
The parameter is the probability of success (p)
Hypergeometric variables count the number of successes in a fixed number of trials without replacement from a finite population
Parameters include the population size, the number of successes in the population, and the number of trials
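The sketch below shows how each of the distribution types listed above can be created and queried with scipy.stats, assuming scipy is installed; all parameter values are arbitrary examples.

```python
from scipy import stats

# Bernoulli: a single trial with success probability p
print(stats.bernoulli.pmf(1, p=0.3))            # P(success)

# Binomial: number of successes in n independent trials
print(stats.binom.pmf(4, n=10, p=0.3))          # P(X = 4)

# Poisson: number of events in a fixed interval with average rate mu
print(stats.poisson.pmf(2, mu=3.5))             # P(X = 2)

# Geometric: number of trials until the first success
print(stats.geom.pmf(5, p=0.2))                 # P(first success on trial 5)

# Hypergeometric: successes in N draws without replacement
# (M = population size, n = successes in the population, N = draws)
print(stats.hypergeom.pmf(2, M=50, n=10, N=5))  # P(X = 2)
```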
Probability Basics
The probability of an event A is denoted as P(A)
The complement of an event A, denoted as A′ or Ā, is the event that A does not occur
P(A′)=1−P(A)
The union of two events A and B, denoted as A∪B, is the event that either A or B occurs
P(A∪B)=P(A)+P(B)−P(A∩B)
The intersection of two events A and B, denoted as A∩B, is the event that both A and B occur
For independent events, P(A∩B)=P(A)×P(B)
Conditional probability is the probability of an event A occurring given that event B has occurred, denoted as P(A∣B)
P(A∣B) = P(A∩B) / P(B)
Bayes' theorem relates conditional probabilities and can be used to update probabilities based on new information
P(A∣B) = P(B∣A) × P(A) / P(B)
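As a small worked check of these rules, the following sketch plugs made-up probabilities for two events A and B (the numbers are illustrative only) into the complement, union, conditional-probability, and Bayes formulas.

```python
# Illustrative probabilities (hypothetical values chosen for the example)
p_a = 0.30          # P(A)
p_b = 0.40          # P(B)
p_a_and_b = 0.12    # P(A ∩ B); equals P(A)·P(B) here, so A and B are independent

p_not_a = 1 - p_a                            # complement rule: 0.70
p_a_or_b = p_a + p_b - p_a_and_b             # union rule: 0.58
p_a_given_b = p_a_and_b / p_b                # conditional probability: 0.30

# Bayes' theorem: P(A|B) = P(B|A)·P(A) / P(B)
p_b_given_a = p_a_and_b / p_a                # 0.40
p_a_given_b_bayes = p_b_given_a * p_a / p_b  # 0.30, matches the direct calculation

print(p_not_a, p_a_or_b, p_a_given_b, p_a_given_b_bayes)
```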
Discrete Probability Distributions
A probability mass function (PMF) gives the probability of each possible outcome for a discrete random variable
f(x)=P(X=x), where X is the random variable and x is a specific value
The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a specific value
F(x) = P(X ≤ x) = ∑_{y≤x} f(y)
The expected value (mean) of a discrete random variable X is E(X) = ∑_x x × f(x)
The variance of a discrete random variable X is Var(X) = E[(X−μ)²] = ∑_x (x−μ)² × f(x), where μ = E(X)
The standard deviation is the square root of the variance, σ = √Var(X)
The moment-generating function (MGF), M_X(t) = E(e^(tX)), is a tool for finding moments of a distribution, such as the mean and variance
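To make these definitions concrete, here is a minimal sketch (scipy assumed) that builds the PMF of a Binomial(4, 0.5) variable, chosen purely as an example, accumulates it into a CDF, computes the mean and variance from the defining sums, and cross-checks the results against scipy.stats.

```python
import math
from scipy import stats

n, p = 4, 0.5
xs = range(n + 1)

# PMF: f(x) = P(X = x)
pmf = {x: stats.binom.pmf(x, n, p) for x in xs}

# CDF: F(x) = sum of f(y) over y <= x
cdf = {x: sum(pmf[y] for y in xs if y <= x) for x in xs}

# Mean and variance from the defining sums
mean = sum(x * pmf[x] for x in xs)               # n·p = 2.0
var = sum((x - mean) ** 2 * pmf[x] for x in xs)  # n·p·(1−p) = 1.0
std = math.sqrt(var)

# Cross-check against scipy's closed-form moments
assert math.isclose(mean, stats.binom.mean(n, p))
assert math.isclose(var, stats.binom.var(n, p))
print(pmf[2], cdf[2], mean, var, std)
```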
Calculating Probabilities
To find the probability of a specific outcome, use the PMF for the appropriate distribution
Substitute the given parameters and the desired outcome into the formula
To find the probability of a range of outcomes, sum the probabilities of each individual outcome in the range
Alternatively, use the CDF: for an integer-valued variable, P(a ≤ X ≤ b) = F(b) − F(a−1)
For independent events, multiply the probabilities of each event to find the probability of their intersection
For mutually exclusive events, add the probabilities of each event to find the probability of their union
Use conditional probability and Bayes' theorem when the probability of an event depends on the occurrence of another event
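The sketch below (scipy assumed, with a Binomial(10, 0.3) variable as an arbitrary example) shows that summing the PMF over 2 ≤ X ≤ 5 and taking the CDF difference F(5) − F(1) give the same answer; the lower endpoint enters as a − 1 because the variable is integer-valued.

```python
import math
from scipy import stats

n, p = 10, 0.3
a, b = 2, 5

# Probability of a range by summing individual outcomes
by_sum = sum(stats.binom.pmf(k, n, p) for k in range(a, b + 1))

# Same probability from the CDF: P(a <= X <= b) = F(b) - F(a - 1)
by_cdf = stats.binom.cdf(b, n, p) - stats.binom.cdf(a - 1, n, p)

assert math.isclose(by_sum, by_cdf)
print(by_sum)   # ~0.8033 for these parameters
```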
Applications in Real-World Scenarios
Quality control: Model the number of defective items in a batch using a binomial distribution
Determine the probability of accepting or rejecting a batch based on the number of defects found in a sample
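A minimal sketch of this kind of acceptance-sampling calculation, assuming scipy and purely illustrative numbers (a 2% defect rate, a sample of 50 items, and acceptance when at most 2 defects are found):

```python
from scipy import stats

p_defect = 0.02      # assumed defect rate
sample_size = 50     # items inspected
accept_limit = 2     # accept the batch if defects found <= this number

# Number of defects in the sample ~ Binomial(sample_size, p_defect)
p_accept = stats.binom.cdf(accept_limit, sample_size, p_defect)
p_reject = 1 - p_accept

print(p_accept, p_reject)   # roughly 0.92 accept, 0.08 reject
```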
Insurance claims: Use a Poisson distribution to model the number of claims filed within a given time period
Calculate the probability of exceeding a certain number of claims to set premiums and reserves
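A minimal sketch with illustrative numbers, assuming scipy: claims arrive at an average rate of 12 per month, and we want the probability of more than 18 claims in a month.

```python
from scipy import stats

lam = 12        # average claims per month (illustrative)
threshold = 18

# P(X > threshold) = 1 - P(X <= threshold); sf() is the survival function
p_exceed = stats.poisson.sf(threshold, lam)
print(p_exceed)   # ~0.037 for these numbers
```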
Genetics: Apply the hypergeometric distribution to calculate probabilities in population genetics
For example, find the probability of observing a specific number of individuals with a rare allele in a sample from a larger population
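A minimal sketch, scipy assumed and numbers illustrative: in a population of 200 individuals of which 15 carry a rare allele, the probability that exactly 2 carriers appear in a sample of 20 drawn without replacement.

```python
from scipy import stats

pop_size = 200   # M: population size (illustrative)
carriers = 15    # n: individuals carrying the rare allele
sample = 20      # N: sample size drawn without replacement

p_two_carriers = stats.hypergeom.pmf(2, pop_size, carriers, sample)
print(p_two_carriers)   # roughly 0.28 for these numbers
```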
Marketing: Employ the geometric distribution to model the number of ads a customer sees before making a purchase
Estimate the expected number of ads needed to generate a sale and optimize ad placement strategies
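A minimal sketch, scipy assumed: if each ad independently leads to a purchase with probability 0.05 (an illustrative figure), the number of ads seen up to and including the first purchase is geometric.

```python
from scipy import stats

p_buy = 0.05   # probability a single ad leads to a purchase (illustrative)

# Probability the first purchase happens on the 10th ad seen
print(stats.geom.pmf(10, p_buy))   # (0.95)^9 * 0.05 ≈ 0.0315

# Expected number of ads until the first purchase: 1/p
print(stats.geom.mean(p_buy))      # 20.0
```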
Reliability engineering: Utilize the negative binomial distribution to model the number of failures before a specified number of successes
Assess the reliability of systems or components and plan maintenance schedules accordingly
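A minimal sketch, scipy assumed and numbers illustrative: scipy's nbinom counts the failures observed before the r-th success, here with r = 3 required successful test runs and a 90% per-run success rate.

```python
from scipy import stats

r = 3            # required number of successes (illustrative)
p_success = 0.9  # per-run success probability (illustrative)

# Probability of exactly 2 failures before the 3rd success
print(stats.nbinom.pmf(2, r, p_success))   # C(4,2)·0.9^3·0.1^2 ≈ 0.0437

# Expected number of failures before the 3rd success: r·(1−p)/p
print(stats.nbinom.mean(r, p_success))     # ≈ 0.333
```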
Practice Problems and Examples
A fair coin is flipped 10 times. What is the probability of observing exactly 7 heads?
P(X = 7) = C(10, 7)(0.5)⁷(0.5)³ ≈ 0.1172
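A quick numerical check, assuming scipy is available:

```python
from scipy import stats

# P(exactly 7 heads in 10 flips of a fair coin)
print(stats.binom.pmf(7, 10, 0.5))   # ≈ 0.1172
```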
The average number of customers entering a store per hour is 30. What is the probability that more than 35 customers enter the store in a given hour?
Using the Poisson distribution with λ = 30, P(X > 35) = 1 − P(X ≤ 35) ≈ 0.1574
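A numerical check, scipy assumed; the survival function sf() returns the upper-tail probability directly:

```python
from scipy import stats

# P(more than 35 customers in an hour) with a Poisson rate of 30 per hour
print(stats.poisson.sf(35, 30))   # ≈ 0.157
```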
In a batch of 100 items, 10 are known to be defective. If 20 items are randomly selected without replacement, what is the probability that exactly 3 of them are defective?
Using the hypergeometric distribution, P(X = 3) = C(10, 3) × C(90, 17) / C(100, 20) ≈ 0.2092
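A numerical check, scipy assumed (note scipy's argument order: population size, successes in the population, number of draws):

```python
from scipy import stats

# 100 items, 10 defective, 20 drawn without replacement: P(exactly 3 defective)
print(stats.hypergeom.pmf(3, 100, 10, 20))   # ≈ 0.209
```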
A machine produces defective items with a probability of 0.02. What is the probability that the 5th defective item is produced on the 200th item?
Using the negative binomial distribution, P(X = 200) = C(199, 4)(0.02)⁵(0.98)¹⁹⁵ ≈ 0.0039
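A numerical check, scipy assumed; scipy's nbinom counts failures, so the 5th defective landing on the 200th item corresponds to 195 non-defective items before the 5th defective:

```python
from scipy import stats

# 195 non-defective items before the 5th defective, defect probability 0.02
print(stats.nbinom.pmf(195, 5, 0.02))   # ≈ 0.0039
```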
Two fair dice are rolled. What is the probability that the sum of the numbers rolled is either 7 or 11?
P(sum = 7) = 6/36 = 1/6 and P(sum = 11) = 2/36 = 1/18
Since these events are mutually exclusive, P(sum = 7 or 11) = 1/6 + 1/18 = 4/18 = 2/9
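A brute-force check in plain Python, enumerating all 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely rolls of two fair dice
rolls = list(product(range(1, 7), repeat=2))
favourable = sum(1 for a, b in rolls if a + b in (7, 11))

print(Fraction(favourable, len(rolls)))   # 2/9
```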