Fiveable

Common Probability Distributions to Know for Biostatistics

Understanding common probability distributions is key in biostatistics and probabilistic methods. These distributions help model real-world phenomena, from normal data patterns to rare events, guiding decision-making and analysis in various fields, including health and research.

  1. Normal (Gaussian) Distribution

    • Symmetrical, bell-shaped curve characterized by its mean (μ) and standard deviation (σ).
    • Approximately 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three (Empirical Rule).
    • Central to many statistical methods due to the Central Limit Theorem, which states that the suitably standardized sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.
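The Empirical Rule percentages above can be checked directly from the normal cumulative distribution function. The following is a minimal sketch using only Python's standard library; the helper names `normal_cdf` and `prob_within` are illustrative and not part of the original guide.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Return P(X <= x) for X ~ Normal(mu, sigma), using the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """Return the probability that X lies within k standard deviations of the mean."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
```

The printed probabilities are roughly 0.683, 0.954 and 0.997, and they do not depend on the particular mean or standard deviation chosen.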
  2. Binomial Distribution

    • Models the number of successes in a fixed number of independent Bernoulli trials (e.g., coin flips).
    • Defined by two parameters: the number of trials (n) and the probability of success (p).
    • Useful for calculating probabilities of discrete outcomes, such as the likelihood of getting a certain number of heads in a series of coin tosses.
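As a small illustration of the coin-toss example above, the binomial probability of exactly k successes in n trials can be computed from the standard formula C(n, k) p^k (1 - p)^(n - k). This is a sketch; the function name `binomial_pmf` and the values n = 10, p = 0.5 are chosen here for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 10, 0.5  # ten tosses of a fair coin (illustrative values)
prob_five_heads = binomial_pmf(5, n, p)
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(prob_five_heads, total)
```

The probability of exactly five heads in ten fair tosses is 252/1024, about 0.246, and the probabilities over all possible counts sum to 1.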
  3. Poisson Distribution

    • Describes the number of events occurring in a fixed interval of time or space, given a known average rate (λ) and independence of events.
    • Particularly useful for modeling rare events, such as the number of phone calls received at a call center in an hour.
    • The mean and variance of a Poisson distribution are both equal to λ.
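The equality of the Poisson mean and variance can be verified numerically from the probability mass function. The sketch below assumes an illustrative rate of 3 events per interval and truncates the infinite sum at 60 terms, where the remaining tail is negligible; `poisson_pmf` is a name introduced here.

```python
import math

def poisson_pmf(k, lam):
    """Return P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0       # illustrative average rate, e.g. events per hour
ks = range(60)  # truncation point: the tail beyond 60 is negligible for this rate

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, variance)
```

Both printed values agree with the rate up to floating-point error.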
  4. Exponential Distribution

    • Models the time between events in a Poisson process, characterized by the rate parameter (λ).
    • Memoryless property: the probability of waiting at least a further time t for the next event does not depend on how much time s has already elapsed, i.e. P(T > s + t | T > s) = P(T > t).
    • Commonly used in survival analysis and reliability engineering to model lifetimes of objects or time until an event occurs.
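The memoryless property can be checked with the exponential survival function P(T > t) = exp(-rate × t). This is a minimal sketch; the rate and the two time values are arbitrary illustrative choices, and `exp_survival` is a name introduced here.

```python
import math

def exp_survival(t, lam):
    """Return P(T > t) for T ~ Exponential(rate=lam)."""
    return math.exp(-lam * t)

lam = 0.5  # illustrative rate (events per unit time)
s = 2.0    # time already elapsed without an event
t = 3.0    # additional waiting time of interest

# P(T > s + t | T > s) = P(T > s + t) / P(T > s)
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)
print(conditional, unconditional)
```

The conditional and unconditional probabilities coincide (up to rounding), whatever value of s is used.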
  5. Chi-Square Distribution

    • A distribution of the sum of the squares of k independent standard normal random variables, used primarily in hypothesis testing and confidence interval estimation.
    • Commonly applied in tests of independence and goodness-of-fit tests in categorical data analysis.
    • The shape of the distribution depends on the degrees of freedom (df): it is strongly right-skewed for small df and becomes more symmetric, approaching a normal shape, as df increases.
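The definition above, a sum of squared independent standard normal variables, can be simulated directly. The sketch below uses a fixed seed and 4 degrees of freedom, both arbitrary choices, and compares the sample mean and variance with the known theoretical values df and 2 × df (a standard property, not stated in the guide).

```python
import random
import statistics

def chi_square_draw(df, rng):
    """Draw one chi-square value as the sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
df = 4                  # illustrative degrees of freedom
draws = [chi_square_draw(df, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(sample_mean, sample_var)  # expected to be near df and 2 * df
```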
  6. Student's t-Distribution

    • Similar to the normal distribution but with heavier tails, making it more suitable for small sample sizes.
    • Defined by its degrees of freedom, which affect the shape; as the degrees of freedom (and hence the sample size) increase, it approaches the standard normal distribution.
    • Used primarily in hypothesis testing and constructing confidence intervals for means when the population standard deviation is unknown.
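To see the heavier tails concretely, t values can be simulated as a standard normal divided by the square root of an independent chi-square variable over its degrees of freedom (the standard construction, assumed here rather than taken from the guide). The sketch compares how often |t| exceeds 3 for 3 and 30 degrees of freedom against the exact normal tail probability; the seed, cutoff and sample size are illustrative.

```python
import math
import random

def t_draw(df, rng):
    """Draw one t value as Z / sqrt(V / df), where V is chi-square with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(v / df)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_draws = 20000
cutoff = 3.0

def tail_fraction(df):
    """Return the fraction of simulated t values with |t| > cutoff."""
    return sum(abs(t_draw(df, rng)) > cutoff for _ in range(n_draws)) / n_draws

tail_small_df = tail_fraction(3)
tail_large_df = tail_fraction(30)
normal_tail = math.erfc(cutoff / math.sqrt(2.0))  # exact P(|Z| > cutoff)
print(tail_small_df, tail_large_df, normal_tail)
```

With few degrees of freedom, extreme values are far more frequent than under the normal distribution, and the gap shrinks as the degrees of freedom grow.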
  7. Uniform Distribution

    • All outcomes are equally likely within a specified range, characterized by minimum (a) and maximum (b) values.
    • Can be discrete (e.g., rolling a fair die) or continuous (e.g., selecting a random number between 0 and 1).
    • Useful in simulations and scenarios where each outcome has the same probability of occurring.
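Both cases mentioned above can be sketched in a few lines: a fair die as a discrete uniform distribution, and interval probabilities for a continuous uniform distribution, which are simply the interval length divided by (b - a). The helper name `uniform_interval_prob` is introduced here for illustration.

```python
from fractions import Fraction

# Discrete uniform: a fair six-sided die, each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = sum(face * prob for face, prob in die_pmf.items())

def uniform_interval_prob(c, d, a=0.0, b=1.0):
    """Return P(c <= X <= d) for X ~ Uniform(a, b), clipping [c, d] to [a, b]."""
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)

print(die_mean)                          # mean face value of the die
print(uniform_interval_prob(0.2, 0.5))   # chance a number in [0, 1] lands in [0.2, 0.5]
```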
  8. Bernoulli Distribution

    • A special case of the binomial distribution with a single trial, representing two possible outcomes: success (1) or failure (0).
    • Defined by a single parameter, the probability of success (p).
    • Fundamental in probability theory and serves as the building block for more complex distributions.
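The "building block" role can be illustrated by constructing a binomial count from individual Bernoulli trials. This is a sketch with an arbitrary seed and illustrative parameters (n = 10, p = 0.3); the function names are introduced here.

```python
import random
import statistics

def bernoulli_draw(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """Count the successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli_draw(p, rng) for _ in range(n))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n, p = 10, 0.3
counts = [binomial_draw(n, p, rng) for _ in range(5000)]
print(statistics.fmean(counts))  # should be close to n * p
```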
  9. Beta Distribution

    • A continuous distribution defined on the interval [0, 1], characterized by two shape parameters (α and β).
    • Flexible in modeling random variables that represent proportions or probabilities.
    • Commonly used in Bayesian statistics and for modeling random variables that are constrained to a finite range.
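One common Bayesian use is as a prior for an unknown success probability: with binomial data, a Beta(α, β) prior updates to Beta(α + successes, β + failures). The sketch below illustrates this standard conjugate update; the data (7 successes, 3 failures) are hypothetical and the function names are introduced here.

```python
def beta_mean(alpha, beta):
    """Return the mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures

prior = (1.0, 1.0)  # Beta(1, 1) is the uniform distribution on [0, 1]
posterior = update_beta(*prior, successes=7, failures=3)  # hypothetical data
print(posterior, beta_mean(*posterior))
```

Starting from a uniform prior, the posterior is Beta(8, 4), whose mean of 2/3 sits between the prior mean of 0.5 and the observed proportion of 0.7.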
  10. Gamma Distribution

    • A continuous distribution defined by a shape parameter (k) and a scale parameter (θ), often used to model waiting times.
    • Generalizes the exponential distribution (the case k = 1); when k is a positive integer, it is the distribution of the sum of k independent exponential random variables, each with mean θ.
    • Useful in various fields, including queuing models and reliability analysis.
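The link to the exponential distribution can be checked by simulation: summing k independent exponential waiting times should behave like direct gamma draws with shape k and the same scale. This is a sketch with an arbitrary seed and illustrative values k = 3 and scale 2; the theoretical mean k × scale and variance k × scale² are standard properties not stated in the guide.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible
k, theta = 3, 2.0       # illustrative shape and scale
n_draws = 20000

# Sum of k independent exponentials, each with mean theta (rate 1 / theta).
sums = [sum(rng.expovariate(1.0 / theta) for _ in range(k)) for _ in range(n_draws)]

# Direct gamma draws; random.gammavariate takes the shape first, then the scale.
direct = [rng.gammavariate(k, theta) for _ in range(n_draws)]

print(statistics.fmean(sums), statistics.fmean(direct))        # both near k * theta
print(statistics.variance(sums), statistics.variance(direct))  # both near k * theta**2
```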