🫁Intro to Biostatistics Unit 2 – Probability Theory

Probability theory forms the foundation of statistical analysis in biomedical research. It provides tools to quantify uncertainty, assess risks, and make informed decisions based on available data. Understanding key concepts like sample spaces, events, and random variables is crucial for interpreting study results. This unit covers probability basics, types of probability, and probability distributions. It also explores applications in biostatistics, including diagnostic testing, epidemiology, and clinical trials. Mastering probability calculations and avoiding common pitfalls are essential skills for conducting rigorous statistical analyses in biomedical research.

Key Concepts and Definitions

  • Probability: the likelihood of an event occurring, expressed as a number between 0 and 1
    • 0 indicates an impossible event, while 1 represents a certain event
  • Sample space: the set of all possible outcomes of an experiment or random process
  • Event: a subset of the sample space, representing one or more outcomes of interest
  • Random variable: a function that assigns a numerical value to each outcome in a sample space
    • Can be discrete (countable values) or continuous (any value within an interval)
  • Independence: two events are independent if the occurrence of one does not change the probability of the other
  • Mutually exclusive events: events that cannot occur simultaneously (for example, rolling a 1 and a 6 on a single die roll)

Probability Basics

  • When all outcomes are equally likely, the probability of an event is the number of favorable outcomes divided by the total number of possible outcomes
    • P(A) = (number of favorable outcomes) / (total number of possible outcomes)
  • The sum of probabilities for all possible outcomes in a sample space equals 1
  • Complement of an event (A'): the event that A does not occur, with probability P(A') = 1 - P(A)
  • Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
    • In general, P(A or B) = P(A) + P(B) - P(A and B)
  • Multiplication rule for independent events: P(A and B) = P(A) × P(B)
    • In general, P(A and B) = P(A) × P(B|A)
  • Conditional probability: the probability of event A occurring given that event B has occurred, denoted P(A|B)
    • Calculated as P(A|B) = P(A and B) / P(B), provided P(B) > 0
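
The rules in this list can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die; the helper names `prob` and `cond_prob` and the chosen events are illustrative, not part of the unit. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (equally likely outcomes).
SAMPLE_SPACE = frozenset(range(1, 7))

def prob(event):
    """Classical probability: favorable outcomes / total possible outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

even = frozenset({2, 4, 6})
greater_than_3 = frozenset({4, 5, 6})
one = frozenset({1})
six = frozenset({6})

p_even = prob(even)                                  # favorable / total
p_not_even = 1 - prob(even)                          # complement rule
p_one_or_six = prob(one) + prob(six)                 # addition rule (mutually exclusive)
p_two_sixes = prob(six) * prob(six)                  # multiplication rule (two independent rolls)
p_even_given_gt3 = cond_prob(even, greater_than_3)   # conditional probability
```

Here the addition rule applies because a single roll cannot show both 1 and 6, and the multiplication rule applies because two separate rolls are assumed not to influence each other.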

Types of Probability

  • Classical probability: based on the assumption that all outcomes are equally likely
    • Used in situations with a finite number of equally likely outcomes (fair coin, unbiased die)
  • Empirical (frequentist) probability: estimated from observed data or past experience
    • Calculated as the relative frequency of an event in a large number of trials
  • Subjective probability: based on personal belief or judgment, often used when limited data are available
  • Axiomatic probability: defines probability through a set of axioms (the Kolmogorov axioms) that ensure consistency and avoid paradoxes
    • Non-negativity: P(A) ≥ 0 for all events A
    • Normalization: P(S) = 1, where S is the entire sample space
    • Additivity: for mutually exclusive events A and B, P(A or B) = P(A) + P(B)
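
To see how empirical probability relates to classical probability, one option is to simulate many trials and compare the relative frequency with the theoretical value. This is a minimal sketch: the function name `empirical_probability`, the die example, the seed, and the trial count are all assumptions chosen for illustration.

```python
import random

def empirical_probability(event, trials, seed=0):
    """Relative frequency of `event` over simulated fair-die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

classical = 1 / 6                                   # classical P(roll a six)
estimate = empirical_probability({6}, trials=100_000)
```

With a large number of trials the estimate should land close to 1/6, though it will rarely match it exactly.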

Probability Distributions

  • Probability distribution: a function that describes the likelihood of different outcomes for a random variable
  • Discrete probability distributions: used for random variables with countable outcomes
    • Examples: Bernoulli, binomial, Poisson, geometric distributions
  • Continuous probability distributions: used for random variables that can take any value within an interval
    • Examples: uniform, normal (Gaussian), exponential, beta distributions
  • Probability density function (PDF): describes the relative likelihood of a continuous random variable near a given value; the probability of any single exact value is 0, and the probability of an interval is the area under the PDF over that interval
  • Cumulative distribution function (CDF): gives the probability that a random variable is less than or equal to a specific value
  • Expected value (mean): the long-run average value of a random variable over a large number of trials
  • Variance and standard deviation: measures of the spread or dispersion of a probability distribution
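
For a discrete random variable, the expected value, variance, and CDF can be computed directly from the probability mass function. The sketch below assumes X is the number of heads in two fair coin flips; the function names are illustrative.

```python
from fractions import Fraction

# Assumed example: X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E[X])^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def cdf(pmf, x):
    """P(X <= x) for a discrete random variable."""
    return sum(p for value, p in pmf.items() if value <= x)

total = sum(pmf.values())       # probabilities must sum to 1
mean = expected_value(pmf)
var = variance(pmf)
sd = float(var) ** 0.5          # standard deviation is the square root of the variance
p_at_most_one = cdf(pmf, 1)
```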

Applications in Biostatistics

  • Diagnostic testing: calculating sensitivity, specificity, and predictive values using probability
    • Sensitivity: P(positive test | disease); specificity: P(negative test | no disease)
    • Positive predictive value: P(disease | positive test); negative predictive value: P(no disease | negative test)
  • Epidemiology: estimating disease prevalence, incidence, and risk factors using probability methods
  • Genetics: calculating the probability of inheriting certain traits or genetic disorders based on Mendelian inheritance
  • Clinical trials: determining the probability of treatment success, adverse events, and patient outcomes
  • Survival analysis: estimating the probability of survival over time using methods like Kaplan-Meier curves and Cox regression
  • Risk assessment: quantifying the probability of developing a disease or experiencing an adverse event based on risk factors
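
The diagnostic testing quantities are conditional probabilities read off a 2×2 table of test result against true disease status. The following sketch uses made-up counts (90 true positives, 10 false negatives, 45 false positives, 855 true negatives) purely for illustration; the function name `diagnostic_metrics` is hypothetical.

```python
from fractions import Fraction

def diagnostic_metrics(tp, fn, fp, tn):
    """Conditional probabilities from a 2x2 table of test result vs. disease status."""
    return {
        "sensitivity": Fraction(tp, tp + fn),   # P(positive test | disease)
        "specificity": Fraction(tn, tn + fp),   # P(negative test | no disease)
        "ppv": Fraction(tp, tp + fp),           # P(disease | positive test)
        "npv": Fraction(tn, tn + fn),           # P(no disease | negative test)
        "prevalence": Fraction(tp + fn, tp + fn + fp + tn),
    }

# Hypothetical counts, not taken from any real study.
metrics = diagnostic_metrics(tp=90, fn=10, fp=45, tn=855)
```

Note that sensitivity and specificity condition on disease status, while the predictive values condition on the test result, so the two pairs answer different questions.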

Probability Calculations

  • Bayes' theorem: used to update the probability of an event based on prior knowledge and new evidence
    • P(A|B) = (P(B|A) × P(A)) / P(B), provided P(B) > 0
  • Permutations: the number of ways to arrange objects in a specific order
    • nPr = n! / (n - r)!, where n is the total number of objects and r is the number of objects being arranged
  • Combinations: the number of ways to select objects without regard to order
    • nCr = n! / (r! × (n - r)!), where n is the total number of objects and r is the number of objects being selected
  • Binomial probability: the probability of a specific number of successes in a fixed number of independent trials, each with the same probability of success
    • P(X = k) = nCk × p^k × (1 - p)^(n - k), where n is the number of trials, k is the number of successes, and p is the probability of success in a single trial
  • Poisson probability: the probability of a specific number of events occurring in a fixed interval of time or space
    • P(X = k) = (λ^k × e^(-λ)) / k!, where λ is the average number of events per interval and k is the number of events of interest
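
The formulas in this list translate almost directly into code. The sketch below uses Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later); the function names `bayes`, `binomial_pmf`, and `poisson_pmf` and all input values are illustrative assumptions.

```python
import math

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

def binomial_pmf(k, n, p):
    """P(X = k) = nCk * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

arrangements = math.perm(5, 2)             # 5P2: ordered arrangements of 2 objects out of 5
selections = math.comb(5, 2)               # 5C2: unordered selections of 2 objects out of 5
p_two_of_five = binomial_pmf(2, 5, 0.5)    # exactly 2 successes in 5 trials with p = 0.5
p_zero_events = poisson_pmf(0, 2.0)        # no events when the average is 2 per interval
posterior = bayes(p_b_given_a=0.9, p_a=0.1, p_b=0.135)
```

Each permutation count equals the matching combination count multiplied by r!, since every selection of r objects can be ordered in r! ways.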

Common Mistakes and Pitfalls

  • Confusing independence and mutual exclusivity: mutually exclusive events with nonzero probabilities are never independent, because knowing that one occurred means the other did not; independent events with nonzero probabilities can always occur together
  • Misinterpreting conditional probability: P(A|B) is generally not equal to P(B|A)
  • Neglecting the base rate (prior probability) when using Bayes' theorem
  • Misusing the multiplication rule for non-independent events: P(A and B) = P(A) × P(B) holds only when A and B are independent
  • Overestimating the likelihood of rare events based on personal experience or media coverage (availability heuristic)
  • Misinterpreting p-values: a p-value is the probability of observing data at least as extreme as those observed, assuming the null hypothesis is true; it is not the probability that the null hypothesis is true
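
The base-rate pitfall is easiest to appreciate numerically. The sketch below applies Bayes' theorem to a hypothetical test with 99% sensitivity and 95% specificity at two assumed prevalences; the function name and all numbers are illustrative, not drawn from a real test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(positive and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test, two different base rates.
ppv_rare = positive_predictive_value(0.99, 0.95, prevalence=0.001)
ppv_common = positive_predictive_value(0.99, 0.95, prevalence=0.5)
```

Under these assumptions, P(positive test | disease) is 0.99 in both cases, yet P(disease | positive test) falls below 2% when the disease is rare and exceeds 95% when it is common. The difference comes entirely from the base rate.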

Real-World Examples

  • Weather forecasting: predicting the probability of rain, snow, or other weather events based on historical data and current conditions
  • Insurance: calculating premiums based on the probability of claims, considering factors like age, health status, and risk behaviors
  • Quality control: estimating the probability of defective products in a manufacturing process to ensure compliance with standards
  • Sports betting: determining the odds of different outcomes in a game or tournament based on team statistics and performance
  • Medical decision-making: using probability to weigh the risks and benefits of different diagnostic tests or treatment options
  • Finance: assessing the probability of investment returns, loan defaults, or market fluctuations to inform financial strategies


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
