Intro to Biostatistics Unit 9 – Survival Analysis

Survival analysis is a crucial statistical method in biomedical research, focusing on time-to-event data. It allows researchers to study the timing of events like disease onset or treatment outcomes, even when some subjects haven't experienced the event by the study's end. Key concepts include censoring, survival and hazard functions, and the Kaplan-Meier method. The Cox proportional hazards model is widely used to analyze the effects of multiple factors on survival. These tools help researchers compare treatments, identify risk factors, and develop prognostic models in various medical fields.

What's Survival Analysis?

  • Branch of statistics focused on analyzing time-to-event data where the outcome variable is the time until an event of interest occurs
  • Commonly used in medical research to study the effectiveness of treatments, risk factors for disease, and prognostic factors
  • Allows for the inclusion of censored data, which occurs when the event of interest has not been observed for a subject during the study period
  • Differs from standard regression methods, which would have to either discard censored subjects or treat their censoring times as event times; both choices bias the results
  • Provides insights into the probability of an event occurring over time and the factors that influence this probability
  • Enables researchers to compare survival patterns between different groups (treatment vs. control) and identify risk factors associated with the event of interest
  • Offers a flexible framework for handling various types of censoring (right, left, or interval) and accommodating time-dependent covariates

Key Concepts and Terms

  • Event: The specific occurrence of interest, such as death, disease recurrence, or equipment failure
  • Survival time: The duration from the starting point (e.g., diagnosis or treatment initiation) until the event occurs or the subject is censored
  • Censoring: Occurs when a subject's exact event time is not observed, so the survival time is only partially known
    • Right censoring: The most common type, where the subject is still event-free at the end of the study or is lost to follow-up
    • Left censoring: When the event of interest has already occurred before the subject is included in the study
    • Interval censoring: When the event is known to have occurred within a specific time interval, but the exact time is unknown
  • Survival function $S(t)$: The probability that an individual survives beyond time $t$
  • Hazard function $h(t)$: The instantaneous rate of experiencing the event at time $t$, given that the individual has survived up to that point
  • Kaplan-Meier estimator: A non-parametric method for estimating the survival function from observed survival times, accounting for censoring
  • Cox proportional hazards model: A semi-parametric regression model that relates the hazard function to a set of covariates, assuming that the hazard ratios between groups remain constant over time
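
As a minimal sketch of how right-censored survival times are usually recorded, each subject can be stored as a pair: the observed follow-up time and an indicator of whether the event was seen. The `Observation` class, the `observe` helper, and the three subjects are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time: float   # observed follow-up time: min(event time, censoring time)
    event: bool   # True if the event was observed, False if right-censored


def observe(event_time, censor_time):
    """Apply right censoring: only the earlier of the two times is seen."""
    return Observation(time=min(event_time, censor_time),
                       event=event_time <= censor_time)


# Hypothetical subjects: (true event time, end of follow-up), in months.
subjects = [(5.0, 12.0), (20.0, 12.0), (9.0, 7.5)]
observations = [observe(t, c) for t, c in subjects]
for obs in observations:
    print(obs)
```

The first subject has an observed event at 5 months; the other two are right-censored at 12 and 7.5 months, so all that is known is that their survival times exceed those values.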

Types of Survival Data

  • Right-censored data: The most common type, where the event of interest has not occurred by the end of the study period or the subject is lost to follow-up
  • Left-censored data: When the event of interest has already occurred before the subject is included in the study
  • Interval-censored data: When the event is known to have occurred within a specific time interval, but the exact time is unknown
  • Truncated data: When subjects enter the sample only if their survival time falls within a certain observation window, so subjects outside that window are never observed at all
    • Left truncation: Subjects are only included if they have survived up to a specific time point
    • Right truncation: Subjects are only included if the event occurs before a specific time point
  • Competing risks data: When subjects are at risk of experiencing multiple, mutually exclusive events (e.g., death from different causes)
  • Recurrent event data: When the event of interest can occur multiple times for the same subject (e.g., asthma attacks or hospital readmissions)

Survival and Hazard Functions

  • Survival function $S(t)$ represents the probability that an individual survives beyond time $t$
    • Defined as $S(t) = P(T > t)$, where $T$ is the survival time
    • Non-increasing in $t$: equals 1 at the start of the study (when $t = 0$) and typically approaches 0 as $t$ approaches infinity
    • Can be estimated non-parametrically using the Kaplan-Meier method or parametrically by assuming a specific distribution for the survival times (e.g., exponential, Weibull, or log-normal)
  • Hazard function $h(t)$ represents the instantaneous rate of experiencing the event at time $t$, given that the individual has survived up to that point
    • Defined as $h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t}$
    • Provides insights into how the risk of the event changes over time
    • Can be modeled using the Cox proportional hazards model or parametric models (e.g., exponential, Weibull, or log-normal)
  • The survival and hazard functions are related through the cumulative hazard function $H(t)$, which is defined as the integral of the hazard function from 0 to $t$: $H(t) = \int_0^t h(u)\,du$
    • The relationship between $S(t)$ and $H(t)$ is given by $S(t) = \exp(-H(t))$
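
The relationship $S(t) = \exp(-H(t))$ can be checked numerically. The sketch below assumes a Weibull hazard, one of the parametric choices mentioned above, integrates it with a simple midpoint rule, and compares the result with the Weibull closed form $S(t) = \exp(-(t/\text{scale})^{\text{shape}})$. The function names and the parameter values are illustrative assumptions, not part of the source material.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard: h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cumulative_hazard(t, hazard, steps=10_000):
    """Approximate H(t), the integral of h(u) from 0 to t, by the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width


shape, scale = 1.5, 10.0   # illustrative values only
t = 8.0

H_numeric = cumulative_hazard(t, lambda u: weibull_hazard(u, shape, scale))
S_from_hazard = math.exp(-H_numeric)                # S(t) = exp(-H(t))
S_closed_form = math.exp(-((t / scale) ** shape))   # Weibull survival function

print(f"S(t) from integrated hazard: {S_from_hazard:.4f}")
print(f"S(t) from closed form:       {S_closed_form:.4f}")
```

The two values agree up to the numerical integration error, which shrinks as `steps` grows.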

Kaplan-Meier Method

  • Non-parametric method for estimating the survival function from observed survival times, accounting for censoring
  • Calculates the probability of surviving beyond each observed event time, conditional on having survived up to that point
  • Estimates the survival function as a step function, with drops at each observed event time
  • Formula for the Kaplan-Meier estimator:
    • Let $t_1 < t_2 < \dots < t_k$ be the distinct observed event times
    • Let $d_i$ be the number of events at time $t_i$ and $n_i$ be the number of individuals at risk just prior to time $t_i$
    • The Kaplan-Meier estimator is given by $\hat{S}(t) = \prod_{i: t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)$
  • Provides a visual representation of the survival experience of the study population
  • Allows for the comparison of survival curves between different groups using the log-rank test
  • Limitations include the inability to incorporate covariates directly and the assumption that censoring is non-informative
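
The product formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a statistics library: the function name `kaplan_meier` and the six-subject data set are hypothetical, and subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct observed event time.

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if right-censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t_i in event_times:
        n_i = sum(1 for t in times if t >= t_i)   # at risk just before t_i
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e)
        survival *= 1 - d_i / n_i                 # multiply in (1 - d_i / n_i)
        curve.append((t_i, survival))
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t_i, s in kaplan_meier(times, events):
    print(f"t = {t_i}: S_hat = {s:.3f}")
```

For these data the estimate steps down through 5/6, 2/3, 4/9, and finally 0 at the event times 2, 3, 5, and 8. The censored subjects at times 3 and 7 never cause a drop; they only shrink the risk set for later event times.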

Cox Proportional Hazards Model

  • Semi-parametric regression model that relates the hazard function to a set of covariates
  • Assumes that the hazard ratios between groups remain constant over time (proportional hazards assumption)
  • The model is defined as $h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$, where:
    • $h(t \mid X)$ is the hazard function for an individual with covariate values $X = (X_1, X_2, \dots, X_p)$
    • $h_0(t)$ is the baseline hazard function, which is left unspecified
    • $\beta_1, \beta_2, \dots, \beta_p$ are the regression coefficients that quantify the effect of each covariate on the hazard
  • Coefficients are estimated using partial likelihood, which accounts for the ordering of the event times without specifying the baseline hazard function
  • Hazard ratios ($\exp(\beta_i)$) represent the multiplicative effect of a one-unit increase in the corresponding covariate on the hazard, assuming all other covariates remain constant
  • Allows for the inclusion of both continuous and categorical covariates
  • Can be extended to incorporate time-dependent covariates and stratification factors
  • Model assumptions (proportional hazards, linearity of covariate effects on the log hazard, and non-informative censoring) should be assessed before interpreting the results
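
To make partial likelihood estimation concrete, here is a minimal sketch for a model with a single covariate, fitted by Newton-Raphson. The function names and the eight-subject data set are hypothetical. The data contain no tied event times; with ties, this form of the risk-set sum would correspond to the Breslow approximation. Real analyses should rely on an established statistics package.

```python
import math


def cox_score_and_information(beta, times, events, x):
    """Score (first derivative) and information (minus second derivative)
    of the Cox log partial likelihood for one covariate."""
    score, information = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x_j for w, x_j in zip(weights, risk_set))
        s2 = sum(w * x_j ** 2 for w, x_j in zip(weights, risk_set))
        score += x_i - s1 / s0
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information


def fit_cox_one_covariate(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson on the partial likelihood; returns (beta_hat, se)."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = cox_score_and_information(beta, times, events, x)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = cox_score_and_information(beta, times, events, x)
    return beta, 1 / math.sqrt(information)


# Hypothetical data: x = 1 for one group, 0 for the other; all events observed.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat, se = fit_cox_one_covariate(times, events, x)
hazard_ratio = math.exp(beta_hat)
ci_low = math.exp(beta_hat - 1.96 * se)    # Wald-type 95% CI on the HR scale
ci_high = math.exp(beta_hat + 1.96 * se)
print(f"HR = {hazard_ratio:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

Note that the baseline hazard never appears in the code: each event contributes only a comparison between the subject who failed and everyone still at risk at that time, which is why $h_0(t)$ can be left unspecified.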

Interpreting Survival Analysis Results

  • Kaplan-Meier curves provide a visual representation of the survival experience over time
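    • Steep drops indicate time points where a large proportion of the subjects still at risk experienced the event; late drops can look large simply because few subjects remain at risk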
    • Wide gaps between curves suggest a substantial difference in survival between groups
    • Crossing curves may indicate non-proportional hazards
  • Log-rank test assesses the statistical significance of the difference in survival curves between groups
    • A small p-value (typically < 0.05) suggests a significant difference in survival
  • Cox proportional hazards model results are typically presented as hazard ratios (HR) with 95% confidence intervals (CI)
    • An HR > 1 indicates a higher hazard of the event per one-unit increase in the covariate (or relative to the reference group), while an HR < 1 indicates a lower hazard
    • The 95% CI provides a range of plausible values for the true HR; if the CI does not include 1, the covariate is considered statistically significant at the 5% level
  • Proportional hazards assumption can be assessed using graphical methods (e.g., log-log survival plots or Schoenfeld residuals) or statistical tests (e.g., time-dependent covariates or Grambsch-Therneau test)
  • The overall significance of the fitted covariates can be tested using the likelihood ratio test, Wald test, or score test
  • Results should be interpreted in the context of the study design, population, and research question, considering potential confounding factors and limitations
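
The log-rank test mentioned above compares, at each event time, the number of events observed in one group with the number expected if both groups shared the same survival curve. The following is a minimal two-group sketch under that description; the function name `logrank_test` and both data sets are hypothetical, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted(
        {t for t, e in zip(times1, events1) if e}
        | {t for t, e in zip(times2, events2) if e}
    )
    observed1 = expected1 = variance = 0.0
    for t_i in event_times:
        n1 = sum(1 for t in times1 if t >= t_i)   # at risk in group 1
        n2 = sum(1 for t in times2 if t >= t_i)   # at risk in group 2
        d1 = sum(1 for t, e in zip(times1, events1) if t == t_i and e)
        d2 = sum(1 for t, e in zip(times2, events2) if t == t_i and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                   # expected under equal hazards
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi_square = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical follow-up times (months); event = 0 marks a censored subject.
treatment_times = [6, 7, 10, 15, 19, 25]
treatment_events = [1, 0, 1, 1, 0, 1]
control_times = [1, 2, 3, 5, 8, 12]
control_events = [1, 1, 1, 1, 1, 0]

chi_square, p_value = logrank_test(treatment_times, treatment_events,
                                   control_times, control_events)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value here would be read exactly as described in the list above: evidence that the two survival curves differ, to be weighed alongside the study design and sample size.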

Real-World Applications in Biostatistics

  • Clinical trials: Evaluating the efficacy and safety of new treatments or interventions
    • Comparing survival outcomes between treatment and control groups
    • Identifying subgroups of patients who may benefit more from a specific treatment
  • Epidemiological studies: Investigating risk factors for disease onset or progression
    • Assessing the impact of lifestyle factors (smoking, diet, physical activity) on disease-free survival
    • Examining the role of genetic or environmental factors in the development of chronic diseases
  • Prognostic studies: Developing models to predict patient outcomes based on clinical, demographic, or molecular characteristics
    • Identifying prognostic biomarkers that can stratify patients into risk groups
    • Developing risk scores or nomograms to aid in treatment decision-making
  • Public health research: Analyzing population-level trends in disease incidence, prevalence, and mortality
    • Evaluating the effectiveness of screening programs or public health interventions
    • Investigating disparities in health outcomes across different socioeconomic or racial/ethnic groups
  • Reliability analysis: Assessing the durability or failure rates of medical devices or equipment
    • Comparing the performance of different device designs or materials
    • Identifying factors that contribute to device failure or malfunction


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
