🐛Biostatistics Unit 10 – Survival Analysis: Kaplan-Meier in Biology

Survival analysis is a crucial statistical method in biology for studying time-to-event data. It's especially useful in medical research, where it helps measure patient survival rates after treatments. The Kaplan-Meier estimator is a key tool in this field, providing a non-parametric way to estimate survival functions. This approach can handle censored data, making it versatile for real-world studies. It allows researchers to compare survival curves between groups, estimate median survival times, and calculate confidence intervals. Understanding these concepts is essential for interpreting biological and medical research outcomes.

What's Survival Analysis?

  • Branch of statistics focused on analyzing the expected duration of time until one or more events happen
  • Commonly used in medical research to measure the fraction of patients living for a certain amount of time after treatment
  • Incorporates data from a cohort of individuals, some of whom remain event-free for the duration of the study (right-censored observations)
  • Survival analysis methods can accommodate censoring and provide a survival function that estimates the probability of an event occurring beyond a certain time
  • Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from lifetime data
    • Non-parametric means it makes no assumptions about the underlying distribution of the survival times
  • Useful for analyzing the distribution of time between an initial event (diagnosis, treatment) and a terminal event (death, relapse)
  • Can compare survival curves between groups using statistical tests (log-rank test) to determine if differences are significant

Key Concepts in Kaplan-Meier

  • Survival function $S(t)$ gives the probability that an individual survives longer than some specified time $t$
  • Hazard function $h(t)$ represents the instantaneous event rate at time $t$, conditional on survival until time $t$ or later
  • Censoring occurs when the survival time for some individuals is unknown due to loss to follow-up or study termination before the event occurs
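    • Right-censoring is the most common type: the event, if it occurs at all, happens after the last observed follow-up time, so only a lower bound on the survival time is known
  • Kaplan-Meier curve is a step function that stays flat between event times and drops at each one; as the sample grows, it approaches the true survival function for the population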
  • Median survival time is the time at which $S(t) = 0.5$, representing when 50% of the individuals have experienced the event
  • Confidence intervals can be calculated for the survival function to quantify the uncertainty in the estimates
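  • Log-rank test compares the survival distributions of two or more groups under the null hypothesis that they are identical; a small p-value is evidence of a difference, while a large p-value does not prove the groups are equivalent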
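
The survival and hazard functions defined above are linked by standard identities. The block below is a sketch for a continuous survival time; the random variable $T$ (time to event) and the integration variable $u$ are notation introduced here for illustration.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)
```

Read this way, a higher hazard over an interval means a faster decline of $S(t)$ over that interval, and the median survival time is the point where the accumulated hazard has pushed $S(t)$ down to 0.5.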

Setting Up Your Data

  • Data should be structured with one row per individual and columns for the survival time, censoring indicator, and any covariates of interest
  • Survival time is the duration from the initial event (start of follow-up) to the terminal event (failure) or censoring
  • Censoring indicator is a binary variable (0 for censored, 1 for event) that distinguishes between complete and incomplete observations
    • Censored observations contribute to the survival function only up to their observed survival time
  • Time scale should be chosen based on the research question and the granularity of the available data (days, months, years)
  • Data should be checked for inconsistencies, such as negative survival times or missing values, and cleaned accordingly
  • Covariates can be included to explore their association with the survival outcome and to adjust for potential confounding factors
  • Stratification can be used to estimate separate survival curves for different subgroups (treatment arms, risk categories) within the same model
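
The layout described above can be sketched in plain Python. This is a minimal illustration, not a prescribed format: the `Subject` record, its field names, the `validate` helper, and the four example rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subject:
    subject_id: str
    time: Optional[float]  # follow-up duration in one consistent unit (e.g. months)
    event: int             # censoring indicator: 1 = event observed, 0 = censored
    group: str             # example covariate, e.g. treatment arm

def is_valid(s):
    """A row is usable if the time is present and non-negative and the indicator is 0/1."""
    return s.time is not None and s.time >= 0 and s.event in (0, 1)

def validate(subjects):
    """Return a human-readable message for every inconsistent row."""
    problems = []
    for s in subjects:
        if s.time is None or s.time < 0:
            problems.append(f"{s.subject_id}: missing or negative survival time")
        if s.event not in (0, 1):
            problems.append(f"{s.subject_id}: censoring indicator must be 0 or 1")
    return problems

# Hypothetical cohort: one row per individual.
cohort = [
    Subject("A01", 6.0, 1, "treatment"),
    Subject("A02", 7.5, 0, "treatment"),
    Subject("B01", 3.0, 1, "control"),
    Subject("B02", -1.0, 1, "control"),  # deliberately invalid: negative time
]

issues = validate(cohort)
clean = [s for s in cohort if is_valid(s)]
print(issues)
print(len(clean), "usable rows")
```

Whether an invalid row should be dropped, corrected, or queried with the data source depends on the study; the filter here simply shows where that decision would sit.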

Calculating Survival Probabilities

  • Kaplan-Meier estimator calculates the survival probability at each distinct event time $t_i$ as the product of the conditional probabilities of surviving to each event time up to $t_i$
  • Conditional probability of surviving beyond time $t_i$ given survival to $t_i$ is estimated as $(n_i - d_i) / n_i$, where:
    • $n_i$ is the number of individuals at risk (not censored and still event-free) just prior to time $t_i$
    • $d_i$ is the number of events (failures) at time $t_i$
  • Survival probability at time $t_i$ is the product of the conditional probabilities up to and including $t_i$: $\hat{S}(t_i) = \prod_{j=1}^{i} \frac{n_j - d_j}{n_j}$
  • Standard error of the survival probability can be estimated using Greenwood's formula to construct confidence intervals
  • Calculations are typically performed using statistical software (R, SAS, Stata) that can handle tied event times and produce the necessary outputs
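
The product-limit calculation above can be traced by hand on a small data set. The sketch below is a teaching aid rather than a replacement for statistical software. It assumes that individuals censored at an event time still count as at risk at that time, it uses Greenwood's formula in its usual form (variance estimated as $\hat{S}(t_i)^2 \sum_{j \le i} d_j / (n_j (n_j - d_j))$), and the function names and the six example observations are made up for illustration.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Return one row (t_i, n_i, d_i, S_hat, se) per distinct event time.

    times  : follow-up durations
    events : 1 if the event was observed, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # events plus censorings at each time
    n_at_risk = len(times)
    surv = 1.0
    greenwood_sum = 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability (n_i - d_i) / n_i.
            surv *= (n_at_risk - d) / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
                se = surv * math.sqrt(greenwood_sum)
            else:
                se = float("nan")  # Greenwood term is undefined once S reaches 0
            rows.append((t, n_at_risk, d, surv, se))
        # Everyone observed at time t leaves the risk set afterwards.
        n_at_risk -= leaving[t]
    return rows

def median_survival(rows):
    """Earliest event time at which the estimate is 0.5 or lower, else None."""
    for t, _n, _d, surv, _se in rows:
        if surv <= 0.5:
            return t
    return None

# Hypothetical data: six individuals, two of them censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]

rows = kaplan_meier(times, events)
for t, n, d, s, se in rows:
    print(f"t={t:>2}  at risk={n}  events={d}  S={s:.3f}  SE={se:.3f}")
print("median survival:", median_survival(rows))
```

Note how the censored observation at time 5 is in the risk set for the event at time 5 but not for the event at time 8, which is exactly how censored individuals contribute information only up to their observed time.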

Plotting the Kaplan-Meier Curve

  • Kaplan-Meier curve is a graphical representation of the survival function over time
  • X-axis represents the survival time, and the Y-axis represents the estimated survival probability
  • Curve starts at a survival probability of 1 (100% of individuals are event-free at the beginning of follow-up)
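  • At each distinct event time, the curve drops vertically by the fraction (events at that time) / (number still at risk) of its current height, so the same number of events produces a larger drop late in follow-up, when fewer individuals remain at risk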
  • Censored observations are typically marked with a tick or cross on the curve at their observed survival time
  • 95% confidence intervals can be plotted as dashed lines around the survival curve to show the uncertainty in the estimates
  • When comparing multiple groups, separate curves are plotted on the same graph, often with different colors or line types
  • Median survival time for each group can be marked on the x-axis or provided in a legend
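
The plotting conventions above can be made concrete by computing the coordinates of the step curve and the censoring marks. This is a sketch under assumptions: the function name `km_step_points` and the example data are hypothetical, and a censoring mark that shares a time with an event is placed at the height after the drop. The resulting lists can be handed to any plotting tool that draws line segments.

```python
from collections import Counter

def km_step_points(times, events):
    """Return (xs, ys, censor_marks) describing a Kaplan-Meier step curve.

    xs, ys       : vertices of the curve, starting at (0, 1)
    censor_marks : (time, height) pairs where tick marks would be drawn
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    censored = Counter(t for t, e in zip(times, events) if e == 0)
    n_at_risk = len(times)
    surv = 1.0
    xs, ys = [0.0], [1.0]
    censor_marks = []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        c = censored.get(t, 0)
        if d > 0:
            xs.append(t)
            ys.append(surv)      # end of the horizontal segment
            surv *= (n_at_risk - d) / n_at_risk
            xs.append(t)
            ys.append(surv)      # bottom of the vertical drop
        if c > 0:
            censor_marks.append((t, surv))
        n_at_risk -= d + c
    # Extend the last horizontal segment to the end of follow-up.
    last = max(times)
    if xs[-1] < last:
        xs.append(last)
        ys.append(surv)
    return xs, ys, censor_marks

# Hypothetical data: six individuals, two of them censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]

xs, ys, marks = km_step_points(times, events)
print(list(zip(xs, ys)))
print("censoring ticks at:", marks)
```

Drawing straight lines through the vertices in order reproduces the horizontal-then-vertical staircase; plotting one such curve per group on shared axes gives the usual comparison figure.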

Interpreting the Results

  • Kaplan-Meier curve provides a visual summary of the survival experience over time
  • Steeper drops in the curve indicate time periods with a higher rate of events
  • Flatter sections of the curve suggest time periods with a lower rate of events or a higher proportion of censored observations
  • Median survival time represents the time at which half of the individuals have experienced the event
    • Useful summary measure, especially when the maximum follow-up time is insufficient for all individuals to experience the event
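  • Confidence intervals that do not overlap between groups suggest a statistically significant difference in survival at that time point, but overlapping intervals do not rule one out, so rely on a formal test for the comparison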
  • Log-rank test provides a formal comparison of the survival curves, with a small p-value indicating that the curves are significantly different
  • Hazard ratios can be estimated using Cox proportional hazards regression to quantify the relative hazard (instantaneous event rate) between groups while adjusting for covariates
  • Results should be interpreted in the context of the study design, population, and potential limitations (selection bias, confounding, limited follow-up)
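
The log-rank comparison mentioned above can be sketched for two groups. At each pooled event time, the observed number of events in one group is compared with the number expected if both groups shared the same survival distribution. This is an unweighted two-group version written for illustration; the function name, the example data, and the use of a chi-square reference distribution with one degree of freedom (evaluated through `math.erfc`) are choices made here, not details from the original text.

```python
import math

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi_square, p_value) for an unweighted two-group log-rank test."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Risk sets: everyone whose follow-up time is at least t.
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in group A if the groups do not differ.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data for two small groups (1 = event, 0 = censored).
times_a, events_a = [2, 4, 4, 7, 9], [1, 1, 0, 1, 1]
times_b, events_b = [5, 8, 11, 12, 15], [1, 0, 1, 1, 0]

chi_square, p_value = logrank_two_groups(times_a, events_a, times_b, events_b)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

With groups this small the chi-square approximation is rough, so the p-value is best read as illustrative; real analyses would normally use an established statistics package.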

Real-World Applications in Biology

  • Cancer research: Comparing survival outcomes between different treatment regimens or risk groups
    • Example: Kaplan-Meier curves for overall survival in patients with advanced lung cancer receiving chemotherapy versus immunotherapy
  • Epidemiology: Analyzing time to infection or disease onset in exposed and unexposed populations
    • Example: Estimating the incubation period distribution for a novel infectious disease using data from contact tracing studies
  • Ecology: Studying factors affecting animal lifespan or time to specific events (migration, reproduction)
    • Example: Comparing survival curves for different populations of a threatened species in habitats with varying levels of human disturbance
  • Genetics: Investigating the effect of genetic variants on age-related phenotypes or disease progression
    • Example: Assessing the impact of a particular gene mutation on the time to onset of Alzheimer's disease symptoms
  • Biomarkers: Evaluating the prognostic value of biological markers in predicting survival outcomes
    • Example: Using Kaplan-Meier curves to demonstrate the association between high levels of a circulating protein and reduced progression-free survival in cancer patients

Common Pitfalls and How to Avoid Them

  • Violating the assumption of non-informative censoring, which requires that censored individuals have the same survival prospects as those who remain under observation
    • Ensure that censoring is not related to the outcome of interest and that follow-up is as complete as possible
  • Failing to account for competing risks, which occur when an individual experiences an event that precludes the occurrence of the primary event of interest
    • Use specialized methods (cumulative incidence function, cause-specific hazard function) to properly analyze competing risks data
  • Misinterpreting the survival probability $S(t)$ as the probability that the event occurs at time $t$, rather than the probability of remaining event-free beyond that time
    • Emphasize that the survival function represents the cumulative probability of surviving beyond each time point
  • Overinterpreting small differences in survival curves, especially when confidence intervals are wide or overlapping
    • Focus on clinically meaningful differences and consider the uncertainty in the estimates when drawing conclusions
  • Extrapolating survival estimates beyond the observed follow-up time, which can lead to unrealistic predictions
    • Restrict interpretations to the time period covered by the data and avoid making predictions far beyond the last observed event time
  • Failing to report key information (median survival time, confidence intervals, p-values) needed to fully interpret the results
    • Follow reporting guidelines (CONSORT, STROBE) and include all relevant statistics and graphical displays to ensure transparency and reproducibility
  • Ignoring the impact of covariates or confounding factors on the survival outcomes
    • Use multivariable regression methods (Cox proportional hazards model) to adjust for potential confounders and explore the effects of covariates on survival
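
The competing-risks pitfall above can be illustrated numerically. The sketch below computes a nonparametric cumulative incidence function by adding, at each event time, the all-cause event-free probability just before that time multiplied by the fraction of the risk set failing from the cause of interest. The function name, the cause coding (0 = censored, 1 and 2 = event types), and the five example observations are assumptions made for this illustration.

```python
from collections import Counter

def cumulative_incidence(times, causes, cause_of_interest):
    """Return [(t, CIF)] at each event time for one cause.

    causes : 0 = censored, any other integer = type of event observed
    """
    leaving = Counter(times)
    any_event = Counter(t for t, c in zip(times, causes) if c != 0)
    target = Counter(t for t, c in zip(times, causes) if c == cause_of_interest)
    n_at_risk = len(times)
    overall_surv = 1.0   # probability of being free of every event type
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        d_all = any_event.get(t, 0)
        d_k = target.get(t, 0)
        if d_all > 0:
            # Must be event-free just before t, then fail from the cause of interest.
            cif += overall_surv * d_k / n_at_risk
            overall_surv *= (n_at_risk - d_all) / n_at_risk
            curve.append((t, cif))
        n_at_risk -= leaving[t]
    return curve

# Hypothetical data: cause 1 is the event of interest, cause 2 competes with it.
times = [2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2]

cif_cause1 = cumulative_incidence(times, causes, 1)

# Naive approach: treat competing events as censored, which yields 1 - KM.
naive_causes = [c if c == 1 else 0 for c in causes]
naive = cumulative_incidence(times, naive_causes, 1)

print("cumulative incidence, cause 1:", cif_cause1[-1][1])
print("naive 1 - KM, cause 1:       ", naive[-1][1])
```

In this toy example the naive estimate ends higher than the cumulative incidence, because treating competing events as censoring assumes those individuals could still go on to experience the event of interest.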


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
