Creating Kaplan Meier Curves for Two Age Groups in R Using ggsurvplot Function

Introduction to Kaplan Meier Curves and ggsurvplot

=====================================================

In survival analysis, a Kaplan-Meier curve visualizes the estimated survival function of a time-to-event outcome: it plots the estimated probability of surviving beyond each time point against time. In this article, we will draw two Kaplan-Meier curves, one per age group, on a single plot using the ggsurvplot() function from the survminer package in R.

Understanding the Kaplan-Meier Curve


A Kaplan-Meier curve is a step function that plots the estimated cumulative survival probability against time. The x-axis represents time, and the y-axis represents the estimated probability of surviving beyond that time. The curve drops only at times when an event (here, death) is observed; censored observations do not cause a drop, but they reduce the number at risk at later times and are usually marked with small ticks on the curve.
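To make the step-function behaviour concrete, here is a minimal base-R sketch that computes the Kaplan-Meier (product-limit) estimate by hand: at each event time, the running survival probability is multiplied by one minus the fraction of at-risk subjects who had the event. The six observations are made up purely for illustration and do not come from the article's data.

```r
# Toy data (invented for illustration): follow-up time and event indicator
time   <- c(2, 3, 3, 5, 8, 9)
status <- c(1, 1, 0, 1, 0, 1)   # 1 = event observed, 0 = censored

# The curve only changes at distinct times where an event was observed
event_times <- sort(unique(time[status == 1]))

# Number still at risk just before each event time, and events at that time
n_risk  <- sapply(event_times, function(tt) sum(time >= tt))
n_event <- sapply(event_times, function(tt) sum(time == tt & status == 1))

# Product-limit estimate: cumulative product of (1 - d_i / n_i)
km_surv <- cumprod(1 - n_event / n_risk)

km_table <- data.frame(time = event_times, n_risk, n_event, survival = km_surv)
print(km_table)
```

Note that the subject censored at time 3 does not produce a step, but it is no longer counted as at risk at time 5.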

Understanding ggsurvplot


ggsurvplot() is a function in the survminer package that draws Kaplan-Meier curves from fitted survival objects using ggplot2. The function takes many arguments, including:

  • fit: A survfit object created by survfit() from the survival package.
  • data: The data frame used to fit the survival curve.
  • palette: The color palette for the curves.
  • ggtheme: The theme of the plot.
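As a quick illustration of these four arguments, the sketch below calls ggsurvplot() on the lung dataset that ships with the survival package. The dataset and the grouping by sex are stand-ins chosen only so the example runs without external files; they are not the article's age-group data.

```r
library(survival)   # Surv(), survfit(), and the lung dataset
library(survminer)  # ggsurvplot()

# Stand-in example: Kaplan-Meier estimates by sex in the lung data
fit_demo <- survfit(Surv(time, status) ~ sex, data = lung)

p_demo <- ggsurvplot(
    fit = fit_demo,                       # survfit object
    data = lung,                          # data frame used to fit it
    palette = c("#E7B800", "#2E9FDF"),    # one color per group
    ggtheme = ggplot2::theme_light()      # ggplot2 theme for the plot
)

print(p_demo)
```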

Creating Kaplan Meier Curves for Two Age Groups


The problem we face here is how to draw Kaplan-Meier curves for two age groups on a single plot: those under 19 years old and those above 19 years old. To achieve this, we will stack both datasets into one long data frame and define an agegrp variable to distinguish between the two age groups.

Step 1: Create a Long Data Frame

First, we need to create a long data frame that contains all the observations from both datasets.

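# Load required libraries
library(survival)   # Surv() and survfit()
library(survminer)  # ggsurvplot()
library(ggplot2)    # theme_light()

# Read the datasets (assumes comma-separated files with a header row)
data.all.agefs.under19 <- read.csv("under19_data.csv")
data.all.agefs.above19 <- read.csv("above19_data.csv")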

# Create a long data frame for under 19 years old
df1 <- data.frame(
    time = data.all.agefs.under19$time,
    status = data.all.agefs.under19$death.specific,
    agegrp = "Under19"
)

# Create a long data frame for above 19 years old
df2 <- data.frame(
    time = data.all.agefs.above19$time,
    status = data.all.agefs.above19$death.specific,
    agegrp = "Above19"
)

# Concatenate both datasets into one long data frame
df <- rbind(df1, df2)
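If the CSV files are not at hand, the stacking step can be tried on simulated data. The sketch below is only an illustration: the data frames sim_under19 and sim_above19, their sizes, and the event rates are invented, and only the column names time and death.specific mirror the article. It also converts agegrp to a factor with explicit levels, since otherwise the groups are ordered alphabetically ("Above19" first), which determines which palette color each curve receives.

```r
set.seed(42)

# Simulated stand-ins for the two source datasets (values are invented)
sim_under19 <- data.frame(
    time = round(rexp(50, rate = 1 / 400)) + 1,
    death.specific = rbinom(50, size = 1, prob = 0.6)
)
sim_above19 <- data.frame(
    time = round(rexp(60, rate = 1 / 300)) + 1,
    death.specific = rbinom(60, size = 1, prob = 0.7)
)

# Stack both groups into one long data frame with a group label
df_sim <- rbind(
    data.frame(time = sim_under19$time,
               status = sim_under19$death.specific,
               agegrp = "Under19"),
    data.frame(time = sim_above19$time,
               status = sim_above19$death.specific,
               agegrp = "Above19")
)

# Fix the group order explicitly instead of relying on alphabetical order
df_sim$agegrp <- factor(df_sim$agegrp, levels = c("Under19", "Above19"))

str(df_sim)
print(table(df_sim$agegrp))
```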

Step 2: Fit the Survival Curve

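Next, we fit a survival curve for each group using survfit(). Surv() expects status to mark which observations are events, for example 1 for death and 0 for censored.

# Fit Kaplan-Meier estimates, one per age group
fitme <- survfit(Surv(time, status) ~ agegrp, data = df)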

# Print the summary of the fitted model
summary(fitme)
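Beyond summary(), a fitted object can be inspected in a few other ways. The sketch below uses the lung dataset from the survival package, grouped by sex, as a stand-in for the age-group data. It prints median survival per group, reads off survival estimates at two chosen times (180 and 365 days, picked arbitrarily), and runs the log-rank test with survdiff().

```r
library(survival)

# Stand-in example: Kaplan-Meier estimates by sex in the lung data
fit_lung <- survfit(Surv(time, status) ~ sex, data = lung)

# Compact overview: n, number of events, median survival with confidence limits
print(fit_lung)

# Survival estimates at selected time points only
print(summary(fit_lung, times = c(180, 365)))

# Log-rank test comparing the two groups
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
print(lr)

# p-value from the chi-square statistic (degrees of freedom = groups - 1)
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(p_value)
```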

Step 3: Plot the Kaplan-Meier Curve

Finally, we plot the Kaplan-Meier curve using ggsurvplot().

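# Set the theme for the plot
km_theme <- theme_light()

# Plot one Kaplan-Meier curve per age group
ggsurvplot(
    fit = fitme,
    data = df,
    palette = c("#E7B800", "#2E9FDF"),
    ggtheme = km_theme,      # apply the theme defined above
    risk.table = TRUE,       # number-at-risk table below the plot
    pval = TRUE,             # log-rank test p-value
    conf.int = TRUE,         # confidence bands around each curve
    xlab = "Time in days",
    break.time.by = 100      # x-axis tick every 100 days
)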

This code draws one Kaplan-Meier curve per age group on the same plot, with confidence bands, the log-rank p-value, and a number-at-risk table below the curves.

Using ggsurvplot with Additional Options


The ggsurvplot() function provides many options to customize the plot, several of which were used in the call above. Some of these options include:

  • risk.table: A boolean value indicating whether to display the risk table.
  • pval: A boolean value indicating whether to display the p-value of the log-rank test.
  • conf.int: A boolean value indicating whether to display the confidence intervals for point estimates of survival curves.
  • palette: The color palette for the curves.
  • ggtheme: The theme of the plot.

By using these options, you can customize the appearance of your Kaplan-Meier curve plots.
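As a sketch of how these options combine, the example below again uses the lung dataset from the survival package as a stand-in. It also sets legend.title and legend.labs, two further ggsurvplot() arguments not discussed above, and then modifies the returned object: ggsurvplot() returns a list whose plot component is an ordinary ggplot object, so ggplot2 layers such as labs() can be added to it. The title text is arbitrary.

```r
library(survival)
library(survminer)

fit_opts <- survfit(Surv(time, status) ~ sex, data = lung)

p_opts <- ggsurvplot(
    fit_opts,
    data = lung,
    risk.table = TRUE,                    # show the number-at-risk table
    pval = TRUE,                          # show the log-rank p-value
    conf.int = TRUE,                      # show confidence bands
    palette = c("#E7B800", "#2E9FDF"),
    ggtheme = ggplot2::theme_light(),
    legend.title = "Sex",
    legend.labs = c("Male", "Female")     # lung codes sex as 1 = male, 2 = female
)

# The curves live in p_opts$plot, which accepts further ggplot2 layers
p_opts$plot <- p_opts$plot + ggplot2::labs(title = "Survival by sex (lung data)")

print(p_opts)
```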

Conclusion


In this article, we explored how to draw Kaplan-Meier curves for two age groups on a single plot using the ggsurvplot() function from the survminer package in R. We demonstrated how to stack both datasets into one long data frame and define an agegrp variable to distinguish between the two age groups. Additionally, we discussed various options for customizing the appearance of your Kaplan-Meier curve plots.

By following this tutorial, you should now be able to create high-quality Kaplan-Meier curves using R and visualize survival distributions for different age groups.


Last modified on 2023-08-12