Introduction to Plotting ECDF and CDF for N(0,1) Distribution using ggplot2 in R
In this blog post, we will explore how to plot the empirical cumulative distribution function (ECDF) and the cumulative distribution function (CDF) of a standard normal distribution in R using the ggplot2 package. We will also delve into the concept of the Kolmogorov-Smirnov test statistic, which measures the distance between an empirical distribution and a reference distribution.
Background: ECDF and CDF
The empirical cumulative distribution function (ECDF) is a non-parametric estimate of the cumulative distribution function. No binning is involved: for a sample of n observations, its value at a point t is the proportion of observations that are less than or equal to t. The result is a step function that jumps by 1/n at each observed data point (by more where values are tied).
The cumulative distribution function (CDF), on the other hand, is a theoretical function giving the probability that a random variable takes a value less than or equal to a given value. For the standard normal distribution, the CDF is denoted Φ(x) and is available in R as pnorm.
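As a minimal sketch of these two definitions, the ECDF at a point can be computed as a simple proportion and compared with the step function returned by ecdf. The sample `s` and the evaluation point `t0` below are illustrative values chosen for this sketch, not data from the post.

```r
# Illustrative sample and evaluation point (assumed values for this sketch)
s  <- c(-1.2, 0.3, 0.3, 1.5, 2.0)
t0 <- 0.3

# ECDF by definition: proportion of observations <= t0
ecdf_by_hand <- mean(s <= t0)

# Same quantity from the step function returned by ecdf()
ecdf_builtin <- ecdf(s)(t0)

# Theoretical CDF of N(0,1) at the same point, i.e. Phi(t0)
phi_t0 <- pnorm(t0)

print(c(by_hand = ecdf_by_hand, builtin = ecdf_builtin, phi = phi_t0))
```

Three of the five values are at or below 0.3, so both ECDF computations give 3/5, while pnorm returns the model probability for the same point.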
Base R Plot
To plot the ECDF and CDF for the standard normal distribution in Base R, we can use the ecdf function to calculate the ECDF from our data and the curve function to plot the CDF.
# Load necessary libraries
library(ggplot2)
set.seed(2021)
# Generate 50 draws from N(1, 1); note the mean of 1, so the sample is shifted relative to the N(0,1) reference
x <- rnorm(50, 1, 1)
# Calculate ECDF
ecdf_x <- ecdf(x)
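# Plot the ECDF as a step function (plot.ecdf does not accept type = "l"), then overlay the N(0,1) CDF
plot(ecdf_x, verticals = TRUE, do.points = FALSE)
curve(pnorm(x), from = -10, to = 10, add = TRUE, col = "red")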
ggplot2 Plot
To plot the ECDF and the standard normal CDF using ggplot2, we only need a data frame holding the sample in a column x. stat_ecdf computes the ECDF from x itself, so no precomputed y values are required, and stat_function overlays the standard normal CDF.
# Create a data frame holding the sample
df1 <- data.frame(x = x)
# Plot the ECDF and overlay the N(0,1) CDF
ggplot(df1, aes(x)) +
  stat_ecdf() +
  stat_function(fun = pnorm, colour = "red")
Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test is a non-parametric statistical test. In its one-sample form, used here, it checks whether a sample is consistent with a fully specified reference distribution; the two-sample form compares two samples with each other. The test statistic, D, is the largest vertical distance between the CDF of the reference distribution (in this case, the standard normal distribution) and the ECDF of our data.
To calculate the Kolmogorov-Smirnov test statistic in R, we can use the ks.test function.
# Perform KS test
KS_test <- ks.test(x, pnorm)$statistic
print(KS_test)
Interpreting the Results
The value extracted from the ks.test result is the Kolmogorov-Smirnov test statistic D. Together with the p-value that ks.test also reports, it indicates whether our data differ significantly from a standard normal distribution.
In this case, the distance between the ECDF and CDF is approximately 0.42. For n = 50 this is a large gap: the approximate 5% critical value for the one-sample test is 1.36/√50 ≈ 0.19, so the sample is clearly inconsistent with the standard normal distribution. That is expected here, because the data were generated with rnorm(50, 1, 1), that is, with mean 1 rather than 0.
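The comparison behind this interpretation can be sketched as follows. The snippet regenerates the same sample so that it runs on its own; the constant 1.36 is the usual large-sample approximation for the 5% critical value, used here as an assumption of the sketch rather than something ks.test reports.

```r
set.seed(2021)
x <- rnorm(50, 1, 1)            # same sample as in the post

res <- ks.test(x, "pnorm")      # one-sample test against N(0,1)
D   <- unname(res$statistic)    # drop the "D" name for cleaner printing

# Approximate 5% critical value for the one-sample, two-sided test
# (large-sample approximation; an assumption of this sketch)
crit_05 <- 1.36 / sqrt(length(x))

print(c(D = D, critical_5pct = crit_05, p_value = res$p.value))
```

A statistic above the critical value, or equivalently a p-value below 0.05, is evidence against the hypothesis that the sample comes from the reference distribution.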
Maximizing Distance
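The statistic D is already the maximum vertical distance between the ECDF and the reference CDF, so there is nothing further to maximize: calling max() on the single value returned by ks.test simply returns it unchanged. What we can do is reproduce D by hand. Because the ECDF is a step function, the largest gap occurs at a data point, either right at the jump or just before it.
# Recompute D by hand at the sorted data points
xs <- sort(x)
n <- length(xs)
max_KS <- max(seq_len(n) / n - pnorm(xs), pnorm(xs) - (seq_len(n) - 1) / n)
print(max_KS)
This reproduces the value of approximately 0.42 reported by ks.test.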
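To see where on the x-axis the largest gap occurs, one option is to locate the data point that attains it and draw the gap as a vertical segment on the ggplot2 figure. The following is a sketch under that approach; the names `gap_above`, `gap_below`, `i_max` and `seg`, and the colours, are illustrative choices rather than part of the original post.

```r
library(ggplot2)

set.seed(2021)
x <- rnorm(50, 1, 1)                 # same sample as in the post

xs <- sort(x)
n  <- length(xs)
Fx <- pnorm(xs)                      # reference CDF at the sorted points

# Gap right at each jump (ECDF = i/n) and just before it (ECDF = (i-1)/n)
gap_above <- seq_len(n) / n - Fx
gap_below <- Fx - (seq_len(n) - 1) / n

i_max <- which.max(pmax(gap_above, gap_below))
D     <- max(gap_above, gap_below)

# ECDF level at the end of the segment, depending on which side is larger
ecdf_end <- if (gap_above[i_max] >= gap_below[i_max]) i_max / n else (i_max - 1) / n
seg <- data.frame(x = xs[i_max], y = Fx[i_max], yend = ecdf_end)

p <- ggplot(data.frame(x = x), aes(x)) +
  stat_ecdf() +
  stat_function(fun = pnorm, colour = "blue") +
  geom_segment(data = seg, aes(x = x, xend = x, y = y, yend = yend),
               colour = "red", linetype = "dashed")
print(p)
```

The length of the dashed segment equals D, which makes the test statistic directly visible in the plot.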
Conclusion
In this blog post, we explored how to plot the empirical cumulative distribution function (ECDF) and the cumulative distribution function (CDF) of a standard normal distribution in R using ggplot2. We also delved into the concept of the Kolmogorov-Smirnov test statistic, which measures the distance between an empirical distribution and a reference distribution.
We observed that the distance between our ECDF and the standard normal CDF is approximately 0.42, a large gap for a sample of 50 that reflects the sample's mean of 1. This statistic is by definition the maximum vertical distance between the two curves, so no further maximization is needed.
We hope that this blog post has given you a working understanding of how to plot the ECDF and CDF for the N(0,1) distribution using ggplot2 in R, and of how to use the one-sample Kolmogorov-Smirnov test to check whether a sample is consistent with a reference distribution.
Last modified on 2025-05-02