Mardia's Coefficient of Skewness: A Comprehensive Guide to Multivariate Skewness Detection in R

Understanding Mardia’s Coefficient of Skewness


Mardia’s coefficient of skewness is a measure used to assess the symmetry of multivariate distributions. In this article, we will look at how to calculate Mardia’s skewness coefficient for several parts of a dataset and store the results in a vector.

Background on Multivariate Skewness


Skewness is a statistical concept that describes the asymmetry of a distribution. For a univariate random variable $X$ with mean $\mu$ and standard deviation $\sigma$, skewness is defined as $S = \frac{E[(X - \mu)^3]}{\sigma^3}$.
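
As a quick illustration, here is a minimal R sketch that estimates this quantity from a sample; the helper name sample_skewness is purely illustrative, since base R has no built-in skewness function.

# simple sample estimate of univariate skewness:
# the average cubed standardized deviation from the mean
sample_skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

set.seed(1)
sample_skewness(rnorm(1000)) # near 0 for symmetric data
sample_skewness(rexp(1000))  # clearly positive for right-skewed data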

However, when dealing with multivariate data, things become more complicated. The concept of skewness extends to higher dimensions, making it essential for understanding the underlying structure of a dataset.

Mardia’s Coefficient


In 1970, Kanti V. Mardia proposed measures of multivariate skewness and kurtosis that use the multivariate normal distribution as a reference. Both measures are built from Mahalanobis-type distances and inner products of the observations relative to the sample mean, computed with the inverse of the sample covariance matrix.

The Mahalanobis distance can be thought of as a measure of how many standard deviations away from the mean a given observation is, taking into account the correlations between variables.
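
For intuition, here is a small sketch that uses the base R function mahalanobis() to compute these (squared) distances for a bivariate sample.

set.seed(1)
X <- matrix(rnorm(200), ncol = 2) # 100 bivariate observations
# squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
head(d2)
# under multivariate normality, d2 approximately follows a chi-squared
# distribution with p = 2 degrees of freedom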

The Mardia Test


Mardia’s test calculates two values:

  1. Skewness ($b_{1,p}$): measures the asymmetry of the joint distribution.
  2. Kurtosis ($b_{2,p}$): measures the “tailedness” of the joint distribution.

These values are used to determine whether a dataset is likely to come from a multivariate normal distribution, which serves as a baseline for comparison.
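
As a rough sketch of how this comparison works in practice, the mardia() function from the psych package reports both coefficients together with p-values (p.skew and p.kurt in its output); small p-values indicate a departure from multivariate normality.

library(psych)

set.seed(123)
X <- MASS::mvrnorm(200, mu = c(0, 0), Sigma = diag(2)) # multivariate normal data
res <- mardia(X, plot = FALSE)
res$p.skew # p-value of the skewness test (small values suggest non-normality)
res$p.kurt # p-value of the kurtosis test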

Calculating Mardia’s Coefficient


Mardia’s skewness coefficient is defined from the centred observations and the sample covariance matrix $S$:

$$ b_{1,p} = \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} S^{-1} (\mathbf{x}_j - \bar{\mathbf{x}}) \right]^{3}. $$

The companion kurtosis coefficient is $b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} S^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right]^{2}$. Under multivariate normality, $\frac{n}{6} b_{1,p}$ approximately follows a chi-squared distribution with $p(p+1)(p+2)/6$ degrees of freedom, which is what turns the coefficient into a test of multivariate skewness.
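
To make the definition concrete, here is a minimal sketch that computes $b_{1,p}$ directly from the formula above. The helper name mardia_skew is purely illustrative, and the result may differ very slightly from library implementations depending on whether the covariance matrix uses the $n$ or $n - 1$ denominator.

# direct computation of Mardia's skewness b1p from its definition
mardia_skew <- function(X) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE) # centre each column
  S_inv <- solve(cov(X))                       # inverse of the sample covariance
  G <- Xc %*% S_inv %*% t(Xc)                  # G[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  sum(G^3) / n^2
}

set.seed(42)
X <- MASS::mvrnorm(100, mu = c(0, 0), Sigma = diag(2))
mardia_skew(X) # close to 0 for multivariate normal data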

R Implementation


In this example, we use R to calculate Mardia’s coefficient. We first load the required packages: MASS, which provides mvrnorm() for drawing multivariate normal samples, and psych, which provides the mardia() function.

library(MASS)
library(psych)

We then set up the simulation parameters:

set.seed(10)
N0 <- 1 # number of simulation runs (not used further below)
n0 <- 5 # base sample size
p0 <- 2 # number of variables
q0 <- 4 # number of groups or iterations

# number of observations drawn for each group (n = n0 * q0 = 20)
n <- n0 * q0

Next, we create the mean vector m2 with the correct dimensions (p0 x 1).

m2 <- matrix(c(0, 0), p0, 1) # p0 x 1 mean vector of zeros
dim(m2)
[1] 2 1 # verify that m2 has shape (2 x 1)

Similarly, we create the covariance matrix s2, which here is simply the p0 x p0 identity matrix.

s2 <- matrix(c(1, 0, 0, 1), p0, p0)
dim(s2)
[1] 2 2 # verify that s2 has shape (2x2)

We then create an array Dat to hold the multivariate normal samples, one n x p0 slice per group, together with a list to collect the Mardia results.

Dat <- array(data = NA, dim = c(n, p0, q0)) # shape (n x p0 x q0), i.e. (20 x 2 x 4)
mardia_lst_vec <- vector('list', length = q0) # Mardia results for each group

for (i in 1:q0) {
    # draw n observations from the multivariate normal N(m2, s2)
    Dat[,, i] <- mvrnorm(n, m2, s2)
    # run Mardia's test on the current group
    mardia_lst_vec[[i]] <- mardia(Dat[,, i], plot = FALSE)
}

Finally, we extract the skewness coefficient (the b1p component of each mardia result) and collect the values in a plain numeric vector.

the_bp1 <- c(mardia_lst_vec[[1]]$b1p, mardia_lst_vec[[2]]$b1p,
             mardia_lst_vec[[3]]$b1p, mardia_lst_vec[[4]]$b1p)

# verify that the extracted values form a plain vector of length q0
length(the_bp1)
[1] 4

print(the_bp1)
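
As a more compact alternative (equivalent under the same assumption that each result exposes its skewness as the b1p component), the whole list can be collapsed in one step with sapply:

# extract Mardia's skewness coefficient from every group at once
the_bp1 <- sapply(mardia_lst_vec, function(m) m$b1p)
print(the_bp1)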

Conclusion


In this article, we explored how to calculate Mardia’s skewness coefficient for each part of a dataset and store the results in a vector, using R for all of the calculations.

By understanding multivariate skewness and kurtosis, you can gain deeper insights into your dataset and its underlying structure.


Last modified on 2024-01-27