Tuesday, August 18, 2009

Minitab's Levene's Test vs JMP's Levene's Test

I once ran Levene's test in JMP and got results that were clearly counter-intuitive given my understanding of the data I was testing. Being a logical person and a practitioner of applied statistics, I refused to accept the results at face value without first understanding why they contradicted my own understanding of the statistical problem. So the first thing I did was to try to replicate JMP's Levene's test result using another statistical software package. As I had immediate access only to Minitab at that time, I used Minitab 15. And guess what? Its conclusion disagreed with JMP's!
Minitab agreed with my intuition, but that did not mean we were correct and JMP was wrong. It could have been the other way around. I had no choice but to examine the formulas and algorithms used by the two packages. The good thing is that both of them have superb documentation, and that is where I found their differences. But before I tell you the details of what I found, I need to give you enough background on Levene's test.

The Levene's Test of Homogeneity of Variances

To test whether a sample variance is equal to some hypothesized value, we use the Chi-square test for variance. If there are two sample variances we wish to compare, we use the F-test for two variances. If more than two samples are involved and we want to simultaneously check the homogeneity of their variances, we use Bartlett's Test.
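As a rough illustration of the three classical statistics just mentioned, here is a minimal Python sketch. The function names and the sample numbers are made up for illustration, and the code only computes the test statistics. Turning them into p-values would need chi-square and F distribution tables or a statistics library.

```python
import math
from statistics import variance


def chi_square_stat(sample, sigma0_sq):
    """Statistic for H0: population variance == sigma0_sq (df = n - 1)."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq


def f_ratio(sample_a, sample_b):
    """Ratio of two sample variances (df = len(a) - 1 and len(b) - 1)."""
    return variance(sample_a) / variance(sample_b)


def bartlett_stat(*groups):
    """Bartlett's statistic, approximately chi-square with k - 1 df under H0."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance: weighted average of the group variances
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction


# Hypothetical measurements, for illustration only
a = [4.1, 5.0, 5.9, 4.6, 5.4]
b = [3.0, 7.2, 5.1, 1.9, 8.3]
c = [5.0, 5.2, 4.8, 5.1, 4.9]

print("chi-square statistic:", chi_square_stat(a, 1.0))
print("F ratio:", f_ratio(b, a))
print("Bartlett statistic:", bartlett_stat(a, b, c))
```

All three statistics are built directly from the sample variances, which is part of why they lean so heavily on the normality assumption.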

These tests, however, are very sensitive to the assumption that the sample data come from a normally distributed population. By sensitive we mean that if the data are not from a normal population, the tests give inaccurate results. The actual alpha, or probability of a false positive (you conclude that there is a difference when in reality there is none), can be much higher than the level we expect it to be. It is against this backdrop that Levene's test comes in.
So what do we do when the data are not from a normal distribution? We make a trade-off. We use a more general test that is applicable to other distributions, but in return we lose power. By power we mean the ability to detect differences. A more powerful test can discern even slight differences and will conclude that there is a difference when one exists. A less powerful test is more likely to conclude that there is no difference between the samples even when a small one exists.
The chi-square test, the F-test, and Bartlett's test are very powerful, but only for normally distributed data. That is why, given the choice, and if your data have been tested to be normally distributed, these tests should be the first choice. However, when the data are not from a normal distribution, we should use Levene's test. Levene's test is often grouped with the non-parametric tests available to statisticians, in the sense that it does not need the data to follow a normal distribution to be usable.
Originally there was only one Levene's test. Its test statistic measures spread by the deviation of each observation from its group mean. Later it evolved into two more forms: one uses the median, and the other uses the trimmed mean. So currently there are three Levene's tests. JMP's documentation describes what looks like a fourth variation: it uses the mean, and it measures spread with the absolute value of (Xi-Xmean) rather than the squared deviation (Xi-Xmean)^2. Note, though, that the NIST definition of the original test already uses the absolute deviation, so JMP's version is effectively the same as the first form. For the details of the formula, you may refer to this link, while JMP's method is described in JMP's Statistics documentation.
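For reference, here is the statistic written out, following the notation of the NIST page as best I can reproduce it (k groups, N_i observations in group i, N observations in total). The three forms share the same W and differ only in how Z_ij is defined. The 10% trimming shown for the third form is the NIST convention, not a universal rule.

```latex
W = \frac{N - k}{k - 1} \cdot
    \frac{\sum_{i=1}^{k} N_i \left( \bar{Z}_{i.} - \bar{Z}_{..} \right)^2}
         {\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left( Z_{ij} - \bar{Z}_{i.} \right)^2}

% Three choices for Z_{ij}:
Z_{ij} = \left| Y_{ij} - \bar{Y}_{i.} \right|     % mean of group i
Z_{ij} = \left| Y_{ij} - \tilde{Y}_{i.} \right|   % median of group i
Z_{ij} = \left| Y_{ij} - \bar{Y}'_{i.} \right|    % 10% trimmed mean of group i
```

W is compared against the F distribution with k - 1 and N - k degrees of freedom.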

Minitab vs JMP

We already described JMP's Levene's test method: JMP uses the mean and the absolute value of (Xi-Xmean). Minitab, on the other hand, uses the median, taking the absolute value of (Xi-Xmedian). So which is correct? You may have guessed it. The answer is that both of them are correct (of course, or else they would be severely criticized). But mind that both of them can also be wrong. The three variants of Levene's test (where JMP's method can be classified as the same as the first one) are applicable to different situations. I will quote NIST here, from the URL provided above:

"The three choices for defining Zij determine the robustness and power of Levene's test. By robustness, we mean the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variables are in fact equal. By power, we mean the ability of the test to detect unequal variances when the variances are in fact unequal.

Levene's original paper only proposed using the mean. Brown and Forsythe (1974) extended Levene's test to use either the median or the trimmed mean in addition to the mean. They performed Monte Carlo studies that indicated that using the trimmed mean performed best when the underlying data followed a Cauchy distribution (i.e., heavy-tailed) and the median performed best when the underlying data followed a χ²₄ (i.e., skewed) distribution. Using the mean provided the best power for symmetric, moderate-tailed, distributions.

Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power. If you have knowledge of the underlying distribution of the data, this may indicate using one of the other choices. "
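To make the mean-versus-median difference concrete, here is a minimal Python sketch of the W statistic with a selectable center. The function name levene_w and the two groups of counts are hypothetical, made up for illustration, and this is not the code that either JMP or Minitab runs. It stops at the statistic; a p-value would come from the F distribution with k - 1 and N - k degrees of freedom.

```python
from statistics import mean, median


def levene_w(groups, center="median"):
    """Levene's W using absolute deviations from each group's center."""
    center_fn = {"mean": mean, "median": median}[center]
    # Z_ij = |Y_ij - center of group i|
    z = []
    for g in groups:
        c = center_fn(g)
        z.append([abs(y - c) for y in g])
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the Z values
    between = sum(
        len(zi) * (m - z_grand_mean) ** 2 for zi, m in zip(z, z_group_means)
    )
    within = sum(
        (v - m) ** 2 for zi, m in zip(z, z_group_means) for v in zi
    )
    return (n_total - k) / (k - 1) * between / within


# Hypothetical right-skewed counts; group_a has one large value in its tail
group_a = [0, 1, 1, 2, 2, 3, 3, 4, 9]
group_b = [0, 1, 1, 1, 2, 2, 2, 3, 3]

w_mean = levene_w([group_a, group_b], center="mean")
w_median = levene_w([group_a, group_b], center="median")
print("W with mean as center:  ", w_mean)
print("W with median as center:", w_median)
```

On skewed data like this, the mean is pulled toward the long tail, so the two centers give different Z values and therefore different W. With a W close to the critical value, that difference is enough for two packages to reach opposite conclusions. If you have SciPy available, its scipy.stats.levene function takes a center argument that switches between these forms.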

Going Back to my Problem

So what is the correct conclusion for my case? Since I was testing the variances of samples coming from a Poisson distribution, which is skewed to the right, Minitab's median-based method is the correct one for me. So in the end, my intuition proved to be correct. But the lesson learned here is not about intuition. It is about making sense of the data and verifying the conclusion against reality. A novice is always excited to plug data into statistical software, click some buttons, copy some charts and values, and then paste them into his/her presentation. An expert does the thinking first, then uses the tools to quantify uncertainties. In the end it is he/she who does the thinking, not the software running on the machine.
