Statistical Process Controls: August 2009

Friday, August 28, 2009

BASIC SPC CONCEPTS: Definition of Quality

“The further we can look back, the better we see what is ahead.” --Adapted from W. E. Deming.

Introduction

In my past entries, I jumped straight into posting JMP scripts that are useful for SPC practitioners. I have not spent much time, though, introducing, defining, and explaining what SPC is. Allow me to step back this time. In this entry I will take the time to look at the basic concepts in detail, and there is no better springboard than the Definition of Quality.

Definition of Quality

About 8 years ago I was already picturing myself pursuing a career in Quality Management. I was daydreaming of solving very complex problems, deploying quality management systems, initiating customer-focused activities, and everything else that could convince even the hardest skeptic that I was God’s gift to the Quality Community. But then my professor in the Statistical Process Control course came into our classroom and threw out a question to start a discussion. After a long pause spent waiting for someone to answer, the professor looked at me and asked, “How about you, Rey, how would you define Quality? What is Quality for you?” I was dumbfounded, of course. Not only because I knew I could not give a satisfactory answer, but more because of the realization that I was dreaming big without even knowing what I was dreaming about. How could I manage Quality if I could not even define what Quality is in the first place? Well, I gave the professor a reply, “Quality is consistently exceeding expectations”, but after that my mind wandered off for the rest of the class. I learned at that very moment that I did not know enough. Actually, I learned that I really did not know a thing about Quality, and that lesson was enough for me for one day.
After that incident I decided to do some personal research. Here is what I recall of what I came up with.

“Quality is Fitness for Use” – Joseph M. Juran
Joseph Juran is the chief editor of Juran's Quality Handbook (1999) and is the person behind the Pareto Concept (Vital Few, Trivial Many). Juran defines quality as fitness for use. According to him, quality is freedom from deficiencies. Quality from his point of view therefore costs less, since it implies fewer defects and lower scrap rates. It also means a gain in productivity brought about by less rework, and an increase in customer satisfaction from products or services that are free from flaws.

“Quality is Conformance to Requirements” – Philip B. Crosby
Philip Crosby is credited with popularizing the concepts of Zero Defects and Quality is Free. According to Crosby, quality is conformance to requirements, and therefore costs nothing. What adds cost is not doing it right the first time. This additional cost he termed the Cost of Poor Quality. Crosby shares with Juran the concept that quality actually costs less. They differ, however, in their perspective on how the end product is used. For Juran, it is the customer who ultimately decides whether the end product or service is of quality. For Crosby, it is conformance to the specifications or written procedures that defines quality. The question of whether that specification is “fit for use” is irrelevant. As long as the requirement is met, quality is present.
An important note must be made at this point. A Quality Practitioner should be able to distinguish between Juran's and Crosby’s concepts. It should be clear that they refer to two different aspects of quality. One is the Quality of Design, while the other is the Quality of Conformance. A product may meet its design requirements, but if the design itself is poor the product may still end up unfit for use. On the other hand, it is also possible to have a superb design, but if the end product cannot conform to its specifications, quality is not present either.

“Quality is How the Customers Define it” – W. Edwards Deming
W. Edwards Deming is the person behind the well-known Plan-Do-Check-Act Cycle. His position on Quality is that only the customers can define it, and that their definition typically changes over time. The focus therefore should be on understanding the customers, translating their needs and wants into measurable quality characteristics, and continuously reducing product variation in terms of these characteristics. For Deming, quality increases as variation decreases. From this perspective, Quality does not necessarily mean less cost. Since quality under this definition may mean meeting customer wants, improving process performance, and tightening tolerances, quality may actually cost more.

“Quality has 8 Dimensions” – David A. Garvin
David Garvin is a Harvard professor known for developing the 8 Dimensions of Quality concept. According to Garvin, quality is multi-faceted and has 8 dimensions. These are the following:
1. Performance
2. Features
3. Reliability
4. Conformance
5. Durability
6. Serviceability
7. Aesthetics
8. Perceived Quality

--to be continued…

Tuesday, August 18, 2009

Minitab's Levene's Test vs JMP's Levene's Test

I once used Levene's test in JMP and found that the results were counter-intuitive given my understanding of the data I was testing. Being a logical person and a practitioner of applied statistics, I refused to accept the results at face value without first understanding why they contradicted my own understanding of the statistical problem. So the first thing I did was to try to replicate JMP's Levene's test result using another statistical software package. As I had immediate access only to Minitab at that time, I used Minitab 15. And guess what? Its conclusion disagreed with JMP's!
Minitab agreed with my intuition, but that does not mean we were correct and JMP was wrong. It could be the other way around. Now I had no choice but to examine the internal formulas and algorithms used by the two packages. The good thing is that both of them have superb documentation. That is where I found their differences. But before I tell you the details of what I found, it is necessary that I give you enough background on Levene's test.

The Levene's Test of Homogeneity of Variances

To test whether a sample variance is equal to some hypothesized value, we use the Chi-square test for variance. If there are two sample variances we wish to compare, we use the F-test for two variances. If more than two samples are involved and we want to simultaneously check the homogeneity of their variances, we use Bartlett's Test.
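As a rough illustration of the three tests named above, here is a minimal Python sketch that computes only their test statistics, using the textbook formulas. The function names are my own and are not taken from JMP or Minitab. Turning a statistic into a p-value needs the chi-square or F distribution, which is left out here to keep the sketch within the standard library.

```python
import math
from statistics import variance


def chi_square_variance_stat(sample, sigma0_sq):
    """(n - 1) * s^2 / sigma0^2; compared against chi-square with n - 1 df."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq


def f_ratio_stat(sample1, sample2):
    """s1^2 / s2^2; compared against F with (n1 - 1, n2 - 1) df."""
    return variance(sample1) / variance(sample2)


def bartlett_stat(*groups):
    """Bartlett's statistic; compared against chi-square with k - 1 df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction
```

When all group variances are equal, Bartlett's statistic is zero; it grows as the sample variances move apart.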

These tests, however, are very sensitive to the assumption that the sample data come from a normally distributed population. By sensitive we mean that if the data are not from a normal population, the tests give inaccurate results. The alpha, or probability of a false positive (you conclude that there is a difference when in reality there is none), would be much higher than what we expect it to be. It is against this backdrop that Levene's test comes in.
So what do we do when the data are not from a normal distribution? We make a trade-off. We use a more general test that is applicable to other distributions, but in return we lose power. By power we mean the ability to detect differences. A more powerful test can discern even slight differences and conclude that there is a difference when one exists. A less powerful test is more conservative: it more often concludes that there is no difference between the samples, even when a small one exists.
The Chi-square test, the F-test, and Bartlett's test are very powerful, but only for normally distributed data. That is why, if your data have been tested to be normally distributed, these tests should be the first choice. However, when the data are not from a normal distribution, we should use Levene's test. Levene's test is robust to departures from normality, which means it does not depend on the data following a normal distribution to be usable.
Originally there was only one Levene's test. Its test statistic measures how far each observation lies from the mean of its group. Later on it evolved into 2 more forms: one uses the median, and the other uses the trimmed mean. So currently there are 3 Levene's tests. There is a fourth variation that is used by JMP. It uses the mean but, instead of the squared (Xi-Xmean), it uses the absolute value of (Xi-Xmean) in its computation of the data spread. For the details of the formula, you may refer to the NIST reference quoted further below, while the fourth method is described in JMP's Statistics documentation.
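To make the role of the center concrete, here is a minimal Python sketch of Levene's W statistic. It assumes the absolute-deviation form Z_ij = |Y_ij - center_i| given in the NIST handbook, and it supports only the mean and the median as centers (the trimmed mean is left out). The function name `levene_w` and the sample data are hypothetical, and no p-value is computed.

```python
from statistics import mean, median


def levene_w(groups, center="median"):
    """Levene's W from absolute deviations around each group's center."""
    center_fn = {"mean": mean, "median": median}[center]
    # Z_ij = |Y_ij - center_i| for every observation in every group.
    z = [[abs(y - center_fn(g)) for y in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_bar_i = [mean(zi) for zi in z]                 # group means of Z
    z_bar = sum(sum(zi) for zi in z) / n_total       # overall mean of Z
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: the first group has one extreme value on the right.
skewed = [1.0, 2.0, 2.0, 3.0, 12.0]
even = [1.0, 2.0, 3.0, 4.0, 5.0]
w_mean = levene_w([skewed, even], center="mean")
w_median = levene_w([skewed, even], center="median")
```

On this data the mean-centered statistic comes out several times larger than the median-centered one, because the extreme value pulls the group mean toward it. That is the kind of disagreement two packages can show when they choose different centers.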

Minitab vs JMP

We already described JMP's Levene's test method: JMP uses the mean and the absolute value of (Xi-Xmean). Minitab, on the other hand, uses the median in place of the mean. So which is correct? You may have guessed it right. The answer is that both of them are correct (of course, or else they would be severely criticized). But mind that both of them can also be wrong. The 3 variants of Levene's test (where JMP's fourth method can be classified as being the same as the first method) are applicable to different situations. I will quote NIST here:

"The three choices for defining Zij determine the robustness and power of Levene's test. By robustness, we mean the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variances are in fact equal. By power, we mean the ability of the test to detect unequal variances when the variances are in fact unequal.

Levene's original paper only proposed using the mean. Brown and Forsythe (1974) extended Levene's test to use either the median or the trimmed mean in addition to the mean. They performed Monte Carlo studies that indicated that using the trimmed mean performed best when the underlying data followed a Cauchy distribution (i.e., heavy-tailed) and the median performed best when the underlying data followed a χ²₄ (i.e., skewed) distribution. Using the mean provided the best power for symmetric, moderate-tailed, distributions.

Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power. If you have knowledge of the underlying distribution of the data, this may indicate using one of the other choices."

Going Back to my Problem

So what is the correct conclusion for my case? Since I am testing the variance of samples coming from a Poisson distribution, which is skewed with a longer tail on the right, Minitab's method is the correct one for me. So in the end, my intuition was proven correct. But the lesson learned here is not about intuition. It is about making sense of the data, and verifying the conclusion against reality. A novice is always excited to plug data into statistical software, click on some buttons, copy some charts and values, and then paste them into his/her presentation. An expert does the thinking first, then uses the tools to quantify uncertainties. In the end it is he/she who does the thinking, and not the software that runs on the machine.

Monday, August 10, 2009

Visualizing Sigma: A JMP JSL Demonstration (reposted from http://elsmar.com/Forums/blog.php?b=135)

This is a repost from: http://elsmar.com/Forums/blog.php?b=135
I hope it helps someone out there.

In the Elsmar Cove forum, I wrote this note:

This is again for the JMP users out there.

Lately I have been noticing that whenever I conduct trainings, it helps when the participants have a way to visualize the concepts that are being presented to them. In a recent example I was asked what the basis is for the value 6 in the Pp formula (USL-LSL)/(6*StdDev). Here is a JMP JSL script that illustrates the logic behind the value 6.

A sample window is shown below. It is interactive. After showing this, you may then explain that sigma is equivalent to the distance from the center to the point of inflection on a normal curve, and that you can fit about 6 of these lengths from one end of the curve to the other. The output window is like this:

Note: You can freely use and share this script. I would be grateful though if you would give credit to me and point to this forum as the source.

Regards,
Reynald Francisco
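For readers without JMP, the two claims in the note above can be checked numerically. This is a minimal Python sketch using the standard library's `NormalDist`. The helper names and the example values mu = 0 and sigma = 1 are my own assumptions. It checks that the curvature of the normal curve changes sign at one sigma from the mean, and that a width of 6 sigma (mean plus or minus 3 sigma) covers nearly all of the distribution.

```python
from statistics import NormalDist


def second_difference(dist, x, h=1e-3):
    """Numerical second derivative of the density; its sign is the curvature."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h ** 2


def pp(usl, lsl, std_dev):
    """Pp as given in the note: (USL - LSL) / (6 * StdDev)."""
    return (usl - lsl) / (6 * std_dev)


mu, sigma = 0.0, 1.0  # assumed example values
dist = NormalDist(mu, sigma)

# The curve bends downward just inside mu + sigma and upward just outside,
# so the point of inflection sits one sigma from the center.
inside = second_difference(dist, mu + 0.9 * sigma)
outside = second_difference(dist, mu + 1.1 * sigma)

# Fraction of the population within mu +/- 3 sigma, a total width of 6 sigma.
coverage = dist.cdf(mu + 3 * sigma) - dist.cdf(mu - 3 * sigma)
```

With specification limits exactly 6 sigma apart, `pp` returns 1, which is why the 6 appears in the denominator: the specification width is being compared with the width that holds almost the whole process spread.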

The JMP output window is shown below:

The JMP JSL Script


Clear Globals();

/*Define constants*/
rsqrt2pi = 1 / Sqrt( 2 * Pi() );
e = e();

/*Define initial average
Define initial Sigma within
Get user information through a dialog box*/
dlg = Dialog(
 "VISUALIZING SIGMA: A DEMONSTRATION",
 " ",
 "Enter the following information to begin, then click on OK",
 " ",
 V List(
  V List(
   " Population parameter:",
   "",
   Line Up( 2,
    " Mean", mu = Edit Number( 0 ),
    " Standard Deviation", sigma = Edit Number( 1 )
   ),
   "",
  )
 ),
 " ",
 H List( Button( "OK" ), Button( "Cancel" ) )
);

If( dlg["Button"] == -1,
 Throw( "User cancelled" )
);
Remove From( dlg );
Eval List( dlg );

/*Define the output window*/
OUT_WINDOW = New Window( "VISUALIZING SIGMA: A JMP JSL DEMONSTRATION",
 Border Box(
  TOP( 5 ),
  Left( 5 ),
  Panel Box( "VISUALIZING SIGMA",
   V List Box(
    T1 = Text Box( "Move the slider below to adjust the Sigma." ),
    T2 = Text Box( "Move the handle of the curve to adjust the Mean" ),
    NormCurve = Graph Box(
     FrameSize( 500, 350 ),
     Y Scale( 0, 1.40 * rsqrt2pi / sigma ),
     X Scale( mu - 10 * sigma, mu + 10 * sigma ),
     Double Buffer,
     Pen Color( "red" );
     Text Color( "red" );
     Text( {mu + 1 * sigma, rsqrt2pi / sigma * e ^ (-1 / 2)}, "Sigma = ", sigma );
     LINEMATRIX_X = {mu, mu + sigma};
     LINEMATRIX_X = Matrix( LINEMATRIX_X );
     LINEMATRIX_Y = {rsqrt2pi / sigma * e ^ (-1 / 2), rsqrt2pi / sigma * e ^ (-1 / 2)};
     LINEMATRIX_Y = Matrix( LINEMATRIX_Y );
     Arrow( LINEMATRIX_X, LINEMATRIX_Y );
     LINEMATRIX_X1 = {mu + sigma, mu};
     LINEMATRIX_X1 = Matrix( LINEMATRIX_X1 );
     LINEMATRIX_Y1 = {rsqrt2pi / sigma * e ^ (-1 / 2), rsqrt2pi / sigma * e ^ (-1 / 2)};
     LINEMATRIX_Y1 = Matrix( LINEMATRIX_Y1 );
     Arrow( LINEMATRIX_X1, LINEMATRIX_Y1 );
     Pen Color( "blue" );
     Text Color( "blue" );
     Pen Size( 2 );
     Text Color( "BLUE" );
     Text( {mu, rsqrt2pi / sigma}, "----->Mean = ", Char( Round( mu, 3 ) ) );
     Y Function( Normal Density( (x - mu) / sigma ) / sigma, x );
     Pen Size( 1 );
     Pen Color( "Green" );
     V Line( mu, 0, rsqrt2pi / sigma );
     Line Style( 2 );
     /*V Line( mu + sigma, 0, rsqrt2pi / sigma * e ^ (-1 / 2) );*/
     V Line( mu + 2 * sigma, 0, rsqrt2pi / sigma * e ^ (-4 / 2) );
     V Line( mu + 3 * sigma, 0, rsqrt2pi / sigma * e ^ (-9 / 2) );
     V Line( mu - sigma, 0, rsqrt2pi / sigma * e ^ (-1 / 2) );
     V Line( mu - 2 * sigma, 0, rsqrt2pi / sigma * e ^ (-4 / 2) );
     V Line( mu - 3 * sigma, 0, rsqrt2pi / sigma * e ^ (-9 / 2) );
     Pen Color( "Red" );
     ARROW_MAT_X = {mu + sigma, mu + sigma};
     ARROW_MAT_Y = {rsqrt2pi / sigma * e ^ (-1 / 2), 0};
     ARROW_MAT_X = Matrix( ARROW_MAT_X );
     ARROW_MAT_Y = Matrix( ARROW_MAT_Y );
     Arrow( ARROW_MAT_X, ARROW_MAT_Y );
     Handle( mu, rsqrt2pi / sigma, mu = x );
    ),

    H List Box(
     Text Color( "BLUE" );
     T3 = Text Box( "SIGMA VALUE" );,
     SB1 = Slider Box( 0.5 * sigma, 10 * sigma, sigma, NormCurve <<>

Thursday, August 6, 2009

Cp vs Cpk: Illustrating the Impact of Mean Shift and Sigma Changes

This is a script I developed to help me explain the concepts of Cp/Pp and Cpk/Ppk whenever I conduct Six Sigma trainings. This is one of the reasons I appreciate JMP's JSL. It allows me to customize demonstrations that visually aid my trainees in grasping Six Sigma concepts.
You may freely use it. I would be grateful if you would give credit to me and refer to this blog.

Reynald Francisco
http://statisticalprocesscontrols.blogspot.com/

The JMP JSL Script


/*Ask the user for the process parameters and the specification limits*/
dlg = Dialog(
 V List(
  V List(
   "Cpk Parameters", "",
   Line Up( 2,
    "Mean", mu = Edit Number( 0 ),
    "Standard Deviation", sigma = Edit Number( 1 )
   ),
   "", "",
   "Define Specifications", "",
   Line Up( 2, "USL ", USL = Edit Number( 2 ) ),
   Line Up( 2, "LSL ", LSL = Edit Number( -2 ) )
  ),
  H List( Button( "OK" ), Button( "Cancel" ) )
 )
);

If( dlg["Button"] == -1, Throw( "User cancelled" ) );
Remove From( dlg );
Eval List( dlg );

rsqrt2pi = 1 / Sqrt( 2 * Pi() );

New Window( "Cpk Demonstration",
 Graph Box(
  FrameSize( 500, 500 ),
  X Scale( mu - 8 * sigma, mu + 8 * sigma ),
  Y Scale( 0, 1.40 * rsqrt2pi / sigma ),
  Double Buffer,
  Pen Color( "blue" );
  Pen Size( 1 );
  Text Size( 12 );
  Text Color( "black" );
  Y Function( Normal Density( (x - mu) / sigma ) / sigma, x );  /*Normal curve, via the Z-score of x*/
  Y Function( Normal Density( (x - mu) / sigma ) / sigma, x, Fill( 20 ), Max( LSL ) );  /*Fill low*/
  Y Function( Normal Density( (x - mu) / sigma ) / sigma, x, Fill( 20 ), Min( USL ) );  /*Fill high*/
  Handle( mu, rsqrt2pi / sigma, mu = x; sigma = rsqrt2pi / y );  /*Drag the peak to change mu and sigma*/
  Pen Color( "red" );
  Pen Size( 1 );
  Text Size( 10 );
  Text Color( "black" );
  X Function( LSL, y );
  Handle( LSL, 0.45 * rsqrt2pi / sigma, LSL = x );  /*Drag to move the LSL*/
  X Function( USL, y );
  Handle( USL, 0.55 * rsqrt2pi / sigma, USL = x );  /*Drag to move the USL*/
  Text( {mu, 0.85 * rsqrt2pi / sigma}, "mu ", mu, "  sigma ", sigma );
  Text Color( "red" );
  Text( {LSL, 0.45 * rsqrt2pi / sigma}, "LSL= ", LSL );
  Text( {USL, 0.55 * rsqrt2pi / sigma}, "USL= ", USL );
  Pen Color( "blue" );
  Pen Size( 1 );
  Text Size( 11 );
  Text Color( "blue" );
  /*Capability indices and the estimated yield between the limits*/
  Cpu = (USL - mu) / (3 * sigma);
  Zu = (USL - mu) / sigma;
  Zl = (LSL - mu) / sigma;
  yield = Normal Distribution( Zu ) - Normal Distribution( Zl );
  Cpl = (mu - LSL) / (3 * sigma);
  Cpk = Min( Cpu, Cpl );
  Text( {mu, 1.15 * rsqrt2pi / sigma}, "Cpk= ", Cpk );
  Text( {mu, 1.35 * rsqrt2pi / sigma}, "Cpu= ", Cpu );
  Text( {mu, 1.25 * rsqrt2pi / sigma}, "Cpl= ", Cpl );
  Text( {mu, 1.05 * rsqrt2pi / sigma}, "Estimated Yield= ", yield );
 )              /* Close Graph Box parenthesis*/
);            /* Close New Window parenthesis*/
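The arithmetic behind the numbers shown in the graph can also be followed outside JMP. Below is a minimal Python sketch that mirrors the Cpu, Cpl, Cpk and estimated yield formulas from the script. The function name `capability` and the Cp line are my additions (Cp is named in the title of this entry but is not computed by the script), and the example values are assumptions.

```python
from statistics import NormalDist


def capability(mu, sigma, lsl, usl):
    """Capability indices for a normal process with the given mean and sigma."""
    cpu = (usl - mu) / (3 * sigma)
    cpl = (mu - lsl) / (3 * sigma)
    cpk = min(cpu, cpl)
    cp = (usl - lsl) / (6 * sigma)  # added for comparison; ignores centering
    z = NormalDist()                # standard normal, like Normal Distribution() in JSL
    est_yield = z.cdf((usl - mu) / sigma) - z.cdf((lsl - mu) / sigma)
    return {"cp": cp, "cpu": cpu, "cpl": cpl, "cpk": cpk, "yield": est_yield}


# Assumed example: same sigma and limits, first centered, then shifted by 1 sigma.
centered = capability(mu=0.0, sigma=1.0, lsl=-3.0, usl=3.0)
shifted = capability(mu=1.0, sigma=1.0, lsl=-3.0, usl=3.0)
```

Shifting the mean leaves Cp unchanged but lowers Cpk and the estimated yield, which is the same effect you see when dragging the mean handle in the graph.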
