What is Six Sigma?
Six Sigma, commonly written as 6σ, is a set of techniques for process improvement originally developed for industrial manufacturing by engineer Bill Smith at Motorola in the 1980s. A central part of the Six Sigma toolset is a collection of procedures for statistical process control: ensuring that production meets particular quality standards, which results in reliable products or services delivered to satisfied end customers at a reasonable cost.
The most common way to specify how well controlled a process is, is to list the number of defects it would produce per million opportunities (DPMO or DPMOps). Combined with a measure of the complexity of the overall process for producing a more complex product, this number can easily be expanded to defects per million products (DPM). Another, and perhaps more convenient, way of expressing this is either in terms of yield (percentage of processes resulting in output within specification) or in terms of percent defects.
For example, a process may result in 3.4 defects per million opportunities, which equals a process yield of 99.99966% and a defect rate (equal to the scrap rate plus the rework rate) of 0.00034%.
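The conversion between DPMO, defect rate and yield is plain arithmetic. A minimal Python sketch follows; the function names are chosen here purely for illustration.

```python
def dpmo_to_defect_rate_percent(dpmo: float) -> float:
    """Convert defects per million opportunities to a defect rate in percent."""
    return dpmo / 1_000_000 * 100


def dpmo_to_yield_percent(dpmo: float) -> float:
    """Yield is the complement of the defect rate."""
    return 100 - dpmo_to_defect_rate_percent(dpmo)


print(f"{dpmo_to_defect_rate_percent(3.4):.5f}%")  # 0.00034%
print(f"{dpmo_to_yield_percent(3.4):.5f}%")        # 99.99966%
```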
Within the profession it is also common to speak of the level of control in terms of how many standard deviations away from the average the output falls, based on some characteristic of interest. As the notation for standard deviation is the Greek lower-case letter sigma (σ), a six-sigma process is one in which the specification limits fall at six standard deviations from the mean in both directions.
The issue with sigma tables, calculators, and DPMO values
Bill Smith, the founder of the Six Sigma method, in his 1993 article “Making War on Defects” provided a table with different sigma levels and their corresponding DPMO values. In it, one can see that a process for which the specification limits are ±6σ from its central tendency would result in 3.40 defects per million opportunities. A ±3σ process thus results in 66810 DPMO, a ±4σ in 6210 DPMO and a ±5σ process in 233 DPMO. Note that the article was for the Six Sigma process at Motorola and in this context the table was also specific to their process control.
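The values in such a table can be approximately reproduced under two assumptions that are made here for illustration: the characteristic is normally distributed, and the process mean has moved 1.5σ toward one of the specification limits. The sketch below uses only Python's standard library; `dpmo_with_shift` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def dpmo_with_shift(sigma_level: float, shift: float = 1.5) -> float:
    """DPMO for limits at +/- sigma_level when the mean has moved by `shift` sigmas."""
    near_tail = STD_NORMAL.cdf(-(sigma_level - shift))  # limit the mean moved toward
    far_tail = STD_NORMAL.cdf(-(sigma_level + shift))   # limit the mean moved away from
    return (near_tail + far_tail) * 1_000_000


for level in (3, 4, 5, 6):
    print(f"±{level}σ: {dpmo_with_shift(level):,.1f} DPMO")
```

The printed figures should land close to the 66810, 6210, 233 and 3.4 DPMO quoted above. Passing `shift=0` gives the values for a process whose mean stays centered.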
However, these values became widely adopted as a convention, a standard, if you will. Most tables list the above DPMO values for these sigma levels, while some list short-term and long-term values, e.g. a process estimated to be 6σ from short-term data is treated as a 4.5σ process in the long term. Most online and Excel six sigma calculators (our own six sigma calculator being an exception) apply a 1.5 sigma shift by default, thus an observed sigma level of 4.5σ is reported as 6σ. Some allow the adjustment of the sigma shift, while others do not.
What is the issue with that, you may ask? The issue is that a true six-sigma process should result in about 0.002 defects per million opportunities. This is two defects in one billion chances, or about 1700 times fewer defects per million opportunities than Smith’s DPMO value of 3.4. Smith’s value was produced by applying what became known as “sigma shift”, which actually means shifting the mean of the process.
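The unshifted figure can be checked directly from the normal distribution. This is a minimal sketch which assumes a normally distributed characteristic, a centered mean, and defects counted in both tails.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Centered process: output beyond either the -6 sigma or the +6 sigma limit is a defect.
true_six_sigma_dpmo = 2 * STD_NORMAL.cdf(-6) * 1_000_000

print(f"{true_six_sigma_dpmo:.4f} DPMO")                  # roughly 0.002
print(f"{3.4 / true_six_sigma_dpmo:.0f} times fewer")     # ratio to the 3.4 DPMO convention
```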
Outside of Motorola’s own processes at the time, there is no evidence or rationale for applying a 1.5 sigma shift to the observed long-term sigma level when extrapolating it to a short-term value, or vice versa. Applying this 1.5 sigma shift results in confusion with regard to the actual defect rate and yield rate of the process in question, either overestimating or underestimating them. In fact, there is hardly any rationale for applying any kind of shift in either case, as we shall see.
What commonly happens as a result is that a process which results in a long-term defect rate corresponding to a sigma level of about 4.5 is reported as a six sigma process, while in other cases a process which results in a short-term defect rate of 6 sigma is reported as being a 4.5 sigma process. In both cases this happens without any evidence whatsoever.
The issue is only somewhat mitigated by reporting the actual DPMO, yield and defect rate – so long as the user ignores the sigma level, they will be fine. However, why is it reported, then?
Sigma shift – origins and justification for using 1.5 sigma shift
The DPMO values in Smith’s article were calculated using what Smith considered an “unsurprising” scenario: a 1.5σ shift in the mean of the process from one batch to another, especially for processes not under direct control. This was based on empirical observations he made for Motorola’s production processes. Note that this is not a table for predicting the long-term performance of a process based on short-term measurements. Rather, it was a description of what DPMO a long-term process with a given nominal sigma level would actually result in if it is also subject to 1.5σ shifts of its true mean from batch to batch.
How the industry ended up adopting this scenario specific to Motorola as a standard to be used across companies and industries is unclear to the author.
In trying to defend and explain the practice in the early 21st century, works by Bothe and Harry made statistical arguments for adopting exactly a 1.5 sigma shift of the mean of the process. Both based their argument on taking samples of 4 units per hour or per batch or per some other set of outputs and made the correct observation that with such a small sample size, actual shifts in the mean of the process would likely remain undetected. Bothe did consider some other cases, but finally advocated for using sigma shift, even though he left room for adjustments of its size and even allowed for zero shift in particularly stable processes. Harry, on the other hand, pretty much insisted on using exactly a 1.5 sigma shift when making predictions about long-term sigma from short-term data and vice versa.
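To get a feel for why small samples struggle to reveal a shift, the sketch below computes the chance that a single subgroup mean signals a change. It assumes a conventional X-bar chart with three-sigma control limits and normally distributed output; this setup is an assumption made here for illustration and is not necessarily the exact one used in the cited works.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def detection_probability(shift_sigmas: float, sample_size: int, limit: float = 3.0) -> float:
    """Chance that one subgroup mean falls outside +/- `limit` standard errors
    after the process mean has moved by `shift_sigmas` standard deviations."""
    shift_in_se = shift_sigmas * sqrt(sample_size)  # shift expressed in standard errors
    beyond_upper = 1 - STD_NORMAL.cdf(limit - shift_in_se)
    beyond_lower = STD_NORMAL.cdf(-limit - shift_in_se)
    return beyond_upper + beyond_lower


for shift in (0.5, 1.0, 1.5, 2.0):
    print(f"{shift}σ shift, n=4: {detection_probability(shift, 4):.3f}")
```

Under these assumptions a 1.5σ shift is flagged by a single subgroup of four only about half of the time, and smaller shifts are flagged far less often.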
Sigma shift doesn’t make sense
Let us examine some issues with sigma shift as a concept in general, and the 1.5 sigma shift in particular.
One possible mistake of Smith is confusing an observed shift in the mean with an actual shift of the mean. Since he failed to provide confidence limits for his measurements of batch-to-batch changes of the mean of processes, it is impossible to tell how uncertain they were. Perhaps these were long-term observations, the uncertainty of their aggregate is low, and these shifts in the mean did, in fact, mostly occur. If these were actual shifts and the tables reflect the true defect and yield rates of these processes, then why didn’t he put the correct sigma number next to them, based on the corresponding Z score from the Normal distribution? Smith makes the distinction between a process designed as six-sigma and such a process implemented with the batch-to-batch variability observed at Motorola. Hence for him it is a difference between “designed” and “actual”, and it may be translated to a sigma estimate based on a few observations grouped closely together versus what a longer-term observation of the same process would reveal over time.
It seems Smith considered degradation in a process not only normal, but also either undetectable or non-actionable, which is peculiar if customer satisfaction is put in first place and product quality needs to be strictly controlled to achieve that. However, Smith nowhere makes the argument that a 1.5 sigma shift is any kind of a general rule, so it seems this enshrinement happened later.
Bothe’s work seems to follow a similar logic but is more specific about the inability to detect certain departures with specific small sample sizes (3-5). It should be noted he does not advocate strictly for a 1.5 sigma shift, just a sigma shift of some size, depending on the sample size of the tests performed.
Harry’s work can be best summarized by this quote: “More specifically, the shift factor is a statistically-based mechanism that is intended to adjust the short-term capability of a performance variable for the influence of unknown (but yet anticipated) long-term random variations.”. He further clarifies that a shift of 1.5σ is a worst-case expectation.
However, what limits the sigma shift to 1.5? Nothing, really. A process may shift by much more than that, as there is practically no universal limit to how much it can shift. As Wheeler notes in his 2003 work “The Six-Sigma Zone”: “the assumption that an unpredictable process will not shift location more than 1.5 sigma is completely indefensible”. No specific value of sigma shift makes sense, even for a worst-case computation.
Does this mean that some other, empirical level of shift might be justified, as some Six Sigma professionals suggest? Let us examine the two possible scenarios. In one, we have data gathered over a long-term period t0 and want to estimate the short-term sigma level for a specific period t1 < t0. In this case, applying a shift is unnecessary since all we need to do is estimate the variability over all possible periods t1 which are contained in the data observed over t0. We have a direct estimate, no need for adjustments, fudge factors, etc.
In the second scenario one has short-term data over a period t0 from which they want to predict the sigma level over some longer-term period t1, t1 > t0. In this case we simply do not have the necessary information to make a reliable long-term prediction. The best we can do is to use the short-term estimate and report its associated uncertainty bounds. Applying any kind of adjustment will be completely arbitrary, unless based on experience producing similar parts on a similar machine / through a similar process. In the latter case the adjustment might be accurate enough to be justified and it is in fact the scenario under which Smith constructed his now famous table. However, even in this case, the reported sigma level should correspond to the defect rate, which is not the case with Smith’s table and its many copies in various articles and books.
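One way to report a short-term estimate together with its uncertainty bounds is sketched below. The Wilson score interval is used here as one choice among several; the article does not prescribe a particular interval method. The sigma level conversion assumes a centered, normally distributed process with defects counted in both tails, and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wilson_interval(defects: int, opportunities: int, confidence: float = 0.95):
    """Wilson score interval for a defect proportion (one of several possible methods)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    p_hat = defects / opportunities
    denom = 1 + z**2 / opportunities
    center = (p_hat + z**2 / (2 * opportunities)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / opportunities
                          + z**2 / (4 * opportunities**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def sigma_level(defect_rate: float) -> float:
    """Sigma level of a centered normal process with this two-sided defect rate.
    Requires 0 < defect_rate < 1, so a lower bound of exactly zero cannot be converted."""
    return STD_NORMAL.inv_cdf(1 - defect_rate / 2)


# Hypothetical short-term sample: 7 defects in 2,000,000 opportunities.
low, high = wilson_interval(7, 2_000_000)
print(f"defect rate: {low * 1e6:.2f} to {high * 1e6:.2f} DPMO")
print(f"sigma level: {sigma_level(high):.2f} to {sigma_level(low):.2f}")
```

Note that the reported sigma levels here correspond directly to the defect rates, with no shift applied.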
For an accurate estimate of long-term performance long-term data should be gathered, deterioration rates and trends should be estimated and then extrapolated based on a sound process to get an even longer-term estimate.
The above is more or less in line with criticism of the 1.5 sigma shift expressed by many in the industry, including Wheeler’s “The Six-Sigma Zone”, which goes in depth about not only sigma shift, but also DPMO numbers, and makes a great analysis based on effective cost of production curves.
Stop claiming Six Sigma upon observing 3.4 DPMO
The takeaway from this article is thus: if the observed defect rate of a process is about 3.4 DPMO from data collected over a short period, then this is an approximately 4.5 sigma process in the short term and most likely a 4.5 sigma or lower process in the long term. If the observed defect rate of a process is about 3.4 DPMO over a long period of time, then this is an approximately 4.5 sigma process in the long term. Its short-term defect rate can be estimated depending on how short that period is relative to the overall available data: by definition, the shorter the period, the larger the possible error becomes.
A true six-sigma process is one which results in 2 defects per billion opportunities over whatever time period is deemed relevant. Calling any other process six-sigma is simply misleading and can only lead to unnecessary confusion.
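Going from an observed DPMO to a sigma level without any shift is a single inverse-normal lookup. The sketch below assumes normality; `sigma_from_dpmo` is a hypothetical helper that supports both the one-tailed reading (all defects beyond one limit) and the two-tailed reading (defects split evenly between both limits).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sigma_from_dpmo(dpmo: float, two_sided: bool = True) -> float:
    """Unshifted sigma level implied by a DPMO value under a normal model."""
    defect_rate = dpmo / 1_000_000
    tail = defect_rate / 2 if two_sided else defect_rate
    return STD_NORMAL.inv_cdf(1 - tail)


print(f"{sigma_from_dpmo(3.4, two_sided=False):.2f}")  # about 4.5
print(f"{sigma_from_dpmo(3.4):.2f}")                   # somewhat higher, still far from 6
print(f"{sigma_from_dpmo(0.002):.2f}")                 # about 6
```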
Having said that, going beyond DPMO and DPM and working with the actual costs and savings associated with different types of defects seems to be a worthy area of pursuit, as this is the intersection between business metrics and engineering metrics. Wheeler’s work seems a most suitable starting point for that.
 Bothe, D.R. “Statistical Reason for the 1.5σ Shift”, Quality Engineering, 12(3):479-487, DOI: 10.1081/QEN-120001884
 Harry, M. J. (2003) “Resolving the Mysteries of Six Sigma: Statistical Constructs and Engineering Rationale.” Palladyne Publishing. Phoenix, Arizona
 Smith, B. (1993) “Making War on Defects”, IEEE Spectrum 30(9):43-50; DOI: 10.1109/6.275174
 Wheeler, D.J. (2003) “The Six-Sigma Zone”, SPC Press
An applied statistician, data analyst, and optimizer by calling, Georgi has expertise in web analytics, statistics, design of experiments, and business risk management. He covers a variety of topics where mathematical models and statistics are useful. Georgi is also the author of “Statistical Methods in Online A/B Testing”.