XYFIT

 

Working with cumulative distributions and spreadsheet data.

1.0

Statistical data evaluation:

Working through data tables to extract the objective information we need is tedious. A correlation between data usually has to be proven by an experiment. In production processes we know fairly well which data are correlated. But we often know nothing about failures or why values change over time. Most of these effects are very rare, unpredictable and hidden in the data noise. Therefore we need statistical methods to identify the problem.

Usually we want to characterize a device and to specify its values. Already here we can run into severe problems, because we have almost no knowledge about the real distribution of the device data. We almost always assume that all data follow a normal distribution or can be transformed to a normal distribution. In reality the data can contain failures or multiple distributions. Before we calculate the statistics, we have to remove the failing devices, which are not normal and do not agree with the statistics. A probability plot helps us to identify the devices that have failed. Then we can trace these bad devices in other tests and see how they are distributed there: are they grouped (correlated) or randomly distributed? Correlated tests almost always point to the same error. They lead us to an overstressed and weak part of the device or to a test failure. So we can learn a lot from failures before we remove them.

The Excel add-on (XYFIT 451) helps to get an overview of the statistics of many tests and to identify weaknesses by statistical methods. We will now demonstrate this with an example of an electronic device and its test data, and we will show how to evaluate the test data using distribution plots.

1.1 Distribution plots

Unfortunately we don’t know very much about the distribution of our test data. Are they as we expect, or do they show a multiple distribution or some other type of distribution? Therefore we start by creating a cumulative normal distribution plot, which does not depend on the type of distribution and is linear for a normal distribution. The data Xi are sorted and weighted with the error function. The Y-axis of the cumulative distribution is scaled in Sigma (the standard width of a normal distribution) and the X-axis shows the linearly scaled data values.
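The construction of such a plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way XYFIT works internally: the function name probability_plot_points and the plotting position (i + 0.5)/n are our own choices, and the inverse of the cumulative normal distribution stands in for the weighting with the error function.

```python
from statistics import NormalDist


def probability_plot_points(values):
    """Return (x, sigma) pairs for a cumulative normal distribution plot.

    Assumption: the i-th sorted value (i = 0 .. n-1) is assigned the
    cumulative probability (i + 0.5) / n. Other plotting positions exist.
    """
    xs = sorted(values)
    n = len(xs)
    standard_normal = NormalDist()  # mean 0, Sigma 1
    points = []
    for i, x in enumerate(xs):
        p = (i + 0.5) / n
        # inv_cdf converts the cumulative probability into a position in Sigma
        points.append((x, standard_normal.inv_cdf(p)))
    return points


# Hypothetical measurements; the last value is an obvious failure.
data = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 9.7]
for x, sigma in probability_plot_points(data):
    print(f"{x:6.2f}  {sigma:+.2f}")
```

Plotting the first column on the X-axis and the second on the Y-axis gives the cumulative plot. The values around 5.0 follow a line, while the value 9.7 lies far away from it.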

The statistical values of these plots usually cannot be used to characterize the devices, because the data contain failures or multiple distributions. A perfect normal distribution shows all data close to a straight line, and failing parts lie far away from this line. Other distribution types can often be transformed to a normal distribution (e.g. a log-normal distribution).

 

Picture 1   3 types of distribution

Picture 1 shows three general types of distributions: a double distribution, a normal distribution and a distribution with failures.
 
When we have hundreds of tests to evaluate, we want to find the bad distributions very quickly. A classification into different types of distributions helps to sort the tests. Therefore we have to find a value that is positive, has no dimension and lies in a range that can be displayed in a bar chart. Many statistical values like the Mean, Standard Deviation, CV and others are not helpful, because they can be infinite or negative or their range is too wide.
The distribution ratio Sigma over 1Sigma, "S/1S" or Stdev(100%)/Stdev(68%), is ideal for generating this overview of the test data. The 1Sigma value is the Sigma value of the distribution in the 1 Sigma range (the normal distribution range of 68% of the population around the median). The ratio has no dimension, is always positive and is never infinite or 0. Therefore the inverse 1S/S has the same properties.

1.   S/1S<1
     A ratio below 1 has a higher 1Sigma value and indicates a double distribution or a rectangle-shaped distribution (test-limited distributions).

2.   S/1S=1
     A ratio equal to 1 indicates a normal distribution.

3.   S/1S>1
     A ratio above 1 has a lower 1Sigma value and indicates failing devices in a distribution, a distribution with smaller side distributions or a logarithmic distribution.
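The ratio can be estimated directly from the sorted data. The Python sketch below rests on an assumption that the text does not spell out: the 1Sigma value is taken as half the width of the central 68% of the population, that is, half the distance between the quantiles that a normal distribution has at -1 Sigma and +1 Sigma. The names quantile and s_over_1s are hypothetical, and XYFIT may compute the value differently.

```python
from statistics import NormalDist, stdev


def quantile(sorted_xs, p):
    """Linearly interpolated quantile of already sorted data (0 <= p <= 1)."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def s_over_1s(values):
    """Ratio of the overall Sigma to the Sigma of the central 68% (assumed)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed")
    xs = sorted(values)
    p_low = NormalDist().cdf(-1.0)   # about 0.1587
    p_high = NormalDist().cdf(1.0)   # about 0.8413
    # Assumption: 1Sigma = half the width of the central 68% of the data.
    # Identical values in that range would give 0 and a division error.
    one_sigma = (quantile(xs, p_high) - quantile(xs, p_low)) / 2.0
    return stdev(xs) / one_sigma


# Synthetic example data, one set for each of the three types.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
rectangle = [float(i) for i in range(1000)]
with_failure = normal_like + [50.0]

for name, data in [("normal", normal_like),
                   ("rectangle", rectangle),
                   ("one failure", with_failure)]:
    print(f"{name:12s} S/1S = {s_over_1s(data):.2f}")
```

With these synthetic data the normal-like sample gives a ratio close to 1, the rectangle-shaped sample a ratio below 1 and the sample with one failing device a ratio clearly above 1.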

All types can also appear mixed together. In picture 2 we see an S/1S bar chart, which gives us an "S/1S profile".

1.2 Picture 2  S over 1S  (S/1S) bar charts  

a)

 

The value S/1S provides a good profile of many test distributions. The bars higher than 1 are distributions with failing parts and the bars below 1 are double distributions. The double distributions (S/1S<1) are the most unwanted types.

b)

 

Picture 2a shows more than 150 tests of devices. 6 S/1S values are above 10, about 30% are above 4, about 50% are at 1 and 5 are below 1. Picture 2b shows another test. Both pictures contain devices with severe failures (S/1S>1). In picture 3 we see plots of distributions with failures (S/1S>>1).

Picture 3 device failures

          

Picture 3 shows one of the first tests, with an extreme failure of one device, and a test with a side distribution (type 3 with S/1S>1). Tests with a high S/1S ratio, often caused by just one extreme failure, all seem to correlate with each other. Their regression coefficient is close to 1. After the data have been cleaned there is often no correlation at all and the regression coefficient is very low.
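A small numerical example illustrates how a single failing device can produce this apparent correlation. The Python sketch below uses made-up values, and it takes the regression coefficient of the text to mean the Pearson correlation coefficient, which is an assumption. The function name correlation is our own.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Two hypothetical tests on the same 20 devices, built to be uncorrelated.
test_a = [1.0, 1.0, 3.0, 3.0] * 5
test_b = [1.0, 3.0, 1.0, 3.0] * 5

# One additional device fails extremely in both tests.
failed_a = test_a + [100.0]
failed_b = test_b + [100.0]

r_with_failure = correlation(failed_a, failed_b)
r_cleaned = correlation(test_a, test_b)
print(f"with the failing device: r = {r_with_failure:.3f}")
print(f"after cleaning:          r = {r_cleaned:.3f}")
```

The one extreme device dominates both sums, so the coefficient comes out close to 1. Without it the two tests show no correlation at all.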
The tests with an S/1S ratio below 1 have double or rectangle-shaped distributions, as we see in picture 4.

Picture 4 


 

The cumulative plots with S/1S>1 can be used to clean the data and to remove these devices. 
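One possible way to automate this cleaning step is sketched below in Python. It is not the procedure of XYFIT, where the devices are identified in the plot. The function name remove_failures, the limit of k = 4 and the use of the 16th and 84th percentiles as the borders of the 1 Sigma range are all assumptions made for this illustration.

```python
from statistics import median, quantiles


def remove_failures(values, k=4.0):
    """Drop values that lie more than k robust Sigma away from the median.

    Assumption: the 1Sigma value is half the distance between the 16th and
    the 84th percentile, so a few extreme failures hardly influence it.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentiles
    one_sigma = (cuts[83] - cuts[15]) / 2.0
    center = median(values)
    return [v for v in values if abs(v - center) <= k * one_sigma]


# Hypothetical test data: 21 good devices around 10.0 and one failure.
data = [10.0 + 0.1 * i for i in range(-10, 11)] + [25.0]
cleaned = remove_failures(data)
print(len(data), "devices before and", len(cleaned), "after cleaning")
```

Because the limit is derived from the central part of the distribution and not from the overall Sigma, the failing device cannot widen the limit enough to hide itself.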

1.31 Unstable devices

Distributions can also have failures that show a continuous drop away from a straight line. These devices can be unstable and run out of the distribution, or they indicate a moving offset of the test equipment. We can add to the S/1S plot a second plot from a test of the same devices at a different time and observe the changes of their distributions.

Distributions can change over time. Usually the activation energy and the MTF (Mean Time to Failure) are used to describe the lifetime of devices. But before devices fail, their distribution can change. The Weibull statistic helps us to describe the failure rate. It does not tell us anything about the problems of the device, whether it fails suddenly or slowly, or why it fails. It only tells us something about the probability of failures over time.

One or more parameters of a device can drift either in one direction or randomly in any direction. A drift in one direction (Offset) causes a change of the Mean, and a random drift changes the Sigma (width) of the distribution. We can almost always observe both. The Sigma can decrease or increase. Sometimes we can observe a fast and a slow aging of the same population, caused by different processes (thermal or mechanical stress, or electromigration). An example of a process that causes an improvement (a decrease of the Sigma) is the reduction of mechanical and thermal stress in electronic components by aging or annealing. Another process, "Burn in", is used to accelerate fast aging and to separate devices with a short lifetime (often caused by electromigration related to a weak design) from devices with a long lifetime. These processes are used to improve the quality. But they don’t improve the designed quality of the devices; they try to separate the good from the bad devices. We find weak and fast aging devices without knowing their disease. It is a patch, a selection method based on the MTF statistics that does not know why the devices will fail.
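The two kinds of drift can be told apart by comparing the Mean and the Sigma of the same population at two points in time. The following Python sketch shows this with made-up numbers. The function name compare_runs and the choice to express the shift of the Mean in units of the earlier Sigma are assumptions for this illustration.

```python
from statistics import mean, stdev


def compare_runs(before, after):
    """Return (shift of the Mean in units of the earlier Sigma,
    ratio of the later Sigma to the earlier Sigma)."""
    sigma_before = stdev(before)
    mean_shift = (mean(after) - mean(before)) / sigma_before
    sigma_ratio = stdev(after) / sigma_before
    return mean_shift, sigma_ratio


# Hypothetical values of one parameter for five devices.
first_test = [1.0, 2.0, 3.0, 4.0, 5.0]
offset_drift = [2.0, 3.0, 4.0, 5.0, 6.0]   # every device moved by +1
random_drift = [0.0, 2.0, 3.0, 4.0, 6.0]   # devices moved in both directions

for name, later in [("offset", offset_drift), ("random", random_drift)]:
    shift, ratio = compare_runs(first_test, later)
    print(f"{name}: Mean shift = {shift:+.2f} Sigma, Sigma ratio = {ratio:.2f}")
```

In the first case only the Mean moves and the Sigma ratio stays at 1. In the second case the Mean is unchanged and the Sigma grows. A Sigma ratio below 1 would correspond to the improvement by aging or annealing described above.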

A more successful approach is the continuous observation of the statistics over time: to improve the design of a device, to find and replace the weak parts that create unstable statistics, and to improve and control the test systems and production equipment.

Picture 5 shows an example of an S/1S plot that reveals how the distribution of a population has changed over time.

1.32 S/1S plot of two tests: the second after a heat treatment of the same device population

Picture 5

The plots of one population before (blue bars) and after (violet bars) a high temperature storage show changes of the distributions. We see reduced S/1S values in many tests.

Picture 7 

1.4 Interpretation of statistics

The first distribution in the S/1S plot is a test with a shift of the mean. The Mean value of this test has shifted by more than 100%. At first sight we would believe the devices are not stable!
But we see in the S/1S plot that the distribution has changed. The original distribution in picture 7 (orange color) has changed dramatically after a temperature storage (blue color). We suddenly see a double distribution. When we look at the contact test (right picture), we find that the contact of about half of the devices has changed: 40% of the devices had a higher contact resistance after the heat treatment. But the Sigma of the 40% high resistance contacts has not changed. How can we have such an offset in the distribution? Why are 60% of the devices stable while 40% have a very well-defined offset? Because of the sudden offset of the distribution after the heat treatment, we must assume that something was wrong with the test.

Picture 8

In the next plots, the devices with the higher contact resistance are marked in red on the right side. In the left plot we see that other tests also show a double distribution. The same devices are marked in red in the left and right plots. This demonstrates that the contact resistance test is correlated with the change to a double distribution.

Picture 9

In the first part of picture 9, on the right side, we see an uncorrelated distribution plot. In the contact distribution the red devices are grouped together, and in this distribution the same devices are randomly distributed. We know these tests are not correlated. We can look at the overview of the distributions of all tests to find all correlated and uncorrelated tests. Here we see several correlated tests in the right picture and, as expected, some uncorrelated tests on the left. We found that all double distributions are very well correlated.
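Whether the marked devices are grouped or randomly distributed in another test can also be expressed as a number. The Python sketch below is one simple possibility and not a function of XYFIT: the name group_separation and the measure itself (the distance between the Means of the marked and the unmarked devices, divided by a pooled Sigma) are assumptions for this illustration, and all values are made up.

```python
from math import sqrt
from statistics import mean, variance


def group_separation(values_by_device, marked_devices):
    """Distance between marked and unmarked devices in units of a pooled Sigma.

    A large value means the marked devices are grouped together in this test;
    a value near 0 means they are spread over the whole distribution.
    """
    marked = [v for d, v in values_by_device.items() if d in marked_devices]
    others = [v for d, v in values_by_device.items() if d not in marked_devices]
    pooled_sigma = sqrt((variance(marked) + variance(others)) / 2.0)
    return abs(mean(marked) - mean(others)) / pooled_sigma


# Devices 6 to 9 are the ones marked red in the contact test (hypothetical).
red_devices = {6, 7, 8, 9}

correlated_test = {0: 1.0, 1: 1.1, 2: 0.9, 3: 1.0, 4: 1.1, 5: 0.9,
                   6: 2.0, 7: 2.1, 8: 1.9, 9: 2.0}
uncorrelated_test = {i: (1.0 if i % 2 == 0 else 2.0) for i in range(10)}

print("correlated test:  ", round(group_separation(correlated_test, red_devices), 2))
print("uncorrelated test:", round(group_separation(uncorrelated_test, red_devices), 2))
```

In the first test the red devices form their own distribution and the value is large. In the second test both groups cover the same range and the value is 0.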

Picture 10

This indicates why the test failed! During the second test the Mean contact resistance changed. This might have been caused by a change of the test system or its environment. EMI (Electro-Magnetic Interference) from a machine near the test equipment could be the problem, or an interference by an operator.

With other tools we would probably have found unstable devices instead of a test problem. A misleading result can be very expensive. It might stop the production of the devices or cancel the delivery. Besides the loss of money, it can cause confusion in the market and wrong activities in a company. Engineers can also control the quality of purchased parts very well with an S/1S plot. Therefore a careful evaluation of test data can save money. More important in some cases: it can help to save lives, when carefully designed and tested devices are delivered in safety systems of cars, airplanes and other transportation systems.

  11/2002 ELB