Ryan's Math Rant: Statistics in the Modern World

We often see advertisements about how good or how effective some products and services are in our society. Many people are easily swayed by a 99.9% guarantee. Here's some food for thought:

If 99.9% is good enough, then...
-12 newborns will be given to the wrong parents daily
-114,500 mismatched pairs of shoes will be shipped/year
-18,322 pieces of mail will be mishandled/hour
-2,000,000 documents will be lost by the IRS this year
-2.5 million books will be shipped with the wrong covers
-Two planes landing at Chicago's O'Hare airport will be unsafe every day
-315 entries in Webster's Dictionary will be misspelled
-20,000 incorrect drug prescriptions will be written this year
-880,000 credit cards in circulation will turn out to have incorrect cardholder information on their magnetic strips
-103,260 income tax returns will be processed incorrectly during the year
-5.5 million cases of soft drinks produced will be flat
-291 pacemaker operations will be performed incorrectly
-3,056 copies of tomorrow's Wall Street Journal will be missing one of the three sections
-A typical day would be 24 hours long (give or take 86.4 seconds)

Taken from http://www.scribd.com/doc/300027/999-Correct-everyday-statistics
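The last item on that list is the easiest one to check yourself, since everyone knows how long a day is. Here is a small sketch of the arithmetic. The function name `expected_failures` is my own invention for illustration, and I am not vouching for the other figures on the list, since their underlying totals are not given here.

```python
# Seconds in a 24-hour day: 24 hours * 60 minutes * 60 seconds.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def expected_failures(total, success_rate=0.999):
    """Return how many of `total` events go wrong at the given success rate.

    Hypothetical helper for illustration only.
    """
    return total * (1 - success_rate)


# 0.1% of a day, in seconds.
off_by = expected_failures(SECONDS_PER_DAY)
print(round(off_by, 1))  # 86.4
```

So a day that is "99.9% accurate" could be off by almost a minute and a half.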

To understand statistics, one must have a good understanding of the three statistical averages: mean, median, and mode.

The Mean
The mean is the sum of all data points, divided by the number of data points. For example, the mean of 10, 12, 13, 13, 13, 14, 25 is approximately 14.29 (the sum of those 7 numbers is 100, and 100 divided by 7 is about 14.29). This is a good statistic to use when the data set is relatively normal, which means that it is roughly symmetric, with most points clumped near the middle and fewer points towards the low and high ends. However, when one or more data points are much higher or lower than the rest, the mean is pulled towards those extremes. This leads to reports of the "average" being much higher or much lower than what the majority of data points actually are. In other words, when the data is skewed, the mean is pulled towards the tail.
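If you want to check the mean yourself, here is a quick sketch in Python using the same seven numbers. The variable names are just my choices. It also shows how much the single high value of 25 pulls the mean upward.

```python
from statistics import mean

data = [10, 12, 13, 13, 13, 14, 25]

# The mean by hand: add everything up, then divide by how many there are.
total = sum(data)
average = total / len(data)
print(total, round(average, 2))  # 100 14.29

# The standard library gives the same answer.
print(round(mean(data), 2))  # 14.29

# Drop the extreme value (25) to see how far it was pulling the mean.
without_extreme = [x for x in data if x != 25]
print(mean(without_extreme))  # 12.5
```

One data point out of seven moves the mean from 12.5 to about 14.29.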

The Median
The median is the middle data point among a set of numbers. Using the previous example of 10, 12, 13, 13, 13, 14, 25, the median would be 13 because when the data set is in order, 13 is in the exact middle of the list. This is a good statistic when the data is skewed, meaning there are data points that are much higher or lower than the majority of the data points.
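Here is a small sketch of finding the median in Python with the same numbers. The by-hand version below only works as written for an odd number of data points. To show why the median holds up under skew, I also swap the 25 for a made-up 250, which is purely illustrative.

```python
from statistics import median

data = [10, 12, 13, 13, 13, 14, 25]

# By hand: sort the data, then take the middle position.
# (This shortcut assumes an odd number of data points.)
ordered = sorted(data)
middle = ordered[len(ordered) // 2]
print(middle)  # 13

# The standard library agrees.
print(median(data))  # 13

# Make the extreme value ten times bigger (a made-up value for illustration).
# The median does not move at all.
more_extreme = [10, 12, 13, 13, 13, 14, 250]
print(median(more_extreme))  # 13
```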

The Mode
The mode is the most common data point in a set. In the previous example, since 13 occurs 3 times, 13 would also be the mode. This is a good statistic to use when the data is heavily skewed.
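The mode is just a matter of counting. A minimal sketch using Python's `Counter` on the same numbers:

```python
from collections import Counter
from statistics import mode

data = [10, 12, 13, 13, 13, 14, 25]

# Count how many times each value appears.
counts = Counter(data)

# most_common(1) returns a list holding the single (value, count) pair
# with the highest count.
value, times_seen = counts.most_common(1)[0]
print(value, times_seen)  # 13 3

# The standard library shortcut gives the same value.
print(mode(data))  # 13
```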

Normal Curves and Skewed Curves
Below is a picture of 3 curves. These represent data sets, with the most common data points under the highest part of the bump, and the less common points near the tail ends. The middle picture is a regular normal curve, where the mean, median, and mode are the same point. A negatively skewed data set, the left picture, represents data with extremes on the low end. A positively skewed data set, the right picture, represents data with extremes on the high end.

Picture from http://www.southalabama.edu/coe/bset/johnson/lectures/lec15.htm

Notice that in a skewed distribution the mean, median, and mode are pulled further apart from each other. This is how people lie with statistics. If they want you to believe that the majority of people have higher results in a negatively skewed data set, they would use the mode or median instead of the mean. If they want you to believe people have higher results in a positively skewed data set, they would use the mean instead of the median or mode.
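To see the direction of the pull in numbers, here is a rough sketch. The positively skewed set is my running example. The negatively skewed set is not real data: I made it by flipping each value around (50 minus the value), so the extreme point lands on the low end instead of the high end.

```python
from statistics import mean, median, mode

# Positively skewed: one extreme value on the high end.
positive = [10, 12, 13, 13, 13, 14, 25]

# Negatively skewed: the same data mirrored (50 - x), so the extreme
# value is now on the low end. Made up purely for illustration.
negative = sorted(50 - x for x in positive)
print(negative)  # [25, 36, 37, 37, 37, 38, 40]

for name, data in [("positive", positive), ("negative", negative)]:
    print(name, round(mean(data), 2), median(data), mode(data))
# positive 14.29 13 13
# negative 35.71 37 37
```

In the positively skewed set the mean sits above the median and mode, and in the negatively skewed set it sits below them. Whoever picks the "average" gets to pick which number you hear.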

Lying with Statistics
Consider my example from earlier: 10, 12, 13, 13, 13, 14, 25. If these were the hourly wages for 7 people in a company, this would be a positively skewed data set, because 25 is so much higher than the other data points. If a company wants you to believe their average wage is higher than what the majority of people in the company actually make, the company would report the mean (14.29) instead of the median or mode (13). Now consider these data points to be the number of accidents a company had each month. They would probably report the "average" to be 13, the median or mode, instead of the mean of 14.29, to make their average accidents per month sound less serious. That number completely fails to take the 25 into account and might cause some to overlook it.
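One way to catch this trick is to count how many data points actually fall below the reported "average." A quick sketch using the wage example (the variable names are mine):

```python
from statistics import mean, median

wages = [10, 12, 13, 13, 13, 14, 25]

reported_mean = mean(wages)      # about 14.29
reported_median = median(wages)  # 13

# How many of the 7 people earn less than each "average"?
below_mean = sum(1 for w in wages if w < reported_mean)
below_median = sum(1 for w in wages if w < reported_median)

print(below_mean, "of", len(wages), "earn less than the mean")      # 6 of 7
print(below_median, "of", len(wages), "earn less than the median")  # 2 of 7
```

Six of the seven people make less than the mean wage, so the mean describes almost nobody in this company.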

Here's a funny little video I found. It's a bit lame, but the tune is catchy. Might be helpful for junior high students.

2 comments:

Anonymous, September 22, 2010 at 8:47 PM
Wow! Those statistics are amazing! That 0.1% really makes a difference, doesn't it? I had no idea.... I will no longer be swayed by the 99.9% guarantee. That's no guarantee at all! Thanks for sharing. It definitely got me thinking.
Dr. Miller, September 23, 2010 at 7:21 AM
This is pretty advanced! So this is what students get in high school, huh? Wow! Good info and I am glad there is a chart to break it up!

Wednesday, September 22, 2010
