Some Key Ingredients for Inferential Statistics
Z Scores, the Normal Curve, Sample versus Population, and Probability
✪ Z Scores
✪ The Normal Curve
✪ Sample and Population
✪ Probability
✪ Controversies: Is the Normal Curve Really So Normal? and Using Nonrandom Samples
✪ Z Scores, Normal Curves, Samples and Populations, and Probabilities in Research Articles
✪ Advanced Topic: Probability Rules and Conditional Probabilities
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes
TIP FOR SUCCESS Before beginning this chapter, be sure you have mastered the material in Chapter 1 on the shapes of distributions and the material in Chapter 2 on the mean and standard deviation.
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc.
Ordinarily, psychologists conduct research to test a theoretical principle or the effectiveness of a practical procedure. For example, a psychophysiologist might measure changes in heart rate from before to after solving a difficult problem. The measurements are then used to test a theory predicting that heart rate should change following successful problem solving. An applied social psychologist might examine the effectiveness of a program of neighborhood meetings intended to promote water conservation. Such studies are carried out with a particular group of research participants. But researchers use inferential statistics to make more general conclusions about the theoretical principle or procedure being studied. These conclusions go beyond the particular group of research participants studied.
This chapter and Chapters 4, 5, and 6 introduce inferential statistics. In this chapter, we consider four topics: Z scores, the normal curve, sample versus population, and probability. This chapter prepares the way for the next ones, which are more demanding conceptually.
Z Scores
In Chapter 2, you learned how to describe a group of scores in terms of the mean and variation around the mean. In this section you learn how to describe a particular score in terms of where it fits into the overall group of scores. That is, you learn how to use the mean and standard deviation to create a Z score; a Z score describes a score in terms of how much it is above or below the average.
Suppose you are told that a student, Jerome, is asked the question, “To what extent are you a morning person?” Jerome responds with a 5 on a 7-point scale, where 1 = not at all and 7 = extremely. Now suppose that we do not know anything about how other students answer this question. In this situation, it is hard to tell whether Jerome is more or less of a morning person in relation to other students. However, suppose that we know for students in general, the mean rating (M) is 3.40 and the standard deviation (SD) is 1.47. (These values are the actual mean and standard deviation that we found for this question in a large sample of statistics students from eight different universities across the United States and Canada.) With this knowledge, we can see that Jerome is more of a morning person than is typical among students. We can also see that Jerome is above the average (1.60 units more than average; that is, 5 – 3.40 = 1.60) by a bit more than students typically vary from the average (that is, students typically vary by about 1.47, the standard deviation). This is all shown in Figure 3–1.
What Is a Z Score?
A Z score makes use of the mean and standard deviation to describe a particular score. Specifically, a Z score is the number of standard deviations the actual score is above or below the mean. If the actual score is above the mean, the Z score is positive. If the actual score is below the mean, the Z score is negative. The standard deviation now becomes a kind of yardstick, a unit of measure in its own right.
In our example, Jerome has a score of 5, which is 1.60 units above the mean of 3.40. One standard deviation is 1.47 units; so Jerome’s score is a little more than 1 standard deviation above the mean. To be precise, Jerome’s Z score is +1.09 (that is, his score of 5 is 1.09 standard deviations above the mean). Another student, Michelle, has a score of 2. Her score is 1.40 units below the mean. Therefore, her score is a little less than 1 standard deviation below the mean (a Z score of -.95). So, Michelle’s score is below the average by about as much as students typically vary from the average.
Figure 3–1 Score of one student, Jerome, in relation to the overall distribution on the measure of the extent to which students are morning people.
Z score: number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution; it is thus an ordinary score transformed so that it better describes the score’s location in a distribution.
Z scores have many practical uses. As you will see later in this chapter, they are especially useful for showing exactly where a particular score falls on the normal curve.
Z Scores as a Scale
Figure 3–2 shows a scale of Z scores lined up against a scale of raw scores for our example of the degree to which students are morning people. A raw score is an ordinary score as opposed to a Z score. The two scales are something like a ruler with inches lined up on one side and centimeters on the other.
Changing a number to a Z score is a bit like converting words for measurement in various obscure languages into one language that everyone can understand—inches, cubits, and zingles (we made up that last one), for example, into centimeters. It is a very valuable tool.
Suppose that a developmental psychologist observed 3-year-old David in a laboratory situation playing with other children of the same age. During the observation, the psychologist counted the number of times David spoke to the other children. The result, over several observations, is that David spoke to other children about 8 times per hour of play. Without any standard of comparison, it would be hard to draw any conclusions from this. Let’s assume, however, that it was known from previous research that under similar conditions, the mean number of times children speak is 12, with a standard deviation of 4. With that information, we can see that David spoke less often than other children in general, but not extremely less often. David would have a Z score of -1 (M = 12 and SD = 4; thus a score of 8 is 1 SD below M), as shown in Figure 3–3.
Suppose Ryan was observed speaking to other children 20 times in an hour. Ryan would clearly be unusually talkative, with a Z score of +2 (see Figure 3–3). Ryan speaks not merely more than the average but more by twice as much as children tend to vary from the average!
Figure 3–2 Scales of Z scores and raw scores for the example of the extent to which students are morning people.
Figure 3–3 Number of times each hour that two children spoke, shown as raw scores and Z scores.
raw score: ordinary score (or any number in a distribution before it has been made into a Z score or otherwise transformed).
Formula to Change a Raw Score to a Z Score
A Z score is the number of standard deviations by which the raw score is above or below the mean. To figure a Z score, subtract the mean from the raw score, giving the deviation score. Then divide the deviation score by the standard deviation. The formula is
Z = (X – M)/SD
That is, a Z score is the raw score minus the mean, divided by the standard deviation.
For example, using the formula for David, the child who spoke to other children 8 times in an hour (where the mean number of times children speak is 12 and the standard deviation is 4),
Z = (X – M)/SD = (8 – 12)/4 = -4/4 = -1
Steps to Change a Raw Score to a Z Score
❶ Figure the deviation score: subtract the mean from the raw score.
❷ Figure the Z score: divide the deviation score by the standard deviation.
Using these steps for David, the child who spoke with other children 8 times in an hour,
❶ Figure the deviation score: subtract the mean from the raw score. 8 – 12 = -4.
❷ Figure the Z score: divide the deviation score by the standard deviation. -4/4 = -1.
Formula to Change a Z Score to a Raw Score
To change a Z score to a raw score, the process is reversed: multiply the Z score by the standard deviation and then add the mean. The formula is
X = (Z)(SD) + M
That is, the raw score is the Z score multiplied by the standard deviation, plus the mean.
Suppose a child has a Z score of 1.5 on the number of times spoken with another child during an hour. This child is 1.5 standard deviations above the mean. Because the standard deviation in this example is 4 raw score units (times spoken), the child is 6 raw score units above the mean, which is 12. Thus, 6 units above the mean is 18. Using the formula,
X = (Z)(SD) + M = (1.5)(4) + 12 = 6 + 12 = 18
Steps to Change a Z Score to a Raw Score
❶ Figure the deviation score: multiply the Z score by the standard deviation.
❷ Figure the raw score: add the mean to the deviation score.
Using these steps for the child with a Z score of 1.5 on the number of times spoken with another child during an hour:
❶ Figure the deviation score: multiply the Z score by the standard deviation. 1.5 * 4 = 6.
❷ Figure the raw score: add the mean to the deviation score. 6 + 12 = 18.
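If you like to check your figuring by computer, the two conversion formulas can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the chapter's procedures; the function names z_from_raw and raw_from_z are made up for this example.

```python
def z_from_raw(x, mean, sd):
    """Z = (X - M)/SD: number of standard deviations x is from the mean."""
    deviation = x - mean      # Step 1: figure the deviation score
    return deviation / sd     # Step 2: divide by the standard deviation


def raw_from_z(z, mean, sd):
    """X = (Z)(SD) + M: raw score that corresponds to a Z score."""
    deviation = z * sd        # Step 1: figure the deviation score
    return deviation + mean   # Step 2: add the mean


# Children's speaking example from the text: M = 12, SD = 4.
david_z = z_from_raw(8, mean=12, sd=4)
ryan_z = z_from_raw(20, mean=12, sd=4)
child_raw = raw_from_z(1.5, mean=12, sd=4)

print(david_z, ryan_z, child_raw)
```

Running the sketch reproduces the values worked out by hand: a Z score of -1 for David, +2 for Ryan, and a raw score of 18 for the child with a Z score of 1.5.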
Additional Examples of Changing Z Scores to Raw Scores and Vice Versa
Consider again the example from the start of the chapter in which students were asked the extent to which they were a morning person. Using a scale from 1 (not at all) to 7 (extremely), the mean was 3.40 and the standard deviation was 1.47. Suppose a student’s raw score is 6. That student is well above the mean. Specifically, using the formula,
Z = (X – M)/SD = (6 – 3.40)/1.47 = 2.60/1.47 = 1.77
That is, the student’s raw score is 1.77 standard deviations above the mean (see Figure 3–4, Student 1). Using the 7-point scale (from 1 = not at all to 7 = extremely), to what extent are you a morning person? Now figure the Z score for your raw score.
Another student has a Z score of -1.63, a score well below the mean. (This student is much less of a morning person than is typically the case for students.) You can find the exact raw score for this student using the formula
X = (Z)(SD) + M = (-1.63)(1.47) + 3.40 = -2.40 + 3.40 = 1.00
That is, the student’s raw score is 1.00 (see Figure 3–4, Student 2).
Figure 3–4 Scales of Z scores and raw scores for the example of the extent to which students are morning people, showing the scores of two sample students (Student 1, raw score 6.00; Student 2, raw score 1.00).
Let’s also consider some examples from the study of students’ stress ratings. The mean stress rating of the 30 statistics students (using a 0–10 scale) was 6.43 (see Figure 2–3), and the standard deviation was 2.56. Figure 3–5 shows the raw score and Z score scales. Suppose a student’s stress raw score is 10. That student is well above the mean. Specifically, using the formula
Z = (X – M)/SD = (10 – 6.43)/2.56 = 3.57/2.56 = 1.39
The student’s stress level is 1.39 standard deviations above the mean (see Figure 3–5, Student 1). On a scale of 0–10, how stressed have you been in the last 2½ weeks? Figure the Z score for your raw stress score.
Another student has a Z score of -1.73, a stress level well below the mean. You can find the exact raw stress score for this student using the formula
X = (Z)(SD) + M = (-1.73)(2.56) + 6.43 = -4.43 + 6.43 = 2.00
That is, the student’s raw stress score is 2.00 (see Figure 3–5, Student 2).
Figure 3–5 Scales of Z scores and raw scores for 30 statistics students’ ratings of their stress level, showing the scores of two sample students (Student 1, raw score 10.00; Student 2, raw score 2.00). (Data based on Aron et al., 1995.)
The Mean and Standard Deviation of Z Scores
The mean of any distribution of Z scores is always 0. This is so because when you change each raw score to a Z score, you take the raw score minus the mean. So the mean is subtracted out of all the raw scores, making the overall mean come out to 0. In other words, in any distribution, the sum of the positive Z scores must always equal the sum of the negative Z scores. Thus, when you add them all up, you get 0.
The standard deviation of any distribution of Z scores is always 1. This is because when you change each raw score to a Z score, you divide by the standard deviation.
A Z score is sometimes called a standard score. There are two reasons: Z scores have standard values for the mean and the standard deviation, and, as we saw earlier, Z scores provide a kind of standard scale of measurement for any variable. (However, sometimes the term standard score is used only when the Z scores are for a distribution that follows a normal curve.)¹
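You can see the mean of 0 and standard deviation of 1 for yourself with a small Python sketch. The six raw scores below are invented purely for illustration, and the sketch assumes the standard deviation is figured by dividing by the number of scores (which is what Python's pstdev does); treat it as a demonstration, not a proof.

```python
from statistics import mean, pstdev

# Hypothetical raw scores, invented purely for illustration.
raw_scores = [2, 4, 4, 5, 7, 8]

m = mean(raw_scores)
sd = pstdev(raw_scores)  # assumption: SD figured by dividing by N

# Change every raw score to a Z score with Z = (X - M)/SD.
z_scores = [(x - m) / sd for x in raw_scores]

z_mean = mean(z_scores)
z_sd = pstdev(z_scores)

# The positive Z scores and the negative Z scores balance each other.
sum_positive = sum(z for z in z_scores if z > 0)
sum_negative = sum(z for z in z_scores if z < 0)

print(z_scores)
print(z_mean, z_sd)
print(sum_positive, sum_negative)
```

Whatever raw scores you substitute (as long as they are not all identical), the Z scores should come out with a mean of 0 and a standard deviation of 1, apart from tiny rounding differences in the computer's arithmetic.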
How are you doing?
1. How is a Z score related to a raw score?
2. Write the formula for changing a raw score to a Z score, and define each of the symbols.
3. For a particular group of scores, M = 20 and SD = 5. Give the Z score for (a) 30, (b) 15, (c) 20, and (d) 22.5.
4. Write the formula for changing a Z score to a raw score, and define each of the symbols.
5. For a particular group of scores, M = 10 and SD = 2. Give the raw score for a Z score of (a) +2, (b) +.5, (c) 0, and (d) -3.
6. Suppose a person has a positive Z score for overall health and a smaller positive Z score for overall sense of humor. What does it mean to say that this person is healthier than she is funny?
Answers
1. A Z score is the number of standard deviations a raw score is above or below the mean.
2. Z = (X – M)/SD. Z is the Z score; X is the raw score; M is the mean; SD is the standard deviation.
3. (a) Z = (X – M)/SD = (30 – 20)/5 = 10/5 = 2; (b) -1; (c) 0; (d) .5.
4. X = (Z)(SD) + M. X is the raw score; Z is the Z score; SD is the standard deviation; M is the mean.
5. (a) 14; (b) 11; (c) 10; (d) 4.
6. This person is more above the average in health (in terms of how much people typically vary from average in health) than she is above the average in humor (in terms of how much people typically vary from the average in humor).
Figure 3–6 A normal curve.
The Normal Curve
As noted in Chapter 1, the graphs of the distributions of many of the variables that psychologists study follow a unimodal, roughly symmetrical, bell-shaped curve. These bell-shaped smooth histograms approximate a precise and important mathematical distribution called the normal distribution, or, more simply, the normal curve.² The normal curve is a mathematical (or theoretical) distribution. Researchers often compare the actual distributions of the variables they are studying (that is, the distributions they find in research studies) to the normal curve. They don’t expect the distributions of their variables to match the normal curve perfectly (since the normal curve is a theoretical distribution), but researchers often check whether their variables approximately follow a normal curve. (The normal curve or normal distribution is also often called a Gaussian distribution after the astronomer Karl Friedrich Gauss. However, if its discovery can be attributed to anyone, it should really be to Abraham de Moivre; see Box 3–1.) An example of the normal curve is shown in Figure 3–6.
Why the Normal Curve Is So Common in Nature
Take, for example, the number of different letters a particular person can remember accurately on various testings (with different random letters each time). On some testings the number of letters remembered may be high, on others low, and on most somewhere in between. That is, the number of different letters a person can recall on various testings probably approximately follows a normal curve. Suppose that the person has a basic ability to recall, say, seven letters in this kind of memory task. Nevertheless, on any particular testing, the actual number recalled will be affected by various influences: noisiness of the room, the person’s mood at the moment, a combination of random letters confused with a familiar name, and so on.
These various influences add up to make the person recall more than seven on some testings and fewer than seven on others. However, the particular combination of such influences that comes up at any testing is essentially random; thus, on most testings, positive and negative influences should cancel out. The chances are not very good of all the negative influences happening to come together on a testing when none of the positive influences show up. Thus, in general, the person remembers a middle amount, an amount in which all the opposing influences cancel each other out. Very high or very low scores are much less common.
This creates a unimodal distribution with most of the scores near the middle and fewer at the extremes. It also creates a distribution that is symmetrical, because the number of letters recalled is as likely to be above as below the middle. Being a unimodal symmetrical curve does not guarantee that it will be a normal curve; it could be too flat or too pointed. However, it can be shown mathematically that in the long run, if the influences are truly random, and the number of different influences being combined is large, a precise normal curve will result. Mathematical statisticians call this principle the central limit theorem. We have more to say about this principle in Chapter 5.
normal distribution: frequency distribution that follows a normal curve.
normal curve: specific, mathematically defined, bell-shaped frequency distribution that is symmetrical and unimodal; distributions observed in nature and in research commonly approximate it.
BOX 3–1 de Moivre, the Eccentric Stranger Who Invented the Normal Curve
The normal curve is central to statistics and is the foundation of most statistical theories and procedures. If any one person can be said to have discovered this fundamental of the field, it was Abraham de Moivre. He was a French Protestant who came to England at the age of 21 because of religious persecution in France, which in 1685 denied Protestants all their civil liberties. In England, de Moivre became a friend of Isaac Newton, who was supposed to have often answered questions by saying, “Ask Mr. de Moivre; he knows all that better than I do.” Yet because he was a foreigner, de Moivre was never able to rise to the same heights of fame as the British-born mathematicians who respected him so greatly.
Abraham de Moivre was mainly an expert on chance. In 1733, he wrote a “method of approximating the sum of the terms of the binomial expanded into a series.” His paper essentially described the normal curve. The description was only in the form of a law, however; de Moivre never actually drew the curve itself. In fact, he was not very interested in it.
Credit for discovering the normal curve is often given to Pierre Laplace, a Frenchman who stayed home; or Karl Friedrich Gauss, a German; or Thomas Simpson, an Englishman. All worked on the problem of the distribution of errors around a mean, even going so far as describing the curve or drawing approximations of it. But even without drawing it, de Moivre was the first to compute the areas under the normal curve at 1, 2, and 3 standard deviations, and Karl Pearson (discussed in Chapter 13, Box 13–1), a distinguished later statistician, felt strongly that de Moivre was the true discoverer of this important concept.
In England, de Moivre was highly esteemed as a man of letters as well as of numbers, being familiar with all the classics and able to recite whole scenes from his beloved Molière’s Misanthropist. But for all his feelings for his native France, the French Academy elected him a foreign member of the Academy of Sciences just before his death. In England, he was ineligible for a university position because he was a foreigner there as well. He remained in poverty, unable even to marry. In his earlier years, he worked as a traveling teacher of mathematics. Later, he was famous for his daily sittings in Slaughter’s Coffee House in Long Acre, making himself available to gamblers and insurance underwriters (two professions equally uncertain and hazardous before statistics were refined), who paid him a small sum for figuring odds for them.
De Moivre’s unusual death generated several legends. He worked a great deal with infinite series that converge to a certain limit. One story has it that de Moivre began sleeping 15 more minutes each night until he was asleep all the time, then died. Another version claims that his work at the coffeehouse drove him to such despair that he simply went to sleep until he died. At any rate, in his 80s he could stay awake only four hours a day, although he was said to be as keenly intellectual in those hours as ever. Then his wakefulness was reduced to one hour, then none at all. At the age of 87, after eight days in bed, he failed to wake and was declared dead from “somnolence” (sleepiness).
Sources: Pearson (1978); Tankard (1984).
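The idea behind the central limit theorem discussed in this section, that many small random influences added together tend toward a normal curve, can be explored with a short simulation. The Python sketch below is only an illustration under made-up assumptions: each "testing" is the sum of 30 influences, each influence is a random value between -1 and +1, and there are 10,000 testings. None of these numbers come from the chapter.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch gives the same result each run


def one_testing(n_influences=30):
    """Add up many small random influences, each between -1 and +1."""
    return sum(random.uniform(-1, 1) for _ in range(n_influences))


# Simulate many testings (hypothetical numbers, for illustration only).
scores = [one_testing() for _ in range(10_000)]

m = mean(scores)
sd = pstdev(scores)

# Proportion of simulated scores within 1 and within 2 SDs of the mean.
within_1_sd = sum(abs(x - m) <= sd for x in scores) / len(scores)
within_2_sd = sum(abs(x - m) <= 2 * sd for x in scores) / len(scores)

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```

On a true normal curve, roughly 68% of scores fall within 1 standard deviation of the mean and roughly 95% fall within 2. The simulated proportions should land close to those values even though each individual influence is not itself normally distributed.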
The Normal Curve and the Percentage of Scores Between the Mean and 1 and 2 Standard Deviations from the Mean
The shape of the normal curve is standard. Thus, there is a known percentage of scores above or below any particular point. For example, exactly 50% of the scores in a normal curve are below the mean, because in any symmetrical distribution half the scores are below the mean. More interestingly, as shown in Figure 3–7, approximately 34% of the scores are always between the mean and 1 standard deviation from the mean.
Consider IQ scores. On many widely used intelligence tests, the mean IQ is 100, the standard deviation is 16, and the distribution of IQs is roughly a normal curve (see Figure 3–8). Knowing about the normal curve and the percentage of scores between the mean and 1 standard deviation above the mean tells you that about 34% of people have IQs between 100, the mean IQ, and 116, the IQ score that is 1 standard deviation above the mean. Similarly, because the normal curve is symmetrical, about 34% of people have IQs between 100 and 84 (the score that is 1 standard deviation below the mean), and 68% (34% + 34%) have IQs between 84 and 116.
There are many fewer scores between 1 and 2 standard deviations from the mean than there are between the mean and 1 standard deviation from the mean. It turns out that about 14% of the scores are between 1 and 2 standard deviations above the mean (see Figure 3–7). (Similarly, about 14% of the scores are between 1 and 2 standard deviations below the mean.) Thus, about 14% of people have IQs between 116 (1 standard deviation above the mean) and 132 (2 standard deviations above the mean).
Figure 3–7 Normal curve with approximate percentages of scores between the mean and 1 and 2 standard deviations above and below the mean.
Figure 3–8 Distribution of IQ scores on many standard intelligence tests (with a mean of 100 and a standard deviation of 16).
You will find it very useful to remember the 34% and 14% figures. These figures tell you the percentages of people above and below any particular score whenever you know that score’s number of standard deviations above or below the mean. You can also reverse this approach and figure out a person’s number of standard deviations from the mean from a percentage. Suppose you are told that a person scored in the top 2% on a test. Assuming that scores on the test are approximately normally distributed, the person must have a score that is at least 2 standard deviations above the mean. This is because a total of 50% of the scores are above the mean, but 34% are between the mean and 1 standard deviation above the mean, and another 14% are between 1 and 2 standard deviations above the mean. That leaves 2% of scores (that is, 50% – 34% – 14% = 2%) that are 2 standard deviations or more above the mean.
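The 34%, 14%, and 2% figures are rounded values, and you can see the more precise percentages with a brief Python sketch. It uses NormalDist from Python's standard statistics module; the helper name percent_between is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1: the Z-score scale


def percent_between(z_low, z_high):
    """Percentage of a normal curve lying between two Z scores."""
    return 100 * (std_normal.cdf(z_high) - std_normal.cdf(z_low))


mean_to_1 = percent_between(0, 1)             # mean to 1 SD above
one_to_2 = percent_between(1, 2)              # 1 SD to 2 SDs above
beyond_2 = 100 * (1 - std_normal.cdf(2))      # more than 2 SDs above

print(round(mean_to_1, 2), round(one_to_2, 2), round(beyond_2, 2))

# IQ example from the text: M = 100, SD = 16.
iq = NormalDist(mu=100, sigma=16)
iq_84_to_116 = 100 * (iq.cdf(116) - iq.cdf(84))
print(round(iq_84_to_116, 2))
```

The more precise values are about 34.13%, 13.59%, and 2.28%, which round to the 34%, 14%, and 2% used in this section; the percentage of IQs between 84 and 116 comes out to about 68.27%.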
Similarly, suppose you were selecting animals for a study and needed to consider their visual acuity. Suppose also that visual acuity was normally distributed and you wanted to use animals in the middle two-thirds (a figure close to 68%) for visual acuity. In this situation, you would select animals that scored between 1 standard deviation above and 1 standard deviation below the mean. (That is, about 34% are between the mean and 1 standard deviation above the mean and another 34% are between the mean and 1 standard deviation below the mean.) Also, remember that a Z score is the number of standard deviations that a score is above or below the mean, which is just what we are talking about here. Thus, if you knew the mean and the standard deviation of the visual acuity test, you could figure out the raw scores (the actual level of visual acuity) for being 1 standard deviation below and 1 standard deviation above the mean (that is, Z scores of -1 and +1). You would do this using the methods of changing raw scores to Z scores and vice versa that you learned earlier in this chapter, which are Z = (X – M)/SD and X = (Z)(SD) + M.
The Normal Curve Table and Z Scores
The 50%, 34%, and 14% figures are important practical rules for working with a group of scores that follow a normal distribution. However, in many research and applied situations, psychologists need more accurate information. Because the normal curve is a precise mathematical curve, you can figure the exact percentage of scores between any two points on the normal curve (not just those that happen to be right at 1 or 2 standard deviations from the mean). For example, exactly 68.59% of scores have a Z score between +.62 and -1.68; exactly 2.81% of scores have a Z score between +.79 and +.89; and so forth.
You can figure these percentages using calculus, based on the formula for the normal curve. However, you can also do this much more simply (which you are probably glad to know!). Statisticians have worked out tables for the normal curve that give the percentage of scores between the mean (a Z score of 0) and any other Z score (as well as the percentage of scores in the tail for any Z score).
We have included a normal curve table in the Appendix (Table A–1, pp. 664–667). Table 3–1 shows the first part of the full table. The first column in the table lists the Z score. The second column, labeled “% Mean to Z,” gives the percentage of scores between the mean and that Z score. The shaded area in the curve at the top of the column gives a visual reminder of the meaning of the percentages in the column. The third column, labeled “% in Tail,” gives the percentage of scores in the tail for that Z score. The shaded tail area in the curve at the top of the column shows the meaning of the percentages in the column. Notice that the table lists only positive Z scores. This is because the normal curve is perfectly symmetrical. Thus, the percentage of scores between the mean and, say, a Z of +.98 (which is 33.65%) is exactly the same as the percentage of scores between the mean and a Z of -.98 (again 33.65%); and the percentage of scores in the tail for a Z score of +1.77 (3.84%) is the same as the percentage of scores in the tail for a Z score of -1.77 (again, 3.84%). Notice that for each Z score, the “% Mean to Z” value and the “% in Tail” value sum to 50.00. This is because exactly 50% of the scores are above the mean for a normal curve. For example, for the Z score of .57, the “% Mean to Z” value is 21.57% and the “% in Tail” value is 28.43%, and 21.57% + 28.43% = 50.00%.
TIP FOR SUCCESS Remember that negative Z scores are scores below the mean and positive Z scores are scores above the mean.
normal curve table: table showing percentages of scores associated with the normal curve; the table usually includes percentages of scores between the mean and various numbers of standard deviations above the mean and percentages of scores more positive than various numbers of standard deviations above the mean.
Suppose you want to know the percentage of scores between the mean and a Z score of .64. You just look up .64 in the “Z” column of the table and the “% Mean to Z” column tells you that 23.89% of the scores in a normal curve are between the mean and this Z score. These values are highlighted in Table 3–1.
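If you do not have the table at hand, the two table columns can be recomputed for any Z score. The Python sketch below is an illustration only; the function name normal_table_row is hypothetical, and it relies on NormalDist from Python's standard statistics module rather than on the printed table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the normal curve on the Z-score scale


def normal_table_row(z):
    """Return ("% Mean to Z", "% in Tail") for a Z score, rounded to 2 places.

    The sign of z is ignored because the normal curve is symmetrical,
    which is also why the printed table lists only positive Z scores.
    """
    below = std_normal.cdf(abs(z))               # proportion below |z|
    pct_mean_to_z = round(100 * (below - 0.5), 2)
    pct_in_tail = round(100 * (1 - below), 2)
    return pct_mean_to_z, pct_in_tail


row_64 = normal_table_row(0.64)
row_57 = normal_table_row(0.57)
print(row_64)
print(row_57)
```

For a Z score of .64 the sketch gives 23.89 and 26.11, and for .57 it gives 21.57 and 28.43, matching the corresponding rows of Table 3–1.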
You can also reverse the process and use the table to find the Z score for a particular percentage of scores. For example, imagine that 30% of ninth-grade students had a creativity score higher than Janice’s. Assuming that creativity scores follow a normal curve, you can figure out her Z score as follows: if 30% of students scored higher than she did, then 30% of the scores are in the tail above her score. This is shown in Figure 3–9. So, you would look at the “% in Tail” column of the table until you found the percentage that was closest to 30%. In this example, the closest is 30.15%. Finally, look at the “Z” column to the left of this percentage, which lists a Z score of .52 (these values of 30.15% and .52 are highlighted in Table 3–1). Thus, Janice’s Z score for her level of creativity is .52. If you know the mean and standard deviation for ninth-grade students’ creativity scores, you can figure out Janice’s actual raw score on the test by changing her Z score of .52 to a raw score using the usual formula, X = (Z)(SD) + M.
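The reverse lookup for Janice can also be sketched in Python. This is an illustration under the same normal-curve assumption; the function name z_for_percent_in_tail is made up, and inv_cdf is the inverse of the cumulative normal curve provided by Python's standard statistics module.

```python
from statistics import NormalDist


def z_for_percent_in_tail(percent_in_tail):
    """Z score that leaves the given percentage of scores in the upper tail."""
    proportion_below = 1 - percent_in_tail / 100
    return NormalDist().inv_cdf(proportion_below)


# 30% of students scored higher than Janice.
janice_z = z_for_percent_in_tail(30)
print(round(janice_z, 2))
```

Rounded to two decimal places, the result is .52, the same Z score found by searching the “% in Tail” column for the value closest to 30%.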
TIP FOR SUCCESS Notice that the table repeats the basic three columns twice on the page. Be sure to look across to the columns you need.
Table 3–1 Normal Curve Areas: Percentage of the Normal Curve Between the Mean and the Scores Shown and Percentage of Scores in the Tail for the Z Scores Shown (First part of table only: full table is Table A–1 in the Appendix. Highlighted values are examples from the text.)
Z % Mean to Z % in Tail Z % Mean to Z % in Tail
.00 .00 50.00 .45 17.36 32.64
.01 .40 49.60 .46 17.72 32.28
.02 .80 49.20 .47 18.08 31.92
.03 1.20 48.80 .48 18.44 31.56
.04 1.60 48.40 .49 18.79 31.21
.05 1.99 48.01 .50 19.15 30.85
.06 2.39 47.61 .51 19.50 30.50
.07 2.79 47.21 .52 19.85 30.15
.08 3.19 46.81 .53 20.19 29.81
.09 3.59 46.41 .54 20.54 29.46
.10 3.98 46.02 .55 20.88 29.12
.11 4.38 45.62 .56 21.23 28.77
.12 4.78 45.22 .57 21.57 28.43
.13 5.17 44.83 .58 21.90 28.10
.14 5.57 44.43 .59 22.24 27.76
.15 5.96 44.04 .60 22.57 27.43
.16 6.36 43.64 .61 22.91 27.09
.17 6.75 43.25 .62 23.24 26.76
.18 7.14 42.86 .63 23.57 26.43
.19 7.53 42.47 .64 23.89 26.11
.20 7.93 42.07 .65 24.22 25.78
.21 8.32 41.68 .66 24.54 25.46
Steps for Figuring the Percentage of Scores Above or Below a Particular Raw Score or Z Score Using the Normal Curve Table
Here are the five steps for figuring the percentage of scores.
❶ If you are beginning with a raw score, first change it to a Z score. Use the usual formula, Z = (X – M)/SD.
❷ Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. (When marking where the Z score falls on the normal curve, be sure to put it in the right place above or below the mean according to whether it is a positive or negative Z score.)
❸ Make a rough estimate of the shaded area’s percentage based on the 50%–34%–14% percentages. You don’t need to be very exact; it is enough just to estimate a range in which the shaded area has to fall, figuring it is between two particular whole Z scores. This rough estimate step is designed not only to help you avoid errors (by providing a check for your figuring), but also to help you develop an intuitive sense of how the normal curve works.
❹ Find the exact percentage using the normal curve table, adding 50% if necessary. Look up the Z score in the “Z” column of Table A–1 and find the percentage in the “% Mean to Z” column or “% in Tail” column next to it. If you want the percentage of scores between the mean and this Z score, or if you want the percentage of scores in the tail for this Z score, the percentage in the table is your final answer. However, sometimes you need to add 50% to the percentage in the table. You need to do this if the Z score is positive and you want the total percentage below this Z score, or if the Z score is negative and you want the total percentage above this Z score. However, you don’t need to memorize these rules; it is much easier to make a picture for the problem and reason out whether the percentage you have from the table is correct as is or if you need to add 50%.
❺ Check that your exact percentage is within the range of your rough estimate from Step ❸.
Examples Here are two examples using IQ scores where and .
Example 1: If a person has an IQ of 125, what percentage of people have higher IQs?
SD = 16M = 100
Z = (X – M)/SD
.52 1 2
Figure 3–9 Distribution of creativity test scores showing area for top 30% of scores and Z score where this area begins.
❶ If you are beginning with a raw score, first change it to a Z score. Using the usual formula, Z = (X – M)/SD, Z = (125 – 100)/16 = +1.56.
❷ Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. This is shown in Figure 3–10 (along with the exact percentages figured later).
❸ Make a rough estimate of the shaded area’s percentage based on the 50%–34%–14% percentages. If the shaded area started at a Z score of 1, it would have 16% above it. If it started at a Z score of 2, it would have only 2% above it. So, with a Z score of 1.56, the number of scores above it has to be somewhere between 16% and 2%.
❹ Find the exact percentage using the normal curve table, adding 50% if necessary. In Table A–1, 1.56 in the “Z” column goes with 5.94 in the “% in Tail” column. Thus, 5.94% of people have IQ scores higher than 125. This is the answer to our problem. (There is no need to add 50% to the percentage.)
❺ Check that your exact percentage is within the range of your rough estimate from Step ❸. Our result, 5.94%, is within the 16-to-2% range we estimated.
Figure 3–10 Distribution of IQ scores showing percentage of scores above an IQ score of 125 (shaded area).
Example 2: If a person has an IQ of 95, what percentage of people have higher IQs?
❶ If you are beginning with a raw score, first change it to a Z score. Using the usual formula, Z = (95 – 100)/16 = –.31.
❷ Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. This is shown in Figure 3–11 (along with the percentages figured later).
Figure 3–11 Distribution of IQ scores showing percentage of scores above an IQ score of 95 (shaded area).
❸ Make a rough estimate of the shaded area’s percentage based on the 50%–34%–14% percentages. You know that 34% of the scores are between the mean and a Z score of –1. Also, 50% of the curve is above the mean. Thus, the Z score of –.31 has to have between 50% and 84% of scores above it.
❹ Find the exact percentage using the normal curve table, adding 50% if necessary. The table shows that 12.17% of scores are between the mean and a Z score of .31. Thus, the percentage of scores above a Z score of –.31 is the 12.17% between the Z score and the mean plus the 50% above the mean, which is 62.17%.
❺ Check that your exact percentage is within the range of your rough estimate from Step ❸. Our result of 62.17% is within the 50-to-84% range we estimated.
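The two examples above can also be checked with a few lines of code. The following is only a sketch, not part of the table method: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for Table A–1, and the helper name `percent_above` is made up for this illustration. The Z score is rounded to two decimals, just as you would do before looking it up in the table.

```python
from statistics import NormalDist

def percent_above(raw_score, mean, sd):
    """Return (Z, percentage of scores above raw_score) on a normal curve."""
    # Step 1: change the raw score to a Z score, rounded to two
    # decimals as you would before looking it up in the table.
    z = round((raw_score - mean) / sd, 2)
    # Step 4: area above z under the standard normal curve, as a percentage.
    return z, 100 * (1 - NormalDist().cdf(z))

# IQ examples from the text (M = 100, SD = 16).
for iq in (125, 95):
    z, pct = percent_above(iq, mean=100, sd=16)
    print(f"IQ {iq}: Z = {z:+.2f}, {pct:.2f}% of scores are higher")
```

Because the function works directly with the area above the Z score, there is no separate "add 50%" decision to make; that rule is needed only when reading the "% Mean to Z" column of the table.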
Figuring Z Scores and Raw Scores from Percentages Using the Normal Curve Table
Going from a percentage to a Z score or raw score is similar to going from a Z score or raw score to a percentage. However, you reverse the procedure when figuring the exact percentage. Also, any necessary changes from a Z score to a raw score are done at the end.
Here are the five steps.
❶ Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%–34%–14% percentages.
❷ Make a rough estimate of the Z score where the shaded area stops.
❸ Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Looking at your picture, figure out either the percentage in the shaded tail or the percentage between the mean and where the shading stops. For example, if your percentage is the bottom 35%, then the percentage in the shaded tail is 35%. Figuring the percentage between the mean and where the shading stops will sometimes involve subtracting 50% from the percentage in the problem. For example, if your percentage is the top 72%, then the percentage from the mean to where that shading stops is 22% (72% – 50% = 22%).
Once you have the percentage, look up the closest percentage in the appropriate column of the normal curve table (“% Mean to Z” or “% in Tail”) and find the Z score for that percentage. That Z will be your answer, except that it may be negative. The best way to tell if it is positive or negative is by looking at your picture.
❹ Check that your exact Z score is within the range of your rough estimate from Step ❷.
❺ If you want to find a raw score, change it from the Z score. Use the usual formula, X = (Z)(SD) + M.
Examples
Here are three examples. Once again, we use IQ for our examples, with M = 100 and SD = 16.
Example 1: What IQ score would a person need to be in the top 5%?
❶ Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%–34%–14% percentages. We want the top 5%. Thus, the shading has to begin above (to the right of) 1 SD (there are 16% of scores above 1 SD). However, it cannot start above 2 SD because only 2% of all the scores are above 2 SD. But 5% is a lot closer to 2% than to 16%. Thus, you would start shading a small way to the left of the 2 SD point. This is shown in Figure 3–12.
❷ Make a rough estimate of the Z score where the shaded area stops. The Z score is between +1 and +2.
❸ Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). We want the top 5%; so we can use the “% in Tail” column of the normal curve table. Looking in that column, the closest percentage to 5% is 5.05% (or you could use 4.95%). This goes with a Z score of 1.64 in the “Z” column.
❹ Check that your exact Z score is within the range of your rough estimate from Step ❷. As we estimated, +1.64 is between +1 and +2 (and closer to 2).
❺ If you want to find a raw score, change it from the Z score. Using the formula, X = (Z)(SD) + M = (1.64)(16) + 100 = 126.24. In sum, to be in the top 5%, a person would need an IQ of at least 126.24.
Figure 3–12 Finding the Z score and IQ raw score for where the top 5% of scores start.
Example 2: What IQ score would a person need to be in the top 55%?
❶ Draw a picture of the normal curve and shade in the approximate area for your percentage using the 50%–34%–14% percentages. You want the top 55%. There are 50% of scores above the mean. So, the shading has to begin below (to the left of) the mean. There are 34% of scores between the mean and 1 SD below the mean; so the score is between the mean and 1 SD below the mean. You would shade the area to the right of that point. This is shown in Figure 3–13.
Figure 3–13 Finding the IQ score for where the top 55% of scores start.
❷ Make a rough estimate of the Z score where the shaded area stops. The Z score has to be between 0 and –1.
❸ Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Being in the top 55% means that 5% of people have IQs between this IQ and the mean (that is, 55% – 50% = 5%). In the normal curve table, the closest percentage to 5% in the “% Mean to Z” column is 5.17%, which goes with a Z score of .13. Because you are below the mean, this becomes –.13.
❹ Check that your exact Z score is within the range of your rough estimate from Step ❷. As we estimated, –.13 is between 0 and –1.
❺ If you want to find a raw score, change it from the Z score. Using the usual formula, X = (–.13)(16) + 100 = 97.92. So, to be in the top 55% on IQ, a person needs an IQ score of 97.92 or higher.
Example 3: What range of IQ scores includes the 95% of people in the middle range of IQ scores?
This kind of problem (finding the middle percentage) may seem odd. However, it is actually a very common situation used in procedures you will learn in later chapters.
Think of this kind of problem in terms of finding the scores that go with the upper and lower ends of this percentage. Thus, in this example, you are trying to find the points where the bottom 2.5% ends and the top 2.5% begins (which, out of 100%, leaves the middle 95%).
❶ Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%–34%–14% percentages. Let’s start where the top 2.5% begins. This point has to be higher than 1 SD (16% of scores are higher than 1 SD). However, it cannot start above 2 SD because there are only 2% of scores above 2 SD. But 2.5% is very close to 2%. Thus, the top 2.5% starts just to the left of the 2 SD point. Similarly, the point where the bottom 2.5% comes in is just to the right of –2 SD. The result of all this is that we will shade in two tail areas on the curve: one starting just above –2 SD and the other starting just below +2 SD. This is shown in Figure 3–14.
Figure 3–14 Finding the IQ scores for where the middle 95% of scores begins and ends.
❷ Make a rough estimate of the Z score where the shaded area stops. You can see from the picture that the Z score for where the shaded area stops above the mean is just below +2. Similarly, the Z score for where the shaded area stops below the mean is just above –2.
❸ Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Being in the top 2.5% means that 2.5% of the IQ scores are in the upper tail. In the normal curve table, the closest percentage to 2.5% in the “% in Tail” column is exactly 2.50%, which goes with a Z score of +1.96. The normal curve is symmetrical. Thus, the Z score for the lower tail is –1.96.
❹ Check that your exact Z score is within the range of your rough estimate from Step ❷. As we estimated, +1.96 is between +1 and +2 and is very close to +2, and –1.96 is between –1 and –2 and very close to –2.
❺ If you want to find a raw score, change it from the Z score. For the high end, using the usual formula, X = (1.96)(16) + 100 = 131.36. For the low end, X = (–1.96)(16) + 100 = 68.64. In sum, the middle 95% of IQ scores run from 68.64 to 131.36.
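The three examples above run the table lookup in reverse, and the same figuring can be sketched in code. This is an illustration under stated assumptions rather than the method of the text: it uses `NormalDist.inv_cdf` from Python's standard library in place of searching the table for the closest percentage, and the helper names `z_for_top_percent` and `raw_from_z` are made up here. The Z score is rounded to two decimals to mirror the table.

```python
from statistics import NormalDist

def z_for_top_percent(percent):
    """Z score at which the top `percent` of a normal curve begins."""
    # inv_cdf expects the proportion of scores *below* the cut point.
    z = NormalDist().inv_cdf(1 - percent / 100)
    return round(z, 2)  # two decimals, as in the normal curve table

def raw_from_z(z, mean, sd):
    """Step 5: change a Z score to a raw score, X = (Z)(SD) + M."""
    return round(z * sd + mean, 2)

M, SD = 100, 16  # IQ examples from the text

for top in (5, 55):
    z = z_for_top_percent(top)
    print(f"top {top}%: Z = {z:+.2f}, IQ = {raw_from_z(z, M, SD)}")

# Middle 95%: the top 2.5% begins at z_hi, and by symmetry the
# bottom 2.5% ends at -z_hi.
z_hi = z_for_top_percent(2.5)
print("middle 95%:", raw_from_z(-z_hi, M, SD), "to", raw_from_z(z_hi, M, SD))
```

Working with the proportion below the cut point means the sign of the Z score comes out of the calculation directly, which is the job the picture does in the table method.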
How are you doing?
1. Why is the normal curve (or at least a curve that is symmetrical and unimodal) so common in nature?
2. Without using a normal curve table, about what percentage of scores on a normal curve are (a) above the mean, (b) between the mean and 1 SD above the mean, (c) between 1 and 2 SDs above the mean, (d) below the mean, (e) between the mean and 1 SD below the mean, and (f) between 1 and 2 SDs below the mean?
3. Without using a normal curve table, about what percentage of scores on a normal curve are (a) between the mean and 2 SDs above the mean, (b) below 1 SD above the mean, (c) above 2 SDs below the mean?
4. Without using a normal curve table, about what Z score would a person have who is at the start of the top (a) 50%, (b) 16%, (c) 84%, (d) 2%?
5. Using the normal curve table, what percentage of scores are (a) between the mean and a Z score of 2.14, (b) above 2.14, (c) below 2.14?
6. Using the normal curve table, what Z score would you have if (a) 20% are above you and (b) 80% are below you?
Answers
1. It is common because any particular score is the result of the random combination of many effects, some of which make the score larger and some of which make the score smaller. Thus, on average these effects balance out near the middle, with relatively few at each extreme, because it is unlikely for most of the increasing and decreasing effects to come out in the same direction.
2. (a) Above the mean: 50%; (b) between the mean and 1 SD above the mean: 34%; (c) between 1 and 2 SDs above the mean: 14%; (d) below the mean: 50%; (e) between the mean and 1 SD below the mean: 34%; (f) between 1 and 2 SDs below the mean: 14%.
3. (a) Between the mean and 2 SDs above the mean: 48%; (b) below 1 SD above the mean: 84%; (c) above 2 SDs below the mean: 98%.
4. (a) 50%: 0; (b) 16%: 1; (c) 84%: –1; (d) 2%: 2.
5. (a) Between the mean and a Z score of 2.14: 48.38%; (b) above 2.14: 1.62%; (c) below 2.14: 98.38%.
6. (a) 20% above you: .84; (b) 80% below you: .84.
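The 50%–34%–14% percentages used in questions 2 through 4 are rounded figures. As a rough check on how close they are, the sketch below computes the exact areas. It assumes Python's standard-library `statistics.NormalDist`, and the helper name `percent_between` is made up for this illustration.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal curve: mean 0, SD 1

def percent_between(z_low, z_high):
    """Percentage of scores between two Z scores on a normal curve."""
    return 100 * (std.cdf(z_high) - std.cdf(z_low))

print(f"mean to +1 SD: {percent_between(0, 1):.2f}%")    # rule of thumb: 34%
print(f"+1 to +2 SD:   {percent_between(1, 2):.2f}%")    # rule of thumb: 14%
print(f"above +2 SD:   {100 * (1 - std.cdf(2)):.2f}%")   # rule of thumb: 2%
```

The exact areas are close to the rounded percentages but not identical, which is why the rough estimate serves as a check on your figuring and not as a final answer.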
Sample and Population
We are going to introduce you to some important ideas by thinking of beans. Suppose you are cooking a pot of beans and taste a spoonful to see if they are done. In this example, the pot of beans is a population, the entire set of things of interest. The spoonful is a sample, the part of the population about which you actually have information. This is shown in Figure 3–15a. Figures 3–15b and 3–15c are other ways of showing the relation of a sample to a population.
population: entire group of people to which a researcher intends the results of a study to apply; larger group to which inferences are made on the basis of the particular set of people (sample) studied.
sample: scores of the particular group of people studied; usually considered to be representative of the scores in some larger population.
In psychology research, we typically study samples not of beans but of individuals to make inferences about some larger group (a population). A sample might consist of the scores of 50 Canadian women who participate in a particular experiment, whereas the population might be intended to be the scores of all Canadian women. In an opinion survey, 1,000 people might be selected from the voting-age population of a particular district and asked for whom they plan to vote. The opinions of these 1,000 people are the sample. The opinions of the larger voting public in that district, to which the pollsters apply their results, are the population (see Figure 3–16).
Why Psychologists Study Samples Instead of Populations
If you want to know something about a population, your results would be most accurate if you could study the entire population rather than a subgroup from it. However, in most research situations this is not practical. More important, the whole point of research usually is to be able to make generalizations or predictions about events beyond your reach. We would not call it scientific research if we tested three particular cars to see which gets better gas mileage, unless we hoped to say something about the gas mileage of those models of cars in general. In other words, a researcher might do an experiment on how people store words in short-term memory using 20 students as the participants in the experiment. But the purpose of the experiment is not to find out how these particular 20 students respond to the experimental versus the control condition. Rather, the purpose is to learn something about human memory under these conditions in general.
The strategy in almost all psychology research is to study a sample of individuals who are believed to be representative of the general population (or of some particular population of interest). More realistically, researchers try to study people who do not differ from the general population in any systematic way that should matter for that topic of research.
The sample is what is studied, and the population is an unknown about which researchers draw conclusions based on the sample. Most of what you learn in the rest of this book is about the important work of drawing conclusions about populations based on information from samples.
Figure 3–15 Populations and samples: (a) The entire pot of beans is the population, and the spoonful is the sample. (b) The entire larger circle is the population, and the circle within it is the sample. (c) The histogram is of the population, and the particular shaded scores make up the sample.
Methods of Sampling
Usually, the ideal method of picking out a sample to study is called random selection. The researcher starts with a complete list of the population and randomly selects some of them to study. An example of random selection is to put each name on a table tennis ball, put all the balls into a big hopper, shake it up, and have a blindfolded person select as many as are needed. (In practice, most researchers use a computer-generated list of random numbers. Just how computers or persons can create a list of truly random numbers is an interesting question in its own right that we examine in Chapter 14, Box 14–1.)
It is important not to confuse truly random selection with what might be called haphazard selection; for example, just taking whoever is available or happens to be first on a list. When using haphazard selection, it is surprisingly easy to pick accidentally a group of people that is really quite different from the population as a whole. Consider a survey of attitudes about your statistics instructor. Suppose you give your questionnaire only to other students sitting near you in class. Such a survey would be affected by all the things that influence where students choose to sit, some of which have to do with the topic of your study: how much students like the instructor or the class. Thus, asking students who sit near you would likely result in opinions more like your own than a truly random sample would.
Figure 3–16 Additional examples of populations and samples: (a) The population is the scores of all Canadian women, and the sample is the scores of the 50 Canadian women studied. (b) The population is the voting preferences of the entire voting-age population, and the sample is the voting preferences of the 1,000 voting-age people who were surveyed.
random selection: method for selecting a sample that uses truly random procedures (usually meaning that each person in the population has an equal chance of being selected); one procedure is for the researcher to begin with a complete list of all the people in the population and select a group of them to study using a table of random numbers.
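The difference between random and haphazard selection can be sketched in a few lines of code. This is only an illustration: the population list is made up, and Python's `random` module produces pseudorandom numbers, which stand in here for the truly random numbers discussed above.

```python
import random

# Hypothetical complete list of the population (the names are made up).
population = [f"person_{i:03d}" for i in range(1, 501)]

rng = random.Random(2009)  # fixed seed only so the draw is repeatable

# Random selection: every person on the list has an equal chance of
# being chosen, and no one can be chosen twice.
random_sample = rng.sample(population, k=20)

# Haphazard selection, for contrast: whoever happens to be first on the list.
haphazard_sample = population[:20]

print(random_sample[:5])
print(haphazard_sample[:5])
```

If the list happens to be ordered by something related to the topic of the study (seating position, sign-up date, and so on), the haphazard sample inherits that ordering, whereas the random sample does not depend on it.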
Unfortunately, it is often impractical or impossible to study a truly random sample. Much of the time, in fact, studies are conducted with whoever is willing or available to be a research participant. At best, as noted, a researcher tries to study a sample that is not systematically unrepresentative of the population in any known way. For example, suppose a study is about a process that is likely to differ for people of different age groups. In this situation, the researcher may attempt to include people of all age groups in the study. Alternatively, the researcher would be careful to draw conclusions only about the age group studied.
Methods of sampling is a complex topic that is discussed in detail in research methods textbooks (also see Box 3–2) and in the research methods Web Chapter W1 (Overview of the Logic and Language of Psychology Research) on the Web site for this book http://www.pearsonhighered.com/.
BOX 3–2 Surveys, Polls, and 1948’s Costly “Free Sample”
It is time to make you a more informed reader of polls in the media. Usually the results of properly done public polls are accompanied, somewhere in fine print, by a statement such as, “From a telephone poll of 1,000 American adults taken on June 4 and 5. Sampling error .” What does a statement like this mean?
The Gallup poll is as good an example as any (Gallup, 1972; see also http://www.gallup.com), and there is no better place to begin than in 1948, when all three of the major polling organizations, Gallup, Crossley (for Hearst papers), and Roper (for Fortune), wrongly predicted Thomas Dewey’s victory over Harry Truman for the U.S. presidency. Yet Gallup’s prediction was based on 50,000 interviews and Roper’s on 15,000. By contrast, to predict George H. W. Bush’s 1988 victory, Gallup used only 4,089. Since 1952, the pollsters have never used more than 8,144, yet with very small error and no outright mistakes. What has changed?
The method used before 1948, and never repeated since, was called “quota sampling.” Interviewers were assigned a fixed number of persons to interview, with strict quotas to fill in all the categories that seemed important, such as residence, sex, age, race, and economic status. Within these specifics, however, they were free to interview whomever they liked. Republicans generally tended to be easier to interview. They were more likely to have telephones and permanent addresses and to live in better houses and better neighborhoods. In 1948, the election was very close, and the Republican bias produced the embarrassing mistake that changed survey methods forever.
Since 1948, all survey organizations have used what is called a “probability method.” Simple random sampling is the purest case of the probability method, but simple random sampling for a survey about a U.S. presidential election would require drawing names from a list of all the eligible voters in the nation, which is a lot of people. Each person selected would have to be found, in diversely scattered locales. So instead, “multistage cluster sampling” is used. The United States is divided into seven size-of-community groupings, from large cities to rural open country; these groupings are divided into seven geographic regions (New England, Middle Atlantic, and so on), after which smaller equal-sized groups are zoned, and then city blocks are drawn from the zones, with the probability of selection being proportional to the size of the population or number of dwelling units. Finally, an interviewer is given a randomly selected starting point on the map and is required to follow a given direction, taking households in sequence.
Actually, telephoning is often the favored method for polling today. Phone surveys cost about one-third of door-to-door polls. Since most people now own phones, this method is less biased than in Truman’s time. Phoning
Statistical Terminology for Samples and Populations
The mean, variance, and standard deviation of a population are called population parameters. A population parameter usually is unknown and can be estimated only from what you know about a sample taken from that population. You do not taste all the beans, just the spoonful. “The beans are done” is an inference about the whole pot.
Population parameters are usually shown as Greek letters (e.g., μ). (This is a statistical convention with origins tracing back more than 2,000 years to the early Greek mathematicians.) The symbol for the mean of a population is μ, the Greek letter mu. The symbol for the variance of a population is σ², and the symbol for its standard deviation is σ, the lowercase Greek letter sigma. You won’t see these symbols often, except while learning statistics. This is because, again, researchers seldom know the population parameters.
The mean, variance, and standard deviation you figure for the scores in a sample are called sample statistics. A sample statistic is figured from known information. Sample statistics are what we have been figuring all along and are expressed with the roman letters you learned in Chapter 2: M, SD², and SD. The population parameter and sample statistic symbols for the mean, variance, and standard deviation are summarized in Table 3–2.
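To make the distinction concrete, the sketch below figures the same three numbers twice: once for a whole (simulated) population and once for a sample drawn from it. Everything here is an assumption made for illustration. The population is generated by the computer, whereas in real research it is usually not available, and the variance is taken to be the sum of squared deviations divided by the number of scores, which is how `statistics.pvariance` figures it.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(3)  # fixed seed only so the run is repeatable

# Hypothetical population of 10,000 IQ-like scores. It is simulated here
# only so its parameters can be shown next to the sample statistics.
population = [rng.gauss(100, 16) for _ in range(10_000)]
sample = rng.sample(population, k=50)

def describe(scores):
    """Return the mean, variance, and standard deviation of the scores."""
    m = fmean(scores)
    var = pvariance(scores, mu=m)  # sum of squared deviations / N
    return m, var, var ** 0.5

mu, sigma2, sigma = describe(population)  # population parameters
M, SD2, SD = describe(sample)             # sample statistics

print(f"population: mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
print(f"sample:     M  = {M:.2f}, SD^2    = {SD2:.2f}, SD    = {SD:.2f}")
```

The sample statistics come out near the population parameters without matching them exactly, which is the situation a researcher is in when only the sample is known.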
The use of different types of symbols for population parameters (Greek letters) and sample statistics (roman letters) can take some getting used to; so don’t worry if it seems tricky at first. It’s important to know that the statistical concepts you are
also allows computers to randomly dial phone numbers and, unlike telephone directories, this method calls unlisted numbers. However, survey organizations in the United States typically do not call cell phone numbers. Thus, U.S. households that use a cell phone for all calls and do not have a home phone are not usually included in telephone opinion polls. Most survey organizations consider the current cell-phone-only rate to be low enough not to cause large biases in poll results (especially since the demographic characteristics of individuals without a home phone suggest that they are less likely to vote than individuals who live in households with a home phone). However, anticipated future increases in the cell-phone-only rate will likely make this an important issue for opinion polls. Survey organizations will need to consider additional polling methods, perhaps using the Internet and email.
Whether by telephone or face to face, there will be about 35% nonrespondents after three attempts. This creates yet another bias, dealt with through questions about how much time a person spends at home, so that a slight extra weight can be given to the responses of those reached but usually at home less, to make up for those missed entirely.
Now you know quite a bit about opinion polls, but we have left two important questions unanswered: Why are only about 1,000 included in a poll meant to describe all U.S. adults, and what does the term sampling error mean? For these answers, you must wait for Chapter 5 (Box 5–1).
Table 3–2 Population Parameters and Sample Statistics
Population Parameter (Usually Unknown)
Sample Statistic (Figured from Known Data)
Basis: Scores of entire population Scores of sample only