• Z Scores
• The Normal Curve
• Sample and Population
• Probability
• Controversies: Is the Normal Curve Really So Normal? and Using Nonrandom Samples
• Z Scores, Normal Curves, Samples and Populations, and Probabilities in Research Articles
• Advanced Topic: Probability Rules and Conditional Probabilities
• Summary
• Key Terms
• Example Worked-Out Problems
• Practice Problems
• Using SPSS
• Chapter Notes
Some Key Ingredients for Inferential Statistics
Z Scores, the Normal Curve, Sample versus Population, and Probability
Before beginning this chapter, be sure you have mastered the material in Chapter 1 on the shapes of distributions and the material in Chapter 2 on the mean and standard deviation.
68 Chapter 3
Z score number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution; it is thus an ordinary score transformed so that it better describes the score’s location in a distribution.
Ordinarily, psychologists conduct research to test a theoretical principle or the effectiveness of a practical procedure. For example, a psychophysiologist might measure changes in heart rate from before to after solving a difficult problem. The measurements are then used to test a theory predicting that heart rate should change following successful problem solving. An applied social psychologist might examine the effectiveness of a program of neighborhood meetings intended to promote water conservation. Such studies are carried out with a particular group of research participants. But researchers use inferential statistics to make more general conclusions about the theoretical principle or procedure being studied. These conclusions go beyond the particular group of research participants studied.
This chapter and Chapters 4, 5, and 6 introduce inferential statistics. In this chapter, we consider four topics: Z scores, the normal curve, sample versus population, and probability. This chapter prepares the way for the next ones, which are more demanding conceptually.
Z Scores
In Chapter 2, you learned how to describe a group of scores in terms of the mean and variation around the mean. In this section you learn how to describe a particular score in terms of where it fits into the overall group of scores. That is, you learn how to use the mean and standard deviation to create a Z score; a Z score describes a score in terms of how much it is above or below the average.
Suppose you are told that a student, Jerome, is asked the question, “To what extent are you a morning person?” Jerome responds with a 5 on a 7-point scale, where 1 = not at all and 7 = extremely. Now suppose that we do not know anything about how other students answer this question. In this situation, it is hard to tell whether Jerome is more or less of a morning person in relation to other students. However, suppose that we know for students in general, the mean rating (M) is 3.40 and the standard deviation (SD) is 1.47. (These values are the actual mean and standard deviation that we found for this question in a large sample of statistics students from eight different universities across the United States and Canada.) With this knowledge, we can see that Jerome is more of a morning person than is typical among students. We can also see that Jerome is above the average (1.60 units more than average; that is, 5 - 3.40 = 1.60) by a bit more than students typically vary from the average (that is, students typically vary by about 1.47, the standard deviation). This is all shown in Figure 3-1.
What Is a Z Score?
A Z score makes use of the mean and standard deviation to describe a particular score. Specifically, a Z score is the number of standard deviations the actual score is above or below the mean. If the actual score is above the mean, the Z score is positive. If the actual score is below the mean, the Z score is negative. The standard deviation now becomes a kind of yardstick, a unit of measure in its own right.
In our example, Jerome has a score of 5, which is 1.60 units above the mean of 3.40. One standard deviation is 1.47 units; so Jerome’s score is a little more than 1 standard deviation above the mean. To be precise, Jerome’s Z score is +1.09 (that is, his score of 5 is 1.09 standard deviations above the mean). Another student, Michelle, has a score of 2. Her score is 1.40 units below the mean. Therefore, her score is a little less than 1 standard deviation below the mean (a Z score of -.95). So, Michelle’s score is below the average by about as much as students typically vary from the average.
Figure 3-1 Score of one student, Jerome, in relation to the overall distribution on the measure of the extent to which students are morning people. (Raw-score axis marked at 1 SD intervals: .46, 1.93, 3.40, 4.87, 6.34; the mean, 3.40, and Jerome’s score are marked.)
Z scores have many practical uses. As you will see later in this chapter, they are especially useful for showing exactly where a particular score falls on the normal curve.
Z Scores as a Scale
Figure 3-2 shows a scale of Z scores lined up against a scale of raw scores for our example of the degree to which students are morning people. A raw score is an ordinary score as opposed to a Z score. The two scales are something like a ruler with inches lined up on one side and centimeters on the other.
raw score ordinary score (or any number in a distribution before it has been made into a Z score or otherwise transformed).
Figure 3-2 Scales of Z scores and raw scores for the example of the extent to which students are morning people.
Z score: -2 -1 0 +1 +2
Raw score: .46 1.93 3.40 4.87 6.34
Changing a number to a Z score is a bit like converting words for measurement in various obscure languages into one language that everyone can understand: inches, cubits, and zingles (we made up that last one), for example, into centimeters. It is a very valuable tool.
Suppose that a developmental psychologist observed 3-year-old David in a laboratory situation playing with other children of the same age. During the observation, the psychologist counted the number of times David spoke to the other children. The result, over several observations, is that David spoke to other children about 8 times per hour of play. Without any standard of comparison, it would be hard to draw any conclusions from this. Let’s assume, however, that it was known from previous research that under similar conditions, the mean number of times children speak is 12, with a standard deviation of 4. With that information, we can see that David spoke less often than other children in general, but not extremely less often. David would have a Z score of -1 (M = 12 and SD = 4, thus a score of 8 is 1 SD below M), as shown in Figure 3-3.
Suppose Ryan was observed speaking to other children 20 times in an hour. Ryan would clearly be unusually talkative, with a Z score of +2 (see Figure 3-3). Ryan speaks not merely more than the average but more by twice as much as children tend to vary from the average!
Figure 3-3 Number of times each hour that two children spoke, shown as raw scores and Z scores.
Z score: -3 -2 -1 0 +1 +2 +3
Times spoken per hour: 0 4 8 12 16 20 24
Formula to Change a Raw Score to a Z Score
A Z score is the number of standard deviations by which the raw score is above or below the mean. To figure a Z score, subtract the mean from the raw score, giving the deviation score. Then divide the deviation score by the standard deviation. The formula is
Z = (X - M)/SD (3-1)
A Z score is the raw score minus the mean, divided by the standard deviation.
The raw score is the Z score multiplied by the standard deviation, plus the mean.
For example, using the formula for David, the child who spoke to other children 8 times in an hour (where the mean number of times children speak is 12 and the standard deviation is 4),
Z = (X - M)/SD = (8 - 12)/4 = -4/4 = -1
Steps to Change a Raw Score to a Z Score
1. Figure the deviation score: subtract the mean from the raw score.
2. Figure the Z score: divide the deviation score by the standard deviation.
Using these steps for David, the child who spoke with other children 8 times in an hour,
1. Figure the deviation score: subtract the mean from the raw score. 8 - 12 = -4.
2. Figure the Z score: divide the deviation score by the standard deviation. -4/4 = -1.
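The two steps for changing a raw score to a Z score can also be sketched in a few lines of Python. This is only an illustration; the function name `z_score` is our own label and does not come from the chapter.

```python
def z_score(raw, mean, sd):
    """Change a raw score to a Z score using the two steps."""
    deviation = raw - mean   # Step 1: figure the deviation score
    return deviation / sd    # Step 2: divide by the standard deviation


# David spoke 8 times per hour; M = 12 and SD = 4.
print(z_score(8, 12, 4))                  # -1.0

# Jerome rated himself 5 as a morning person; M = 3.40 and SD = 1.47.
print(round(z_score(5, 3.40, 1.47), 2))   # 1.09
```

The same function gives Ryan (20 times per hour) a Z score of +2 and Michelle (a rating of 2) a Z score of about -.95.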
Formula to Change a Z Score to a Raw Score
To change a Z score to a raw score, the process is reversed: multiply the Z score by the standard deviation and then add the mean. The formula is
X = (Z)(SD) + M (3-2)
Suppose a child has a Z score of 1.5 on the number of times spoken with another child during an hour. This child is 1.5 standard deviations above the mean. Because the standard deviation in this example is 4 raw score units (times spoken), the child is 6 raw score units above the mean, which is 12. Thus, 6 units above the mean is 18. Using the formula,
X = (Z)(SD) + M = (1.5)(4) + 12 = 6 + 12 = 18
Steps to Change a Z Score to a Raw Score
1. Figure the deviation score: multiply the Z score by the standard deviation.
2. Figure the raw score: add the mean to the deviation score.
Using these steps for the child with a Z score of 1.5 on the number of times spoken with another child during an hour:
1. Figure the deviation score: multiply the Z score by the standard deviation. 1.5 × 4 = 6.
2. Figure the raw score: add the mean to the deviation score. 6 + 12 = 18.
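The reverse conversion can be sketched the same way. Again, this is a minimal illustration, and the function name `raw_score` is our own choice rather than anything defined in the chapter.

```python
def raw_score(z, mean, sd):
    """Change a Z score to a raw score using the two steps."""
    deviation = z * sd       # Step 1: figure the deviation score
    return deviation + mean  # Step 2: add the mean


# A child with a Z score of 1.5 on times spoken per hour (M = 12, SD = 4).
print(raw_score(1.5, 12, 4))   # 18.0
```

Because the two formulas are simply reverses of each other, changing a raw score to a Z score and then back again returns the original raw score (apart from rounding).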
Figure 3-4 Scales of Z scores and raw scores for the example of the extent to which students are morning people, showing the scores of two sample students.
Z score: -2 -1 0 +1 +2
Raw score: .46 1.93 3.40 4.87 6.34
Student 1: raw score of 6.00; Student 2: raw score of 1.00
Additional Examples of Changing Z Scores to Raw Scores and Vice Versa
Consider again the example from the start of the chapter in which students were asked the extent to which they were a morning person. Using a scale from 1 (not at all) to 7 (extremely), the mean was 3.40 and the standard deviation was 1.47. Suppose a student’s raw score is 6. That student is well above the mean. Specifically, using the formula,
Z = (X - M)/SD = (6 - 3.40)/1.47 = 2.60/1.47 = 1.77
That is, the student’s raw score is 1.77 standard deviations above the mean (see Figure 3-4, Student 1). Using the 7-point scale (from 1 = not at all to 7 = extremely), to what extent are you a morning person? Now figure the Z score for your raw score.
Another student has a Z score of -1.63, a score well below the mean. (This student is much less of a morning person than is typically the case for students.) You can find the exact raw score for this student using the formula
X = (Z)(SD) + M = (-1.63)(1.47) + 3.40 = -2.40 + 3.40 = 1.00
That is, the student’s raw score is 1.00 (see Figure 3-4, Student 2).
Figure 3-5 Scales of Z scores and raw scores for 30 statistics students’ ratings of their stress level, showing the scores of two sample students. (Data based on Aron et al., 1995.)
Z score: -3 -2 -1 0 +1 +2 +3
Stress rating: -1.25 1.31 3.87 6.43 8.99 11.55 14.11
Student 1: raw score of 10.00; Student 2: raw score of 2.00
Let’s also consider some examples from the study of students’ stress ratings. The mean stress rating of the 30 statistics students (using a 0-10 scale) was 6.43 (see Figure 2-3), and the standard deviation was 2.56. Figure 3-5 shows the raw score and Z score scales. Suppose a student’s stress raw score is 10. That student is well above the mean. Specifically, using the formula
Z = (X - M)/SD = (10 - 6.43)/2.56 = 3.57/2.56 = 1.39
The student’s stress level is 1.39 standard deviations above the mean (see Figure 3-5, Student 1). On a scale of 0-10, how stressed have you been in the last 2½ weeks? Figure the Z score for your raw stress score.
Another student has a Z score of -1.73, a stress level well below the mean. You can find the exact raw stress score for this student using the formula
X = (Z)(SD) + M = (-1.73)(2.56) + 6.43 = -4.43 + 6.43 = 2.00
That is, the student’s raw stress score is 2.00 (see Figure 3-5, Student 2).
The Mean and Standard Deviation of Z Scores
The mean of any distribution of Z scores is always 0. This is so because when you change each raw score to a Z score, you take the raw score minus the mean. So the mean is subtracted out of all the raw scores, making the overall mean come out to 0. In other words, in any distribution, the sum of the positive Z scores must always equal the sum of the negative Z scores. Thus, when you add them all up, you get 0.
The standard deviation of any distribution of Z scores is always 1. This is because when you change each raw score to a Z score, you divide by the standard deviation.
A Z score is sometimes called a standard score. There are two reasons: Z scores have standard values for the mean and the standard deviation, and, as we saw earlier, Z scores provide a kind of standard scale of measurement for any variable. (However, sometimes the term standard score is used only when the Z scores are for a distribu- tion that follows a normal curve.) 1
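A quick numerical check can make these two facts concrete. The sketch below uses a small set of made-up scores (they are hypothetical, not data from the chapter) and assumes the standard deviation is figured by dividing by N, which is what `statistics.pstdev` does.

```python
from statistics import fmean, pstdev

# Hypothetical raw scores, invented purely for illustration.
scores = [8, 12, 20, 10, 16, 6]

m = fmean(scores)    # mean of the raw scores
sd = pstdev(scores)  # standard deviation, dividing by N (an assumption)

# Change every raw score to a Z score.
z_scores = [(x - m) / sd for x in scores]

# Apart from tiny floating-point error, the mean is 0 and the SD is 1.
print(abs(fmean(z_scores)) < 1e-9)        # True
print(abs(pstdev(z_scores) - 1) < 1e-9)   # True
```

Any other set of scores (with at least some variation) gives the same result, because subtracting the mean always centers the scores at 0 and dividing by the standard deviation always rescales their spread to 1.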
1. How is a Z score related to a raw score?
2. Write the formula for changing a raw score to a Z score, and define each of the symbols.
3. For a particular group of scores, M = 20 and SD = 5. Give the Z score for (a) 30, (b) 15, (c) 20, and (d) 22.5.
4. Write the formula for changing a Z score to a raw score, and define each of the symbols.
5. For a particular group of scores, M = 10 and SD = 2. Give the raw score for a Z score of (a) +2, (b) +.5, (c) 0, and (d) -3.
6. Suppose a person has a Z score for overall health of +2 and a Z score for overall sense of humor of +1. What does it mean to say that this person is healthier than she is funny?
Answers
1. A Z score is the number of standard deviations a raw score is above or below the mean.
2. Z = (X - M)/SD. Z is the Z score; X is the raw score; M is the mean; SD is the standard deviation.
3. (a) Z = (X - M)/SD = (30 - 20)/5 = 10/5 = 2; (b) -1; (c) 0; (d) .5.
4. X = (Z)(SD) + M. X is the raw score; Z is the Z score; SD is the standard deviation; M is the mean.
5. (a) X = (Z)(SD) + M = (2)(2) + 10 = 4 + 10 = 14; (b) 11; (c) 10; (d) 4.
6. This person is more above the average in health (in terms of how much people typically vary from average in health) than she is above the average in humor (in terms of how much people typically vary from average in humor).
The Normal Curve
As noted in Chapter 1, the graphs of the distributions of many of the variables that psychologists study follow a unimodal, roughly symmetrical, bell-shaped curve. These bell-shaped smooth histograms approximate a precise and important mathematical distribution called the normal distribution, or, more simply, the normal curve.2 The normal curve is a mathematical (or theoretical) distribution. Researchers often compare the actual distributions of the variables they are studying (that is, the distributions they find in research studies) to the normal curve. They don’t expect the distributions of their variables to match the normal curve perfectly (since the normal curve is a theoretical distribution), but researchers often check whether their variables approximately follow a normal curve. (The normal curve or normal distribution is also often called a Gaussian distribution after the astronomer Karl Friedrich Gauss. However, if its discovery can be attributed to anyone, it should really be to Abraham de Moivre; see Box 3-1.) An example of the normal curve is shown in Figure 3-6.
Why the Normal Curve Is So Common in Nature
Take, for example, the number of different letters a particular person can remember accurately on various testings (with different random letters each time). On some testings the number of letters remembered may be high, on others low, and on most somewhere in between. That is, the number of different letters a person can recall on various testings probably approximately follows a normal curve. Suppose that the person has a basic ability to recall, say, seven letters in this kind of memory task. Nevertheless, on any particular testing, the actual number recalled will be affected by various influences: noisiness of the room, the person’s mood at the moment, a combination of random letters confused with a familiar name, and so on.
These various influences add up to make the person recall more than seven on some testings and less than seven on others. However, the particular combination of such influences that come up at any testing is essentially random; thus, on most testings, positive and negative influences should cancel out. The chances are not very good of all the negative influences happening to come together on a testing when none of the positive influences show up. Thus, in general, the person remembers a middle amount, an amount in which all the opposing influences cancel each other out. Very high or very low scores are much less common.
normal distribution frequency distribution that follows a normal curve.
normal curve specific, mathematically defined, bell-shaped frequency distribution that is symmetrical and unimodal; distributions observed in nature and in research commonly approximate it.
Figure 3-6 A normal curve.
BOX 3-1 de Moivre, the Eccentric Stranger Who Invented the Normal Curve
The normal curve is central to statistics and is the foundation of most statistical theories and procedures. If any one person can be said to have discovered this fundamental of the field, it was Abraham de Moivre. He was a French Protestant who came to England at the age of 21 because of religious persecution in France, which in 1685 denied Protestants all their civil liberties. In England, de Moivre became a friend of Isaac Newton, who was supposed to have often answered questions by saying, “Ask Mr. de Moivre; he knows all that better than I do.” Yet because he was a foreigner, de Moivre was never able to rise to the same heights of fame as the British-born mathematicians who respected him so greatly.
Abraham de Moivre was mainly an expert on chance. In 1733, he wrote a “method of approximating the sum of the terms of the binomial expanded into a series.” His paper essentially described the normal curve. The description was only in the form of a law, however; de Moivre never actually drew the curve itself. In fact, he was not very interested in it.
Credit for discovering the normal curve is often given to Pierre Laplace, a Frenchman who stayed home; or Karl Friedrich Gauss, a German; or Thomas Simpson, an Englishman. All worked on the problem of the distribution of errors around a mean, even going so far as describing the curve or drawing approximations of it. But even without drawing it, de Moivre was the first to compute the areas under the normal curve at 1, 2, and 3 standard deviations, and Karl Pearson (discussed in Chapter 13, Box 13-1), a distinguished later statistician, felt strongly that de Moivre was the true discoverer of this important concept.
In England, de Moivre was highly esteemed as a man of letters as well as of numbers, being familiar with all the classics and able to recite whole scenes from his beloved Moliere’s Misanthropist. But for all his feelings for his native France, the French Academy elected him a foreign member of the Academy of Sciences just before his death. In England, he was ineligible for a university position because he was a foreigner there as well. He remained in poverty, unable even to marry. In his earlier years, he worked as a traveling teacher of mathematics. Later, he was famous for his daily sittings in Slaughter’s Coffee House in Long Acre, making himself available to gamblers and insurance underwriters (two professions equally uncertain and hazardous before statistics were refined), who paid him a small sum for figuring odds for them.
De Moivre’s unusual death generated several legends. He worked a great deal with infinite series, which always converge to a certain limit. One story has it that de Moivre began sleeping 15 more minutes each night until he was asleep all the time, then died. Another version claims that his work at the coffeehouse drove him to such despair that he simply went to sleep until he died. At any rate, in his 80s he could stay awake only four hours a day, although he was said to be as keenly intellectual in those hours as ever. Then his wakefulness was reduced to 1 hour, then none at all. At the age of 87, after eight days in bed, he failed to wake and was declared dead from “somnolence” (sleepiness).
Sources: Pearson (1978); Tankard (1984).
This creates a unimodal distribution with most of the scores near the middle and fewer at the extremes. It also creates a distribution that is symmetrical, because the number of letters recalled is as likely to be above as below the middle. Being a unimodal symmetrical curve does not guarantee that it will be a normal curve; it could be too flat or too pointed. However, it can be shown mathematically that in the long run, if the influences are truly random, and the number of different influences being combined is large, a precise normal curve will result. Mathematical statisticians call this principle the central limit theorem. We have more to say about this principle in Chapter 5.
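The idea that many small random influences add up to something close to a normal curve can be explored with a short simulation. The sketch below is only an illustration under assumptions of our own: the basic ability of seven letters comes from the example above, but the number of influences (30), the size of each influence (anywhere from -0.5 to +0.5), and the number of simulated testings (20,000) are arbitrary choices, and the simulated scores are allowed to be fractional rather than whole numbers of letters.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the sketch gives the same result each run


def letters_recalled(basic_ability=7.0, n_influences=30):
    """One testing: basic ability plus many small random influences."""
    influences = (random.uniform(-0.5, 0.5) for _ in range(n_influences))
    return basic_ability + sum(influences)


scores = [letters_recalled() for _ in range(20000)]
m = fmean(scores)
sd = pstdev(scores)

# Proportion of simulated testings within 1 and within 2 SDs of the mean.
within_1 = sum(abs(x - m) <= sd for x in scores) / len(scores)
within_2 = sum(abs(x - m) <= 2 * sd for x in scores) / len(scores)

print(f"mean = {m:.2f}, SD = {sd:.2f}")
print(f"within 1 SD: {within_1:.1%}, within 2 SDs: {within_2:.1%}")
```

If the simulated scores approximately follow a normal curve, roughly 68% of them should fall within 1 standard deviation of the mean and roughly 95% within 2 standard deviations.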
The Normal Curve and the Percentage of Scores Between the Mean and 1 and 2 Standard Deviations from the Mean
The shape of the normal curve is standard. Thus, there is a known percentage of scores above or below any particular point. For example, exactly 50% of the scores in a normal curve are below the mean, because in any symmetrical distribution half the scores are below the mean. More interestingly, as shown in Figure 3-7, approximately 34% of the scores are always between the mean and 1 standard deviation from the mean.
Figure 3-7 Normal curve with approximate percentages of scores between the mean and 1 and 2 standard deviations above and below the mean.
Z Scores: -3 -2 -1 0 +1 +2 +3
Consider IQ scores. On many widely used intelligence tests, the mean IQ is 100, the standard deviation is 16, and the distribution of IQs is roughly a normal curve (see Figure 3-8). Knowing about the normal curve and the percentage of scores between the mean and 1 standard deviation above the mean tells you that about 34% of people have IQs between 100, the mean IQ, and 116, the IQ score that is 1 standard deviation above the mean. Similarly, because the normal curve is symmetrical, about 34% of people have IQs between 100 and 84 (the score that is 1 standard deviation below the mean), and 68% (34% + 34%) have IQs between 84 and 116.
There are many fewer scores between 1 and 2 standard deviations from the mean than there are between the mean and 1 standard deviation from the mean. It turns out that about 14% of the scores are between 1 and 2 standard deviations above the mean (see Figure 3-7). (Similarly, about 14% of the scores are between 1 and 2 standard deviations below the mean.) Thus, about 14% of people have IQs between 116 (1 standard deviation above the mean) and 132 (2 standard deviations above the mean).
Figure 3-8 Distribution of IQ scores on many standard intelligence tests (with a mean of 100 and a standard deviation of 16). (IQ axis: 68, 84, 100, 116, 132.)
Remember that negative Z scores are scores below the mean and positive Z scores are scores above the mean.
normal curve table table showing percentages of scores associated with the normal curve; the table usually includes percentages of scores between the mean and various numbers of standard deviations above the mean and percentages of scores more positive than various numbers of standard deviations above the mean.
You will find it very useful to remember the 34% and 14% figures. These figures tell you the percentages of people above and below any particular score whenever you know that score’s number of standard deviations above or below the mean. You can also reverse this approach and figure out a person’s number of standard deviations from the mean from a percentage. Suppose you are told that a person scored in the top 2% on a test. Assuming that scores on the test are approximately normally distributed, the person must have a score that is at least 2 standard deviations above the mean. This is because a total of 50% of the scores are above the mean, but 34% are between the mean and 1 standard deviation above the mean, and another 14% are between 1 and 2 standard deviations above the mean. That leaves 2% of scores (that is, 50% - 34% - 14% = 2%) that are 2 standard deviations or more above the mean.
Similarly, suppose you were selecting animals for a study and needed to consider their visual acuity. Suppose also that visual acuity was normally distributed and you wanted to use animals in the middle two-thirds (a figure close to 68%) for visual acuity. In this situation, you would select animals that scored between 1 standard deviation above and 1 standard deviation below the mean. (That is, about 34% are between the mean and 1 standard deviation above the mean and another 34% are between the mean and 1 standard deviation below the mean.) Also, remember that a Z score is the number of standard deviations that a score is above or below the mean, which is just what we are talking about here. Thus, if you knew the mean and the standard deviation of the visual acuity test, you could figure out the raw scores (the actual level of visual acuity) for being 1 standard deviation below and 1 standard deviation above the mean (that is, Z scores of -1 and +1). You would do this using the methods of changing raw scores to Z scores and vice versa that you learned earlier in this chapter, which are Z = (X - M)/SD and X = (Z)(SD) + M.
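The 50%, 34%, and 14% figures are rounded versions of areas under the normal curve, and you can see where they come from with a short check. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # normal curve with mean 0 and SD 1 (Z scores)

# Proportion of scores between the mean and 1 SD above the mean.
mean_to_1 = std_normal.cdf(1) - std_normal.cdf(0)
# Proportion of scores between 1 and 2 SDs above the mean.
one_to_2 = std_normal.cdf(2) - std_normal.cdf(1)
# Proportion of scores more than 2 SDs above the mean.
above_2 = 1 - std_normal.cdf(2)

print(round(100 * mean_to_1, 2))  # 34.13
print(round(100 * one_to_2, 2))   # 13.59
print(round(100 * above_2, 2))    # 2.28
```

Rounded to whole percentages, these are the 34%, 14%, and 2% figures used in the text; the three areas together make up the 50% of scores above the mean.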
The Normal Curve Table and Z Scores
The 50%, 34%, and 14% figures are important practical rules for working with a group of scores that follow a normal distribution. However, in many research and applied situations, psychologists need more accurate information. Because the normal curve is a precise mathematical curve, you can figure the exact percentage of scores between any two points on the normal curve (not just those that happen to be right at 1 or 2 standard deviations from the mean). For example, exactly 68.59% of scores have a Z score between +.62 and -1.68; exactly 2.81% of scores have a Z score between +.79 and +.89; and so forth.
You can figure these percentages using calculus, based on the formula for the normal curve. However, you can also do this much more simply (which you are probably glad to know!). Statisticians have worked out tables for the normal curve that give the percentage of scores between the mean (a Z score of 0) and any other Z score (as well as the percentage of scores in the tail for any Z score).
We have included a normal curve table in the Appendix (Table A-1, pp. 664–667). Table 3-1 shows the first part of the full table. The first column in the table lists the Z score. The second column, labeled “% Mean to Z,” gives the percentage of scores between the mean and that Z score. The shaded area in the curve at the top of the column gives a visual reminder of the meaning of the percentages in the column. The third column, labeled “% in Tail,” gives the percentage of scores in the tail for that Z score. The shaded tail area in the curve at the top of the column shows the meaning of the percentages in the column. Notice that the table lists only positive Z scores. This is because the normal curve is perfectly symmetrical. Thus, the percentage of scores between the mean and, say, a Z of +.98 (which is 33.65%) is exactly the same as the percentage of scores between the mean and a Z of -.98 (again 33.65%); and the percentage of scores in the tail for a Z score of +1.77 (3.84%) is the same as the percentage of scores in the tail for a Z score of -1.77 (again, 3.84%). Notice that for each Z score, the “% Mean to Z” value and the “% in Tail” value sum to 50.00. This is because exactly 50% of the scores are above the mean for a normal curve. For example, for the Z score of .57, the “% Mean to Z” value is 21.57% and the “% in Tail” value is 28.43%, and 21.57% + 28.43% = 50.00%.
Table 3-1 Normal Curve Areas: Percentage of the Normal Curve Between the Mean and the Scores Shown and Percentage of Scores in the Tail for the Z Scores Shown (First part of table only: full table is Table A-1 in the Appendix. Highlighted values, marked with an asterisk, are examples from the text.)
Z      % Mean to Z   % in Tail     Z      % Mean to Z   % in Tail
.00       .00          50.00       .45      17.36         32.64
.01       .40          49.60       .46      17.72         32.28
.02       .80          49.20       .47      18.08         31.92
.03      1.20          48.80       .48      18.44         31.56
.04      1.60          48.40       .49      18.79         31.21
.05      1.99          48.01       .50      19.15         30.85
.06      2.39          47.61       .51      19.50         30.50
.07      2.79          47.21       .52*     19.85         30.15*
.08      3.19          46.81       .53      20.19         29.81
.09      3.59          46.41       .54      20.54         29.46
.10      3.98          46.02       .55      20.88         29.12
.11      4.38          45.62       .56      21.23         28.77
.12      4.78          45.22       .57      21.57         28.43
.13      5.17          44.83       .58      21.90         28.10
.14      5.57          44.43       .59      22.24         27.76
.15      5.96          44.04       .60      22.57         27.43
.16      6.36          43.64       .61      22.91         27.09
.17      6.75          43.25       .62      23.24         26.76
.18      7.14          42.86       .63      23.57         26.43
.19      7.53          42.47       .64*     23.89*        26.11
.20      7.93          42.07       .65      24.22         25.78
.21      8.32          41.68       .66      24.54         25.46
Suppose you want to know the percentage of scores between the mean and a Z score of .64. You just look up .64 in the “Z” column of the table and the “% Mean to Z” column tells you that 23.89% of the scores in a normal curve are between the mean and this Z score. These values are highlighted in Table 3-1.
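If you are curious how the entries in a normal curve table can be figured, the following sketch reproduces a few of them with Python's standard-library `statistics.NormalDist`. The function name `table_row` is our own, and this is just one way to get the values, not necessarily how the printed table was produced.

```python
from statistics import NormalDist


def table_row(z):
    """Return (% Mean to Z, % in Tail) for a positive Z score."""
    below = NormalDist().cdf(z)                # proportion of scores below Z
    mean_to_z = round(100 * (below - 0.5), 2)  # area from the mean up to Z
    in_tail = round(100 * (1 - below), 2)      # area beyond Z
    return mean_to_z, in_tail


for z in (0.57, 0.64, 0.98, 1.77):
    print(z, table_row(z))
# 0.57 (21.57, 28.43)
# 0.64 (23.89, 26.11)
# 0.98 (33.65, 16.35)
# 1.77 (46.16, 3.84)
```

For each Z score the two values add up to 50.00, just as in the table.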
You can also reverse the process and use the table to find the Z score for a particular percentage of scores. For example, imagine that 30% of ninth-grade students had a creativity score higher than Janice’s. Assuming that creativity scores follow a normal curve, you can figure out her Z score as follows: if 30% of students scored higher than she did, then 30% of the scores are in the tail above her score. This is shown in Figure 3-9. So, you would look at the “% in Tail” column of the table until you found the percentage that was closest to 30%. In this example, the closest is 30.15%. Finally, look at the “Z” column to the left of this percentage, which lists a Z score of .52 (these values of 30.15% and .52 are highlighted in Table 3-1). Thus, Janice’s Z score for her level of creativity is .52. If you know the mean and standard deviation for ninth-grade students’ creativity scores, you can figure out Janice’s actual raw score on the test by changing her Z score of .52 to a raw score using the usual formula, X = (Z)(SD) + M.
Notice that the table repeats the basic three columns twice on the page. Be sure to look across to the columns you need.
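Going from a percentage back to a Z score can also be sketched in code. The example below uses the inverse of the normal curve's cumulative distribution (`inv_cdf` in Python's standard-library `statistics.NormalDist`). The function name is our own, and the creativity-test mean of 50 and standard deviation of 10 are hypothetical values chosen only to show the last step; the chapter does not give them.

```python
from statistics import NormalDist


def z_for_percent_in_tail(percent_in_tail):
    """Z score that has the given percentage of scores above it."""
    proportion_below = 1 - percent_in_tail / 100
    return NormalDist().inv_cdf(proportion_below)


# 30% of students scored higher than Janice.
z = round(z_for_percent_in_tail(30), 2)
print(z)  # 0.52

# Hypothetical mean and SD, assumed only for illustration.
M, SD = 50, 10
print(round(z * SD + M, 1))  # raw score from X = (Z)(SD) + M
```

The result matches the Z score of .52 found by looking for the percentage closest to 30% in the “% in Tail” column.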
Figure 3-9 Distribution of creativity test scores showing area for top 30% of scores and Z score where this area begins. (Z score axis: 0, .52, 1, 2.)
Steps for Figuring the Percentage of Scores Above or Below a Particular Raw Score or Z Score Using the Normal Curve Table
Here are the five steps for figuring the percentage of scores.
1. If you are beginning with a raw score, first change it to a Z score. Use the usual formula, Z = (X - M)/SD.
2. Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. (When marking where the Z score falls on the normal curve, be sure to put it in the right place above or below the mean according to whether it is a positive or negative Z score.)
3. Make a rough estimate of the shaded area’s percentage based on the 50%-34%-14% percentages. You don’t need to be very exact; it is enough just to estimate a range in which the shaded area has to fall, figuring it is between two particular whole Z scores. This rough estimate step is designed not only to help you avoid errors (by providing a check for your figuring), but also to help you develop an intuitive sense of how the normal curve works.
4. Find the exact percentage using the normal curve table, adding 50% if necessary. Look up the Z score in the “Z” column of Table A-1 and find the percentage in the “% Mean to Z” column or “% in Tail” column next to it. If you want the percentage of scores between the mean and this Z score, or if you want the percentage of scores in the tail for this Z score, the percentage in the table is your final answer. However, sometimes you need to add 50% to the percentage in the table. You need to do this if the Z score is positive and you want the total percentage below this Z score, or if the Z score is negative and you want the total percentage above this Z score. However, you don’t need to memorize these rules; it is much easier to make a picture for the problem and reason out whether the percentage you have from the table is correct as is or if you need to add 50%.
5. Check that your exact percentage is within the range of your rough estimate from Step 3.
Examples
Here are two examples using IQ scores where M = 100 and SD = 16.
Example 1: If a person has an IQ of 125, what percentage of people have higher IQs?
Figure 3-10 Distribution of IQ scores showing percentage of scores above an IQ score of 125 (shaded area). [Axes: IQ scores 68, 84, 100, 116, 125, 132; Z scores −2, −1, 0, +1, +1.56, +2.]
1. If you are beginning with a raw score, first change it to a Z score. Using the usual formula, Z = (X − M)/SD, Z = (125 − 100)/16 = +1.56.
2. Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. This is shown in Figure 3-10 (along with the exact percentages figured later).
3. Make a rough estimate of the shaded area’s percentage based on the 50%-34%-14% percentages. If the shaded area started at a Z score of 1, it would have 16% above it. If it started at a Z score of 2, it would have only 2% above it. So, with a Z score of 1.56, the percentage of scores above it has to be somewhere between 16% and 2%.
4. Find the exact percentage using the normal curve table, adding 50% if necessary. In Table A-1, 1.56 in the “Z” column goes with 5.94 in the “% in Tail” column. Thus, 5.94% of people have IQ scores higher than 125. This is the answer to our problem. (There is no need to add 50% to the percentage.)
5. Check that your exact percentage is within the range of your rough estimate from Step 3. Our result, 5.94%, is within the 16-to-2% range we estimated.
Example 2: If a person has an IQ of 95, what percentage of people have higher IQs?
1. If you are beginning with a raw score, first change it to a Z score. Using the usual formula, Z = (95 − 100)/16 = −.31.
2. Draw a picture of the normal curve, where the Z score falls on it, and shade in the area for which you are finding the percentage. This is shown in Figure 3-11 (along with the percentages figured later).
3. Make a rough estimate of the shaded area’s percentage based on the 50%-34%-14% percentages. You know that 34% of the scores are between the mean and a Z score of −1. Also, 50% of the curve is above the mean. Thus, the Z score of −.31 has to have between 50% and 84% of scores above it.
4. Find the exact percentage using the normal curve table, adding 50% if necessary. The table shows that 12.17% of scores are between the mean and a Z score of .31. Thus, the percentage of scores above a Z score of −.31 is the 12.17% between the Z score and the mean plus the 50% above the mean, which is 62.17%.
5. Check that your exact percentage is within the range of your rough estimate from Step 3. Our result of 62.17% is within the 50-to-84% range we estimated.
Figure 3-11 Distribution of IQ scores showing percentage of scores above an IQ score of 95 (shaded area). [Axes: IQ scores 68, 84, 95, 100, 116, 132; Z scores −2, −1, −.31, 0, +1, +2.]
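The two IQ examples can be checked with a short Python sketch. It assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of the printed table; the helper names are made up for illustration. The Z scores passed in are the two-decimal values used in the examples, so the results line up with the table. Using the unrounded Z scores (1.5625 and −.3125) would give slightly different percentages.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Change a raw score to a Z score: Z = (X - M)/SD."""
    return (raw - mean) / sd

def percent_above_z(z):
    """Percentage of a normal curve that lies above the given Z score."""
    return (1 - NormalDist().cdf(z)) * 100

print(z_score(125, 100, 16))            # unrounded Z for Example 1
print(z_score(95, 100, 16))             # unrounded Z for Example 2

# Percentages for the two-decimal Z scores used in the examples.
print(round(percent_above_z(1.56), 2))  # Example 1: percentage above Z = +1.56
print(round(percent_above_z(-0.31), 2)) # Example 2: percentage above Z = -.31
```

Because the function works directly with the area above a Z score, there is no separate "add 50%" step; that step exists only because the printed table lists areas for positive Z scores.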
Figuring Z Scores and Raw Scores from Percentages Using the Normal Curve Table
Going from a percentage to a Z score or raw score is similar to going from a Z score or raw score to a percentage. However, you reverse the procedure when figuring the exact percentage. Also, any necessary changes from a Z score to a raw score are done at the end.
Here are the five steps.
1. Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%-34%-14% percentages.
2. Make a rough estimate of the Z score where the shaded area stops.
3. Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Looking at your picture, figure out either the percentage in the shaded tail or the percentage between the mean and where the shading stops. For example, if your percentage is the bottom 35%, then the percentage in the shaded tail is 35%. Figuring the percentage between the mean and where the shading stops will sometimes involve subtracting 50% from the percentage in the problem. For example, if your percentage is the top 72%, then the percentage from the mean to where that shading stops is 22% (72% − 50% = 22%).
Once you have the percentage, look up the closest percentage in the appropriate column of the normal curve table (“% Mean to Z” or “% in Tail”) and find the Z score for that percentage. That Z will be your answer, except that it may be negative. The best way to tell if it is positive or negative is by looking at your picture.
4. Check that your exact Z score is within the range of your rough estimate from Step 2.
5. If you want to find a raw score, change it from the Z score. Use the usual formula, X = (Z)(SD) + M.
Examples
Here are three examples. Once again, we use IQ for our examples, with M = 100 and SD = 16.
Example 1: What IQ score would a person need to be in the top 5%?
1. Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%-34%-14% percentages. We want the top 5%. Thus, the shading has to begin above (to the right of) 1 SD (there are 16% of scores above 1 SD). However, it cannot start above 2 SD because only 2% of all the scores are above 2 SD. But 5% is a lot closer to 2% than to 16%. Thus, you would start shading a small way to the left of the 2 SD point. This is shown in Figure 3-12.
2. Make a rough estimate of the Z score where the shaded area stops. The Z score is between +1 and +2.
3. Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). We want the top 5%; so we can use the “% in Tail” column of the normal curve table. Looking in that column, the closest percentage to 5% is 5.05% (or you could use 4.95%). This goes with a Z score of 1.64 in the “Z” column.
4. Check that your exact Z score is within the range of your rough estimate from Step 2. As we estimated, +1.64 is between +1 and +2 (and closer to 2).
5. If you want to find a raw score, change it from the Z score. Using the formula, X = (Z)(SD) + M = (1.64)(16) + 100 = 126.24. In sum, to be in the top 5%, a person would need an IQ of at least 126.24.
Figure 3-12 Finding the Z score and IQ raw score for where the top 5% of scores start. [Axes: IQ scores 68, 84, 100, 116, 126.24, 132; Z scores −2, −1, 0, +1, +1.64, +2.]
Example 2: What IQ score would a person need to be in the top 55%?
1. Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%-34%-14% percentages. You want the top 55%. There are 50% of scores above the mean. So, the shading has to begin below (to the left of) the mean. There are 34% of scores between the mean and 1 SD below the mean; so the score is between the mean and 1 SD below the mean. You would shade the area to the right of that point. This is shown in Figure 3-13.
2. Make a rough estimate of the Z score where the shaded area stops. The Z score has to be between 0 and −1.
3. Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Being in the top 55% means that 5% of people have IQs between this IQ and the mean (that is, 55% − 50% = 5%). In the normal curve table, the closest percentage to 5% in the “% Mean to Z” column is 5.17%, which goes with a Z score of .13. Because you are below the mean, this becomes −.13.
4. Check that your exact Z score is within the range of your rough estimate from Step 2. As we estimated, −.13 is between 0 and −1.
5. If you want to find a raw score, change it from the Z score. Using the usual formula, X = (−.13)(16) + 100 = 97.92. So, to be in the top 55% on IQ, a person needs an IQ score of 97.92 or higher.
Figure 3-13 Finding the IQ score for where the top 55% of scores start. [Axes: IQ scores 68, 84, 97.92, 100, 116, 132; Z scores −2, −1, −.13, 0, +1, +2.]
Example 3: What range of IQ scores includes the 95% of people in the middle range of IQ scores?
This kind of problem, finding the middle percentage, may seem odd. However, it is actually a very common situation used in procedures you will learn in later chapters.
Think of this kind of problem in terms of finding the scores that go with the upper and lower ends of this percentage. Thus, in this example, you are trying to find the points where the bottom 2.5% ends and the top 2.5% begins (which, out of 100%, leaves the middle 95%).
1. Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%-34%-14% percentages. Let’s start where the top 2.5% begins. This point has to be higher than 1 SD (16% of scores are higher than 1 SD). However, it cannot start above 2 SD because there are only 2% of scores above 2 SD. But 2.5% is very close to 2%. Thus, the top 2.5% starts just to the left of the 2 SD point. Similarly, the point where the bottom 2.5% comes in is just to the right of −2 SD. The result of all this is that we will shade in two tail areas on the curve: one starting just above −2 SD and the other starting just below +2 SD. This is shown in Figure 3-14.
2. Make a rough estimate of the Z score where the shaded area stops. You can see from the picture that the Z score for where the shaded area stops above the mean is just below +2. Similarly, the Z score for where the shaded area stops below the mean is just above −2.
3. Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Being in the top 2.5% means that 2.5% of the IQ scores are in the upper tail. In the normal curve table, the closest percentage to 2.5% in the “% in Tail” column is exactly 2.50%, which goes with a Z score of +1.96. The normal curve is symmetrical. Thus, the Z score for the lower tail is −1.96.
4. Check that your exact Z score is within the range of your rough estimate from Step 2. As we estimated, +1.96 is between +1 and +2 and is very close to +2, and −1.96 is between −1 and −2 and is very close to −2.
5. If you want to find a raw score, change it from the Z score. For the high end, using the usual formula, X = (1.96)(16) + 100 = 131.36. For the low end, X = (−1.96)(16) + 100 = 68.64. In sum, the middle 95% of IQ scores run from 68.64 to 131.36.
Figure 3-14 Finding the IQ scores for where the middle 95% of scores begins and ends. [Axes: IQ scores 68, 84, 100, 116, 132; Z scores −2, −1, 0, +1, +2; cutoffs marked at Z = −1.96 and +1.96, with 2.5% of scores in each tail.]
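The three percentage-to-raw-score examples above can be sketched in Python as a check on the table work. This is a minimal sketch, assuming Python 3.8 or later (`statistics.NormalDist`); the function names are hypothetical. Because the inverse normal function is not limited to two-decimal Z scores, the first two results differ slightly from the table-based answers (about 126.32 rather than 126.24, and about 97.99 rather than 97.92).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def cutoff_for_top(percent_top, mean, sd):
    """Return (Z, raw score) at which the top `percent_top` percent begins."""
    z = STANDARD_NORMAL.inv_cdf(1 - percent_top / 100)
    return z, z * sd + mean      # X = (Z)(SD) + M

def middle_range(percent_middle, mean, sd):
    """Return (low, high) raw scores that enclose the middle percentage."""
    tail = (100 - percent_middle) / 2          # e.g., 2.5% in each tail
    z_high, x_high = cutoff_for_top(tail, mean, sd)
    x_low = -z_high * sd + mean                # the normal curve is symmetrical
    return x_low, x_high

print(cutoff_for_top(5, 100, 16))    # Example 1: top 5%
print(cutoff_for_top(55, 100, 16))   # Example 2: top 55%
print(middle_range(95, 100, 16))     # Example 3: middle 95%
```

Drawing the picture and making the rough estimate remain useful even with software, since they catch sign errors such as reporting +.13 instead of −.13 for the top 55%.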
How are you doing?
1. Why is the normal curve (or at least a curve that is symmetrical and unimodal) so common in nature?
2. Without using a normal curve table, about what percentage of scores on a normal curve are (a) above the mean, (b) between the mean and 1 SD above the mean, (c) between 1 and 2 SDs above the mean, (d) below the mean, (e) between the mean and 1 SD below the mean, and (f) between 1 and 2 SDs below the mean?
3. Without using a normal curve table, about what percentage of scores on a normal curve are (a) between the mean and 2 SDs above the mean, (b) below 1 SD above the mean, (c) above 2 SDs below the mean?
4. Without using a normal curve table, about what Z score would a person have who is at the start of the top (a) 50%, (b) 16%, (c) 84%, (d) 2%?
5. Using the normal curve table, what percentage of scores are (a) between the mean and a Z score of 2.14, (b) above 2.14, (c) below 2.14?
6. Using the normal curve table, what Z score would you have if (a) 20% are above you and (b) 80% are below you?
Answers
1. It is common because any particular score is the result of the random combination of many effects, some of which make the score larger and some of which make the score smaller. Thus, on average these effects balance out near the middle, with relatively few at each extreme, because it is unlikely for most of the increasing and decreasing effects to come out in the same direction.
2. (a) Above the mean: 50%; (b) between the mean and 1 SD above the mean: 34%; (c) between 1 and 2 SDs above the mean: 14%; (d) below the mean: 50%; (e) between the mean and 1 SD below the mean: 34%; (f) between 1 and 2 SDs below the mean: 14%.
3. (a) Between the mean and 2 SDs above the mean: 48%; (b) below 1 SD above the mean: 84%; (c) above 2 SDs below the mean: 98%.
4. (a) 50%: 0; (b) 16%: 1; (c) 84%: −1; (d) 2%: 2.
5. (a) Between the mean and a Z score of 2.14: 48.38%; (b) above 2.14: 1.62%; (c) below 2.14: 98.38%.
6. (a) 20% above you: .84; (b) 80% below you: .84.
Sample and Population
We are going to introduce you to some important ideas by thinking of beans. Suppose you are cooking a pot of beans and taste a spoonful to see if they are done. In this example, the pot of beans is a population, the entire set of things of interest. The spoonful is a sample, the part of the population about which you actually have information. This is shown in Figure 3-15a. Figures 3-15b and 3-15c are other ways of showing the relation of a sample to a population.
Figure 3-15 Populations and samples: (a) The entire pot of beans is the population, and the spoonful is the sample. (b) The entire larger circle is the population, and the circle within it is the sample. (c) The histogram is of the population, and the particular shaded scores make up the sample.
population: entire group of people to which a researcher intends the results of a study to apply; larger group to which inferences are made on the basis of the particular set of people (sample) studied.
sample: scores of the particular group of people studied; usually considered to be representative of the scores in some larger population.
In psychology research, we typically study samples not of beans but of individuals to make inferences about some larger group (a population). A sample might consist of the scores of 50 Canadian women who participate in a particular experiment, whereas the population might be intended to be the scores of all Canadian women. In an opinion survey, 1,000 people might be selected from the voting-age population of a particular district and asked for whom they plan to vote. The opinions of these 1,000 people are the sample. The opinions of the larger voting public in that district, to which the pollsters apply their results, are the population (see Figure 3-16).
Why Psychologists Study Samples Instead of Populations
If you want to know something about a population, your results would be most accurate if you could study the entire population rather than a subgroup from it. However, in most research situations this is not practical. More important, the whole point of research usually is to be able to make generalizations or predictions about events beyond your reach. We would not call it scientific research if we tested three particular cars to see which gets better gas mileage, unless we hoped to say something about the gas mileage of those models of cars in general. Similarly, a researcher might do an experiment on how people store words in short-term memory using 20 students as the participants in the experiment. But the purpose of the experiment is not to find out how these particular 20 students respond to the experimental versus the control condition. Rather, the purpose is to learn something about human memory under these conditions in general.
The strategy in almost all psychology research is to study a sample of individuals who are believed to be representative of the general population (or of some particular population of interest). More realistically, researchers try to study people who do not differ from the general population in any systematic way that should matter for that topic of research.
The sample is what is studied, and the population is an unknown about which researchers draw conclusions based on the sample. Most of what you learn in the rest of this book is about the important work of drawing conclusions about populations based on information from samples.
Figure 3-16 Additional examples of populations and samples: (a) The population is the scores of all Canadian women, and the sample is the scores of the 50 Canadian women studied. (b) The population is the voting preferences of the entire voting-age population, and the sample is the voting preferences of the 1,000 voting-age people who were surveyed.
Methods of Sampling
Usually, the ideal method of picking out a sample to study is called random selection. The researcher starts with a complete list of the population and randomly selects some of them to study. An example of random selection is to put each name on a table tennis ball, put all the balls into a big hopper, shake it up, and have a blindfolded person select as many as are needed. (In practice, most researchers use a computer-generated list of random numbers. Just how computers or persons can create a list of truly random numbers is an interesting question in its own right that we examine in Chapter 14, Box 14-1.)
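The hopper of table tennis balls can be imitated in a few lines of Python. This is a minimal sketch, not a description of any particular researcher's procedure: the population list of 500 names and the function name are hypothetical, and Python's `random` module produces pseudorandom numbers, which is an assumption worth keeping in mind given the question of "truly random" numbers raised above.

```python
import random

def random_selection(population_list, sample_size, seed=None):
    """Pick `sample_size` members so that each has an equal chance of selection."""
    rng = random.Random(seed)   # a seed makes the draw repeatable
    return rng.sample(population_list, sample_size)  # draws without replacement

# Hypothetical complete list of the population.
population = [f"person_{i}" for i in range(1, 501)]

sample = random_selection(population, 20, seed=1)
print(sample)
```

Like drawing balls from the hopper, `sample` never picks the same person twice, and nothing about a person's position on the list affects the chance of being chosen.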
It is important not to confuse truly random selection with what might be called haphazard selection; for example, just taking whoever is available or happens to be first on a list. When using haphazard selection, it is surprisingly easy to pick accidentally a group of people that is really quite different from the population as a whole. Consider a survey of attitudes about your statistics instructor. Suppose you give your questionnaire only to other students sitting near you in class. Such a survey would be affected by all the things that influence where students choose to sit, some of which have to do with the topic of your study (how much students like the instructor or the class). Thus, asking students who sit near you would likely result in opinions more like your own than a truly random sample would.
random selection: method for selecting a sample that uses truly random procedures (usually meaning that each person in the population has an equal chance of being selected); one procedure is for the researcher to begin with a complete list of all the people in the population and select a group of them to study using a table of random numbers.
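The seating example can be turned into a small simulation. Everything in this sketch is made up for illustration: a class of 10 rows with 20 seats each, and the assumption that students who like the instructor more sit nearer the front (row 1 gives a rating of 10, row 10 a rating of 1). It compares asking only nearby students with randomly selecting the same number of students.

```python
import random
from statistics import fmean

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical class: 10 rows of 20 seats; rating falls from 10 (row 1) to 1 (row 10).
students = [{"row": row, "rating": 11 - row}
            for row in range(1, 11) for _ in range(20)]

population_mean = fmean(s["rating"] for s in students)

# Haphazard selection: you sit in row 1 and ask only students in rows 1 and 2.
haphazard = [s for s in students if s["row"] <= 2]
haphazard_mean = fmean(s["rating"] for s in haphazard)

# Random selection: every student has an equal chance of being picked.
random_sample = rng.sample(students, len(haphazard))
random_mean = fmean(s["rating"] for s in random_sample)

print(population_mean, haphazard_mean, random_mean)
```

Under these assumptions the haphazard sample's mean rating sits far above the class mean, while the random sample's mean lands much closer to it. How large the gap is in a real class depends on how strongly seating relates to opinions.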
Unfortunately, it is often impractical or impossible to study a truly random sample. Much of the time, in fact, studies are conducted with whoever is willing or available to be a research participant. At best, as noted, a researcher tries to study a sample that is not systematically unrepresentative of the population in any known way. For example, suppose a study is about a process that is likely to differ for people of different age groups. In this situation, the researcher may attempt to include people of all age groups in the study. Alternatively, the researcher would be careful to draw conclusions only about the age group studied.
Sampling methods are a complex topic that is discussed in detail in research methods textbooks (also see Box 3-2) and in the research methods Web Chapter W1 (Overview of the Logic and Language of Psychology Research) on the Web site for this book (http://www.pearsonhighered.com/).
BOX 3-2 Surveys, Polls, and 1948’s Costly “Free Sample”
It is time to make you a more informed reader of polls in the media. Usually the results of properly done public polls are accompanied, somewhere in fine print, by a statement such as, “From a telephone poll of 1,000 American adults taken on June 4 and 5. Sampling error ±3%.” What does a statement like this mean?
The Gallup poll is as good an example as any (Gallup, 1972; see also http://www.gallup.com), and there is no better place to begin than in 1948, when all three of the major polling organizations (Gallup, Crossley for Hearst papers, and Roper for Fortune) wrongly predicted Thomas Dewey’s victory over Harry Truman for the U.S. presidency. Yet Gallup’s prediction was based on 50,000 interviews and Roper’s on 15,000. By contrast, to predict George H. W. Bush’s 1988 victory, Gallup used only 4,089. Since 1952, the pollsters have never used more than 8,144, but with very small error and no outright mistakes. What has changed?
The method used before 1948, and never repeated since, was called “quota sampling.” Interviewers were assigned a fixed number of persons to interview, with strict quotas to fill in all the categories that seemed important, such as residence, sex, age, race, and economic status. Within these specifics, however, they were free to interview whomever they liked. Republicans generally tended to be easier to interview. They were more likely to have telephones and permanent addresses and to live in better houses and better neighborhoods. In 1948, the election was very close, and the Republican bias produced the embarrassing mistake that changed survey methods forever.
Since 1948, all survey organizations have used what is called a “probability method.” Simple random sampling is the purest case of the probability method, but simple random sampling for a survey about a U.S. presidential election would require drawing names from a list of all the eligible voters in the nation (a lot of people). Each person selected would have to be found, in diversely scattered locales. So instead, “multistage cluster sampling” is used. The United States is divided into seven size-of-community groupings, from large cities to rural open country; these groupings are divided into seven geographic regions (New England, Middle Atlantic, and so on), after which smaller equal-sized groups are zoned, and then city blocks are drawn from the zones, with the probability of selection being proportional to the size of the population or number of dwelling units. Finally, an interviewer is given a randomly selected starting point on the map and is required to follow a given direction, taking households in sequence.
Actually, telephoning is often the favored method for polling today. Phone surveys cost about one-third as much as door-to-door polls. Since most people now own phones, this method is less biased than in Truman’s time. Phoning also allows computers to randomly dial phone numbers and, unlike telephone directories, this method calls unlisted numbers. However, survey organizations in the United States typically do not call cell phone numbers. Thus, U.S. households that use a cell phone for all calls and do not have a home phone are not usually included in telephone opinion polls. Most survey organizations consider the current cell-phone-only rate to be low enough not to cause large biases in poll results (especially since the demographic characteristics of individuals without a home phone suggest that they are less likely to vote than individuals who live in households with a home phone). However, anticipated future increases in the cell-phone-only rate will likely make this an important issue for opinion polls. Survey organizations will need to consider additional polling methods, perhaps using the Internet and email.
Whether by telephone or face to face, there will be about 35% nonrespondents after three attempts. This creates yet another bias, dealt with through questions about how much time a person spends at home, so that a slight extra weight can be given to the responses of those reached but usually at home less, to make up for those missed entirely.
Now you know quite a bit about opinion polls, but we have left two important questions unanswered: Why are only about 1,000 included in a poll meant to describe all U.S. adults, and what does the term sampling error mean? For these answers, you must wait for Chapter 5 (Box 5-1).
Statistical Terminology for Samples and Populations
The mean, variance, and standard deviation of a population are called population parameters. A population parameter usually is unknown and can be estimated only from what you know about a sample taken from that population. You do not taste all the beans, just the spoonful. “The beans are done” is an inference about the whole pot.
Population parameters are usually shown as Greek letters (e.g., μ). (This is a statistical convention with origins tracing back more than 2,000 years to the early Greek mathematicians.) The symbol for the mean of a population is μ, the Greek letter mu. The symbol for the variance of a population is σ², and the symbol for its standard deviation is σ, the lowercase Greek letter sigma. You won’t see these symbols often, except while learning statistics. This is because, again, researchers seldom know the population parameters.
The mean, variance, and standard deviation you figure for the scores in a sample are called sample statistics. A sample statistic is figured from known information. Sample statistics are what we have been figuring all along and are expressed with the roman letters you learned in Chapter 2: M, SD², and SD. The population parameter and sample statistic symbols for the mean, variance, and standard deviation are summarized in Table 3-2.
The use of different types of symbols for population parameters (Greek letters) and sample statistics (roman letters) can take some getting used to; so don’t worry if it seems tricky at first. It’s important to know that the statistical concepts you are learning (such as the mean, variance, and standard deviation) are the same for both a population and a sample. So, for example, you have learned that the standard deviation provides a measure of the variability of the scores in a distribution, whether we are talking about a sample or a population. (You will learn in later chapters that the variance and standard deviation are figured in a different way for a population than for a sample, but the concepts do not change.) We use different symbols for population parameters and sample statistics to make it clear whether we are referring to a population or a sample. This is important, because some of the formulas you will encounter in later chapters use both sample statistics and population parameters.
Table 3-2 Population Parameters and Sample Statistics
                      Population Parameter           Sample Statistic
                      (Usually Unknown)              (Figured from Known Data)
Basis:                Scores of entire population    Scores of sample only
Symbols:
  Mean                μ                              M
  Standard deviation  σ                              SD
  Variance            σ²                             SD²
population parameter: actual value of the mean, standard deviation, and so on, for the population; usually population parameters are not known, though often they are estimated based on information in samples.
μ: population mean.
σ²: population variance.
σ: population standard deviation.
sample statistics: descriptive statistic, such as the mean or standard deviation, figured from the scores in a group of people studied.
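To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It builds a hypothetical population of 10,000 scores (so that, unlike in real research, the population parameters can actually be figured) and then figures the same three quantities for a random sample of 50. The population values are invented for illustration. The sketch also assumes that the Chapter 2 formulas divide the sum of squared deviations by the number of scores, which is what `statistics.pvariance` and `statistics.pstdev` do.

```python
import random
from statistics import fmean, pvariance, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores scattered around 100.
population = [rng.gauss(100, 16) for _ in range(10_000)]

# Sample: 50 scores randomly selected from that population.
sample = rng.sample(population, 50)

# Population parameters (Greek letters; normally unknown).
mu, sigma2, sigma = fmean(population), pvariance(population), pstdev(population)

# Sample statistics (roman letters; figured from known data).
M, SD2, SD = fmean(sample), pvariance(sample), pstdev(sample)

print(f"Population: mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
print(f"Sample:     M  = {M:.2f}, SD^2    = {SD2:.2f}, SD    = {SD:.2f}")
```

The sample statistics come out near, but not equal to, the population parameters, which is the gap that inferential statistics is designed to reason about.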
How are you doing?
1. Explain the difference between the population and a sample for a research study.
2. Why do psychologists usually study samples and not populations?
3. Explain the difference between random sampling and haphazard sampling.
4. Explain the difference between a population parameter and a sample statistic.
5. Give the symbols for the population parameters for (a) the mean and (b) the standard deviation.
6. Why are different symbols (Greek versus roman letters) used for population parameters and sample statistics?
Answers
1. The population is the entire group to which results of a study are intended to apply. The sample is the particular, smaller group of individuals actually studied.
2. Psychologists usually study samples and not populations because it is not practical in most cases to study the entire population.
3. In random sampling, the sample is chosen from among the population using a completely random method, so that each individual has an equal chance of being included in the sample. In haphazard sampling, the researcher selects individuals who are easily available or who are convenient to study.
4. A population parameter is about the population (such as the mean of all the scores in the population); a sample statistic is about a particular sample (such as the mean of the scores of the people in the sample).
5. (a) Mean: μ; (b) standard deviation: σ.
6. Using different symbols for population parameters and sample statistics ensures that there is no confusion as to whether a symbol refers to a population or a sample.
Probability
The purpose of most psychological research is to examine the truth of a theory or the effectiveness of a procedure. But scientific research of any kind can only make that truth or effectiveness seem more or less likely; it cannot give us the luxury of knowing for certain. Probability is very important in science. In particular, probability is very important in inferential statistics, the methods psychologists use to go from results of research studies to conclusions about theories or applied procedures.
Probability has been studied for centuries by mathematicians and philosophers. Yet even today the topic is full of controversy. Fortunately, however, you need to know only a few key ideas to understand and carry out the inferential statistical procedures you learn in this book. These few key points are not very difficult; indeed, some students find them to be quite intuitive.
Interpretations of Probability
In statistics, we usually define probability as the expected relative frequency of a particular outcome. An outcome is the result of an experiment (or just about any situation in which the result is not known in advance, such as a coin coming up heads or it raining tomorrow). Frequency is how many times something happens. The relative frequency is the number of times something happens relative to the number of times it could have happened; that is, relative frequency is the proportion of times something happens. (A coin might come up heads 8 times out of 12 flips, for a relative frequency of 8/12, or 2/3.) Expected relative frequency is what you expect to get in the long run if you repeat the experiment many times. (In the case of a coin, in the long run you would expect to get 1/2 heads.) This is called the long-run relative-frequency interpretation of probability.
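The long-run idea can be illustrated with a simulated coin. This is a minimal sketch using Python's pseudorandom number generator as a stand-in for a fair coin (an assumption); the function name and the seed are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_heads(n_flips, seed=None):
    """Flip a simulated fair coin n_flips times; return the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# With few flips the relative frequency can be far from 1/2 (as with 8 of 12);
# with many flips it tends to settle close to the expected relative frequency.
for n in (12, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n, seed=42))
```

The exact values printed depend on the seed, but the pattern of larger runs drifting toward .5 is what the long-run relative-frequency interpretation describes.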
We also use probability to express how certain we are that a particular thing will happen. This is called the subjective interpretation of probability. Suppose that you say there is a 95% chance that your favorite restaurant will be open tonight. You could be using a kind of relative frequency interpretation. This would imply that if you were to check whether this restaurant was open many times on days like today, you would find it open on 95% of those days. However, what you mean is probably more subjective: on a scale of 0% to 100%, you would rate your confidence that the restaurant is open at 95%. To put it another way, you would feel that a fair bet would have odds based on a 95% chance of the restaurant’s being open.
The interpretation, however, does not affect how probability is figured. We mention these interpretations because we want to give you a deeper insight into the meaning of the term probability, which is such a prominent concept throughout statistics.
Figuring Probabilities
Probabilities are usually figured as the proportion of successful possible outcomes: the number of possible successful outcomes divided by the number of all possible outcomes. That is,
Probability = (Possible successful outcomes) / (All possible outcomes)
Consider the probability of getting heads when flipping a coin. There is one possible successful outcome (getting heads) out of two possible outcomes (getting heads or getting tails). This makes a probability of 1/2, or .5. In a throw of a single die, the probability of a 2 (or any other particular side of the six-sided die) is 1/6, or .17. This is because there can be only one successful outcome out of six possible outcomes. The probability of throwing a die and getting a number 3 or lower is 3/6, or .5. There are three possible successful outcomes (a 1, a 2, or a 3) out of six possible outcomes.
probability: expected relative frequency of an outcome; the proportion of successful outcomes to all outcomes.
outcome: term used in discussing probability for the result of an experiment (or almost any event, such as a coin coming up heads or it raining tomorrow).
expected relative frequency: number of successful outcomes divided by the number of total outcomes you would expect to get if you repeated an experiment a large number of times.
long-run relative-frequency interpretation of probability: understanding of probability as the proportion of a particular outcome that you would get if the experiment were repeated many times.
subjective interpretation of probability: way of understanding probability as the degree of one’s certainty that a particular outcome will occur.
BOX 3-3 Pascal Begins Probability Theory at the Gambling Table, Then Learns to Bet on God
Whereas in England, statistics were used to keep track of death rates and to prove the existence of God (see Chapter 1, Box 1-1), the French and Italians developed statistics at the gaming table. In particular, there was the “problem of points”: the division of the stakes in a game after it has been interrupted. If a certain number of plays were planned, how much of the stakes should each player walk away with, given the percentage of plays completed?
The problem was discussed at least as early as 1494 by Luca Pacioli, a friend of Leonardo da Vinci. But it was unsolved until 1654, when it was presented to Blaise Pascal by the Chevalier de Mere. Pascal, a French child
prodigy, attended meetings of the most famous adult French mathematicians and at 15 proved an important theorem in geometry. In correspondence with Pierre de Fermat, another famous French mathematician, Pascal solved the problem of points and in so doing began the field of probability theory and the work that would lead to the normal curve. (For more information on the problem of points, including its solution, see http://mathforum.org/isaac/problems/probl.html.)
Not long after solving this problem, Pascal became as religiously devout as the English statisticians. He was in a runaway horse-drawn coach on a bridge and was saved from drowning by the traces (the straps between the horses and the carriage) breaking at the last possible moment. He took this as a warning to abandon his mathematical work in favor of religious writings and later formulated “Pascal’s wager”: that the value of a game is the value of the prize times the probability of winning it; therefore, even if the probability is low that God exists, we should gamble on the affirmative because the value of the prize is infinite, whereas the value of not believing is only finite worldly pleasure.
Source: Tankard (1984).
Now consider a slightly more complicated example. Suppose a class has 200 people in it, and 30 are seniors. If you were to pick someone from the class at random, the probability of picking a senior would be 30/200, or .15. This is because there are 30 possible successful outcomes (getting a senior) out of 200 possible outcomes.
Steps for Finding Probabilities
There are three steps for finding probabilities.
❶ Determine the number of possible successful outcomes.
❷ Determine the number of all possible outcomes.
❸ Divide the number of possible successful outcomes (Step ❶) by the number of all possible outcomes (Step ❷).
Let’s apply these steps to the probability of getting a number 3 or lower on a throw of a die.
❶ Determine the number of possible successful outcomes. There are three outcomes of 3 or lower: 1, 2, or 3.
❷ Determine the number of all possible outcomes. There are six possible outcomes in the throw of a die: 1, 2, 3, 4, 5, or 6.
❸ Divide the number of possible successful outcomes (Step ❶) by the number of all possible outcomes (Step ❷). 3/6 = .5.
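The three steps can also be written out as a short sketch in Python. The function name and the list-based representation of outcomes are assumptions made for illustration, and the sketch assumes that every possible outcome is equally likely, as in the die and class examples.

```python
from fractions import Fraction

def probability(successful_outcomes, all_outcomes):
    """Figure a probability from two lists of equally likely outcomes."""
    n_successful = len(successful_outcomes)   # Step 1: count successful outcomes
    n_all = len(all_outcomes)                 # Step 2: count all possible outcomes
    return Fraction(n_successful, n_all)      # Step 3: divide

die = [1, 2, 3, 4, 5, 6]
three_or_lower = [face for face in die if face <= 3]
p = probability(three_or_lower, die)
print(p, float(p))  # 1/2 0.5

# The class example: 30 seniors in a class of 200 people.
p_senior = Fraction(30, 200)
print(float(p_senior))  # 0.15
```

Using Fraction keeps the result exact (3/6 reduces to 1/2) until a decimal is wanted.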
Range of Probabilities
A probability is a proportion, the number of possible successful outcomes to the total number of possible outcomes. A proportion cannot be less than 0 or greater than 1. In terms of percentages, proportions range from 0% to 100%. Something that has no chance of happening has a probability of 0, and something that is certain to happen has a probability of 1. Notice that when the probability of an event is 0, the event is completely impossible; it cannot happen. But when the probability of an event is low, say 5% or even 1%, the event is improbable or unlikely, but not impossible.
Probabilities Expressed as Symbols
Probability is usually symbolized by the letter p. The actual probability number is usually given as a decimal, though sometimes fractions or percentages are used. A 50-50 chance is usually written as p = .5, but it could also be written as p = 1/2 or p = 50%. It is also common to see probability written as being less than some number, using the “less than” sign. For example, p < .05 means “the probability is less than .05.”
TIP FOR SUCCESS To change a proportion into a percentage, multiply by 100. So, a proportion of .13 is equivalent to .13 × 100 = 13%. To change a percentage into a proportion, divide by 100. So, 3% is a proportion of 3/100 = .03.
Probability Rules
As already noted, our discussion only scratches the surface of probability. One of the topics we have not considered is the rules for figuring probabilities for multiple outcomes: for example, what is the chance of flipping a coin twice and both times getting heads? These are called probability rules, and they are important in the mathematical foundation of many aspects of statistics. However, you don’t need to know these probability rules to understand what we cover in this book. Also, the rules are rarely used directly in analyzing results of psychology research. Nevertheless, you occasionally see such procedures referred to in research articles. Thus, the most widely mentioned probability rules are described in the Advanced Topic section toward the end of this chapter.
Probability, Z Scores, and the Normal Distribution
So far, we mainly have discussed probabilities of specific events that might or might not happen. We also can talk about a range of events that might or might not happen. The throw of a die coming out 3 or lower is an example (it includes the range 1, 2, and 3). Another example is the probability of selecting someone on a city street who is between the ages of 30 and 40.
If you think of probability in terms of the proportion of scores, probability fits in well with frequency distributions (see Chapter 1). In the frequency distribution shown in Figure 3-17, 3 of the total of 50 people scored 9 or 10. If you were selecting people from this group of 50 at random, there would be 3 chances (possible successful outcomes) out of 50 (all possible outcomes) of selecting one that was 9 or 10. Thus, p = 3/50 = .06.
You can also think of the normal distribution as a probability distribution. With a normal curve, the percentage of scores between any two Z scores is known. The percentage of scores between any two Z scores is the same as the probability of selecting a score between those two Z scores. As you saw earlier in this chapter, approximately 34% of scores in a normal curve are between the mean and one standard deviation from the mean. You should therefore not be surprised to learn that the probability of a score being between the mean and a Z score of +1 is about .34 (that is, p = .34).
Figure 3-17 Frequency distribution (shown as a histogram) of 50 people, in which p = .06 (3/50) of randomly selecting a person with a score of 9 or 10.
In a previous IQ example in the normal curve section of this chapter, we figured that 95% of the scores in a normal curve are between a Z score of +1.96 and a Z score of −1.96 (see Figure 3-14). Thus, if you were to select a score at random from a distribution that follows a normal curve, the probability of selecting a score between Z scores of +1.96 and −1.96 is .95 (that is, a 95% chance). This is a very high probability. Also, the probability of selecting a score from such a distribution that is either greater than a Z score of +1.96 or less than a Z score of −1.96 is .05 (that is, a 5% chance). This is a very low probability. It helps to think about this visually. If you look back to Figure 3-14 on page 82, the .05 probability of selecting a score that is either greater than a Z score of +1.96 or less than a Z score of −1.96 is represented by the tail areas in the figure. The probability of a score being in the tail of a normal curve is a topic you will learn more about in the next chapter.
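These normal curve probabilities can be checked without a normal curve table. The sketch below uses NormalDist from Python's standard statistics module in place of the table; the helper name probability_between is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal curve: Z scores have a mean of 0 and a standard deviation of 1.
standard_normal = NormalDist(mu=0, sigma=1)

def probability_between(z_low, z_high):
    """Probability of selecting a score between two Z scores on a normal curve."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(probability_between(0, 1), 2))              # 0.34
print(round(probability_between(-1.96, 1.96), 2))       # 0.95
print(round(1 - probability_between(-1.96, 1.96), 2))   # 0.05 (the two tails together)
```

The last line figures the tail probability as whatever is left over after the middle 95% is removed.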
Probability, Samples, and Populations
Probability is also relevant to samples and populations. You will learn more about this topic in Chapters 4 and 5, but we will use an example to give you a sense of the role of probability in samples and populations. Imagine you are told that a sample of one person has a score of 4 on a certain measure. However, you do not know whether this person is from a population of women or of men. You are told that a population of women has scores on this measure that are normally distributed with a mean of 10 and a standard deviation of 3. How likely do you think it is that your sample of 1 person comes from this population of women? From your knowledge of the normal curve (see Figure 3-7), you know there are very few scores as low as 4 in a normal distribution that has a mean of 10 and a standard deviation of 3. So there is a very low likelihood that this person comes from the population of women. Now, what if the sample person had a score of 9? In this case, there is a much greater likelihood that this person comes from the population of women because there are many scores of 9 in that population. This kind of reasoning provides an introduction to the process of hypothesis testing that is the focus of the remainder of the book.
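One way to put rough numbers on this reasoning is sketched below. It treats "how unusual is this score" as the proportion of the population scoring that low or lower, which is an assumption made for illustration and not the formal hypothesis-testing procedure introduced in later chapters. The variable names are also illustrative.

```python
from statistics import NormalDist

# Population of women from the example: mean of 10, standard deviation of 3.
women = NormalDist(mu=10, sigma=3)

for score in (4, 9):
    z = (score - women.mean) / women.stdev   # Z score of the sample person's score
    p_this_low = women.cdf(score)            # proportion of the population at or below it
    print(score, round(z, 2), round(p_this_low, 3))
```

A score of 4 is two standard deviations below the mean, so only a small percentage of this population scores that low; a score of 9 is only a third of a standard deviation below the mean, where scores are common.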
1. The probability of an event is defined as the expected relative frequency of a particular outcome. Explain what is meant by (a) relative frequency and (b) outcome.
2. List and explain two interpretations of probability.
3. Suppose you have 400 coins in a jar and 40 of them are more than 9 years old. You then mix up the coins and pull one out. (a) What is the probability of getting one that is more than 9 years old? (b) What is the number of possible successful outcomes? (c) What is the number of all possible outcomes?
4. Suppose people’s scores on a particular personality test are normally distributed with a mean of 50 and a standard deviation of 10. If you were to pick a person completely at random, what is the probability you would pick someone with a score on this test higher than 60?
5. What is meant by p < .01?
Answers
1. (a) Relative frequency is the number of times something happens in relation to the number of times it could have happened. (b) An outcome is the result of an experiment: what happens in a situation where what will happen is not known in advance.
2. (a) The long-run relative-frequency interpretation of probability is that probability is the proportion of times we expect something to happen (relative to how often it could happen) if the situation were repeated a very large number of times. (b) The subjective interpretation of probability is that probability is our sense of confidence that something will happen, rated on a 0% to 100% scale.
3. (a) The probability of getting one that is more than 9 years old is 40/400 = .10. (b) The number of possible successful outcomes is 40. (c) The number of all possible outcomes is 400.
4. The probability you would pick someone with a score on this test higher than 60 is p = .16 (since 16% of the scores are more than one standard deviation above the mean).
5. The probability is less than .01.
Controversies: Is the Normal Curve Really So Normal? and Using Nonrandom Samples
Basic though they are, there is considerable controversy about the topics we have introduced in this chapter. In this section we consider a major controversy about the normal curve and nonrandom samples.
Is the Normal Curve Really So Normal?
We’ve said that real distributions in the world often closely approximate the normal curve. Just how often real distributions closely follow a normal curve turns out to be very important, not just because normal curves make Z scores more useful. As you will learn in later chapters, the main statistical methods psychologists use assume that the samples studied come from populations that follow a normal curve. Researchers almost never know the true shape of the population distribution; so if they want to use the usual methods, they have to just assume it is normal, making this assumption because most populations are normal. Yet there is a long-standing debate in psychology about just how often populations really are normally distributed. The predominant view has been that, given how psychology measures are developed, a bell-shaped distribution “is almost guaranteed” (Walberg et al., 1984, p. 107). Or, as Hopkins and Glass (1978) put it, measurements in all disciplines are such good approximations to the curve that one might think “God loves the normal curve!”
On the other hand, there has been a persistent line of criticism about whether nature really packages itself so neatly. Micceri (1989) showed that many measures commonly used in psychology are not normally distributed “in nature.” His study included achievement and ability tests (such as the SAT and the GRE) and personality tests (such as the Minnesota Multiphasic Personality Inventory, MMPI). Micceri examined the distributions of scores of 440 psychological and educational measures that had been used on very large samples. All of the measures he examined had been studied in samples of over 190 individuals, and the majority had samples of over 1,000 (14.3% even had samples of 5,000 to 10,293). Yet large samples were of no help. No measure he studied had a distribution that passed all checks for normality (mostly, Micceri looked for skewness, kurtosis, and “lumpiness”). Few measures had distributions that even came reasonably close to looking like the normal curve. Nor were these variations predictable: “The distributions studied here exhibited almost every conceivable type of contamination” (p. 162), although some were more common with certain types of tests. Micceri discusses many obvious reasons for this nonnormality, such as ceiling or floor effects (see Chapter 1).
How much has it mattered that the distributions for these measures were so far from normal? According to Micceri, the answer is just not known. And until more is known, the general opinion among psychologists will no doubt remain supportive of traditional statistical methods, with the underlying mathematics based on the assumption of normal population distributions.
What is the reason for this nonchalance in the face of findings such as Micceri’s? It turns out that under most conditions in which the standard methods are used, they give results that are reasonably accurate even when the formal requirement of a normal population distribution is not met (e.g., Sawilowsky & Blair, 1992). In this book, we generally adopt this majority position favoring the use of the standard methods in all but the most extreme cases. But you should be aware that a vocal minority of psychologists disagrees. Some of the alternative statistical techniques they favor (ones that do not rely on assuming a normal distribution in the population) are presented in Chapter 14. These techniques include the use of nonparametric statistics that do not have assumptions about the shape of the population distribution.
Francis Galton (1889), one of the major pioneers of statistical methods (see Chapter 11, Box 11-1), said of the normal curve, “I know of scarcely anything so apt to impress the imagination…. [It] would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wild confusion” (p. 66). Ironically, it may be true that in psychology, at least, it truly reigns in pure and austere isolation, with no even close-to-perfect real-life imitators.
Using Nonrandom Samples
Most of the procedures you learn in the rest of this book are based on mathematics that assume the sample studied is a random sample of the population. As we pointed out, however, in most psychology research the samples are nonrandom, including whatever individuals are available to participate in the experiment. Most studies are done with college students, volunteers, convenient laboratory animals, and the like.
Some psychologists are concerned about this problem and have suggested that researchers need to use different statistical approaches that make generalizations only to the kinds of people that are actually being used in the study.³ For example, these psychologists would argue that, if your sample has a particular nonnormal distribution, you should assume that you can generalize only to a population with the same particular nonnormal distribution. We will have more to say about their suggested solutions in Chapter 14.
Sociologists, as compared to psychologists, are much more concerned about the representativeness of the groups they study. Studies reported in sociology journals (or in sociologically oriented social psychology journals) are much more likely to use formal methods of random selection and large samples, or at least to address the issue in their articles.
Why are psychologists more comfortable with using nonrandom samples? The main reason is that psychologists are mainly interested in the relationships among variables. If in one population the effect of experimentally changing X leads to a change in Y, this relationship should probably hold in other populations. This relationship should hold even if the actual levels of Y differ from population to population. Suppose that a researcher conducts an experiment testing the relation of number of exposures to a list of words to number of words remembered. Suppose further that this study is done with undergraduates taking introductory psychology and that the result is that the greater the number of exposures is, the greater is the number of words remembered. The actual number of words remembered from the list might well be different for people other than introductory psychology students. For example, chess masters (who probably have highly developed memories) may recall more words; people who have just been upset may recall fewer words. However, even in these groups, we would expect that the more times someone is exposed to the list, the more words will be remembered. That is, the relation of number of exposures to number of words recalled will probably be about the same in each population.
In sociology, the representativeness of samples is much more important. This is because sociologists are more concerned with the actual mean and variance of a variable in a particular society. Thus, a sociologist might be interested in the average attitude towards older people in the population of a particular country. For this purpose, how sampling is done is extremely important.
Z Scores, Normal Curves, Samples and Populations, and Probabilities in Research Articles
You need to understand the topics we covered in this chapter to learn what comes next. However, the topics of this chapter are rarely mentioned directly in research articles (except in articles about methods or statistics). Although Z scores are extremely important as steps in advanced statistical procedures, they are rarely reported directly in research articles. Sometimes you will see the normal curve mentioned, usually when a researcher is describing the pattern of scores on a particular variable. (We say more about this and give some examples from published articles in Chapter 14, where we consider situations in which the scores do not follow a normal curve.)
Research articles will sometimes briefly mention the method of selecting the sample from the population. For example, Viswanath and colleagues (2006) used data from the U.S. National Cancer Institute (NCI) Health Information National Trends Survey (HINTS) to examine differences in knowledge about cancer across individuals from varying socioeconomic and racial/ethnic groups. They described the method of their study as follows:
The data from this study come from the NCI HINTS, based on a random-digit-dial (RDD) sample of all working telephones in the United States. One adult was selected at random within each household using the most recent birthday method in the case of more than three adults in a given household. . . . Vigorous efforts were made to increase response rates through advanced letters and $2 incentives to households. (p. 4)
Whenever possible, researchers report the proportion of individuals approached for the study who actually participated in the study. This is called the response rate. Viswanath and colleagues (2006) noted that “The final sample size was 6,369, yielding a response rate of 55%” (p. 4).
Researchers sometimes also check whether their sample is similar to the population as a whole, based on any information they may have about the overall population. For example, Schuster and colleagues (2001) conducted a national survey of stress reactions of U.S. adults after the September 11, 2001, terrorist attacks. In this study, the researchers compared their sample to 2001 census records and reported that the “sample slightly overrepresented women, non-Hispanic whites, and persons with higher levels of education and income” (p. 1507). Schuster and colleagues went on to note that overrepresentation of these groups “is typical of samples selected by means of random-digit dialing” (pp. 1507-1508).
However, even survey studies typically are not able to use such rigorous methods and have to rely on more haphazard methods of getting their samples. For example, in a study of relationship distress and partner abuse (Heyman et al., 2001), the researchers describe their method of gathering research participants to interview as follows: “Seventy-four couples of varying levels of relationship adjustment were recruited through community newspaper advertisements” (p. 336). In a study of this kind, one cannot very easily recruit a random sample of abusers since there is no list of all abusers to recruit from! This could be done with a very large national random sample of couples, who would then include a random sample of abusers. Indeed, the authors of this study are very aware of the issues. At the end of the article, when discussing “cautions necessary when interpreting our results,” they note that before their conclusions can be taken as definitive “our study must be replicated with a representative sample” (p. 341).
Finally, probability is rarely discussed directly in research articles, except in relation to statistical significance, a topic we discuss in the next chapter. In almost any article you look at, the results section will be strewn with descriptions of various methods having to do with statistical significance, followed by something like “p < .05” or “p < .01.” The p refers to probability, but the probability of what? This is the main topic of our discussion of statistical significance in the next chapter.
Advanced Topic: Probability Rules and Conditional Probabilities
This advanced topic section provides additional information on probability, focusing specifically on probability rules and conditional probabilities. Probability rules are procedures for figuring probabilities in more complex situations than we have considered so far. This section considers the two most widely used such rules and also explains the concept of conditional probabilities that is used in advanced discussions of probability.
Addition Rule
The addition rule (also called the or rule) is used when there are two or more mutually exclusive outcomes. “Mutually exclusive” means that, if one outcome happens, the others can’t happen. For example, heads or tails on a single coin flip are mutually exclusive because the result has to be one or the other, but can’t be both. With mutually exclusive outcomes, the total probability of getting either outcome is the sum of the individual probabilities. Thus, on a single coin flip, the total chance of getting either heads (which is .5) or tails (also .5) is 1.0 (.5 plus .5). Similarly, on a single throw of a die, the chance of getting either a 3 (1/6) or a 5 (1/6) is 1/3 (1/6 + 1/6). If you are picking a student at random from your university in which 30% are seniors and 25% are juniors, the chance of picking someone who is either a senior or a junior is 55%.
Even though we have not used the term addition rule, we have already used this rule in many of the examples we considered in this chapter. For example, we used this rule when we figured that the chance of getting a 3 or lower on the throw of a die is .5.
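The addition rule is simple enough to sketch in a few lines of Python. The function name is an assumption made for illustration, and the function relies on the caller to pass only probabilities of mutually exclusive outcomes; it cannot check that for itself.

```python
from fractions import Fraction

def addition_rule(*probabilities):
    """Total probability of getting any one of several mutually exclusive outcomes."""
    total = sum(probabilities, Fraction(0))
    if total > 1:
        # A sum above 1 signals that the outcomes were not mutually exclusive.
        raise ValueError("probabilities of mutually exclusive outcomes cannot sum past 1")
    return total

print(addition_rule(Fraction(1, 2), Fraction(1, 2)))          # 1    (heads or tails)
print(addition_rule(Fraction(1, 6), Fraction(1, 6)))          # 1/3  (a 3 or a 5)
print(addition_rule(Fraction(30, 100), Fraction(25, 100)))    # 11/20, that is, 55%
print(addition_rule(*[Fraction(1, 6)] * 3))                   # 1/2  (a 1, a 2, or a 3)
```

The last line is the "3 or lower on a die" example figured as three separate outcomes added together.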
Multiplication Rule
The multiplication rule (also called the and rule), however, is completely new. You use the multiplication rule to figure the probability of getting both of two (or more) independent outcomes. Independent outcomes are those for which getting one has no effect on getting the other. For example, getting a head or tail on one flip of a coin is an independent outcome from getting a head or tail on a second flip of a coin. The probability of getting both of two independent outcomes is the product of (the result of multiplying) the individual probabilities. For example, on a single coin flip, the chance of getting a head is .5. On a second coin flip, the chance of getting a head (regardless of what you got on the first flip) is also .5. Thus, the probability of getting heads on both coin flips is .25 (.5 multiplied by .5). On two throws of a die, the chance of getting a 5 on both throws is 1/36: the probability of getting a 5 on the first throw (1/6) multiplied by the probability of getting a 5 on the second throw (1/6). Similarly, on a multiple choice exam with four possible answers to each item, the chance of getting both of two questions correct just by guessing is 1/16; that is, the chance of getting one question correct just by guessing (1/4) multiplied by the chance of getting the other correct just by guessing (1/4). To take one more example, suppose you have a 20% chance of getting accepted into one graduate school and a 30% chance of getting accepted into another graduate school. Your chance of getting accepted at both graduate schools is just 6% (that is, 20% × 30% = 6%).
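A matching sketch for the multiplication rule follows. Again the function name is an assumption made for illustration, and the function simply trusts that the outcomes passed to it are independent.

```python
from fractions import Fraction
from math import prod

def multiplication_rule(*probabilities):
    """Probability of getting all of several independent outcomes."""
    return prod(probabilities, start=Fraction(1))

print(multiplication_rule(Fraction(1, 2), Fraction(1, 2)))        # 1/4   (two heads)
print(multiplication_rule(Fraction(1, 6), Fraction(1, 6)))        # 1/36  (two 5s)
print(multiplication_rule(Fraction(1, 4), Fraction(1, 4)))        # 1/16  (two lucky guesses)
print(multiplication_rule(Fraction(20, 100), Fraction(30, 100)))  # 3/50, that is, 6%
```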
Conditional Probabilities
There are several other probability rules, some of which are combinations of the addition and multiplication rules. Most of these other rules have to do with what are called conditional probabilities. A conditional probability is the probability of one outcome, assuming some other outcome will happen. That is, the probability of the one outcome depends on (is conditional on) the probability of the other outcome. Thus, suppose that college A has 50% women and college B has 60% women. If you select a person at random, what is the chance of getting a woman? If you know the person is from college A, the probability is 50%. That is, the probability of getting a woman, conditional upon her coming from college A, is 50%.
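The college example can be sketched with counts. The enrollment figures below are hypothetical: the text gives only the percentages, so the sketch assumes 1,000 students in each college purely to have numbers to work with. The function names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical head counts chosen only to match the 50% and 60% figures in the text.
enrollment = {
    "A": {"women": 500, "total": 1000},
    "B": {"women": 600, "total": 1000},
}

def p_woman_given_college(college):
    """P(woman | college): look only at one college, then take the proportion."""
    counts = enrollment[college]
    return Fraction(counts["women"], counts["total"])

def p_woman_overall():
    """P(woman) when the college is unknown; this depends on the assumed college sizes."""
    women = sum(c["women"] for c in enrollment.values())
    total = sum(c["total"] for c in enrollment.values())
    return Fraction(women, total)

print(p_woman_given_college("A"))  # 1/2
print(p_woman_given_college("B"))  # 3/5
print(p_woman_overall())           # 11/20
```

The conditional probabilities do not change if the colleges differ in size, but the overall probability does, which is why the condition matters.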
Summary
1. A Z score is the number of standard deviations that a raw score is above or below the mean.
2. The scores on many variables in psychology research approximately follow a bell-shaped, symmetrical, unimodal distribution called the normal curve. Because the shape of this curve follows an exact mathematical formula, there is a specific percentage of scores between any two points on a normal curve.
3. A useful working rule for normal curves is that 50% of the scores are above the mean, 34% are between the mean and 1 standard deviation above the mean, and 14% are between 1 and 2 standard deviations above the mean.
4. A normal curve table gives the percentage of scores between the mean and any particular Z score, as well as the percentage of scores in the tail for any Z score. Using this table, and knowing that the curve is symmetrical and that 50% of the scores are above the mean, you can figure the percentage of scores above or below any particular Z score. You can also use the table to figure the Z score for the point where a particular percentage of scores begins or ends.
5. A sample is an individual or group that is studied, usually as representative of some larger group or population that cannot be studied in its entirety. Ideally, the sample is selected from a population using a strictly random procedure. The mean (M), variance (SD²), standard deviation (SD), and so forth of a sample are called sample statistics. The corresponding values for a population are called population parameters and are symbolized by Greek letters: μ for mean, σ² for variance, and σ for standard deviation.
6. Most psychology researchers consider the probability of an event to be its expected relative frequency. However, some think of probability as the subjective degree of belief that the event will happen. Probability is figured as the proportion of successful outcomes to total possible outcomes. It is symbolized by p and has a range from 0 (event is impossible) to 1 (event is certain). The normal curve provides a way to know the probabilities of scores being within particular ranges of values.
7. There are controversies about many of the topics in this chapter. One is about whether normal distributions are truly typical of the populations of scores for the variables we study in psychology. In another controversy, some researchers have questioned the use of standard statistical methods in the typical psychology research situation that does not use strict random sampling.
8. Research articles rarely discuss Z scores, normal curves (except briefly when a variable being studied seems not to follow a normal curve), or probability (except in relation to statistical significance). Procedures of sampling, particularly when the study is a survey, are sometimes described, and the representativeness of a sample may also be discussed.
9. ADVANCED TOPIC: In situations where there are two or more mutually exclusive outcomes, probabilities are figured following an addition rule, in which the total probability is the sum of the individual probabilities. A multiplication rule (in which probabilities are multiplied together) is followed to figure the probability of getting both of two (or more) independent outcomes. A conditional probability is the probability of one outcome, assuming some other particular outcome will happen.
Key Terms
Z score (p. 68)
raw score (p. 69)
normal distribution (p. 73)
normal curve (p. 73)
normal curve table (p. 76)
population (p. 83)
sample (p. 83)
random selection (p. 85)
population parameters (p. 87)
μ (population mean) (p. 87)
σ² (population variance) (p. 87)
σ (population standard deviation) (p. 87)
sample statistics (p. 87)
probability (p) (p. 89)
outcome (p. 89)
expected relative frequency (p. 89)
long-run relative-frequency interpretation of probability (p. 89)
subjective interpretation of probability (p. 89)
Example Worked-Out Problems
Changing a Raw Score to a Z Score
A distribution has a mean of 80 and a standard deviation of 20. Find the Z score for a raw score of 65.
You can change a raw score to a Z score using the formula or the steps.
Using the formula: Z = (X − M)/SD = (65 − 80)/20 = −15/20 = −.75.
Using the steps:
❶ Figure the deviation score: subtract the mean from the raw score. 65 − 80 = −15.
❷ Figure the Z score: divide the deviation score by the standard deviation. −15/20 = −.75.
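The two steps for changing a raw score to a Z score can be sketched in Python as follows. The function name `z_score` and its parameter names are illustrative choices, not part of the text.

```python
def z_score(raw, mean, sd):
    """Return how many standard deviations `raw` lies above (+) or below (-) the mean."""
    deviation = raw - mean   # Step 1: subtract the mean from the raw score
    return deviation / sd    # Step 2: divide the deviation score by the standard deviation


# The worked-out problem: mean of 80, standard deviation of 20, raw score of 65.
print(z_score(65, 80, 20))  # -0.75
```

The negative sign carries through from the deviation score, which is how the result shows that 65 is below the mean.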
Changing a Z Score to a Raw Score
A distribution has a mean of 200 and a standard deviation of 50. A person has a Z score of 1.26. What is the person’s raw score?
You can change a Z score to a raw score using the formula or the steps.
Using the formula: X = (Z)(SD) + M = (1.26)(50) + 200 = 63 + 200 = 263.
Using the steps:
❶ Figure the deviation score: multiply the Z score by the standard deviation. 1.26 × 50 = 63.
❷ Figure the raw score: add the mean to the deviation score. 63 + 200 = 263.
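The reverse conversion can be sketched the same way. The function name `raw_score` is an illustrative choice, and the result is rounded only to hide floating-point noise.

```python
def raw_score(z, mean, sd):
    """Return the raw score that corresponds to a given Z score."""
    deviation = z * sd       # Step 1: multiply the Z score by the standard deviation
    return deviation + mean  # Step 2: add the mean to the deviation score


# The worked-out problem: mean of 200, standard deviation of 50, Z score of 1.26.
print(round(raw_score(1.26, 200, 50), 2))  # 263.0
```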
Outline for Writing Essays Involving Z Scores
1. If required by the question, explain the mean, variance, and standard deviation (using the points in the essay outlined in Chapter 2).
2. Describe the basic idea of a Z score as a way of describing where a particular score fits into an overall group of scores. Specifically, a Z score shows the number of standard deviations a score is above or below the mean.
3. Explain the steps for figuring a Z score from a raw (ordinary) score.
4. Mention that changing raw scores to Z scores puts scores that are for different variables onto the same scale, which makes it easier to make comparisons between scores on the variables.
Figuring the Percentage Above or Below a Particular Raw Score or Z Score
Suppose a test of sensitivity to violence is known to have a mean of 20, a standard deviation of 3, and a normal curve shape. What percentage of people have scores above 24?
❶ If you are beginning with a raw score, first change it to a Z score. Using the usual formula, Z = (X − M)/SD, Z = (24 − 20)/3 = 1.33.
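Figure 3-18 Distribution of sensitivity to violence scores showing the percentage of scores above a score of 24 (shaded area). [Horizontal axis: raw scores from 14 to 26, with the corresponding Z scores; the shaded area begins at a raw score of 24, Z = +1.33.]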
❷ Draw a picture of the normal curve, decide where the Z score falls on it, and shade in the area for which you are finding the percentage. This is shown in Figure 3-18.
❸ Make a rough estimate of the shaded area’s percentage based on the 50%-34%-14% percentages. If the shaded area started at a Z score of 1, it would include 16%. If it started at a Z score of 2, it would include only 2%. So with a Z score of 1.33, it has to be somewhere between 16% and 2%.
❹ Find the exact percentage using the normal curve table, adding 50% if necessary. In Table A-1 (in the Appendix), 1.33 in the “Z” column goes with 9.18% in the “% in Tail” column. This is the answer to our problem: 9.18% of people have a higher score than 24 on the sensitivity to violence measure. (There is no need to add 50% to the percentage.)
❺ Check that your exact percentage is within the range of your rough estimate from Step ❸. Our result, 9.18%, is within the 16% to 2% range estimated.
Note: If the problem involves Z scores that are all 0, 1, or 2 (or −1 or −2), you can work the problem using the 50%-34%-14% figures and without using the normal curve table (although you should still draw a figure and shade in the appropriate area).
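The table lookup in this problem can also be checked by computer. The sketch below is one possible way to do it, assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`. The function name `percent_above` and the `round_z` option are illustrative. The Z score is rounded to two decimals to mirror the normal curve table; without that rounding the result differs slightly from the table value.

```python
from statistics import NormalDist


def percent_above(raw, mean, sd, round_z=True):
    """Percentage of a normal distribution lying above `raw`."""
    z = (raw - mean) / sd          # change the raw score to a Z score
    if round_z:
        z = round(z, 2)            # the normal curve table lists Z to two decimals
    # cdf(z) is the proportion below z on the standard normal curve,
    # so 1 - cdf(z) is the proportion in the upper tail.
    return 100 * (1 - NormalDist().cdf(z))


# Sensitivity to violence: mean of 20, standard deviation of 3, score of 24.
pct = percent_above(24, 20, 3)
print(round(pct, 2))  # 9.18

# The rough-estimate check: the answer should fall between 2% and 16%.
print(2 < pct < 16)   # True
```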
Figuring Z Scores and Raw Scores From Percentages
Consider the same situation: A test of sensitivity to violence is known to have a mean of 20, a standard deviation of 3, and a normal curve shape. What is the minimum score a person needs to be in the top 75%?
❶ Draw a picture of the normal curve, and shade in the approximate area for your percentage using the 50%-34%-14% percentages. The shading has to begin between the mean and 1 SD below the mean. (There are 50% above the mean and 84% above 1 SD below the mean.) This is shown in Figure 3-19.
❷ Make a rough estimate of the Z score where the shaded area stops. The Z score has to be between 0 and −1.
❸ Find the exact Z score using the normal curve table (subtracting 50% from your percentage if necessary before looking up the Z score). Since 50% of people have scores above the mean, for the top 75% you need to include the 25% below the mean (that is, 75% − 50% = 25%). Looking in the “% Mean to Z”