Nonconstant Error Variance In: Regression Diagnostics

By: John Fox Pub. Date: 2011 Access Date: October 16, 2019 Publishing Company: SAGE Publications, Inc. City: Thousand Oaks Print ISBN: 9780803939714 Online ISBN: 9781412985604 DOI: https://dx.doi.org/10.4135/9781412985604 Print pages: 49-53

© 1991 SAGE Publications, Inc. All Rights Reserved. This PDF has been generated from SAGE Research Methods. Please note that the pagination of the online version will vary from the pagination of the print book.

Nonconstant Error Variance

Detecting Nonconstant Error Variance

One of the assumptions of the regression model is that the variation of the dependent variable around the regression surface (the error variance) is everywhere the same: $V(\varepsilon) = V(y \mid x_1, \ldots, x_k) = \sigma^2$. Nonconstant error variance is often termed “heteroscedasticity.” Although the least-squares estimator is unbiased and consistent even when the error variance is not constant, its efficiency is impaired and the usual formulas for coefficient standard errors are inaccurate, the degree of the problem depending on the degree to which error variances differ. I describe graphical methods for detecting nonconstant error variance in this chapter. Tests for heteroscedasticity are discussed in Chapter 8 on discrete data and in Chapter 9 on maximum-likelihood methods.

Because the regression surface is k dimensional, and embedded in a space of k + 1 dimensions, it is generally impractical to assess the assumption of constant error variance by direct graphical examination of the data for k larger than 1 or 2. Nevertheless, it is common for error variance to increase as the expectation of y grows larger, or there may be a systematic relationship between error variance and a particular x. The former situation can be detected by plotting residuals against fitted values, and the latter by plotting residuals against each x. It is worth noting that plotting residuals against y (as opposed to ŷ) is generally unsatisfactory. The plot will be tilted: There is a built-in linear correlation between e and y, because y = ŷ + e; in fact, the correlation between y and e is $r(y, e) = \sqrt{1 - R^2}$, where $R^2$ is the squared multiple correlation for the regression. In contrast, the least-squares fit ensures that $r(\hat{y}, e) = 0$, producing a plot that is much easier to examine for evidence of nonconstant spread.
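The two correlations just described can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy is available; the sample size, coefficients, and seed are arbitrary choices made for illustration and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
n = 200
x = rng.uniform(0, 10, size=n)          # one illustrative predictor
y = 1.0 + 0.5 * x + rng.normal(0, 2.0, size=n)

# Least-squares fit with a constant term.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

# R^2 from the residual and total sums of squares.
r_squared = 1.0 - np.sum(e**2) / np.sum((y - y.mean())**2)

r_y_e = np.corrcoef(y, e)[0, 1]         # built-in tilt of a plot of e against y
r_yhat_e = np.corrcoef(y_hat, e)[0, 1]  # zero up to rounding error

print(f"r(y, e)       = {r_y_e:.4f}")
print(f"sqrt(1 - R^2) = {np.sqrt(1.0 - r_squared):.4f}")
print(f"r(yhat, e)    = {r_yhat_e:.4f}")
```

The first two printed values agree, and the third is zero to within floating-point error, which is why the plot against fitted values is the easier one to read.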

Because the least-squares residuals have unequal variances even when the errors have constant variance, I prefer plotting the studentized residuals against fitted values. Finally, a pattern of changing spread is often more easily discerned in a plot of $|t_i|$ or $t_i^2$ versus ŷ, perhaps augmented by a lowess scatterplot smooth (see Appendix A6.1); smoothing this plot is particularly useful when the sample size is very large or the distribution of ŷ is very uneven. An example appears in Figure 6.2.
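As a rough sketch of this diagnostic, the code below computes studentized residuals from the hat values and then summarizes $|t_i|$ within five equal-count bins of ŷ. Several things here are assumptions rather than part of the text: the helper name `studentized_residuals` is hypothetical, the residuals are studentized externally (each case is scaled by the error estimate from a fit that omits it), the data are simulated with an error standard deviation proportional to the mean, and the binned means are only a crude stand-in for the lowess smooth of Appendix A6.1.

```python
import numpy as np

def studentized_residuals(X, y):
    """Externally studentized residuals; X must include the constant column."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)                  # thin QR: hat matrix is Q Q'
    h = np.sum(Q**2, axis=1)                # hat values (leverages)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    e = y - y_hat
    rss = e @ e
    # Error variance estimated with case i deleted, without refitting.
    s2_deleted = (rss - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_deleted * (1.0 - h))
    return t, y_hat, h

# Simulated data (assumption): error sd grows with the expectation of y.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, size=n)
mu = 2.0 + 3.0 * x
y = mu + rng.normal(0, 0.3 * mu)
X = np.column_stack([np.ones(n), x])

t, y_hat, h = studentized_residuals(X, y)

# Crude smooth: mean |t| within five equal-count bins ordered by fitted value.
order = np.argsort(y_hat)
bin_means = [np.abs(t[idx]).mean() for idx in np.array_split(order, 5)]
for k, m in enumerate(bin_means, start=1):
    print(f"bin {k} (low to high fitted values): mean |t| = {m:.2f}")
```

A steady rise in the binned means plays the same role as an upward-sloping smooth in a plot of $|t_i|$ against ŷ; with real data a proper lowess routine and the plot itself would be preferable.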

An illustrative plot of studentized residuals against fitted values is shown in Figure 6.1a. In Figure 6.1b, studentized residuals are plotted against $\log_2(3 + \hat{y})$; by correcting the positive skew in these ŷ values, the second plot makes it easier to discern the tendency of the residual spread to increase with ŷ. The data for this example are drawn from work by Ornstein (1976) on interlocking directorates among the 248 largest Canadian firms. The number of interlocking directorate and executive positions maintained by the corporations is regressed on corporate assets (square-root transformed to make the relationship linear; see Chapter 7); 9 dummy variables representing 10 industrial classes, with heavy manufacturing serving as the baseline category; and 3 dummy variables representing 4 nations of control, with Canada as the baseline category. The results of the regression are given in the left-hand columns of Table 6.1; the results shown on the right of the table are discussed below. Note that part of the tendency for the residual scatter to increase with ŷ is due to the lower bound of 0 for y: Because $e = y - \hat{y}$, the smallest possible residual corresponding to a particular ŷ value is $e = 0 - \hat{y} = -\hat{y}$.

Figure 6.1. Plots of studentized residuals versus fitted values for Ornstein’s interlocking-directorate regression. (a) t versus ŷ. (b) t versus $\log_2(3 + \hat{y})$. The log transformation serves to reduce the skew of the fitted values, making the increasing residual spread easier to discern.

Correcting Nonconstant Error Variance

Transformations frequently serve to correct a tendency of the error variance to increase or, less commonly, decrease with the magnitude of the dependent variable: Move y down the ladder of powers and roots if the residual scatter broadens with the fitted values; move y up the ladder if the residual scatter narrows. An effective transformation may be selected by trial and error (but see Chapter 9 for an analytic method of selecting a variance-stabilizing transformation).
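The trial-and-error search down the ladder can be sketched with simulated counts. Everything specific in this example is an assumption made for illustration: the counts are Poisson (so the variance equals the mean), a cubic polynomial in x is fitted at every step so that curvature introduced by a transformation is not mistaken for changing spread, the log step uses a start of 1 to guard against zero counts, and `spread_ratio` is a hypothetical helper rather than a standard diagnostic.

```python
import numpy as np

def spread_ratio(x, z, degree=3):
    """Mean |residual| in the top third of fitted values over the bottom third."""
    X = np.vander(x, degree + 1)             # columns x^3, x^2, x, 1
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    fitted = X @ beta
    abs_res = np.abs(z - fitted)
    order = np.argsort(fitted)
    third = len(z) // 3
    return abs_res[order[-third:]].mean() / abs_res[order[:third]].mean()

# Simulated counts (assumption): Poisson, so the spread grows with the level.
rng = np.random.default_rng(2)
n = 3000
x = rng.uniform(0, 1, size=n)
mu = 10.0 + 200.0 * x
y = rng.poisson(mu).astype(float)

# Three rungs of the ladder, from the raw scale downward.
ladder = {
    "y (power 1)": y,
    "sqrt(y) (power 1/2)": np.sqrt(y),
    "log(y + 1) (power 0, start of 1)": np.log(y + 1.0),
}
ratios = {name: spread_ratio(x, z) for name, z in ladder.items()}
for name, r in ratios.items():
    print(f"{name:34s} spread ratio = {r:.2f}")
```

A ratio well above 1 indicates scatter that broadens with the fitted values, a ratio near 1 indicates roughly constant spread, and a ratio well below 1 indicates that the transformation has gone too far down the ladder. For these simulated counts the square root should come out closest to 1.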

TABLE 6.1 Regression of Number of Interlocking Directorate and Executive Positions Maintained by 248 Major Canadian Corporations on Corporate Assets, Sector, and Nation of Control

If the error variance is proportional to a particular x, or if the pattern of $V(\varepsilon_i)$ is otherwise known up to a constant of proportionality, then an alternative to transformation of y is weighted-least-squares (WLS) estimation (see Appendix A6.2). It also is possible to correct the estimated standard errors of the least-squares coefficients for heteroscedasticity: A method proposed by White (1980) is described in Appendix A6.3. An advantage of this approach is that knowledge of the pattern of nonconstant error variance (e.g., increased variance with the level of y or with an x) is not required. If, however, the heteroscedasticity problem is severe, and the corrected standard errors therefore are substantially larger than those produced by the usual formula, then discovering the pattern of nonconstant variance and correcting for it, by a transformation or WLS estimation, offers the possibility of more efficient estimation. In any event, unequal error variances are worth correcting only when the problem is extreme, where, for example, the spread of the errors varies by a factor of about three or more (i.e., the error variance varies by a factor of about 10 or more; see Appendix A6.4).
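Both corrections can be sketched in a few lines. Appendices A6.2 and A6.3 are not reproduced here, so the details below are assumptions and may differ from the presentation there: WLS is carried out by rescaling each row by the square root of its weight, with weights 1/x for an error variance proportional to x, and the corrected standard errors use the basic sandwich form usually attributed to White (1980), $(X'X)^{-1} X' \mathrm{diag}(e_i^2) X (X'X)^{-1}$. The data are simulated, and the function names `ols` and `wls` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with usual and White-type (sandwich) standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (n - p)
    se_usual = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * (e**2)[:, None])        # X' diag(e_i^2) X
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_white

def wls(X, y, w):
    """WLS via OLS on rows scaled by sqrt(w); w is proportional to 1 / V(error_i)."""
    sw = np.sqrt(w)
    beta, se, _ = ols(X * sw[:, None], y * sw)
    return beta, se

# Simulated data (assumption): error variance proportional to x.
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1, 20, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(x))
X = np.column_stack([np.ones(n), x])

b_ols, se_usual, se_white = ols(X, y)
b_wls, se_wls = wls(X, y, w=1.0 / x)

print("OLS slope:", round(b_ols[1], 3),
      "usual SE:", round(se_usual[1], 4),
      "White SE:", round(se_white[1], 4))
print("WLS slope:", round(b_wls[1], 3), "SE:", round(se_wls[1], 4))
```

The sandwich standard errors need no knowledge of how the variance changes, whereas the WLS fit depends on the weights being right; when they are, its standard errors are smaller, which is the efficiency gain referred to above.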

Figure 6.2. Plot of absolute studentized residuals versus fitted values for the square-root transformed interlocking-directorate data. The line on the graph is a lowess smooth using f = 0.5 and 2 robustness iterations.

For Ornstein’s interlocking-directorate regression, for example, a square-root transformation appears to correct the dependence of the residual spread on the level of the dependent variable. A plot of $|t_i|$ versus $\hat{y}_i$ for the transformed data is given in Figure 6.2, and the regression results appear in the right-hand columns of Table 6.1. The lowess smooth in Figure 6.2 (see Appendix A6.1) shows little change in the average absolute studentized residuals as the fitted values increase.

The coefficients for the original and transformed regressions in Table 6.1 cannot be compared directly, because the scale of the dependent variable has been altered. It is clear, however, that assets retain their positive effect and that the nations of control maintain their ranks. The sectoral ranks are also similar across the two analyses, although not identical. In comparing the two sets of results, recall that the baseline categories for the sets of dummy regressors—Canada and heavy manufacturing—implicitly have coefficients of zero.

Transforming y also changes the shape of the error distribution and alters the shape of the regression of y on the xs. It is frequently the case that producing constant residual variation through a transformation also makes the distribution of the residuals more symmetric. At times, eliminating nonconstant spread also makes the relationship of y to the xs more nearly linear (see the next chapter). These by-products are not necessary consequences of correcting nonconstant error variance, however, and it is particularly important to check data for nonlinearity following a transformation of y. Of course, because there generally is no reason to suppose that the regression is linear prior to transforming y, we should check for nonlinearity even when y is not transformed.

Finally, nonconstant residual spread sometimes is evidence for the omission of important effects from the model. Suppose, for example, that there is an omitted categorical independent variable, such as regional location, that interacts with assets in affecting interlocks; in particular, the assets slope, although positive in every region, is steeper in some regions than in others. Then the omission of region and its interactions with assets could produce a fan-shaped residual plot even if the errors from the correct model have constant spread. The detection of this type of specification error therefore requires substantive insight into the process generating the data and cannot rely on diagnostics alone.
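A small simulation illustrates how an omitted interaction can mimic nonconstant error variance. The numbers are invented for the purpose: three hypothetical regions with assets slopes of 1, 2, and 3, and errors with the same standard deviation everywhere. The helpers `fit` and `spread_ratio` are likewise hypothetical names introduced for this sketch.

```python
import numpy as np

def fit(X, y):
    """Least-squares fitted values and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def spread_ratio(fitted, resid):
    """Mean |residual| in the top third of fitted values over the bottom third."""
    order = np.argsort(fitted)
    third = len(fitted) // 3
    a = np.abs(resid)
    return a[order[-third:]].mean() / a[order[:third]].mean()

# Simulated data (assumption): slope differs by region, error sd is constant.
rng = np.random.default_rng(4)
n = 1500
assets = rng.uniform(0, 10, size=n)
region = rng.integers(0, 3, size=n)
slopes = np.array([1.0, 2.0, 3.0])
y = 5.0 + slopes[region] * assets + rng.normal(0, 1.0, size=n)

# Misspecified model: region and its interaction with assets are omitted.
X_omit = np.column_stack([np.ones(n), assets])

# Correct model: dummies for regions 1 and 2 plus their products with assets.
D = (region[:, None] == np.arange(1, 3)).astype(float)
X_full = np.column_stack([np.ones(n), assets, D, D * assets[:, None]])

ratio_omit = spread_ratio(*fit(X_omit, y))
ratio_full = spread_ratio(*fit(X_full, y))
print(f"spread ratio, region omitted:  {ratio_omit:.2f}")
print(f"spread ratio, region included: {ratio_full:.2f}")
```

With region omitted, the residuals fan out as the fitted values increase because the unmodeled slope differences grow with assets; once region and its interaction are included, the ratio returns to about 1 even though the errors were never heteroscedastic.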

http://dx.doi.org/10.4135/9781412985604.n6
