1. 1) Show that Total SS = Error SS + Regression SS (SST = SSE + SSR). The equations for SST, SSR, and SSE are on page 9 of the Chapter 2 slides.

2) Show that the average residual from a simple linear regression is zero: Σ e_i / n = 0, where e_i = y_i − ŷ_i.
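Before writing the two proofs, it can help to confirm both identities numerically. The sketch below is only a sanity check on synthetic data, not a proof; the function name `ss_decomposition` and the simulated values are assumptions made for illustration and do not come from the course material.

```python
import numpy as np

def ss_decomposition(x, y):
    """Fit y = b0 + b1*x by least squares; return (SST, SSR, SSE, mean residual)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Closed-form least-squares estimates for simple linear regression
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    e = y - y_hat                           # residuals e_i = y_i - yhat_i
    sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
    sse = np.sum(e ** 2)                    # error sum of squares
    return sst, ssr, sse, e.mean()

# Synthetic data (made up for this check only)
rng = np.random.default_rng(0)
x_demo = rng.uniform(1.0, 7.0, size=50)
y_demo = 80.0 - 5.0 * x_demo + rng.normal(0.0, 4.0, size=50)

sst, ssr, sse, mean_resid = ss_decomposition(x_demo, y_demo)
print("SST == SSR + SSE:", np.isclose(sst, ssr + sse))
print("mean residual == 0:", np.isclose(mean_resid, 0.0))
```

Both checks hold only up to floating-point error, which is why the comparison uses `np.isclose` rather than `==`.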

“UNLifeExpectancy.csv” contains data on health care systems from n = 133 countries throughout the world.

Dependent Variable

LIFEEXP: the life expectancy at birth in years

Independent Variables

REGION: Categorical variable for region of the world

COUNTRY: The name of the country

ILLITERATE: Adult illiteracy rate, % aged 15 and older

POP: 2005 population, in millions

FERTILITY: Total fertility rate, births per woman

PRIVATEHEALTH: 2004 Private expenditure on health, % of GDP

PUBLICEDUCATION: Public expenditure on education, % of GDP

HEALTHEXPEND: 2004 Health expenditure per capita, PPP in USD

BIRTHATTEND: Births attended by skilled health personnel (%)

PHYSICIAN: Physicians per 100,000 people

2. Answer the following questions.

1) Create a histogram and a qqplot (normality plot) of LIFEEXP.

2) Create a histogram and a qqplot (normality plot) of ln(LIFEEXP).

3) Calculate the correlation between LIFEEXP and FERTILITY. Do these variables appear highly correlated?

4) Create a scatter plot of LIFEEXP vs FERTILITY. Are there any outliers or high leverage points? Comment on the plot.

5) Fit a simple linear regression model using LIFEEXP as the outcome variable and FERTILITY as the explanatory variable. Summarize the intercept and the coefficient.

6) What is the coefficient of determination, R^2, and how is it interpreted?

7) Conduct hypothesis tests for the following two cases. Use alpha = 0.05. Report the test statistic, p-value, and the conclusion.

i) Test H0: b1 = 0 vs Ha: b1 ≠ 0

ii) Test H0: b1 = -5 vs Ha: b1 ≠ -5

8) Provide a 95% confidence interval for b1.

9) Predict the value of LIFEEXP when FERTILITY = 1.7.

10) Calculate the residual e_i when FERTILITY = 1.7.

11) Construct a 95% prediction interval when FERTILITY = 1.7.
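The quantities asked for in parts 3) and 5) through 11) all follow from a handful of simple-regression formulas. The sketch below implements them with NumPy on synthetic data; the helper names (`simple_ols`, `t_statistic`, `slope_confidence_interval`, `prediction_interval`) and the simulated numbers are assumptions for illustration, not the course's required method. The critical value `t_crit` is supplied by the caller (from a t table or statistical software) so that the example needs no extra libraries; p-values likewise require the t distribution with n - 2 degrees of freedom.

```python
import numpy as np

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x, with the pieces needed for inference."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = y - (b0 + b1 * x)
    sse = np.sum(resid ** 2)
    s2 = sse / (n - 2)                      # residual variance estimate
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "s2": s2,
            "se_b1": np.sqrt(s2 / sxx),
            "r2": 1.0 - sse / np.sum((y - y_bar) ** 2)}

def t_statistic(fit, b1_null=0.0):
    """t statistic for H0: b1 = b1_null (n - 2 degrees of freedom)."""
    return (fit["b1"] - b1_null) / fit["se_b1"]

def slope_confidence_interval(fit, t_crit):
    """b1 +/- t_crit * se(b1)."""
    half = t_crit * fit["se_b1"]
    return fit["b1"] - half, fit["b1"] + half

def prediction_interval(fit, x_star, t_crit):
    """Interval for a new observation at x_star."""
    y_star = fit["b0"] + fit["b1"] * x_star
    se_pred = np.sqrt(fit["s2"] * (1.0 + 1.0 / fit["n"]
                                   + (x_star - fit["x_bar"]) ** 2 / fit["sxx"]))
    return y_star - t_crit * se_pred, y_star + t_crit * se_pred

# Synthetic stand-in for (FERTILITY, LIFEEXP); NOT the UN data
rng = np.random.default_rng(1)
x_demo = rng.uniform(1.0, 7.0, size=30)
y_demo = 83.0 - 6.0 * x_demo + rng.normal(0.0, 5.0, size=30)

fit = simple_ols(x_demo, y_demo)
t_crit = 2.048  # 97.5th percentile of t with 28 df, taken from a t table
print("correlation:", np.corrcoef(x_demo, y_demo)[0, 1])
print("b0, b1, R^2:", fit["b0"], fit["b1"], fit["r2"])
print("t for H0: b1 = 0:", t_statistic(fit, 0.0))
print("t for H0: b1 = -5:", t_statistic(fit, -5.0))
print("95% CI for b1:", slope_confidence_interval(fit, t_crit))
print("95% PI at x = 1.7:", prediction_interval(fit, 1.7, t_crit))
```

For the actual assignment, the degrees of freedom and therefore `t_crit` depend on the number of complete observations in the data, so the value 2.048 used here applies only to the 30-point synthetic sample.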

3. We will continue to work with UNLifeExpectancy.csv.

1) Fit a basic linear regression model using LIFEEXP as the outcome variable and FERTILITY, PRIVATEHEALTH, PUBLICEDUCATION, HEALTHEXPEND, BIRTHATTEND, and PHYSICIAN as explanatory variables. Summarize the estimates of the intercept and slopes for the regression model.

2) What are the coefficient of determination R^2 and its adjusted version R_a^2?

3) Interpret the value of the coefficient for FERTILITY.

4) Interpret the value of the intercept b0.

5) Find the ANOVA table (with SSR in one row). What is the p-value? What are the null and alternative hypotheses of the ANOVA F-test? State the conclusion.

6) Find the covariance matrix: s^2 (X'X)^(-1)

7) Fit a more parsimonious model, using LIFEEXP as the dependent variable and FERTILITY, HEALTHEXPEND, and BIRTHATTEND as explanatory variables. Summarize the estimates of the intercept and slopes for the regression model.

8) What are the coefficient of determination R^2 and its adjusted version R_a^2?

9) Compare the result of 8) to that of 2). What do you find?
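The multiple-regression quantities in this problem (coefficients, R^2, adjusted R^2, the ANOVA F statistic, and the covariance matrix s^2 (X'X)^(-1)) can be computed directly from the design matrix. The following is a minimal sketch on synthetic data; the function name `multiple_ols` and the simulated predictors are assumptions for illustration and are not the UN data. With the real file, rows with missing values would need to be handled before forming the design matrix.

```python
import numpy as np

def multiple_ols(X, y):
    """OLS with an intercept; X has one column per explanatory variable."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])        # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sse = resid @ resid                          # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)            # total sum of squares
    ssr = sst - sse                              # regression sum of squares
    s2 = sse / (n - k - 1)                       # mean squared error
    return {
        "beta": beta,                            # [b0, b1, ..., bk]
        "r2": 1.0 - sse / sst,
        "adj_r2": 1.0 - (sse / (n - k - 1)) / (sst / (n - 1)),
        "f_stat": (ssr / k) / s2,                # tests H0: b1 = ... = bk = 0
        "cov_beta": s2 * np.linalg.inv(Xd.T @ Xd),
    }

# Synthetic data with three predictors; only the first two affect y
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(60, 3))
y_demo = 70.0 - 4.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(0.0, 3.0, size=60)

full = multiple_ols(X_demo, y_demo)
reduced = multiple_ols(X_demo[:, :2], y_demo)    # drop the third predictor
print("full    R^2, adj R^2:", full["r2"], full["adj_r2"])
print("reduced R^2, adj R^2:", reduced["r2"], reduced["adj_r2"])
print("standard errors (full):", np.sqrt(np.diag(full["cov_beta"])))
```

Dropping a predictor can never raise R^2, but it can raise adjusted R^2 because of the degrees-of-freedom penalty; that contrast is what the comparison in part 9) is designed to expose.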