Assignment Tasks

Question 1

(a) Use SAS to study the distribution of charges (charges), the log transformation (log_charges), and the square-root transformation (sqrt_charges), where the new variables can be generated in SAS by

log_charges = log(charges);
sqrt_charges = sqrt(charges);

Obtain measures of location, dispersion, skewness and kurtosis. Obtain a boxplot, a histogram and a quantile-quantile plot. Also carry out Normal goodness-of-fit tests. What are the key features of these distributions? Which transformation results in the most normal distribution?

(b) Now use SAS to obtain boxplots of charges by smoker and weight_range. Also obtain boxplots of charges by smoker and region. What do the boxplots suggest about the pattern and trend, if any, of charges?

Note: You may find a boxplot with both the group and category options useful.

Question 2

(a) Obtain a Pearson correlation matrix relating the variables sqrt_charges, log_charges, age, and bmi. Also obtain a scatterplot matrix of those variables. Discuss the relationships. Explain why sqrt_charges is more suitable than log_charges as the response variable for linear regression.

(b) Fit a simple regression model relating sqrt_charges to age, with sqrt_charges as the dependent variable. Discuss the fitted relationship and the goodness of fit. Examine residual plots and influence diagnostics and comment on the residual behaviour.

(c) Build a multiple regression model for non-smokers, relating sqrt_charges to the other variables. In building your model, consider as many potential explanatory variables as possible (you may need to define additional dummy variables). You can use stepwise or R-square selection to help you find the most parsimonious (simplest) model with the highest R-square. Be sure to check for collinearity. Summarise how your final model was obtained, including the rationale for any modelling decisions you have made, and indicate why that final model was considered the ‘best’.

Report and interpret your final model in detail, including a discussion of model diagnostics. Are there any observations that may require further inspection due to their influence on the model?
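As a starting point for Questions 1 and 2(a)-(b), the steps described above could be set up roughly as follows. This is only a sketch under stated assumptions: the dataset name work.insurance and the output name work.insurance_t are hypothetical (the brief does not name the dataset), and the variables charges, smoker, weight_range, region, age and bmi are assumed to exist in it under those names. It does not cover the model building in Question 2(c).

```sas
/* Sketch only: work.insurance and work.insurance_t are hypothetical names. */
ods graphics on;

/* Question 1(a): create the transformed variables given in the brief. */
data work.insurance_t;
    set work.insurance;
    log_charges  = log(charges);   /* natural log; needs charges > 0   */
    sqrt_charges = sqrt(charges);  /* square root; needs charges >= 0  */
run;

/* Moments, quantiles, normality tests (NORMAL) and basic plots (PLOTS). */
proc univariate data=work.insurance_t normal plots;
    var charges log_charges sqrt_charges;
    histogram charges log_charges sqrt_charges / normal;
    qqplot charges log_charges sqrt_charges / normal(mu=est sigma=est);
run;

/* Question 1(b): boxplots using both the CATEGORY= and GROUP= options. */
proc sgplot data=work.insurance_t;
    vbox charges / category=smoker group=weight_range;
run;

proc sgplot data=work.insurance_t;
    vbox charges / category=smoker group=region;
run;

/* Question 2(a): Pearson correlations and a scatterplot matrix. */
proc corr data=work.insurance_t pearson plots=matrix(histogram);
    var sqrt_charges log_charges age bmi;
run;

/* Question 2(b): simple regression with residual and influence output. */
proc reg data=work.insurance_t;
    model sqrt_charges = age / r influence;
run;
quit;
```

The output still has to be interpreted against the questions. For example, the skewness and kurtosis values and the normality tests from PROC UNIVARIATE are what support a statement about which transformation is closest to normal.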

Attachments:

assignment1.docx