* Lab 7, Stat 539 * To accompany the second half of lec6.pdf * Two-factor experiments * the code below reproduces the results in the lecture notes, * as well as all the steps required to perform a full analysis * double stars ** after commands are my comments on the output * from the course website download and save the Insecticide Data (insecticide.dta) and Rat Insulin Data (ratInsulin.dta) ******************************************************************************** * insecticide analysis, summary * First we fit the interaction model, but the model assuptions are violated. * We complete the analysis anyway, for illustration. * By transforming the response variable, the model assumptions are satisfied. * We complete the analysis using the inverse of time (1/time) as the response. **************************************** * interaction model where model assumptions are violated clear use insecticide // load the dataset list // print to screen tabulate poison dose, summarize(time) // table of means, standard deviations, and frequencies tabulate poison dose, summarize(time) means // table of means only anova time poison dose poison*dose // fit the two-way anova with interaction ** both poison and dose are significant, but the interaction is not (pval=0.1123) predict yhat, xb // create fitted values, yhat predict residual,r // create residuals, residual robvar(time), by(dose) // check for equal variances by dose robvar(time), by(poison) // check for equal variances by poison ** variance by either dose or poison appear to not be constant -- a violation of the assuption of constant variance * Create a four-in-one plot quietly qnorm residual, name(probplot,replace) nodraw title(Normal Prob. Plot of Residuals) quietly rvfplot, name(respredplot,replace) nodraw title(Residuals vs. Fitted Values) quietly hist residual, freq name(hist,replace) nodraw title(Histogram of the Residuals) generate obs_order = _n quietly twoway connect residual obs_order, name(obs_order,replace) nodraw title(Residuals vs. Order of the Data) drop obs_order graph combine probplot respredplot hist obs_order, title(Residual Plots) ** The residuals are not normal (S-shape in normal probability plot, not straight line) ** The residuals do not have constant variance, but increase in variance with the fitted values ** The observation order reveals two periods of time that have higher variation -- you want to investigate this if your study swilk residual // print the Shapiro-Wilks normality test results ** confirms the residuals are not normal * Cook's distance to examine if any observations have great influence on the regression predict cooksd,cooksd gene obs_order = _n twoway spike cooksd obs_order drop obs_order cooksd ** the set of tall spikes appear when our observation order plot had wild observations -- investigate ** in spite of the violations in the model assuptions, ** let's look at the interaction plots because that's what happens next in the notes. * interaction plot, dose is lines, poison is horizontal axis sort dose poison scatter yhat poison, connect(ascending) mlabel(dose) title(Dose x Poison Profiles) // connect(ascending) = c(L) scatter yhat poison, connect(ascending) mlabel(dose) name(iplot1,replace) nodraw title(Dose x Poison Profiles) ** these lines are not strictly parallel, so there appears to be some interaction ** however, remember the interaction was not significant in the model ** however, our model assumptions were violated, so we can not trust our model conclusions. * interaction plot, dose is lines, poison is horizontal axis sort poison dose scatter yhat dose, connect(ascending) mlabel(poison) title(Poison x Dose Profiles) // connect(ascending) = c(L) scatter yhat dose, connect(ascending) mlabel(poison) name(iplot2,replace) nodraw title(Poison x Dose Profiles) ** these lines are not strictly parallel, so there appears to be some interaction graph combine iplot1 iplot2, title(Interaction Plots) // two plots together * assuming the interaction is not important * pairwise comparisons of main effects for dose test _b[dose[1]]=_b[dose[2]] test _b[dose[1]]=_b[dose[3]] test _b[dose[2]]=_b[dose[3]] ** 1 and 2 are not different, but 3 is different from both 1 and 2 * assuming the interaction is not important * pairwise comparisons of main effects for dose test _b[poison[1]]=_b[poison[2]] test _b[poison[1]]=_b[poison[3]] test _b[poison[1]]=_b[poison[4]] test _b[poison[2]]=_b[poison[3]] test _b[poison[2]]=_b[poison[4]] test _b[poison[3]]=_b[poison[4]] ** the poison main effects are not different from each other (with the interaction in the model) * assuming the interaction is not important * obtain the estimated pairwise differences in the three doses lincom _b[dose[1]]-_b[dose[2]] lincom _b[dose[1]]-_b[dose[3]] lincom _b[dose[2]]-_b[dose[3]] ** mean time for dose 1 is -0.0575 less than dose 2, with a 95% CI of (-.2713767, .1563767) **************************************** * Because the residual plot indicates nonconstant variance, we can transform our response variable. * We try the inverse (1/time), and see whether the assumptions are met (they are), * but the remaining analysis is as above -- except the conclusions apply to the inverse of time. rvfplot * generate the inverse time variable, itime generate itime = 1/time list time itime anova itime poison dose poison*dose // fit the two-way anova with interaction ** both poison and dose are significant, but the interaction is not (pval=0.3867) drop yhat residual predict yhat, xb // create fitted values, yhat predict residual,r // create residuals, residual robvar(itime), by(dose) // check for equal variances by dose robvar(itime), by(poison) // check for equal variances by poison ** variance by either dose or poison appear to be constant -- no violation of the assuption of constant variance * Create a four-in-one plot quietly qnorm residual, name(probplot,replace) nodraw title(Normal Prob. Plot of Residuals) quietly rvfplot, name(respredplot,replace) nodraw title(Residuals vs. Fitted Values) quietly hist residual, freq name(hist,replace) nodraw title(Histogram of the Residuals) generate obs_order = _n quietly twoway connect residual obs_order, name(obs_order,replace) nodraw title(Residuals vs. Order of the Data) drop obs_order graph combine probplot respredplot hist obs_order, title(Residual Plots) ** The residuals are a tiny bit skewed to the right, but are much more normal than before ** The residuals appear to have constant variance ** The observation order reveals no patterns swilk residual // print the Shapiro-Wilks normality test results ** confirms the residuals are normal (pval=0.21480) * Cook's distance to examine if any observations have great influence on the regression predict cooksd,cooksd gene obs_order = _n twoway spike cooksd obs_order drop obs_order cooksd ** no problem with influential observations ** at this point, procede with the analysis. ** look at interaction plots * interaction plot, dose is lines, poison is horizontal axis sort dose poison scatter yhat poison, connect(ascending) mlabel(dose) title(Dose x Poison Profiles) // connect(ascending) = c(L) scatter yhat poison, connect(ascending) mlabel(dose) name(iplot1,replace) nodraw title(Dose x Poison Profiles) ** these lines look pretty parallel, so little evidence of interaction * interaction plot, dose is lines, poison is horizontal axis sort poison dose scatter yhat dose, connect(ascending) mlabel(poison) title(Poison x Dose Profiles) // connect(ascending) = c(L) scatter yhat dose, connect(ascending) mlabel(poison) name(iplot2,replace) nodraw title(Poison x Dose Profiles) ** these lines look pretty parallel, so little evidence of interaction graph combine iplot1 iplot2, title(Interaction Plots) // two plots together ** the interaction was not significant in the anova either, ** so drop the interaction term and refit the main-effects model and do it all over again! **************************************** * dropping the interaction term and fitting the main-effects model anova itime poison dose // fit the two-way anova with interaction ** both poison and dose are significant drop yhat residual predict yhat, xb // create fitted values, yhat predict residual,r // create residuals, residual robvar(itime), by(dose) // check for equal variances by dose robvar(itime), by(poison) // check for equal variances by poison ** variance by either dose or poison appear to be constant -- no violation of the assuption of constant variance * Create a four-in-one plot quietly qnorm residual, name(probplot,replace) nodraw title(Normal Prob. Plot of Residuals) quietly rvfplot, name(respredplot,replace) nodraw title(Residuals vs. Fitted Values) quietly hist residual, freq name(hist,replace) nodraw title(Histogram of the Residuals) generate obs_order = _n quietly twoway connect residual obs_order, name(obs_order,replace) nodraw title(Residuals vs. Order of the Data) drop obs_order graph combine probplot respredplot hist obs_order, title(Residual Plots) ** The residuals look great, they are normal ** The residuals appear to have constant variance ** The observation order reveals no patterns swilk residual // print the Shapiro-Wilks normality test results ** confirms the residuals are normal (pval=0.54512) * Cook's distance to examine if any observations have great influence on the regression predict cooksd,cooksd gene obs_order = _n twoway spike cooksd obs_order drop obs_order cooksd ** no problem with influential observations ** at this point, procede with the analysis. ** no interaction plots this time, because we have no term for interaction * pairwise comparisons of main effects for dose test _b[dose[1]]=_b[dose[2]] test _b[dose[1]]=_b[dose[3]] test _b[dose[2]]=_b[dose[3]] ** each dose is different from the other doses * pairwise comparisons of main effects for dose test _b[poison[1]]=_b[poison[2]] test _b[poison[1]]=_b[poison[3]] test _b[poison[1]]=_b[poison[4]] test _b[poison[2]]=_b[poison[3]] test _b[poison[2]]=_b[poison[4]] test _b[poison[3]]=_b[poison[4]] ** only poisons 2 and 4 are not different, all other pairwise differences are significant * obtain the estimated pairwise differences in the three doses lincom _b[dose[1]]-_b[dose[2]] lincom _b[dose[1]]-_b[dose[3]] lincom _b[dose[2]]-_b[dose[3]] ** mean INVERSE time for dose 1 is -.4686412 less than dose 2, with a 95% CI of (-.8204964, -.1167861) * obtain the estimated pairwise differences in the four poisons lincom _b[poison[1]]-_b[poison[2]] lincom _b[poison[1]]-_b[poison[3]] lincom _b[poison[1]]-_b[poison[4]] lincom _b[poison[2]]-_b[poison[3]] lincom _b[poison[2]]-_b[poison[4]] lincom _b[poison[3]]-_b[poison[4]] ** mean INVERSE time for poison 1 is 1.657402 more than poison 2, with a 95% CI of (1.251115, 2.06369) ** That's everything. ** You can get your estimated times by inverting all of your estimates. ** For example, the mean time for poison 1 is 1/1.657402 more than poison 2, with a 95% CI of (1/2.06369, 1/1.251115) ******************************************************************************** ******************************************************************************** ******************************************************************************** * When performing an unbalanced anova (different frequencies in each cell or sets of treatments) * the analysis is exactly the same in stata. * Provided you estimate the difference between treatments as we do above using 'lincom', * and don't try to do it from the table averages (from tabulate), everything is correct. * In class, we will use the ideas and code above to reproduce the second example (ratInsulin) * in the lecture notes. This should prepare you well for the homework. * END