Design
Linear model plotting functions
We need to define some simple functions for plotting linear models for Chapter 4. The existing functions are not a good fit:
The seaborn functions lmplot, regplot, and residplot are great for simple linear regression, but don't generalize to multiple predictors.
The statsmodels function plot_partregress does the right thing, but has a complicated API with argument names like exog, endog, etc., which will be hard to explain.
Here are the requirements:
Section 4.1: Simple linear regression
plot simple linear regression model (scatterplot and best-fitting line)
plot simple linear regression residuals with LOWESS
qqplot of residuals
plot simple linear regression model with mean CI
plot simple linear regression model with prediction CI
Section 4.2: Multiple linear regression
partial regression plots panel for the three predictors
residual plots panel for the three predictors (no LOWESS)
Section 4.3: Interpreting linear models
residual plots panel with LOWESS (vs. the three predictors and vs. fitted values)
Section 4.4: Regression with categorical predictors
partial regression with groupby categorical variable
Section 4.5: Causal effects and confounders
simple linear regression plot
partial regression plot
Section 4.6: Generalized linear models
logistic regression vis
Poisson regression vis
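The two confidence bands required in Section 4.1 can be computed with statsmodels' get_prediction(...).summary_frame(). It returns both a confidence interval for the mean outcome (the mean_ci_* columns) and a prediction interval for a new observation (the obs_ci_* columns). Below is a minimal sketch on made-up data; the variable names are illustrative assumptions. The prediction interval is always wider because it adds the residual variance to the uncertainty in the fitted line.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Made-up data for a simple linear regression y ~ 1 + x
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=50)})
df["y"] = 3 + 0.8 * df["x"] + rng.normal(scale=1.5, size=50)
lmres = smf.ols("y ~ 1 + x", data=df).fit()

# Evaluate the fitted model on a grid of x values
grid = pd.DataFrame({"x": np.linspace(0, 10, 100)})
pred = lmres.get_prediction(grid).summary_frame(alpha=0.05)
# mean_ci_lower/upper: 95% CI for the mean outcome at each x
# obs_ci_lower/upper:  95% prediction interval for a new observation at each x

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], alpha=0.5)
ax.plot(grid["x"], pred["mean"], color="C1")
ax.fill_between(grid["x"], pred["obs_ci_lower"], pred["obs_ci_upper"],
                color="C1", alpha=0.1, label="prediction interval")
ax.fill_between(grid["x"], pred["mean_ci_lower"], pred["mean_ci_upper"],
                color="C1", alpha=0.3, label="CI for the mean")
ax.legend()
```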
To get these, we’ll define the following core functions:
plot_reg(lmres): plot a simple linear regression model (scatter plot of the data and the best-fitting line)
plot_resid(lmres, pred=None, lowess=False): scatter plot of the residuals versus the predictor pred. If pred is None, plot the residuals versus the fitted values of the outcome variable. The plot shows a dashed horizontal line at y=0 and an optional LOWESS curve.
plot_partreg(lmres, pred): partial regression plot that uses regression to “subtract” all other variables from both the outcome variable (plot the residuals of outcome ~ 1 + others on the y-axis) and the predictor pred (plot the residuals of pred ~ 1 + others on the x-axis).
Optional methods (for completeness):
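plot_projreg(lmres, pred, others="mean"): plot the model's predictions versus pred, with the other predictors replaced by their means
plot_scaleloc(lmres, lowess=True): scale-location plot (square root of the absolute standardized residuals versus the fitted values)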
Convenience methods (for generating panels with one command):
plot_partregs(lmres): calls plot_partreg for each predictor
plot_resids(lmres): calls plot_resid to plot the residuals versus each predictor and versus the fitted values
Figures-only plots:
plot_residuals: plot the residuals between the points (x, y) and the line y = b0 + b1*x
plot_residuals2:
plot_lm_ttest:
plot_lm_anova:
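The entry for plot_residuals does not fix a signature. A minimal sketch follows, assuming a hypothetical signature plot_residuals(x, y, b0, b1, ax=None) that takes the data and the line's intercept and slope directly rather than a fitted model. Each residual y_i - (b0 + b1*x_i) is drawn as a vertical segment with Axes.vlines.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_residuals(x, y, b0, b1, ax=None):
    """Points (x, y), the line y = b0 + b1*x, and one vertical segment per residual."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    yhat = b0 + b1 * x  # points on the line directly below/above each data point
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(x, y)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, b0 + b1 * xs)
    # Each segment runs from the line (yhat) to the data point (y)
    ax.vlines(x, yhat, y, colors="C3", linestyles="dotted")
    return ax
```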