In contrast with linear regression models, in nonlinear regression models one cannot interpret the shape of the regression line from the summary. Therefore, visualization is an important tool for interpretating nonlinear regression models. Here a selection of visualization functions is summarized. Detailed information on the functions and examples are found in the help files of each function.
Before summarizing the function, we would like to highlight different concepts
Loading the data:
library(itsadug)
library(mgcv)
data(simdat)
# add missing values to simdat:
simdat[sample(nrow(simdat), 15),]$Y <- NA
We start with a somewhat weird model, for illustration purposes. We
would like to include some factorial predictors, random smooths, and a
nonlinear interaction. Therefore, the continuous predictor
Condition
is transformed to a categorical predictor:
simdat$Condition <- as.factor(simdat$Condition)
# Note: this is really not the best fitting model for the data:
m1 <- bam(Y ~ Group * Condition + s(Time) + s(Trial) + ti(Time, Trial) + s(Time, Subject, bs='fs', m=1, k=5), data=simdat, discrete=TRUE)
The summary shows two parts: the first summary are the parametric terms, i.e., the ‘linear’ part, and the second summary presents the nonlinear smooths.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Y ~ Group * Condition + s(Time) + s(Trial) + ti(Time, Trial) +
## s(Time, Subject, bs = "fs", m = 1, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.431925 0.056431 7.654 1.97e-14 ***
## GroupAdults 2.285744 0.079802 28.643 < 2e-16 ***
## Condition0 -0.420383 0.079801 -5.268 1.38e-07 ***
## Condition1 -0.008202 0.079800 -0.103 0.918
## Condition2 1.191566 0.079801 14.932 < 2e-16 ***
## Condition3 2.931693 0.079801 36.738 < 2e-16 ***
## Condition4 5.581639 0.079801 69.944 < 2e-16 ***
## GroupAdults:Condition0 -0.093919 0.112856 -0.832 0.405
## GroupAdults:Condition1 0.050621 0.112856 0.449 0.654
## GroupAdults:Condition2 0.641225 0.112856 5.682 1.34e-08 ***
## GroupAdults:Condition3 1.755812 0.112856 15.558 < 2e-16 ***
## GroupAdults:Condition4 3.200884 0.112856 28.363 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time) 8.554 8.665 371.4 <2e-16 ***
## s(Trial) 8.820 8.990 1433.0 <2e-16 ***
## ti(Time,Trial) 12.818 14.556 2879.5 <2e-16 ***
## s(Time,Subject) 161.915 168.000 4829.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.966 Deviance explained = 96.6%
## fREML = 1.4607e+05 Scale est. = 2.7427 n = 75585
Note that although the model looks good
(deviance explained of 96.62
%!), there are some issues with
this model, which we won’t address here.
gamtabs
With the function gamtabs
the summary of the model can
be converted to a Latex of HTML table, easy to integrate in a knitr or R
Markdown report (see here):
An important distinction in regression model is the distinction between partial effects and summed effects:
partial effects are the isolated effects of one
particular predictor or interaction. Only one line of the summary… If
you use the function plot
for visualizing the model terms,
you will see the partial effects. Try
plot(m1, all.terms=TRUE, rug=FALSE)
.
summed effects are the predicted response measures for a
certain situation or condition. So all the partial effects that apply to
that situation are summed up, including the intercept. If not for all
predictors a value is given, the functions automatically select a value
for unspecified predictors. More information about these settings are
provided in the summary in the console. If you do not see this summary,
use the command infoMessages("on")
.
plot_smooth
The function plot_smooth
visualizes 1-dimensional
nonlinear effects as summed effects. The summary of the first plot below
shows that this plot is specifically plotting the summed effect of a
certain Subject and a certain Trial for the “Adults” age group.
## Summary:
## * Group : factor; set to the value(s): Adults.
## * Condition : factor; set to the value(s): 4.
## * Time : numeric predictor; with 30 values ranging from 0.000000 to 2000.000000.
## * Trial : numeric predictor; set to the value(s): 0.
## * Subject : factor; set to the value(s): c18. (Might be canceled as random effect, check below.)
## * NOTE : The following random effects columns are canceled: s(Time,Subject)
##
Generally, we would like to generalize over the random effects
predictors, such as Subject
. We can exclude the random
effects from the summed effects plots by the argument
rm.ranef=TRUE
.
Note the labels at the right side of the plot indicate the chosen settings.
par(mfrow=c(1,2), cex=1.1)
# including random effects:
plot_smooth(m1, view="Time", cond=list(Group="Adults"))
# excluding random effects:
plot_smooth(m1, view="Time", cond=list(Group="Adults"), rm.ranef=TRUE)
The plots below show two different ways for combining the two age groups in one plot:
par(mfrow=c(1,2), cex=1.1)
# version 1:
plot_smooth(m1, view="Time", plot_all="Group", rm.ranef=TRUE, ylim=c(-15,15))
# version 2:
plot_smooth(m1, view="Time", cond=list(Group="Adults"), rm.ranef=TRUE, rug=FALSE, col="red", ylim=c(-15,15))
plot_smooth(m1, view="Time", cond=list(Group="Children"), rm.ranef=TRUE, rug=FALSE, col="cyan", add=TRUE)
fvisgam
Two dimensional surfaces are used for visualizing higher order
interactions, such as Time
xTrial
. Again, the
argument rm.ranef=TRUE
is used for canceling the random
effects.
Note the difference in scale on the color legend:
par(mfrow=c(1,2), cex=1.1)
# including random effects:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"))
## Warning in gradientLegend(c(min.z, max.z), n.seg = 3, pos = 0.875, color = pal,
## : Increase right margin to fit labels or decrease the number of decimals, see
## help(gradientLegend).
# excluding random effects:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"), rm.ranef=TRUE)
## Warning in gradientLegend(c(min.z, max.z), n.seg = 3, pos = 0.875, color = pal,
## : Increase right margin to fit labels or decrease the number of decimals, see
## help(gradientLegend).
Setting the color range to the same scale is important when comparing surfaces:
par(mfrow=c(1,2), cex=1.1)
# group children:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Children"), rm.ranef=TRUE, zlim=c(-15,15), main="Children")
## Warning in gradientLegend(c(min.z, max.z), n.seg = 3, pos = 0.875, color = pal,
## : Increase right margin to fit labels or decrease the number of decimals, see
## help(gradientLegend).
# group Adults:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"), rm.ranef=TRUE, zlim=c(-15,15), main="Adults")
## Warning in gradientLegend(c(min.z, max.z), n.seg = 3, pos = 0.875, color = pal,
## : Increase right margin to fit labels or decrease the number of decimals, see
## help(gradientLegend).
plot_parametric
The function plot parametric plots the summed effects of the parametric terms.
par(mfrow=c(1,3), cex=1.1)
# all:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition),Group=c('Adults', 'Children')), rm.ranef=TRUE)
# children:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition)), cond=list(Group="Children"), rm.ranef=TRUE, main="Children")
# adults:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition)), cond=list(Group="Adults"), rm.ranef=TRUE, main="Adults")
plot
(mgcv)par(mfrow=c(1,3), cex=1.1)
# first smooth of time:
plot(m1, select=1, shade=TRUE, scale=0, ylim=c(-15,15))
abline(h=0)
# second smooth, of trial:
plot(m1, select=2, shade=TRUE, scale=0, ylim=c(-15,15))
abline(h=0)
# third smooth, nonlienar interaction between Time and Trial:
plot(m1, select=3, scale=0, rug=FALSE)
pvisgam
The function pvisgam
provides a more colorful version of
the contour plot that visualizes the partial interaction
between Time
and Trial
. The function can also
plot two dimensional representations of higher order interactions.
Below a comparison:
par(mfrow=c(1,3), cex=1.1)
# Partial effect of nonlinear interaction between Time and Trial:
plot(m1, select=3, scale=0, rug=FALSE)
# Partial effect of nonlinear interaction between Time and Trial:
pvisgam(m1, select=3, view=c("Time", "Trial"), main="pvisgam", dec=1)
# Summed effect of nonlinear interaction between Time and Trial:
fvisgam(m1, view=c("Time", "Trial"), rm.ranef=TRUE, main="fvisgam", dec=1)
inspect_random
The function plot
plots also random smooths, just as the
function inspect_random
:
The function inspect_random
can also plot selective
random effects, or apply a function over the random effects:
par(mfrow=c(1,2), cex=1.1)
# PLOT 1: selection
# ... the adults in black:
inspect_random(m1, select=4, cond=list(Subject=unique(simdat[simdat$Group=="Adults", "Subject"])), col=1, ylim=c(-15, 15), xpd=TRUE)
# ... add children in red:
inspect_random(m1, select=4, cond=list(Subject=unique(simdat[simdat$Group=="Children", "Subject"])), col=2, add=TRUE, xpd=TRUE)
# PLOT 2: averages
# ... of the adults:
inspect_random(m1, select=4, fun=mean, ylim=c(-15,15), cond=list(Subject=unique(simdat[simdat$Group=="Adults", "Subject"])), lwd=2)
# ... of the children:
inspect_random(m1, select=4, fun=mean, cond=list(Subject=unique(simdat[simdat$Group=="Children", "Subject"])), lwd=2, col=2, add=TRUE)
Note that this last plot suggest that children and
adults not oly differ overall, but also differ in their pattern over
time. So it may be preferred to include the grouping predictor
Group
in the nonlinear ‘fixed’ (non-random) effects, to
capture this difference in the fixed effects terms of the model.
plot_data
Inspect the data on which the model is based:
plot_modelfit
Inspect the model fit with plot_modelfit
:
plot_topo
The function plot_topo
is a variant on the functions
pvisgam
and fvisgam
for plotting EEG
topographies. See the help file of plot_topo
for more
examples.
data(eeg)
# simple GAMM model:
m1 <- gam(Ampl ~ te(Time, X, Y, k=c(10,5,5),
d=c(1,2)), data=eeg)
# get electrode postions:
electrodes <- eeg[,c('X','Y','Electrode')]
electrodes <- as.list( electrodes[!duplicated(electrodes),] )
# topo plot, by default uses fvisgam
# and automatically selects a timestamp (270ms):
plot_topo(m1, view=c("X", "Y"), el.pos=electrodes, dec=2)