---
title: "Visual inspection of GAMM models"
author: "Jacolien van Rij"
date: "15 March 2016"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Visual inspection of GAMM models}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(itsadug)
```
In contrast with *linear* regression models, in *nonlinear* regression models one cannot interpret the shape of the regression line from the summary. Therefore, visualization is an important tool for interpretating nonlinear regression models. Here a selection of visualization functions is summarized. Detailed information on the functions and examples are found in the help files of each function.
Before summarizing the function, we would like to highlight different concepts
## Example
Loading the data:
```{r}
library(itsadug)
library(mgcv)
data(simdat)
# add missing values to simdat:
simdat[sample(nrow(simdat), 15),]$Y <- NA
```
#### Model
We start with a somewhat weird model, for illustration purposes. We would like to include some factorial predictors, random smooths, and a nonlinear interaction. Therefore, the continuous predictor `Condition` is transformed to a categorical predictor:
```{r}
simdat$Condition <- as.factor(simdat$Condition)
# Note: this is really not the best fitting model for the data:
m1 <- bam(Y ~ Group * Condition + s(Time) + s(Trial) + ti(Time, Trial) + s(Time, Subject, bs='fs', m=1, k=5), data=simdat, discrete=TRUE)
```
The summary shows two parts: the first summary are the parametric terms, i.e., the 'linear' part, and the second summary presents the nonlinear smooths.
```{r}
summary(m1)
```
**Note** that although the model *looks* good (deviance explained of ```r round(summary(m1)$dev.expl*100,2)```%!), there are some issues with this model, which we won't address here.
#### Function `gamtabs`
With the function `gamtabs` the summary of the model can be converted to a Latex of HTML table, easy to integrate in a knitr or R Markdown report (see [here](http://www.jacolienvanrij.com/Tutorials/tutorialMarkdown.html)):
```{r, results="asis"}
gamtabs(m1, type="HTML")
```
## Partial versus Summed effects
An important distinction in regression model is the distinction between *partial* effects and *summed* effects:
- *partial* effects are the isolated effects of one particular predictor or interaction. Only one line of the summary... If you use the function `plot` for visualizing the model terms, you will see the partial effects. Try `plot(m1, all.terms=TRUE, rug=FALSE)`.
- *summed* effects are the predicted response measures for a certain situation or condition. So all the partial effects that apply to that situation are summed up, including the intercept. If not for all predictors a value is given, the functions automatically select a value for unspecified predictors. More information about these settings are provided in the summary in the console. If you do not see this summary, use the command `infoMessages("on")`.
## Inspection of summed effects
#### Function `plot_smooth`
The function `plot_smooth` visualizes 1-dimensional nonlinear effects as summed effects. The summary of the first plot below shows that this plot is specifically plotting the summed effect of a certain Subject and a certain Trial for the "Adults" age group.
```{r, echo=FALSE, fig.show='hide'}
plot_smooth(m1, view="Time", cond=list(Group="Adults"), rug=FALSE)
infoMessages('off')
```
Generally, we would like to generalize over the random effects predictors, such as `Subject`. We can exclude the random effects from the summed effects plots by the argument `rm.ranef=TRUE`.
**Note** the labels at the right side of the plot indicate the chosen settings.
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
# including random effects:
plot_smooth(m1, view="Time", cond=list(Group="Adults"))
# excluding random effects:
plot_smooth(m1, view="Time", cond=list(Group="Adults"), rm.ranef=TRUE)
```
The plots below show two different ways for combining the two age groups in one plot:
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
# version 1:
plot_smooth(m1, view="Time", plot_all="Group", rm.ranef=TRUE, ylim=c(-15,15))
# version 2:
plot_smooth(m1, view="Time", cond=list(Group="Adults"), rm.ranef=TRUE, rug=FALSE, col="red", ylim=c(-15,15))
plot_smooth(m1, view="Time", cond=list(Group="Children"), rm.ranef=TRUE, rug=FALSE, col="cyan", add=TRUE)
```
#### Function `fvisgam`
Two dimensional surfaces are used for visualizing higher order interactions, such as `Time`x`Trial`. Again, the argument `rm.ranef=TRUE` is used for canceling the random effects.
**Note** the difference in scale on the color legend:
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
# including random effects:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"))
# excluding random effects:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"), rm.ranef=TRUE)
```
Setting the color range to the same scale is important when comparing surfaces:
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
# group children:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Children"), rm.ranef=TRUE, zlim=c(-15,15), main="Children")
# group Adults:
fvisgam(m1, view=c("Time", "Trial"), cond=list(Group="Adults"), rm.ranef=TRUE, zlim=c(-15,15), main="Adults")
```
#### Function `plot_parametric`
The function plot parametric plots the summed effects of the parametric terms.
```{r, fig.width=12, fig.height=4}
par(mfrow=c(1,3), cex=1.1)
# all:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition),Group=c('Adults', 'Children')), rm.ranef=TRUE)
# children:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition)), cond=list(Group="Children"), rm.ranef=TRUE, main="Children")
# adults:
plot_parametric(m1, pred=list(Condition=levels(simdat$Condition)), cond=list(Group="Adults"), rm.ranef=TRUE, main="Adults")
```
## Inspection of *partial* effects
#### Function `plot` (mgcv)
```{r, fig.width=12, fig.height=4}
par(mfrow=c(1,3), cex=1.1)
# first smooth of time:
plot(m1, select=1, shade=TRUE, scale=0, ylim=c(-15,15))
abline(h=0)
# second smooth, of trial:
plot(m1, select=2, shade=TRUE, scale=0, ylim=c(-15,15))
abline(h=0)
# third smooth, nonlienar interaction between Time and Trial:
plot(m1, select=3, scale=0, rug=FALSE)
```
#### Function `pvisgam`
The function `pvisgam` provides a more colorful version of the contour plot that visualizes the *partial* interaction between `Time` and `Trial`. The function can also plot two dimensional representations of higher order interactions.
Below a comparison:
```{r, fig.width=12, fig.height=4}
par(mfrow=c(1,3), cex=1.1)
# Partial effect of nonlinear interaction between Time and Trial:
plot(m1, select=3, scale=0, rug=FALSE)
# Partial effect of nonlinear interaction between Time and Trial:
pvisgam(m1, select=3, view=c("Time", "Trial"), main="pvisgam", dec=1)
# Summed effect of nonlinear interaction between Time and Trial:
fvisgam(m1, view=c("Time", "Trial"), rm.ranef=TRUE, main="fvisgam", dec=1)
```
#### Function `inspect_random`
The function `plot` plots also random smooths, just as the function `inspect_random`:
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
plot(m1, select=4)
abline(h=0)
inspect_random(m1, select=4)
```
The function `inspect_random` can also plot selective random effects, or apply a function over the random effects:
```{r, fig.width=8, fig.height=4}
par(mfrow=c(1,2), cex=1.1)
# PLOT 1: selection
# ... the adults in black:
inspect_random(m1, select=4, cond=list(Subject=unique(simdat[simdat$Group=="Adults", "Subject"])), col=1, ylim=c(-15, 15), xpd=TRUE)
# ... add children in red:
inspect_random(m1, select=4, cond=list(Subject=unique(simdat[simdat$Group=="Children", "Subject"])), col=2, add=TRUE, xpd=TRUE)
# PLOT 2: averages
# ... of the adults:
inspect_random(m1, select=4, fun=mean, ylim=c(-15,15), cond=list(Subject=unique(simdat[simdat$Group=="Adults", "Subject"])), lwd=2)
# ... of the children:
inspect_random(m1, select=4, fun=mean, cond=list(Subject=unique(simdat[simdat$Group=="Children", "Subject"])), lwd=2, col=2, add=TRUE)
```
**Note** that this last plot suggest that children and adults not oly differ overall, but also differ in their pattern over time. So it may be preferred to include the grouping predictor `Group` in the nonlinear 'fixed' (non-random) effects, to capture this difference in the fixed effects terms of the model.
## Other functions
#### Function `plot_data`
Inspect the data on which the model is based:
```{r, fig.width=4, fig.height=4}
# All data, clustered by Group (very small dots):
plot_data(m1, view="Time", split_by="Group", cex=.5)
```
#### Function `plot_modelfit`
Inspect the model fit with `plot_modelfit`:
```{r, fig.width=4, fig.height=4}
simdat$Event <- interaction(simdat$Subject, simdat$Trial)
plot_modelfit(m1, view="Time", event=simdat$Event, n=8)
```
#### Function `plot_topo`
The function `plot_topo` is a variant on the functions `pvisgam` and `fvisgam` for plotting EEG topographies. See the help file of `plot_topo` for more examples.
```{r, fig.width=4, fig.height=4}
data(eeg)
# simple GAMM model:
m1 <- gam(Ampl ~ te(Time, X, Y, k=c(10,5,5),
d=c(1,2)), data=eeg)
# get electrode postions:
electrodes <- eeg[,c('X','Y','Electrode')]
electrodes <- as.list( electrodes[!duplicated(electrodes),] )
# topo plot, by default uses fvisgam
# and automatically selects a timestamp (270ms):
plot_topo(m1, view=c("X", "Y"), el.pos=electrodes, dec=2)
```