\documentclass[11pt]{article}

\usepackage{graphicx}
\DeclareGraphicsExtensions{.pdf}
\usepackage{listings}
\usepackage{verbatim}
\setlength{\oddsidemargin}{0in}
\setlength{\textwidth}{6.5in}
\newcommand{\code}[1]{{\tt #1}}
\bibliographystyle{chicago}


\begin{document}
\title{Wildflower writeup}
\author{Elizabeth Crone, Mollie Brooks, \& Perry de Valpine}
\date{\today}
\maketitle

\SweaveOpts{keep.source=TRUE}
<<options,echo=FALSE>>=
options(continue=" ")
if (require(cacheSweave,quietly=TRUE)) {
  setCacheDir("wildflower_cache")
}
@ 

\section{Summary}
\label{sec:summary}

This is a binomial GLMM with the interesting features of multiple (3) random
effects, including two that are crossed.  

\section{Introduction}
\label{sec:introduction}

These data are from E. Crone and colleagues' long-term study of
stages, flowering, and seed pod production of \textit{Astragulus
  scaphoides}.  This model looks at 
individual flowering as a function of the previous year's stage and
seed production (which is 0 if the previous stage was not
``Flowering'').  With this model, the effect (slope) of previous
year's seed production is interpretable as a cost
of reproduction.  At the population level, the percent of plants
flowering shows a strongly oscillating pattern, and experiments have shown that preventing reproduction allows individual plants to flower in successive years.  

The purpose of this analysis is to quantify how much the costs differ
among individual plants.  We estimate this variation using random
effects for both the slope and intercept of flowering over individual
ID.  (Use of random effects is justified because we have very little
data for some plants, and because we think a logit-normal distribution
is appropriate for variation among individuals.)  We also included
random effects of year, to account for background variation in
flowering probability.  Therefore the statistical interest in this
project is the ability of different packages to estimate crossed
random effects.  Of note, some plants never produced seeds, so most or
all of the information specifically about individual variation in cost
of reproduction comes from a subset of the data.

The stage categories are
\begin{itemize}
\item D = Dormant
\item S = Small
\item M = Medium
\item L = Large
\item F = Flowering (reference category for contrasts)  
\end{itemize}

<<>>=
flowers <- read.csv("../DATA/wildflower_pods_complete.csv",
                    header=TRUE, row.names = 1)
@ 

The total number of flowering plants and seed pods by year look like:

\includegraphics{wildflower_summary_figures.pdf}

The relation between flowering and pods in the previous, colored by
stage in previous year, looks like:

<<fig = TRUE>>=
plot(jitter(toF) ~ Pods, col = State, data = flowers, pch = 20)
legend(x = "topright", legend = levels(flowers[["State"]]), 
       pch = rep(20, 5), col = 1:5)
@ 

As a preliminary step, lme4 was used to determine
a reasonable set of random effects for use in software comparisons.
These include a random intercept for years as well as random intercept
and slope for individual plants.  For the two random effects for
plants, there is little support for needing correlation between these,
so they are treated as independent.  Since plants are crossed with years,
two of these three effects are crossed.



\section{Basics}
\label{sec:basics}

This model fits into a GLMM format. 

\section{R}
\label{sec:r}

In R, the package lme4 was used for fits\footnote{at least
  temporarily, using the \code{lme4.0} package on R-forge;
  this is equivalent to the stable version \code{lme4} on CRAN}.
  
<<>>=
library(lme4.0)
@ 

Set the ordering of the state levels to be biologically decreasing,
so \code{"F"} is the reference level.

<<setlevels>>=
flowers$State <- factor(flowers$State, levels = c("F","L","M","S","D"))
@ 

<<Rfit, cache = TRUE>>=
fit <- glmer(toF ~ State + Pods + (1 | PlantID) + (1 | Year)
             + (Pods-1 | PlantID), family = binomial, 
             data = flowers)
summary(fit)
@ 

\section{BUGS}
\label{sec:bugs}

Bugs was run using JAGS via R2jags.  Several preliminary runs were
used to navigate typical types of choices about priors, data
transformations, and chain length.  

\subsection{Data transformations}
\label{sec:data-transformations}

A common issue for including explanatory variables (covariates) in
logistic regression is whether to center (subtract the mean) and
standardize (divide by the standard deviation) before using BUGS.
Centering often leads to better mixing and standardizing can allow
interpretations that compare results for different variables.  On the
other hand, centering and standardizing require user work to present
result on the original scale of biological measurements, if desired.
Particularly when interest centers on the standard deviation of the
random effects -- rather than having these only represent nuisances --
transforming results back to the original scale offers a headache for
users.  Furthermore, we wished to compare results with R (lme4) and
ADMB, and both of these can achieve fits with the untransformed
covariates.  We conducted preliminary runs with and without
transformation and found that for these data, transformation makes
little difference for mixing.  Therefore we did not transform.

\subsection{Priors}
\label{sec:priors}

For priors, we focused on options for the random effects variances.
One option was to use the conjugate prior, which is a gamma
distribution for each random effect precision (1/variance).  This has
the advantage of being conjugate but is not flat and has been
questioned in some cases.  Another option was to use a uniform prior
on each random effect standard deviation, which some researchers
believe is more sound.  For the uniform prior we used an upper bound
of 5, which is far beyond any sampled value, and considered lower
bounds of 0.0001 and 0.01.  A lower bound is useful because the chain
can get sticky at very low values that overall have little support,
inhibiting mixing.

Priors for intercepts and the slope parameter were very wide normals
(variance = $10^6$).  Choice of these priors is worth careful
consideration because different priors for these -- which enter a
linear calculation on a logit scale -- correspond to different implied
priors on the inverse-logit (probability) scale.  However, we did not
explore other options for these priors as part of this exercise.

We compared the options for standard deviation priors based on three
chains run for 10000 iterations, thinned by 10 to record 1000
samples.  The Gelman-Rubin statistics for the uniform priors on the
standard deviations with a lower bound of 0.0001 were:

<<GRuniform, cache = TRUE>>=
library(R2jags)
load('../BUGS/test_SDflatprior_10000.Rdata')
gelman.diag(as.mcmc(wfit_jags_0001boundary_10000))
##rm(wfit_jags_0001boundary)
@ 

The Gelman-Rubin statistics for the gamma priors on the precisions were:
<<GRgamma, cache = TRUE>>=
load('../BUGS/test_SDgammaprior_10000.Rdata')
gelman.diag(as.mcmc(wfit_jags_gamma_priors_10000))
##rm(wfit_jags_gamma_priors)
@ 

For these sample sizes the gamma prior appears to have better mixing,
but we were not fully convinced of this conclusion.  Next we illustrate the mixing of these cases. The worst mixing is for
the standard deviation of the random slope of Pods for each plant.
For the uniform prior it looks like this:
<<jagsmix1,fig = TRUE>>=
matplot( wfit_jags_0001boundary_10000$BUGSoutput$sims.array[,1:3,'plantSlopeSD'],
        type="l",col=c(1,2,4),lty=1,
        xlab = 'iteration', ylab = 'SD of random Pods|PlantID effects',
        main = 'Mixing of SD for random Pods | PlantID effects\n (3 chains with uniform priors)' )
@ 
while for the gamma prior it looks like this:
<<jagsmix2,fig = TRUE>>=
matplot( wfit_jags_gamma_priors_10000$BUGSoutput$sims.array[,1:3,'plantSlopeSD'],
        type="l",col=c(1,2,4),lty=1,
        xlab = 'iteration', ylab = 'SD of random Pods|PlantID effects',
        main = 'Mixing of SD for random Pods | PlantID effects\n (3 chains with gamma priors)' )
@ 

While neither of these mixing figures appears to be great, the gamma
prior leads to more stickiness near zero.  Therefore we choose the
uniform prior.  In addition we observed from these and other trial
runs that there is negligible posterior support below, say, 0.01 for
the standard deviations, but mixing stickiness is worse as values move
very close to zero.  Therefore we made the \textit{ad hoc} choice of
0.01 as a minimum value for the standard deviations.  We believe these
types of navigation choices and mixing experiments, which can take
hours of human time, are not atypical as steps to using BUGS for final
results.

Other parameters appear to mix well, but it must be recognized
that all dimensions need to mix well in order to trust results.  Here
is an example of good mixing of the intercept (corresponding to the
F(lowering) state). (This sample is from the uniform SD priors):
<<jagsmix3,fig = TRUE>>=
matplot( wfit_jags_gamma_priors_10000$BUGSoutput$sims.array[,1:3,'intercept[1]'],
        type="l",col=c(1,2,4),lty=1,
        xlab = 'iteration', ylab = 'intercept (F state)',
        main = 'Mixing of intercept (F state) \n (3 chains with uniform priors for std. dev\'s)' )
@ 

\subsection{Chain length}
\label{sec:chain-length}

Finally, with our choice of priors for the random effects standard
deviations, we experimented with runs of 1000, 10000, and 100000
iterations to achieve acceptable mixing.  For each sample size we used
thinning to record an output sample size of 1000.  The Gelman-Rubin
statistics for these were as follows

<<GRsamplesizes, cache = TRUE>>=
load('../BUGS/test_different_run_lengths.Rdata')
writeLines('1000')
gelman.diag(as.mcmc(wfit_jags_1000))
writeLines('10000')
gelman.diag(as.mcmc(wfit_jags_10000))
writeLines('100000')
gelman.diag(as.mcmc(wfit_jags_100000))
@ 

Interpretation of Gelman-Rubin statistics requires judgment. For
1000 iterations, the large values for plantSlopeSD, moderate values
for some others, and large value for the combined multivariate result
are clearly not acceptable.  For 10000 samples, many of the point
estimates are not too bad (not too far above 1), but the upper
C.I. values indicate they could be unacceptable.  Considering the plots
of the plantSlopeSD samples, we find 10000 samples unacceptable.  We
decided to use 100000 samples, although for a single analysis of real
data for a publication, we would be likely to increase this even more
to alleviate any concern about mixing of the plant slope random effect
standard deviation (see above figures).

\subsection{Final BUGS code}
\label{sec:final-bugs-code}

The final BUGS code was as follows:

\lstinputlisting[breaklines = true, frame=single, numbers=left, basicstyle=\small]{../BUGS/wildflower_BUGS.txt}

\section{AD Model Builder (via R2admb)}
\subsection{Data Structure}
To implement the model in ADMB, we started by organizing the data in R. We made a design matrix for the fixed effects part of the model. In a first attempt of the model, we also used design matrices for the random effects, but the model ran very slowly (more than 6 minutes), so we abandoned that formulation. Instead, we decided to exploit the separability of the individual observations. The benefit of exploiting the separability is that this makes the Hessian matrix sparse so that inversion is much faster. The command line option \code{-shess} tells ADMB to use methods specific to a sparse Hessian.  To enable this separability, we created index vectors for the random effects , \code{plant} and \code{year}, that are used to access the appropriate elements of the random effects vectors for each observation. 
<<admbsetup>>=
setup_wildflower_data <- function(fdat) {
	year <- fdat$Year - min(fdat$Year) + 1
	plant <- as.integer(fdat$PlantID)
	
	fdat$Year=as.factor(fdat$Year)
	fdat$State =factor(fdat$State, levels = c("F","L","M","S","D"))
	
	fdesign=model.matrix(~State+Pods, data=fdat)
	
	list(nobs=nrow(fdat), 
		nplants=length(unique(plant)),
		nyears=length(unique(year)), 
		nfixed=ncol(fdesign),
		obs=fdat$toF, 
		fdesign=fdesign, 
		year=year,
		plant=plant,
		pods=fdat$Pods
	)
}

fdat=read.csv("../DATA/wildflower_pods_complete.csv")
data4ADMB=setup_wildflower_data(fdat)
@

Then we used \code{R2admb} to write the data to wildflowers.dat which ADMB reads in.  The elements of the list sent to this function must be in the same order as they appear in the \code{DATA\_SECTION} of the .tpl file (see below).
<<>>=
library(R2admb)
write_dat("wildflowers", L=data4ADMB)
@

\subsection{Initial Parameter Values}
 We expected our fixed effects coefficients to be negative because they're on the logit scale. So we thought it would be a good idea to start these parameters at -1. However, these starting values led the optimizer into a region of parameter space that produced NAs. So instead, we started the fixed effects coefficients at 0. We started the three standard deviations of the random effects at 1. All random effects were started at 0.001.
 
 
<<>>=
setup_wildflower_params <- function(nfixed, nplants, nyears){
	list(fcoeffs=rep(0,nfixed), 
		plant_sigma_intercept=1, 
		plant_sigma_slope=1,
		year_sigma=1,
		rcoeffs_plant_intercept=rep(0.001, nplants),
		rcoeffs_plant_slope=rep(0.001, nplants),
		rcoeffs_year=rep(0.001, nyears)
	)
}			
 params4ADMB=setup_wildflower_params(data4ADMB$nfixed, data4ADMB$nplants, data4ADMB$nyears)
@

 Then we used \code{R2admb} to write the initial values to wildflowers.pin
<<>>=
write_pin("wildflowers", L=params4ADMB)
@

 \subsection{ADMB code}

 \subsubsection{Data Section}
 This is where you define data objects to be read in from the .dat file. It is common practice to first define some integers that are later used as the lengths of dimensions of the data. This is so that the same .tpl file can be used with different data sets.  Lines 2 through 5 of the code define these types of data: number of observations, number of plants, number of years, and number of fixed effects. When defining vectors, matrices, or arrays, the range of indices (or sometimes a vector of indices) is written in parentheses beside the object name. Here, \code{obs, year, plant, }and \code{pods} are vectors with indices from 1 to \code{nobs}. The design matrix for the fixed effect part of the model (line 7 of the code) has rows indexed 1 to \code{nobs} and columns indexed 1 to \code{nfixed}. There can be no spaces in the parentheses.  
 
 \subsubsection{Parameter Section}
 This section is where parameters are defined. We put all the fixed effects coefficients together in a vector called \code{fixedcoeffs} so that we can multiply it by the design matrix of predictors for efficient computation; this is more efficient than having all the predictors in separate vectors and the coefficients defined separately. The indices of this vector must match the column indices of the design matrix.
 
  The ADMB code requires parameters for the standard deviations of each of the random effects because \code{random\_effects\_vector} objects in ADMB must be distributed as standard normal and transformed to create the desired distribution.  The standard deviations of our random effects are defined on lines 14, 15, and 16. We bounded them between 0.001 and 100; these bounds are defined in parentheses beside the definition. The 2 in the last position of the parentheses indicates that these should be optimized in phase 2 (see next section for explanation).  
 
 We define a number, \code{dummy} that will fill in as a place holder  for \code{plant\_int\_slope} in parts of the code (description below). It will not be optimized over, so there is no "\code{init\_}" in its definition.
 
 Our random effects vectors are defined on lines 18,19,and 20. These definitions have to come after all the other parameter definitions. Random effects vectors do get initialized from the pin file, but their definition doesn't require  "\code{init\_}" in front. According to out model, each plant will have some deviation from the overall intercept and slope; so \code{plant\_int} and \code{plant\_slope} have elements for each plant, indexed 1 to \code{nplants}.   Similarly, each year deviates from the overall intercept; so \code{year\_int} has elements for each year, indexed from 1 to \code{nyears}. Again, the 2 in the last position of the parentheses indicates that these should be optimized in phase 2 (see next section for explanation).  
 
The last definition is the \code{objective\_function\_value}; we've called it \code{jnll} to stand for "joint negative log-likelihood" which is what we calculate in the \code{PROCEDURE\_SECTION} and store in this object.

  \subsubsection{Phases}
 A useful feature of ADMB is that it can optimize over a subset of the parameters and later bring in additional parameters to be optimized in additional phases. It is common to first optimize over the fixed effects of a model and then bring in the random effects in subsequent phases. That's what we did here. We found that without using this feature, the model didn't converge. Using this feature is simple; any parameter that you want to optimize in a phase other than the first one, put the number of that phase as the last number in the parentheses where you define that parameter in the \code{PARAMETER\_SECTION}. To optimize the random effects in phase 2, we placed a \code{2} in the parentheses beside \code{plant\_int\_std, plant\_slope\_std, year\_int\_std, plant\_int, plant\_slope, }and \code{year\_int}. These are on lines 14,15,16,18,19,and 20 of the code below. 
 
  \subsubsection{Procedure Section}
  This section is where we calculate the joint negative log-likelihood by adding up the log-likelihoods of the binomially distributed observations and the standard normally distributed random effects. It's important to initialize \code{jnll} (line 25) because we're adding to it. The first for loop adds the negative log-likelihood of the observations to \code{jnll} (lines 26 to 36),  the second for loop adds the negative log-likelihood of the random effects of plant on slope and intercept (lines 37 to 41), and the third loop adds the negative log-likelihood of the random effects of year on the intercept (lines 42 to 45). 
  
  The point of using separable functions is to make the Hessian as sparse as possible, so that efficient algorithms can be used. For ADMB to know how sparse the Hessian is, only send parameters to the separable function that must be used together for a particular observation. The if-else conditional statement on lines 28 to 35 is there to make the Hessian as sparse as possible to make the computation more efficient. We avoid unnecessarily sending the random effect for \code{plant\_slope} when there are zero pods, because this term of the predictor equation is automatically zero. 
    
 \lstinputlisting[breaklines=true,frame=single,numbers=left, basicstyle=\small, language=C++]{../ADMB/wildflowers.tpl}

\subsection{Running the code in R via R2admb and getting results}
Load the package and let it get its bearings.
<<>>=
library(R2admb)
setup_admb()
@
 First we compile the code to make the executable
<<eval=FALSE>>=
compile_admb("wildflowers", safe=TRUE, re=TRUE, verbose=TRUE)
@
Then we can run the executable
<<eval=FALSE>>=
run_admb("wildflowers", verbose=FALSE, extra.args="-shess -noinit -nox")
@

The extra argument "\code{-shess}" tells ADMB that the Hessian will be sparse and to use algorithms that efficiently operate on this type of matrix. The extra argument "\code{-noinit}" tells ADMB to start the random effects from the last optimum values, instead of  the pin file values, when doing the Laplace approximation. "\code{nox}" reduces the amount of information output while it's running. 

Then we can read in the results and view them
<<eval=FALSE>>=
fit_admb <- read_admb("wildflowers")
@
<<echo=FALSE>>=
load("../ADMB/fit_admb.RData")
@
The whole summary contains all of the random effect estimates for all years and plants so we'll only output the first 9 coefficients which are the main ones of interest.
<<>>=
coef(summary(fit_admb))[1:9,]
@

\subsection{MCMC}

To get confidence intervals on estimates of variance, it's best to do MCMC sampling. This can be done in ADMB. The default is to have uniform priors. We need to add an sdreport declaration to the tpl file's \code{PARAMETER\_SECTION}

\code{	sdreport\_number plant\_int\_std2;}\\
and at the bottom of the \code{PROCEDURE\_SECTION} store the appropriate value in that object

\code{	plant\_int\_std2=plant\_int\_std;}\\
Then to run it and see the results, we would use the following code 
<<eval=FALSE>>=
compile_admb("wildflowers_mcmc", safe=TRUE, re=TRUE, verbose=TRUE)

run_admb("wildflowers_mcmc", verbose=TRUE, extra.args="-shess -noinit -nox",
	mcmc=TRUE,mcmc.opts=mcmc.control(mcmc=50000,
                    mcmcpars=c("plant_int_std", "plant_slope_std", "year_int_std")))

fit_mcmc <- read_admb("wildflowers_mcmc",mcmc=TRUE)
@
<<readADMBmcmc,echo=FALSE>>=
load("../ADMB/fit_mcmc.RData")
@
Then we can look at the effective sample size and the trace of the parameters of interest to see if our chain ran OK:

<<fig=TRUE>>=
library(coda)
mmc <- as.mcmc(fit_mcmc$mcmc)
effectiveSize(mmc[,1:9])
print(xyplot(mmc[,1:9],layout=c(3,3),asp="fill"))
@
The trace looks ok so we want to look at the density of the standard deviation parameters
<<fig=TRUE>>=
print(densityplot(mmc[,7:9]))
@
\section{Postscript: Confidence Limits in lmer}

lme4 does not have built-in routines for generating confidence limits of random-effects variance estimates for GLMMs.  The only way to generate confidence limits for fixed effects is by using the coefficient standard errors.  We evaluated two ways of generating confidence limits by bootstrapping: (1) parametric bootstrapping, i.e., evaluating the distribution across bootstrap data sets generated using the simulate() function in lmer; (2) nonparametric bootstrapping, i.e., evaluating the distribution across bootstrap data sets generated by resampling the original data frame with replacement.  (In this case, we did not stratify sampling by PlantID or by year.)  
Code for one parametric bootstrap replicate (calling a model called fit, assumes lme4 is loaded, fruitdat1 is the original data, and output created to hold coefficients):

<<doboot,echo=FALSE,cache = TRUE>>=
flowers$tmp.resp = simulate(fit)$sim_1
t1 <- system.time(tmp.fit <- lmer(tmp.resp ~ State + Pods + (1 + Pods|PlantID) + (1|Year), 
  family = binomial, data = flowers))["elapsed"]
t2 <- system.time(tmp.fit <- refit(fit,simulate(fit)$sim_1))["elapsed"]
@ 
<<showboot,eval=FALSE>>=
tmp.fit = refit(fit,simulate(fit)$sim_1)
@ 

Note that using refit is faster than refitting the model from scratch or
using \code{update}.

<<>>=
getcoefs = function(fit) {
  c(coef(summary(fit))[,1],
  unlist(sapply(VarCorr(fit),attr,"stddev")))
  ## attr(VarCorr(fit)[["PlantID"]],"correlation")[1,2])
}
coef.tmp = getcoefs(tmp.fit)
@ 

Code for one nonparametric bootstrap replicate (same assumptions as above):
<<bootNP, cache = TRUE>>=
dat.tmp2 = flowers[sample(nrow(flowers), replace = TRUE),]
fit.tmp2 = refit(fit,dat.tmp2$toF)
coef.tmp2 = getcoefs(fit.tmp2) 
@ 

Here are the results of comparing these three methods: BLACK lines are the distribution defined by means and standard errors for fixed effects, RED lines are the distribution of parametric bootstrap samples (250 replicates), and BLUE lines are the distribution of nonparametric bootstrap samples (250 replicates):

\includegraphics{lme4_confidence_intervals.pdf}

Overall, parametric bootstrapping led to distributions that were similar to, but slightly wider than, asymptotic distributions.  I (EEC)  suspect these are the best measure of confidence in parameter estimates, though we are not aware of a general proof of this assertion. Nonparametric bootstrapping did not work well for these data. I (EEC) suspect poor performance of parametric bootstrapping results from the very sparse nature of the data; some replicate samples might not include any plants with high seed set (see graph of flowering vs. pods).  We did not include calculation of confidence limits in the analyses of simulated data because the time demands were prohibitive.  On my laptop, 10 replicates of parametric bootstrapping took 525 seconds to run.

\end{document}

