\documentclass[11pt,letterpaper]{article}

\usepackage[american]{babel}
\usepackage[utf8]{inputenc}
\newcommand{\R}{{\sf R}}
\newcommand{\Splus}{{\sf S-PLUS}}
\newcommand{\fixme}[1]{\textbf{FIXME: #1}}
%\newcommand{\fixme}[1]{}
\newcommand{\code}[1]{{\tt #1}}
\providecommand{\pkg}[1]{{\fontseries{b}\selectfont #1}}
%% \newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}}
\let\proglang=\textsf
\usepackage{url}
\usepackage{alltt}
\usepackage{fancyvrb} % with VerbatimFootnotes, allow verb in footnotes
\usepackage{listings}
\usepackage{verbatim}
\usepackage{hyperref}
\usepackage{natbib}
\bibliographystyle{chicago}
% \bibliographystyle{apa-good}
\usepackage{Sweave}
\usepackage{fullpage}
\usepackage{graphicx} %If you want to include postscript graphics
\usepackage{wrapfig} 
\usepackage[tableposition=top]{caption}
\usepackage{ifthen}


\begin{document}

\title{The Hobbs Weed Infestation Problem}
\author{John C. Nash, Telfer School of Management, University of Ottawa, Ottawa, Canada\\
        Anders Nielsen, Technical University of Denmark, Charlottenlund, Denmark\\
        Ben Bolker, McMaster University, Hamilton, Canada}
\date{May 2011}
\maketitle

\section*{Note}

\noindent
This (partially finished) vignette borrows heavily on a Scaling-Optim vignette by John Nash
and some material is duplicated so that the flow of ideas here is self-contained. We will use
the statistical programming system\proglang{R}(\textit{http://www.r-project.org}) to present most of our calculations,
but will use R, AD Model Builder (abbreviated ADMB) and the Bayesian estimation approach called BUGS (via
either JAGS or OpenBUGS) to attempt estimation of models for the data of this problem.  

This document is intended as a repository of a range of different attempts to approach a problem in
nonlinear estimation. As such it has sections that are heavy reading. It is an archive of what we did
and thought about rather than a summary, though the presentation is not necessarily in order. 


\indent
\section{The initial Hobbs weed infestation problem}

This problem came across the desk of John Nash sometime in 1974
when he was working on the development of a program to solve nonlinear least squares 
estimation problems. He had written several variants of Gauss-Newton methods in BASIC 
for a Data General NOVA system that offered a very limited environment of a 10 character
per second teletype with paper tape reader and punch that allowed access to a maximum 
8K byte (actually 4K word) segment of the machine. Arithmetic was particularly horrible
in that floating point used six hexadecimal digits in the mantissa with no guard digit.

The problem was supplied by Mr. Dave Hobbs of Agriculture Canada. As told to John Nash,
the observations (y) are weed densities per unit area over 12 growing periods. We were never
given the actual units of the observations. Here is the data.

% chunk 1
<<echo=TRUE,fig=TRUE,width=7,height=4>>=
# draw the data
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
         38.558, 50.156, 62.948, 75.995, 91.972)
    t<-1:12
    plot(t,y)
    title(main="Hobbs' weed infestation data", font.main=4)
@

It was suggested that the appropriate model was a 3-parameter logistic, that is,

\begin{equation}
     y_i  =  b_1 / (1 + b_2 \exp( - b_3 t_i))+\varepsilon_i
\end{equation}

where $\varepsilon_i\sim N(0,\sigma^2)$, $t_i$ is the growing period, and $i=1,\ldots ,26$.

The problem can be approached as a nonlinear least squares problem. In R, the nls() function
could be used, but fails for many sets of starting parameters, and optimization turns out
to be more robust in this case. Examples of the use of nls() are presented later. The problem
has been published in a number of places, notably \cite{jncnm79} and \cite{jnmws87}.

Most discussions of the problem have centered on solving the three parameter nonlinear least
squares problem, either as such a problem using Gauss-Newton (ref?) or Marquardt (ref?) methods,
or else via general nonlinear optimization of the sum of squares. 
While minimizing the sum of squares of residuals is one approach to ``fitting'' a function to data, 
statisticians frequently prefer the maximum likelihood approach so that the variance can also be 
estimated at the same time. This is conventionally accomplished by minimizing the negative log likelihood.

For our problem, \\
\begin{center}
$  nll = 0.5  \sum_{i=1}^n ( \log(2 \pi \sigma^2)+(res[i]^2)/\sigma^2 ) $
\\
or\\
$ nll =  0.5  ( n \log(2  \pi  \sigma^2) + \sum_{i=1}^n  (res[i]^2/\sigma^2 ) ) $
\\
\end{center}

There are a number of annoyances about the particular logistic problem and data:
\begin{itemize}
\item{The scale of the problem is such that the upper asymptote $b_1$ has scale of the order
of 100, $b_2$ has scale of the order 10, while the coefficient of time $b_3$ is scaled by 0.1. We can
explicitly put such scaling factors into the model so that the coefficients all come out with
roughly similar scale. As we shall show, this eases the computational task.}
\item{It is useful if all the coefficients are positive. We can use explicit bounds in some 
optimization and nonlinear least squares methods, but will see that writing the model in terms
of the logs of the parameters achieves the same goal and also provides a scaling of the 
problem.}
\item{Other transformations of the problem are possible; at least one will be mentioned later.}

\end{itemize}

\section{Preparation}

Load all the packages we'll need, up front
(not all of these are absolutely necessary, but
it will be most convenient to make sure you have
them installed now).

% chunk 2
<<>>=
library(ggplot2)  ## pictures
library(bbmle)    ## MLE fitting in R (wrapper)
library(optimx)   ## additional R optimizers
library(MCMCpack) ## for post-hoc MCMC in R
library(coda)     ## analysis of MCMC runs (R, BUGS, ADMB)
library(R2admb)   ## R interface to AD Model Builder
library(R2jags)   ## R interface to JAGS (BUGS dialect)
# source("../R/tadpole_R_funs.R")

@ 

\section{Solving the maximum likelihood problem -- log-form parameters}

We will solve the 4-parameter maximum likelihood problem by ADMB, R, and JAGS (a BUGS variant, \cite{Plummer03jags}).
The parameters our estimation or optimization tools seek will be the logs of the quantities 
that enter the models, thereby forcing positivity on these quantities. 

\subsection{Solving the ML problem in ADMB}

The we use a small script to prepare the data file for use by AD Model Builder:

% chunk 3
<<echo=TRUE,fig=FALSE>>=
  source("../ADMB/weeds_ADMB_funs.R")
  weeds_ADMB_setup()
@

This creates the file 

  \begin{scriptsize}
  \verbatiminput{../ADMB/weed.dat}
  \end{scriptsize}

The implementation follows the typical AD Model Builder template, first data is read in, 
then model parameters are declared, and finally the negative log likelihood is coded. 

{
  \begin{scriptsize}
  \linespread{.8}
  \verbatiminput{../ADMB/weed.tpl}
  \end{scriptsize}
}


The model can be run from the command line by compiling and then executing the produced binary, 
but this can also be accomplished from within the \proglang{R} console like this as long as we ensure that 
the\proglang{R} working directory is set to the directory containing the AD Model Builder code for the 
problem \verb|weed.tpl| and the corresponding data file  \verb|weed.dat|.

% chunk 4 -- Note we save output and put in place
<<echo=TRUE,fig=FALSE>>=
file.copy("../ADMB/weed.tpl","weed.tpl")
file.copy("../ADMB/weed.dat","weed.dat")

system('admb weed')
system.time(system('./weed > weedout.txt'))
unlink("weed.tpl") # for cleanup of WRITEUP directory
unlink("weed.dat")
@

%% appears weedout.txt is 0 bytes. 

  \begin{scriptsize}
  \verbatiminput{weedout.txt}
  \end{scriptsize}

After running the model we can plot the fit to make sure all went well.

\setkeys{Gin}{width=0.7\textwidth}
\begin{center}

% chunk 5
<<echo=FALSE,fig=TRUE,width=7,height=4>>=
t<-1:12
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
       38.558, 50.156, 62.948, 75.995, 91.972)

fit<-read.table('weed.std', head=FALSE, skip=1)

pred<-fit[fit$V2=='pred',]
plot(t,y)
lines(t,pred[,3], col='red')
lines(t,pred[,3]+2*pred[,4], col='red', lty='dashed')
lines(t,pred[,3]-2*pred[,4], col='red', lty='dashed')
@

\end{center}


Estimates and standard deviations of model parameters and derived quantities are 
located in the file \verb|weed.std|.  

\begin{scriptsize}
\verbatiminput{weed.std}
\end{scriptsize}



\subsection{Solving the ML problem in R}

We first load functions to compute the residuals (actually in natural rather than
log scale -- the logs are only needed in the optimization).

% chunk 6
<<>>=
# source("../R/shobbs.R", echo=TRUE)
source("../R/lhobbs.R", echo=TRUE)
@

Then we define the log likelihood function. 

% chunk 7
<<>>=
source("../R/lhobbslik.R", echo=TRUE)
@

It is a good idea to test the function and gradient to make sure we have our code
working properly. There is still some room for error, but at least the numerical
gradients of the function match the values from the analytic expressions.

% chunk 8
<<>>=
source("../R/lhobbsliktest.R", echo=TRUE)
@

Finally, we run the optimization. Here we try a number of methods both with and 
without analytic gradients. 

% chunk 9
<<>>=
source("../R/lhobbslikrun.R", echo=TRUE)
@

\subsection{Solving the ML problem in JAGS}

The BUGS/JAGS control file we will use is given below. This simply provides
the (scaled) model and very simple uniform priors. The range for the deviance
was widened based on estimates found.

  \begin{scriptsize}
  \verbatiminput{../BUGS/weeds_bugs.txt}
  \end{scriptsize}

We run JAGS from \proglang{R} as follows. 

% chunk 10
<<echo=TRUE>>=
# rm(list=ls()) ## Clear workspace. NOT in vignette.
library(rjags) # ? Is this needed?
library(R2jags)
y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
         75.995, 91.972)
t<-1:12
# Note that n.iter=10000 and n.burning=1000 was not sufficient.

tfit_jags <- jags(model="../BUGS/weeds_bugs.txt",
      data=list(N=length(y),t=t, y=y),
      parameters.to.save=c("logb1","logb2","logb3","logsigma"),
      n.iter=10000,
      n.burnin=1000,
      n.thin=10,
      n.chains=3,
      inits=list(list(logb1=1,logb2=1,logb3=1,logsigma=1,
            .RNG.name="base::Wichmann-Hill", .RNG.seed=654321),
            list(logb1=-1,logb2=-1,logb3=-1,logsigma=-1,
            .RNG.name="base::Wichmann-Hill", .RNG.seed=321654),
            list(logb1=0,logb2=0,logb3=0,logsigma=0,
            .RNG.name="base::Wichmann-Hill", .RNG.seed=123456)))

# wpred <- .... stuff to create predicted model
cat("output of jags() run\n")
 tfit_jags
@

This shows us the estimates found from our MCMC iterations. The effective sample
sizes for all four estimated parameters are reasonably large and the Gelman Rhat 
statistics are less than 1.2, suggesting that the chains have converged.  
The following code simply extracts part of the output that we will used for some 
graphs. 

% chunk 11
<<echo=TRUE>>=
 tfit_jags_b <- tfit_jags$BUGSoutput
@

We graph the progress of the MCMC process to check if it has stabilized ("converged").
While there is some noise in the plots, they are reasonably stable. 

% chunk 12
<<echo=TRUE>>=
library(emdbook) ## for as.mcmc.bugs
 tfit_jags_m <- as.mcmc.bugs(tfit_jags_b)
@

We can also look at the parameter distributions. Again, these are not too different, and
we can treat the results as reasonable.

% chunk 13
<<echo=TRUE,fig=TRUE,width=7,height=4>>=
library(coda)
 print(xyplot(tfit_jags_m))
@

Clearly the Bayesian (JAGS) approach takes a very different point of view to the optimization
(R and ADMB) methods. Nevertheless, the model parameters are not very different, and the 
JAGS outputs give us some idea of the possible distribution of the parameters. 

% chunk 14
<<echo=TRUE,fig=TRUE,width=7,height=4>>=
## x11()
 print(densityplot(tfit_jags_m))
@


\section{Other views of the Weeds problem}

\subsection{Nonlinear least squares}

The original problem was, with apparent simplicity, to find the three unscaled logistic 
parameters that provided the best fit to the 12 observations. This turned out to be 
surprisingly difficult. Indeed, the Statistical Research Service of Agriculture Canada was, 
in 1974, one of the most sophisticated and well-connected biostatistics units in the world, 
but staff found no computer program that would crack this problem. John Nash was, at the time,
trying to develop programs in BASIC for a Data General Nova minicomputer and an HP 9830A 
desktop calculator, and offered to give his codes a try. These succeeded in finding a solution. 

The Weeds problem was, in fact, a prime motivator to prepare very robust nonlinear least squares
and optimization codes for what were, in today's view, rather poor computational platforms. 
However, even today, the original problem creates difficulties. Let us see what \proglang{R} can do.

% chunk 15
<<echo=TRUE>>=
source("../R/weeds_nls1.R", echo=TRUE)
@

Here we see that the very crude Nelder-Mead optimizer does reasonably well if we give it a 
scaled start or work with the logs of the parameters. the \code{nls()} does less well than we might
hope. However, truthfully, this is a "bad" problem.

\subsection{Reparametrization}

Our model is an S curve, but our data only has the early up-turned part of the curve, so we 
can anticipate that we are going to have difficulty estimating one or both of the upper limit of 
the growth and the rate of getting there. These are the $b_1$ and $b_3$ parameters in he original
model. 

Doug Bates has suggested the model

$$
yy = c_1/(1+\exp((c_2 - tt)/c_3)) 
$$

where the $c_1$ should be the same as $b_1$ of the original model. However, now it is
much clearer that $c_2$ is the time at which we reach the half-way point to $c_1$. 
Moreover, $c_3$ is now scaled in time units and controls how fast the curve rises. (The
same arguments can be applied to declining data with $c_3$ having its sign reversed.)

Looking at the graph of the data, we can provide a rough guess at the half-way point as 
(13, 100), making a rough guess of $c_1 = 100$ and $c_2=13$. Plugging these values and 
the first data point (1, 5.308) into the model above gives us

$$ 
c_3 = (c_2 - 1)/\log(c_1/y[1]-1)   =  3.331294
$$

This gives the output

%% BMB: this should be an R code chunk, not \verbatim (!)
\begin{verbatim}
> dbn<-nls(y~c1/(1+exp((c2-t)/c3)), start=list(c1=200,c2=13,c3=3.33), trace=TRUE)
132.4117 :  200.00  13.00   3.33 
3.087018 :  193.510427  12.315751   3.173038 
2.587485 :  196.075031  12.414664   3.188793 
2.587277 :  196.184968  12.417261   3.189077 
2.587277 :  196.186251  12.417298   3.189083 
\end{verbatim}

where the starting function value and parameters are very close to the solution.
However, it should be noted that random starts to \code{nls()} seem to give 
singular gradient or similar failures to those using the standard model. 

\bibliography{../../nonlin}


\end{document}} 
