\setkeys{Gin}{width=1.0\textwidth}

% chunk 1
<<echo=FALSE>>=
opt2vec <- function(method, optans, time) {
## Displays optim() answers nicely
        optans$par<-round(optans$par,4)
        optans$value<-round(optans$value,6)
      time<-round(time,3)      
      outvec<-c(meth=method, fval=optans$value,
                 par=optans$par, c=optans$counts,
                 conv=optans$convergence, time=time)
      return(outvec)
}
 # Note we need to repeat the code but not echo it
hobbs.f<- function(x){ # # Hobbs weeds problem -- function
    if (abs(12*x[3]) > 500) { # check computability
       fbad<-.Machine$double.xmax
       return(fbad)
    }
    res<-hobbs.res(x)
    f<-sum(res*res)
}
hobbs.res<-function(x){ # Hobbs weeds problem -- residual
# This variant uses looping
    if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
         38.558, 50.156, 62.948, 75.995, 91.972)
    t<-1:12
    if(abs(12*x[3])>50) {
       res<-rep(Inf,12)
    } else {
       res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y
    }
}
@


\paragraph{}


\paragraph{}
Now we will express the sum of squares function in R. First we will use an
unscaled version of the loss function and residual. Then we will scale by simple 
powers of 10. Note that the data is embedded in the function.

% chunk 2A
<<echo=TRUE,fig=FALSE>>=

hobbs.f<- function(x){ # # Hobbs weeds problem -- function
    if (abs(12*x[3]) > 500) { # check computability
       fbad<-.Machine$double.xmax
       return(fbad)
    }
    res<-hobbs.res(x)
    f<-sum(res*res)
}
hobbs.res<-function(x){ # Hobbs weeds problem -- residual
# This variant uses looping
    if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 
         38.558, 50.156, 62.948, 75.995, 91.972)
    t<-1:12
    if(abs(12*x[3])>50) { # to control for bad parameter inputs
       res<-rep(Inf,12)
    } else {
       res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y
    }
}
@

\section{Minimization of the sum of squares using R tools}


%% be nice to do like JSS and have a pgk and proglang style (??borrow)
Let us try running this problem in \tt optim \rm using the standard 
Nelder-Mead method starting at the vector expressed as c(1, 1, 1).


% chunk 3
<<echo=TRUE>>=
cat("Running the Hobbs problem - code already loaded - with Nelder-Mead\n")
start<-c(1,1,1)
f0<-hobbs.f(start)
cat("initial function value=",f0,"\n")
tu<-system.time(ansnm1<-optim(start, hobbs.f, control=list(maxit=5000)))[1]
# opt2vec is a small routine to convert answers to a vector form for printing
Unscaled1<-opt2vec("Nelder",ansnm1,tu)
Unscaled1

@

Below is a very simple scaling of the original problem. Here is the
code for the function and residual. 

% chunk 3
<<echo=TRUE>>=
shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function
    if (abs(12*x[3]*0.1) > 50) { # check computability
       fbad<-.Machine$double.xmax
       return(fbad)
    }
    res<-shobbs.res(x)
    f<-sum(res*res)
}
shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual
# This variant avoids looping
    if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
         75.995, 91.972)
    n<-length(y)
    t<-1:n
    if (12*x[3]*0.1>50) { 
       res<-rep(Inf,n)
    } else {
       res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y
    }
}
@


% chunk 4
<<echo=FALSE>>=
shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function
    if (abs(12*x[3]*0.1) > 500) { # check computability
       fbad<-.Machine$double.xmax
       return(fbad)
    }
    res<-shobbs.res(x)
    f<-sum(res*res)
}
shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual
# This variant uses looping
    if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
         75.995, 91.972)
    t<-1:12
    res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y
}
@

Let us also try running the scaled version of the problem in \tt optim \rm using the standard 
Nelder-Mead method starting at the vector expressed as c(1, 1, 1).

% chunk 5
<<echo=TRUE>>=
cat("Running the scaled Hobbs problem - code already loaded - with Nelder-Mead\n")
start<-c(1,1,1)
f0<-shobbs.f(start)
cat("initial function value=",f0,"\n")
ts<-system.time(ansnms1<-optim(start, shobbs.f, control=list(maxit=5000)))[1]
Scaled2<-opt2vec("Nelder",ansnms1,ts)
Scaled2
@

Oops. We didn't really give the unscaled problem a fair start. We really should use
the equivalent point to that which we used to start the scaled problem, namely,
c(100, 10, 0.1). We've actually thought about this already and are referring to this
point as ``start 2''

% chunk 6
<<echo=TRUE>>=
cat("Running the Hobbs problem - code already loaded - with Nelder-Mead\n")
start<-c(100,10,.1)
f0<-hobbs.f(start)
cat("initial function value=",f0,"\n")
tu<-system.time(ansnm1a<-optim(start, hobbs.f, control=list(maxit=5000)))[1]
Unscaled2<-opt2vec("Nelder",ansnm1a,tu)
Unscaled2
@

And maybe we should also see what the scaled problem does from c(0.01, 0.1, 10),
which is the equivalent of unscaled c(1,1,1) or ``start 1''.

% chunk 7
<<echo=TRUE>>=
cat("Running the scaled Hobbs problem - code already loaded - with Nelder-Mead\n")
start<-c(.01,.1,10)
f0<-shobbs.f(start)
cat("initial function value=",f0,"\n")
ts<-system.time(ansnms1z<-optim(start, shobbs.f, control=list(maxit=5000)))[1]
Scaled1<-opt2vec("Nelder",ansnms1z,ts)
Scaled1
cat("Tabulate the results\n")
allans<-data.frame(Unscaled1,Scaled1,Unscaled2,Scaled2)
allans

@

The examples suggest:

\begin{itemize}

\item{That unscaled problems may have difficulty finding a solution from a "random" or simple starting point.}

\item{That from similar starting points, the scaled function may allow a solution to be found more efficiently i.e., with fewer function evaluations.}

\end{itemize}

\subsection{More optimizers}

Nash and Varadhan (2010) developed \textbf{optimx}, a wrapper for a number of optimization tools available for R. 
This allows a control \texttt{all.methods} which when set TRUE will try all appropriate methods within \textbf{optimx}.
However, we do now use expressions for the gradients.

% chunk 8
<<echo=TRUE>>=
hobbs.jac<-function(x){ # Jacobian of Hobbs weeds problem
   jj<-matrix(0.0, 12, 3)
   t<-1:12
    yy<-exp(-x[3]*t)
    zz<-1.0/(1+x[2]*yy)
     jj[t,1] <- zz
     jj[t,2] <- -x[1]*zz*zz*yy
     jj[t,3] <- x[1]*zz*zz*yy*x[2]*t
   return(jj)
}
@
\newpage

<<echo=TRUE>>=
hobbs.g<-function(x){ # gradient of Hobbs weeds problem
    # NOT EFFICIENT TO CALL AGAIN
    jj<-hobbs.jac(x)
    res<-hobbs.res(x)
    gg<-as.vector(2.*t(jj) %*% res)
    return(gg)
}
@
\newpage

<<echo=TRUE>>=
shobbs.jac<-function(x) { # scaled Hobbs weeds problem -- Jacobian
    jj<-matrix(0.0, 12, 3)
    t<-1:12
    yy<-exp(-0.1*x[3]*t)
    zz<-100.0/(1+10.*x[2]*yy)
    jj[t,1] <- zz
    jj[t,2] <- -0.1*x[1]*zz*zz*yy
    jj[t,3] <- 0.01*x[1]*zz*zz*yy*x[2]*t
    return(jj)
}
@
\newpage

<<echo=TRUE>>=
shobbs.g <- function(x) { # scaled Hobbs weeds problem -- gradient
   shj<-shobbs.jac(x)
   shres<-shobbs.res(x)
   shg<-as.vector(2.0* (shres %*% shj))
   return(shg)
}
@
\newpage

Now let us run this scaled problem. 

% chunk 8Z
<<echo=TRUE>>=
library(optimx)
cat("Running the Hobbs problem from the scaled start - code already loaded\n")
start<-c(100,10,.1)
starts<-c(1,1,1)
f0<-hobbs.f(start)
f0s<-shobbs.f(starts)
cat("initial function values: unscaled=",f0,"  scaled=",f0s,"\n")
# Following fails for some methods
test<-try(tu<-system.time(aall1<-optimx(start, hobbs.f, hobbs.g,control=list(all.methods=TRUE,maxit=5000)))[1])
if (class(test) != "try-error") {
print(aall1)
} else {cat("Unscaled all-methods attempt failed\n") }
test<-try(ts<-system.time(aall1s<-optimx(starts, shobbs.f,shobbs.g,control=list(all.methods=TRUE,maxit=5000)))[1])
if (class(test) != "try-error") {
print(aall1s)
} else {cat("Scaled all-methods attempt failed\n") }

@

\newpage
Here we see that many methods do quite well on this problem, but that some methods, especially the
Powell methods (bobyqa, newuoa, uobyqa) do poorly. These methods implicitly assume similar scalings
for the parameters, and even our simple factors on the parameters do not suffice for these algorithms.

\section{The KKT conditions}

For an unconstrained objective function f(\textbf{x}), the KKT (Karush - Kuhn - Tucker) conditions for a (local) minimum at \textbf{y} are that the gradient of f at \textbf{y} is null and that the eigenvalues of the Hessian of f at
\textbf{y} are all positive. The gradient is the vector of first derivatives of f with respect to the parameters (evaluated at \textbf{y}). The Hessian is the matrix of second derivatives. 

Of course, "null" and "positive" are ideas that presume exact arithmetic. The reality is that we want the gradient to be ``small''. For the Hessian, we want all eigenvalues positive as a first condition. That is relatively easy to check. However, if the ratio of smallest to largest eigenvalue is less than, say, 1E-6, we may suspect that the Hessian is very close to singular. This means that we cannot be sure that we are at a local minimum. Interpreted another way, it suggests that it is difficult to determine the minimum as a region of parameter space may be ``flat''.

\textbf{optimx} includes checks on the KKT conditions, and they appear in the tables of methods above. In the course of the
NCEAS Nonlinear Modeling project, an R package \tt kktc \rm was created. To install this package directly within R type:\\
\tt
 install.packages("kktc", repos="http://R-Forge.R-project.org")
\rm.

\section{Using a nonlinear least squares tool}

For completeness, the code to run this problem in nls() is included. 

% chunk 9
<<echo=TRUE>>=
@

Here we see the benefit of a) scaling and b) starting values ``close enough'' to the solution.

\section{A maximum likelihood version of the problem}


\subsection{Solving the ML problem with optimx}

The likelihood expressed in R based on the simple scaling of the parameters
in the 3-parameter logistic (file \tt shobbs.R \rm ) is as follows :

% chunk 12
<<echo=TRUE,fig=FALSE>>=
# shobbslikn.R
# 4 parameter negative log likelihood for weeds problem in 
#  direct (non-log) scale, including sigma2
#  We still use simple scalings for the logistic parameters

shobbsn.lik<-function(xaug){ # likelihood function including sigma
  xx<-xaug[1:3]
  sigma2<-xaug[4]
  res<-shobbs.res(xx)
  nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2)
}

shobbsn.lg<-function(xaug){ # gradient function including sigma
  xx<-xaug[1:3]
  sigma2<-xaug[4]
  res3<-shobbs.res(xx)
  n<-length(res3)
  f3<-crossprod(res3)
#  cat("in shobbs.lg: f3=",f3," n=",n," g3:\n")
  g3<-0.5*shobbs.g(xx)/sigma2
#  print(g3)
  gg<-c(g3,0.5*(n - f3/sigma2)/sigma2)
}

shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function
    if (abs(12*x[3]*0.1) > 50) { # check computability
       fbad<-.Machine$double.xmax
       return(fbad)
    }
    res<-shobbs.res(x)
    f<-sum(res*res)
}
shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual
# This variant uses looping
    if (abs(12*x[3]*0.1) > 50) { # check computability
       rbad<-rep(.Machine$double.xmax, length(x))
       return(rbad)
    }
    if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
    y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
         75.995, 91.972)
    t<-1:12
    res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y
}
 
shobbs.jac<-function(x) { # scaled Hobbs weeds problem -- Jacobian
    jj<-matrix(0.0, 12, 3)
    t<-1:12
    yy<-exp(-0.1*x[3]*t)
    zz<-100.0/(1+10.*x[2]*yy)
    jj[t,1] <- zz
    jj[t,2] <- -0.1*x[1]*zz*zz*yy
    jj[t,3] <- 0.01*x[1]*zz*zz*yy*x[2]*t
    return(jj)
}

shobbs.g <- function(x) { # scaled Hobbs weeds problem -- gradient
   shj<-shobbs.jac(x)
   shres<-shobbs.res(x)
   shg<-as.vector(2.0* (shres %*% shj))
   return(shg)
}
# NOTE: we also put in a form where sigma appears in log form
shobbs.lik<-function(xaug){ # likelihood function including sigma
  xx<-xaug[1:3]
  logSigma<-xaug[4]
  sigma2=exp(2.0*logSigma);
  res<-shobbs.res(xx)
  nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2)
}

shobbs.lg<-function(xaug){ # gradient function including sigma
  xx<-xaug[1:3]
  logSigma<-xaug[4]
  sigma2=exp(2.0*logSigma);
  res3<-shobbs.res(xx)
  n<-length(res3)
  f3<-crossprod(res3)
#  cat("in shobbs.lg: f3=",f3," n=",n," g3:\n")
  g3<-0.5*shobbs.g(xx)/sigma2
#  print(g3)
  gg<-c(g3,(n - f3/sigma2))
}

@

As a practical matter, we had already obtained the solution in ADMB, and tested
the R code using

% chunk 13
<<echo=TRUE,fig=FALSE>>=
# test shobbsn.lik
  xtex<-c(1.9619, 4.9092, 3.1357)
  xaug<- c(xtex,0.21561)
  print(xaug) 
  negloglik<-shobbsn.lik(xaug)
  cat("negloglik=",negloglik,"\n")
@

We believe it is important to do such calculations, as silly errors are one of the 
main sources of frustration and wasted time in nonlinear modeling.

We can test the effectiveness of different methods of optimization in R using the
all.methods flag in optimx. Hence,
\newpage

% chunk 14
<<echo=TRUE,fig=FALSE>>=
test<-try(ansR4<-optimx(c(2,5,3,1),shobbsn.lik, shobbsn.lg, control=list(all.methods=TRUE)))

if (class(test) != "try-error") {
    print(ansR4)
} else {
    cat("ML attempt with optimx failed for some method\n")
}
@

\newpage

A majority of the methods have achieved the likelihood alread found by ADMB, and most
of these also found solutions satisfying both Karush-Kuhn-Tucker conditions. A minor concern
is the lack of solution for two methods and the presence of several solutions where only
the first KKT condition has been met.

Note that many workers find it onerous to provide gradients. (It usually is!!). ADMB is 
explicitly intended to provide them. If we rerun the above calculations with numerical
derivatives, we find (note that Nelder-Mead and newuoa and bobyqa are derivative free 
anyway):
\newpage

% chunk 15
<<echo=TRUE,fig=FALSE>>=
test<-try(ansR4n<-optimx(c(2,5,3,1),shobbsn.lik, control=list(all.methods=TRUE)))

if (class(test) != "try-error") {
    print(ansR4n)
} else {
    cat("ML attempt with optimx on scaled likelihood (no gradients) failed for some method\n")
}
@

\newpage


We can try to work with parameters in log form. That is, $log(b_1)$, $log(b_2)$, $log(b_3)$ and $log(\sigma)$.
This is a reparametrization, so we must correct our codes. Reparametrization, as mentioned below, 
is one way to improve the chances of successful application of optimization. 

% chunk 16
<<echo=TRUE,fig=FALSE>>=
# shobbslik.R
# 4 parameter negative log likelihood for weeds problem in 
#  log scale for all parameters
#  and we use shobb.res form using simple scaling parameters
# need source("shobbs.R")
require("optimx")
shobbs.lik<-function(xaug){ # likelihood function including sigma
  xx<-exp(xaug[1:3])
  logSigma<-xaug[4]
  sigma2=exp(2.0*logSigma);
  res<-shobbs.res(xx)
  nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2)
}

shobbs.lg<-function(xaug){ # gradient function including sigma
  xx<-exp(xaug[1:3])
  logSigma<-xaug[4]
  sigma2=exp(2.0*logSigma);
  res3<-shobbs.res(xx)
  n<-length(res3)
  f3<-crossprod(res3)
#  cat("in shobbs.lg: f3=",f3," n=",n," g3:\n")
  g3<-0.5*xx*shobbs.g(xx)/sigma2
#  print(g3)
  gg<-c(g3,(n - f3/sigma2))
}
cat("test derivative code\n")
rm(list=ls())
source("shobbs.R")
source("shobbslik.R")
xxa<-c(2,5,3,3)
f<-shobbs.lik(xxa)
f
ga<-shobbs.lg(xxa)
ga
require(numDeriv)
gn<-grad(shobbs.lik, xxa)
gn
cat("\n\n")

cat("Using analytic derivatives\n")
test<-try(ansR5<-optimx(log(c(2,5,3,1)),shobbs.lik, shobbs.lg, control=list(all.methods=TRUE)))
if (class(test) != "try-error") {
    print(ansR5)
} else {
    cat("ML attempt with optimx on scaled likelihood and analytic gradients failed for some method\n")
}

cat("Using numerical derivative approximations\n")
test<-try(ansR5n<-optimx(log(c(2,5,3,1)),shobbs.lik, control=list(all.methods=TRUE)))
if (class(test) != "try-error") {
    print(ansR5n)
} else {
    cat("ML attempt with optimx on scaled likelihood and numerical gradients failed for some method\n")
}
@


\section{An alternate parametrization choices}

How we set up our nonlinear models has, unfortunately, a potentially large impact on our
chances of getting a solution efficiently or at all. The initial problem here has been specified 
so that the solution is sought in terms of $b_1$, $b_2$, and $b_3$, and optionally sigma2. However, 
one of us (AN) automatically set up ADMB to use $log(b_1)$, $log(b_2)$ and $log(b_3)$ as the parameters, 
even though the naturally scaled parameters were reported. There is also a parametrization suggested
by Doug Bates, that predicts the state of weeds at time t as 


  $  yy ~ c_1/(1+exp((c_2 - tt)/c_3)) $


The first parameter, $c_1$, matches $b_1$ as the upper asymptote of the function, we now have 
$c_2$ as the time point where the function reaches its mid-height. $c_3$ controls the rate at
which the function progresses to the asymptote. These parameters are said to be much easier to
estimate with nonlinear least squares or optimization than the original set. However, it is not
the experience of the authors that this is obviously true.

?? code

\newpage

\section{Parametrization and model choices}

We have for this problem a quite large set of choices of ways to model the data using equivalent
but different ways to introduce the modelling parameters:

\begin{itemize}
\item{the original 3-parameter formulation with no scaling;}
\item{a 3-parameter standard model using simple powers of 10 to scale the parameters so the ones we
estimate have approximately the same magnitudes;}
\item{the two choices above, but setting the parameters estimated to the the log() of the ones we
want. This means that $b_j = exp(par_j)$ and we work with the $par_j$.}
\item{the Bates formulation of the previous section. We can also work with logs of the parameters.}
\item{all of the above repeated with the additional estimation of the variance. We may wish to 
estimate the log of the standard deviation.}
\end{itemize}

This represents a large number of possibilities for our approach to {\bf modeling}. In 
addition, we have many choices of how to effect the {\bf estimation} of these models.
\newpage


\section{Discussion}

The problem presented here, when submitted in the original scaling and parametrization, gives
most relevant software packages difficulty. It seems to be more or less generally useful to 
have our objective functions specified in terms of the logs of (positive) parameters rather 
than the naturally scaled parameters. Indeed, some automation of the setup of objective functions
to do this would be an interesting possibility. More general reparametrization, while extremely
helpful, requires insight into problems that users cannot be expected to have.

\newpage

\section{References}

?? ordering

John C Nash and Ravi Varadhan, (2011) Unifying Minimization Algorithms to Aid Software System Users: 
optimx() for R, in revision for the Journal of Statistical Software.

John C Nash and Ravi Varadhan (2010) optimx: A replacement and extension of the optim() function,
R package version 0.88, http://CRAN.R-project.org/package=optimx

ADMB Project. 2009 AD Model Builder: automatic differentiation model builder. Developed by David Fournier and freely available from admb-project.org.

Ben Bolker and Hans Skaug (2011) R2admb: ADMB to R interface functions, R package version 0.7/r34,
http://R-Forge.R-project.org/projects/r2admb/

?? more JAGS refs

Martyn Plummer (2010) rjags: Bayesian graphical models using MCMC,
R package version 2.2.0-1, http://mcmc-jags.sourceforge.net

Lunn, D., Spiegelhalter, D., Thomas, A., Best, N. (2009). The BUGS project: Evolution, critique, and future directions, \textit {Statistics in Medicine}, \textbf{28}, 3049-3067.

R Development Core Team (2010) R: A Language and Environment for Statistical
Computing, R Foundation for Statistical Computing, Vienna, Austria,
ISBN 3-900051-07-0, http://www.R-project.org

??others


\end{document}}