\setkeys{Gin}{width=1.0\textwidth} % chunk 1 <>= opt2vec <- function(method, optans, time) { ## Displays optim() answers nicely optans$par<-round(optans$par,4) optans$value<-round(optans$value,6) time<-round(time,3) outvec<-c(meth=method, fval=optans$value, par=optans$par, c=optans$counts, conv=optans$convergence, time=time) return(outvec) } # Note we need to repeat the code but not echo it hobbs.f<- function(x){ # # Hobbs weeds problem -- function if (abs(12*x[3]) > 500) { # check computability fbad<-.Machine$double.xmax return(fbad) } res<-hobbs.res(x) f<-sum(res*res) } hobbs.res<-function(x){ # Hobbs weeds problem -- residual # This variant uses looping if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) t<-1:12 if(abs(12*x[3])>50) { res<-rep(Inf,12) } else { res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y } } @ \paragraph{} \paragraph{} Now we will express the sum of squares function in R. First we will use an unscaled version of the loss function and residual. Then we will scale by simple powers of 10. Note that the data is embedded in the function. % chunk 2A <>= hobbs.f<- function(x){ # # Hobbs weeds problem -- function if (abs(12*x[3]) > 500) { # check computability fbad<-.Machine$double.xmax return(fbad) } res<-hobbs.res(x) f<-sum(res*res) } hobbs.res<-function(x){ # Hobbs weeds problem -- residual # This variant uses looping if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) t<-1:12 if(abs(12*x[3])>50) { # to control for bad parameter inputs res<-rep(Inf,12) } else { res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y } } @ \section{Minimization of the sum of squares using R tools} %% be nice to do like JSS and have a pgk and proglang style (??borrow) Let us try running this problem in \tt optim \rm using the standard Nelder-Mead method starting at the vector expressed as c(1, 1, 1). % chunk 3 <>= cat("Running the Hobbs problem - code already loaded - with Nelder-Mead\n") start<-c(1,1,1) f0<-hobbs.f(start) cat("initial function value=",f0,"\n") tu<-system.time(ansnm1<-optim(start, hobbs.f, control=list(maxit=5000)))[1] # opt2vec is a small routine to convert answers to a vector form for printing Unscaled1<-opt2vec("Nelder",ansnm1,tu) Unscaled1 @ Below is a very simple scaling of the original problem. Here is the code for the function and residual. % chunk 3 <>= shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function if (abs(12*x[3]*0.1) > 50) { # check computability fbad<-.Machine$double.xmax return(fbad) } res<-shobbs.res(x) f<-sum(res*res) } shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual # This variant avoids looping if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) n<-length(y) t<-1:n if (12*x[3]*0.1>50) { res<-rep(Inf,n) } else { res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y } } @ % chunk 4 <>= shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function if (abs(12*x[3]*0.1) > 500) { # check computability fbad<-.Machine$double.xmax return(fbad) } res<-shobbs.res(x) f<-sum(res*res) } shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual # This variant uses looping if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) t<-1:12 res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y } @ Let us also try running the scaled version of the problem in \tt optim \rm using the standard Nelder-Mead method starting at the vector expressed as c(1, 1, 1). % chunk 5 <>= cat("Running the scaled Hobbs problem - code already loaded - with Nelder-Mead\n") start<-c(1,1,1) f0<-shobbs.f(start) cat("initial function value=",f0,"\n") ts<-system.time(ansnms1<-optim(start, shobbs.f, control=list(maxit=5000)))[1] Scaled2<-opt2vec("Nelder",ansnms1,ts) Scaled2 @ Oops. We didn't really give the unscaled problem a fair start. We really should use the equivalent point to that which we used to start the scaled problem, namely, c(100, 10, 0.1). We've actually thought about this already and are referring to this point as ``start 2'' % chunk 6 <>= cat("Running the Hobbs problem - code already loaded - with Nelder-Mead\n") start<-c(100,10,.1) f0<-hobbs.f(start) cat("initial function value=",f0,"\n") tu<-system.time(ansnm1a<-optim(start, hobbs.f, control=list(maxit=5000)))[1] Unscaled2<-opt2vec("Nelder",ansnm1a,tu) Unscaled2 @ And maybe we should also see what the scaled problem does from c(0.01, 0.1, 10), which is the equivalent of unscaled c(1,1,1) or ``start 1''. % chunk 7 <>= cat("Running the scaled Hobbs problem - code already loaded - with Nelder-Mead\n") start<-c(.01,.1,10) f0<-shobbs.f(start) cat("initial function value=",f0,"\n") ts<-system.time(ansnms1z<-optim(start, shobbs.f, control=list(maxit=5000)))[1] Scaled1<-opt2vec("Nelder",ansnms1z,ts) Scaled1 cat("Tabulate the results\n") allans<-data.frame(Unscaled1,Scaled1,Unscaled2,Scaled2) allans @ The examples suggest: \begin{itemize} \item{That unscaled problems may have difficulty finding a solution from a "random" or simple starting point.} \item{That from similar starting points, the scaled function may allow a solution to be found more efficiently i.e., with fewer function evaluations.} \end{itemize} \subsection{More optimizers} Nash and Varadhan (2010) developed \textbf{optimx}, a wrapper for a number of optimization tools available for R. This allows a control \texttt{all.methods} which when set TRUE will try all appropriate methods within \textbf{optimx}. However, we do now use expressions for the gradients. % chunk 8 <>= hobbs.jac<-function(x){ # Jacobian of Hobbs weeds problem jj<-matrix(0.0, 12, 3) t<-1:12 yy<-exp(-x[3]*t) zz<-1.0/(1+x[2]*yy) jj[t,1] <- zz jj[t,2] <- -x[1]*zz*zz*yy jj[t,3] <- x[1]*zz*zz*yy*x[2]*t return(jj) } @ \newpage <>= hobbs.g<-function(x){ # gradient of Hobbs weeds problem # NOT EFFICIENT TO CALL AGAIN jj<-hobbs.jac(x) res<-hobbs.res(x) gg<-as.vector(2.*t(jj) %*% res) return(gg) } @ \newpage <>= shobbs.jac<-function(x) { # scaled Hobbs weeds problem -- Jacobian jj<-matrix(0.0, 12, 3) t<-1:12 yy<-exp(-0.1*x[3]*t) zz<-100.0/(1+10.*x[2]*yy) jj[t,1] <- zz jj[t,2] <- -0.1*x[1]*zz*zz*yy jj[t,3] <- 0.01*x[1]*zz*zz*yy*x[2]*t return(jj) } @ \newpage <>= shobbs.g <- function(x) { # scaled Hobbs weeds problem -- gradient shj<-shobbs.jac(x) shres<-shobbs.res(x) shg<-as.vector(2.0* (shres %*% shj)) return(shg) } @ \newpage Now let us run this scaled problem. % chunk 8Z <>= library(optimx) cat("Running the Hobbs problem from the scaled start - code already loaded\n") start<-c(100,10,.1) starts<-c(1,1,1) f0<-hobbs.f(start) f0s<-shobbs.f(starts) cat("initial function values: unscaled=",f0," scaled=",f0s,"\n") # Following fails for some methods test<-try(tu<-system.time(aall1<-optimx(start, hobbs.f, hobbs.g,control=list(all.methods=TRUE,maxit=5000)))[1]) if (class(test) != "try-error") { print(aall1) } else {cat("Unscaled all-methods attempt failed\n") } test<-try(ts<-system.time(aall1s<-optimx(starts, shobbs.f,shobbs.g,control=list(all.methods=TRUE,maxit=5000)))[1]) if (class(test) != "try-error") { print(aall1s) } else {cat("Scaled all-methods attempt failed\n") } @ \newpage Here we see that many methods do quite well on this problem, but that some methods, especially the Powell methods (bobyqa, newuoa, uobyqa) do poorly. These methods implicitly assume similar scalings for the parameters, and even our simple factors on the parameters do not suffice for these algorithms. \section{The KKT conditions} For an unconstrained objective function f(\textbf{x}), the KKT (Karush - Kuhn - Tucker) conditions for a (local) minimum at \textbf{y} are that the gradient of f at \textbf{y} is null and that the eigenvalues of the Hessian of f at \textbf{y} are all positive. The gradient is the vector of first derivatives of f with respect to the parameters (evaluated at \textbf{y}). The Hessian is the matrix of second derivatives. Of course, "null" and "positive" are ideas that presume exact arithmetic. The reality is that we want the gradient to be ``small''. For the Hessian, we want all eigenvalues positive as a first condition. That is relatively easy to check. However, if the ratio of smallest to largest eigenvalue is less than, say, 1E-6, we may suspect that the Hessian is very close to singular. This means that we cannot be sure that we are at a local minimum. Interpreted another way, it suggests that it is difficult to determine the minimum as a region of parameter space may be ``flat''. \textbf{optimx} includes checks on the KKT conditions, and they appear in the tables of methods above. In the course of the NCEAS Nonlinear Modeling project, an R package \tt kktc \rm was created. To install this package directly within R type:\\ \tt install.packages("kktc", repos="http://R-Forge.R-project.org") \rm. \section{Using a nonlinear least squares tool} For completeness, the code to run this problem in nls() is included. % chunk 9 <>= @ Here we see the benefit of a) scaling and b) starting values ``close enough'' to the solution. \section{A maximum likelihood version of the problem} \subsection{Solving the ML problem with optimx} The likelihood expressed in R based on the simple scaling of the parameters in the 3-parameter logistic (file \tt shobbs.R \rm ) is as follows : % chunk 12 <>= # shobbslikn.R # 4 parameter negative log likelihood for weeds problem in # direct (non-log) scale, including sigma2 # We still use simple scalings for the logistic parameters shobbsn.lik<-function(xaug){ # likelihood function including sigma xx<-xaug[1:3] sigma2<-xaug[4] res<-shobbs.res(xx) nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2) } shobbsn.lg<-function(xaug){ # gradient function including sigma xx<-xaug[1:3] sigma2<-xaug[4] res3<-shobbs.res(xx) n<-length(res3) f3<-crossprod(res3) # cat("in shobbs.lg: f3=",f3," n=",n," g3:\n") g3<-0.5*shobbs.g(xx)/sigma2 # print(g3) gg<-c(g3,0.5*(n - f3/sigma2)/sigma2) } shobbs.f<-function(x){ # # Scaled Hobbs weeds problem -- function if (abs(12*x[3]*0.1) > 50) { # check computability fbad<-.Machine$double.xmax return(fbad) } res<-shobbs.res(x) f<-sum(res*res) } shobbs.res<-function(x){ # scaled Hobbs weeds problem -- residual # This variant uses looping if (abs(12*x[3]*0.1) > 50) { # check computability rbad<-rep(.Machine$double.xmax, length(x)) return(rbad) } if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3") y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948, 75.995, 91.972) t<-1:12 res<-100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*t)) - y } shobbs.jac<-function(x) { # scaled Hobbs weeds problem -- Jacobian jj<-matrix(0.0, 12, 3) t<-1:12 yy<-exp(-0.1*x[3]*t) zz<-100.0/(1+10.*x[2]*yy) jj[t,1] <- zz jj[t,2] <- -0.1*x[1]*zz*zz*yy jj[t,3] <- 0.01*x[1]*zz*zz*yy*x[2]*t return(jj) } shobbs.g <- function(x) { # scaled Hobbs weeds problem -- gradient shj<-shobbs.jac(x) shres<-shobbs.res(x) shg<-as.vector(2.0* (shres %*% shj)) return(shg) } # NOTE: we also put in a form where sigma appears in log form shobbs.lik<-function(xaug){ # likelihood function including sigma xx<-xaug[1:3] logSigma<-xaug[4] sigma2=exp(2.0*logSigma); res<-shobbs.res(xx) nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2) } shobbs.lg<-function(xaug){ # gradient function including sigma xx<-xaug[1:3] logSigma<-xaug[4] sigma2=exp(2.0*logSigma); res3<-shobbs.res(xx) n<-length(res3) f3<-crossprod(res3) # cat("in shobbs.lg: f3=",f3," n=",n," g3:\n") g3<-0.5*shobbs.g(xx)/sigma2 # print(g3) gg<-c(g3,(n - f3/sigma2)) } @ As a practical matter, we had already obtained the solution in ADMB, and tested the R code using % chunk 13 <>= # test shobbsn.lik xtex<-c(1.9619, 4.9092, 3.1357) xaug<- c(xtex,0.21561) print(xaug) negloglik<-shobbsn.lik(xaug) cat("negloglik=",negloglik,"\n") @ We believe it is important to do such calculations, as silly errors are one of the main sources of frustration and wasted time in nonlinear modeling. We can test the effectiveness of different methods of optimization in R using the all.methods flag in optimx. Hence, \newpage % chunk 14 <>= test<-try(ansR4<-optimx(c(2,5,3,1),shobbsn.lik, shobbsn.lg, control=list(all.methods=TRUE))) if (class(test) != "try-error") { print(ansR4) } else { cat("ML attempt with optimx failed for some method\n") } @ \newpage A majority of the methods have achieved the likelihood alread found by ADMB, and most of these also found solutions satisfying both Karush-Kuhn-Tucker conditions. A minor concern is the lack of solution for two methods and the presence of several solutions where only the first KKT condition has been met. Note that many workers find it onerous to provide gradients. (It usually is!!). ADMB is explicitly intended to provide them. If we rerun the above calculations with numerical derivatives, we find (note that Nelder-Mead and newuoa and bobyqa are derivative free anyway): \newpage % chunk 15 <>= test<-try(ansR4n<-optimx(c(2,5,3,1),shobbsn.lik, control=list(all.methods=TRUE))) if (class(test) != "try-error") { print(ansR4n) } else { cat("ML attempt with optimx on scaled likelihood (no gradients) failed for some method\n") } @ \newpage We can try to work with parameters in log form. That is, $log(b_1)$, $log(b_2)$, $log(b_3)$ and $log(\sigma)$. This is a reparametrization, so we must correct our codes. Reparametrization, as mentioned below, is one way to improve the chances of successful application of optimization. % chunk 16 <>= # shobbslik.R # 4 parameter negative log likelihood for weeds problem in # log scale for all parameters # and we use shobb.res form using simple scaling parameters # need source("shobbs.R") require("optimx") shobbs.lik<-function(xaug){ # likelihood function including sigma xx<-exp(xaug[1:3]) logSigma<-xaug[4] sigma2=exp(2.0*logSigma); res<-shobbs.res(xx) nll<-0.5*(length(res)*log(2*pi*sigma2)+sum(res*res)/sigma2) } shobbs.lg<-function(xaug){ # gradient function including sigma xx<-exp(xaug[1:3]) logSigma<-xaug[4] sigma2=exp(2.0*logSigma); res3<-shobbs.res(xx) n<-length(res3) f3<-crossprod(res3) # cat("in shobbs.lg: f3=",f3," n=",n," g3:\n") g3<-0.5*xx*shobbs.g(xx)/sigma2 # print(g3) gg<-c(g3,(n - f3/sigma2)) } cat("test derivative code\n") rm(list=ls()) source("shobbs.R") source("shobbslik.R") xxa<-c(2,5,3,3) f<-shobbs.lik(xxa) f ga<-shobbs.lg(xxa) ga require(numDeriv) gn<-grad(shobbs.lik, xxa) gn cat("\n\n") cat("Using analytic derivatives\n") test<-try(ansR5<-optimx(log(c(2,5,3,1)),shobbs.lik, shobbs.lg, control=list(all.methods=TRUE))) if (class(test) != "try-error") { print(ansR5) } else { cat("ML attempt with optimx on scaled likelihood and analytic gradients failed for some method\n") } cat("Using numerical derivative approximations\n") test<-try(ansR5n<-optimx(log(c(2,5,3,1)),shobbs.lik, control=list(all.methods=TRUE))) if (class(test) != "try-error") { print(ansR5n) } else { cat("ML attempt with optimx on scaled likelihood and numerical gradients failed for some method\n") } @ \section{An alternate parametrization choices} How we set up our nonlinear models has, unfortunately, a potentially large impact on our chances of getting a solution efficiently or at all. The initial problem here has been specified so that the solution is sought in terms of $b_1$, $b_2$, and $b_3$, and optionally sigma2. However, one of us (AN) automatically set up ADMB to use $log(b_1)$, $log(b_2)$ and $log(b_3)$ as the parameters, even though the naturally scaled parameters were reported. There is also a parametrization suggested by Doug Bates, that predicts the state of weeds at time t as $ yy ~ c_1/(1+exp((c_2 - tt)/c_3)) $ The first parameter, $c_1$, matches $b_1$ as the upper asymptote of the function, we now have $c_2$ as the time point where the function reaches its mid-height. $c_3$ controls the rate at which the function progresses to the asymptote. These parameters are said to be much easier to estimate with nonlinear least squares or optimization than the original set. However, it is not the experience of the authors that this is obviously true. ?? code \newpage \section{Parametrization and model choices} We have for this problem a quite large set of choices of ways to model the data using equivalent but different ways to introduce the modelling parameters: \begin{itemize} \item{the original 3-parameter formulation with no scaling;} \item{a 3-parameter standard model using simple powers of 10 to scale the parameters so the ones we estimate have approximately the same magnitudes;} \item{the two choices above, but setting the parameters estimated to the the log() of the ones we want. This means that $b_j = exp(par_j)$ and we work with the $par_j$.} \item{the Bates formulation of the previous section. We can also work with logs of the parameters.} \item{all of the above repeated with the additional estimation of the variance. We may wish to estimate the log of the standard deviation.} \end{itemize} This represents a large number of possibilities for our approach to {\bf modeling}. In addition, we have many choices of how to effect the {\bf estimation} of these models. \newpage \section{Discussion} The problem presented here, when submitted in the original scaling and parametrization, gives most relevant software packages difficulty. It seems to be more or less generally useful to have our objective functions specified in terms of the logs of (positive) parameters rather than the naturally scaled parameters. Indeed, some automation of the setup of objective functions to do this would be an interesting possibility. More general reparametrization, while extremely helpful, requires insight into problems that users cannot be expected to have. \newpage \section{References} ?? ordering John C Nash and Ravi Varadhan, (2011) Unifying Minimization Algorithms to Aid Software System Users: optimx() for R, in revision for the Journal of Statistical Software. John C Nash and Ravi Varadhan (2010) optimx: A replacement and extension of the optim() function, R package version 0.88, http://CRAN.R-project.org/package=optimx ADMB Project. 2009 AD Model Builder: automatic differentiation model builder. Developed by David Fournier and freely available from admb-project.org. Ben Bolker and Hans Skaug (2011) R2admb: ADMB to R interface functions, R package version 0.7/r34, http://R-Forge.R-project.org/projects/r2admb/ ?? more JAGS refs Martyn Plummer (2010) rjags: Bayesian graphical models using MCMC, R package version 2.2.0-1, http://mcmc-jags.sourceforge.net Lunn, D., Spiegelhalter, D., Thomas, A., Best, N. (2009). The BUGS project: Evolution, critique, and future directions, \textit {Statistics in Medicine}, \textbf{28}, 3049-3067. R Development Core Team (2010) R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org ??others \end{document}}