General Unconstrained Minimization

Modeling

FirstOrderFunction

class FirstOrderFunction

Instances of FirstOrderFunction implement the evaluation of a function and its gradient.

class FirstOrderFunction {
  public:
   virtual ~FirstOrderFunction() {}
   virtual bool Evaluate(const double* const parameters,
                         double* cost,
                         double* gradient) const = 0;
   virtual int NumParameters() const = 0;
};
bool FirstOrderFunction::Evaluate(const double *const parameters, double *cost, double *gradient) const

Evaluate the cost/value of the function. If gradient is not NULL then evaluate the gradient too. If evaluation is successful return, true else return false.

cost guaranteed to be never NULL, gradient can be NULL.

int FirstOrderFunction::NumParameters() const

Number of parameters in the domain of the function.

GradientProblem

class GradientProblem
class GradientProblem {
 public:
  explicit GradientProblem(FirstOrderFunction* function);
  GradientProblem(FirstOrderFunction* function,
                  LocalParameterization* parameterization);
  int NumParameters() const;
  int NumLocalParameters() const;
  bool Evaluate(const double* parameters, double* cost, double* gradient) const;
  bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
};

Instances of GradientProblem represent general non-linear optimization problems that must be solved using just the value of the objective function and its gradient. Unlike the Problem class, which can only be used to model non-linear least squares problems, instances of GradientProblem not restricted in the form of the objective function.

Structurally GradientProblem is a composition of a FirstOrderFunction and optionally a LocalParameterization.

The FirstOrderFunction is responsible for evaluating the cost and gradient of the objective function.

The LocalParameterization is responsible for going back and forth between the ambient space and the local tangent space. When a LocalParameterization is not provided, then the tangent space is assumed to coincide with the ambient Euclidean space that the gradient vector lives in.

The constructor takes ownership of the FirstOrderFunction and LocalParamterization objects passed to it.

void Solve(const GradientProblemSolver::Options &options, const GradientProblem &problem, double *parameters, GradientProblemSolver::Summary *summary)

Solve the given GradientProblem using the values in parameters as the initial guess of the solution.

Solving

GradientProblemSolver::Options

class GradientProblemSolver::Options

GradientProblemSolver::Options controls the overall behavior of the solver. We list the various settings and their default values below.

bool GradientProblemSolver::Options::IsValid(string *error) const

Validate the values in the options struct and returns true on success. If there is a problem, the method returns false with error containing a textual description of the cause.

LineSearchDirectionType GradientProblemSolver::Options::line_search_direction_type

Default: LBFGS

Choices are STEEPEST_DESCENT, NONLINEAR_CONJUGATE_GRADIENT, BFGS and LBFGS.

LineSearchType GradientProblemSolver::Options::line_search_type

Default: WOLFE

Choices are ARMIJO and WOLFE (strong Wolfe conditions). Note that in order for the assumptions underlying the BFGS and LBFGS line search direction algorithms to be guaranteed to be satisifed, the WOLFE line search should be used.

NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type

Default: FLETCHER_REEVES

Choices are FLETCHER_REEVES, POLAK_RIBIERE and HESTENES_STIEFEL.

int GradientProblemSolver::Options::max_lbfs_rank

Default: 20

The L-BFGS hessian approximation is a low rank approximation to the inverse of the Hessian matrix. The rank of the approximation determines (linearly) the space and time complexity of using the approximation. Higher the rank, the better is the quality of the approximation. The increase in quality is however is bounded for a number of reasons.

  1. The method only uses secant information and not actual derivatives.
  2. The Hessian approximation is constrained to be positive definite.

So increasing this rank to a large number will cost time and space complexity without the corresponding increase in solution quality. There are no hard and fast rules for choosing the maximum rank. The best choice usually requires some problem specific experimentation.

bool GradientProblemSolver::Options::use_approximate_eigenvalue_bfgs_scaling

Default: false

As part of the BFGS update step / LBFGS right-multiply step, the initial inverse Hessian approximation is taken to be the Identity. However, [Oren] showed that using instead \(I * \gamma\), where \(\gamma\) is a scalar chosen to approximate an eigenvalue of the true inverse Hessian can result in improved convergence in a wide variety of cases. Setting use_approximate_eigenvalue_bfgs_scaling to true enables this scaling in BFGS (before first iteration) and LBFGS (at each iteration).

Precisely, approximate eigenvalue scaling equates to

\[\gamma = \frac{y_k' s_k}{y_k' y_k}\]

With:

\[y_k = \nabla f_{k+1} - \nabla f_k\]
\[s_k = x_{k+1} - x_k\]

Where \(f()\) is the line search objective and \(x\) the vector of parameter values [NocedalWright].

It is important to note that approximate eigenvalue scaling does not always improve convergence, and that it can in fact significantly degrade performance for certain classes of problem, which is why it is disabled by default. In particular it can degrade performance when the sensitivity of the problem to different parameters varies significantly, as in this case a single scalar factor fails to capture this variation and detrimentally downscales parts of the Jacobian approximation which correspond to low-sensitivity parameters. It can also reduce the robustness of the solution to errors in the Jacobians.

LineSearchIterpolationType GradientProblemSolver::Options::line_search_interpolation_type

Default: CUBIC

Degree of the polynomial used to approximate the objective function. Valid values are BISECTION, QUADRATIC and CUBIC.

double GradientProblemSolver::Options::min_line_search_step_size

The line search terminates if:

\[\|\Delta x_k\|_\infty < \text{min_line_search_step_size}\]

where \(\|\cdot\|_\infty\) refers to the max norm, and \(\Delta x_k\) is the step change in the parameter values at the \(k\)-th iteration.

double GradientProblemSolver::Options::line_search_sufficient_function_decrease

Default: 1e-4

Solving the line search problem exactly is computationally prohibitive. Fortunately, line search based optimization algorithms can still guarantee convergence if instead of an exact solution, the line search algorithm returns a solution which decreases the value of the objective function sufficiently. More precisely, we are looking for a step size s.t.

\[f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]\]

This condition is known as the Armijo condition.

double GradientProblemSolver::Options::max_line_search_step_contraction

Default: 1e-3

In each iteration of the line search,

\[\text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}\]

Note that by definition, for contraction:

\[0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\]
double GradientProblemSolver::Options::min_line_search_step_contraction

Default: 0.6

In each iteration of the line search,

\[\text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}\]

Note that by definition, for contraction:

\[0 < \text{max_step_contraction} < \text{min_step_contraction} < 1\]
int GradientProblemSolver::Options::max_num_line_search_step_size_iterations

Default: 20

Maximum number of trial step size iterations during each line search, if a step size satisfying the search conditions cannot be found within this number of trials, the line search will stop.

As this is an ‘artificial’ constraint (one imposed by the user, not the underlying math), if WOLFE line search is being used, and points satisfying the Armijo sufficient (function) decrease condition have been found during the current search (in \(\leq\) max_num_line_search_step_size_iterations). Then, the step size with the lowest function value which satisfies the Armijo condition will be returned as the new valid step, even though it does not satisfy the strong Wolfe conditions. This behaviour protects against early termination of the optimizer at a sub-optimal point.

int GradientProblemSolver::Options::max_num_line_search_direction_restarts

Default: 5

Maximum number of restarts of the line search direction algorithm before terminating the optimization. Restarts of the line search direction algorithm occur when the current algorithm fails to produce a new descent direction. This typically indicates a numerical failure, or a breakdown in the validity of the approximations used.

double GradientProblemSolver::Options::line_search_sufficient_curvature_decrease

Default: 0.9

The strong Wolfe conditions consist of the Armijo sufficient decrease condition, and an additional requirement that the step size be chosen s.t. the magnitude (‘strong’ Wolfe conditions) of the gradient along the search direction decreases sufficiently. Precisely, this second condition is that we seek a step size s.t.

\[\|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|\]

Where \(f()\) is the line search objective and \(f'()\) is the derivative of \(f\) with respect to the step size: \(\frac{d f}{d~\text{step size}}\).

double GradientProblemSolver::Options::max_line_search_step_expansion

Default: 10.0

During the bracketing phase of a Wolfe line search, the step size is increased until either a point satisfying the Wolfe conditions is found, or an upper bound for a bracket containing a point satisfying the conditions is found. Precisely, at each iteration of the expansion:

\[\text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}\]

By definition for expansion

\[\text{max_step_expansion} > 1.0\]
int GradientProblemSolver::Options::max_num_iterations

Default: 50

Maximum number of iterations for which the solver should run.

double GradientProblemSolver::Options::max_solver_time_in_seconds

Default: 1e6 Maximum amount of time for which the solver should run.

double GradientProblemSolver::Options::function_tolerance

Default: 1e-6

Solver terminates if

\[\frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}\]

where, \(\Delta \text{cost}\) is the change in objective function value (up or down) in the current iteration of the line search.

double GradientProblemSolver::Options::gradient_tolerance

Default: 1e-10

Solver terminates if

\[\|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}\]

where \(\|\cdot\|_\infty\) refers to the max norm, \(\Pi\) is projection onto the bounds constraints and \(\boxplus\) is Plus operation for the overall local parameterization associated with the parameter vector.

double GradientProblemSolver::Options::parameter_tolerance

Default: 1e-8

Solver terminates if

\[\|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}\]

where \(\Delta x\) is the step computed by the linear solver in the current iteration of the line search.

LoggingType GradientProblemSolver::Options::logging_type

Default: PER_MINIMIZER_ITERATION

bool GradientProblemSolver::Options::minimizer_progress_to_stdout

Default: false

By default the Minimizer progress is logged to STDERR depending on the vlog level. If this flag is set to true, and GradientProblemSolver::Options::logging_type is not SILENT, the logging output is sent to STDOUT.

The progress display looks like

0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e:  0 it: 2.98e-02 tt: 8.50e-02
1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e:  1 it: 4.54e-02 tt: 1.31e-01
2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e:  1 it: 4.96e-02 tt: 1.81e-01

Here

  1. f is the value of the objective function.
  2. d is the change in the value of the objective function if the step computed in this iteration is accepted.
  3. g is the max norm of the gradient.
  4. h is the change in the parameter vector.
  5. s is the optimal step length computed by the line search.
  6. it is the time take by the current iteration.
  7. tt is the total time taken by the minimizer.
vector<IterationCallback> GradientProblemSolver::Options::callbacks

Callbacks that are executed at the end of each iteration of the Minimizer. They are executed in the order that they are specified in this vector. See the documentation for IterationCallback for more details.

The solver does NOT take ownership of these pointers.

GradientProblemSolver::Summary

class GradientProblemSolver::Summary

Summary of the various stages of the solver after termination.

string GradientProblemSolver::Summary::BriefReport() const

A brief one line description of the state of the solver after termination.

string GradientProblemSolver::Summary::FullReport() const

A full multiline description of the state of the solver after termination.

bool GradientProblemSolver::Summary::IsSolutionUsable() const

Whether the solution returned by the optimization algorithm can be relied on to be numerically sane. This will be the case if GradientProblemSolver::Summary:termination_type is set to CONVERGENCE, USER_SUCCESS or NO_CONVERGENCE, i.e., either the solver converged by meeting one of the convergence tolerances or because the user indicated that it had converged or it ran to the maximum number of iterations or time.

TerminationType GradientProblemSolver::Summary::termination_type

The cause of the minimizer terminating.

string GradientProblemSolver::Summary::message

Reason why the solver terminated.

double GradientProblemSolver::Summary::initial_cost

Cost of the problem (value of the objective function) before the optimization.

double GradientProblemSolver::Summary::final_cost

Cost of the problem (value of the objective function) after the optimization.

vector<IterationSummary> GradientProblemSolver::Summary::iterations

IterationSummary for each minimizer iteration in order.

double GradientProblemSolver::Summary::total_time_in_seconds

Time (in seconds) spent in the solver.

double GradientProblemSolver::Summary::cost_evaluation_time_in_seconds

Time (in seconds) spent evaluating the cost vector.

double GradientProblemSolver::Summary::gradient_evaluation_time_in_seconds

Time (in seconds) spent evaluating the gradient vector.

int GradientProblemSolver::Summary::num_parameters

Number of parameters in the problem.

int GradientProblemSolver::Summary::num_local_parameters

Dimension of the tangent space of the problem. This is different from GradientProblemSolver::Summary::num_parameters if a LocalParameterization object is used.

LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type

Type of line search direction used.

LineSearchType GradientProblemSolver::Summary::line_search_type

Type of the line search algorithm used.

LineSearchInterpolationType GradientProblemSolver::Summary::line_search_interpolation_type

When performing line search, the degree of the polynomial used to approximate the objective function.

NonlinearConjugateGradientType GradientProblemSolver::Summary::nonlinear_conjugate_gradient_type

If the line search direction is NONLINEAR_CONJUGATE_GRADIENT, then this indicates the particular variant of non-linear conjugate gradient used.

int GradientProblemSolver::Summary::max_lbfgs_rank

If the type of the line search direction is LBFGS, then this indicates the rank of the Hessian approximation.