|
Computacion Inteligente Derivative-Based Optimization
|
tarix | 25.07.2018 | ölçüsü | 2,58 Mb. | | #58870 |
|
Derivative-Based Optimization
Contents Optimization problems Mathematical background Descent Methods The Method of Steepest Descent Conjugate Gradient
Objective function – mathematical function which is optimized by changing the values of the design variables. Objective function – mathematical function which is optimized by changing the values of the design variables. Design Variables – Those variables which we, as designers, can change. Constraints – Functions of the design variables which establish limits in individual variables or combinations of design variables.
3 basic ingredients… 3 basic ingredients… - an objective function,
- a set of decision variables,
- a set of equality/inequality constraints.
Design Variables: decision and objective vector - Design Variables: decision and objective vector
- Constraints: equality and inequality
- Bounds: feasible ranges for variables
- Objective Function: maximization can be converted to minimization due to the duality principle
Identify the quantity or function, f, to be optimized. Identify the design variables: x1, x2, x3, …,xn. Identify the constraints if any exist a. Equalities b. Inequalities Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.
Objective functions may be unimodal or multimodal. Objective functions may be unimodal or multimodal. - Unimodal – only one optimum
- Multimodal – more than one optimum
Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design. The global optimum is the best of all local optimum designs.
Existence of global minimum Existence of global minimum If f(x) is continuous on the feasible set S which is closed and bounded, then f(x) has a global minimum in S - A set S is closed if it contains all its boundary pts.
- A set S is bounded if it is contained in the interior of some circle
Derivative-based optimization (gradient based) Derivative-based optimization (gradient based) - Capable of determining “search directions” according to an objective function’s derivative information
- steepest descent method;
- Newton’s method; Newton-Raphson method;
- Conjugate gradient, etc.
Derivative-free optimization - random search method;
- genetic algorithm;
- simulated annealing; etc.
A square matrix M is positive definite if A square matrix M is positive definite if It is positive semidefinite if
A symmetric matrix M = MT is positive definite if and only if its eigenvalues λi > 0. (semidefinite ↔ λi ≥ 0) A symmetric matrix M = MT is positive definite if and only if its eigenvalues λi > 0. (semidefinite ↔ λi ≥ 0) - Proof (→): Let vi the eigenvector for the i-th eigenvalue λi
- Then,
- which implies λi > 0,
Theorem: If a matrix M = UTU then it is positive definite Theorem: If a matrix M = UTU then it is positive definite
f: Rn → R is a quadratic function if f: Rn → R is a quadratic function if
It is no necessary for Q be symmetric. It is no necessary for Q be symmetric. - Suposse matrix P non-symmetric
Suposse matrix P non-symmetric. Example - Suposse matrix P non-symmetric. Example
Given the quadratic function Given the quadratic function
Two other shapes can result from the quadratic form. Two other shapes can result from the quadratic form. - If Q is negative definite, then f is a parabolic “bowl” up side down.
- If Q is indefinite then f describes a saddle.
Quadratics are useful in the study of optimization. Quadratics are useful in the study of optimization. - Often, objective functions are “close to” quadratic near the solution.
- It is easier to analyze the behavior of algorithms when applied to quadratics.
- Analysis of algorithms for quadratics gives insight into their behavior in general.
The derivative of f: R → R is a function f ′: R → R given by The derivative of f: R → R is a function f ′: R → R given by if the limit exists.
Along the Axes…
In general direction…
Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives
exist for each x in Rn and are continuous functions of x. In this case, we say f C1 (a smooth function C1)
Definition: The gradient of f: in R2 → R: Definition: The gradient of f: in R2 → R:
It is a function ∇f: R2 → R2 given by
Definition: The gradient of f: Rn → R is a function ∇f: Rn → Rn given by Definition: The gradient of f: Rn → R is a function ∇f: Rn → Rn given by
The gradient defines (hyper) plane approximating the function infinitesimally
By the chain rule
Proposition 1: Proposition 1: is maximal choosing
Proof:
Proof: Proof: - On the other hand for general v:
Proposition 2: let f: Rn → R be a smooth function C1 around p, Proposition 2: let f: Rn → R be a smooth function C1 around p, if f has local minimum (maximum) at p then,
Proof: intuitive
We found the best INFINITESIMAL DIRECTION at each point, We found the best INFINITESIMAL DIRECTION at each point, Looking for minimum: “blind man” procedure How can we derive the way to the minimum using this knowledge?
The gradient of f: Rn → Rm is a function Df: Rn → Rm×n given by The gradient of f: Rn → Rm is a function Df: Rn → Rm×n given by
If the derivative of ∇f exists, we say that f is twice differentiable. If the derivative of ∇f exists, we say that f is twice differentiable. - Write the second derivative as D2f (or F), and call it the Hessian of f.
The level set of a function f: Rn → R at level c is the set of points S = {x: f(x) = c}. The level set of a function f: Rn → R at level c is the set of points S = {x: f(x) = c}.
Fact: ∇f(x0) is orthogonal to the level set at x0 Fact: ∇f(x0) is orthogonal to the level set at x0
Proof of fact: Proof of fact: - Imagine a particle traveling along the level set.
- Let g(t) be the position of the particle at time t, with g(0) = x0.
- Note that f(g(t)) = constant for all t.
- Velocity vector g′(t) is tangent to the level set.
- Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule,
- Hence, ∇f(x0) and g′(0) are orthogonal.
Suppose f: R → R is in C1. Then, Suppose f: R → R is in C1. Then,
Suppose f: R → R is in C2. Then, Suppose f: R → R is in C2. Then,
Suppose f: Rn → R. Suppose f: Rn → R. - If f in C1, then
- If f in C2, then
We already know that ∇f(x0) is orthogonal to the level set at x0. We already know that ∇f(x0) is orthogonal to the level set at x0. Fact: ∇f points in the direction of increasing f.
Consider xα = x0 + α∇f(x0), α > 0. Consider xα = x0 + α∇f(x0), α > 0. Therefore, for sufficiently small , f(xα) > f(x0)
This theorem is the link from the previous gradient properties to the constructive algorithm. This theorem is the link from the previous gradient properties to the constructive algorithm. The problem:
The Theorem: The Theorem: - Suppose f: Rn → R C1 smooth, and exist continuous function: k: Rn → [0,1], and,
- And, the search vectors constructed by the model algorithm satisfy:
And Then - if is the sequence constructed by the algorithm model,
- then any accumulation point y of this sequence satisfy:
The theorem has very intuitive interpretation: The theorem has very intuitive interpretation: Always go in descent direction.
We now use what we have learned to implement the most basic minimization technique. We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem:
Theorem: Theorem: - If is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfy:
- Proof: from Wolfe theorem
How long a step to take?
How long a step to take? How long a step to take? Therefore the method of steepest descent looks like this:
We from now on assume we want to minimize the quadratic function: We from now on assume we want to minimize the quadratic function: This is equivalent to solve linear problem:
La solucion es la interseccion de las lineas La solucion es la interseccion de las lineas
- Cada elipsoide tiene f(x) constante
What is the problem with steepest descent? What is the problem with steepest descent? - We can repeat the same directions over and over…
Wouldn’t it be better if, every time we took a step, we got it right the first time?
What is the problem with steepest descent? What is the problem with steepest descent? - We can repeat the same directions over and over…
Conjugate gradient requires n gradient evaluations and n line searches.
First, let’s define de error as First, let’s define de error as
Let’s pick a set of orthogonal search directions Let’s pick a set of orthogonal search directions
Using the coordinate axes as search directions… Using the coordinate axes as search directions…
We have
Given , how do we calculate ? Given , how do we calculate ?
Given , how do we calculate ? Given , how do we calculate ?
How do we find ? How do we find ? - Since search vectors form a basis
We want that after n step the error will be 0: We want that after n step the error will be 0:
So we look for such that So we look for such that - Simple calculation shows that if we take
Sources J-Shing Roger Jang, Chuen-Tsai Sun and Eiji Mizutani, Slides for Ch. 5 of “Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence”, First Edition, Prentice Hall, 1997. Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005 Lucidi delle lezioni, Soft Computing. Materiale Didattico. Dipartimento di Elettronica e Informazione. Politecnico di Milano. 2004 Jeen-Shing Wang, Course: Introduction to Neural Networks. Lecture notes. Department of Electrical Engineering. National Cheng Kung University. Fall, 2005
Sources Carlo Tomasi, Mathematical Methods for Robotics and Vision. Stanford University. Fall 2000 Petros Ioannou, Jing Sun, Robust Adaptive Control. Prentice-Hall, Inc, Upper Saddle River: NJ, 1996 Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Edition 11/4. School of Computer Science. Carnegie Mellon University. Pittsburgh. August 4, 1994 Gordon C. Everstine, Selected Topics in Linear Algebra. The GeorgeWashington University. 8 June 2004
Dostları ilə paylaş: |
|
|