Gradients, Optimization, and the Lagrange Multiplier

What is the gradient of a multivariable function? How can we maximize constrained functions? Let's learn together!

Keywords: multivariable calculus, optimization, linear algebra

By Carmen Cincotti Β 

While reading the article on position-based dynamics, I realized that there are quite a bit of mathematical concepts that I don’t understand too well. πŸ˜•

For example, in the context of constraints, there are concepts such as the gradient of a function. Moreover, when we talk about the optimization of a function that is constrained, we often talk about the Lagrange multiplier and also the Lagrangian.

In this article, my goal is to demystify these concepts.

The gradient

Before discovering the why of the gradient, I suggest that we first look at how to solve the gradient of a multivariable function such as f(x,y,z)f(x,y,z). The gradient is a vector consisting of the partial derivatives of the defined function, ff :

βˆ‡f=[βˆ‚fβˆ‚xβˆ‚fβˆ‚yβˆ‚fβˆ‚z] \nabla{f} = \begin{bmatrix} \frac{\partial f}{\partial x} \\[6pt] \frac{\partial f}{\partial y} \\[6pt] \frac{\partial f}{\partial z} \\ \end{bmatrix}

where βˆ‡\nabla is called nabla. It is sometimes called β€œdel”. It denotes the vector differential operator.

Let’s take an example. I have a function defined as f(x,y)=5x+3xyf(x,y) = 5x + 3xy. First, we need to find the partial derivatives with respect to the variables xx and yy as follows:

βˆ‚fβˆ‚x=5+3yβˆ‚fβˆ‚y=3x \frac{\partial f}{\partial x} = 5 + 3y \\[6pt] \frac{\partial f}{\partial y} = 3x \\[6pt]

This gives us a gradient:

βˆ‡f=[5+3y3x] \nabla{f} = \begin{bmatrix} 5 + 3y \\[6pt] 3x \\ \end{bmatrix}

That was quite easy to solve! Nevertheless, the importance of the gradient remains a mystery that we’ll uncover in the next section. πŸ•΅οΈβ€β™‚οΈ

πŸ’‘ If you are bad at partial derivatives, I suggest you learn them a little at this resource

The meaning of the gradient

Let’s imagine that we are at the foot of a mountain, and we wanted to climb it as quickly as possible. We see that the right track is to climb it in the direction of the arrow:

Person climbing a mountain

This arrow represents the gradient because it is the fastest ascent up the mountain.

In more precise terms, the gradient vector is the direction and fastest rate of increase of a function ff at any given point. This concept is the most important point to remember about the gradient. πŸ’‘

Imagine we create a heat map for a function ff within an arbitrary space. Then, we find the gradient at different inputs to ff such as (x0,y0,...),(x1,y1,...),...(xn,yn,...)(x_0, y_0, ...), (x_1, y_1, ...), ... ( x_n, y_n, ...) . If we represent the gradient at each of these points as arrows, we might see the following image:

Gradient of a Function.tif
By Vivekj78 - Own work, CC BY-SA 3.0, Link

We can see that the gradient βˆ‡f(x,y,...)\nabla{f}(x, y,...) tells us the direction to maximize the value of ff at each given input.

Also, the magnitude, βˆ£βˆ£βˆ‡f(x0,y0,...)∣∣|| \nabla{f}(x_0, y_0,...) ||, gives us the steepness of this ascent.

Gradient and contour lines

Let’s look at this graph for the equation f(x,y)=x2+y2f(x,y) = x^2 + y ^2:

A 3D plot of x squared + y squared

If we squash it from a three-dimensional to a two-dimensional space, we will see these circles which represent the input space, x et y, while the third z value (which represents the output of f(x,y)f(x,y)) is denoted by the texts 1, 2, 3, 4, 7:

Contour map of x squared + y squared

An important feature of contour plots is that they can help us find the peaks and valleys of a function.

Contour map example

When the lines (at the bottom of the graph above) are far apart, the slope is not steep. Conversely, if the lines are close to each other, the slope is steep.

This is an important point when discussing the relationship between the gradient and a contour plot.

Contour maps

As you can see, each gradient vector is perpendicular to the contour line it touches. This behavior is observed because the shortest track between lines is a line perpendicular to each other.

This makes sense, because we’ve already defined the gradient as the direction and fastest rate of increase of a function ff.

Application of the gradient - constrained optimization problem

The objective of a constrained optimization problem is to maximize or minimize a multivariable function f(x,y,z,...)f(x,y,z,...).

A constrained function takes the form g(x,y,z,...)=cg(x,y,z,...) = c.

If I take a function f(x,y)f(x,y) and a constrained function g(x,y)g(x,y) such as :

f(x,y)=3x+2yg(x,y)=0.5x2+0.5y2=cc=2f(x,y) = 3x + 2y \\[6pt] g(x,y) = 0.5x^2 + 0.5y^2 = c \\[6pt] c = 2

Our goal is to find the point where f(x,y)f(x,y) and g(x,y)g(x,y) are tangential when c=g(x,y)=2c = g(x,y) = 2:


To do this, we can find the maximum values of xx and yy using our knowledge of the gradient.

We already know the fact that the gradient points in the direction that is perpendicular to the contour curves.

Graph with gradients

However, the magnitudes of βˆ‡f(x,y)\nabla f(x,y) and βˆ‡g(x,y)\nabla g(x,y) are not the same. We must therefore correct this difference with a scalar, Ξ»\lambda which is called the Lagrange multiplier:

βˆ‡f(x,y)=Ξ»βˆ‡g(x,y) \nabla f(x,y) = \lambda \nabla g(x,y)

We already know that to find the values βˆ‡f(x,y)\nabla f(x,y) and βˆ‡g(x,y)\nabla g(x,y), we must take the partial derivative of each with respect to x and y:

βˆ‡f(x,y)=[βˆ‚fβˆ‚x=3βˆ‚fβˆ‚y=2] \nabla f(x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} = 3 \\[6pt] \frac{\partial f}{\partial y} = 2 \\ \end{bmatrix}
βˆ‡g(x,y)=[βˆ‚fβˆ‚x=0.25xβˆ‚fβˆ‚y=0.25y] \nabla g(x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} = 0.25x \\[6pt] \frac{\partial f}{\partial y} = 0.25y \\ \end{bmatrix}

We can solve the values Ξ»,x,y\lambda, x, y by solving the following system of equations:

[32]=Ξ»p[0.25x0.25y] \begin{bmatrix} 3 \\[6pt] 2 \\ \end{bmatrix} = \lambda_p \begin{bmatrix} 0.25x \\[6pt] 0.25y \\ \end{bmatrix} \\[6pt]

Which gives us the equations:

g(x,y)=0.5x2+0.5y2=c=23=Ξ»p0.25x2=Ξ»p0.25y...x=12/Ξ»py=8/Ξ»p0.5(12Ξ»p)2+0.5(8Ξ»p)2=236+16=52=Ξ»p2Β±52=Ξ»pg(x,y) = 0.5x^2 + 0.5y^2 = c = 2 \\[6pt] 3 = \lambda_p 0.25x \\[6pt] 2 = \lambda_p 0.25y \\ ... \\ x = 12 / \lambda_p \\[6pt] y = 8 / \lambda_p \\[6pt] 0.5(\frac{12}{\lambda_p})^2 + 0.5(\frac{8}{\lambda_p})^2 = 2 \\[6pt] 36 + 16 = 52 = \lambda_p^2 \\[6pt] \pm \sqrt{52} = \lambda_p

By solving for xx and yy:

x=Β±1.66y=Β±1.11∴(x,y)=(1.66,βˆ’1.11),(1.66,1.11),(βˆ’1.66,1.11),(βˆ’1.66,βˆ’1.11)x = \pm 1.66 \\[6pt] y = \pm 1.11 \\[6pt] \therefore (x,y) = (1.66, -1.11), (1.66, 1.11), (-1.66, 1.11), (-1.66, -1.11)

Suppose we want to maximize f(x,y)f(x,y). These points would be tested as inputs to the function. The correct answer would be (1.66,1.11)(1.66, 1.11)

Final result

The Lagrangian

We can reformulate these functions by writing them in the form:

L(x,y,Ξ»)=f(x,y)+Ξ»g(x,y)\mathcal{L(x, y, \lambda)} = f(x, y) + \lambda g(x,y)

This function is called the Lagrangian.

Why should I have to write yet another new form of the function? πŸ€”

In practice, a computer often solves these optimization problems. This shape encapsulates them better. Without a computer, it would probably be easier to separate the functions as we have done before.

For now, I’ll let you do your own research on this subject! Here is a starting point


Comments for Gradients, Optimization, and the Lagrange Multiplier

Written by Carmen Cincotti, computer graphics enthusiast, language learner, and improv actor currently living in San Francisco, CA. Β Follow @CarmenCincotti


Interested in contributing to Carmen's Graphics Blog? Click here for details!