The Jacobian vs. the Hessian vs. the Gradient

What are the differences between the Jacobian, the Hessian and the Gradient? All three have ties to multivariable calculus. Let's dive in and take a look!

Keywords: multivariable calculus, artificial intelligence, optimization, physics

By Carmen Cincotti

When studying multivariable calculus, we often come across vectors and matrices that represent different concepts. Three that appear again and again are the Jacobian, the Hessian and the gradient.

[Figure: Jacobian vs Hessian]

These concepts are closely related, since all three are built from the derivatives of functions. However, each one has its own derivation and its own meaning.

What is the difference between the Jacobian, the Hessian and the gradient 🤔? We will discover it together in this article. During this mathematical journey, we will also look at the difference between vector-valued functions and scalar-valued functions.

The Gradient

We start with the gradient which we already talked about a few weeks ago. The gradient is a vector composed of the partial derivatives of a scalar-valued function, $f$:

$$ \nabla{f}(x, y, z) = \begin{bmatrix} \frac{\partial f}{\partial x} \\[6pt] \frac{\partial f}{\partial y} \\[6pt] \frac{\partial f}{\partial z} \end{bmatrix} $$

where $\nabla$ is called nabla. It is sometimes called "del". It denotes the vector differential operator.

Recall that the gradient points in the direction of the fastest increase of a function $f$ at a given point, and that its magnitude is the rate of that increase.

Let's imagine that we are at the foot of a mountain, and we want to climb it as quickly as possible. We see that the right track is to climb it in the direction of the arrow:

[Figure: Person climbing a mountain]

This arrow represents the gradient because it points in the direction of the fastest ascent up the mountain.

A scalar-valued function

A scalar-valued function is a function of $n$ variables that returns a single scalar value:

$$ f: \mathbb{R}^n \rightarrow \mathbb{R} $$

For example, if we evaluate $f(x,y) = x + y$ at the point $(2, 1)$, we get the scalar 3:

$$ f(2,1) = 2 + 1 = 3 $$

How to calculate the Gradient

Let's take an example. I have a function defined as $f(x,y) = 5x^2 + 3xy + 3y^3$. First, we need to find the partial derivatives with respect to the variables $x$ and $y$ as follows:

$$ \frac{\partial f}{\partial x} = 10x + 3y \\[6pt] \frac{\partial f}{\partial y} = 3x + 9y^2 $$

This gives us a gradient:

$$ \nabla{f} = \begin{bmatrix} 10x + 3y \\[6pt] 3x + 9y^2 \end{bmatrix} $$
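To double-check a hand-derived gradient like this one, we can compare it against a numerical estimate. The following is a minimal sketch in plain Python, assuming a central finite-difference approximation; the helper names, the step size `h` and the test point are choices made for this illustration and do not come from the article.

```python
def f(x, y):
    """The example function f(x, y) = 5x^2 + 3xy + 3y^3."""
    return 5 * x**2 + 3 * x * y + 3 * y**3


def analytic_gradient(x, y):
    """Gradient derived by hand: [10x + 3y, 3x + 9y^2]."""
    return [10 * x + 3 * y, 3 * x + 9 * y**2]


def numerical_gradient(func, x, y, h=1e-5):
    """Approximate each partial derivative with a central difference."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return [df_dx, df_dy]


# Arbitrary test point, chosen only for illustration.
point = (1.0, 2.0)
print(analytic_gradient(*point))      # [16.0, 39.0]
print(numerical_gradient(f, *point))  # close to the values above
```

The two results should agree up to a small numerical error, which is a quick way to catch a sign or coefficient mistake in the partial derivatives.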

The Hessian

If we follow the explanation of the gradient, the Hessian will also be easy to understand. It is the matrix of second-order partial derivatives of a scalar-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, obtained by differentiating each component of the gradient once more. For instance:

$$ \mathbf{H}f(x, y) = \nabla{}^2f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\[6pt] f_{yx} & f_{yy} \end{bmatrix} $$

Using again the function $f(x,y) = 5x^2 + 3xy + 3y^3$, we will see a Hessian matrix as follows:

$$ \mathbf{H}f(x, y) = \nabla{}^2f(x, y) = \begin{bmatrix} 10 & 3 \\[6pt] 3 & 18y \end{bmatrix} $$
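As a sanity check, the second-order partial derivatives can also be estimated numerically. This is only a sketch, assuming central finite differences in plain Python; the function names, the step size `h` and the evaluation point are hypothetical choices for this example.

```python
def f(x, y):
    """The example function f(x, y) = 5x^2 + 3xy + 3y^3."""
    return 5 * x**2 + 3 * x * y + 3 * y**3


def analytic_hessian(x, y):
    """Hessian derived by hand: [[10, 3], [3, 18y]]."""
    return [[10.0, 3.0], [3.0, 18.0 * y]]


def numerical_hessian(func, x, y, h=1e-4):
    """Approximate the second-order partials with central differences."""
    f_xx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    f_yy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    f_xy = (func(x + h, y + h) - func(x + h, y - h)
            - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    # For a smooth function like this polynomial, f_yx equals f_xy,
    # so the same estimate is used for both off-diagonal entries.
    return [[f_xx, f_xy], [f_xy, f_yy]]


# Arbitrary test point, chosen only for illustration.
print(analytic_hessian(1.0, 2.0))      # [[10.0, 3.0], [3.0, 36.0]]
print(numerical_hessian(f, 1.0, 2.0))  # close to the values above
```

Notice that the matrix is symmetric: the mixed partial derivatives $f_{xy}$ and $f_{yx}$ are both equal to 3.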

Some applications of the Hessian matrix are the following:

The Hessian determinant

The Hessian determinant plays a role in finding the local maxima/minima of a multivariable function. We will see what a determinant is in a later article. For now, I'll just explain how to calculate it. I present to you once again the Hessian matrix that we have already calculated in the last part:

$$ \mathbf{H}f(x, y) = \nabla{}^2f(x, y) = \begin{bmatrix} 10 & 3 \\[6pt] 3 & 18y \end{bmatrix} $$

In order to find the determinant of a 2×2 matrix, we use the formula below:

$$ \det\begin{pmatrix}\begin{bmatrix} a & b \\[6pt] c & d \end{bmatrix}\end{pmatrix} = ad-bc $$

Therefore,

$$ \det\begin{pmatrix}\begin{bmatrix} 10 & 3 \\[6pt] 3 & 18y \end{bmatrix}\end{pmatrix} = 180y - 9 $$
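The same calculation can be written as a few lines of code. This is a small sketch in plain Python, with function names (`det2`, `hessian`, `hessian_determinant`) invented for this example, that applies the 2×2 formula to the Hessian above and compares it with 180y - 9.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def hessian(x, y):
    """Hessian of f(x, y) = 5x^2 + 3xy + 3y^3; its entries do not depend on x."""
    return [[10, 3], [3, 18 * y]]


def hessian_determinant(x, y):
    return det2(hessian(x, y))


# Compare against the closed form 180y - 9 at a few sample values of y.
for y in (-1, 0, 1):
    print(y, hessian_determinant(0, y), 180 * y - 9)
```

For example, at y = 1 the determinant is 171, which matches 180(1) - 9.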

The Jacobian

The Jacobian is a matrix that holds all first-order partial derivatives of a vector-valued function:

$$ f: \mathbb{R}^n \rightarrow \mathbb{R}^m $$

The Jacobian of a vector-valued function $h(f(x,y), g(x,y))$, whose components are the scalar-valued functions $f$ and $g$, is therefore the following:

$$ \mathbf{J}h(f(x,y), g(x,y)) = \begin{bmatrix} f_x & f_y \\[6pt] g_x & g_y \end{bmatrix} $$

For example, if we have a vector-valued function like:

$$ f(x,y) = \begin{bmatrix} \sin(x) + y \\[6pt] x + \cos(y) \end{bmatrix} $$

The Jacobian matrix will be:

$$ J(f) = \begin{bmatrix} \cos(x) & 1 \\[6pt] 1 & -\sin(y) \end{bmatrix} $$
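Here is a hedged sketch, in plain Python, that compares this hand-derived Jacobian with a finite-difference estimate. The helper names, the step size `h` and the evaluation point are assumptions made for this illustration.

```python
import math


def f(x, y):
    """The example vector-valued function [sin(x) + y, x + cos(y)]."""
    return [math.sin(x) + y, x + math.cos(y)]


def analytic_jacobian(x, y):
    """Jacobian derived by hand: [[cos(x), 1], [1, -sin(y)]]."""
    return [[math.cos(x), 1.0], [1.0, -math.sin(y)]]


def numerical_jacobian(func, x, y, h=1e-6):
    """Estimate the Jacobian column by column with central differences."""
    fx_plus, fx_minus = func(x + h, y), func(x - h, y)
    fy_plus, fy_minus = func(x, y + h), func(x, y - h)
    # Row i holds the partial derivatives of the i-th output component.
    return [
        [(fx_plus[i] - fx_minus[i]) / (2 * h),
         (fy_plus[i] - fy_minus[i]) / (2 * h)]
        for i in range(len(fx_plus))
    ]


# Arbitrary test point, chosen only for illustration.
print(analytic_jacobian(0.5, 1.0))
print(numerical_jacobian(f, 0.5, 1.0))
```

Each row of the matrix is simply the gradient of one component of the function, which is a handy way to remember how the Jacobian is laid out.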

By using the Jacobian, we can locally linearize a vector-valued function at a specific point.

Linear functions are simple enough that we can understand them well (using linear algebra), and often understanding the local linear approximation of $f(x,y)$ allows us to draw conclusions about $f$ itself.
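To make the idea of local linearization concrete, here is a minimal sketch in plain Python. It assumes the usual first-order approximation L(q) = f(p) + J(p)(q - p), applied to the example function [sin(x) + y, x + cos(y)]; the names `linearize` and `linearization_error`, as well as the point `p`, are made up for this example.

```python
import math


def f(x, y):
    """Example vector-valued function [sin(x) + y, x + cos(y)]."""
    return [math.sin(x) + y, x + math.cos(y)]


def jacobian(x, y):
    """Its Jacobian: [[cos(x), 1], [1, -sin(y)]]."""
    return [[math.cos(x), 1.0], [1.0, -math.sin(y)]]


def linearize(p):
    """Return the linear approximation L(q) = f(p) + J(p)(q - p) around p."""
    f_p = f(*p)
    j_p = jacobian(*p)

    def approx(q):
        dx, dy = q[0] - p[0], q[1] - p[1]
        return [
            f_p[0] + j_p[0][0] * dx + j_p[0][1] * dy,
            f_p[1] + j_p[1][0] * dx + j_p[1][1] * dy,
        ]

    return approx


def linearization_error(p, step):
    """Largest component-wise gap between f and its linearization at p + step."""
    q = (p[0] + step, p[1] + step)
    exact, estimate = f(*q), linearize(p)(q)
    return max(abs(e - a) for e, a in zip(exact, estimate))


p = (0.5, 1.0)  # arbitrary point, chosen only for illustration
for step in (0.1, 0.01):
    print(step, linearization_error(p, step))
```

The error shrinks much faster than the step itself, which is exactly why the linear approximation is trustworthy close to the point where the Jacobian was evaluated.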


The determinant of the Jacobian matrix

When a vector-valued transformation has as many outputs as inputs, its Jacobian matrix is square, so we can compute its determinant. To do this with a transformation $f: \mathbb{R}^3 \rightarrow \mathbb{R}^3$, we use the following formula:

$$ \det\begin{pmatrix}\begin{bmatrix} a_1 & a_2 & a_3 \\[6pt] b_1 & b_2 & b_3 \\[6pt] c_1 & c_2 & c_3 \end{bmatrix}\end{pmatrix} = a_1 \det\begin{pmatrix}\begin{bmatrix} b_2 & b_3 \\[6pt] c_2 & c_3 \end{bmatrix}\end{pmatrix} - a_2 \det\begin{pmatrix}\begin{bmatrix} b_1 & b_3 \\[6pt] c_1 & c_3 \end{bmatrix}\end{pmatrix} + a_3 \det\begin{pmatrix}\begin{bmatrix} b_1 & b_2 \\[6pt] c_1 & c_2 \end{bmatrix}\end{pmatrix} $$

Consider this example using this Jacobian matrix:

$$ J(f)=\begin{bmatrix} 2xy & x^2 & 0 \\[6pt] 0 & -1 & 1 \\[6pt] 1 & 0 & 1 \end{bmatrix} $$

We can calculate the determinant as follows:

$$ \det\begin{pmatrix}\begin{bmatrix} 2xy & x^2 & 0 \\[6pt] 0 & -1 & 1 \\[6pt] 1 & 0 & 1 \end{bmatrix}\end{pmatrix} = 2xy \det\begin{pmatrix}\begin{bmatrix} -1 & 1 \\[6pt] 0 & 1 \end{bmatrix}\end{pmatrix} - x^2 \det\begin{pmatrix}\begin{bmatrix} 0 & 1 \\[6pt] 1 & 1 \end{bmatrix}\end{pmatrix} + 0 \det\begin{pmatrix}\begin{bmatrix} 0 & -1 \\[6pt] 1 & 0 \end{bmatrix}\end{pmatrix} $$

$$ = 2xy(-1(1) - (1(0))) - x^2(0(1) - 1(1)) + 0(0(0) - (-1(1))) \\[6pt] = 2xy(-1) - x^2(-1) + 0 \\[6pt] = -2xy + x^2 $$

What does the determinant mean? 🤔 The function $-2xy + x^2$ gives us the factor by which space contracts or expands under the transformation around a point $(x, y, z)$.

For example, if we evaluate it at the point $(1, 0, 2)$, we will see:

$$ = -2(1)(0) + 1^2 \\[6pt] = 0 + 1 \\[6pt] = 1 $$

This tells us that volumes are preserved around this point: space neither expands nor contracts there! However, around the point $(3, 1, 0)$, we see another story:

$$ = -2(3)(1) + 3^2 \\[6pt] = -6 + 9 \\[6pt] = 3 $$

Here, space expands by a factor of 3 around the point.
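The two evaluations above can be reproduced with a short sketch in plain Python. It implements the first-row cofactor expansion shown earlier and applies it to the example Jacobian; the function names `det2`, `det3` and `jacobian` are assumptions made for this illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    return (a1 * det2([[b2, b3], [c2, c3]])
            - a2 * det2([[b1, b3], [c1, c3]])
            + a3 * det2([[b1, b2], [c1, c2]]))


def jacobian(x, y, z):
    """The example Jacobian matrix; note that z does not appear in any entry."""
    return [[2 * x * y, x**2, 0],
            [0, -1, 1],
            [1, 0, 1]]


for point in [(1, 0, 2), (3, 1, 0)]:
    x, y, z = point
    # Both values should match the closed form -2xy + x^2.
    print(point, det3(jacobian(x, y, z)), -2 * x * y + x**2)
```

Running it prints a determinant of 1 for the point (1, 0, 2) and 3 for the point (3, 1, 0), in agreement with the hand calculation.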

I recommend this resource in order to visualize the transformations and the determinant for each one!


Written by Carmen Cincotti, computer graphics enthusiast, language learner, and improv actor currently living in San Francisco, CA.
