3D Cameras | WebGPU

Let's take a closer look (ha!) into cameras in a 3D scene... how they work, how they're modeled, and some linear algebra.

Keywords: WebGPU, linear algebra, cameras, rendering pipeline, real-time rendering, transformations

By Carmen Cincotti  

This week I continued to learn more about the elements needed to create a 3D scene with WebGPU. I got to the point where I wanted to introduce the ability to move the camera. During its implementation, I quickly realized that the theory of a 3D camera is not trivial. That theory is a gateway to the knowledge needed to understand how to create and multiply the transformation matrices that stage our 3D objects.

⚠️ This article is a continuation of last week's discussion. I recommend rereading it to better understand the context of what we are about to see!

The Camera

Composition, staging and camera movements are the foundations of a story.

A camera

The camera is the gateway to our 3D scene. Without it, our scenes would look dull and flat, with no way to distinguish foreground from background elements, among a long list of other things.

3D cameras are not real in the sense that we cannot create one by calling a WebGPU function. To simulate a camera, we need to model it carefully using matrices that manipulate our 3D vertices. After applying a series of transformation matrices to the vertices, we can view our world through the camera lens.

A quick note on some assumptions about our coordinate system…it’s a right-handed coordinate system shown in the image below:

That being said, in world space we assume that:

  • the X axis points to the right.
  • the Y axis points up.
  • the Z axis points out of the screen, toward us.

But note the difference in assumption for camera space:

  • the X axis points to the right.
  • the Y axis points up.
  • the camera looks toward the scene (away from us), which is the negative Z direction. In other words, the direction the camera faces is the opposite of the +Z axis: we look down the negative z-axis, so everything in front of the camera has a negative z coordinate.

So, to get started, let’s first study the eye of a camera and how to create the matrix needed to transform the position of our vertices from world space to camera space.

The eye and the view matrix

The eye is the origin of our camera: we view the scene from this point in 3D space. Our goal is to transform our scene from world space to camera space (view space), which means expressing every coordinate relative to the camera lens using a transformation matrix called the view matrix. One way to build the view matrix is to construct a lookAt matrix, which encodes the position and the target of the camera through rotation and translation.

Camera looking at an object

The lookAt matrix

We can calculate the direction and orientation of our camera and build a matrix that transforms our coordinate system from world space to camera space. First, we need to find the camera's three axis vectors (forward, right and up) to build our rotation matrix.

Forward vector (z axis)

To calculate the forwardVector, the direction of our camera's z axis, we subtract the position vector of the target from the position vector of the camera's eye. Suppose the camera points at the point (0, 0, 0).

$$\text{forwardVector} = \text{normalize}(\text{cameraPosition} - \text{cameraTarget})$$

⚠️ NB. Remember to normalize the resulting vector: we only care about its direction, so it must be a unit vector.

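⚠️ NB2. The vector we just calculated points from the target back toward the eye, the opposite of the direction the camera actually looks. That is intentional: the camera looks down its negative z axis, so its positive z axis must point away from the target!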
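Here is a minimal TypeScript sketch of this step. The `Vec3` type and the `subtract`, `normalize` and `forwardVector` helpers are hypothetical names of my own (WebGPU itself ships no vector math), and the eye and target are the example values used in this article.

```typescript
// Hypothetical minimal vector type; WebGPU does not provide vector math helpers.
type Vec3 = [number, number, number];

function subtract(a: Vec3, b: Vec3): Vec3 {
  return [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
}

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  // A zero-length vector has no direction (e.g. when eye === target).
  if (len === 0) throw new Error("cannot normalize a zero-length vector");
  return [v[0] / len, v[1] / len, v[2] / len];
}

// The camera's +Z axis: it points from the target back toward the eye,
// the opposite of the direction the camera looks in.
function forwardVector(cameraPosition: Vec3, cameraTarget: Vec3): Vec3 {
  return normalize(subtract(cameraPosition, cameraTarget));
}

const forward = forwardVector([0, 4, 4], [0, 0, 0]);
console.log(forward); // roughly [0, 0.7071, 0.7071]
```

Throwing on a zero-length vector is a design choice for the sketch: silently returning NaNs would only surface much later as an empty screen.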

Right vector (x axis)

Using a little trick, we can find the rightVector by taking the cross product of an arbitrary upward-pointing vector with the forwardVector (in that order, since the cross product is not commutative). An easy choice for the upward vector is (0, 1, 0). The result is a right-pointing vector perpendicular to both, which will be our rightVector.

Recall that the cross product can be used to find a vector that’s perpendicular to two vectors in 3D space:

The cross product explanation

We can leverage this fact to choose a tempUpVector: it doesn't have to be the camera's real upVector, it only needs to lie within the plane formed by the real upVector and the forwardVector:

The cross product explanation with tempUpVector

Therefore, our calculation becomes:

$$
\begin{aligned}
\text{tempUpVector} &= \text{normalize}(\text{vec3}(0, 1, 0)) \\
\text{rightVector} &= \text{normalize}(\text{cross}(\text{tempUpVector}, \text{forwardVector}))
\end{aligned}
$$
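The same step as a TypeScript sketch. `cross`, `normalize` and `rightVector` are hypothetical helpers, and the forward vector is the one from our running example (camera at (0, 4, 4) looking at the origin). One caveat of the trick: if the camera looks straight up or down, forwardVector is parallel to (0, 1, 0), their cross product is the zero vector, and a different tempUpVector is needed.

```typescript
type Vec3 = [number, number, number];

// Standard right-handed cross product a × b.
function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  if (len === 0) throw new Error("cannot normalize a zero-length vector");
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Operand order matters: cross(tempUp, forward) points right,
// cross(forward, tempUp) would point left.
function rightVector(forward: Vec3, tempUp: Vec3 = [0, 1, 0]): Vec3 {
  // Throws (via normalize) when forward is parallel to tempUp.
  return normalize(cross(tempUp, forward));
}

const forward: Vec3 = [0, Math.SQRT1_2, Math.SQRT1_2];
const right = rightVector(forward);
console.log(right); // roughly [1, 0, 0]
```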

Up vector (the y axis)

Finally, we’ll use the same trick to calculate our camera's upVector. This time we take the cross product of the forwardVector and the rightVector (again, in that order).

$$\text{upVector} = \text{normalize}(\text{cross}(\text{forwardVector}, \text{rightVector}))$$
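A short TypeScript sketch of this last axis, again with hypothetical `cross`, `dot` and `normalize` helpers and the forward and right vectors from our running example. It also checks that the three axes come out mutually perpendicular, which is what makes them a valid rotation.

```typescript
type Vec3 = [number, number, number];

function cross(a: Vec3, b: Vec3): Vec3 {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(dot(v, v));
  if (len === 0) throw new Error("cannot normalize a zero-length vector");
  return [v[0] / len, v[1] / len, v[2] / len];
}

// forward and right are already perpendicular unit vectors, so the
// normalize call mostly guards against accumulated rounding error.
const forward: Vec3 = [0, Math.SQRT1_2, Math.SQRT1_2];
const right: Vec3 = [1, 0, 0];
const up = normalize(cross(forward, right));
console.log(up); // roughly [0, 0.7071, -0.7071]

// The camera axes should be mutually perpendicular (dot products ≈ 0).
console.log(dot(up, forward), dot(up, right));
```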

The translation vector

The calculation of the camera's translation vector is not well documented in the resources I have read. We need the dot product between each of the calculated axis vectors and the position of the eye. Be careful, because there are a ton of incorrect resources out there in the wild!

$$
\begin{aligned}
t_x &= \text{dot}(\text{positionOfCamera}, \text{rightVector}) \\
t_y &= \text{dot}(\text{positionOfCamera}, \text{upVector}) \\
t_z &= \text{dot}(\text{positionOfCamera}, \text{forwardVector})
\end{aligned}
$$

It is calculated this way because of the order of matrix operations. We are rotating first, and then translating. Therefore, it is necessary to calculate the translation of the camera in relation to its new orientation.
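In TypeScript, this step boils down to three dot products. `translationVector` is a hypothetical helper, and the camera axes passed in are the ones for our running example (eye at (0, 4, 4), target at the origin), computed with the formulas above.

```typescript
type Vec3 = [number, number, number];

function dot(a: Vec3, b: Vec3): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// The eye position expressed in the camera's rotated basis.
function translationVector(eye: Vec3, right: Vec3, up: Vec3, forward: Vec3): Vec3 {
  return [dot(eye, right), dot(eye, up), dot(eye, forward)];
}

// Camera axes for an eye at (0, 4, 4) looking at the origin.
const s = Math.SQRT1_2;
const t = translationVector([0, 4, 4], [1, 0, 0], [0, s, -s], [0, s, s]);
console.log(t); // roughly [0, 0, 5.657]
```

Note that these are the raw dot products; the sign flip that turns them into the view matrix's translation comes later, when the matrix is assembled.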

For example, let’s assume our camera starts at (0, 4, 4) in world space. It does not suffice to set the translation vector to (0, 4, 4) like this:

Wrongly translating the camera in a lookAt matrix

Instead, we need to apply the rotation first, then find the translation vector. Using our dot product calculations from above, that corresponds to the vector (0, 0, 4√2), or approximately (0, 0, 5.66):

Rightly translating the camera in a lookAt matrix
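Here is that arithmetic worked out step by step, reusing the formulas above with the camera at (0, 4, 4) and the target at (0, 0, 0):

$$
\begin{aligned}
F &= \text{normalize}((0, 4, 4) - (0, 0, 0)) = \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \\
R &= \text{normalize}(\text{cross}((0, 1, 0), F)) = (1, 0, 0) \\
U &= \text{normalize}(\text{cross}(F, R)) = \left(0, \tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) \\
t_x &= (0, 4, 4) \cdot R = 0 \\
t_y &= (0, 4, 4) \cdot U = \tfrac{4}{\sqrt{2}} - \tfrac{4}{\sqrt{2}} = 0 \\
t_z &= (0, 4, 4) \cdot F = \tfrac{8}{\sqrt{2}} = 4\sqrt{2} \approx 5.66
\end{aligned}
$$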

The final lookAt matrix

Our final matrix is as follows:

$$
\text{lookAt} = \begin{bmatrix}
R_x & R_y & R_z & t_x \\
U_x & U_y & U_z & t_y \\
F_x & F_y & F_z & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

where R = rightVector, U = upVector and F = forwardVector.

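However, in the code, the translation is inverted. Recall that a view matrix in fact moves the world so that the camera becomes the origin… be careful here, as this really screwed me up at work one day! The rotation part can stay as it is: its rows are the camera's axes, which already makes it the inverse (the transpose) of the camera's orientation. The translation, on the other hand, must be negated, because applying the view matrix to the eye itself has to give (0, 0, 0). Finally, code usually stores matrices in column-major order (WGSL does, and so does gl-matrix), so the viewMatrix below is written out in that order: each row shown here is one column of the matrix that multiplies our vertices.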

$$
\text{viewMatrix} = \begin{bmatrix}
R_x & U_x & F_x & 0 \\
R_y & U_y & F_y & 0 \\
R_z & U_z & F_z & 0 \\
-t_x & -t_y & -t_z & 1
\end{bmatrix}
$$
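Putting the pieces together, here is a sketch of the whole view matrix construction in TypeScript. `viewMatrix` and `transformPoint` are hypothetical helper names. The `Float32Array` is laid out column-major, the layout WGSL expects for a `mat4x4<f32>` in a uniform buffer, so each group of four values is one column, matching one row of the viewMatrix written above. A quick sanity check: the eye itself should land on the camera-space origin, and the target should end up straight ahead on the negative z axis.

```typescript
// Hypothetical helpers; WebGPU itself provides no matrix math.
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(dot(v, v));
  if (len === 0) throw new Error("cannot normalize a zero-length vector");
  return [v[0] / len, v[1] / len, v[2] / len];
}

// World-to-camera view matrix as a column-major Float32Array(16).
function viewMatrix(eye: Vec3, target: Vec3, tempUp: Vec3 = [0, 1, 0]): Float32Array {
  const f = normalize(sub(eye, target)); // camera +Z, points away from the target
  const r = normalize(cross(tempUp, f)); // camera +X
  const u = normalize(cross(f, r)); // camera +Y
  const t: Vec3 = [dot(eye, r), dot(eye, u), dot(eye, f)];
  // Each line is one column of the matrix.
  return new Float32Array([
    r[0], u[0], f[0], 0,
    r[1], u[1], f[1], 0,
    r[2], u[2], f[2], 0,
    -t[0], -t[1], -t[2], 1,
  ]);
}

// Multiplies a column-major 4x4 matrix by the point (x, y, z, 1).
function transformPoint(m: Float32Array, p: Vec3): Vec3 {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
  ];
}

const view = viewMatrix([0, 4, 4], [0, 0, 0]);
console.log(transformPoint(view, [0, 4, 4])); // the eye: approximately [0, 0, 0]
console.log(transformPoint(view, [0, 0, 0])); // the target: approximately [0, 0, -5.657]
```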


Using the view matrix built from our lookAt construction, we transform all coordinates from world space to camera space, where the camera looks down the negative z-axis.

After calculating the view matrix, the next matrix to calculate is the projection matrix. The two types of cameras I would like to focus on are perspective cameras and orthographic cameras.

The perspective camera

A perspective camera makes it possible to perceive the depth of the elements in a scene: thanks to the projection, distant objects appear smaller than near ones. This mimics how our eyes see. Imagine that we are looking at a forest: the closest trees seem bigger to us than the trees a few kilometers away.

Contrasting tree types coexist in a forest

By Wing-Chi Poon - self-made, at Gotier Trace Road, Bastrop State Park, Texas, USA. This area of Texas is known as the Lost Pines Forest. CC BY-SA 2.5.

Perspective camera model

Field of view angle in view frustum

Above is an illustration of the camera geometry, which is called the view frustum. Vertices outside the frustum are “clipped” (more on that later). To create our projection matrix, we need to define the few parameters that make up the perspective camera:

  • Near plane - the distance from the eye to the near clipping plane, measured along the negative z axis (in camera space).
  • Far plane - the distance from the eye to the far clipping plane, measured along the negative z axis (in camera space).
  • Field of view (fov_y) - the vertical angle between the top and bottom sides of the frustum.
  • Aspect ratio - the width of our window (canvas) divided by its height.
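Below is a rough TypeScript sketch of a perspective projection matrix built from these four parameters. It assumes a right-handed camera space looking down the negative z axis, WebGPU's [0, 1] depth range and column-major storage; `perspective` and `ndcDepth` are hypothetical helper names, not WebGPU API. The checks at the end confirm that the near plane maps to depth 0 and the far plane to depth 1.

```typescript
// Right-handed perspective projection mapping depth to WebGPU's [0, 1] range.
// fovY is in radians. Column-major Float32Array: each line is one column.
function perspective(fovY: number, aspect: number, near: number, far: number): Float32Array {
  const f = 1 / Math.tan(fovY / 2); // cot(fovY / 2)
  const rangeInv = 1 / (near - far);
  return new Float32Array([
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, far * rangeInv, -1,
    0, 0, near * far * rangeInv, 0,
  ]);
}

// Depth after the perspective divide (clip z / clip w) for a camera-space z.
function ndcDepth(m: Float32Array, zCamera: number): number {
  const clipZ = m[10] * zCamera + m[14];
  const clipW = m[11] * zCamera; // = -zCamera, the distance in front of the camera
  return clipZ / clipW;
}

const proj = perspective(Math.PI / 4, 16 / 9, 0.1, 100);
console.log(ndcDepth(proj, -0.1)); // ≈ 0 at the near plane
console.log(ndcDepth(proj, -100)); // ≈ 1 at the far plane
```

The -1 in the third column is what copies -z into w, so the GPU's divide by w is what makes distant objects smaller.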

I recommend checking out this resource to play around with the different settings and see how the view frustum works.

💡 Are the two clipping planes (‘near’ and ‘far’) arbitrarily defined?

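The placement of these two planes is responsible for preventing (or causing) something called z-fighting, which happens when two surfaces end up with depth values too close together for the depth buffer's limited floating point precision to tell apart. The GPU can then no longer consistently decide which surface is in front, and they flicker through each other.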

Let us take a contrived example to understand this phenomenon. I place two objects 50 and 55 units in front of the camera (remember that it looks down the negative z axis):

  • Object 1 in camera space @ (0, 0, -50)
  • Object 2 in camera space @ (0, 0, -55)

…and define the near and far planes of my viewing frustum as 0.1 and 100.

Once we transform coordinates into NDC (we’ll see what this means next week), our depth (z-buffer) values lie between 0 and 1. That is how WebGPU defines it; be sure to double check if you are using a different API (OpenGL, for example, uses -1 to 1 by default).

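With a WebGPU-style perspective projection, a point d units in front of the camera ends up with the depth (far / (far - near)) · (1 - near / d). This mapping is not linear: most of the [0, 1] range is spent close to the near plane. Therefore, my objects in NDC are now:

  • Object 1: (0, 0, ≈0.99900)
  • Object 2: (0, 0, ≈0.99918)

The two depths are close, but they still differ by about 0.0002, thousands of times more than the gap between neighboring 32-bit floats near 1.0, so no z-fighting will happen.

Now, let’s set the near plane to 1e-5 and the far plane to 1e6. Now in NDC, things are different:

  • Object 1 will be located at (0, 0, ≈0.99999980)
  • Object 2 will be located at (0, 0, ≈0.99999982)

As you can see, the two depths now differ by only about 0.00000002, less than the gap between neighboring 32-bit floats just below 1.0 (about 0.00000006). This is a problem: we’re demanding a lot of our computer’s flimsy ability to store and compare super precise floating point numbers (spoiler: it’s not that great), and both depths may well be rounded to the same stored value…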

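This will certainly cause z-fighting, so it is important to use reasonable values. Because of the nonlinear depth mapping, the near plane matters most: push it as far away from the eye as your scene allows.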
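To see this numerically, here is a small TypeScript sketch that evaluates the WebGPU-style perspective depth, (far / (far - near)) · (1 - near / d) for a point d units in front of the camera, for both pairs of planes, then rounds the result to 32 bits. It assumes a 32-bit float depth buffer (WebGPU's `depth32float` format; other depth formats store values differently), and `depthAt` and `storedDepth` are hypothetical helpers.

```typescript
// NDC depth in [0, 1] for a point `distance` units in front of the camera,
// using a right-handed perspective projection with WebGPU's depth range.
function depthAt(distance: number, near: number, far: number): number {
  return (far / (far - near)) * (1 - near / distance);
}

// Round to the nearest 32-bit float, as a depth32float buffer would store it.
function storedDepth(distance: number, near: number, far: number): number {
  return new Float32Array([depthAt(distance, near, far)])[0];
}

const planes: Array<[number, number]> = [[0.1, 100], [1e-5, 1e6]];
for (const [near, far] of planes) {
  const d1 = storedDepth(50, near, far);
  const d2 = storedDepth(55, near, far);
  console.log(`near=${near}, far=${far}: distinguishable = ${d1 !== d2}`);
}
```

With the first pair of planes the two stored depths stay distinct; with the second pair they collapse to the same 32-bit value, which is exactly the situation where the depth test can no longer tell the objects apart.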

The orthographic camera

With the orthographic camera, objects of equal size appear at the same scale no matter how far they are from the camera. Parallel lines stay parallel, unlike with our perspective camera. This makes it easier to judge relative sizes and to align models.

The difference between Perspective and Orthographic projection.

Just like the perspective camera model, we define parameters in order to configure it. We are responsible for defining the near and far planes, as well as the left, right, top and bottom planes that together form the box-shaped orthographic view volume.

Because I’ll be focusing more on perspective cameras in the near future, I won’t be going too deep into orthographic cameras for now…
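For completeness, here is only a rough TypeScript sketch of the corresponding matrix, under the same assumptions as the perspective sketch (right-handed camera space looking down -z, WebGPU's [0, 1] depth range, column-major layout). `orthographic` and `transformPoint` are hypothetical helper names. Notice that a point's x and y in NDC do not depend on its depth, which is why parallel lines stay parallel.

```typescript
type Vec3 = [number, number, number];

// Maps the box [left, right] x [bottom, top] x [-near, -far] (camera space)
// to NDC, with depth in [0, 1]. Column-major: each line is one column.
function orthographic(
  left: number, right: number, bottom: number, top: number, near: number, far: number,
): Float32Array {
  return new Float32Array([
    2 / (right - left), 0, 0, 0,
    0, 2 / (top - bottom), 0, 0,
    0, 0, 1 / (near - far), 0,
    -(right + left) / (right - left), -(top + bottom) / (top - bottom), near / (near - far), 1,
  ]);
}

// Multiplies a column-major 4x4 matrix by (x, y, z, 1); w stays 1 here.
function transformPoint(m: Float32Array, p: Vec3): Vec3 {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
  ];
}

const ortho = orthographic(-2, 2, -1, 1, 0.1, 100);
// Same x/y at two different depths: the projected x/y do not shrink with distance.
console.log(transformPoint(ortho, [1, 0.5, -1]));
console.log(transformPoint(ortho, [1, 0.5, -50]));
```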

Next time

In the following article, we will see the transformations of our coordinates in more detail.



Written by Carmen Cincotti, computer graphics enthusiast, language learner, and improv actor currently living in San Francisco, CA.  Follow @CarmenCincotti

