Camera Calibration and 3D Reconstruction

Source: opencv

In camera calibration I showed how to call the camera_calibration ROS package to calibrate a camera (a pinhole camera, in my case). Now I want to understand the mathematics behind it, which looks a lot like the mathematics used in camera backward projection.

In the pinhole camera model, a view of the scene is formed by projecting 3D points onto the image plane using a perspective transformation:

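$$s \cdot m^{'} = A [R \mid t] M^{'}$$

or, written out:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_{1} \\ r_{21} & r_{22} & r_{23} & t_{2} \\ r_{31} & r_{32} & r_{33} & t_{3} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$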

where:

$(X, Y, Z)$ are the coordinates of a 3D point in the world coordinate space
$(u, v)$ are the coordinates of the projection point in pixels
$A$ is a camera matrix, or a matrix of intrinsic parameters
$(c_{x}, c_{y})$ is a principal point that is usually at the image center
$f_{x}, f_{y}$ are the focal lengths expressed in pixel units.

Tip

If an image from the camera is scaled by a factor, all of these parameters ($f_{x}$, $f_{y}$, $c_{x}$ and $c_{y}$) must be scaled by the same factor. The intrinsic parameters do not depend on the scene viewed, however, so once estimated they can be reused as long as the focal length stays fixed.

Similar to camera backward projection, the joint rotation-translation matrix $[R \mid t]$ (the matrix of extrinsic parameters) describes the camera motion around a static scene (litter is usually stationary in the ocean, for example when it lies on the bottom). It translates the coordinates of a point $(X, Y, Z)$ to a coordinate system fixed with respect to the camera. When $z \neq 0$, the transformation above is equivalent to:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + t$$

$$x^{'} = \frac{x}{z}$$

$$y^{'} = \frac{y}{z}$$

$$u = f_{x} \cdot x^{'} + c_{x}$$

$$v = f_{y} \cdot y^{'} + c_{y}$$
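The chain of equations above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `project_point` and all numeric values (identity rotation, zero translation, a 640x480 image with its principal point at the center) are made up for the example and do not come from a real calibration.

```python
def project_point(point_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with the ideal pinhole model."""
    X, Y, Z = point_world
    # Camera-frame coordinates: [x, y, z]^T = R [X, Y, Z]^T + t
    x, y, z = (
        R[i][0] * X + R[i][1] * Y + R[i][2] * Z + t[i] for i in range(3)
    )
    if z == 0:
        raise ValueError("z = 0 in the camera frame: the projection is undefined")
    # Normalized image coordinates x' = x / z, y' = y / z
    x_n, y_n = x / z, y / z
    # Pixel coordinates u = fx * x' + cx, v = fy * y' + cy
    return fx * x_n + cx, fy * y_n + cy


# Hypothetical extrinsics and intrinsics, for illustration only.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
u, v = project_point((0.5, -0.25, 2.0), R, t, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)  # 445.0 177.5
```

A point on the optical axis, such as $(0, 0, 1)$ with these values, lands exactly on the principal point $(c_{x}, c_{y})$.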

As lenses usually present distortion, the above model is extended as:

$$x^{''} = x^{'} \frac{1 + k_{1} r^{2} + k_{2} r^{4} + k_{3} r^{6}}{1 + k_{4} r^{2} + k_{5} r^{4} + k_{6} r^{6}} + 2 p_{1} x^{'} y^{'} + p_{2} (r^{2} + 2 x^{'2})$$

$$y^{''} = y^{'} \frac{1 + k_{1} r^{2} + k_{2} r^{4} + k_{3} r^{6}}{1 + k_{4} r^{2} + k_{5} r^{4} + k_{6} r^{6}} + p_{1} (r^{2} + 2 y^{'2}) + 2 p_{2} x^{'} y^{'}$$

where $r^{2} = x^{'2} + y^{'2}$. The pixel coordinates are then computed from the distorted coordinates:

$$u = f_{x} \cdot x^{''} + c_{x}$$

$$v = f_{y} \cdot y^{''} + c_{y}$$

The distortion vector contains at least $(k_{1}, k_{2}, p_{1}, p_{2})$ and can be extended with $k_{3}$ and then $k_{4}, k_{5}, k_{6}$. Here $k_{1}, \dots, k_{6}$ are radial distortion coefficients and $p_{1}$ and $p_{2}$ are tangential distortion coefficients.
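To make the distortion model concrete, here is a minimal sketch that applies it to normalized coordinates $(x^{'}, y^{'})$. The function name `distort_normalized`, the argument layout and the coefficient value $k_{1} = -0.2$ are assumptions chosen for illustration, not values from a real camera.

```python
def distort_normalized(x_n, y_n, k=(0.0, 0.0, 0.0, 0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Apply the radial (k1..k6) and tangential (p1, p2) model to normalized coordinates."""
    k1, k2, k3, k4, k5, k6 = k
    p1, p2 = p
    r2 = x_n * x_n + y_n * y_n  # r^2 = x'^2 + y'^2
    # Rational radial factor shared by both coordinates
    radial = (1 + k1 * r2 + k2 * r2**2 + k3 * r2**3) / (
        1 + k4 * r2 + k5 * r2**2 + k6 * r2**3
    )
    x_d = x_n * radial + 2 * p1 * x_n * y_n + p2 * (r2 + 2 * x_n * x_n)
    y_d = y_n * radial + p1 * (r2 + 2 * y_n * y_n) + 2 * p2 * x_n * y_n
    return x_d, y_d


# With all coefficients at zero the model reduces to the ideal pinhole camera.
print(distort_normalized(0.25, -0.125))

# Hypothetical negative k1: points are pulled toward the image center.
x_d, y_d = distort_normalized(0.25, -0.125, k=(-0.2, 0.0, 0.0, 0.0, 0.0, 0.0))
u = 500.0 * x_d + 320.0  # u = fx * x'' + cx, with made-up fx and cx
v = 500.0 * y_d + 240.0  # v = fy * y'' + cy, with made-up fy and cy
print(u, v)
```

Keeping the distortion step separate from the pixel mapping mirrors the equations: the coefficients act on normalized coordinates, so they stay the same when $f_{x}, f_{y}, c_{x}, c_{y}$ change.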

Distortion coefficients are also intrinsic parameters

If a camera has been calibrated for images of $320 \times 240$ , the same distortion coefficients can be used for $640 \times 480$ images from the same camera while $f_{x}, f_{y}, c_{x}$ and $c_{y}$ need to be scaled appropriately.
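As a small sketch of that rescaling: the helper `scale_intrinsics` and the 320x240 calibration values below are hypothetical, and the simple multiplication ignores any half-pixel offset that a particular pixel-center convention might introduce.

```python
def scale_intrinsics(fx, fy, cx, cy, scale_x, scale_y=None):
    """Rescale focal lengths and principal point for a resized image.

    Distortion coefficients are deliberately not touched: they act on
    normalized coordinates and do not depend on the image resolution.
    """
    if scale_y is None:
        scale_y = scale_x  # uniform scaling
    return fx * scale_x, fy * scale_y, cx * scale_x, cy * scale_y


# Hypothetical calibration at 320x240, reused for 640x480 images (factor 2).
print(scale_intrinsics(250.0, 250.0, 160.0, 120.0, 2.0))
# (500.0, 500.0, 320.0, 240.0)
```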

I did something similar in my visual-odom implementation, but without using the camera intrinsic parameters explicitly: there I deduced a matrix of the form $[R \mid t]$ from affine transformations.

🚀 Costin Chitic
