Computer Vision

Camera Parametrization (OpenCV)

Cameras are mostly commonly denoted by their intrinsics \(K\) and extrinsics \(R, t\). In the OpenCV convention, \(K\) is in pixel units, and \(R,t\) denote world-to-camera transformation.

Camera Parametrization (Plücker)

Another commonly used parametrization is called Plücker rays, where the pixel is associated with the ray shooting from the camera center \(p\) to the observed point. The ray at \((u,v)\) is parametrized by \(d_{u,v}\in\mathbb{R}^3\) and \(m_{u,v}\in\mathbb{R}^3\). Here, \(d_{u,v}\) has unit length and denotes the direction of the ray, and \(m_{u,v}=p\times d_{u,v}\) is the momentum.

Note

One can show that \(p\) can be replaced by any point on the ray and \(m_{u,v}\) stays in variant. Moreover, \((d_{u,v}, m_{u,v})\) uniquely determines a ray with direction.

For each pixel \((u,v)\) of a camera with parameters \(K,R,t\), we can compute its Plücker ray parametrization as

\[\begin{split}p & = -R^Tt\\ \tilde d_{u,v} & = R^TK^{-1}\left[\begin{matrix}u\\v\\1\end{matrix}\right]\\ d_{u,v} & = \frac{\tilde d_{u,v}}{\|\tilde d_{u,v}\|}\\ m_{u,v} & = p\times d_{u,v}\end{split}\]

On the other hand, to recover \(K,R,t\) from Plücker rays takes some more work. First, we solve the camera center as the intersection of all rays (in the least square sense):

\[p = \mathop{\mathrm{argmin}}_{p}\sum_{u,v}\|p\times d_{u,v}-m_{u,v}\|^2\]

After obtaining \(p\), we note that \(K, R\) satisfy (\(\sim\) denotes equal up to scaling)

\[\begin{split}Rd_{u,v}\sim K^{-1}\left[\begin{matrix}u\\v\\1\end{matrix}\right]\end{split}\]

Or equivalently:

\[\begin{split}\exists\alpha_{u,v}, (KR)d_{u,v}=\alpha_{u,v}\left[\begin{matrix}u\\v\\1\end{matrix}\right]\end{split}\]

Now let us define \(P=KR\) (the rotational of part of the overall projection matrix), \(h_{u,v}=[u,v,1]^T\) (the homogeneous pixel coordinates of \((u,v)\)). Then we have

\[\exists\alpha_{u,v}, Pd_{u,v}=\alpha_{u,v}h_{u,v}\]

Using DLT removes the unknown constant \(\alpha_{u,v}\)

\[0=h_{u,v}\times (Pd_{u,v})=[h_{u,v}]_\times Pd_{u,v}\]

This is a linear equation on \(P\), to solve it, we need to rewrite the linear operator \([h_{u,v}]_\times Pd_{u,v}\) as a matrix-vector product \(B(P)=B\ \mathrm{Flatten}(P)\). However, using scipy.sparse.linalg.svds and scipy.sparse.linalg.LinearOperator we can use an operator to represent \(B\) and \(B^T\), as follows:

\[\begin{split} B(P) & = [h_{u,v}]_\times Pd_{u,v} \in\mathbb{R}^3,&\ \textrm{for}\ P\in\mathbb{R}^{3\times3}\\ \iff B(P)_i & = \sum_{j,k}([h_{u,v}]_\times)_{ij}(d_{u,v})_{k}P_{jk}\\ \iff B^T(r)_{jk} & = \sum_{i}([h_{u,v}]_\times)_{ij}r_i\\ & = \sum_{i}([h_{u,v}]_\times)_{ij}r_i(d_{u,v})_{k}\\ & = ([h_{u,v}]_\times^Tr)_j(d_{u,v})_{k}\\ & = (-h_{u,v}\times r)_j(d_{u,v})_{k}\\ \iff B^T(r) & = (-h_{u,v}\times r)(d_{u,v})^T \in\mathbb{R}^{3\times3},&\ \textrm{for}\ r\in\mathbb{R}^3\end{split}\]

Finally, \(P\) is solved as the singular vector of \(B\) with the smallest singular value, with \(\det(P)>0\). And then we do an RQ decomposition (bad naming in this case, our \(R\) is actually the \(Q\), and our \(K\) is the \(R\)):

\[P=\tilde K\tilde R\]

Note

It is important that you choose \(P\) to have \(\det(P)>0\). Furthermore, using the arpack solver will ensure \(\det(\tilde R)>0\) because it uses 2 Householder reflections, in which case we also have \(\det(\tilde K)>0\).

Here \(\tilde K\) is upper triangular and \(\tilde R\) is orthogonal. Note that they are not the final intrinsics and extrinsics yet due to scale and orientation ambiguity. Note the following (\(\tilde K_{12}\) may be a non-zero but very small number, we ignore it here):

\[\begin{split}\tilde K\tilde R = \left[\begin{matrix}\tilde K_{11} & 0 & \tilde K_{13}\\0 & \tilde K_{22} & \tilde K_{23}\\0 & 0 & \tilde K_{33}\end{matrix}\right]\left[\begin{matrix}\tilde r_1^T\\\tilde r_2^T\\\tilde r_3^T\end{matrix}\right]=\left[\begin{matrix}\tilde K_{11}\tilde r_1^T+\tilde K_{13}\tilde r_3^T\\\tilde K_{22}\tilde r_2^T+\tilde K_{23}\tilde r_3^T\\\tilde K_{33}\tilde r_3^T\end{matrix}\right]\end{split}\]

So, reversing a sign for some column in \(\tilde K\) is equivalent to reversing a sign for some row in \(\tilde R\). We can now correct the orientation using the following procedure:

Check if \(\tilde K_{11}<0\). If so, reverse the signs of the first column of \(\tilde K\) and \(\tilde r_1^T\).
Check if \(\tilde K_{22}<0\). If so, reverse the signs of the second column of \(\tilde K\) and \(\tilde r_2^T\).
Check if \(\tilde K_{33}<0\). If so, reverse the signs of the third column of \(\tilde K\) and \(\tilde r_3^T\).

Assuming \(\tilde K\) and \(\tilde R\) now have rectified orientation, we finally set

\[K=\tilde K/\tilde K_{22},\quad R=\tilde R,\quad t=-Rp.\]