Computer Vision
Camera Parametrization (OpenCV)
Cameras are most commonly described by their intrinsics \(K\) and extrinsics \(R, t\). In the OpenCV convention, \(K\) is expressed in pixel units, and \(R, t\) denote the world-to-camera transformation, so a world point \(x\) has camera coordinates \(Rx+t\).
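As a minimal sketch of this convention (the numeric values of `K`, `R`, `t` and the helper name `project` are made up for illustration), a world point is first moved into the camera frame and then mapped to pixels by \(K\):

```python
import numpy as np

# Hypothetical intrinsics: focal lengths and principal point in pixels.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical extrinsics (world-to-camera): no rotation, shifted along z.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

def project(K, R, t, x_world):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x_cam = R @ x_world + t      # world frame -> camera frame
    uvw = K @ x_cam              # camera frame -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]      # divide by depth

uv = project(K, R, t, np.array([0.5, -0.25, 2.0]))   # -> (420, 190)
center = -R.T @ t                # camera center in world coordinates
```

With these values the camera center \(-R^T t\) sits at \((0, 0, -2)\) in world coordinates.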
Camera Parametrization (Plücker)
Another commonly used parametrization is Plücker rays, where each pixel is associated with the ray shooting from the camera center \(p\) through the observed point. The ray at \((u,v)\) is parametrized by \(d_{u,v}\in\mathbb{R}^3\) and \(m_{u,v}\in\mathbb{R}^3\). Here, \(d_{u,v}\) has unit length and denotes the direction of the ray, and \(m_{u,v}=p\times d_{u,v}\) is the moment.
Note
One can show that \(p\) can be replaced by any point on the ray and \(m_{u,v}\) stays invariant. Moreover, \((d_{u,v}, m_{u,v})\) uniquely determines the line of the ray together with its direction.
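Both claims follow from a short calculation (a sketch, writing \(d=d_{u,v}\), \(m=m_{u,v}\), and \(s\) for an arbitrary scalar). Moving \(p\) along the ray does not change the moment:
\[
(p+s\,d)\times d = p\times d + s\,(d\times d) = m.
\]
Because \(\|d\|=1\), the point of the line closest to the origin can be read off from the pair itself:
\[
d\times m = d\times(p\times d) = p-(d\cdot p)\,d.
\]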
For each pixel \((u,v)\) of a camera with parameters \(K,R,t\), we can compute its Plücker ray parametrization as
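Under the OpenCV convention above, and assuming homogeneous pixel coordinates \(h_{u,v}=[u,v,1]^T\) with no half-pixel offset, one consistent way to write this is
\[
p=-R^Tt,\qquad d_{u,v}=\frac{R^TK^{-1}h_{u,v}}{\|R^TK^{-1}h_{u,v}\|},\qquad m_{u,v}=p\times d_{u,v}.
\]
The sketch below implements these three formulas; the function name `pixels_to_plucker` and all numeric values are hypothetical.

```python
import numpy as np

def pixels_to_plucker(K, R, t, uv):
    """Return unit directions d and moments m (both N x 3) for N pixels uv (N x 2)."""
    p = -R.T @ t                                   # camera center in the world frame
    h = np.column_stack([uv, np.ones(len(uv))])    # homogeneous pixels, N x 3
    d = h @ np.linalg.inv(K).T @ R                 # row i is (R^T K^{-1} h_i)^T
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # normalise to unit length
    m = np.cross(p, d)                             # moment p x d_i for every row
    return d, m

# Hypothetical camera: 800 px focal length, rotated 30 degrees about the y axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.1, -0.2, 1.5])
uv = np.array([[0.0, 0.0], [320.0, 240.0], [639.0, 479.0]])
d, m = pixels_to_plucker(K, R, t, uv)
```

A quick sanity check is that any point \(p+s\,d_{u,v}\) with \(s>0\) projects back to \((u,v)\).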
On the other hand, recovering \(K,R,t\) from Plücker rays takes more work. First, we solve for the camera center as the intersection of all rays (in the least-squares sense):
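Assuming unit-length directions, one way to write this problem and its normal equations is
\[
p=\arg\min_{p}\sum_{u,v}\|p\times d_{u,v}-m_{u,v}\|^2
\quad\Longleftrightarrow\quad
\Big(\sum_{u,v}\big(I-d_{u,v}d_{u,v}^T\big)\Big)\,p=\sum_{u,v}d_{u,v}\times m_{u,v}.
\]
The sketch below solves the \(3\times 3\) system on the right; the function name and the synthetic rays are hypothetical.

```python
import numpy as np

def camera_center_from_plucker(d, m):
    """Least-squares point p with p x d_i = m_i for all rays (d, m are N x 3)."""
    # For unit d, the normal equations reduce to sum(I - d d^T) p = sum(d x m).
    A = len(d) * np.eye(3) - d.T @ d
    b = np.cross(d, m).sum(axis=0)
    return np.linalg.solve(A, b)

# Synthetic check: rays through a known (hypothetical) center.
rng = np.random.default_rng(0)
p_true = np.array([0.3, -1.2, 2.0])
d = rng.normal(size=(50, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
m = np.cross(p_true, d)
p_est = camera_center_from_plucker(d, m)
```

The system is well posed as long as the rays are not all parallel.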
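After obtaining \(p\), we note that \(K, R\) satisfy (\(\sim\) denotes equality up to scaling)
\[
d_{u,v}\sim R^TK^{-1}[u,v,1]^T
\]
Or equivalently:
\[
KR\,d_{u,v}\sim[u,v,1]^T
\]
Now let us define \(P=KR\) (the rotational part of the overall projection matrix) and \(h_{u,v}=[u,v,1]^T\) (the homogeneous pixel coordinates of \((u,v)\)). Then, for some unknown scalar \(\alpha_{u,v}\), we have
\[
Pd_{u,v}=\alpha_{u,v}h_{u,v}
\]
Taking the cross product with \(h_{u,v}\), as in the direct linear transform (DLT), removes the unknown constant \(\alpha_{u,v}\):
\[
[h_{u,v}]_\times Pd_{u,v}=0
\]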
This is a linear equation in \(P\). To solve it, we need to rewrite the linear operator \(P\mapsto[h_{u,v}]_\times Pd_{u,v}\), stacked over all pixels, as a matrix-vector product \(B(P)=B\ \mathrm{Flatten}(P)\). Rather than forming \(B\) explicitly, we can use scipy.sparse.linalg.LinearOperator to represent the actions of \(B\) and \(B^T\) and pass the operator to scipy.sparse.linalg.svds, as follows:
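A minimal sketch of such an operator, assuming row-major flattening of \(P\) and directions and pixels stored as \(N\times 3\) arrays. The adjoint uses the identity \(y\cdot(h\times Pd)=(y\times h)^TPd\). The function name `solve_P` and the test camera are hypothetical, and `which="SM"` is one way to request the smallest singular value.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

def solve_P(d, h):
    """Estimate P (up to scale, det > 0) from directions d and pixels h (both N x 3)."""
    n = len(d)

    def matvec(x):
        # B x: stack h_i x (P d_i) over all rays, with P = reshape(x, 3 x 3).
        P = np.reshape(x, (3, 3))
        return np.cross(h, d @ P.T).ravel()

    def rmatvec(y):
        # B^T y: flatten(sum_i (y_i x h_i) d_i^T).
        Y = np.reshape(y, (n, 3))
        return (np.cross(Y, h).T @ d).ravel()

    B = LinearOperator((3 * n, 9), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
    _, _, vt = svds(B, k=1, which="SM")   # right singular vector, smallest singular value
    P = vt[0].reshape(3, 3)
    return P if np.linalg.det(P) > 0 else -P   # resolve the sign ambiguity

# Hypothetical camera used to generate consistent rays.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
uu, vv = np.meshgrid(np.linspace(0, 640, 8), np.linspace(0, 480, 6))
h = np.column_stack([uu.ravel(), vv.ravel(), np.ones(uu.size)])
d = h @ np.linalg.inv(K).T @ R
d /= np.linalg.norm(d, axis=1, keepdims=True)

P = solve_P(d, h)
P_true = K @ R   # P should match this up to a positive scale
```

Since \(B\) has only 9 columns, forming it densely and calling a full SVD would also work; the operator form simply avoids materializing a \(3N\times 9\) matrix.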
Finally, \(P\) is obtained as the right singular vector of \(B\) with the smallest singular value, reshaped to \(3\times 3\), with its sign chosen so that \(\det(P)>0\). We then perform an RQ decomposition (the naming is unfortunate in this case: our \(R\) is actually the \(Q\) factor, and our \(K\) is the \(R\) factor):
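Assuming the factorization meant here is
\[
P=\tilde K\tilde R,
\]
`scipy.linalg.rq` returns the two factors in exactly that order (upper-triangular first, orthogonal second). A minimal sketch with a hypothetical \(P\):

```python
import numpy as np
from scipy.linalg import rq

# Hypothetical P = K R, used only to exercise the decomposition.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
P = K @ R

# K_tilde is upper triangular, R_tilde is orthogonal, and P = K_tilde @ R_tilde.
K_tilde, R_tilde = rq(P)
```

The signs of the diagonal of `K_tilde` are not guaranteed to be positive at this point.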
Note
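It is important to choose \(P\) such that \(\det(P)>0\). Since \(\det(\tilde K)\det(\tilde R)=\det(P)>0\), the two factors have determinants of the same sign. Furthermore, a Householder-based RQ routine generically applies 2 reflections to a \(3\times 3\) matrix, which gives \(\det(\tilde R)>0\), in which case we also have \(\det(\tilde K)>0\).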
Here \(\tilde K\) is upper triangular and \(\tilde R\) is orthogonal. They are not yet the final intrinsics and extrinsics because of the scale and orientation ambiguity. Note the following (\(\tilde K_{12}\) may be non-zero but very small; we ignore it here):
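Writing \(\tilde r_i^T\) for the rows of \(\tilde R\) and dropping the skew entry \(\tilde K_{12}\), the product expands as follows (a reconstruction from the definitions above, where \(\tilde k_i\), the \(i\)-th column of \(\tilde K\), is a symbol introduced here for illustration):
\[
\tilde K\tilde R=
\begin{bmatrix}\tilde K_{11}&0&\tilde K_{13}\\0&\tilde K_{22}&\tilde K_{23}\\0&0&\tilde K_{33}\end{bmatrix}
\begin{bmatrix}\tilde r_1^T\\\tilde r_2^T\\\tilde r_3^T\end{bmatrix}
=\sum_{i=1}^{3}\tilde k_i\,\tilde r_i^T.
\]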
So reversing the sign of the \(i\)-th column of \(\tilde K\) has the same effect on the product as reversing the sign of the \(i\)-th row of \(\tilde R\), and doing both leaves \(\tilde K\tilde R\) unchanged. We can now correct the orientation using the following procedure:
Check if \(\tilde K_{11}<0\). If so, reverse the signs of the first column of \(\tilde K\) and \(\tilde r_1^T\).
Check if \(\tilde K_{22}<0\). If so, reverse the signs of the second column of \(\tilde K\) and \(\tilde r_2^T\).
Check if \(\tilde K_{33}<0\). If so, reverse the signs of the third column of \(\tilde K\) and \(\tilde r_3^T\).
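The three checks above can be applied at once with a vector of signs. This is a sketch; the function name `fix_signs` and the test matrices are hypothetical.

```python
import numpy as np

def fix_signs(K_tilde, R_tilde):
    """Make diag(K) positive by flipping column i of K and row i of R together."""
    s = np.where(np.diag(K_tilde) < 0, -1.0, 1.0)   # -1 where the diagonal is negative
    # Scaling columns of K and rows of R by the same signs keeps K @ R unchanged.
    return K_tilde * s[None, :], R_tilde * s[:, None]

# Hypothetical factors with the first and third signs flipped.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s_ = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, 0.0, s_], [0.0, 1.0, 0.0], [-s_, 0.0, c]])
S = np.diag([-1.0, 1.0, -1.0])
K_fix, R_fix = fix_signs(K @ S, S @ R)
```

Because column and row are always flipped in pairs, the product \(\tilde K\tilde R\) is the same before and after the correction.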
Assuming \(\tilde K\) and \(\tilde R\) now have rectified orientation, we finally set
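A consistent choice, assuming the usual normalization \(K_{33}=1\) and the camera-center relation \(p=-R^Tt\), is
\[
K=\tilde K/\tilde K_{33},\qquad R=\tilde R,\qquad t=-Rp.
\]
In code (the helper name `finalize` and the numeric values are hypothetical):

```python
import numpy as np

def finalize(K_tilde, R_tilde, p):
    """Fix the scale of K and recover t from the camera center p."""
    K = K_tilde / K_tilde[2, 2]   # normalise so that K[2, 2] == 1
    R = R_tilde
    t = -R @ p                    # follows from p = -R^T t
    return K, R, t

# Hypothetical inputs: a scaled K, a rotation about the y axis, and a center.
K_ref = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R_ref = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
p = np.array([0.3, -1.2, 2.0])
K_out, R_out, t_out = finalize(2.5 * K_ref, R_ref, p)
```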