Tuesday 2 July 2013

Mapping Depth Data into Virtual Space

Mapping depth values from the Kinect sensor into a virtual space is straightforward, but a perspective correction factor needs to be taken into account; this post discusses how.  In the following, the official Windows Kinect SDK is used and all formulae relate to the specific values returned from that API (which may differ from those returned by unofficial SDKs).  Depth data is delivered as scanlines from bottom to top.

To convert Kinect data into 3D space where one unit is equal to 1 metre:

scale=depth*PERSPECTIVE_CORRECTION;

x=(i-(DEPTH_FRAME_WIDTH/2))*scale;
y=(j-(DEPTH_FRAME_HEIGHT/2))*scale;
z=-depth/1000;
Where:
  • depth is the millimetre depth value returned by the Kinect device within the depth map
  • PERSPECTIVE_CORRECTION is an empirically derived constant that converts from the camera’s perspective into an orthogonal view (essentially “undoing” the natural perspective view of the camera)
  • DEPTH_FRAME_WIDTH is the width dimension of the depth map (typically 320 or 640)
  • DEPTH_FRAME_HEIGHT is the height dimension of the depth map (typically 240 or 480)
  • i and j represent the ith pixel from the left and jth pixel from the bottom of the frame
Notes:
  • This formula translates the depth values onto the negative z-axis such that a value of zero is the camera position and -1.0 is 1 metre away.
  • A right-handed coordinate system is used.
  • The PERSPECTIVE_CORRECTION constant is fixed for a given depth map resolution and defined as 0.00000356 for a resolution of 320x240 and 0.00000178 for a resolution of 640x480
  • When doubling the width and height of the depth map, the constant is halved
  • A minimal code sketch of this mapping is given below
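
Putting the above together, a minimal sketch of the conversion might look like the following (C++; the struct and function names such as Point3 and depthToWorld are illustrative and not part of the Kinect SDK, and only the constants and formulae come from the description above):

// Minimal sketch of the corrected depth-to-world mapping described above.
// DEPTH_FRAME_WIDTH/HEIGHT and PERSPECTIVE_CORRECTION use the 640x480 values.
const int   DEPTH_FRAME_WIDTH      = 640;
const int   DEPTH_FRAME_HEIGHT     = 480;
const float PERSPECTIVE_CORRECTION = 0.00000178f;

struct Point3 { float x, y, z; };

// i = pixel from the left, j = pixel from the bottom, depth = millimetres.
Point3 depthToWorld(int i, int j, int depth)
{
    float scale = depth * PERSPECTIVE_CORRECTION;
    Point3 p;
    p.x = (i - (DEPTH_FRAME_WIDTH  / 2)) * scale;
    p.y = (j - (DEPTH_FRAME_HEIGHT / 2)) * scale;
    p.z = -depth / 1000.0f;   // metres along the negative z-axis
    return p;
}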



Perspective Correction

The camera’s perspective field of view needs to be factored out in order to obtain precise [x, y, z] coordinates that can be used to correlate different snapshots of the same scene taken from different angles, since the perspective varies with camera position.  Figure 1 illustrates the result of mapping depth values directly to fixed [x, y] coordinates without taking perspective into account.

Figure 1a) Mapping depth values to fixed [x, y] coordinates without perspective correction: view seen from the camera




Figure 1b) Mapping depth values to fixed [x, y] coordinates without perspective correction: view of the scene from above - note that the wall and shelves do not meet at right angles because the camera takes a perspective view

By including the perspective correction, real-world right angles remain right angles in the virtual space and distances are corrected to their absolute values, as illustrated in Figure 2.

Figure 2a) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view seen from the camera

Figure 2b) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view of the scene from above – note that the wall and shelves meet at right angles when the perspective correction constant is used and appear straight and well aligned

The perspective correction was determined by measuring objects in the real world and comparing them to the size of their virtual counterparts without correction.  This was correlated against distance from the camera, resulting in the derived constants.  The formulae for determining the initial fixed [x, y] positions are given below:

x=(i-(DEPTH_FRAME_WIDTH/2))*WORLD_SCALE;
y=(j-(DEPTH_FRAME_HEIGHT/2))*WORLD_SCALE;
z=-depth*WORLD_SCALE*DEPTH_SCALE;

WORLD_SCALE is 0.01 or 0.02 for the 640x480 and 320x240 depth map resolutions respectively, and DEPTH_SCALE is 0.1.  These values were selected empirically to offer a visually good representation of the real world when mapped into the virtual space.  A sketch of this uncorrected mapping follows.
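
For reference, the uncorrected mapping can be sketched in the same style as the earlier example (reusing the Point3 struct and frame-dimension constants from that sketch; again, the function name is illustrative and only the constants and formulae come from the text):

const float WORLD_SCALE = 0.01f;   // 640x480 resolution (0.02f for 320x240)
const float DEPTH_SCALE = 0.1f;

// Uncorrected mapping used as the starting point for calibration:
// fixed [x, y] positions regardless of depth, depth scaled onto the z-axis.
Point3 depthToWorldUncorrected(int i, int j, int depth)
{
    Point3 p;
    p.x = (i - (DEPTH_FRAME_WIDTH  / 2)) * WORLD_SCALE;
    p.y = (j - (DEPTH_FRAME_HEIGHT / 2)) * WORLD_SCALE;
    p.z = -depth * WORLD_SCALE * DEPTH_SCALE;
    return p;
}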

Using this mapping, a number of objects were placed in front of the camera and measured in both the real world and virtual space along their x- and y-axes to provide a scale factor mapping between the two spaces.  These values are given in Table 1, along with each object's distance from the camera.

Distance from Camera    Mean Scale Factor
810mm                   0.137
1380mm                  0.245
2630mm                  0.472
3750mm                  0.666

Table 1: Scale factors between real and virtual objects at a specific distance

Plotting the two columns of Table 1 against each other illustrates a linear correlation, as shown in Figure 3.
Figure 3: Plotting distance from camera against mean depth scale factor for perspective correction


The gradient of the line of best fit in Figure 3 gives the perspective correction value, calculated with respect to millimetre distances as in the original set of equations, and factoring in the DEPTH_SCALE and WORLD_SCALE constants from the second, uncorrected, set of equations.
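
As an illustrative check, a least-squares fit through the origin of the Table 1 values can be computed as below.  This is only a sketch: the exact fitting procedure is not given above, and the relation PERSPECTIVE_CORRECTION = gradient * WORLD_SCALE is an assumption, although it does reproduce the published 640x480 constant of 0.00000178.

#include <cstdio>

int main()
{
    // Table 1: distance from camera (mm) and mean scale factor.
    const double distance[] = { 810.0, 1380.0, 2630.0, 3750.0 };
    const double factor[]   = { 0.137, 0.245, 0.472, 0.666 };
    const double WORLD_SCALE = 0.01;   // 640x480 resolution

    // Least-squares gradient of a line through the origin: sum(d*s) / sum(d*d).
    double sumDS = 0.0, sumDD = 0.0;
    for (int k = 0; k < 4; ++k)
    {
        sumDS += distance[k] * factor[k];
        sumDD += distance[k] * distance[k];
    }
    const double gradient   = sumDS / sumDD;          // ~0.000178 per mm
    const double correction = gradient * WORLD_SCALE; // ~0.00000178

    printf("gradient   = %.8f per mm\n", gradient);
    printf("correction = %.8f\n", correction);
    return 0;
}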
