While spatial trackers are the fundamental input devices for 3D UIs, they are usually not sufficient on their own. As noted above, most handheld trackers include other sorts of input, because it’s difficult to map all interface actions to the position, orientation, or motion of the tracker. For example, confirming a selection requires a discrete event or command, and a button is much more appropriate for this than a hand motion. The Intersense IS900 wand is typical of such handheld trackers: it combines four standard buttons, a “trigger” button, and a 2-DOF analog joystick in a handheld form factor. The Kinect, because of its “controller-less” design, suffers from the lack of discrete inputs such as buttons.
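The split between continuous tracking and discrete confirmation can be sketched as follows. This is a minimal illustration under stated assumptions: the `WandSample` type and its fields are hypothetical stand-ins for whatever a real tracker SDK reports each frame, and a selection is confirmed only on the frame in which the trigger changes from released to pressed (a rising edge), so holding the button down does not confirm repeatedly.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WandSample:
    """One polled frame from a hypothetical handheld tracker (not a real SDK type)."""
    position: Tuple[float, float, float]  # continuous 3-DOF position used for pointing
    trigger: bool                          # discrete button state: True while held down


def confirmed_frames(samples: Sequence[WandSample]) -> List[int]:
    """Return the indices of frames where the trigger goes from released to pressed.

    The continuous pose drives pointing; the rising edge of the button is the
    discrete confirmation event, so a long press confirms only once.
    """
    events: List[int] = []
    previously_pressed = False
    for i, sample in enumerate(samples):
        if sample.trigger and not previously_pressed:
            events.append(i)
        previously_pressed = sample.trigger
    return events
```

Reacting to the press edge rather than to the held state is one common design choice; an interface could equally confirm on release.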

A violation of the integral/separable DOF principle, for example, would be to use a 6-DOF tracker to simultaneously control the 3D position of an object and the volume of an audio clip, since the user cannot integrate those two tasks. Similarly, there are often problems with the mapping of input DOFs to actions. When a high-DOF input is used for a task that requires fewer DOFs, the task can become unnecessarily difficult. For example, selecting a menu item is inherently a one-dimensional task.
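One way to avoid this mismatch is to have the interface discard the DOFs the task does not need. The sketch below assumes a vertical menu of equal-height items and uses only the vertical coordinate of a tracked 3-DOF hand position; the function name, its parameters, and the menu layout are hypothetical.

```python
from typing import Sequence, Tuple


def select_menu_item(hand: Tuple[float, float, float],
                     menu_top_y: float,
                     item_height: float,
                     items: Sequence[str]) -> str:
    """Pick a menu item using only the vertical component of a 3-DOF hand position.

    The x and z coordinates are deliberately ignored, so the user controls a
    single DOF instead of having to place the hand inside a 3D item volume.
    """
    _, y, _ = hand
    # Distance below the menu's top edge, measured in whole item heights.
    index = int((menu_top_y - y) // item_height)
    # Clamp so positions above or below the menu select the first or last item.
    index = max(0, min(len(items) - 1, index))
    return items[index]
```

Clamping rather than returning "no selection" is a further assumption; a real menu might instead ignore hand positions outside its extent.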

For tasks where a purely natural technique is impractical, the question is whether to use an enhanced natural metaphor or a less natural one. For example, in the real world I cannot pick up objects that are beyond arm’s reach, but in the virtual world I can. Should I pick up such an object by stretching my virtual arm beyond its physical reach, an enhanced natural metaphor, or by pointing to it using a laser pointer metaphor, as in the HOMER technique (Bowman & Hodges, 1997)? In this case, the less natural laser pointer metaphor is more effective in terms of user performance, but enhanced natural metaphors are easy to learn and highly usable in many situations. A separate source of problems is the misuse of integral and separable DOFs.
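The laser pointer metaphor is commonly implemented as ray casting: a ray leaves the tracked hand and the nearest object it hits is selected. The sketch below assumes objects are approximated by bounding spheres and that the ray direction is a unit vector; it covers only the selection step, not the manipulation that HOMER performs after an object is grabbed. The function and helper names are illustrative.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_pick(origin: Vec3, direction: Vec3,
             spheres: Dict[str, Tuple[Vec3, float]]) -> Optional[str]:
    """Return the name of the nearest sphere hit by the ray, or None.

    Solves |origin + t*direction - center|^2 = radius^2 for t >= 0.
    `direction` must be a unit vector; each object is a (center, radius)
    bounding sphere, an approximation chosen for brevity.
    """
    best_name, best_t = None, math.inf
    for name, (center, radius) in spheres.items():
        oc = _sub(origin, center)
        b = _dot(oc, direction)            # half of the quadratic's linear coefficient
        c = _dot(oc, oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            continue                        # the ray misses this sphere
        root = math.sqrt(disc)
        t = -b - root                       # nearer intersection
        if t < 0:
            t = -b + root                   # origin may be inside the sphere
        if 0 <= t < best_t:
            best_name, best_t = name, t
    return best_name
```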

For 3D interaction, spatial trackers are most often used inside handheld devices. These devices typically include other inputs such as buttons, joysticks, or trackballs, making them something like a “3D mouse.” Like desktop mice, these can then be used for pointing, manipulating objects, selecting menu items, and the like. Trackers are also used to measure the user’s head position and orientation.
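Using a handheld tracker as a “3D mouse” for pointing usually means turning its reported orientation into a world-space direction. This sketch assumes the orientation arrives as a unit quaternion ordered (w, x, y, z) and that the device points along its local -Z axis; both conventions vary between trackers and are assumptions here, as are the function names.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q: v' = v + w*t + u x t, with t = 2(u x v)."""
    w, x, y, z = q
    u = (x, y, z)
    t = tuple(2.0 * c for c in _cross(u, v))
    ut = _cross(u, t)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))


def pointing_direction(orientation: Quat,
                       local_forward: Vec3 = (0.0, 0.0, -1.0)) -> Vec3:
    """World-space direction in which the handheld device points.

    `local_forward` is the device's pointing axis in its own frame; -Z is an
    assumed convention, so a real tracker's documentation should be checked.
    """
    return rotate(orientation, local_forward)
```

Combined with the tracker's position, this direction defines the pointing ray that a selection technique can cast into the scene.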

Requiring users to position their virtual hands within a menu item to select it (a 3-DOF input for an inherently one-dimensional task) makes the interface demand too much effort. Display characteristics can also affect 3D interaction performance.

Gloves are another type of input device that is frequently combined with spatial trackers. Pinch gloves detect contacts between the fingers, while data gloves and finger trackers measure the joint angles of the fingers. Generalizing this idea of combining devices with trackers, we can see that almost any sort of input device can be made into a spatial input device by tracking it. Usually this requires adding some hardware to the device, such as optical tracking markers. This extends the capability and expressiveness of the tracker and allows the input from the device to be interpreted differently depending on its position and orientation.
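As an illustration of orientation-dependent interpretation, the sketch below maps a single button to different commands depending on how steeply a tracked device is tilted. The command names, the 45-degree threshold, and the y-up world frame are hypothetical choices, not features of any particular device.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def interpret_button_press(forward: Vec3, threshold_deg: float = 45.0) -> str:
    """Map one physical button to different commands based on device orientation.

    `forward` is the device's unit pointing direction in world space (y up).
    The command names and the default threshold are hypothetical.
    """
    # Elevation of the pointing direction above the horizontal plane, in degrees.
    # Clamping guards against floating-point values slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, forward[1]))))
    if elevation > threshold_deg:
        return "open_menu"    # device tilted upward
    if elevation < -threshold_deg:
        return "drop_object"  # device tilted downward
    return "select"           # device held roughly level
```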

Head tracking is useful for modifying the view of a 3D environment in a natural way. The Kinect takes a different approach: rather than tracking a handheld device or a single point on the user’s head, it uses a depth camera to track the user’s entire body. The 3-DOF position of each tracked point is measured, but its orientation is not detected. And since the Kinect tracks the body directly, no “controller” is needed.

Probably the best candidate for self-contained 6-DOF tracking is inside-out vision-based tracking, in which the tracked object uses a camera to view the world and analyzes the changes in this view over time to understand its own motion. Although this approach is inherently relative, such systems can keep track of “feature points” in the scene to provide a form of absolute tracking in a fixed coordinate system attached to the scene.
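For head tracking, updating the view amounts to inverting the tracked head pose each frame. This sketch assumes the tracker reports the head pose in world coordinates as a 3x3 rotation matrix (head-to-world) plus a position, and it returns a row-major 4x4 view matrix; matrix layout conventions differ between graphics APIs, and the function name is illustrative.

```python
from typing import List, Sequence

Matrix3 = Sequence[Sequence[float]]


def view_matrix(head_rotation: Matrix3,
                head_position: Sequence[float]) -> List[List[float]]:
    """Build a row-major 4x4 view matrix from a tracked head pose.

    The view matrix is the inverse of the head's rigid transform:
    rotation R^T and translation -R^T p.
    """
    # The transpose of a rotation matrix is its inverse.
    rt = [[head_rotation[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(rt[i][k] * head_position[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [t[0]],
            rt[1] + [t[1]],
            rt[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]
```

Applying the result to the head’s own position yields the origin, which is a quick sanity check on the inversion.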

The modern mouse is a highly precise, accurate, and responsive 2D spatial input device: users can point at on-screen elements, even individual pixels, quickly and accurately.

For many difficult tasks, the question is not whether we should use a natural or a magical 3D UI, because a purely natural technique wouldn’t be practical.