Case Study: The Design Behind ‘Cubism’s’ Hand-tracking – Road to VR

Hand tracking first became available on the Oculus Quest back in late 2019. Out of enthusiasm for this new input method, I published a demo of Cubism to SideQuest with experimental hand tracking support only a few days later. Needless to say, this initial demo had several flaws, and didn’t really take the limitations of the technology into account, which is why I decided to initially omit hand tracking support from the full release of Cubism on the Oculus Store. It took more development, leaning on lessons learned from the work of fellow developers, to build something I was happy to release in the recent Cubism hand-tracking update. Here’s an inside look at the design process.

Guest Article by Thomas Van Bouwel

Thomas is a Belgian-Brazilian VR developer currently based in Brussels. Although his original background is in architecture, his current work in VR spans from indie games like Cubism to enterprise software for architects and engineers like Resolve.

This update builds on lessons learned from many other games and developers who have been exploring hand tracking over the last year (The Curious Tale of the Stolen Pets, Vacation Simulator, Luca Mefisto, Dennys Kuhnert, and several others).

In this article I’d like to share some things I’ve learned when tackling the challenges specific to Cubism’s hand interactions.

Optimizing for Precise Interactions

Cubism’s interactions revolve around placing small irregular puzzle pieces in a puzzle grid. This meant the main requirement for hand tracking input was precision: in picking up pieces, in placing them onto the grid, and in picking pieces out of a completed puzzle. This informed most of the design decisions regarding hand input.

Ghost Hands

I decided early on to not make the hands physics-based, but instead let them pass through pieces until one is actively grabbed.

This avoided clumsily pushing the floating puzzle pieces away when you are trying to grab them mid-air, but more importantly, it made plucking pieces in the middle of a full puzzle easier since you can just stick your fingers in and grab a piece instead of needing to figure out how to physically pry them out.

Signaled by their transparency, hands are not physical, making it easier to pick out pieces from the middle of a puzzle.

Contact Grabbing

There are several approaches to detecting a user’s intent to grab and release objects, like focusing on finger pinches or total finger joint rotation while checking a general interaction zone in the palm of the hand.

For Cubism’s small and irregular puzzle pieces, however, the approach that seemed to handle the precision requirements best was a contact-based approach, where a piece is grabbed as soon as thumb and index intersect the same piece and are brought together over a small distance, without requiring a full pinch.

Similar to the approach in The Curious Tale of the Stolen Pets, the fingers are locked in place as soon as a grab starts, to help give the impression of a more stable looking grab. The piece is parented to the root of the hand (the wrist) while grabbed. Since this seems to be the most stable tracked joint, it helps produce a steadier grip, and guarantees the piece stays aligned with the locked fingers.

Piece is grabbed when thumb and index intersect it and are brought together slightly. Rotation of index and thumb are then locked in place to help give the impression of a stable grab.
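As a rough illustration, the contact-based grab check could be sketched in Python as follows. This is not Cubism’s actual code: the class name, the piece ids, and the 5 mm closing distance are all hypothetical, and the sketch assumes something else already reports which piece each fingertip intersects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value: how far (in metres) the fingertips must close
# after touching a piece before the grab starts.
CLOSE_DISTANCE = 0.005


@dataclass
class GrabDetector:
    """Decides when a contact grab starts for one hand (illustrative only)."""
    candidate: Optional[str] = None  # piece both fingertips currently touch
    widest_gap: float = 0.0          # widest fingertip gap since contact began

    def update(self, thumb_piece: Optional[str],
               index_piece: Optional[str], gap: float) -> Optional[str]:
        """Return the id of the piece to grab this frame, or None.

        thumb_piece / index_piece: id of the piece each fingertip intersects.
        gap: current distance between thumb tip and index tip.
        The caller is expected to stop calling this once a grab has started.
        """
        same = thumb_piece is not None and thumb_piece == index_piece
        if not same:
            self.candidate = None
            return None
        if self.candidate != thumb_piece:
            # Contact with a new piece: remember the starting gap.
            self.candidate = thumb_piece
            self.widest_gap = gap
            return None
        self.widest_gap = max(self.widest_gap, gap)
        # Grab once the fingers have closed a little, well short of a pinch.
        if self.widest_gap - gap >= CLOSE_DISTANCE:
            return self.candidate
        return None
```

Once `update` returns a piece, a real implementation would lock the finger pose and parent the piece to the wrist, as described above.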

As soon as a piece is grabbed, the distance between thumb and index is saved, and a release margin is calculated based on that distance. Once thumb and index move apart beyond that margin, the piece is released.

Several safeguards try to prevent unintentional releases: we don’t check for release when tracking confidence is below a certain threshold, and after tracking confidence is re-gained, we wait several frames until checking for release again. Fingers are also required to be beyond the release margin for several frames before actually releasing.

Debug visualization: during a grab, the initial grab distance between fingertips is saved (outer red circle). The piece is released when the real position of the fingertips moves beyond a certain margin (blue circle).
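The release check and its safeguards could look roughly like the Python sketch below. The article gives no concrete values, so every constant here is an assumption, as is the choice to make the margin a fixed fraction of the saved grab distance.

```python
from dataclasses import dataclass

# All constants are hypothetical; the article does not give values.
MARGIN_RATIO = 0.25     # release margin as a fraction of the grab distance
MIN_CONFIDENCE = 0.5    # below this, release checks are skipped
COOLDOWN_FRAMES = 3     # frames to wait after confidence returns
RELEASE_FRAMES = 3      # consecutive frames beyond the margin needed


@dataclass
class ReleaseChecker:
    grab_distance: float    # fingertip gap saved when the grab started
    cooldown: int = 0       # frames left before release checks resume
    frames_beyond: int = 0  # consecutive frames beyond the release margin

    @property
    def release_distance(self) -> float:
        return self.grab_distance * (1.0 + MARGIN_RATIO)

    def update(self, gap: float, confidence: float) -> bool:
        """Return True on the frame the piece should be released."""
        if confidence < MIN_CONFIDENCE:
            # Low confidence: do not check, and restart the cooldown.
            self.cooldown = COOLDOWN_FRAMES
            self.frames_beyond = 0
            return False
        if self.cooldown > 0:
            # Confidence just came back: wait a few frames first.
            self.cooldown -= 1
            return False
        if gap > self.release_distance:
            self.frames_beyond += 1
        else:
            self.frames_beyond = 0
        return self.frames_beyond >= RELEASE_FRAMES
```

Requiring several consecutive frames trades a few frames of release latency for fewer accidental drops.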

There is also a system in place similar to Vacation Simulator’s overgrab method. Due to the lack of haptic feedback when grabbing a piece, it’s not uncommon for fingers to drift closer to one another during a grab. If they close beyond a certain threshold, the release margins are adjusted to make releasing the piece easier.

Try it yourself: to see these debug visualizations in-game, go to ‘Settings > Hand Tracking > Debug visualizations’ and turn on ‘Interactions widgets’.

Debug visualization: If fingers drift to each other during a grab over a certain threshold (inner red circle), the release margins are re-adjusted to make releasing the piece feel less “sticky”.
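One way to picture this adjustment is the small Python sketch below. How Cubism actually recalculates the margins is not described, so the rule used here (measure the margin from the drifted gap instead of the original grab distance) and both constants are assumptions.

```python
# Hypothetical constants; the article does not give values.
MARGIN_RATIO = 0.25    # release margin as a fraction of the grab distance
OVERGRAB_RATIO = 0.5   # drift threshold as a fraction of the grab distance


def release_distance(grab_distance: float, closest_gap: float) -> float:
    """Fingertip gap beyond which the piece is released.

    grab_distance: fingertip gap saved when the grab started.
    closest_gap: smallest fingertip gap seen since the grab started.
    """
    margin = grab_distance * MARGIN_RATIO
    reference = grab_distance
    if closest_gap < grab_distance * OVERGRAB_RATIO:
        # Fingers drifted together past the threshold: measure the margin
        # from the drifted gap so the hand need not reopen all the way.
        reference = closest_gap
    return reference + margin
```

With these numbers, a grab that started at 4 cm releases at 5 cm normally, but at 2 cm if the fingers drifted to within 1 cm of each other.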

One limit to this approach is that it makes supporting grabbing with fingers other than the index a bit harder. An earlier implementation also allowed grabbing between middle finger and thumb, but this often led to false positives when grabbing pieces out of a full puzzle grid, since it was hard to evaluate which finger the player was intending to grab a specific piece with.

This would not have been an issue if grabbing revolved around full finger pinches, since that results in a clearer binary input from which to determine user intent (at the cost of a less natural feeling grab pose).

Midpoint Check

Besides checking which piece the index and thumb are intersecting, an additional check happens at the midpoint between index fingertip and thumb fingertip.

Whatever piece this midpoint hovers over will be prioritized for grabbing, which helps avoid false positives when a player tries to grab a piece in a full grid.

In the example below, if the player intends to grab the green piece by its right edge, they would unintentionally grab the yellow piece if we didn’t do this midpoint check.

Left: thumb, index & midpoint between fingertips are in yellow → grab yellow. Right: thumb & index are in yellow, midpoint is in green → grab green
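The priority rule can be sketched in Python as below. This is a simplification: fingertips are treated as points rather than colliders with volume, and `piece_at` is a hypothetical lookup that returns the id of the piece at a position, or `None`.

```python
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def pick_piece(thumb_tip: Vec3, index_tip: Vec3,
               piece_at: Callable[[Vec3], Optional[str]]) -> Optional[str]:
    """Choose which piece a grab should target (illustrative only)."""
    midpoint = tuple((t + i) / 2.0 for t, i in zip(thumb_tip, index_tip))
    mid_piece = piece_at(midpoint)
    if mid_piece is not None:
        # The piece under the midpoint takes priority.
        return mid_piece
    # Otherwise fall back to the piece both fingertips intersect.
    thumb_piece = piece_at(thumb_tip)
    if thumb_piece is not None and thumb_piece == piece_at(index_tip):
        return thumb_piece
    return None


# Usage: a yellow piece wraps around a green cell on a unit grid.
grid = {(0, 0, 0): "yellow", (2, 0, 0): "yellow", (1, 0, 0): "green"}


def lookup(p: Vec3) -> Optional[str]:
    return grid.get((round(p[0]), round(p[1]), round(p[2])))


target = pick_piece((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), lookup)
```

Here both fingertips sit in yellow cells, but the midpoint falls in the green cell, so `target` is the green piece.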

Grabbing the Puzzle

Grabbing the puzzle works similarly to grabbing puzzle pieces, except it is initiated by performing a full pinch within the grab zone around the puzzle.

The size of this zone is dynamically increased when switching from controllers to hands. This makes it a bit easier to grab, and helps reduce the likelihood of accidentally grabbing a piece in the grid instead of the grid itself.

The grab zone around the puzzle expands when switching from controllers to hands, making it easier to grab. Although it requires a full pinch, grabbing the puzzle works similarly to grabbing puzzle pieces.

Dynamic Hand Smoothing

The hand tracking data provided by the Oculus Quest can still have a bit of jitter to it, even when tracking confidence is high. This can actually affect gameplay too, since jitter can be much more noticeable when holding the puzzle grid or a long puzzle piece by the edge, making precise placement of pieces on the grid harder.

Smoothing the tracking data can go a long way to produce more stable looking grabs, but needs to be done in moderation since too much smoothing will result in a “laggy” feeling to the hands. To balance this, hand smoothing in Cubism is dynamically adjusted depending on whether your hand is holding something or not.

Try it yourself: to see the impact of hand smoothing, try turning it off under ‘Settings > Hand Tracking > Hand smoothing’.

Increasing the smoothing of hand positions while holding objects helps produce a more stable grip, making precise placement on the grid a bit easier.
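A minimal Python sketch of this idea, assuming simple exponential smoothing (the article does not say which filter Cubism uses) and two made-up blend factors:

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical blend factors: the share of the raw sample mixed in per frame.
ALPHA_FREE = 0.8      # empty hand: light smoothing, responsive
ALPHA_HOLDING = 0.3   # holding something: heavier smoothing, steadier


def smooth(previous: Vec3, raw: Vec3, holding: bool) -> Vec3:
    """Blend the raw tracked position into the previous smoothed one."""
    alpha = ALPHA_HOLDING if holding else ALPHA_FREE
    return tuple(p + alpha * (r - p) for p, r in zip(previous, raw))
```

A lower factor follows the raw data more slowly, which steadies a held piece at the cost of some added lag. A per-frame factor like this also depends on frame rate, which a production version would likely need to account for.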

Pressing Buttons

One thing I noticed with Cubism’s original hand tracking demo was that most people tried pressing the buttons even though that was not supported at the time. Therefore, one of my goals with this new version of hand tracking was to make the buttons actually pushable.

Buttons can be hovered over when a raycast from the index finger tip hits a collider at the back of the button. If the index finger then intersects with the collider, a press is registered. If the index intersects the collider without first hovering it, no press is registered. This helps prevent false positives when the finger moves from bottom to top.

There are a few more checks in place to prevent false positives: the raycast is disabled when the finger is not facing the button, or when the player is not looking at their finger when pressing.

Try it yourself: to see this debug visualization in-game, go to ‘Settings > Hand Tracking > Debug visualizations’ and turn on ‘Interactions widgets’.

Debug visualization: a raycast from the index tip checks whether the finger is hovering over a button. To help prevent false positives, interaction is disabled when the finger is not facing the button, or when the player is not looking at their finger.
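The hover-then-press rule amounts to a small state machine, sketched here in Python. The four boolean inputs are assumed to come from the engine’s raycast and collider checks, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ButtonState:
    hovered: bool = False   # was the button hovered before being touched?
    pressed: bool = False   # is a registered press still being held?

    def update(self, ray_hits: bool, finger_facing: bool,
               player_looking: bool, touching: bool) -> bool:
        """Return True on the frame a press is registered.

        ray_hits: the raycast from the index tip hits the button collider.
        finger_facing: the finger points toward the button.
        player_looking: the player is looking at their finger.
        touching: the index finger intersects the button collider.
        """
        if touching:
            # A press only counts if the button was hovered first.
            if self.hovered and not self.pressed:
                self.pressed = True
                return True
            return False
        # Not touching: reset the press and re-evaluate hover.
        self.pressed = False
        self.hovered = ray_hits and finger_facing and player_looking
        return False
```

Because hover is only evaluated while the finger is outside the collider, a finger that enters the collider without hovering first never registers a press.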

Guiding Interactions

One of the main challenges of building any interaction for hand tracking is that, in contrast to buttons on a controller which are either pushed or not pushed, there are many different ways people may try to approach an interaction with their hands while expecting the same outcome.

Playtesting with a diverse set of people can help you learn how people are approaching the interactions presented to them, and can help refine the interaction cues that guide them to the expected gestures. Playtesting can also help you learn some of the outliers you may want to catch by adding some interaction redundancy.

Interaction Cues

There are several cues while grabbing a piece. When a user first hovers over a piece, their index and thumb take on the color of that piece, both to indicate it can be grabbed, and to signal which fingers can grab it (inspired by previous work by Luca Mefisto, Barrett Fox, and Martin Schubert). The piece is also highlighted to indicate it can be grabbed.

Several cues also indicate when the grab is successful: the fingertips become solid, the highlights on the piece flash, and a short audio cue is played.

Various cues both on the hand and the puzzle piece guide and confirm the grab interaction.

Buttons have several cues to help indicate that they can be pushed. Much like with puzzle pieces, the index fingertip is highlighted in white once you hover over a button, indicating which finger can interact. Like they did with controllers, buttons extend outward when hovered, but this time the extended button can actually be pressed: once the index touches it, it follows the finger until it is fully pressed down, at which point an audio cue confirms the click.

A subtle drop shadow on the button surface indicates the position and distance of the index relative to the button, and helps guide the press interaction.

Various cues guide interactions with buttons: buttons extend outward when hovered, the index fingertip is highlighted, a drop shadow shows where the tip will interact, and the button follows the finger when pushed.

Interaction Redundancy

Since some people may approach some interactions in unintended ways, it can be good to try and account for this where possible by adding some redundancy to the ways people can use their hands to interact. Interaction cues can still guide them to the intended interaction, but redundancy can help avoid them getting unnecessarily stuck.

When it comes to grabbing pieces, a few playtesters would try to grab pieces by making a fist at first instead of using their fingertips. By having the colliders cover the entire finger instead of just the fingertip, a decent number of these first grabs will still be registered.

I should note this approach still needs some improvement, since it also introduces some issues producing unintended grabs in cases when there are a lot of pieces floating around the play area. A better approach in the future might be to also perform a check on the total finger rotation to account for fist grabs instead.

Though grabbing is designed around fingertips, colliders on index and thumb cover the entire finger to help catch different forms of grabbing.

With buttons, there were a few playtesters who would try pinching them instead of pushing them. In part this seemed to occur when they previously learned how to pinch buttons in the Oculus home screen, right before launching the game.

For this reason, buttons can also be clicked by pinching once they are hovered, and hopefully cues like the highlighted index and drop shadow will eventually guide them to pressing the buttons instead.

Pinching while hovering over buttons also registers as a click.

The first button players encounter when using hands also explicitly states “Push to Start”, to help transition people from pinching to pushing after coming from the Oculus Home menu.

Teaching Limitations

Although the quality of Quest’s hand tracking has improved over the last year, it still has its limitations — and a player’s awareness of these limitations can have a big impact on how good they perceive their experience to be.

Cubism implements a few ways of teaching players about the current limitations of hand tracking on Quest.

When the player first switches to hand tracking (either at launch or mid-game), a modal informs them of some best practices, like playing in a well-lit space and avoiding crossing hands.

When a user switches to hand tracking, a modal informs them about limitations and best-practices. The “Push to Start” instruction helps teach new users that buttons can be naturally pushed in this game.

It is important to acknowledge that most people are likely to immediately dismiss modals like this or quickly forget their guidelines, so signaling why things can go wrong during the experience is also important.

In Cubism, hands will turn red to signal when tracking is lost. In some playtests, people would keep one hand on their lap and play with the other, and be puzzled why their lap hand would appear frozen. To help inform cases like this, a message is displayed on the hand to clearly state why the hand is frozen if tracking loss persists. If tracking is lost specifically because the player is crossing their hands, the message changes to inform them not to do that.

Left: hands turn red when tracking is first lost. Middle: when tracking loss persists, a message informs the player about what is going on. Right: if tracking is lost due to occluded hands this is also indicated
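The escalation from red hands to an explanatory message could be expressed as a small decision function, sketched in Python below. The two-second delay is invented for illustration, and the sketch simply takes `hands_crossed` as an input, since the article does not say how crossed hands are detected.

```python
from enum import Enum

# Hypothetical delay before the explanatory message appears.
PERSIST_SECONDS = 2.0


class HandFeedback(Enum):
    NORMAL = "normal"
    RED = "red hand"
    LOST_MESSAGE = "red hand with tracking-lost message"
    CROSSED_MESSAGE = "red hand with hands-crossed message"


def hand_feedback(tracked: bool, seconds_lost: float,
                  hands_crossed: bool) -> HandFeedback:
    """Pick what to show on a hand, given its tracking state."""
    if tracked:
        return HandFeedback.NORMAL
    if seconds_lost < PERSIST_SECONDS:
        # Tracking was only just lost: tint the hand red.
        return HandFeedback.RED
    # Tracking loss persists: explain why the hand is frozen.
    if hands_crossed:
        return HandFeedback.CROSSED_MESSAGE
    return HandFeedback.LOST_MESSAGE
```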

For more seasoned players, or players who prefer playing with one hand, this feature can be replaced in the settings by having hands fade out when they lose tracking instead, more closely resembling the behavior in the Oculus home menu.

The red hands and warning messages can be replaced in the settings by fading hands.

Future Work

Hand tracking on Quest still has its limitations, and though Cubism’s support for it is already in its second version, there is still plenty of room for improvement.

Regardless, I’m excited to start exploring and supporting these new input methods. In the short term, I think they can help make experiences like this more accessible and easier to share with new VR users.

Mixed reality footage captured on an iPhone with Fabio Dela Antonio’s app Reality Mixer gives an idea of what it may be like to play Cubism on an AR headset in the future.

In the long term, there seems to be a good chance that hand tracking will be the go-to input for future standalone AR devices, so hopefully this update can be a first small step towards an AR version of Cubism.


If you enjoyed this look at the hand-tracking design in Cubism, be sure to check out Thomas’ prior Guest Article which overviews the design of the broader game.