
Sep 30, 2024
User interaction is an integral part of any XR (or WebXR) experience: it brings to life the true value of XR in expanding and improving our perception of reality. When it comes to interacting with virtual content, we are mostly accustomed to controllers as the prominent choice of input. However, continued advances in immersive technologies open up new opportunities for natural and intuitive user interactions, and we are now able to interact with virtual content naturally using gaze, speech or hands. By closely mirroring real-life actions, these alternative input methods aim to bridge the gap between the physical and digital worlds, enabling more immersive and realistic experiences.
In this three-part blog series, we will focus on hands as an alternative input method and on hand-based interaction in WebXR. I will walk through how to make use of the Hand Input Module in your WebXR experiences and share my development experience in going from the API's explainer to a functioning demo in A-Frame. I'm going to follow the structure of the explainer as much as possible, in order to mimic what a developer would go through having discovered the Hand Input Module for the first time.
The series will cover the following in 3 separate posts:
- Part 1: how to access the API and draw hand skeletons (we are here)
- Part 2: how to add simple interaction
- Part 3: how to perform simple gesture detection
Before we jump in, you might be wondering why we are using hands at all. Why hands, when we have these amazing state-of-the-art controllers that everyone is used to in XR? I have a few reasons, based on previous research, that make a strong case for "natural interaction" such as hands in XR environments:
- We are inherently good at it: we are ready-made experts at using our hands (and other body parts such as our eyes) for communication and interaction, and accordingly face a shallower learning curve when using them. Drawing on a lifetime of accumulated experiences, memories and actions, we are already very good at estimating the size, colour and location of physical objects around us, and can naturally make calculated and accurate hand-based actions to interact with them.
- Improve user experience: providing multiple methods of interaction improves the overall experience, because different input source types in XR cater to different use cases and user preferences, such as gamepads for precise control in gaming applications, touchscreens for natural touch-based interactions, and eye-tracking systems for hands-free operation and enhanced security.
- Improve accessibility: the accessibility of your XR experience or app can also be enhanced by providing alternative input methods. An experience that relies solely on controllers excludes users who are physically unable to use one, whereas an experience that offers hand interaction alongside controllers becomes accessible to more users.
- New interaction possibilities: think about this: the latest state-of-the-art XR controller has 6 degrees of freedom (i.e. the number of ways an object can move in 3D space), while a human hand has 27 (4 in each finger: 3 for extension and flexion and one for abduction and adduction). We usually think of a hand as a singular mode of interaction, but the dexterity of the hand makes it much more than a single entity. The hand can also provide a large number of gestures, and two hands used together provide even more possibilities.
The Hand Input Module is a component of the WebXR Device API that gives developers the ability to expose hand joints and track hand movements within the XR environment, fostering immersive and intuitive interaction methods in WebXR experiences. This tutorial series is based on the Hand Input Module's explainer, which provides instructions on accessing hand tracking information and using that information to develop hand-based interactions and gesture detection. If you are new to WebXR development and/or Web Standards, I will walk you through translating Three.js to A-Frame and writing a custom A-Frame component, which may seem like daunting tasks at first. Reviewing A-Frame's "Writing A Component" documentation is also worthwhile to learn more about writing robust custom A-Frame components. Also, you will need a device capable of hand tracking to follow along with this series.
To make a start, we will create an empty A-Frame scene and an empty A-Frame custom component called hand-skeleton. We start by adding our init, tick and remove lifecycle functions, which are called automatically during specific stages of a component's life cycle. Lifecycle functions enable us to control and manage the behaviour of components throughout their entire lifespan, ensuring proper initialisation, updating and cleanup. We will edit this script as we go through the explainer:
Hand Input Module in A-Frame
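As a starting point, the empty component might look something like this (the lifecycle functions are still empty for now; we will fill them in as we work through the explainer):
AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    // runs once when the component is first attached to its entity
  },
  tick: function () {
    // runs on every frame of the render loop
  },
  remove: function () {
    // runs when the component or its entity is removed from the scene
  },
});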
Following the explainer that starts with Accessing this API, we need to do the following in order to access the API:
Request hand-tracking
We first must request the hand-tracking XR feature in our WebXR session. This entails a small addition to the A-Frame a-scene tag:
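A minimal sketch of what this could look like, assuming A-Frame's webxr scene attribute is used to request hand-tracking as an optional feature (you could also list it under requiredFeatures if your experience depends on it):
<a-scene webxr="optionalFeatures: hand-tracking">
  <!-- scene content goes here -->
</a-scene>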
Loop through inputSources
We then need to loop through the inputSources in our tick function, which runs on every frame of the render loop. By looping through inputSources we will be able to access the hand attribute of each input source (one for each hand) to use later for drawing skeletons, handling interactions and recognising gestures. To achieve this I make the following changes to the custom hand-skeleton A-Frame component:
- Define frame and referenceSpace in our init() function:
init: function () {
  this.referenceSpace = null;
  this.frame = null;
},
- In our tick() function, check whether we already have the current frame and reference space, and grab them from the scene if not:
tick: function () {
  if (!this.frame) {
    this.frame = this.el.sceneEl.frame;
    this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
  } else {
    // render skeleton
    // add interaction
    // perform gesture detection
  }
},
- Add a renderHandSkeleton() function to loop through inputSources, and call it in the tick() function:
tick: function() {
  if (!this.frame) {
    this.frame = this.el.sceneEl.frame;
    this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
  } else {
    this.renderHandSkeleton();
    // add interaction
    // perform gesture detection
  }
},

renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      console.log("The tracked " + inputSource.handedness + " hand is:", hand);
    }
  }
},
- Finally, to make sure we can see these changes, we need to add our component to an entity in our A-Frame scene:
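For example, a minimal scene might look like this (the component is attached to an entity at the scene origin, since the sphere positions we set later are expressed in reference-space coordinates):
<a-scene webxr="optionalFeatures: hand-tracking">
  <a-entity hand-skeleton></a-entity>
</a-scene>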
We have now successfully accessed the hand-tracking XR feature in our A-Frame scene! There is not much to see yet because we have yet to draw the actual hand skeletons, but the console log should confirm that both "right" and "left" hands are recognised and tracked in our session.
So far so good? Next, let’s add some skeletons to these invisible tracked hands.
Following the displaying hand models using this API section of the explainer, we need to do the following in our A-Frame implementation:
- Declare an array of orderedJoints that stores the names of each hand joint. Since this does not change during runtime, we can place it outside of our custom hand-skeleton component, at the top level of the JS file:
const orderedJoints = [
  ["thumb-metacarpal", "thumb-phalanx-proximal", "thumb-phalanx-distal", "thumb-tip"],
  ["index-finger-metacarpal", "index-finger-phalanx-proximal", "index-finger-phalanx-intermediate", "index-finger-phalanx-distal", "index-finger-tip"],
  ["middle-finger-metacarpal", "middle-finger-phalanx-proximal", "middle-finger-phalanx-intermediate", "middle-finger-phalanx-distal", "middle-finger-tip"],
  ["ring-finger-metacarpal", "ring-finger-phalanx-proximal", "ring-finger-phalanx-intermediate", "ring-finger-phalanx-distal", "ring-finger-tip"],
  ["pinky-finger-metacarpal", "pinky-finger-phalanx-proximal", "pinky-finger-phalanx-intermediate", "pinky-finger-phalanx-distal", "pinky-finger-tip"]
];
- Create our own drawSphere() function in the hand-skeleton component that takes the radius and position coming from the getJointPose() function and draws the appropriate sphere for each joint. Note that, while this is not mentioned explicitly in the explainer, the drawSphere() and drawCylinder() functions are not provided by the API. Also note that I am only drawing spheres for the skeleton in this tutorial for the sake of simplicity, but you would also need to create your own drawCylinder() function if you wish to connect all detected hand joints with "bones":
drawSphere: function(radius, position) {
  const sphere = document.createElement('a-sphere');
  sphere.setAttribute('radius', radius);
  sphere.setAttribute('color', 'red');
  sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
  this.el.appendChild(sphere);
  return sphere;
},
- Declare an empty spheres object to store the skeleton spheres that are created and returned by the drawSphere() function. We can add this to the init() function:
init: function () {
  this.referenceSpace = null;
  this.frame = null;
  this.spheres = {}; // store spheres for each joint
},
- Draw a sphere for each joint by looping through orderedJoints on each finger of each detected hand. Notice that I have added a check so that a sphere is only created if one has not already been rendered for that particular joint; if the sphere already exists, drawSphere() is not called again and we simply update the sphere's position to match the joint. Without this check you run the risk of painting your screen with spheres streaming from every finger on every frame; you can see this glorious failure in action in the screenshot below. We can amend our renderHandSkeleton() function to do this:
renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      for (const finger of orderedJoints) {
        for (const jointName of finger) {
          const joint = hand.get(jointName);
          if (joint) {
            const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
            if (!jointPose) {
              continue; // the joint pose may be unavailable on this frame
            }
            const position = jointPose.transform.position;
            if (!this.spheres[jointName]) {
              this.spheres[jointName] = this.drawSphere(jointPose.radius, position);
            } else {
              this.spheres[jointName].object3D.position.set(position.x, position.y, position.z);
            }
          }
        }
      }
    }
  }
},
When rendering two hand skeletons, it's important to distinguish between the right and left hand inputs. Otherwise, you'll end up with only one skeleton, rendered on just one hand, as in the photo below.
The reason for this is that, even though two hands are detected, the spheres are drawn (and overwritten) on the last detected hand when looping through inputSources. This happens because we are indexing spheres by joint name only (this.spheres[jointName]), so the spheres get overwritten: both the left and the right hand have joints that share the same names (i.e. those in orderedJoints).
By referring to handedness (an attribute of XRInputSource, not of hand), we can instead index spheres by joint name and handedness, ensuring that each hand gets its own set of spheres:
renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      const handedness = inputSource.handedness;
      for (const finger of orderedJoints) {
        for (const jointName of finger) {
          const joint = hand.get(jointName);
          if (joint) {
            const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
            if (!jointPose) {
              continue; // the joint pose may be unavailable on this frame
            }
            const position = jointPose.transform.position;
            if (!this.spheres[handedness + '_' + jointName]) {
              this.spheres[handedness + '_' + jointName] = this.drawSphere(jointPose.radius, position);
            } else {
              this.spheres[handedness + '_' + jointName].object3D.position.set(position.x, position.y, position.z);
            }
          }
        }
      }
    }
  }
},
Finally, a little bit of tidying up: we need to make sure we clear out any spheres that are no longer needed in our remove() function:
remove: function () {
  // clean up rendered spheres
  for (const jointName in this.spheres) {
    this.spheres[jointName].parentNode.removeChild(this.spheres[jointName]);
  }
},
Success! And here we are! We should now have a skeleton rendered on each hand. By accessing handedness we are now able to develop unique skeletons, interactions and even gestures using both hands.
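As a small illustration of what handedness unlocks (a hypothetical tweak that is not part of the final script below), you could pass handedness through to drawSphere() and tint each hand a different colour:
drawSphere: function(radius, position, handedness) {
  const sphere = document.createElement('a-sphere');
  sphere.setAttribute('radius', radius);
  // colour spheres per hand: blue for the left hand, red for the right
  sphere.setAttribute('color', handedness === 'left' ? 'blue' : 'red');
  sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
  this.el.appendChild(sphere);
  return sphere;
},
You would also pass handedness at the call site in renderHandSkeleton(), i.e. this.drawSphere(jointPose.radius, position, handedness).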
Your final script should look something like this:
Hand Input Module in A-Frame
const orderedJoints = [
  ["thumb-metacarpal", "thumb-phalanx-proximal", "thumb-phalanx-distal", "thumb-tip"],
  ["index-finger-metacarpal", "index-finger-phalanx-proximal", "index-finger-phalanx-intermediate", "index-finger-phalanx-distal", "index-finger-tip"],
  ["middle-finger-metacarpal", "middle-finger-phalanx-proximal", "middle-finger-phalanx-intermediate", "middle-finger-phalanx-distal", "middle-finger-tip"],
  ["ring-finger-metacarpal", "ring-finger-phalanx-proximal", "ring-finger-phalanx-intermediate", "ring-finger-phalanx-distal", "ring-finger-tip"],
  ["pinky-finger-metacarpal", "pinky-finger-phalanx-proximal", "pinky-finger-phalanx-intermediate", "pinky-finger-phalanx-distal", "pinky-finger-tip"]
];

AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    this.referenceSpace = null;
    this.frame = null;
    this.spheres = {}; // store spheres for each joint
  },
  tick: function () {
    if (!this.frame) {
      this.frame = this.el.sceneEl.frame;
      this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
    } else {
      this.renderHandSkeleton();
      // add interaction
      // perform gesture detection
    }
  },
  renderHandSkeleton: function() {
    const session = this.el.sceneEl.renderer.xr.getSession();
    const inputSources = session.inputSources;
    if (!this.frame || !this.referenceSpace) {
      return;
    }
    for (const inputSource of inputSources) {
      if (inputSource.hand) {
        const hand = inputSource.hand;
        const handedness = inputSource.handedness;
        for (const finger of orderedJoints) {
          for (const jointName of finger) {
            const joint = hand.get(jointName);
            if (joint) {
              const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
              if (!jointPose) {
                continue; // the joint pose may be unavailable on this frame
              }
              const position = jointPose.transform.position;
              if (!this.spheres[handedness + '_' + jointName]) {
                this.spheres[handedness + '_' + jointName] = this.drawSphere(jointPose.radius, position);
              } else {
                this.spheres[handedness + '_' + jointName].object3D.position.set(position.x, position.y, position.z);
              }
            }
          }
        }
      }
    }
  },
  remove: function () {
    // clean up rendered spheres
    for (const jointName in this.spheres) {
      this.spheres[jointName].parentNode.removeChild(this.spheres[jointName]);
    }
  },
  drawSphere: function(radius, position) {
    const sphere = document.createElement('a-sphere');
    sphere.setAttribute('radius', radius);
    sphere.setAttribute('color', 'red');
    sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
    this.el.appendChild(sphere);
    return sphere;
  },
});
You can also grab the full script from the hand-input-aframe GitHub repo.
Now that we have our functional hand skeletons, in the next part of this blog series we will be adding some simple hand-based interactions. Stay tuned!