
Sep 30, 2024
User interaction is an integral part of any XR (or WebXR) experience: it brings to life the true value of XR in expanding and improving our perception of reality. When it comes to interacting with virtual content, we are mostly accustomed to controllers as the prominent choice of input. However, continued advances in immersive technologies open up new opportunities for natural and intuitive user interactions, and we are now able to interact with virtual content naturally using gaze, speech or hands. By closely mirroring real-life actions, these alternative input methods aim to bridge the gap between the physical and digital worlds, enabling more immersive and realistic experiences.
In this three-part blog series, we will focus on hands as an alternative input method and on hand-based interaction in WebXR. I will walk through how to make use of the Hand Input Module in your WebXR experiences and share my development experience in going from the API's explainer to a functioning demo in A-Frame. I'm going to follow the structure of the explainer as much as possible, in order to mimic what a developer would go through having discovered the Hand Input Module for the first time.
The series will cover the following in 3 separate posts:
- Part 1: how to access the API and draw hand skeletons (we are here)
- Part 2: how to add simple interaction
- Part 3: how to perform simple gesture detection
Before we jump in, you might be wondering why we are using hands at all. Why hands, when we have these amazing state-of-the-art controllers that everyone is used to in XR? I have a few reasons, based on previous research, that make a strong case for "natural interaction" such as hands in XR environments:
- We are inherently good at it: we are ready-made experts at using our hands (and other body parts such as our eyes) for communication and interaction, and accordingly face a shallower learning curve when using them. Drawing on a lifetime of accumulated experiences, memories and actions, we are already very good at estimating the size, colour and location of physical objects around us, and can naturally make calculated and accurate hand-based actions to interact with them.
- Improve user experience: providing multiple methods of interaction improves the overall experience, because different input source types in XR cater to different use cases and user preferences, such as gamepads for precise control in gaming applications, touchscreens for natural touch-based interactions, and eye-tracking systems for hands-free operation and enhanced security.
- Improve accessibility: the accessibility of your XR experience or app can also be enhanced by providing alternative input methods. An experience that relies solely on controllers excludes users who are physically unable to use one, whereas an experience that offers hand interaction alongside controllers becomes accessible to more users.
- New interaction possibilities: think about this: the latest state-of-the-art XR controller has 6 degrees of freedom (i.e. the number of ways an object can move in 3D space), while a human hand has 27 (4 in each finger: 3 for extension and flexion and one for abduction and adduction). We usually think of a hand as a singular mode of interaction, but the dexterity of the hand makes it much more than a single entity. The hand can also provide a large number of gestures, and two hands used together provide even more possibilities.
The Hand Input Module is a component of the WebXR Device API that gives developers the ability to expose hand joints and track hand movements within the XR environment, fostering immersive and intuitive interaction methods in WebXR experiences. This tutorial series is based on the Hand Input Module's explainer, which provides instructions on accessing hand tracking information and using that information to develop hand-based interactions and gesture detection. If you are new to WebXR development and/or Web Standards, I will walk you through translating Three.js to A-Frame and writing a custom A-Frame component, which may seem like daunting tasks at first. Reviewing A-Frame's "Writing A Component" documentation is also worthwhile to learn more about writing robust custom A-Frame components. Also, you will need a device capable of hand tracking to follow along with this series.
To make a start, we will create an empty A-Frame scene and an empty A-Frame custom component called hand-skeleton. We start by adding our init, tick and remove lifecycle functions, which are called automatically during specific stages of a component's life cycle. Lifecycle functions enable us to control and manage the behaviour of components throughout their entire lifespan, ensuring proper initialisation, updating and cleanup. We will edit this script as we go through the explainer:
Hand Input Module in A-Frame
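As a starting point, the empty component might look something like this (the lifecycle functions are still empty for now; we will fill them in as we work through the explainer):
AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    // runs once when the component is first attached to its entity
  },
  tick: function () {
    // runs on every frame of the render loop
  },
  remove: function () {
    // runs when the component or its entity is removed from the scene
  },
});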
Following the explainer that starts with Accessing this API, we need to do the following in order to access the API:
Request hand-tracking
We first must request the hand-tracking XR feature in our WebXR session. This entails a small addition to the A-Frame a-scene tag:
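A minimal sketch of what this could look like, assuming A-Frame's webxr scene attribute is used to request hand-tracking as an optional feature (you could also list it under requiredFeatures if your experience depends on it):
<a-scene webxr="optionalFeatures: hand-tracking">
  <!-- scene content goes here -->
</a-scene>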
Loop through inputSources
We then need to loop through the inputSources in our tick function, which runs on every frame of the render loop. By looping through inputSources we will be able to access the hand attribute of each input source (one for each hand) to use later for drawing skeletons, handling interactions and recognising gestures. To achieve this I make the following changes to the custom hand-skeleton A-Frame component:
- Define frame and referenceSpace in our init() function:
init: function () {
  this.referenceSpace = null;
  this.frame = null;
},
- In our tick() function, check whether we already have the current frame and reference space, and grab them from the scene if not:
tick: function () {
  if (!this.frame) {
    this.frame = this.el.sceneEl.frame;
    this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
  } else {
    // render skeleton
    // add interaction
    // perform gesture detection
  }
},
- Add a renderHandSkeleton() function to loop through inputSources, and call it in the tick() function:
tick: function() {
  if (!this.frame) {
    this.frame = this.el.sceneEl.frame;
    this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
  } else {
    this.renderHandSkeleton();
    // add interaction
    // perform gesture detection
  }
},

renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      console.log("The tracked " + inputSource.handedness + " hand is:", hand);
    }
  }
},
- Finally, to make sure we can see these changes, we need to add our component to an entity in our A-Frame scene:
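For example, a minimal scene might look like this (the component is attached to an entity at the scene origin, since the sphere positions we set later are expressed in reference-space coordinates):
<a-scene webxr="optionalFeatures: hand-tracking">
  <a-entity hand-skeleton></a-entity>
</a-scene>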
We have now successfully accessed the hand-tracking XR feature in our A-Frame scene! There is not much to see yet because we have yet to draw the actual hand skeletons, but the console log should confirm that both "right" and "left" hands are recognised and tracked in our session.
So far so good? Next, let’s add some skeletons to these invisible tracked hands.
Following the displaying hand models using this API section of the explainer, we need to do the following in our A-Frame implementation:
- Declare an array of orderedJoints that stores the names of each hand joint. Since this does not change during runtime, we can place it outside of our custom hand-skeleton component, at the top level of the JS file:
const orderedJoints = [
  ["thumb-metacarpal", "thumb-phalanx-proximal", "thumb-phalanx-distal", "thumb-tip"],
  ["index-finger-metacarpal", "index-finger-phalanx-proximal", "index-finger-phalanx-intermediate", "index-finger-phalanx-distal", "index-finger-tip"],
  ["middle-finger-metacarpal", "middle-finger-phalanx-proximal", "middle-finger-phalanx-intermediate", "middle-finger-phalanx-distal", "middle-finger-tip"],
  ["ring-finger-metacarpal", "ring-finger-phalanx-proximal", "ring-finger-phalanx-intermediate", "ring-finger-phalanx-distal", "ring-finger-tip"],
  ["pinky-finger-metacarpal", "pinky-finger-phalanx-proximal", "pinky-finger-phalanx-intermediate", "pinky-finger-phalanx-distal", "pinky-finger-tip"]
];
- Create our own drawSphere() function in the hand-skeleton component that takes the radius and position coming from the getJointPose() function and draws the appropriate sphere for each joint. Note that, while this is not mentioned explicitly in the explainer, the drawSphere() and drawCylinder() functions are not provided by the API. Also note that I am only drawing spheres for the skeleton in this tutorial for the sake of simplicity, but you would also need to create your own drawCylinder() function if you wish to connect all detected hand joints with "bones":
drawSphere: function(radius, position) {
  const sphere = document.createElement('a-sphere');
  sphere.setAttribute('radius', radius);
  sphere.setAttribute('color', 'red');
  sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
  this.el.appendChild(sphere);
  return sphere;
},
- Declare an empty spheres object to store the skeleton spheres that are created and returned by the drawSphere() function. We can add this to the init() function:
init: function () {
  this.referenceSpace = null;
  this.frame = null;
  this.spheres = {}; // store spheres for each joint
},
- Draw a sphere for each joint by looping through orderedJoints on each finger of each detected hand. Notice that I have added a check so that a sphere is only created if one has not already been rendered for that particular joint; if the sphere already exists, drawSphere() is not called again and we simply update the sphere's position to match the joint. Without this check you run the risk of painting your screen with spheres streaming from every finger on every frame; you can see this glorious failure in action in the screenshot below. We can amend our renderHandSkeleton() function to do this:
renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      for (const finger of orderedJoints) {
        for (const jointName of finger) {
          const joint = hand.get(jointName);
          if (joint) {
            const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
            if (!jointPose) {
              continue; // the joint pose may be unavailable on this frame
            }
            const position = jointPose.transform.position;
            if (!this.spheres[jointName]) {
              this.spheres[jointName] = this.drawSphere(jointPose.radius, position);
            } else {
              this.spheres[jointName].object3D.position.set(position.x, position.y, position.z);
            }
          }
        }
      }
    }
  }
},
When rendering two hand skeletons, it's important to distinguish between the right and left hand inputs. Otherwise, you'll end up with only one skeleton, rendered on just one hand, as in the photo below.
The reason for this is that, even though two hands are detected, the spheres are drawn (and overwritten) on the last detected hand when looping through inputSources. This happens because we are indexing spheres by joint name only (this.spheres[jointName]), so the spheres get overwritten: both the left and the right hand have joints that share the same names (i.e. those in orderedJoints).
By referring to handedness (an attribute of XRInputSource, not of hand), we can instead index spheres by joint name and handedness, ensuring that each hand gets its own set of spheres:
renderHandSkeleton: function() {
  const session = this.el.sceneEl.renderer.xr.getSession();
  const inputSources = session.inputSources;
  if (!this.frame || !this.referenceSpace) {
    return;
  }
  for (const inputSource of inputSources) {
    if (inputSource.hand) {
      const hand = inputSource.hand;
      const handedness = inputSource.handedness;
      for (const finger of orderedJoints) {
        for (const jointName of finger) {
          const joint = hand.get(jointName);
          if (joint) {
            const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
            if (!jointPose) {
              continue; // the joint pose may be unavailable on this frame
            }
            const position = jointPose.transform.position;
            if (!this.spheres[handedness + '_' + jointName]) {
              this.spheres[handedness + '_' + jointName] = this.drawSphere(jointPose.radius, position);
            } else {
              this.spheres[handedness + '_' + jointName].object3D.position.set(position.x, position.y, position.z);
            }
          }
        }
      }
    }
  }
},
Finally, a little bit of tidying up: we need to make sure we clear out any spheres that are no longer needed in our remove() function:
remove: function () {
  // clean up rendered spheres
  for (const jointName in this.spheres) {
    this.spheres[jointName].parentNode.removeChild(this.spheres[jointName]);
  }
},
Success! And here we are! We should now have a skeleton rendered on each hand. By accessing handedness we are now able to develop unique skeletons, interactions and even gestures using both hands.
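As a small illustration of what handedness unlocks (a hypothetical tweak that is not part of the final script below), you could pass handedness through to drawSphere() and tint each hand a different colour:
drawSphere: function(radius, position, handedness) {
  const sphere = document.createElement('a-sphere');
  sphere.setAttribute('radius', radius);
  // colour spheres per hand: blue for the left hand, red for the right
  sphere.setAttribute('color', handedness === 'left' ? 'blue' : 'red');
  sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
  this.el.appendChild(sphere);
  return sphere;
},
You would also pass handedness at the call site in renderHandSkeleton(), i.e. this.drawSphere(jointPose.radius, position, handedness).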
Your final script should look something like this:
Hand Input Module in A-Frame
const orderedJoints = [
  ["thumb-metacarpal", "thumb-phalanx-proximal", "thumb-phalanx-distal", "thumb-tip"],
  ["index-finger-metacarpal", "index-finger-phalanx-proximal", "index-finger-phalanx-intermediate", "index-finger-phalanx-distal", "index-finger-tip"],
  ["middle-finger-metacarpal", "middle-finger-phalanx-proximal", "middle-finger-phalanx-intermediate", "middle-finger-phalanx-distal", "middle-finger-tip"],
  ["ring-finger-metacarpal", "ring-finger-phalanx-proximal", "ring-finger-phalanx-intermediate", "ring-finger-phalanx-distal", "ring-finger-tip"],
  ["pinky-finger-metacarpal", "pinky-finger-phalanx-proximal", "pinky-finger-phalanx-intermediate", "pinky-finger-phalanx-distal", "pinky-finger-tip"]
];

AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    this.referenceSpace = null;
    this.frame = null;
    this.spheres = {}; // store spheres for each joint
  },
  tick: function () {
    if (!this.frame) {
      this.frame = this.el.sceneEl.frame;
      this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
    } else {
      this.renderHandSkeleton();
      // add interaction
      // perform gesture detection
    }
  },
  renderHandSkeleton: function() {
    const session = this.el.sceneEl.renderer.xr.getSession();
    const inputSources = session.inputSources;
    if (!this.frame || !this.referenceSpace) {
      return;
    }
    for (const inputSource of inputSources) {
      if (inputSource.hand) {
        const hand = inputSource.hand;
        const handedness = inputSource.handedness;
        for (const finger of orderedJoints) {
          for (const jointName of finger) {
            const joint = hand.get(jointName);
            if (joint) {
              const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
              if (!jointPose) {
                continue; // the joint pose may be unavailable on this frame
              }
              const position = jointPose.transform.position;
              if (!this.spheres[handedness + '_' + jointName]) {
                this.spheres[handedness + '_' + jointName] = this.drawSphere(jointPose.radius, position);
              } else {
                this.spheres[handedness + '_' + jointName].object3D.position.set(position.x, position.y, position.z);
              }
            }
          }
        }
      }
    }
  },
  remove: function () {
    // clean up rendered spheres
    for (const jointName in this.spheres) {
      this.spheres[jointName].parentNode.removeChild(this.spheres[jointName]);
    }
  },
  drawSphere: function(radius, position) {
    const sphere = document.createElement('a-sphere');
    sphere.setAttribute('radius', radius);
    sphere.setAttribute('color', 'red');
    sphere.setAttribute('position', `${position.x} ${position.y} ${position.z}`);
    this.el.appendChild(sphere);
    return sphere;
  },
});
You can also grab the full script from the hand-input-aframe GitHub repo.
Now that we have our functional hand skeletons, in the next part of this blog series we will be adding some simple hand-based interactions. Stay tuned!