I created an Augmented Reality Piano in WebXR
by Muadh Al Kalbani, Samsung Internet Developers (February 2025)

A close up photo of piano keys
Photo by Johannes Plenio on Unsplash

My colleague recently wrote a brilliant post introducing the Web Audio API and highlighting the great features developers can tap into, which got us thinking: what would happen if we coupled the Hand Input Module from the WebXR Device API with the Web Audio API to further enrich immersive experiences?

I narrowed the several ideas we had down to a drumming kit or a piano, and eventually an Augmented Reality Piano seemed like the cooler challenge. A piano makes good use of the joint information coming from the Hand Input Module, as it requires more complex gestures and hand movements than a drumming kit (which would essentially only need a fist gesture to hold the sticks). A piano also allows for independent finger movements when pressing multiple keys simultaneously, which better showcases the responsiveness of the Web Audio API when coupled with the versatility and precision of the Hand Input Module. The expected lack of haptic feedback made an AR piano even more interesting, so here we are! Before we jump in, there are a couple of disclaimers I need to make:

  • I am not a musician, and I am pretty useless with my left limbs, so I’m not the best piano player either, however hard I try. Apologies in advance to all the audio experts out there for any lingo that doesn’t sound musical enough! My main focus in this post is to walk you through how to use some key features of the Web Audio API in a WebXR experience.
  • The Web Audio API is MASSIVE (as in, it takes a while just to load the web page of the spec kind of massive), so I won’t be covering every single feature the API provides; instead, I’ll focus on giving you a starting point to explore the API further in a WebXR context.

Prior to kicking off development, I worked on a quick list of requirements that I wanted for this virtual piano. Note that these are my selfish requirements and yours may differ, so please feel free to unleash your imagination and adapt these as you see fit:

  • No dependencies: I personally am not a big fan of directing developers to get external files or resources, so I will attempt to create all the notes of a piano from scratch using the API. Luckily the Web Audio API allows us to do just that.
  • One Piano. Two hands: since we already covered using the Hand Input Module in previous posts, I will aim to use elements of that to allow the user to play the AR piano using both hands.
  • Super duper realism is a non-goal: while I try my best to make this feel like a “real” piano, this is not my ultimate goal here. The key focus will be on successfully linking the two APIs together in your WebXR environment.

To be completely honest, I had to do some research on what different pianos look like and figure out what exactly an octave means before making a start, which tells you all you need to know about the breadth of my musical skills! But that’s the point of this post: you absolutely don’t need to be Mozart to get this working. I use the image below as a reference for the piano we’ll create in A-Frame; note that I will only create the keys for octaves 3 and 4 for simplicity.

An illustration of full piano keys showing corresponding keys for every Octave from 0 to 8
Reference image for the AR Piano. By AlwaysAngry — Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20429663

I use A-Frame’s primitive objects to create the piano. The positions I chose place the piano on top of a physical table in my working space, but feel free to adapt the values to your own environment. The scene also requests the hand-tracking XR feature for our WebXR session so we can access the Hand Input Module.
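The original markup listing isn’t reproduced here, but a minimal sketch of the scene could look like the snippet below. Treat it purely as an illustration: only a few of the 24 keys are shown, and the A-Frame version, ids, positions and dimensions are placeholder assumptions you should adapt; the key ids need to match the keyIds and noteFrequencies used later in the script.

<!DOCTYPE html>
<html>
  <head>
    <title>AR Piano</title>
    <!-- A-Frame version is an assumption; use whichever recent release you prefer -->
    <script src="https://aframe.io/releases/1.6.0/aframe.min.js"></script>
  </head>
  <body>
    <!-- request the hand-tracking feature for the WebXR session -->
    <a-scene webxr="optionalFeatures: hand-tracking">
      <!-- two white keys and one black key as examples; positions and sizes are placeholders -->
      <a-box id="C3" color="white" position="-0.55 0.8 -0.4" width="0.08" height="0.02" depth="0.3"></a-box>
      <a-box id="D3" color="white" position="-0.46 0.8 -0.4" width="0.08" height="0.02" depth="0.3"></a-box>
      <a-box id="Db3" color="black" position="-0.505 0.82 -0.47" width="0.05" height="0.02" depth="0.18"></a-box>

      <!-- A-Frame's built-in hand models, one per hand -->
      <a-entity id="leftHand" hand-tracking-controls="hand: left"></a-entity>
      <a-entity id="rightHand" hand-tracking-controls="hand: right"></a-entity>

      <!-- the custom component defined in the script below -->
      <a-entity hand-skeleton></a-entity>
    </a-scene>
  </body>
</html>

On the JavaScript side we register a custom hand-skeleton component, attached to an entity in the scene, which reads the hand input on every frame: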





AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    this.frame = null;
    this.referenceSpace = null;
  },

  tick: function () {
    if (!this.frame) {
      this.frame = this.el.sceneEl.frame;
      this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
    }

    if (this.frame && this.referenceSpace) {
      const session = this.el.sceneEl.renderer.xr.getSession();
      const inputSources = session.inputSources;

      for (const inputSource of inputSources) {
        if (inputSource.hand) {
          const handedness = inputSource.handedness;
          // main function to play sounds, add interactions etc.
        }
      }
    }
  },
});

So far, everything in this hand-skeleton component is business as usual: in the tick() function, which runs on every rendered frame of our WebXR scene, we grab the XRFrame and the reference space we are viewing the experience in, loop through the session’s inputSources, and read the hand and handedness attributes so we can later distinguish between right-hand and left-hand key presses. There is, however, one small change from the way I’ve done this previously: I am no longer rendering spheres for the detected joints. Instead, I make use of the hand model that comes with A-Frame’s hand-tracking-controls component, hence the two additional rightHand and leftHand entities in our scene. This provides a much better visualisation of the user’s hands on top of the piano.

Two tracked hands with fully extended fingers overlaid with two hand models on top of 3D virtual piano that is placed on a physical table.
A-Frame hand models for better visualisation.

Now we can move on to the MVP (most valuable player) of this post: the Web Audio API. The first thing we need to do is create an instance of AudioContext, the primary interface for working with the Web Audio API. Since it does not change during runtime, we can place it outside our custom hand-skeleton component, at the top level of the JS file:

const audioContext = new (window.AudioContext || window.webkitAudioContext)();
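One caveat worth keeping in mind (it isn’t shown in the demo snippets): browsers generally start an AudioContext in the “suspended” state until the page has received a user gesture, so it’s worth resuming it on the first interaction. A minimal sketch:

// The context often starts 'suspended' due to autoplay policies; resume it
// on the first user gesture (for example, the tap that starts the XR session).
document.addEventListener('click', () => {
  if (audioContext.state === 'suspended') {
    audioContext.resume();
  }
}, { once: true });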

Since I am keen to create the notes from scratch for this piano, we’ll need a function that plays synthesised notes. This function felt a bit crowded to me during development, so I’ll break it down into three parts to explain what’s going on:

function playNote(frequency) {
  const oscillator = audioContext.createOscillator();
  const gainNode = audioContext.createGain();
  oscillator.type = 'sine';
  oscillator.frequency.setValueAtTime(frequency, audioContext.currentTime);
  . . .
}

The oscillator node generates sound from a periodic waveform, such as a sine wave. gainNode is used to control the volume of the generated signal, and oscillator.frequency sets the pitch of the oscillator from the frequency value passed into the function. I chose a sine wave for this demo, but there are other wave types such as square, triangle and sawtooth (and even custom ones), so feel free to experiment with those.
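For example, swapping the wave type is a one-line change, and createPeriodicWave() lets you define a custom timbre from its harmonics. This is just a sketch; the harmonic values below are arbitrary, purely for illustration:

// Built-in alternatives to 'sine'
oscillator.type = 'triangle'; // or 'square', 'sawtooth'

// A custom waveform built from its first few harmonics (values picked arbitrarily)
const real = new Float32Array([0, 1, 0.5, 0.25]); // cosine terms
const imag = new Float32Array([0, 0, 0, 0]);      // sine terms
oscillator.setPeriodicWave(audioContext.createPeriodicWave(real, imag));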

Next we need to form an audio processing chain by connecting the output of one node to the input of another:

function playNote(frequency) {
  . . .
  oscillator.connect(gainNode);
  gainNode.connect(audioContext.destination);
}

And finally we need to add some controls to the oscillator:

function playNote(frequency) {
  . . .
  oscillator.start();
  gainNode.gain.exponentialRampToValueAtTime(0.001, audioContext.currentTime + 1);
  oscillator.stop(audioContext.currentTime + 1);
}

oscillator.start() starts the oscillator’s sound generation and oscillator.stop() stops it after a specified time. One extra thing I’ve done here, to show off the API’s ability to shape audio over time, is to add a decay to the note’s volume, simulating how a real piano note fades away. This is done using the exponentialRampToValueAtTime method, which schedules the gain value to decrease exponentially to 0.001 by the time (currentTime + 1) is reached.

The playNote() function is now all sorted and should look like this:

function playNote(frequency) {
  const oscillator = audioContext.createOscillator();
  const gainNode = audioContext.createGain();
  oscillator.type = 'sine';
  oscillator.frequency.setValueAtTime(frequency, audioContext.currentTime);
  oscillator.connect(gainNode);
  gainNode.connect(audioContext.destination);
  oscillator.start();
  gainNode.gain.exponentialRampToValueAtTime(0.001, audioContext.currentTime + 1);
  oscillator.stop(audioContext.currentTime + 1);
}

Next we need to add the note frequencies for each key in an object that we’ll pass to the playNote() function later; this can again go at the top of the JS file since it won’t change during runtime. I grabbed the piano key frequencies from the Wikipedia page, but you can of course play around with these values to achieve the sounds that work for you (the only advice I can give here, which is in no way scientific and purely based on trial and error, is that the values should gradually go up or down as you move along the piano so it sounds like a real one; alternatively, see the sketch after this listing if you’d rather compute the values):

const noteFrequencies = {
// octave 3
'C3': 130.8128, 'Db3': 138.5913, 'D3': 146.8324, 'Eb3': 155.5635, 'E3': 164.8138, 'F3': 174.6141, 'Gb3': 184.9972, 'G3': 195.9977, 'Ab3': 207.6523, 'A3': 220.0000, 'Bb3': 233.0819, 'B3': 246.9417,
// octave 4
'C4': 261.6256, 'Db4': 277.1826, 'D4': 293.6648, 'Eb4': 311.1270, 'E4': 329.6276, 'F4': 349.2282, 'Gb4': 369.9944, 'G4': 391.9954, 'Ab4': 415.3047, 'A4': 440.0000, 'Bb4': 466.1638, 'B4': 493.8833
};
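If you would rather derive these values than copy them, equal temperament makes it a one-liner: each semitone multiplies the frequency by 2^(1/12), with A4 = 440 Hz as the usual reference. Here is a small sketch of that idea; the offsets object is an assumption you would extend to cover all the keys you create:

// Equal temperament: frequency = 440 * 2^(semitones from A4 / 12)
const A4 = 440;
const semitonesFromA4 = { 'C3': -21, 'A3': -12, 'C4': -9, 'A4': 0, 'B4': 2 };

function frequencyOf(note) {
  return A4 * Math.pow(2, semitonesFromA4[note] / 12);
}

console.log(frequencyOf('C4').toFixed(4)); // 261.6256, matching the table above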

Now that all the audio-related parts of the script are set up, we need to create the main function in the hand-skeleton component: it takes the hand and handedness information and loops through the piano keys and detected joints, checking for collisions between them. For this we need to define two arrays at the top of the JS file, one for key ids and another for the joints:

const tipJoints = ['thumb-tip', 'index-finger-tip', 'middle-finger-tip', 'ring-finger-tip', 'pinky-finger-tip'];
const keyIds = [
// octave 3
'C3', 'D3', 'E3', 'F3', 'G3', 'A3', 'B3', 'Db3', 'Eb3', 'Gb3', 'Ab3', 'Bb3',
// octave 4
'C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4', 'Db4', 'Eb4', 'Gb4', 'Ab4', 'Bb4'
];

Note that I am only using the tip joints here, not all the detected joints. I made this adjustment after some initial testing with users, who found that tracking every joint caused too many mis-clicks; using only the tip joints is also a better reflection of how a piano is actually played. We also need to add a couple of variables to our init() function:

init: function () {
  . . .

  this.collidingKeys = { left: {}, right: {} };
  this.soundPlayed = { left: {}, right: {} };
},

The collidingKeys object keeps track of the keys currently being interacted with, and soundPlayed is used to prevent multiple sounds from playing simultaneously for the same key (i.e. one key press plays the corresponding sound once).

Now that we have all the needed info, we can create the two main functions for checking interactions and playing the sounds:

checkKeyInteractions: function (hand, handedness) {
  for (const keyId of keyIds) {
    const keyEl = document.querySelector(`#${keyId}`);
    const keyObj = keyEl.object3D;
    const keyBB = new THREE.Box3().setFromObject(keyObj);

    let isInteracting = false;

    for (const jointName of tipJoints) {
      const joint = hand.get(jointName);
      if (joint) {
        const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
        if (jointPose) {
          const jointPos = new THREE.Vector3().copy(jointPose.transform.position);

          if (keyBB.containsPoint(jointPos)) {
            isInteracting = true;
            break;
          }
        }
      }
    }

    if (isInteracting) {
      if (!this.collidingKeys[handedness][keyId]) {
        this.collidingKeys[handedness][keyId] = true;
        this.soundPlayed[handedness][keyId] = false;
      }
    } else {
      if (this.collidingKeys[handedness][keyId]) {
        delete this.collidingKeys[handedness][keyId];
        delete this.soundPlayed[handedness][keyId];
      }
    }
  }

  this.playKeySounds(handedness);
},

playKeySounds: function (handedness) {
  const keyIds = Object.keys(this.collidingKeys[handedness]);

  for (const keyId of keyIds) {
    if (!this.soundPlayed[handedness][keyId]) {
      const frequency = noteFrequencies[keyId];
      if (frequency) {
        playNote(frequency);
        this.soundPlayed[handedness][keyId] = true;
      }
    }
  }
}

All that’s left to do now is to call checkKeyInteractions() inside the inputSources loop:

for (const inputSource of inputSources) {
  if (inputSource.hand) {
    const handedness = inputSource.handedness;
    this.checkKeyInteractions(inputSource.hand, handedness);
  }
}

Two tracked hands overlaid with two hand models playing the keys of a 3D virtual piano.
Playing the virtual piano without any visual feedback. If only you could hear this photo!

So far this works and it’s pretty cool, but we absolutely need some visual feedback in there; it’s the least we can do without haptic feedback. I had a simple idea for this: change the colour of the pressed key depending on which hand is colliding with it. The key turns blue if the right hand presses it, and red if the left hand does. For this we’ll need to add two additional functions:

changeKeyColor: function (keyEl, handedness) {
  const color = handedness === 'left' ? 'red' : 'blue';
  keyEl.setAttribute('color', color);
},

resetKeyColor: function (keyEl) {
  const keyId = keyEl.getAttribute('id');
  const whiteKeys = [
    'C3', 'D3', 'E3', 'F3', 'G3', 'A3', 'B3', 'C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4'
  ];
  const blackKeys = [
    'Db3', 'Eb3', 'Gb3', 'Ab3', 'Bb3', 'Db4', 'Eb4', 'Gb4', 'Ab4', 'Bb4'
  ];

  if (whiteKeys.includes(keyId)) {
    keyEl.setAttribute('color', 'white');
  } else if (blackKeys.includes(keyId)) {
    keyEl.setAttribute('color', 'black');
  }
}

Two tracked hands overlaid with two hand models playing the keys of a 3D virtual piano. Pressed keys change colour to Red if the user is pressing with the left hand, and to Blue if pressing with the right hand.
Success! Playing the piano in WebXR activated!
A GIF showing two tracked hands playing the keys of a 3D virtual piano. Pressed keys change colour to Red if the user is pressing with the left hand, and to Blue if pressing with the right hand.
Take it away, Muadh!

I had a chance to do some informal testing of this demo and got some great feedback for improving it further, which I hope gives you some ideas as well:

  • Play around with key sizes, spacing and orientation: the spacing between the keys got positive feedback from testers, but I found the keys could be a bit smaller for a more natural playing feel. You could also tilt the keys slightly downwards to make them more comfortable to play.
  • Key press animation: due to the lack of tactile feedback, a key press animation would be a good addition for better visual feedback. You can use A-Frame’s animation component for this (the older a-animation element was removed in recent A-Frame releases); see the sketch after this list.
  • Think outside the box to provide tactile feedback: I used a physical table as tactile feedback for this demo, where users feel their key press by “tapping” on the table, which made a positive difference. I haven’t enforced this in the demo, but you can ensure the piano is always placed on a surface by using the Hit Test and Anchors modules of the WebXR Device API. Another interesting idea is to use the haptics on the controllers: place them on the same surface as the piano and have them vibrate whenever a key is pressed. A-Frame’s haptics component could provide a good starting point for this, and you can learn more about implementing vibrations on the web from the Vibration API.
  • Key sensitivity can be an issue: since users can’t actually “feel” the key press, mis-collisions can occur where a key is unintentionally “pressed”. This can be mitigated with custom thresholds, for example requiring fingertips to reach a certain penetration depth into a key, or requiring a key to be held for a minimum amount of time before it counts as a press.
  • I created the notes from scratch here, which is not the go-to way of using the Web Audio API. Audio playback is usually the most straightforward way to get started, and you can find various open source projects that provide piano key samples on GitHub, such as Salamander Grand Piano v3 and fuhton/piano-mp3.
  • Keep in mind that positional audio is a big part of WebXR environments: you can attach audio to different objects in your scene and/or at different locations and distances. I didn’t cover it in this post, but it’s definitely worth learning more about since it’s very relevant to WebXR experiences. You can check out the WebXR Immersive Web Sample or the Three.js documentation to learn more.
  • This demo was a piano, but the web is truly your canvas when it comes to this API, and you can simulate any other instrument!
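As a quick illustration of the key press animation idea above, here is a sketch using A-Frame’s animation component. These methods are not part of the original demo; the 1 cm travel and the durations are arbitrary choices, and you would call them from the same places changeKeyColor() and resetKeyColor() are called:

// Sketch: sink a key slightly on press and raise it again on release.
pressKey: function (keyEl) {
  if (keyEl.dataset.restY === undefined) {
    keyEl.dataset.restY = keyEl.object3D.position.y; // remember the resting height
  }
  keyEl.setAttribute('animation__press', {
    property: 'object3D.position.y',
    to: Number(keyEl.dataset.restY) - 0.01,
    dur: 80,
    easing: 'easeOutQuad'
  });
},

releaseKey: function (keyEl) {
  keyEl.setAttribute('animation__press', {
    property: 'object3D.position.y',
    to: Number(keyEl.dataset.restY),
    dur: 120,
    easing: 'easeOutQuad'
  });
},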

And that’s that, I hope this helps in unleashing your musical adventures in WebXR!

Muadh out — Until next time! 🤜🤛

P.S.

The full script should now look like this. You can also grab everything from the web-audio-hand-input-aframe GitHub repo:





function playNote(frequency) {
  const oscillator = audioContext.createOscillator();
  const gainNode = audioContext.createGain();
  oscillator.type = 'sine';
  oscillator.frequency.setValueAtTime(frequency, audioContext.currentTime);
  oscillator.connect(gainNode);
  gainNode.connect(audioContext.destination);
  oscillator.start();
  gainNode.gain.exponentialRampToValueAtTime(0.001, audioContext.currentTime + 1);
  oscillator.stop(audioContext.currentTime + 1);
}

const noteFrequencies = {
// octave 3
'C3': 130.8128, 'Db3': 138.5913, 'D3': 146.8324, 'Eb3': 155.5635, 'E3': 164.8138, 'F3': 174.6141, 'Gb3': 184.9972, 'G3': 195.9977, 'Ab3': 207.6523, 'A3': 220.0000, 'Bb3': 233.0819, 'B3': 246.9417,
// octave 4
'C4': 261.6256, 'Db4': 277.1826, 'D4': 293.6648, 'Eb4': 311.1270, 'E4': 329.6276, 'F4': 349.2282, 'Gb4': 369.9944, 'G4': 391.9954, 'Ab4': 415.3047, 'A4': 440.0000, 'Bb4': 466.1638, 'B4': 493.8833
};

AFRAME.registerComponent('hand-skeleton', {
  init: function () {
    this.frame = null;
    this.referenceSpace = null;
    this.collidingKeys = { left: {}, right: {} };
    this.soundPlayed = { left: {}, right: {} };
  },

  tick: function () {
    if (!this.frame) {
      this.frame = this.el.sceneEl.frame;
      this.referenceSpace = this.el.sceneEl.renderer.xr.getReferenceSpace();
    }

    if (this.frame && this.referenceSpace) {
      const session = this.el.sceneEl.renderer.xr.getSession();
      const inputSources = session.inputSources;

      for (const inputSource of inputSources) {
        if (inputSource.hand) {
          const handedness = inputSource.handedness;
          this.checkKeyInteractions(inputSource.hand, handedness);
        }
      }
    }
  },

  checkKeyInteractions: function (hand, handedness) {
    const tipJoints = [
      'thumb-tip', 'index-finger-tip', 'middle-finger-tip', 'ring-finger-tip', 'pinky-finger-tip'
    ];

    const keyIds = [
      // octave 3
      'C3', 'D3', 'E3', 'F3', 'G3', 'A3', 'B3', 'Db3', 'Eb3', 'Gb3', 'Ab3', 'Bb3',
      // octave 4
      'C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4', 'Db4', 'Eb4', 'Gb4', 'Ab4', 'Bb4'
    ];

    for (const keyId of keyIds) {
      const keyEl = document.querySelector(`#${keyId}`);
      const keyObj = keyEl.object3D;
      const keyBB = new THREE.Box3().setFromObject(keyObj);

      let isInteracting = false;

      for (const jointName of tipJoints) {
        const joint = hand.get(jointName);
        if (joint) {
          const jointPose = this.frame.getJointPose(joint, this.referenceSpace);
          if (jointPose) {
            const jointPos = new THREE.Vector3().copy(jointPose.transform.position);

            if (keyBB.containsPoint(jointPos)) {
              isInteracting = true;
              break;
            }
          }
        }
      }

      if (isInteracting) {
        if (!this.collidingKeys[handedness][keyId]) {
          this.collidingKeys[handedness][keyId] = true;
          this.soundPlayed[handedness][keyId] = false;

          this.changeKeyColor(keyEl, handedness);
        }
      } else {
        if (this.collidingKeys[handedness][keyId]) {
          delete this.collidingKeys[handedness][keyId];
          delete this.soundPlayed[handedness][keyId];

          const otherHand = handedness === 'left' ? 'right' : 'left';
          if (!this.collidingKeys[otherHand][keyId]) {
            this.resetKeyColor(keyEl);
          }
        }
      }
    }

    this.playKeySounds(handedness);
  },

  playKeySounds: function (handedness) {
    const keyIds = Object.keys(this.collidingKeys[handedness]);

    for (const keyId of keyIds) {
      if (!this.soundPlayed[handedness][keyId]) {
        const frequency = noteFrequencies[keyId];
        if (frequency) {
          playNote(frequency);
          this.soundPlayed[handedness][keyId] = true;
        }
      }
    }
  },

  changeKeyColor: function (keyEl, handedness) {
    const color = handedness === 'left' ? 'red' : 'blue';
    keyEl.setAttribute('color', color);
  },

  resetKeyColor: function (keyEl) {
    const keyId = keyEl.getAttribute('id');
    const whiteKeys = [
      'C3', 'D3', 'E3', 'F3', 'G3', 'A3', 'B3', 'C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4'
    ];
    const blackKeys = [
      'Db3', 'Eb3', 'Gb3', 'Ab3', 'Bb3', 'Db4', 'Eb4', 'Gb4', 'Ab4', 'Bb4'
    ];

    if (whiteKeys.includes(keyId)) {
      keyEl.setAttribute('color', 'white');
    } else if (blackKeys.includes(keyId)) {
      keyEl.setAttribute('color', 'black');
    }
  }
});
