Taking a screenshot of the web page you’re browsing is a pretty straightforward task. It’s almost second nature for users now to simply press the “PrtScn” key on their keyboard to grab a screenshot. So why am I yapping about an ancient feature that’s been in our lives since roughly the 1980s?
Here is why: I was using an XR headset recently and needed to take a screenshot for something I was working on, and for that I had to press two buttons simultaneously (i.e. the infamous power + volume up/down combo). This wasn’t really surprising; I have been using XR devices for a while and I am fully aware that simultaneous button presses are the default method for taking screenshots. But it did get me thinking (typing?)… why is the default method so physically demanding? And why is it so similar to how we take screenshots on mobile phones, even though the form factor and ergonomics of a mobile phone and a headset are so starkly different?
Join me in questioning some fundamental design choices in XR in this post to explore the rationale behind them.
Physically? At this point in time? Not at all! Phones and XR headsets look, weigh and feel different. But they are both smart devices that are powered by similar (if not the same) operating systems. This is mainly why some user interaction methods and UI elements migrate, in a way, from a phone to an XR headset. For example, if you are using an Android phone, you need to press (and release) the power and volume down buttons simultaneously to take a screenshot. Now if you use an Android XR headset, you will most likely be expected to carry out the same action. This is also generally applicable to other operating systems for phones and their headset counterparts.
From a design/user experience point of view this makes perfect sense: users already have many years of experience taking screenshots this way on their phones, so there is a minimal learning curve, and development-wise you’re minimising potentially duplicate work by re-using components that can work across both devices. This all makes sense, but I do have a few issues with some of these user experience assumptions when it comes to the default method of taking screenshots:
Same action, but more limbs
Taking a screenshot by clicking on two buttons simultaneously on a mobile phone can generally be done using one hand. Using an XR headset however, this manual action becomes bi-manual and requires the use of both hands to do comfortably — mainly due to the buttons being placed at opposite ends of the headset. More limbs means higher physical and mental loads — this is not to say that it’s impossible to press on two buttons simultaneously using one hand when wearing a headset, but it’s highly likely to be uncomfortable and also cause the headset to physically move slightly while worn, which accordingly affects the quality of the screenshot.
Same action, but different physical loads
Phones and XR headsets have, obviously, different designs, and even though users are expected to do the same action to take a screenshot (i.e. press two buttons simultaneously), they are completing that action across two very different physical widths. Most mobile phones are roughly 2 to 3 inches wide, while XR headsets are roughly 4 to 8 inches wide, so users are expected to press buttons across at least double the width. Another key difference between doing the same action on the two devices is the height and arm movement required to reach the buttons. Users can generally take a screenshot on a mobile phone while resting their hand in practically any position. Using an XR headset, however, users are generally required to raise (and momentarily hold) both arms to reach and press two buttons simultaneously on the top of the headset.
Different clicking consequences
When taking a screenshot on your phone by pressing two buttons simultaneously, there is generally no pressure on any other body part (you may argue there is a bit of physical strain on the hand from holding the phone, but that’s super minor). Using a headset, however, pressing buttons puts some pressure/strain on the face and head wearing it (you are wearing something on your face and forehead after all). For some this may even be slightly painful if the headset is not secured in place while pressing the buttons, which is far from ideal and far from user friendly.
There is a learning curve (kind of)
I know I said earlier there is a minimal learning curve to pressing two buttons simultaneously, but allow me to change my mind slightly. I personally think there kind of is a learning curve when it comes to taking screenshots using XR headsets, as users need to know where the right buttons are on a headset, and they are not in the same physical positions as on a phone. This learning curve becomes even steeper for novice users who haven’t used XR headsets before.
This will probably be hard to believe after everything I said above, but I am not actually saying we should get rid of simultaneous button clicks to take screenshots. It’s a great, secure and very familiar method for users (though users are definitely interacting with two different devices in terms of physical features and form factor). Pressing two buttons simultaneously greatly limits human error by design (i.e. taking screenshots by accident) because it’s a very intentional user action where actual physical buttons need to be pressed.
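For the developers reading, here is a minimal Python sketch of why a two-button chord is hard to trigger by accident. It is not how any particular headset or operating system implements it: the `ScreenshotChord` class, the button names and the 0.3 second window are all assumptions I made up for illustration.

```python
# Hypothetical sketch: a screenshot only fires when BOTH buttons are held
# and their presses land within a short time window of each other.
CHORD_BUTTONS = ("power", "volume_down")  # assumed button names
CHORD_WINDOW_S = 0.3                      # assumed tolerance, in seconds


class ScreenshotChord:
    def __init__(self, window_s=CHORD_WINDOW_S):
        self.window_s = window_s
        self.pressed_at = {}  # button name -> time it was pressed

    def press(self, button, t):
        """Record a button press at time t; return True if the chord fired."""
        self.pressed_at[button] = t
        return self._is_chord()

    def release(self, button):
        """Forget a button once it is released."""
        self.pressed_at.pop(button, None)

    def _is_chord(self):
        # Every chord button must currently be held down...
        if not all(b in self.pressed_at for b in CHORD_BUTTONS):
            return False
        # ...and the presses must be close enough together in time.
        times = [self.pressed_at[b] for b in CHORD_BUTTONS]
        return max(times) - min(times) <= self.window_s


if __name__ == "__main__":
    chord = ScreenshotChord()
    print(chord.press("power", t=0.00))        # False: only one button held
    print(chord.press("volume_down", t=0.10))  # True: both held, 0.1 s apart
```

A single stray press never completes the chord, which is exactly the "very intentional" property described above. The trade-off is that the user has to physically reach and hold both buttons.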
What I am trying to say in this post is that it is absolutely crucial to provide users with different options for taking screenshots, for better accessibility and overall user experience. The good news is that most, if not all, XR headset manufacturers provide an alternative controller based method for taking screenshots (in addition to the default method of pressing two buttons simultaneously). The not so great news, however, is that not many provide natural methods to take screenshots via voice, gaze or gesture. This blows my mind slightly because we heavily rely on natural methods of interaction in immersive environments anyway, so it makes sense to add an alternative method that makes use of natural user interfaces in XR. At this point you’d be right to think “add yet another method? Isn’t that a bit of an overkill for a simple screenshot action?” and that’s a good point, but hear me out in these super quick scenarios:
- User wants to take a screenshot, but can’t physically reach the buttons: they can use a simple voice command.
- User wants to take a screenshot, but their controller battery ran out and they can’t use it: they can press two buttons simultaneously or use a simple voice command.
- User wants to take a screenshot, but they can’t speak out loud in the environment they’re in and can’t press physical buttons: they can use a gesture based method.
Because interaction in XR goes beyond clicking on physical buttons and desktop like interactions, providing alternative methods that suit different user actions and immersive contexts would ensure users are never stuck, and would generally improve accessibility and usability.
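If you build XR apps, the scenarios above boil down to a simple "what can this user do right now?" check. Here is a rough Python sketch of that idea. Everything in it is hypothetical: the `UserContext` fields, the method names and the `available_capture_methods` function are my own illustration, not a real headset API.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical snapshot of what the user can do at this moment."""
    can_reach_buttons: bool   # able to press the physical buttons
    controller_charged: bool  # controller is on and has battery
    can_speak_aloud: bool     # environment allows voice commands
    hands_visible: bool       # hand tracking can see a gesture


def available_capture_methods(ctx):
    """Return every screenshot method the user could use in this context."""
    methods = []
    if ctx.can_reach_buttons:
        methods.append("button_combo")
    if ctx.controller_charged:
        methods.append("controller_shortcut")
    if ctx.can_speak_aloud:
        methods.append("voice_command")
    if ctx.hands_visible:
        methods.append("hand_gesture")
    return methods


if __name__ == "__main__":
    # Third scenario: can't speak, can't press buttons, controller is dead.
    quiet_room = UserContext(
        can_reach_buttons=False,
        controller_charged=False,
        can_speak_aloud=False,
        hands_visible=True,
    )
    print(available_capture_methods(quiet_room))  # ['hand_gesture']
```

The point of the sketch is the empty list: with only the button combo and a controller shortcut on offer, the user in the third scenario would get nothing back. Every extra method makes that dead end less likely.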
The main point I am trying to get across here is that an action being the same (i.e. you press two physical buttons simultaneously) doesn’t necessarily mean it carries the same physical and mental loads across immersive and non-immersive devices.
With XR devices now progressing rapidly in terms of physical design, providing alternative methods and more natural ones that are suitable for immersive experiences will ensure user needs/preferences are met. This is especially important if future XR devices start moving away from physical buttons and controllers — which is not very far off. Providing more natural methods ensures we are ahead of this inevitable curve.
Screenshots (and even screen recordings) are important features for XR experiences and should ideally be very easy to access in whatever context the user is in, as they are one of the main methods for users to share what they’re experiencing and seeing in immersive environments.
Make taking screenshots easier. The fewer the steps the user needs to take — the better.
Muadh out — Until next time! 🤜🤛