A Concise Beginner’s Guide to Apple Vision Pro Design & Development

Apple Vision Pro has brought new ideas to the table about how XR apps should be designed, controlled, and built. In this Guest Article, Sterling Crispin offers up a concise guide for what first-time XR developers should keep in mind as they approach app development for Apple Vision Pro.

Guest Article by Sterling Crispin

Sterling Crispin is an artist and software engineer with a decade of experience in the spatial computing industry. His work has spanned product design and the R&D of new technologies at companies like Apple, Snap Inc, and various other tech startups working on face computers.

Editor’s Note:  The author would like to remind readers that he is not an Apple representative; this info is personal opinion and does not contain non-public information. Additionally, more info on Vision Pro development can be found in Apple’s WWDC23 videos (select Filter → visionOS).

Ahead is my advice for designing and developing products for Vision Pro. This article includes a basic overview of the platform, tools, porting apps, general product design, prototyping, perceptual design, business advice, and more.

Overview

Apps on visionOS are organized into ‘scenes’, which are Windows, Volumes, and Spaces.

Windows are a spatial version of what you’d see on a normal computer. They’re bounded rectangles of content that users surround themselves with. These may be windows from different apps or multiple windows from one app.

Volumes hold things like 3D objects or small interactive scenes, like a 3D map or a small game that floats in front of you rather than being fully immersive.

Spaces are fully immersive experiences where only one app is visible. A Space could be filled with many Windows and Volumes from your app, or it could be like a VR game where the system UI goes away and fully immersive content surrounds you. You can think of visionOS itself as a Shared Space where apps coexist together and you have less control, whereas a Full Space gives you the most control and immersion but doesn't coexist with other apps. Spaces come in three immersion styles, mixed, progressive, and full, which define how much or how little of the real world the user sees.
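
To make that concrete, here's a minimal sketch of how these scene types are declared in SwiftUI. The ids, the "Globe" asset, and the view contents are placeholders, and exact modifiers may vary by SDK version:

```swift
import SwiftUI
import RealityKit

@main
struct ExampleApp: App {
    // Which immersion style the Space is currently using
    @State private var immersionStyle: ImmersionStyle = .mixed

    var body: some Scene {
        // A Window: a bounded 2D panel living in the Shared Space
        WindowGroup {
            Text("Hello, visionOS")
        }

        // A Volume: a bounded 3D container that coexists with other apps
        WindowGroup(id: "globe") {
            Model3D(named: "Globe") // placeholder 3D asset
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)

        // A Space: immersive content; other apps hide while it's open
        ImmersiveSpace(id: "immersive") {
            RealityView { content in
                // add entities that surround the user here
            }
        }
        .immersionStyle(selection: $immersionStyle, in: .mixed, .progressive, .full)
    }
}
```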

User Input

Users can look at the UI and pinch, like the Apple Vision Pro demo videos show. But you can also reach out and tap on windows directly, sort of like it's actually a floating iPad. Or use a Bluetooth trackpad or video game controller. You can also look at a search bar and speak into it. There's also Dwell Control for eyes-only input, but that's really an accessibility feature. For a simple dev approach, your app can just respond to events like a TapGesture, and you won't need to worry about where those events originate.
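
For instance, here's a minimal sketch of that input-agnostic approach in SwiftUI (the view and its contents are hypothetical):

```swift
import SwiftUI

struct TapCounterView: View {
    @State private var taps = 0

    var body: some View {
        // The same TapGesture fires whether the user gazes and pinches,
        // reaches out and touches the view, or clicks with a trackpad.
        Text("Taps: \(taps)")
            .padding(40)
            .glassBackgroundEffect()
            .onTapGesture { taps += 1 }
    }
}
```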

Spatial Audio

Vision Pro has an advanced spatial audio system that makes sounds seem like they're really in the room by modeling the size and materials of your room. Using subtle sounds for UI interaction and taking advantage of sound design for immersive experiences is going to be really important. Make sure to take this topic seriously.
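
As a rough sketch of what that looks like in RealityKit (the "chime.wav" asset and gain value are placeholders, and initializer details may differ by SDK version):

```swift
import RealityKit

// Attach a subtle UI sound to an entity; RealityKit spatializes it
// from the entity's position and models the room's acoustics.
func playTapSound(on entity: Entity) async throws {
    let chime = try await AudioFileResource(named: "chime.wav") // placeholder asset
    entity.components.set(SpatialAudioComponent(gain: -10)) // keep UI sounds subtle
    entity.playAudio(chime)
}
```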

Development

If you want to build something that works across Vision Pro, iPad, and iOS, you'll be operating within the Apple dev ecosystem, using tools like Xcode and SwiftUI. However, if your goal is to create a fully immersive VR experience for Vision Pro that also works on other headsets like Meta's Quest or PlayStation VR, you have to use Unity.

Apple Tools

For Apple's ecosystem, you'll use SwiftUI to create the UI the user sees and the overall content of your app. RealityKit is the 3D rendering engine that handles materials, 3D objects, and light simulations. You'll use ARKit for advanced scene understanding, like if you want someone to throw virtual darts and have them collide with their real wall, or to do advanced things with hand tracking. But those rich AR features are only available in Full Spaces. There's also Reality Composer Pro, a 3D content editor that lets you drag things around a 3D scene and make media-rich Spaces or Volumes. It's like diet-Unity that's built specifically for this development stack.
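
Here's a minimal sketch of how those pieces meet: SwiftUI hosts a RealityView, and RealityKit renders the entities inside it (the dartboard geometry is just an illustrative placeholder):

```swift
import SwiftUI
import RealityKit

struct DartboardView: View {
    var body: some View {
        // RealityView is the bridge between SwiftUI and RealityKit:
        // SwiftUI owns the view, RealityKit renders the 3D content.
        RealityView { content in
            let board = ModelEntity(
                mesh: .generateCylinder(height: 0.02, radius: 0.25),
                materials: [SimpleMaterial(color: .red, isMetallic: false)]
            )
            content.add(board)
        }
    }
}
```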

One cool thing about Reality Composer Pro is that it's already full of assets, materials, and animations. That helps developers who aren't artists build something quickly, and it should help create a more unified look and feel across everything built with the tool. There are pros and cons to that product decision, but overall it should be helpful.

Existing iOS Apps

If you’re bringing an iPad or iOS app over, it will probably work unmodified as a Window in the Shared Space. If your app supports both iPad and iPhone, the headset will use the iPad version.

To customize your existing iOS app to take better advantage of the headset, you can use the Ornament API to make little floating islands of UI in front of or beside your app, to make it feel more spatial. Ironically, if your app is using a lot of ARKit features, you'll likely need to 'reimagine' it significantly to work on Vision Pro, as ARKit has been upgraded a lot for the headset.
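
A minimal sketch of an ornament, assuming a hypothetical media app (the buttons and anchor position are placeholders):

```swift
import SwiftUI

struct LibraryWindow: View {
    var body: some View {
        Text("Your existing iPad content") // stand-in for the real app view
            .ornament(attachmentAnchor: .scene(.bottom)) {
                // A floating island of controls just outside the window
                HStack {
                    Button("Back") { }
                    Button("Play") { }
                    Button("Next") { }
                }
                .padding()
                .glassBackgroundEffect()
            }
    }
}
```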

If you’re excited about building something new for Vision Pro, my personal opinion is that you should prioritize how your app will provide value across iPad and iOS too. Otherwise you’re losing out on hundreds of millions of users.

Unity

You can build for Vision Pro with the Unity game engine, which is a massive topic. Again, you need to use Unity if you're building for Vision Pro as well as other headsets like Meta's Quest or Sony's PSVR 2.

Unity supports building bounded Volumes for the Shared Space, which exist alongside native Vision Pro content, and unbounded Volumes for immersive content that may leverage advanced AR features. Finally, you can also build more VR-like apps, which give you more control over rendering but seem to lack support for ARKit scene understanding like plane detection. The Volume approach hands rendering control to RealityKit, so you have to use Unity's PolySpatial tool to convert materials, shaders, and other features.

Unity support for Vision Pro includes tons of interactions you'd expect to see in VR, like teleporting to a new location or picking up and throwing virtual objects.

Product Design

You could just make an iPad-like app that shows up as a floating window, use the default interactions, and call it a day. But like I said above, content can exist across a wide spectrum of immersion and locations, and it can use a wide range of inputs. So the combinatorial range of possibilities can be overwhelming.

If you haven’t spent 100 hours in VR, get a Quest 2 or 3 as soon as possible and try everything. It doesn’t matter if you’re a designer, or product manager, or a CEO, you need to get a Quest and spend 100 hours in VR to begin to understand the language of spatial apps.

I highly recommend checking out Hand Physics Lab as a starting point and overview for understanding direct interactions. There are a lot of subtle things it does that imbue virtual objects with a sense of physicality. And the YouTube VR app that was released in 2019 looks and feels pretty similar to a basic visionOS app; it's worth checking out.

Keep a diary of what works and what doesn’t.

Ask yourself: ‘What app designs are comfortable, or cause fatigue?’, ‘What apps have the fastest time-to-fun or value?’, ‘What’s confusing and what’s intuitive?’, ‘What experiences would you even bother doing more than once?’ Be brutally honest. Learn from what’s been tried as much as possible.

General Design Advice

I strongly recommend the IDEO style design thinking process, it works for spatial computing too. You should absolutely try it out if you’re unfamiliar. There’s Design Kit with resources and this video which, while dated, is a great example of the process.

The road to spatial computing is a graveyard of utopian ideas that failed. People tend to spend a very long time building grand solutions for the imaginary problems of imaginary users. It sounds obvious, but instead you should try to build something as fast as possible that fills a real human need, and then iteratively improve from there.

Spatial Formats and Interaction

You should expect people to be 'lazy' and to want to avoid moving most of the time. Generally in spatial computing, the more calories people burn using your app, the less they'll use it. I'm not saying you shouldn't build your VR boxing game, but you should minimize the required motion as much as possible, even if it's a fundamental part of what your app is.

To that point, the purpose of your app should be reflected in its spatial arrangements and interaction pattern—form follows function.

For example, if you’re making a virtual piano app, you probably want to anchor it on a desk so people make contact with a physical surface when they touch a key.
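
In RealityKit terms, that anchoring might look something like this sketch (the piano model is a placeholder, and plane classification is only available with the right Space and permissions):

```swift
import RealityKit

// Anchor content to a table-classified horizontal plane so the
// virtual keys line up with a real surface the player can touch.
func makeDeskAnchoredPiano() -> AnchorEntity {
    let desk = AnchorEntity(.plane(.horizontal,
                                   classification: .table,
                                   minimumBounds: [0.5, 0.3]))
    let piano = Entity() // placeholder: load your piano model here
    desk.addChild(piano)
    return desk
}
```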

There’s a saying like, ‘when you want to say something new, use a familiar language.’ If every aspect of your app is totally innovative it will likely be incomprehensible to users. So pick and choose your battles when it comes to innovation and make sure there’s a familiarity in the UI and experience.

Prototyping

I highly recommend paper and cardboard prototyping. Don't start in Figma. Literally get some heavyweight paper or cardboard and make crude models of your interface. If you're expecting users to directly touch your UI, pay attention to how much shoulder strain the design creates. Use masking tape against a wall and sticky notes to mock up some UI. Then take a few steps back from it, pretend you're in VR, and feel out how much head motion your layout requires.

Again I think everyone needs a Quest to try existing apps. And as prototyping tools they can be great, even before writing any code. There’s an app called ShapesXR that lets you sketch out ideas in space, create storyboards, and it supports real time collaboration with remote users. It can be a great tool during early development of spatial apps.

You can also use the Quest to mock up 'AR in VR' by creating a scene with a realistic virtual living room and having other objects appear as if they're AR. It's not as good as a full passthrough setup, but it's better than nothing. And the virtual living room is helpful if you're sharing the demo with people in other locations. If you have a bigger budget, Quest Pro will let you do full 'passthrough AR', just like Vision Pro, which can give you a head start in your prototyping process before you can get your hands on Apple's headset.

If you're building with the budget of a larger company, you might even want to consider a Varjo XR-3. It's the current Rolls-Royce of VR and the closest thing to Vision Pro on the market, with high-quality passthrough, high-res displays, hand tracking, world mapping, etc. But they're $6,500 each and need a $2–3k PC to power them. If you're a giant company with the budget and you're worried about getting access to a Vision Pro dev kit, I would probably get at least one XR-3 setup.

Visual and Perceptual Comfort

When designing for spatial computing in general you need to consider the whole body of your user, their sensory systems, and how their brain integrates those senses.

For example, you might arrange an iPhone app to have a menu near the bottom of a screen for easy reach of a user’s thumb. Likewise in VR you might arrange the UI to be centrally located to a user’s natural line of sight, so that head and eye motion is minimized. Every design choice has both ergonomic and cognitive impacts. Fitts’ Law is useful, but there are so many other things to consider.
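
For reference, Fitts' Law models the time to acquire a target as a function of its distance and size:

T = a + b · log₂(D/W + 1)

where T is the movement time, D is the distance to the target, W is the target's width along the axis of approach, and a and b are empirically fitted constants. In a headset, distance is effectively angular, which is part of why large targets near the natural line of sight are the cheapest to hit.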

I highly recommend watching this WWDC talk in its entirety if you’re new to spatial design. It covers a lot of perceptual and cognitive design constraints that are unique to spatial computing. Your design choices will either create or reduce eye fatigue, discomfort, and motion sickness. If your app makes people sick or hurts their neck, it’ll outweigh anything good your app does.

Users tend to attribute these problems to headsets themselves, but it’s also the responsibility of each app to keep users comfortable.

UI Design

Honestly, for UI design you should just copy as much as you can of what Apple has already figured out and published in their guidelines, so your app blends in. They make it easy to create a good-looking UI if you use their tools. But generally you want to be subtle with space and motion; don't go wild just because you can. Don't make icons or text 3D. Usually a 2.5D approach is best, where it's basically a 2D UI with some depth to communicate hierarchy. Again, look around at what works on the Quest and the decisions Apple made. You don't need to reinvent the wheel unless the point of your app is to experience novel kinds of interaction.
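
As a sketch of that 2.5D idea in SwiftUI (the card layout is hypothetical, and offset(z:) is a visionOS-only modifier):

```swift
import SwiftUI

struct CardView: View {
    var body: some View {
        VStack(alignment: .leading, spacing: 8) {
            Text("Now Playing").font(.title2)
            Text("A flat 2D layout, with a little depth for hierarchy.")
        }
        .padding(24)
        .glassBackgroundEffect()
        .overlay(alignment: .bottomTrailing) {
            Button("Play") { }
                .padding()
                .offset(z: 12) // a subtle lift, not fully 3D UI
        }
    }
}
```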

Web Design

Vision Pro is another device for responsive web design, but don't think of it as just another 2D screen. Like I said above, when you're designing for spatial computing it's valuable to make some paper prototypes, stick some stuff to a wall, back up, and understand how your design decisions will impact someone's whole body. You have to unlearn habits from desktop and mobile design.

Also, there are cool opportunities to use WebXR, which allows websites to become fully immersive VR experiences. If your website is media-rich or deals with anything potentially 3D, you should do something with WebXR. There are a few WWDC talks that cover this topic.

Games and Media

In my personal opinion, you'll likely want to use Unity to build a game or complex immersive experience for Vision Pro instead of the Apple ecosystem tools. Technically you could build a game with SwiftUI, RealityKit, and ARKit, but it might be pretty painful.

However, there are some reasons to use Apple's tools instead of Unity. Maybe you're an iOS dev who's super comfortable with Swift and Apple's dev ecosystem, you don't want to learn Unity, and you don't care whether your experience works on any other headset. Or maybe you want to build something that's extremely high performance and really pushes what's possible on the device.

Designing Games and Experiences

Everything in this article so far also applies to designing games and immersive experiences. And like I said, the combinatorial possibilities between spatial locations and input methods can be overwhelming. So brainstorm broadly at first, make paper prototypes, and try as much Quest content as you can.

If you're coming from filmmaking or 2D media, learn from game developers and how they guide a user's attention through an interactive 3D space. A lot of that is applicable to spatial computing.

Remember, an experience could be 2D and appear in a floating window, like a PS4 game shown on a virtual TV or in a movie theater. Or it could be anchored to the wall of a room, laid horizontally on the surface of a table, or placed on the floor. The same goes for 3D experiences, which can also become fully immersive like VR, or stay grounded in the player's real room. Or you can make weird 2.5D things with elements 'leaking' out of a 2D screen into the rest of the room.

Each one of those contexts provides totally different opportunities for gameplay mechanics. You should take advantage of these new possibilities and not just VR-ify a game that already exists.

As for inputs, you can look at something and pinch, or reach out and touch it if it's floating in space or anchored to a surface. You can also use Bluetooth game controllers, a keyboard and mouse, or your voice. All of these provide unique opportunities for gameplay.

And you could incorporate other devices. I think there’s a lot of opportunity to design around the fact that there will be way more iPhones than Vision Pros.

So what would an experience be like with one player in the Vision Pro and another on their phone, either in the same room or anywhere in the world? Maybe the Vision Pro wearer is like Godzilla, walking through a 3D city and destroying it, while the iPhone players look at a map and deploy army troops to stop them, which then show up in the immersive city. A few existing VR games are designed around this kind of asymmetry. Keep Talking and Nobody Explodes is a good example, where the VR user is trying to defuse a bomb and the player not in VR is telling them which wire to cut based on instructions only they can see.

Existing PC Games

Apple announced a developer tool called Game Porting Toolkit for bringing high-end PC games to the Mac. I'm not sure if they've explicitly said ported games could run on Vision Pro yet or not. But with the virtual desktop feature you'll be able to see your Mac as a giant virtual screen in the Vision Pro, connect a Bluetooth video game controller, and go nuts. If you're technically minded and don't want to wait, there are some guides on getting existing games like Elden Ring running on your Mac today. But remember this is a dev tool and existing games haven't been optimized.

Startups and Businesses

There’s obviously a ton of potential business use cases for Vision Pro and spatial computing in general. Everything from the Product Design section of this article heavily applies and I would read that first. Start with real problems that real people have and try to solve those. Resist the urge to imagine something fantastical people might eventually want in the far future.

Strengths of Spatial Computing

Spatial computing in general is great for teaching people spatial things. It would be much easier to learn how to assemble Ikea furniture in VR than using the paper instructions. To that point, the aircraft manufacturer Airbus got rid of all their paper instructions for assembly and does everything on tablet computers. And they’ve used the HoloLens to help speed up the process of installing hundreds of miles of wiring in airplanes during manufacturing.

Likewise, it’s great for viewing things at scale and relating to them with your body. Amazon’s mobile app already has a ‘View In Your Room’ feature for lots of products so you can see a couch in your room with AR and understand it in context and at proper scale. You can imagine how much better things like that might be in a headset.

The range of user inputs is going to be great for expressive applications. You could imagine an audio production app that simulates a ton of music equipment in a more tactile way, turning your desk into drum pads and keyboards.

And it’s obviously great for immersive media.

Overall there’s likely going to be a bunch of general uses that are 5% better because you can surround yourself with virtual screens. But I would try to focus on transformative moments and real problems real people have.

Weaknesses of Spatial Computing

It’s not great for anything you need to move quickly while doing. A spatial computing golf swing trainer that records your motion and plays back the best swings might sound fun. But wearing a headset while you’re doing that is probably going to give you motion sickness, the computer vision that tracks your body will likely fail, and the headset might go flying off your head and break.

An idea like that might work if you put an iPhone on a tripod and filmed the person using ARKit’s body tracking API. But even then, fast motion will likely break it.
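
For what it's worth, here's a rough sketch of that iPhone-side setup with ARKit's body tracking API (this is iOS ARKit, not visionOS; check current docs for joint names and device support):

```swift
import ARKit

// Run an ARSession with body tracking and read joint transforms
// as the tracked skeleton updates each frame.
final class SwingTracker: NSObject, ARSessionDelegate {
    private let session = ARSession()

    func start() {
        session.delegate = self
        session.run(ARBodyTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let body as ARBodyAnchor in anchors {
            // Joint transforms are relative to the body's root transform
            if let hand = body.skeleton.modelTransform(for: .rightHand) {
                print("right hand:", hand.columns.3) // translation column
            }
        }
    }
}
```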

VR and AR headsets are generally not great for long-term use. I think I read that the average PSVR user spends 50 minutes in an experience, which is already a long time. I don't have anything to say about Vision Pro on this topic. But in general, head-mounted displays have historically been best for short-duration use.

Fail Fast and Pivot

People say 'Fail Fast' but may not practice that mindset. This talk by Eric Ries is basically a condensed version of his Lean Startup book and is extremely relevant to product development for spatial computing. Build something fast that mocks up your idea, get user feedback, and pivot away from things that aren't working. You can 'Wizard of Oz' some features and fake them just to get user feedback quickly.

For Existing Products

You should ship as much of your existing app as you can to the Vision Pro. But try to identify specific moments that would benefit from spatial computing and make those the highlight of your app. Again, check out ShapesXR on the Quest and use it to make crude mockups in VR, then have people try them to get feedback. Resist making things overly spatial and spreading out content everywhere just because you can; it's easy to make a mess of an experience.