Cricket is at the heart of Australian sporting culture. But while you can see replays of famous matches on the television, what if you could experience a moment in cricket history play out right in front of you using stumps lying around in your backyard?
With augmented reality, this is possible — and something I recently explored as part of a prototype I developed for an AR cricket experience.
The concept was relatively straightforward: an AR experience involving a virtual cricket player walking up to stumps and performing a batting routine. But rather than simply create a canned AR experience that could only be used in a specific public space, I wanted to create an app that could detect physical cricket stumps. This way, people could use the app at home with their own cricket equipment, allowing them to own the AR experience.
Since the field of XR (Extended Reality) is still quite new and uses for this technology are yet to be fully explored, I spend much of my time as Art Processors’ XR Developer on research and development. So in this post, the first of a two-part series, I want to show you how I created this AR cricket experience.
The experience depended on reliable cricket stump detection, so we’ll look at the methods available for detection in the current AR landscape, and at how the most obvious solutions didn’t always work and called for some out-of-the-box thinking.
Developing an AR Cricket Experience: Exploring ARKit
For this prototype, I wanted to focus on using a single AR framework to test my ideas. With this in mind, I used Apple’s ARKit since it works natively on iOS and let me iterate quickly.
Apple breaks ARKit into 3 different layers: tracking, scene understanding and rendering. In this post, we’ll focus on exploring and extending the scene understanding aspects of ARKit.
Apple gives developers a few tools to help understand the AR world. The 2 we’ll be looking at in this post are image and object detection. In part 2 of this series, we’ll explore some of their limits using custom deep learning-based object and landmark detectors.
ARKit Object Detection
One of ARKit’s main features is the ability to detect and localise a 3D object in a scene. To accomplish this, you must first scan and then save a representation of this object (see below). When you’ve done this, ARKit can then detect the object in a new environment.
An object in the world (left) and its representation in ARKit after scanning (right). Source: Apple
To understand the limitations of this method (and ultimately why ARKit’s approach didn’t work for me), you must first look at how ARKit understands the world.
To accomplish world tracking, ARKit uses a method called visual inertial odometry (VIO). To break this down, let’s first look at the inertial part of VIO.
In inertial tracking, an inertial measurement unit (IMU) measures the change in position and rotation of the mobile device. It does this with the device’s gyroscope, which tracks rotation, and its accelerometer, which tracks changes in position. This solution isn’t perfect, however: to get a position from an accelerometer, the acceleration readings have to be integrated twice, so even small sensor errors accumulate into growing drift over time. Ultimately, this means that an IMU alone is insufficient for AR, which needs stable tracking.
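To get a feel for how quickly this error builds up, here is a minimal sketch in Python. It assumes a stationary device whose accelerometer reports a small constant error; the bias value and sample rate are made-up numbers for illustration, not measurements from a real phone, and this is not how ARKit processes IMU data internally.

```python
def integrate_position(accel_samples, dt):
    """Double-integrate acceleration samples (m/s^2) into a position offset (m)."""
    velocity = 0.0
    position = 0.0
    for accel in accel_samples:
        velocity += accel * dt      # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
    return position


SAMPLE_RATE_HZ = 100   # hypothetical IMU sample rate
BIAS = 0.05            # hypothetical constant accelerometer error in m/s^2
DT = 1.0 / SAMPLE_RATE_HZ


def drift_after(seconds):
    """Position error reported for a device that is actually not moving."""
    sample_count = int(seconds * SAMPLE_RATE_HZ)
    return integrate_position([BIAS] * sample_count, DT)


for seconds in (1, 5, 10):
    print(f"after {seconds:>2} s the device appears to have moved {drift_after(seconds):.3f} m")
```

Because the error is integrated twice, the drift grows roughly with the square of the elapsed time, which is why it needs to be corrected by another source of information.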
This is where the visual part of VIO comes in. In visual tracking, you can compare 2 images (e.g. the current and previous frame of a camera feed) using techniques like feature point detection and optical flow, which lets you calculate the movement of the device (see below). While not as responsive as inertial tracking, visual tracking allows ARKit to correct for the drift caused by the accumulated error of the inertial tracking.
Calculating the change in position of a device using visual odometry. Feature points are represented by the blue, red and orange circles. Source: Apple.
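As a rough illustration of the visual side, the sketch below estimates how far the image shifted between 2 frames from a handful of matched feature points. The point coordinates are invented, and the function only recovers a 2D shift in pixels; a real visual odometry pipeline estimates a full 3D pose, so treat this as a simplified stand-in rather than what ARKit does.

```python
from statistics import median


def estimate_image_shift(prev_points, curr_points):
    """Estimate the 2D pixel shift between two frames from matched points.

    prev_points[i] and curr_points[i] are assumed to be the same feature
    seen in the previous and current frame. The median keeps a few bad
    matches from skewing the result.
    """
    dxs = [curr[0] - prev[0] for prev, curr in zip(prev_points, curr_points)]
    dys = [curr[1] - prev[1] for prev, curr in zip(prev_points, curr_points)]
    return (median(dxs), median(dys))


# Hypothetical matched feature points (x, y) in pixels.
prev_points = [(10.0, 20.0), (50.0, 80.0), (120.0, 40.0), (200.0, 150.0), (75.0, 60.0)]
# Four points moved consistently; the last pair is a deliberate mismatch.
curr_points = [(13.0, 18.0), (53.0, 78.0), (123.0, 38.0), (203.0, 148.0), (30.0, 90.0)]

shift = estimate_image_shift(prev_points, curr_points)
print("estimated shift in pixels:", shift)
```

The estimate is only as good as the matched points, which is why the quality of the feature points matters so much in what follows.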
The key aspect we’ll be exploring in this post with regards to world tracking is this feature point detection step for the visual part of VIO.
There are many different algorithms for feature point detection, but the general goal is to find points in an image that are stable (i.e. won’t change too much) and distinct (i.e. unique enough so we can tell which points correspond to each other). With this in mind, most feature detection algorithms prefer images that have objects with a lot of texture, like a piece of wood.
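To make the preference for texture concrete, here is a small sketch of one classic way to score how "corner-like" an image patch is: the smaller eigenvalue of the patch's gradient structure tensor (the idea behind the Shi-Tomasi detector). This is an illustration only, using tiny hand-made grayscale patches; I don't know whether ARKit uses anything like it.

```python
def min_eigen_score(patch):
    """Score a grayscale patch by the smaller eigenvalue of its structure tensor.

    A high score means the brightness changes in two directions (a corner or
    rich texture). A score near zero means a flat area or a straight edge.
    """
    height, width = len(patch), len(patch[0])
    sxx = sxy = syy = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0  # horizontal gradient
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0  # vertical gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy * sxy
    disc = max(half_trace * half_trace - det, 0.0)
    return half_trace - disc ** 0.5


SIZE = 7
flat = [[200 for x in range(SIZE)] for y in range(SIZE)]                  # solid colour
edge = [[255 if x >= 3 else 0 for x in range(SIZE)] for y in range(SIZE)]  # straight edge
corner = [[255 if x >= 3 and y >= 3 else 0 for x in range(SIZE)] for y in range(SIZE)]

flat_score = min_eigen_score(flat)
edge_score = min_eigen_score(edge)
corner_score = min_eigen_score(corner)
print("flat:", flat_score, "edge:", edge_score, "corner:", corner_score)
```

Only the corner patch gets a meaningful score. A solid-coloured surface, and even its straight outline, gives a detector like this nothing stable to lock onto.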
Relating this back to object detection, it turns out that the method used by the ARKit object detector is based on the features found during visual tracking. This means that an ideal object to scan will also have varying texture and visually distinct points, similar to an ideal scene for world tracking.
Unfortunately, this wasn’t the case with the solid yellow cricket stumps I used for my prototype.
To explore this further (and hopefully help anyone reading this who might run into these issues), let’s take a look at what happens when you scan an object like the cricket stumps pictured below.
The stumps are first scanned (left). Then a test is done in the same environment which is successful (middle). Lastly, a test in a different environment is unsuccessful (right).
Scanning was successful and when tested in the scanned environment, ARKit was able to recognise and localise the stumps in a fairly short amount of time (3-5 seconds). The problem, however, was when the object was moved to a new environment — no matter which angle I tried, recognition was never successful.
Note: I’m not sure which feature detector ARKit uses behind the scenes, but ORB is fast, accurate and free, so it is a solid choice for illustrating the problem.
A test image fed to the ORB feature detector (left). Detection results shown in green and overlaid on the test image (right).
As you can see in the image above, the points that ORB detected are shown in green. To understand why the object couldn’t be recognised once it was moved, take a look at the top left of the cricket stumps. There, you can see that the detected features were caused by the woodgrain background intersecting with the stump.
This means that if I moved the stumps to any other surface, the background would likely not produce the same corresponding points, so the point cloud for the stumps would be totally different. This ultimately led to ARKit not being able to find the object in any environment other than the one it was scanned in.
Aside: Merging ARKit Object Scans
The ARKit scanning app has the ability to take scans of an object in multiple locations and merge the results in an attempt to fix some of these scene dependencies. However, when I tested this, the stumps were so scene-dependent that the merging process wasn’t successful.
With this in mind, it was time to try a different approach.
ARKit Image Detection
ARKit offers another method for finding user-defined objects in the form of image detection.
Image detection works on similar principles to object detection. However, instead of using the 3D features of the VIO, image detection works directly with the output from a feature detector.
When you run the same detector on a 2D reference image, ARKit is then able to compare the current scene with this image. If enough points match between the 2 images, a perspective transform can be applied to localise the image in the AR world (see below).
A feature detection algorithm is run on a reference image (left) and the current scene (right). If enough matching points exist, a perspective transform is applied to find the transform of the image in the world. Source: Apple
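The matching step can be sketched as follows. Detectors such as ORB describe each feature point with a binary descriptor, and 2 descriptors are compared by counting the bits that differ (the Hamming distance). The 8-bit descriptors, the distance threshold and the function names below are all made up for illustration; real ORB descriptors are 256 bits long, and ARKit’s own matching logic isn’t public.

```python
def hamming(a, b):
    """Number of bits that differ between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_descriptors(reference, scene, max_distance=2):
    """Pair each reference descriptor with its closest scene descriptor.

    A pair is kept only if the descriptors differ in at most max_distance bits.
    Returns a list of (reference_index, scene_index) tuples.
    """
    matches = []
    for ref_index, ref in enumerate(reference):
        scene_index, distance = min(
            ((j, hamming(ref, candidate)) for j, candidate in enumerate(scene)),
            key=lambda pair: pair[1],
        )
        if distance <= max_distance:
            matches.append((ref_index, scene_index))
    return matches


def image_detected(matches, min_matches=4):
    """A perspective transform needs at least 4 point correspondences."""
    return len(matches) >= min_matches


# Toy 8-bit descriptors (hypothetical values).
reference = [0b10110010, 0b01101100, 0b11100001, 0b00011110, 0b11111111]
scene = [0b00011111, 0b10110010, 0b01001100, 0b11100011, 0b00000000]

matches = match_descriptors(reference, scene)
print("matches:", matches)
print("enough to localise the image:", image_detected(matches))
```

In practice you would want far more than the minimum of 4 matches, since some of them will be wrong and have to be rejected when the transform is fitted.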
This kind of image detection has one important requirement: for the perspective transform to be correct, the image in the world must be flat, just like the reference image. This means that it only works on flat images like a painting or printed picture.
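A perspective transform between 2 views of a flat surface can be written as a 3x3 matrix (a homography). The sketch below applies one to the corners of a reference image to find where they would land in the camera frame. The matrix values and image size are invented for illustration, and this is a generic example of the maths rather than ARKit’s implementation.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography, including the perspective divide."""
    x, y = point
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xs / w, ys / w)


# Hypothetical homography: shifts the image and tilts it away on the right side.
H = [
    [1.0, 0.0, 100.0],
    [0.0, 1.0, 50.0],
    [0.001, 0.0, 1.0],
]

# Corners of a hypothetical 200 x 100 pixel reference image.
reference_corners = [(0.0, 0.0), (200.0, 0.0), (200.0, 100.0), (0.0, 100.0)]
scene_corners = [apply_homography(H, corner) for corner in reference_corners]

for ref, scn in zip(reference_corners, scene_corners):
    print(f"{ref} -> ({scn[0]:.1f}, {scn[1]:.1f})")
```

A single matrix like this can only describe how a plane maps to another plane. Points on a curved surface move in ways that no single homography can capture, which is why the flatness requirement matters.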
Knowing this, I was curious to see if the image recognition could work on a sticker attached to the curved surface of one of the cricket stumps.
Short answer, no (see below).
A sticker placed on a stump to test 2D image recognition. No results found.
Looking back, I may have been able to get it to work if I used a smaller sticker or one with better features (as image detection relies on feature point detection, a textured image would’ve worked better). But the initial failure led me to think that there must be a better way to achieve cricket stump detection.
Ultimately, ARKit didn’t work and I had to find another method for localising the cricket stumps in the AR world. In part 2 of this series, we’ll explore other methods I tested, and my ultimate success in creating an AR cricket experience prototype.
Hopefully, this post has been informative if you’re seeking to use object detection for an AR experience. If you would like to know more about the concept or the motivation behind creating this experience, get in touch: @email.