XbotGo has released three generations of AI sports cameras over the years:
- The original XbotGo Gimbal was essentially a 3-axis gimbal. Tracking and video processing ran on the smartphone.
- Chameleon is a 2-axis motorized pan/tilt mount for smartphones. Tracking runs on the device itself, while video processing still relies on the smartphone.
- Falcon is a standalone, all-in-one device that can pan and tilt by itself, with both tracking and video processing handled on the device.
FoloCam is conceptually closest to the first-generation gimbal approach, but instead of proprietary hardware, it uses Apple’s DockKit framework to control third-party smart gimbals. To the best of our knowledge, this is the first use of DockKit specifically for automated sports tracking.
AI Processing Power
The difference in available AI computing power is substantial. The Chameleon is reportedly equipped with a 2 TOPS Neural Processing Unit (NPU), while the Falcon’s NPU is rated at 6 TOPS — roughly equivalent to an iPhone 11.
FoloCam, by contrast, runs on the iPhone itself and supports iPhone 13 and newer. The iPhone 13 already delivers roughly 16 TOPS, and newer models continue to scale upward, with all iPhone 16 and 17 series models reaching 35 TOPS. Because the computing power comes from the iPhone, FoloCam can use progressively larger and more sophisticated tracking models on newer devices, improving accuracy over time without requiring new dedicated hardware.
As a concrete example of what more AI power can bring, on newer iPhones (such as the iPhone 15 or iPhone 14 Pro), FoloCam supports a more advanced AI tracking model that improves tracking accuracy and reliability. This model is too slow to run on older devices.
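To put the quoted figures side by side, the short Python sketch below expresses each device's NPU rating as a multiple of the Falcon's. The numbers are the reported, approximate values from the text above. The dictionary and function names are illustrative only. TOPS is a peak rating, so these ratios indicate headroom rather than actual model speed.

```python
# TOPS figures as quoted in the text above (reported, approximate values).
NPU_TOPS = {
    "XbotGo Chameleon": 2,
    "XbotGo Falcon": 6,
    "iPhone 13": 16,
    "iPhone 16/17 series": 35,
}


def relative_compute(tops, baseline="XbotGo Falcon"):
    """Return each device's TOPS as a multiple of the baseline device's TOPS."""
    base = tops[baseline]
    return {name: value / base for name, value in tops.items()}


ratios = relative_compute(NPU_TOPS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Falcon's NPU rating")
```

On these figures, the iPhone 13 has a little under three times the Falcon's rated NPU throughput, and the iPhone 16 and 17 series a little under six times.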
Camera
The Falcon's main sensor records 4K video with roughly a 90-degree horizontal field of view. That yields about 44 pixels per degree, slightly better than most 180° panoramic systems. The secondary sensor is only 2K, and its exact function is not clearly documented. The system does not appear to offer optical zoom, so zooming relies on cropping the main sensor, which reduces image quality at higher zoom levels.
Modern iPhones offer more flexible imaging options. Standard iPhone 15 and newer models provide lossless 2× zoom from the main sensor, which aligns well with FoloCam’s recommendation of using 2× zoom to film full soccer fields. Pro models also include a dedicated telephoto camera (3× or longer, depending on the model), allowing FoloCam to capture 4K footage with significantly more detail at long distances.
For a soccer game played on a full-size field and filmed at 2× zoom, the 4K frame covers a horizontal field of view of only 35°. This gives over 115 pixels per degree, roughly 2.5 times more detail than the XbotGo Falcon. The following table shows how subjects appear at 75 yards, a typical distance for filming a full-size soccer field:
| | XbotGo Falcon | FoloCam (2× zoom) |
|---|---|---|
| Pixels per degree | 44 | 115 |
| Player (5 ft tall) at 75 yards | 14px by 56px | 36px by 144px |
| Soccer ball (9″ diameter) at 75 yards | 9px | 23px |
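The table values can be approximated from basic geometry: an object's angular size at a given distance, multiplied by the camera's pixels per degree. The Python sketch below takes the 44 and 115 pixels-per-degree figures from the table. The function and variable names are illustrative, and lens distortion is ignored.

```python
import math

FEET_PER_YARD = 3.0


def angular_size_deg(size_ft, distance_ft):
    """Angle in degrees subtended by an object of size_ft seen from distance_ft."""
    return math.degrees(2 * math.atan(size_ft / (2 * distance_ft)))


def size_in_pixels(size_ft, distance_ft, pixels_per_degree):
    """Approximate on-sensor size of the object in pixels."""
    return angular_size_deg(size_ft, distance_ft) * pixels_per_degree


DISTANCE_FT = 75 * FEET_PER_YARD  # 75 yards, as in the table

# Subject sizes in feet, taken from the table rows.
SUBJECTS = {"player height": 5.0, "ball diameter": 9 / 12}

# Pixels-per-degree figures taken from the table.
SYSTEMS = {"XbotGo Falcon": 44, "FoloCam (2x zoom)": 115}

estimates = {
    (system, subject): size_in_pixels(size_ft, DISTANCE_FT, ppd)
    for system, ppd in SYSTEMS.items()
    for subject, size_ft in SUBJECTS.items()
}

for (system, subject), pixels in estimates.items():
    print(f"{system}, {subject}: {pixels:.0f} px")
```

This estimate gives roughly 56 px and 146 px for the player's height, and about 8 px and 22 px for the ball. These are close to the table's figures; the small differences presumably come from rounding or slightly different input values.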
Tracking
Tracking philosophy is where the approaches diverge most. Because FoloCam typically operates at around 2× zoom combined with higher-resolution sensors and stronger AI models, it can directly track the ball in many situations. Ball-centric tracking aims to keep the true focal point of the game in frame.
XbotGo, on the other hand, tends to rely more on player movement, estimating where the action is rather than identifying the ball itself. This can lead to occasional distractions when referees, coaches, substitutes, or spectators move near the field edge, since those movements may appear more prominent than the actual play.
Can the XbotGo Falcon apply similar ball-centric tracking techniques with software updates alone? Probably not, for two reasons. First, digitally upscaling the 9-pixel ball adds no real detail, so it would not significantly improve the ball's visibility. Second, even if the 2K sensor could be used to provide a better pixels-per-degree ratio, the Falcon's lower AI processing power (6 TOPS) would likely limit its ability to run more advanced tracking models.
Long-Term Hardware Strategy
In hindsight, one could argue that XbotGo's move away from the original gimbal concept, just as smartphone hardware began advancing rapidly, was perhaps not the best choice. A phone-based system benefits automatically from each new generation of processors, cameras, and sensors. As iPhones become more powerful, FoloCam can run larger models, use better zoom capabilities, and improve performance through software updates alone, without users needing to replace specialized equipment. This flexibility may become increasingly important as mobile AI hardware continues to evolve.