Most conversations about AI still center on text, usually in the form of chatbots, documents, or code. These tools live in cloud and desktop environments, while the work they aim to support takes place elsewhere: on job sites, shop floors, and in the physical world. But that boundary is beginning to blur. A growing number of AI models are designed to operate with real-time awareness of their surroundings, generating responses based on images, proximity, and physical context.
Many are calling this emerging category “physical AI”: systems designed to understand and respond to the physical world in real time. We wrote about this conceptually in late 2024, in The Rise of Spatial Intelligence. Now it’s becoming a reality.
Instead of working from language alone, these new models incorporate visual input, spatial cues, and location-specific context. The result is a new kind of interface that operates alongside users in space, reacts to physical presence, and delivers utility based on environmental signals.
Several of the largest technology companies are now building toward this shift.
Apple recently introduced new features in visionOS that let digital content live within physical environments: for example, spatial widgets that anchor to real-world surfaces and shared experiences for viewing 3D designs. The direction is clear: Apple is treating physical space as an extension of the user interface.
Niantic, once known for consumer AR titles, has restructured its business around infrastructure. After selling its games division (including Pokémon Go) for $3.5B, the company raised $250M for a new initiative, “Niantic Spatial,” focused entirely on geospatial mapping and semantic understanding of the physical world. At the AWE conference, it announced a partnership with Snap to begin constructing a shared 3D model of the physical world, infused with AI to identify objects, places, and meaning.
Meta continues to pursue physical AI through a long-term platform strategy. Its work on AR hardware, including the Orion project, and its Llama family of open-source AI models reflect a goal to become the next computing platform, not just an app developer. Meta appears to be targeting a vertically integrated stack: language, vision, interface, and device. That ambition is taking shape through products like the second-generation Ray-Ban smart glasses, which feature on-device AI for image capture, audio, and voice interaction. The company also plans to expand the line through partnerships with Oakley and Prada, signaling a push to make ambient, AI-powered computing part of everyday wear.
Google is making advances in parallel, but through a different channel. Late last year it announced Android XR, its operating system for physical AI. (See Will AI’s Future Form Factor Be AR Glasses?) This year at InfoComm, Google announced that Beam—formerly Project Starline—will launch later this year in collaboration with HP and Zoom. The system uses AI-enhanced cameras, depth sensors, and spatial audio to create high-fidelity 3D telepresence. Beam creates a sense of shared presence through light-field displays and multi-view capture, collapsing the distance between two physical spaces. It even allows for real-time translation between languages.
These developments point to a shift in how computing surfaces in the world. Interfaces are starting to move beyond phones and monitors, showing up across physical spaces and shared environments. AI is no longer limited to typed prompts or text-based exchanges—it’s beginning to shape how information is delivered in real time, based on what’s happening nearby.
Whether embedded in spatial displays, wearables, or shared environments, these systems rely on a new stack: sensors that observe, models that interpret, and interfaces that stay lightweight enough not to distract. The goal is alignment between the machine’s understanding and the user’s current situation. What makes this shift notable is the design constraint it implies—fitting knowledge into the world around us.
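As a concrete illustration of that stack, here is a minimal sketch of the loop such a system might run. It is illustrative only: every name in it (SceneSnapshot, interpret_scene, choose_hint, situational_loop) is invented for this example and does not reflect any vendor’s actual API.

```python
# A hypothetical sketch of the stack described above: sensors observe, a model
# interprets, and the interface surfaces only what matters. All names here are
# invented for illustration; this is not any platform's real API.
from dataclasses import dataclass


@dataclass
class SceneSnapshot:
    image: bytes                   # camera frame
    depth_map: bytes               # proximity / depth reading
    location: tuple[float, float]  # latitude, longitude


@dataclass
class SceneUnderstanding:
    objects: list[str]             # what the model recognized nearby
    user_activity: str             # e.g. "walking", "inspecting equipment"


def interpret_scene(snapshot: SceneSnapshot) -> SceneUnderstanding:
    """Stand-in for a multimodal model fusing image, depth, and location."""
    # A real system would call a vision model here; this stub returns a
    # fixed reading so the loop below runs end to end.
    return SceneUnderstanding(objects=["forklift"], user_activity="walking")


def choose_hint(scene: SceneUnderstanding) -> str | None:
    """Keep the interface lightweight: speak up only when it is relevant."""
    if "forklift" in scene.objects and scene.user_activity == "walking":
        return "Heads up: forklift operating ahead"
    return None                    # otherwise stay silent rather than distract


def situational_loop(sensor_stream, display) -> None:
    """Align the machine's understanding with the user's current situation."""
    for snapshot in sensor_stream:         # sensors that observe
        scene = interpret_scene(snapshot)  # models that interpret
        hint = choose_hint(scene)          # interfaces that stay lightweight
        if hint is not None:
            display.show(hint)
```

The telling design choice sits in choose_hint: the default is silence. An interface that lives in physical space earns its place by surfacing information only when the surroundings warrant it.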
This is where the concept of situational computing starts to take shape: intelligence that adapts to its surroundings, in context and in real time—better reflecting how people interact with the world around them.