Project Chappie
Planning, expected July 2026

Phase 1 — Brain on the Bench

An AI that sees a room and answers questions about it. No moving parts.

The brain comes first because it is the part of the robot that does not need a robot. A stereo camera on a tripod, a microphone, a speaker, and a Jetson running ROS 2 are enough to build something interesting. A vision-language model loop takes frames from the camera and produces a text understanding of the room at a one- to three-second cadence. A voice loop closes around it: speak to it, it listens, it reasons, it answers.
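The loop above can be sketched in a few lines. This is a minimal shape of one cycle, with every model call stubbed out; all function names here are placeholders, not real APIs, and the real system would wire in a camera driver, an STT engine, a VLM, and a TTS engine in their place.

```python
def capture_frame():
    """Stand-in for a stereo-camera grab (e.g. an image topic in ROS 2)."""
    return b"frame-bytes"

def describe_frame(frame):
    """Stand-in for the vision-language model: frame -> text scene summary."""
    return "A desk with a laptop, a mug, and a notebook."

def transcribe(audio):
    """Stand-in for speech-to-text."""
    return "what is on the desk?"

def answer(question, scene):
    """Stand-in for the reasoning step: question + scene -> spoken reply."""
    return f"Looking at the room: {scene}"

def perception_dialogue_tick(audio):
    """One cycle of the loop: see, listen, reason, reply.

    In practice describe_frame runs on its own 1-3 s cadence and the
    dialogue side reads the latest scene summary rather than blocking.
    """
    scene = describe_frame(capture_frame())
    question = transcribe(audio)
    return answer(question, scene)
```

The split matters: the scene loop and the voice loop run at different rates, so the dialogue side should consume the most recent scene text instead of waiting on a fresh frame.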

The harder work here is not any single piece. Speech-to-text, vision-language models, and behaviour trees are all solved problems. The work is in stitching them into something that feels coherent, runs at a usable latency, and has a credible cost per hour to operate. Profile early: the wrong API call pattern can turn a sensible build into an expensive demo.
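The cost-per-hour point is worth making concrete. A hedged back-of-envelope sketch, with placeholder numbers rather than any real vendor's prices: a continuous VLM loop multiplies a small per-call cost by thousands of calls per hour.

```python
def cost_per_hour(calls_per_second, tokens_per_call, usd_per_million_tokens):
    """Rough hourly cost of a polling loop against a token-priced API.

    All three inputs are assumptions to be measured, not constants.
    """
    tokens_per_hour = calls_per_second * 3600 * tokens_per_call
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens

# A 1 Hz caption loop at ~1500 tokens/call and $1 per million tokens:
# cost_per_hour(1.0, 1500, 1.0) -> 5.40 USD/hour
# Dropping the cadence to one call every 3 seconds cuts that to a third:
# cost_per_hour(1 / 3, 1500, 1.0) -> 1.80 USD/hour
```

This is exactly the kind of arithmetic that decides whether scene updates run at 1 Hz or only when motion is detected.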

The milestone is “Chappie at a desk”: the full perception and dialogue stack running end-to-end on the Jetson, under three seconds from question to answer. The system sees the room, hears questions, and returns contextual responses in real time.
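The three-second target only holds if each stage has an explicit share of it. A sketch of one possible budget, with every number a placeholder to be replaced by measured latencies on the Jetson:

```python
# Illustrative per-stage latency budget (seconds) for the 3 s target.
# These values are assumptions, not measurements.
BUDGET_S = {
    "speech_to_text": 0.5,
    "frame_capture": 0.1,
    "vlm_scene_update": 1.2,
    "reasoning": 0.7,
    "text_to_speech": 0.4,
}

TARGET_S = 3.0

# The budget must close: if any stage grows, another must shrink.
assert sum(BUDGET_S.values()) <= TARGET_S
```

Writing the budget down as data makes overruns visible early: a profiler can log each stage against its allocation instead of only the end-to-end number.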