A system that has to phone home to think is not thinking. It is querying. The distinction sounds pedantic until you try to build something that perceives the physical world in real time — and then it becomes the whole game.
These are working notes from building local-first cognition: perception, memory, and reasoning that run where the sensors are, on hardware you can hold.
The latency argument
Cognition that matters happens on a clock. A camera frame is relevant for milliseconds. A LiDAR sweep describes a world that is already changing. If your perception loop includes a round trip to a datacenter, you have added a tax that the physical world does not pay and will not wait for.
Run the loop locally and the tax disappears. On a Raspberry Pi 5 with a Hailo accelerator, you can close a perception loop in single-digit milliseconds and never leave the device. That is not a cost optimization. It is the difference between a system that reacts to the world and one that reacts to a memory of the world.
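To make the tax concrete, here is a minimal sketch of the arithmetic. Every number in it is an assumption chosen for illustration, not a measurement, and the names (Stage, loop_latency_ms, frames_stale) are hypothetical. It adds up the stages of one pass through the loop and counts how many new frames arrive before that pass finishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    millis: float  # assumed latency for this stage, not a measurement


def loop_latency_ms(stages):
    """Total latency of one pass through the perception loop."""
    return sum(stage.millis for stage in stages)


def frames_stale(latency_ms, fps):
    """Number of whole frame periods that elapse during one pass."""
    frame_period_ms = 1000.0 / fps
    return int(latency_ms // frame_period_ms)


# Illustrative budgets only; measure on your own hardware and network.
LOCAL = [
    Stage("capture", 2.0),
    Stage("inference", 5.0),
    Stage("decide", 1.0),
]
REMOTE = [
    Stage("capture", 2.0),
    Stage("encode and uplink", 15.0),
    Stage("network round trip", 60.0),
    Stage("inference", 5.0),
    Stage("decide", 1.0),
]

if __name__ == "__main__":
    for label, stages in (("local", LOCAL), ("remote", REMOTE)):
        total = loop_latency_ms(stages)
        behind = frames_stale(total, fps=30)
        print(f"{label}: {total:.1f} ms, {behind} frame(s) behind at 30 fps")
```

Under these assumed numbers the local loop finishes inside a single 30 fps frame period, while the remote loop is acting on a frame that is already two frames old. The exact figures will differ per setup; the shape of the comparison is the point.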
If it has to ask permission to perceive, it is already too late to act.
The sovereignty argument
A local-first system owns its own senses. Its camera feed, its audio, its spatial map, its memory of what it has seen — none of it has to leave the device for the device to be useful. For anything that lives in a home, near people, or inside a trust boundary that matters, that is not a feature. It is the only acceptable default.
This is the architecture behind the JARVIS work: camera, microphone, LiDAR, a memory graph, belief validation, and identity-bounded recall, all running local-first. The cloud is an option the system can reach for, not a dependency it cannot live without.
Memory is not a vector dump
The lazy version of "memory" is to embed everything and stuff it in a vector store. That is recall, not memory. Real memory has structure: entities, relationships, and — this is the part everyone skips — a confidence the system holds in each belief.
A system that cannot say "I am not sure I saw that" cannot be trusted to act on what it thinks it saw. Calibrated uncertainty is not a nice-to-have on a perception system. It is load-bearing.
Build memory as a graph with validated, identity-scoped beliefs and you get something that can be wrong on the record — which means it can be corrected, audited, and improved. Build it as an undifferentiated pile of embeddings and you get a system that is confidently amnesiac.
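A minimal sketch of what that can look like, under stated assumptions: beliefs are edges (subject, relation, object) that carry a confidence and an owning identity, and revising a belief keeps the old confidence on the record. The class and method names here (Belief, MemoryGraph, assert_belief, recall) are hypothetical and are not the JARVIS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    subject: str
    relation: str
    obj: str
    confidence: float  # how sure the system is, in [0, 1]
    owner: str         # identity this belief is scoped to
    history: list = field(default_factory=list)  # prior confidences, for audit


class MemoryGraph:
    """Toy belief store: edges with confidence, scoped by identity."""

    def __init__(self):
        self._beliefs = {}

    def assert_belief(self, subject, relation, obj, confidence, owner):
        """Record a belief, or revise it while keeping the old value on record."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        key = (owner, subject, relation, obj)
        existing = self._beliefs.get(key)
        if existing is None:
            self._beliefs[key] = Belief(subject, relation, obj, confidence, owner)
        else:
            existing.history.append(existing.confidence)
            existing.confidence = confidence
        return self._beliefs[key]

    def recall(self, subject, requester, min_confidence=0.0):
        """Return only the requester's own beliefs that clear the threshold."""
        return [
            b for b in self._beliefs.values()
            if b.owner == requester
            and b.subject == subject
            and b.confidence >= min_confidence
        ]


if __name__ == "__main__":
    graph = MemoryGraph()
    graph.assert_belief("keys", "located_on", "kitchen_table", 0.55, owner="alice")
    # Too uncertain to act on: nothing clears a 0.8 threshold yet.
    print(graph.recall("keys", "alice", min_confidence=0.8))
    # A second, clearer sighting revises the belief; the old value is kept.
    revised = graph.assert_belief("keys", "located_on", "kitchen_table", 0.9, owner="alice")
    print(revised.confidence, revised.history)
    # Another identity cannot recall alice's beliefs.
    print(graph.recall("keys", "bob"))
```

The threshold in recall is where "I am not sure I saw that" becomes a decision: a caller that wants to act asks for high-confidence beliefs only, while an auditor can ask for everything and read the history.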
What the constraints teach you
Edge silicon is unforgiving. Thermal limits, power budgets, memory ceilings — they push back on every lazy decision. That pushback is the point. A model that has to run on a Pi forces you to ask what computation actually earns its place in the loop. The answer is almost always less than you started with, which is the ZeroBandwidth principle showing up again: reduce noise before you transmit, even when the wire is internal.
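One small way to apply that on-device is to gate a sensor stream so that downstream stages only see readings that actually changed. The sketch below is a plain change-threshold filter, offered as an illustration of the idea rather than as a definition of ZeroBandwidth; the function name and the threshold value are assumptions.

```python
def changed_enough(readings, threshold):
    """Yield only readings that differ from the last forwarded one by more than threshold."""
    last = None
    for value in readings:
        if last is None or abs(value - last) > threshold:
            last = value  # remember what was forwarded, not what was dropped
            yield value


if __name__ == "__main__":
    # Hypothetical range readings in metres, with small jitter between real changes.
    samples = [10.0, 10.1, 10.05, 12.0, 12.2, 9.0]
    print(list(changed_enough(samples, threshold=0.5)))  # [10.0, 12.0, 9.0]
```

Comparing against the last forwarded value, rather than the previous sample, keeps slow drift from slipping through unnoticed: a series of tiny steps eventually exceeds the threshold and gets forwarded.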
Cognition belongs at the edge because that is where the world is. Everything else is a query.