Field note · 03

Don't Guess. Let the Customer Tell You.

Notes on building robotics observability without inventing the detection system the customer should own.

Published February 2026 · 13 min read · by Mukul Dharpure

Three pilots into a robotics observability tool, you start to notice a pattern.

Every customer arrives with the same first question: "does it detect [our specific thing]?" For one, it was a particular FSM state that meant "the dock alignment failed, retry without escalating." For another, it was an altitude band their drone shouldn't be in during specific phases of a mission. For a third, it was a torque signature on a manipulator that meant the gripper had slipped half a millimeter. None of these were in the tool. None of them could plausibly be in the tool, because no two customers had the same one.

The tempting move at this point is to build the smart system. Train a model on the customer's data, let it learn what "normal" looks like, surface anomalies automatically. It's the move every PM is told to make in 2026. It demos brilliantly. It reads as visionary in the deck. And it is, almost always, wrong.

This is a post about why I made the opposite call: why, after watching the customer-specific-detector pattern repeat across deployments, I shipped a configurable rule framework instead of an auto-detection system; why I think most robotics observability companies are going to build the wrong thing here; and what to build instead.

The 45-anomaly problem

If you sit down with a senior robotics engineer and ask them to enumerate everything that can go wrong on a real robot in production, you'll end up with something like 45 distinct anomaly types. They decompose pretty cleanly into six domains: locomotion, perception, planning, localization, behavior, and hardware.

Figure: the 45 anomaly types across six domains. The 8 highlighted in cyan show up in almost every fleet; the other 30+ are deeply customer-specific.

Mature fleets don't actually monitor all 45. Most monitor between 5 and 15 at any given time. About 8 of those are the same across most fleets: stall, collision, sensor dropout, planning failure, localization loss, FSM error, battery low, e-stop. If you're shipping any kind of mobile autonomy product, you'll meet those 8 in your first month.

The other 30+ are deeply customer-specific. The drone customer cares about altitude deviation and motor RPM divergence. The surgical arm cares about force exceedance during contact. The picking robot cares about gripper slip and object misdetection. The construction-site AGV cares about lane departure relative to a custom-marked path. None of these are "common." All of them are critical to the customer that has them.

If you build a robotics observability product, this is the central tension. The 8 common anomalies feel like the floor. They tempt you to ship them, declare victory, call it a day. The 30+ tail is where every customer actually lives, and it's the part that makes them happy or unhappy in the long run.

The first time I got this wrong, I shipped two hardcoded detectors: stall and path-deviation. Three pilots in, every customer wanted three more of theirs. The math didn't work. I was going to be writing Python forever, one customer at a time, while the engineering team grew angrier and the velocity collapsed.

That was the moment to rebuild the abstraction.

Why "the system figures it out" is a trap

When you're staring at the long-tail problem, the temptation to reach for ML is strong. Industry conditioning around AI in 2026 makes it almost reflexive. Every adjacent observability product is "AI-powered." Every conference talk is about "intelligent anomaly detection." Every investor wants to know how the model gets smarter over time.

I almost did it.

Then I sat down and worked out what shipping an ML-based anomaly detector actually costs the customer. Not in compute. In actual operational burden, distributed over the life of the deployment.

Figure: auto-detect carries a recurring labor tax across all four dimensions. Rules don't.

Training data. To detect anomalies, you need a labeled corpus of "normal" and "abnormal" runs. Most customers don't have one. Many can't generate one without a quarter of dedicated work, and even then the labeling is contested ("is that a real anomaly or just a Friday afternoon shift change?"). You either supply your own pre-trained model that may not match the customer's robot, or you push the data-collection problem onto a customer who has many other problems.

Drift. The customer's robot will change. New product mix on the line, new dock layout, software update, hardware revision. Each of these shifts the distribution your model trained on. False positives spike, the customer loses trust, you have to retrain, and you have a recurring labor cost embedded in every deployment forever.

Opacity. When the model fires for what looks like a wrong reason, no one in the customer's ops team can tell why. They have to call you. You have to look at the inputs, decide whether the model behaved correctly given those inputs, and then explain it. This is consulting work, billed as a product feature.

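False positives at scale. Per-message accuracy that sounds high still buries the operator. On a robot publishing five state messages a second, a model that wrongly flags even 0.1% of normal messages raises a false alarm about every three minutes (one per 200 seconds). At 5%, the "95%-accurate" case, it is roughly one every four seconds. The customer turns off the alerts within a week. The product is now decorative.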
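The alert volume depends heavily on what "accurate" means per message. As a back-of-envelope sketch, assume each message is scored independently and that a fixed fraction of normal messages is wrongly flagged (both are simplifying assumptions, and the function names are illustrative):

```python
def seconds_between_false_alarms(msgs_per_second: float, false_alarm_rate: float) -> float:
    """Mean gap between false alarms, assuming independent per-message scoring.

    false_alarm_rate is the fraction of normal messages that get flagged.
    """
    if msgs_per_second <= 0 or not 0 < false_alarm_rate <= 1:
        raise ValueError("need msgs_per_second > 0 and 0 < false_alarm_rate <= 1")
    return 1.0 / (msgs_per_second * false_alarm_rate)


def rate_for_target_gap(msgs_per_second: float, target_gap_s: float) -> float:
    """False-alarm rate needed to average one false alarm per target_gap_s."""
    if msgs_per_second <= 0 or target_gap_s <= 0:
        raise ValueError("both arguments must be positive")
    return 1.0 / (msgs_per_second * target_gap_s)


if __name__ == "__main__":
    for rate in (0.05, 0.01, 0.001):
        gap = seconds_between_false_alarms(5.0, rate)
        print(f"{rate:.1%} flagged at 5 msg/s: one false alarm every {gap:.0f} s")
    # Rate needed to hold false alarms to one per ten minutes at 5 msg/s.
    print(f"{rate_for_target_gap(5.0, 600.0):.4%}")
```

Under these assumptions, holding false alarms to one every ten minutes at five messages a second needs a per-message false-alarm rate of about 0.03%, which is a far stricter bar than any headline accuracy figure suggests.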

The "auto-detection" framing makes a promise. The promise is: you, the customer, won't have to specify what an anomaly is. We, the vendor, will figure it out. This is a beautiful promise. It is also a transfer of work that the customer's ops team will eventually catch you on, because in production, the work doesn't disappear. It moves.

The customer's domain knowledge — the thing the customer actually has and you don't — is the most valuable asset in the room. The auto-detect framing wastes it.

Give the customer the language

The opposite move is unsexy and right.

Don't try to figure out what an anomaly is. Give the customer a small, sharp language for telling you what an anomaly is, and honor what they tell you.

Figure: two rule kinds, state equality and numeric threshold, cover most of the territory.

Two rule kinds are enough for most of the territory. State equality: "fire when this field equals this value for at least N seconds." Numeric threshold: "fire when this field is above or below this value for at least N seconds." Add cooldowns to prevent flap. Add a graceful "missing field" handler so a misnamed topic doesn't crash the agent. Ship one Python detector module that loads the rules at startup.
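A minimal sketch of how the two rule kinds, the hold duration, the cooldown, and the missing-field handling could fit together. This is not the shipped module: the class name, field names, and the choice to pass timestamps in explicitly are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Rule:
    """One customer-defined rule. All names here are illustrative."""
    name: str
    field_name: str            # key to read from each incoming message
    kind: str                  # "state_equals" or "threshold"
    value: Any                 # state to match, or numeric limit
    op: str = ">"              # threshold direction: ">" or "<"
    hold_s: float = 0.0        # condition must persist this long before firing
    cooldown_s: float = 0.0    # minimum gap between two firings
    _since: Optional[float] = None       # when the condition became true
    _last_fired: Optional[float] = None  # when the rule last fired

    def _matches(self, msg: Mapping[str, Any]) -> bool:
        if self.field_name not in msg:
            return False       # missing or misnamed field: no match, no crash
        observed = msg[self.field_name]
        if self.kind == "state_equals":
            return observed == self.value
        if self.kind == "threshold":
            try:
                if self.op == ">":
                    return observed > self.value
                return observed < self.value
            except TypeError:
                return False   # non-numeric payload: treat as no match
        raise ValueError(f"unknown rule kind: {self.kind!r}")

    def update(self, msg: Mapping[str, Any], now: float) -> bool:
        """Feed one message with its timestamp; return True if the rule fires."""
        if not self._matches(msg):
            self._since = None             # condition broke, restart the hold timer
            return False
        if self._since is None:
            self._since = now
        if now - self._since < self.hold_s:
            return False                   # not held long enough yet
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return False                   # still cooling down
        self._last_fired = now
        return True


rule = Rule("fsm_error", "state", "state_equals", "ERROR", hold_s=2.0, cooldown_s=300.0)
stream = [
    (0.0, {"state": "ERROR"}), (1.0, {"state": "ERROR"}), (2.0, {"state": "ERROR"}),
    (3.0, {"state": "ERROR"}), (4.0, {"mode": "idle"}),   # field missing here
    (310.0, {"state": "ERROR"}), (312.0, {"state": "ERROR"}),
]
fired = [t for t, msg in stream if rule.update(msg, t)]
print(fired)  # [2.0, 312.0]
```

Passing the timestamp in, rather than reading a clock inside the rule, keeps the same code usable for replaying recorded runs, which is a design choice of this sketch rather than something the post specifies.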

That's it. That's the framework.

A customer whose robot publishes a state topic with a field that takes the value "ERROR" when their FSM enters its error state writes one YAML block. The agent fires when it sees that condition for the duration they specified. Cooldown prevents re-firing during a five-minute recovery loop. The customer didn't need to teach the agent anything. They told it.
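To make the "one YAML block" concrete, here is a hypothetical rule and a small startup check for it. The key names (`topic`, `field`, `equals`, `for_seconds`, `cooldown_seconds`), the topic path, and the two-second hold are assumptions for illustration, not the product's actual schema. The YAML is shown in a comment, and the sketch works on the plain dict a YAML parser would produce.

```python
# What the customer's YAML block might look like (hypothetical schema):
#
#   - name: fsm_error
#     topic: /robot/state
#     field: status
#     equals: ERROR
#     for_seconds: 2
#     cooldown_seconds: 300
#
# A YAML parser would hand the agent a dict like RAW_RULE below.

RAW_RULE = {
    "name": "fsm_error",
    "topic": "/robot/state",
    "field": "status",
    "equals": "ERROR",
    "for_seconds": 2,
    "cooldown_seconds": 300,   # five minutes, matching the recovery loop
}

REQUIRED_KEYS = {"name", "topic", "field", "for_seconds"}
CONDITION_KEYS = ("equals", "above", "below")


def validate_rule(raw: dict) -> dict:
    """Check one rule at startup and normalise its durations to floats."""
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError(f"rule {raw.get('name', '?')!r} is missing {sorted(missing)}")
    conditions = [key for key in CONDITION_KEYS if key in raw]
    if len(conditions) != 1:
        raise ValueError(
            f"rule {raw['name']!r} needs exactly one of {CONDITION_KEYS}, got {conditions}"
        )
    hold = float(raw["for_seconds"])
    cooldown = float(raw.get("cooldown_seconds", 0))
    if hold < 0 or cooldown < 0:
        raise ValueError(f"rule {raw['name']!r} has a negative duration")
    return {**raw, "for_seconds": hold, "cooldown_seconds": cooldown}


checked = validate_rule(RAW_RULE)
print(checked["name"], checked["for_seconds"], checked["cooldown_seconds"])
```

Failing loudly at load time, with the rule's name in the message, keeps a typo in a customer's config from turning into a silently dead rule.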

The same module supports thousands of deployed rules across thousands of different robot schemas. The agent doesn't grow. The rules grow.

This is the pattern, and once you see it, you start to see how often it's the right shape for hardware/software products that interact with diverse customer domains. Don't model the customer's world inside your product. Give the customer a notation for describing their own world, and execute against the description.

Why this is hard to defend in the building

If letting the customer tell you is so obviously right, why does almost every competitor build the auto-detect version?

Because the pressures inside the company push the wrong way.

Engineering wants to build the smart system. It's more interesting work. Auto-detection has papers, conferences, and an ML team that wants something to do. A YAML rule loader has none of that. The senior engineer who could lead an ML platform doesn't want to lead a config parser.

Sales wants to demo "AI-powered." It sells decks. Saying "the customer writes a config file" loses the room before you've finished the sentence, even though it describes a more durable product.

Marketing wants to claim the AI category. It wants to show up in analyst reports as an AI-driven observability product. A rule engine reads as commodity.

The customer's procurement team, working from buzzword checklists picked up in vendor briefings, will sometimes ask "does it learn?" because they've been told to ask. Saying "no, it's configurable" sounds like the wrong answer.

Investors want ML in the architecture diagram. It supports the valuation story.

And the friendly champion in the pilot, the one who is excited about the demo, often wants to feel like the tool is "smart." A YAML file feels less smart than a model.

Every one of these pressures pushes you toward the auto-detect version. The PM is the only person in the building whose job is to absorb those pressures and hold the position, because the actual user — the customer's ops team on day 90 — will be served by the boring, transparent, configurable thing. Not the smart, opaque, retraining-needing thing.

This is what "decisions over features" looks like in the wild. You write the decision down, you defend it through the bullshit cycle, you ship the version that the day-90 user will thank you for, and you accept that the demo doesn't sparkle as much.

When ML actually earns its place

To be intellectually honest: ML earns its place sometimes.

Figure: a 2×2 of failure modes. Three of four quadrants are rules-win. ML earns its place only in the upper-left.

The case where rules genuinely lose is the upper-left of that 2×2. A failure mode is frequent enough to matter, AND its signature is genuinely hard to express as a rule. "The lidar point cloud looks weird in a way that correlates with the robot getting lost twenty seconds later." A human operator can sometimes recognize it; nobody can write a clean rule for it.

That is real ML territory. Cross-modal patterns, subtle drift in high-dimensional sensor data, signatures that humans recognize but can't reduce to a topic-and-field condition. If you have a customer whose primary failure mode lives there, and they have the data to support training, ML may be the right move for that detector.
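The conditions in the last two paragraphs can be written down as a small decision function. This is only a sketch of the argument: the three boolean inputs and the return labels are illustrative names, and in practice each input is a judgment call rather than a flag.

```python
def detector_approach(frequent: bool, rule_expressible: bool, has_training_data: bool) -> str:
    """Pick an approach for ONE failure mode, following the 2x2 in the text.

    frequent:          the failure happens often enough to matter
    rule_expressible:  its signature reduces to a topic-and-field condition
    has_training_data: the customer has the data to support training
    """
    if rule_expressible:
        return "rule"              # the rule framework is the floor
    if frequent and has_training_data:
        return "ml_candidate"      # frequent, hard to express, data available
    return "rule"                  # rare, or no data: ML has not earned its place


for case in [(True, True, True), (False, True, False), (False, False, True), (True, False, True)]:
    print(case, "->", detector_approach(*case))
```

Only the last case, a frequent failure whose signature resists a rule and whose owner has the data, comes back as an ML candidate.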

But notice what you've done. You've earned the right to use ML for one specific detector after exhausting the rule framework, on a problem that genuinely demands it, with a customer whose data supports it. That is a much narrower claim than "we are an AI-powered observability platform," and it is the only honest one most products can make.

The rule framework should be the floor. ML, when it's right at all, should be a specific module that earns its place against a specific failure mode for a specific customer. Not the architectural premise. Not the marketing claim. A tactic, used carefully, in the small slice of the territory where it works.

Closing

The bigger lesson here is about respect.

The auto-detect framing is, underneath, a quiet claim that the vendor knows the customer's robot better than the customer does. That with enough data and enough cleverness, you can figure out what they should care about. This is almost never true. The customer has been running their robot for years before you arrived. They know its failure modes intimately, in the granular, situational way that is hard to externalize but absolutely real.

The rule framework is the opposite claim. It says: you know your robot. We know how to capture, replay, and operate observability infrastructure. Your half is the domain knowledge; our half is the substrate. Tell us what to look for in your language, and we'll do the rest.

That posture — treating the customer as the domain expert and yourself as the substrate provider — is one of the most underrated PM positions you can take in a hardware-adjacent product. It's harder to demo. It's harder to market. And in the long run, it's what makes the product survive contact with twenty different customers' twenty different worlds.

Don't try to guess what the customer means by "broken." Build the language they can use to tell you. Then get out of the way.