How we built the trust dial: engineering autonomy that earns its place

A technical deep-dive into per-bucket, per-platform autonomy gating: schema design, the auto-reply gate, and why we chose programmatic ladders over a learned policy.

The first time the trust dial almost shipped, it was a single boolean column. users.auto_reply_enabled. Default false. Flip it to true, the agent stops asking and starts sending. We had it working in a branch for about three days before Zeming and I sat down at the whiteboard and killed it.

We killed it because it answered the wrong question. A boolean answers “should the agent auto-reply?” The right question is “who is the agent allowed to auto-reply to, on which platform, under what conditions, with what fallback if it gets it wrong?” That’s a matrix, not a switch. And it’s a matrix that has to evolve: autonomy that’s earned, not declared.

This post is about how we engineered that. The schema, the runtime gate, the rollout sequence, and the trade-off we deliberately made between programmatic ladders and learned policy. If you’re building an AI product where the agent acts on the user’s behalf, you’ll hit some version of these decisions. We landed on a shape that’s been holding up in beta, and I want to walk through why.

The schema

The state of the trust dial lives in three tables.

contact_buckets: one row per contact, per user. Columns: user_id, platform, external_id (the contact’s identifier on that platform), bucket (one of family | close_friends | friends | work | unknown), bucket_source (one of user_set | auto_classified | default), last_classified_at. The bucket is the relationship category we use to decide how aggressively the agent can act. User-set buckets always win; auto-classified buckets are reconsidered after thirty days.

auto_reply_overrides: per-user opt-ins or opt-outs, keyed by (platform, bucket, scope). A row says “for user X, on platform P, for bucket B, in scope S (where scope can be a workspace ID for Slack or NULL for everything), the auto-reply behavior is <auto | hold | off>.” An override row takes precedence over the default policy.

cards.replyability: a per-card flag on the cards table itself, set at ingest time. Values: replyable | non_replyable | NULL. This is the FLOOR gate: a card classified non_replyable (an OTP, a marketing email, a system-generated notification) never auto-sends, regardless of bucket or override. A NULL value is held back the same way, because the gate only lets a card through when the value is exactly replyable. Replyability is checked first.

Three tables, each answering a different question. contact_buckets answers “who is this person to the user?” auto_reply_overrides answers “what has the user explicitly told us to do here?” cards.replyability answers “is this even a thing we should reply to?”
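To make the shapes concrete, here is a minimal TypeScript sketch of the two tables and the per-card flag, plus the thirty-day reconsideration rule for auto-classified buckets. The type names, the camelCase field names, the behavior field, and shouldReclassify are all hypothetical, chosen for illustration; the post does not say how rows with bucket_source = default are handled, so this sketch leaves them alone.

```typescript
// Hypothetical TypeScript mirrors of the state described above (names are illustrative).
type Bucket = "family" | "close_friends" | "friends" | "work" | "unknown";
type BucketSource = "user_set" | "auto_classified" | "default";
type OverrideBehavior = "auto" | "hold" | "off";
type Replyability = "replyable" | "non_replyable" | null;

interface ContactBucketRow {
  userId: string;
  platform: string;
  externalId: string; // the contact's identifier on that platform
  bucket: Bucket;
  bucketSource: BucketSource;
  lastClassifiedAt: Date;
}

interface AutoReplyOverrideRow {
  userId: string;
  platform: string;
  bucket: Bucket;
  scope: string | null; // a workspace ID for Slack, or null for everything
  behavior: OverrideBehavior; // assumed column name
}

const RECLASSIFY_AFTER_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// User-set buckets always win, so they are never reconsidered.
// Auto-classified buckets are reconsidered once they are at least thirty days old.
// Assumption: "default" rows are not covered by the thirty-day rule here.
function shouldReclassify(row: ContactBucketRow, now: Date): boolean {
  if (row.bucketSource !== "auto_classified") return false;
  const ageMs = now.getTime() - row.lastClassifiedAt.getTime();
  return ageMs >= RECLASSIFY_AFTER_DAYS * MS_PER_DAY;
}
```

Keeping bucket_source on the row is what lets a check like this stay a one-line predicate: the classifier never has to guess whether it is about to overwrite a choice the user made.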

The decision logic at runtime takes all three.

The auto-reply gate

When the worker is about to send a drafted card, it calls applyAutoReplyGating(card, userPolicy, contactBucket, replyability) in backend/src/worker/pipeline-jobs.ts. The gate is a deterministic decision tree. Pseudo-code:

if replyability !== 'replyable' → SHOW_CARD (no auto-send, ever)
elif user has explicit override for (platform, bucket, scope) → apply override
elif contactBucket is 'unknown' → SHOW_CARD
elif user is on day < 30 of this (platform, bucket) lane → SHOW_CARD
elif rolling approval rate on this lane < 0.95 → SHOW_CARD
elif edit-rate on this lane > 0.20 → SHOW_CARD
else → AUTO_SEND

Two things to notice. First, the gate is a sequence of explicit checks. There’s no model judgment here. The worker isn’t asking an LLM “should I send this?”; it’s running a predicate. That’s the deliberate choice. We’ll come back to it.

Second, the FLOOR gate (replyability !== 'replyable') is the first check, and it’s the one we care about most. We added it after a beta tester reported the system had auto-replied “thanks!” to a one-time-password email. The OTP was sent for a banking login, and our system, eager to keep the inbox tidy, fired off a “thanks!” before the user could see it. Nothing bad happened (banks don’t read replies to those emails), but it could have. The floor gate now treats every OTP, marketing message, and bot-sent notification as non_replyable and shows the card without auto-send, no matter what.

That kind of bug is the reason this whole system is matrix-shaped. A single boolean can’t capture “auto-send to your sister but never auto-send to a banking auth email.” A matrix can.
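The decision tree from the pseudo-code can be sketched as a pure TypeScript function. This is not the real applyAutoReplyGating; the function name gateAutoReply, the GateInput shape, and the constant names are hypothetical. One assumption worth flagging: the post does not spell out what hold and off do at send time, so this sketch treats both as “do not auto-send”.

```typescript
type Bucket = "family" | "close_friends" | "friends" | "work" | "unknown";
type Replyability = "replyable" | "non_replyable" | null;
type OverrideBehavior = "auto" | "hold" | "off";
type GateDecision = "AUTO_SEND" | "SHOW_CARD";

// Hypothetical input shape: everything the gate needs, already looked up.
interface GateInput {
  replyability: Replyability;
  bucket: Bucket;
  override: OverrideBehavior | null; // explicit override for (platform, bucket, scope), if any
  daysOnLane: number;                // days the user has spent on this (platform, bucket) lane
  approvalRate: number;              // rolling approval rate on this lane, 0..1
  editRate: number;                  // rolling edit rate on this lane, 0..1
}

// Thresholds taken from the pseudo-code above.
const MIN_DAYS_ON_LANE = 30;
const MIN_APPROVAL_RATE = 0.95;
const MAX_EDIT_RATE = 0.2;

function gateAutoReply(input: GateInput): GateDecision {
  // Floor gate: anything not positively classified as replyable never auto-sends.
  if (input.replyability !== "replyable") return "SHOW_CARD";

  // An explicit user override is applied before the ladder is consulted.
  // Assumption: only "auto" sends; "hold" and "off" both keep the card from auto-sending.
  if (input.override !== null) {
    return input.override === "auto" ? "AUTO_SEND" : "SHOW_CARD";
  }

  if (input.bucket === "unknown") return "SHOW_CARD";
  if (input.daysOnLane < MIN_DAYS_ON_LANE) return "SHOW_CARD";
  if (input.approvalRate < MIN_APPROVAL_RATE) return "SHOW_CARD";
  if (input.editRate > MAX_EDIT_RATE) return "SHOW_CARD";
  return "AUTO_SEND";
}
```

Because the function takes plain values and returns a plain value, each branch can be pinned down with a one-line unit test, including the OTP case that motivated the floor gate.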

Programmatic ladders, not learned policy

The biggest design call we made was choosing a programmatic ladder (explicit thresholds, hardcoded sequences) over a learned policy that uses ML to decide when to open the dial.

The learned-policy version is tempting on paper. Imagine: feed the model historical approval-rate data, edit-rate, response time, contact engagement, and let the model decide when this user is ready for auto-reply on this lane. Adaptive. Smart. Personalized.

We rejected that for three reasons.

Reason one, auditability. When a user asks “why did you auto-send this?”, the answer needs to be a sentence, not a softmax distribution over a feature vector. With a programmatic ladder, the answer is “because you’re on day 47 of (Family, iMessage), your rolling approval rate is 0.97, your edit rate is 0.04, and you have no override on this lane.” That sentence is reproducible, debuggable, and contestable. With a learned policy, the answer is “the model thinks you’re ready”, which is the opposite of what trust requires.
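The audit sentence quoted above falls straight out of the gate’s inputs. A minimal sketch, assuming a hypothetical LaneSnapshot shape and a hypothetical explainAutoSend helper, and covering only sends that came from the ladder rather than from an override:

```typescript
// Hypothetical snapshot of the values the gate saw when it decided to auto-send.
interface LaneSnapshot {
  bucketLabel: string;   // e.g. "Family"
  platformLabel: string; // e.g. "iMessage"
  daysOnLane: number;
  approvalRate: number;  // rolling, 0..1
  editRate: number;      // rolling, 0..1
}

// Builds the user-facing explanation for a ladder-driven auto-send.
// Assumption: this is only called when no override exists on the lane.
function explainAutoSend(s: LaneSnapshot): string {
  return (
    `because you're on day ${s.daysOnLane} of (${s.bucketLabel}, ${s.platformLabel}), ` +
    `your rolling approval rate is ${s.approvalRate.toFixed(2)}, ` +
    `your edit rate is ${s.editRate.toFixed(2)}, ` +
    `and you have no override on this lane.`
  );
}
```

The same inputs always produce the same sentence, which is what makes the explanation reproducible and contestable.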

Reason two, failure modes. A learned policy can drift. A user who hasn’t approved cards in two weeks (because they’re on vacation) might end up with a degraded model state that incorrectly opens or closes lanes. A programmatic ladder is brittle only in known ways: if the user is on vacation, the lane makes no progress, and nothing breaks. If the thresholds are wrong, we adjust them with a config change, not a model retrain.

Reason three, speed of correction. If a user reports “the system auto-sent something it shouldn’t have,” we need to be able to fix it within hours. A schema field flip, an override row insert, or a threshold adjustment takes minutes. A learned-policy fix is a retraining cycle, an evaluation, a rollout. Days, in the best case.

So we picked deterministic. Programmatic. Testable. The trade-off is that the dial is less personalized than it could be: every user gets the same thresholds. We accepted that. We’d rather have a slow but auditable trust gradient than a fast but opaque one.

The rollout sequence

When a new user connects their first platform, the dial is closed everywhere. Every draft shows as a card. No auto-send.

Once the user enables auto-send on a given (platform, bucket) lane, the gate keeps evaluating rolling-window quality. If approval rate or edit-rate degrades, the dial closes for that lane only. Other lanes stay exactly where the user left them.

In practice, degradation means the user has started denying or heavily editing cards on that lane. When the lane closes, we don’t announce it; we just stop auto-sending. The next time they look at their feed, their cards are back. The expectation we set in onboarding is that the trust dial breathes: it tightens when it should and loosens when it can.
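The per-lane re-evaluation can be sketched as follows. The post does not define the window or exactly how the two rates are computed, so several things here are assumptions: the window is the last windowSize cards, a card the user edited and then sent still counts as approved, and the edit rate is the share of cards that were edited. The lane key format and all function names are hypothetical.

```typescript
type CardOutcome = "approved" | "edited" | "denied";

interface RollingStats {
  approvalRate: number; // share of recent cards that were sent (with or without edits)
  editRate: number;     // share of recent cards the user edited before sending
  sampleSize: number;
}

const MIN_APPROVAL_RATE = 0.95;
const MAX_EDIT_RATE = 0.2;

// Computes stats over the most recent `windowSize` outcomes (assumed window definition).
function rollingStats(outcomes: CardOutcome[], windowSize: number): RollingStats {
  const recent = windowSize > 0 ? outcomes.slice(-windowSize) : [];
  if (recent.length === 0) return { approvalRate: 0, editRate: 0, sampleSize: 0 };
  const denied = recent.filter((o) => o === "denied").length;
  const edited = recent.filter((o) => o === "edited").length;
  return {
    approvalRate: (recent.length - denied) / recent.length,
    editRate: edited / recent.length,
    sampleSize: recent.length,
  };
}

// A lane stays open only while both rolling thresholds hold.
function laneStaysOpen(stats: RollingStats): boolean {
  return (
    stats.sampleSize > 0 &&
    stats.approvalRate >= MIN_APPROVAL_RATE &&
    stats.editRate <= MAX_EDIT_RATE
  );
}

// Each lane is evaluated on its own history, so one lane closing never touches another.
// Keys are hypothetical "platform:bucket" strings.
function reevaluateLanes(
  history: Record<string, CardOutcome[]>,
  windowSize: number
): Record<string, boolean> {
  const open: Record<string, boolean> = {};
  for (const lane of Object.keys(history)) {
    open[lane] = laneStaysOpen(rollingStats(history[lane], windowSize));
  }
  return open;
}
```

With this shape, a run of heavy edits on one lane flips only that lane’s entry to false; every other lane keeps whatever state its own history supports.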

As the matrix matures, Family-on-iMessage, Close-Friends-on-iMessage, and Family-on-WhatsApp can auto-send overnight if the user enabled those lanes. Work-on-Gmail and Unknown-on-anything can still wait. The user sees ten cards in the morning instead of forty.

The hard part nobody mentions

The hard part of building this system was not the schema or the gate. It was the realization that autonomy is a product surface, not just a backend behavior. Users have to be able to see the dial. They have to know what’s open, what’s closed, what’s been auto-sent overnight, what they need to approve.

We spent more time on the dial-visualization UI than on the gate logic. The mobile screen at frontend/frontend_mobile/app/(app)/settings/auto-reply.tsx shows the current matrix as a grid: rows are buckets, columns are platforms, each cell is a state badge. Tap a cell to see the override options. Long-press to see the rolling-window stats that drove the current state.
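The grid is easy to derive from per-lane state. A minimal sketch of the data behind such a screen, not the actual auto-reply.tsx code: the Badge values, the “platform:bucket” key format, and buildDialGrid are hypothetical, and a lane with no recorded state is rendered as closed, matching the closed-everywhere default for new users.

```typescript
const BUCKETS = ["family", "close_friends", "friends", "work", "unknown"] as const;
type Bucket = (typeof BUCKETS)[number];

// Hypothetical badge values; the real screen may show richer states.
type Badge = "open" | "closed";

interface GridRow {
  bucket: Bucket;
  cells: Badge[]; // one cell per platform, in the order the platforms were passed in
}

// Rows are buckets, columns are platforms.
// Lanes with no recorded state default to "closed".
function buildDialGrid(
  platforms: string[],
  laneStates: Record<string, Badge | undefined>
): GridRow[] {
  return BUCKETS.map((bucket) => ({
    bucket,
    cells: platforms.map(
      (platform): Badge => laneStates[`${platform}:${bucket}`] ?? "closed"
    ),
  }));
}
```

For example, buildDialGrid(["imessage", "gmail"], { "imessage:family": "open" }) yields five rows in which only the family row’s first cell is open.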

The macOS screen does the same, with a sidebar showing recently auto-sent messages so the user can confirm they would have approved them. The principle is: the user should never be surprised by what the agent did overnight. If they are, we’ve lost the trust we were trying to build.

Closing: what I’d tell another founder building this

If you’re building an AI product where the agent acts on the user’s behalf, the temptation is to start with a switch (“auto-pilot ON”) and add nuance later. Don’t. Start with a matrix. The schema is harder up front and the runtime is harder to reason about, but the user-trust property you get out of it is structural, not bolted on.

Pick programmatic over learned. Audit beats accuracy when the stakes are real money or real relationships. Make the floor gate explicit and uncompromising: the worst auto-send is the one you couldn’t have prevented because the gate didn’t exist.

And ship the visualization. The dial isn’t a backend feature. It’s a contract you’re showing the user every time they open the app.

— Haiyang Wu, CTO of Wuvov
