May 7, 2026 11 min read

Inbox-native vs chat windows — the architecture difference

Why ChatGPT, Claude, and Notion AI all live in chat windows, why moving messages out of native inboxes is friction, and how Krewva's local-first connector architecture changes the latency, privacy, and cost math.

By Zeming Liang

technical
architecture

This is the third of three opening posts on the Krewva blog. The first one covered why we built the product at all. The second covered the trust dial. This one is about the engineering choice that makes both of those work — what we call inbox-native architecture, in contrast to the chat-window architecture that powers most consumer AI products today.

I want to do this one as a technical post rather than a founder note. There are real engineering tradeoffs in the difference between “AI that lives in a chat window” and “AI that lives inside the apps where your messages already are,” and most of them don’t get talked about because the chat-window model has been the de-facto default since 2023.

The chat-window model

ChatGPT, Claude, Gemini, Perplexity, Notion AI, Cursor, Linear’s AI features, Slack’s AI summaries — all of them share a basic shape. The user opens a chat surface (a tab, a panel, a sidebar). The user types a prompt. The model returns a completion. The user reads it, optionally copies pieces of it elsewhere, and the interaction ends.

This model has three architectural properties that look different at first but become very obvious when you stack them:

The user is always present in the loop. The model does no work without a prompt; the prompt requires a human typing it.
The data lives wherever the user pasted it. If the user wants the model to know about an email thread, the user has to paste the email thread into the chat. The model does not have direct access to the user’s inbox.
The compute happens on a stateless backend, billed per token, with no awareness between sessions. Every chat turn is its own request. Memory features bolt context on top, but the underlying API call is a fresh stateless completion each time.

Those three properties are why the chat window is the perfect UI for Q&A. Ask a question, get an answer. They are the wrong properties for an assistant that’s supposed to handle ongoing message work.

Why messages don’t fit chat windows

Messages don’t behave like Q&A. They behave like a continuous stream against a long-running set of relationships.

When your sister texts you, the relevant context isn’t the latest message. It’s the last two months of the relationship, the dinner you missed last week that you said you’d reschedule, the thing your mom mentioned in a different thread, your shared calendar, and the specific way you’ve been signing off lately. None of that fits in a chat-window paste.

So the chat-window pattern, applied to messaging, ends up looking like this in practice: the user opens ChatGPT, types here's an email I got from my sister, draft a reply: <paste>, gets a draft, copies it, switches back to Mail, pastes it, edits it for the parts the model couldn’t have known, sends. That’s the chat-window-as-AI-assistant. It is, charitably, a slightly faster typewriter.

The friction sources are not subtle:

Tab switching. Every interaction crosses two app boundaries.
Manual context export. The user has to pull context out of the inbox by hand.
Lost relationship state. The chat window has zero memory of what the model said about this contact yesterday.
No ongoing work. Closing the tab ends the assistant’s day. The chatbot doesn’t read your inbox while you sleep.

The inbox-native model

Krewva is structured the opposite way.

We do not run inside a chat window. We run as a backend worker process, with a set of typed connectors to your messaging platforms. The worker reads your inboxes on its own polling schedule. When new inbound material lands, the worker classifies it, fetches the relevant relationship context, drafts a reply (or a triage decision, or a summary, or a digest), and writes that decision into a feed you review.

The shape of the data flow is roughly:

Inbound message
  → Connector reads it (Gmail API, IMAP, AppleScript, browser, etc.)
  → Worker stores it in our database with relationship + context fingerprints
  → Pipeline job classifies + drafts (calls into DeepSeek)
  → Card gets pushed into the feed via WebSocket
  → User taps approve/deny in the app
  → Outbound action goes back through the connector to the original platform

Compare this to the chat-window flow:

Inbound message
  → User reads it in the native app
  → User opens chatbot in a separate tab
  → User pastes message + manual context
  → Chatbot returns a draft
  → User pastes draft back into native app
  → User sends

The two flows look similar from a “the AI helped” perspective — both produced a reply to the inbound message. But every step that involves the user manually shuttling data is a step where the chat-window model is bottlenecked on the user’s awake hours.

The inbox-native model has zero such steps. The user is involved only at the approval/denial moment, which is one tap. Everything else runs in the worker.

Latency math

Here’s the latency comparison for replying to a single email, measured roughly:

Chat-window flow:

User reads the email: 30 seconds
User opens chatbot tab: 5 seconds
User pastes context + types prompt: 60 seconds
Model produces draft: 8 seconds
User reads + copies draft: 15 seconds
User switches back to mail tab + pastes + sends: 20 seconds
Total: roughly 2 minutes 18 seconds, all of it in the user’s foreground.

Krewva flow:

Connector polls and ingests: happens in the worker, not blocking the user
Pipeline job drafts: ~6 seconds in the worker, happens in parallel with the user’s life
User opens app, sees card with draft already prepared: 0 seconds of model wait
User taps approve: 1 second
Outbound send through connector: ~3 seconds in the worker
Total user-foreground time: roughly 1 second per message.

Multiply by forty messages a day. Chat-window: 92 minutes of foreground attention. Krewva: ~40 seconds of foreground attention plus a few minutes of wall time spent reading drafts.

The reason is not that our LLM is faster. We use DeepSeek; the model latency is comparable to what ChatGPT runs on. The reason is that the user is no longer the synchronization point. The chat window forces the user to be the request initiator, the context provider, the result transcriber, and the sender. Inbox-native lets all of that happen offstage.

Privacy: local-first, where it has to be

The architectural property that probably matters most to users — though it’s the one that gets the least attention in our marketing — is local-first data handling for the platforms that don’t have real APIs.

iMessage is the cleanest example. Apple does not provide a server-side API for iMessage. The chat history lives in a SQLite database on each user’s local machine, at ~/Library/Messages/chat.db. The only way to read it is to be running on the user’s machine with Full Disk Access permission.

A chat-window AI cannot do this. The chat-window AI is, by construction, server-side. To get iMessage data into the chat window, the user has to manually paste it. Most users will never do that, both because of friction and because pasting your message history into a chatbot feels gross.

Krewva’s macOS desktop client runs a co-located mac-agent process on the user’s machine. That mac-agent reads chat.db locally — never uploaded raw, never streamed off-device — and only sends abstracted decision artifacts (a draft reply, a contact bucket, a confidence score) up to the worker. The bulk of the message content stays on the user’s hard drive. The cloud only ever sees the decision graph, not the message body in raw form, except where the user has explicitly approved a draft and an outbound send is being constructed.

This is not a privacy theater move. It is what the platform’s architecture forces — and the chat-window AI products simply cannot offer iMessage support without crossing a privacy line that we (and Apple) think shouldn’t be crossed. Inbox-native architecture is what makes “we read your iMessage” a feature you’d accept rather than reject.

WhatsApp Web is similar in spirit. We run a Playwright-driven browser session per user, against a persistent profile, on the user’s mac-agent. The session credentials never leave the user’s machine. Backend has the orchestration; the browser-level access lives locally.

Gmail is different — Google has a real API, OAuth-scoped, with audit logging on Google’s side. There Krewva is a server-side OAuth client and uses scoped Gmail API calls from our worker. But that’s also fine, because Gmail’s API is exactly the kind of API that allows server-side automation responsibly. The data never goes anywhere outside the relationship between Google’s servers and our scoped reads.

Different platforms get different treatment. The unifying principle is: the agent runs as close to the data as the platform’s privacy posture allows, and never further than that.

Cost: not paying for context tax

There’s a cost dimension to inbox-native architecture too, and it’s worth being honest about.

Chat-window AI products are billed per token. Every time you paste an email thread into ChatGPT and ask for a reply, you’re paying — directly or indirectly — for those tokens to be re-uploaded and re-tokenized. If your thread context is 4,000 tokens and the reply is 200 tokens, you’re paying for 4,200 tokens every interaction, even though the 4,000 of context didn’t change between the last interaction and this one.

Krewva’s pipeline maintains structured per-contact context in the database. When the worker drafts a reply, it doesn’t paste the entire thread history into the model — it constructs a typed context payload that includes only the deltas, the relationship metadata, the voice profile, and the most relevant recent messages. The model sees roughly half the tokens it would see in the chat-window equivalent, because we know what’s been seen before.

We can also do work offline. A safety-classification pass is much smaller than a draft pass. A bucket-classification pass is smaller still. Those run when needed, in their own jobs, in their own pipeline lanes. The chat-window model can’t subdivide because it doesn’t have a concept of “the same conversation across many turns” — every turn rebuilds context from scratch.

The result, end-to-end, is that we burn meaningfully fewer tokens per active user-day than a chat-window AI handling the same inbox volume would. That’s the cost angle on top of the privacy angle on top of the latency angle on top of the user-experience angle.

Engineering tradeoffs we accepted

I want to close honestly on what inbox-native costs us, the team building it.

Connector engineering is hard. Every platform is a separate engineering project. Gmail OAuth + Pub/Sub watch renewal is its own subsystem. iMessage chat.db reads + AppleScript sends are their own subsystem. WhatsApp Web Playwright automation with selector configs is its own subsystem. Slack’s events API is its own. None of this is reusable across platforms; each one is bespoke. We’ve shipped six connectors and we’ll ship more, and every new one is real engineering, not a prompt change.

Selector drift is real. WhatsApp Web changes its DOM every few months, and historically those changes have broken our automation in ways that don’t show up in CI but do show up in our error monitors. We solved this with a selectors.json file the runtime reads at startup — so we can patch DOM drift without a code redeploy — but the underlying problem doesn’t go away. Browser-level automation is fragile by nature. Embracing it was a choice; we don’t pretend it’s free.

Worker scheduling is its own beast. We run a single Postgres jobs table with row-level locking — no Redis, no external queue. That’s deliberate; fewer moving parts beats more moving parts, especially at our stage. But it means the worker process needs to be carefully managed: only one worker per environment, careful claim semantics, idempotency keys on every mutation, audit logs for every action. Standard chat-window products don’t have a worker at all. Ours is load-bearing.

Multi-process privacy boundaries are nontrivial. mac-agent is a separate process from the desktop renderer is a separate process from the backend worker. Those boundaries exist for good reason (separating local secrets from server-side compute), but they make debugging harder, deploys more careful, and architecture diagrams more crowded. If we’d built a chat window, all of that complexity would be invisible — there’d be no mac-agent, no co-located worker, no per-platform connector engineering — but the product would be a worse product.

We made the trade. We think it’s right. The point of this post is to be transparent about what the trade actually is.

Closing

The chat-window pattern is the perfect UI for the question-and-answer interaction shape. It’s the wrong UI for ongoing messaging work, where the user is asynchronous, the data is platform-locked, and the assistant has to keep working when the user steps away.

Inbox-native is the architectural answer to “what should an AI assistant for messaging actually look like?” The answer involves real connector engineering, real privacy constraints, real worker infrastructure, and real model choices. None of it is glamorous. All of it is what makes the difference between “an AI assistant in a tab” and “a crew that handles your inbox while you sleep.”

If you’ve read all three of our opening posts — manifesto, trust dial, this one — you have the thesis. We’re building Krewva for the people who decided, like we did, that the chat-window era is not the destination.

— Zeming

Quarterly notes from the build.

We send a short email when we ship something we're proud of. No growth-hacker tricks, no spam — just notes from the founders.