Why most AI assistants fail at messaging

AI works in chat windows, where context is bounded. Messaging is unbounded: your iMessage history going back four years includes inside jokes, code-switching, and context shifts. That's the wall most AI hits.

I scrolled back through my iMessage thread with my best friend the other night. The last message was a single character, k, sent at 11:04pm. The message before it, at 11:02pm, was a 600-word voice memo I’d transcribed manually two months earlier. The message before that was a forwarded screenshot of a meme from 2023. The message before that was a voice note from his mom, in Mandarin, saying she’d cooked too much and would I please come pick up dumplings.

That’s a four-message slice of one thread. It includes English, Mandarin, voice transcription, image content, generational handoff, an inside joke (“k” is shorthand for a thing that happened in 2019 that I will not explain here), and a thread that’s been alive since 2021. If you handed any chat-window AI those four messages and asked it to draft the next reply, you’d get a fluent paragraph that completely missed the point. It would not know that “k” means anything other than “ok.” It would not know his mom. It would not understand that the screenshot was from a movie we both quote constantly.

I think this is the thing most AI products don’t admit out loud: they’re built for bounded context, and messaging is unbounded.

Bounded context: what chat windows are good at

Open ChatGPT. Open Claude. Open Cursor. The interaction shape is the same: you arrive with a question. You give the model whatever context fits in the prompt. The model answers. The session ends, or you continue refining within the same session. Either way, the universe of relevant information has a shape: it’s the prompt plus the model’s general knowledge of the world.

That shape is great for a class of tasks. Debug this regex. Explain this paper. Summarize this PDF. Write a Python script that does X. The information you need to solve the task is either in the prompt or in the model’s pretraining. Bounded.

The chat window IS the context. Once you close it, the context is gone. Open a new session and you start over. That clean boundary is the entire reason chat-window AI works as well as it does. The model never has to reason about what it doesn’t know, because the user is implicitly drawing the boundary every time they open a new conversation.

Unbounded context: what messaging is

Now look at your inbox.

Pick any thread that’s been alive for more than a year. Mine, with my co-founder Haiyang, goes back to 2024. Inside it: design discussions, infra fights, late-night memes, a moment in November 2024 where we genuinely thought we were going to shut the company down, our reconciliation the next day, the day we agreed on the rebrand from Famvoy to Krewva, the day we shipped our first connector. Some of those events are referenced obliquely in messages two years later. “Remember when we almost killed it?” is a real text that lives in that thread.

If you ask a chat-window AI to draft a reply to a message in that thread, it has access to whatever you pasted in, maybe the last 20 messages. It does not have access to the moment in November 2024. It does not know that “remember when” is loaded with two years of shared history. So it drafts something polite and generic, and the reply you would have actually sent never gets written.

This is the wall. Messaging context is unbounded in time (threads live for years), unbounded in modality (text, voice, images, code, attachments), unbounded in social register (you code-switch between your boss and your sister in the same hour), and unbounded in convention (each thread develops its own micro-language). No prompt window is going to hold all of that.

The wrong answers I keep seeing

When I look at AI products that try to do messaging, I see three failure patterns. Each one is a different way of pretending the unbounded problem is bounded.

Failure pattern one: paste-the-thread-in. The user is asked to copy a thread into a chat window, then ask the AI to draft a reply. This is what the early ChatGPT-for-email plugins did. It’s also what every “browser extension that summarizes your inbox” does. It works on a single message. It collapses on a thread that’s been alive longer than the prompt window. And the user is now doing the work of selecting which messages matter, which is most of the work.

Failure pattern two: vector-search the history. A more recent pattern. Embed the user’s whole message history, retrieve the top-k most semantically similar messages when drafting a reply, stuff them into the prompt. This is better than nothing. It’s also brittle in the exact ways messaging is brittle: it retrieves on semantic similarity, but messaging context isn’t always semantic. Sometimes the relevant context is temporal (the message right before this one, even if topically unrelated). Sometimes it’s relational (this is your mom, and here’s what your replies to your mom look like). Vector search misses both.
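The following is a minimal sketch of pure top-k semantic retrieval that makes the failure concrete. It assumes message embeddings are already computed. The `Message` type, `cosine`, and `topKBySimilarity` are hypothetical names for illustration, not any product’s API. Nothing in the ranking looks at when a message was sent or who sent it.

```typescript
// Hypothetical message shape; embeddings are assumed to be precomputed
// by some embedding model outside this sketch.
interface Message {
  id: string;
  sentAt: number; // Unix ms (unused by the ranking below, on purpose)
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Pure semantic top-k: ranks the whole history by similarity to the query
// embedding only. Time and sender play no role in the ordering.
function topKBySimilarity(history: Message[], query: number[], k: number): Message[] {
  return history
    .map((m) => ({ m, score: cosine(m.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.m);
}
```

With toy two-dimensional embeddings, an old message that happens to sit close to the query outranks the message sent just before it. That immediately preceding message is often the temporal context the reply actually depends on.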

Failure pattern three: pretend it doesn’t matter. The largest category. Ship a generic LLM with a generic system prompt that says “draft a reply.” The model produces fluent, polite, totally wrong replies. The user catches this on the first message and never trusts the product again.

What we had to build instead

When Haiyang and I started Krewva, the first architectural decision we made was that the agent doesn’t live in a chat window. There’s no prompt box where you paste a message and ask for a draft. Instead, the agent reads your inboxes directly, on a server, on its own schedule, and the context it builds is rooted in the connector, not in a session.
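The post doesn’t show the connector code, so the following sketches only the shape that paragraph describes, under stated assumptions:

- a hypothetical `Connector` interface with a `fetchSince` method;
- an in-memory `ThreadStore` standing in for a real database;
- a `syncOnce` pass that a server would run on its own timer.

The point is that context accumulates per platform and per thread across runs, instead of living and dying with a chat session.

```typescript
// Hypothetical types, for illustration only.
type Platform = "imessage" | "gmail" | "slack";

interface IncomingMessage {
  threadId: string;
  sender: string;
  sentAt: number; // Unix ms
  text: string;
}

interface Connector {
  platform: Platform;
  // Return messages newer than `since` (Unix ms). A real connector would
  // page through a provider API; that is out of scope for this sketch.
  fetchSince(since: number): IncomingMessage[];
}

// Per-user store that outlives any single run. It is keyed by platform and
// thread, so context accumulates across syncs instead of resetting.
class ThreadStore {
  private threads = new Map<string, IncomingMessage[]>();
  private cursors = new Map<Platform, number>();

  cursorFor(p: Platform): number {
    return this.cursors.get(p) ?? 0;
  }

  append(p: Platform, msgs: IncomingMessage[]): void {
    for (const m of msgs) {
      const key = `${p}:${m.threadId}`;
      const list = this.threads.get(key) ?? [];
      list.push(m);
      this.threads.set(key, list);
      // Advance the per-platform cursor to the newest message seen.
      this.cursors.set(p, Math.max(this.cursorFor(p), m.sentAt));
    }
  }

  thread(p: Platform, threadId: string): readonly IncomingMessage[] {
    return this.threads.get(`${p}:${threadId}`) ?? [];
  }
}

// One scheduled pass. The server calls this on its own schedule (cron,
// queue worker, etc.), not when a user opens a chat window.
// Returns the number of new messages ingested.
function syncOnce(store: ThreadStore, connectors: Connector[]): number {
  let added = 0;
  for (const c of connectors) {
    const fresh = c.fetchSince(store.cursorFor(c.platform));
    store.append(c.platform, fresh);
    added += fresh.length;
  }
  return added;
}
```

Calling `syncOnce` a second time with no new messages adds nothing, because the cursor remembers where each platform left off. A production version would also persist the store and cursors, handle pagination and deletions, and run per user.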

That meant we had to build, from scratch, a per-user model that holds:

  • Per-contact bucket assignment (Family, Close Friends, Friends, Work, Unknown; see backend/migrations/ for the bucket schema). Each contact has a relational tag, and that tag changes how the model drafts.
  • Per-platform conventions (an iMessage reply has a different shape from a Gmail reply, which in turn has a different shape from a Slack reply). The draft prompt in backend/src/shared/ai/draft.ts switches behavior on platform.
  • Voice profile (how you write, distinct from how the model writes). We mine your sent-history to learn your voice, then constrain the draft generator to match it. The schema is in backend/migrations/ under voice profile.
  • Recency-weighted context selection (the last few messages in a thread matter more than the messages from two years ago, but the messages from two years ago aren’t worth zero; they shape the relationship). We score relevance with both temporal proximity and semantic similarity, then truncate to fit the prompt window.
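The post doesn’t publish the actual scoring function. Here is one minimal way to combine the two signals the last bullet names, under these assumptions:

- recency decays exponentially with a configurable half-life;
- a linear weight blends recency with a precomputed similarity score;
- a crude four-characters-per-token estimate stands in for a real tokenizer.

`selectContext` and its options are hypothetical names.

```typescript
// Hypothetical candidate shape. `similarity` is assumed to be precomputed
// (e.g. cosine similarity to the incoming message) and to lie in [0, 1].
interface Candidate {
  id: string;
  sentAt: number; // Unix ms
  text: string;
  similarity: number;
}

interface SelectionOptions {
  now: number;           // Unix ms
  halfLifeMs: number;    // age at which the recency term halves
  recencyWeight: number; // in [0, 1]; 1 = pure recency, 0 = pure similarity
  tokenBudget: number;   // rough prompt budget for context messages
}

// Crude token estimate (~4 characters per token). An assumption, not a
// real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectContext(candidates: Candidate[], opts: SelectionOptions): Candidate[] {
  const scored = candidates.map((c) => {
    const age = Math.max(0, opts.now - c.sentAt);
    // Exponential decay: old messages fade toward zero rather than being
    // cut off at a fixed window.
    const recency = Math.pow(0.5, age / opts.halfLifeMs);
    const score = opts.recencyWeight * recency + (1 - opts.recencyWeight) * c.similarity;
    return { c, score };
  });
  scored.sort((a, b) => b.score - a.score);

  // Greedily keep the highest-scoring messages that still fit the budget.
  const picked: Candidate[] = [];
  let used = 0;
  for (const { c } of scored) {
    const cost = estimateTokens(c.text);
    if (used + cost > opts.tokenBudget) continue;
    picked.push(c);
    used += cost;
  }
  // Restore chronological order so the prompt reads like the thread.
  return picked.sort((a, b) => a.sentAt - b.sentAt);
}
```

The blend is what keeps a two-year-old message from being worth zero. Its recency term has decayed, but a high similarity score can still earn it a slot, while a message sent a minute ago gets in on recency alone. Greedy packing by score is a simplification: it can skip one large, high-scoring message in favor of several small ones.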

Every one of those is a separate engineering project. Every one of them is the reason “wrap an LLM” doesn’t work for messaging.

Why chat windows will keep failing here

I want to be specific about what I think is going to happen.

Over the next two years, every chat-window AI is going to add some flavor of “messaging integration.” ChatGPT will add Gmail. Claude will add iMessage (probably not, given Apple’s permissions model, but let’s say). Notion will add Slack. They’ll all be marketed the same way: “now your AI has access to your messages.”

Most of those will fail in the same place ours could have failed: they’ll treat messaging as one more data source to feed into the chat window. The user will still be opening a chat window, asking a question, getting an answer that involved their messages. That’s not a messaging product. That’s a search product wearing a messaging costume.

The actual messaging product is the one that works without the user ever opening a chat window. The one that drafts replies on its own and lands them in your feed already addressed to the right person, in your voice, with the right register, on the right platform, taking into account two years of thread history. That’s a hard problem. It’s also the only one worth solving.

The honest version

I’ll close with the honest version, because I think it’s the part most people don’t want to say out loud.

I don’t think any AI today fully solves unbounded-context messaging. Including ours. We are early. Our voice profile is good, but it isn’t your voice. Our bucket inference works but misclassifies sometimes. Our context selection picks better than naive recency but misses inside jokes regularly. There’s a long tail of “this thread has its own private language and we don’t know that language yet” that I expect will take years to fully crack.

What we have solved is the architecture problem. We’ve built the system that can hold unbounded context, even if we haven’t fully filled it in yet. The chat-window AIs haven’t even tried; they’ve assumed bounded context as a premise of their UI. Once you’ve made that assumption, messaging is permanently out of reach.

So the wall isn’t “AI can’t do messaging.” The wall is “AI built for chat windows can’t do messaging.” That’s a different statement. It implies the fix isn’t a better model. It’s a different shape of product.

We picked the different shape. If you’re building an AI product, ask yourself whether your problem is bounded or unbounded, and design from there.

— Zeming Liang, Founder & CEO of Wuvov
