Durable AI Initiatives
Company survival map for when intelligence is bought from AI model providers.
Two questions:
- Are we a token business? Or an intelligence business?
- Is our AI strategy sufficiently bitter-lesson-pilled?
Bowtie:
- We're an insurance business. We grow market share, pass on cost savings, keep customers healthy, minimise the loss ratio, and invest the float. We are happy to buy tokens if they give us leverage in these activities. Does that make tokens a cost center?
- Modern AI apps are changing all customer touchpoints. Though they are imperfect, we will meet them where they are headed. We will build environments, tools, and feedback loops that interoperate with all frontier models. We will post-train AI models only where we have proprietary data and tacit knowledge.
Modern internet sales funnel redux
From 2010 to 2025, customers heard about a brand or product, searched for it, and converted via webpages and marketplaces.
More generally, a business optimises each step of this funnel, driving a potential customer from top to bottom: Awareness → Consideration → Decision → Conversion → Loyalty
Each step of the sales funnel contains channels and touchpoints that reach the customer. Some are owned properties, some are not. For example, on the internet, landing pages are owned, SEO channels are hybrid, and paid media channels are not.
Pursuing optimisation ('growth'), we invented ways to increase the share of the funnel that is owned. The funnel for internet businesses evolved: Acquisition → Activation → Retention → Referral → Revenue
More owned properties means more can be directly measured. We invented jobs to measure, experiment, and roll out data-driven initiatives to drive customers from top to bottom. Data collection exploded.
In b2c apps and b2b saas, we mastered freemium with free trials, in-app purchases and starter plans. Let the customer try first, find the aha moment, then pay and upgrade for more.
Then we mastered network effects and referrals. Well-run software businesses boast negative churn.
In enterprise software we added 'bottom-up sales': we learnt to 'land' with small teams and individual users, then 'expand' headcount.
These lessons led to 'best practices': own your audience ('lists'), meet customers where they are ('omnichannel'), personalize every message ('a/b testing'), build a referral flywheel ('loops'), measure and nurture each customer account ('crm').
Best practice has calcified around today's owned and un-owned customer touchpoints.
- Lists - make a hard-to-ignore offer, and collect email and phone via marketing channels and websites
- Omnichannel - manage a dozen social media accounts, repurpose content for a dozen formats, run campaigns that contact users just enough without annoying them
- A/B testing - test messages, test landing pages, test offers
- Loops - reward user behaviour and test rewards
- CRM - measure each step of customer journey and segment users
How do AI apps affect these sales and growth funnels?
AI apps are changing all customer touchpoints. We will throw away today's best practice.
As of mid-2025, ChatGPT has >500M WAUs, Google "AI Search" has >1.5B MAUs, and Meta AI has >1B MAUs.
Take how search changes:
- User makes high-intent keyword search → makes high-intent semantic search
- User skims landing page loaded in 0.3s → reads AI generated result
Taken to its logical conclusion, AI apps will gatekeep each step of the sales funnel.
- Awareness - AI tells user what they are interested in
- Consideration - AI helps user decide what qualities they value
- Decision - AI tells user which product to buy
- Conversion - AI helps user buy and start using the product
- Loyalty - AI helps user maximise product value or get rid of it
All owned properties that aren't the product shall adapt to being consumed by AI apps.
The new growth funnel (AI apps edition):
- Acquisition - AI helps user read landing pages, feature pages, and starts signup
- Activation - AI reminds user to try the new product
- Retention - AI helps user process remarketing ads and retention campaigns
- Referral - AI helps user process referral rewards
- Revenue - AI tells user which product to buy, and whether to upgrade
Lists of humans shall become lists of AI apps. Omnichannel was important before computers could easily use language. A/B testing shall become busywork (easier to execute, but all the low-hanging fruit has been picked). Loops shall stay the same so long as humans still use the software. CRM shall be much harder to do when AI apps take over the customer journey.
As an aside, the proliferation of AI-generated sales and marketing automation rapidly causes AI fatigue. Users will lean on AI apps even more to avoid slop in their feeds and inboxes.
Barbell strategy continues to be critical in reaching customers:
- To reach human customers: branded / premium / sensationalist content / (other memetic secrets) that play 'the algorithm'.
- Adapt sales funnel to cater to AI apps since humans are not personally viewing the owned properties.
Web search ability is not limited to b2c AI apps. Frontier AI models are better at searching and processing web search results than most humans. Expect b2b AI apps to read all the content too.
AI apps are imperfect assistants. We will meet them where they are.
As of mid-2025, frontier AI models have saturated math and coding benchmarks, but still struggle with logical deduction (longer-running tasks with harder-to-verify outcomes).
Although AI assistants are in the mainstream, humans still keep AI apps on a very short leash and will likely continue to do so.
In other words, nobody will purchase one of the 16 deep-researched recommendations from Manus AI without some heavy guidance on the purchase decision.
Adapt sales funnel to cater to AI apps viewing owned properties.
Although we should pay attention to search engine visibility for AI apps, it is a solved problem. AI apps use the same search indexes as humans (e.g. Google, Brave, Bing) and a few new ones (e.g. Exa).
AI apps can generate a large variety of search queries that cover the full semantic meaning of the human's request. We can reuse current tactics but apply them to a wider set of keywords.
In addition to surfacing links for the user, AI apps also assist the user by reading the webpage content directly. Searched and read content may or may not be included in the final response shown to the user.
We not only have to show up in the index. We must now ensure the correct text (and media) is returned. And we must take steps to maximise the chance that our content 'wins'.
More generally, AI web search is a type of tool-use. We shall adapt to AI apps' native tools, and we can also develop our own tools to connect with AI apps directly.
AI apps use tools reliably enough today to justify sales funnel adaptations.
| AI app ability | Reliability | Sales funnel adaptations |
|---|---|---|
| Searching web (semantic) | Reliable | Optimize semantic search rankings; add content optimized for AI retrieval; add human CTAs via prompt injection |
| Using tools, integrations | Reliable | Ditto (add content and human CTAs); connect tool when user is Acquired; add tool features to help users Activate; add tool features that use the product |
| Using memory, long term planning | Toy | Ditto (add content and human CTAs); personalize to AI app's stored memory of human preferences; personalize to AI app's existing plan |
| Using browser, computer | Toy | Add computer-use instructions |
AI agents that reliably complete tasks without human supervision remain in the sandbox. Code-generation leads the way, and is the only exception.
Again, for high-stakes tasks like signups and purchases, users will likely continue to keep AI apps on a very short leash for the foreseeable future.
AI apps are reliable assistants across the sales funnel. But the human still holds agency.
As we adapt our sales funnel to AI apps, we can also redesign the user experience of the remaining touchpoints that a user will still encounter.
Depending on how 'self-serve' our product is, we shall optimize our CTAs to match the new AI-assisted sales funnel.
For example, after the AI app has compared all the skus and recommended the top 3 choices, the user may want to do a final review before completing the purchase. We can give the AI app up-to-date information to generate a final review, or a CTA to a personalized comparison webpage, or a phone number to get a walkthrough from a salesperson.
The less 'self-serve' our product is, the greater the opportunity to redesign CTAs to delight users:
- Sign up and start right away - AI app helps user sign up
- Sign up then complete setup - AI app helps user complete setup
- Sign up then wait for review - AI app helps user follow up and keep track of review progress, AI app helps user complete additional information
- Talk to someone then sign up - AI app schedules call with our reps, AI app helps user prepare for call, AI app helps user ensure call is completed to satisfaction
At Bowtie, one of our goals is to eliminate insurance agents from the purchase journey. We want to put the user in the driver's seat. We publish lots of education content to help users pick the right products. We build a modern mobile-optimised underwriting experience that can be completed in 5 minutes.
Adapting our sales funnel to AI apps:
- Make education content help AI apps pick the right products
- AI assistant can qualify the users' needs and priorities and recommend a bundle
- AI assistant can handhold users through the underwriting experience
Redesigning our CTAs to delight users of AI apps:
- Inject signup offer and link into the AI app's response
- Inject a phone or WhatsApp number that reaches an AI or human agent
- Let AI pre-fill underwriting questions that can be prefilled, and let the human complete the rest
AI apps are tireless assistants and imperfect agents for back-office tasks.
As noted above, AI apps are already reliable when using tools. As of mid-July 2025, even open-source frontier models (e.g. Kimi K2) are saturating tool-use benchmarks.
The easier it is to evaluate and verify the outcomes from AI apps, the further we can push them on the autonomy scale.
[Figure: examples of AI app tasks plotted by how easy they are to verify and how much autonomy we give them.]
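
The trade-off between verifiability and autonomy can be written down as a small policy, sketched below. This is an illustration only: the 0.0 to 1.0 verifiability score, the thresholds, the `Autonomy` level names, and the `autonomy_for` function are all hypothetical and would need tuning against real evaluation data.

```python
from enum import Enum


class Autonomy(Enum):
    SUGGEST_ONLY = 1       # agent drafts, human performs the action
    APPROVE_EACH_STEP = 2  # agent acts, human approves every action
    REVIEW_AFTER = 3       # agent acts, human samples results afterwards
    FULL = 4               # agent acts without routine human review


def autonomy_for(verifiability: float, high_stakes: bool) -> Autonomy:
    """Map how easily an outcome can be verified (0.0 to 1.0) to a leash length.

    The thresholds are hypothetical placeholders, not measured values.
    """
    if not 0.0 <= verifiability <= 1.0:
        raise ValueError("verifiability must be between 0.0 and 1.0")
    if verifiability >= 0.9:
        level = Autonomy.FULL
    elif verifiability >= 0.6:
        level = Autonomy.REVIEW_AFTER
    elif verifiability >= 0.3:
        level = Autonomy.APPROVE_EACH_STEP
    else:
        level = Autonomy.SUGGEST_ONLY
    # High-stakes tasks (signups, purchases, approvals) never run unreviewed.
    if high_stakes and level is Autonomy.FULL:
        level = Autonomy.REVIEW_AFTER
    return level
```

Keeping the policy in one function means the leash can be lengthened per task type as evaluation coverage improves, without touching the agents themselves.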
Let's recap the landscape:
- Most frontier models are amazing at following instructions and using tools
- Open models are not far behind in capability
- Proprietary data and verifiable feedback unlock orders-of-magnitude (OOM) gains in performance, cost, and latency through post-training
Frontier model labs will continue to put out models with increasing ability to follow instructions, use tools and reason. But it is our responsibility to engineer the models' goals, context and feedback loop (just like a human employee).
"Before asking for more headcount and resources, teams must demonstrate why they cannot get what they want done using AI. What would this area look like if autonomous AI agents were already part of the team?"
(Lutke, 2025 [1])
We engineer environments that are interoperable across models. We provide goals, tools, and feedback loops for AI apps to take on increasingly long-running back-office tasks.
- Orchestrate AI agent workflows reliably
- Manage versions of prompts and available tools
- Give access to tools and code environments (code-gen)
- Give access to instant feedback (e.g. type analysis / calculator)
- Evaluate model generations during run or after completion
- Observe model generations and reasoning traces
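
As one possible shape for the "manage versions of prompts and available tools" item above, the sketch below derives a version ID from a content hash, so that every run can be attributed to the exact prompt, tool set, and model that produced it. `AgentConfig` and `ConfigRegistry` are hypothetical names, and the in-memory dictionary stands in for whatever store would actually be used.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentConfig:
    """One immutable version of an agent: its prompt, tools, and model name."""
    prompt: str
    tools: tuple[str, ...]
    model: str

    @property
    def version_id(self) -> str:
        # Hash a canonical JSON form so identical configs share the same ID,
        # regardless of the order in which the tools were listed.
        payload = json.dumps(
            {"prompt": self.prompt, "tools": sorted(self.tools), "model": self.model},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass
class ConfigRegistry:
    """In-memory registry (assumption: a real one would persist to a database)."""
    _configs: dict[str, AgentConfig] = field(default_factory=dict)

    def register(self, config: AgentConfig) -> str:
        self._configs[config.version_id] = config
        return config.version_id

    def get(self, version_id: str) -> AgentConfig:
        return self._configs[version_id]
```

Including the model name in the hash keeps the environment interoperable across models: swapping the model yields a new version ID that can be benchmarked against the old one.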
Specific engineering 'techniques' are less important as they are constantly changing. What's important is engineering these outcomes:
- Improving AI app / agent pass-rate
- Decreasing rate of human intervention
Even if GPT-5 can one-shot everything, we still need our observability and test / eval harness, at minimum for compliance!
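
Both outcomes named above can be computed from plain run records. The sketch below is a minimal version: the `Run` fields and the `summarize` function are hypothetical, and `version_id` is assumed to identify the prompt, tool, and model version that produced the run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One completed agent run, as recorded by an eval harness."""
    version_id: str         # which prompt/tool/model version produced the run
    passed: bool            # did the run pass its evaluation?
    human_intervened: bool  # did a human have to step in?


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Return the pass rate and human-intervention rate for each version."""
    grouped: dict[str, list[Run]] = defaultdict(list)
    for run in runs:
        grouped[run.version_id].append(run)
    return {
        version: {
            "pass_rate": sum(r.passed for r in group) / len(group),
            "intervention_rate": sum(r.human_intervened for r in group) / len(group),
        }
        for version, group in grouped.items()
    }
```

Grouping by version makes regressions visible: a new prompt or model that lowers the pass rate or raises the intervention rate shows up as a separate row.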
Improving AI agents in back-office tasks is in large part data management.
The frontier model labs have already baked 99.9% of the intelligence into the weights. Our job is humdrum - building data models, preparing data to feed to the models, reviewing data output by the models.
Each of the following ways to improve AI agents boils down to data management:
- What should the AI model see right now to complete its task?
  - Retrieval - What content should be surfaced?
  - Formatting - What should the retrieved content look like?
  - Ordering - What steps should come first? How to manage context rot?
  - Tool calls - What are the available tools? When should they be used?
  - Memory - What should be surfaced from user memory?
- How do we improve what the model sees over time?
  - Evaluation - Are our decisions helping or hurting performance?
  - Collection - What traces do we save? What metrics are we surfacing?
  - Learning from use - Which traces to distill into new prompts, new examples, new finetunes?
  - Drift - Are new versions helping or hurting performance?
  - Feedback - Are we collecting human feedback on failures that only humans see? Or ground-truth data?
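
The first question (what the model should see right now) can be sketched as a context-assembly function covering retrieval, formatting, ordering, and tool listing. This is an assumption-laden illustration: `Snippet`, `build_context`, the relevance scores, and the character budget (a rough stand-in for a token budget) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str   # e.g. "policy_doc" or "user_memory"
    text: str
    score: float  # relevance score from a retriever assumed to exist upstream


def build_context(task: str, snippets: list[Snippet], tools: list[str],
                  max_chars: int = 2000) -> str:
    """Assemble what the model sees: task, tool list, then the best snippets.

    Snippets are added highest score first until the approximate character
    budget is reached, which keeps low-value text out of the context window.
    """
    parts = [f"TASK: {task}", "TOOLS: " + ", ".join(tools)]
    used = sum(len(p) for p in parts)
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        block = f"[{snip.source}] {snip.text}"
        if used + len(block) > max_chars:
            continue  # this snippet does not fit; a smaller one still might
        parts.append(block)
        used += len(block)
    return "\n".join(parts)
```

Because the function is deterministic, the exact context behind any model generation can be rebuilt later when evaluating whether a retrieval or ordering change helped or hurt.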
| AI agent | Business impact | Data availability |
|---|---|---|
| Customer service agent - help customers manage account and purchases | Customer satisfaction | Past customer service call and email transcripts; synthetic multi-lingual customer queries |
| Insurance recommendation agent - discuss with user best product bundle | New customer growth | Past customer service calls and email transcripts; synthetic / simulated user personas, goals, use cases |
| Inpatient claims agent - help customers navigate arcane hospital requirements | Customer satisfaction and operations efficiency | Hospitals mapped to known requirements and document sources |
| Underwriting agent - follow up with customers on cases requiring additional info | New customer growth and operations efficiency | Past underwriting decision call and email transcripts; synthetic cases that require human intervention |
| Document processing - help customers and back office with OCR and extraction | User satisfaction and operations efficiency | Past user submitted documents mapped to extracted data |
In general, we map each model generation and reasoning trace to its evaluated result and real-world result or human feedback if available. For example:
- Customer satisfaction score
- Completed conversion events
- Problem resolution rate
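
The mapping described above, from each trace to its evaluated result and its real-world result, might look like the sketch below. `TraceRecord`, `attach_outcomes`, and `eval_agreement` are hypothetical names, and the outcome fields (`resolved`, `csat`) are illustrative examples of the metrics listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    trace_id: str
    version_id: str
    eval_passed: bool                # automated evaluation result
    resolved: Optional[bool] = None  # real-world result, if known yet
    csat: Optional[int] = None       # human feedback (e.g. 1 to 5), if given


def attach_outcomes(traces: list[TraceRecord],
                    outcomes: dict[str, dict]) -> list[TraceRecord]:
    """Fill in real-world results that arrive after the run, keyed by trace_id."""
    for trace in traces:
        outcome = outcomes.get(trace.trace_id, {})
        trace.resolved = outcome.get("resolved", trace.resolved)
        trace.csat = outcome.get("csat", trace.csat)
    return traces


def eval_agreement(traces: list[TraceRecord]) -> Optional[float]:
    """Share of traces where the automated eval agreed with the real outcome."""
    known = [t for t in traces if t.resolved is not None]
    if not known:
        return None
    return sum(t.eval_passed == t.resolved for t in known) / len(known)
```

Tracking agreement between the automated evaluation and the real-world result is one way to check that the evaluation itself deserves trust before using it to grant more autonomy.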
As noted above, the harder it is to verify the thought process and/or result of the AI agent, the less autonomy we give each AI agent instance.
We engineer feedback loops to evaluate the context each AI agent gets to complete its task, and evaluate its thought process, actions, and result over time.
As of mid-July 2025, more frontier models are expected to share their full reasoning traces, especially with DeepSeek R2 and OpenAI's open model coming soon.
We shall prototype and benchmark AI agent performance on frontier and near-frontier API models. We push prompt engineering and context engineering to the limit, and benchmark gains against human-only workflows.
Then we can manage performance and costs by leveraging new developments in open-source post-training techniques, like multi-turn RL suited to long-running traces with tool-use.
We lock in performance and manage cost and latency by post-training open-source models. In the long run we will reward correct and efficient tool-use, tune our AI agents' vibe and personality, penalize hallucinations, and reward real-world impact.
Probability of positive ROI from post-training AI models increases with variability of task outcomes:
- PhD-level math proofs - low probability, just wait for GPT-5
- ACME corp GAAP audit report - medium probability, task outcomes standardise to small number of common formats
- Manual underwriting insurance approval - high probability, nuanced decision tree, requires tacit knowledge, some human interaction, and verifiable task outcome
[1] @tobi, "Reflexive AI usage is now a baseline expectation at Shopify", X (2025).