2025-08-05 Prepared for Bowtie management

Durable AI Initiatives

tl;dr

Company survival map for when intelligence is bought from AI model providers.

Two questions:

Are we a token business? Or an intelligence business?
Is our AI strategy sufficiently bitter-lesson-pilled?

Bowtie:

We're an insurance business. We grow market share, pass on cost savings, keep customers healthy, minimise loss ratio, and invest the float. We are happy to buy tokens if doing so gives us leverage in these activities. Does that make tokens a cost center?
Modern AI apps are changing all customer touchpoints. Though AI apps are imperfect, we will meet them where they are headed. We will build environments, tools, and feedback loops that interoperate with all frontier models. We will post-train AI models only where we have proprietary data and tacit knowledge.

Modern internet sales funnel redux

From 2010 to 2025, customers heard about a brand or product, searched for it, and converted via webpages and marketplaces.

More generally, a business optimises each step of this funnel, driving a potential customer from top to bottom: Awareness → Consideration → Decision → Conversion → Loyalty

Each step of the sales funnel contains channels and touchpoints that reach the customer. Some are owned properties, some are not. For example, on the internet, landing pages are owned, SEO channels are hybrid, and paid media channels are not owned.

Pursuing optimisation ('growth'), we invented ways to increase the share of the funnel that is owned. The funnel for internet businesses evolved: Acquisition → Activation → Retention → Referral → Revenue

More owned properties means more can be directly measured. We invented jobs to measure, experiment, and roll out data-driven initiatives to drive customers from top to bottom. Data collection exploded.

In b2c apps and b2b saas, we mastered freemium with free trials, in-app purchases and starter plans. Let the customer try first, find the aha moment, then pay and upgrade for more.

Then we mastered network effects and referrals. Well-run software businesses boast negative churn.

In enterprise software we added 'bottom-up sales': we learnt to 'land' with small teams and individual users, then 'expand' headcount.

These lessons led to 'best practices': own your audience ('lists'), meet customers where they are ('omnichannel'), personalize every message ('a/b testing'), build a referral flywheel ('loops'), measure and nurture each customer account ('crm').

Best practice has calcified around today's owned and un-owned customer touchpoints.

Lists - make a hard-to-ignore offer, and collect email and phone via marketing channels and websites
Omnichannel - manage a dozen social media accounts, repurpose content for a dozen formats, run campaigns that contact users just enough without annoying them
A/B testing - test messages, test landing pages, test offers
Loops - reward user behaviour and test rewards
CRM - measure each step of customer journey and segment users

How do AI apps affect these sales and growth funnels?

AI apps are changing all customer touchpoints. We will throw away today's best practice.

Mid-2025, ChatGPT has >500M WAUs, Google "AI Search" has >1.5B MAUs, and Meta AI has >1B MAUs.

Take how search changes:

User makes high-intent keyword search → makes high-intent semantic search
User skims landing page loaded in 0.3s → reads AI generated result

Taken to its logical conclusion, AI apps will gatekeep each step of the sales funnel.

Awareness - AI tells user what they are interested in
Consideration - AI helps user decide what qualities they value
Decision - AI tells user which product to buy
Conversion - AI helps user buy and start using the product
Loyalty - AI helps user maximise product value or get rid of it

All owned properties that aren't the product shall adapt to being consumed by AI apps.

The new growth funnel (AI apps edition):

Acquisition - AI helps user read landing pages, feature pages, and starts signup
Activation - AI reminds user to try the new product
Retention - AI helps user process remarketing ads and retention campaigns
Referral - AI helps user process referral rewards
Revenue - AI tells user which product to buy, and whether to upgrade

Lists of humans shall become lists of AI apps. Omnichannel was important before computers could easily use language. A/B testing shall become busywork (easier to execute, but all the low-hanging fruit has been picked). Loops shall stay the same so long as humans still use the software. CRM shall be much harder to do when AI apps take over the customer journey.

As an aside, the proliferation of AI-generated sales and marketing automation rapidly causes AI fatigue. Users will lean on AI apps even more to avoid slop in their feeds and inboxes.

Barbell strategy continues to be critical in reaching customers:

To reach human customers: branded / premium / sensationalist content / (other memetic secrets) that play 'the algorithm'.
To reach AI apps: adapt the sales funnel to cater to them, since humans are not personally viewing the owned properties.

Web search ability is not limited to b2c AI apps. Frontier AI models are better at searching and processing web search results than most humans. Expect b2b AI apps to read all the content too.

AI apps are imperfect assistants. We will meet them where they are.

Mid-2025, frontier AI models have saturated math and coding benchmarks, but still struggle with logical deduction (longer-running tasks with harder-to-verify outcomes).

Although AI assistants are in the mainstream, humans still keep AI apps on a very short leash, and will likely continue to do so.

In other words, nobody will purchase one of the 16 deep researched recommendations from Manus AI without some heavy guidance on the purchase decision.

Adapt sales funnel to cater to AI apps viewing owned properties.

Although we should pay attention to search engine visibility for AI apps, it is a solved problem. AI apps use the same search indexes as humans (e.g. Google, Brave, Bing) and a few new ones (e.g. Exa).

AI apps can generate a large variety of search queries that cover the full semantic meaning of the human's request. We can reuse current tactics but apply them to a wider set of keywords.

In addition to surfacing links for the user, AI apps also assist the user by reading the webpage content directly. Searched and read content may or may not be included in the final response shown to the user.

We not only have to show up in the index. We must now ensure the correct text (and media) is returned. And we must take steps to maximise the chance that our content 'wins'.

More generally, AI web search is a type of tool-use. We shall adapt to AI apps' native tools, and we can also develop our own tools to connect with AI apps directly.

AI apps use tools reliably enough today to justify sales funnel adaptations.

Capability	AI apps' ability	Sales funnel adaptations
Searching web (semantic)	Reliable	Optimize semantic search rankings; add content optimized for AI retrieval; add human CTAs via prompt injection
Using tools, integrations	Reliable	Ditto: add content and human CTAs; connect tool when user Acquired; add tool features to help users Activate; add tool features that use the product
Using memory, long term planning	Toy	Ditto: add content and human CTAs; personalize to AI app's stored memory of human preferences; personalize to AI app's existing plan
Using browser, computer	Toy	Add computer-use instructions

AI agents that reliably complete tasks without human supervision remain in the sandbox. Code-generation leads the way, and is the only exception.

Again, for high-stakes tasks like signups and purchases, users will likely continue to keep AI apps on a very short leash for the foreseeable future.

AI apps are reliable assistants across the sales funnel. But the human still holds agency.

As we adapt our sales funnel to AI apps, we can also redesign the user experience of the remaining touchpoints that a user will still encounter.

Depending on how 'self-serve' our product is, we shall optimize our CTAs to match the new AI-assisted sales funnel.

For example, after the AI app has compared all the skus and recommended the top 3 choices, the user may want to do a final review before completing the purchase. We can give the AI app up-to-date information to generate a final review, or a CTA to a personalized comparison webpage, or a phone number to get a walkthrough from a salesperson.

The less 'self-serve' our product is, the greater the opportunity to redesign CTAs to delight users:

Sign up and start right away - AI app helps user sign up
Sign up then complete setup - AI app helps user complete setup
Sign up then wait for review - AI app helps user follow up and keep track of review progress, AI app helps user complete additional information
Talk to someone then sign up - AI app schedules call with our reps, AI app helps user prepare for call, AI app helps user ensure call is completed to satisfaction

At Bowtie, one of our goals is to eliminate insurance agents from the purchase journey. We want to put the user in the driver's seat. We publish lots of education content to help users pick the right products. We build a modern mobile-optimised underwriting experience that can be completed in 5 minutes.

Adapting our sales funnel to AI apps:

Make education content help AI apps pick the right products
AI assistant can qualify the users' needs and priorities and recommend a bundle
AI assistant can handhold users through the underwriting experience

Redesigning our CTAs to delight users of AI apps:

Inject signup offer and link into the AI app's response
Inject phone or whatsapp number with AI or human agent
Let AI pre-fill underwriting questions that can be prefilled, and let the human complete the rest
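
A minimal sketch of the pre-fill split in the last item, assuming each underwriting question carries a flag saying whether an AI app may answer it. The question keys and the `prefillable` flag are hypothetical, for illustration only, and are not our actual underwriting schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str           # hypothetical field identifier
    prefillable: bool  # assumption: we decide per question whether AI may answer


def split_prefill(questions, ai_answers):
    """Return (prefilled, remaining): answers accepted from the AI app, and
    the keys of questions the human must still complete."""
    prefilled, remaining = {}, []
    for q in questions:
        answer = ai_answers.get(q.key)
        if q.prefillable and answer not in (None, ""):
            prefilled[q.key] = answer
        else:
            # Not prefillable, or the AI app had no answer: leave it to the human.
            remaining.append(q.key)
    return prefilled, remaining


questions = [
    Question("date_of_birth", prefillable=True),
    Question("smoker", prefillable=True),
    Question("health_declaration", prefillable=False),
]
# The AI app offers an answer to a non-prefillable question; it is ignored.
ai_answers = {"date_of_birth": "1990-01-01", "health_declaration": "none"}

prefilled, remaining = split_prefill(questions, ai_answers)
print(prefilled)
print(remaining)
```

The design choice is that the allow-list lives on our side: the AI app can offer anything, but only answers to questions we have marked as prefillable are accepted.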

AI apps are tireless assistants and imperfect agents for back-office tasks.

As alluded to above, AI apps are already reliable when using tools. As of mid-July 2025, even open-source frontier models (e.g. Kimi K2) are saturating tool-use benchmarks.

The easier it is to evaluate and verify the outcomes from AI apps, the further we can push them on the autonomy scale.

Examples of AI app tasks plotted by how easy they are to verify and how much autonomy we give them.
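
The relationship between verifiability and autonomy can be sketched as a simple policy. This is an illustration only: the three tiers, the 0 to 1 verifiability score, and the 0.4 / 0.8 cut-offs are all assumptions, not measured values.

```python
from enum import Enum


class Autonomy(Enum):
    SUGGEST = "human does the task, AI suggests"
    APPROVE = "AI does the task, human approves each result"
    SPOT_CHECK = "AI does the task, human samples results"


def autonomy_for(verifiability: float) -> Autonomy:
    """Map a 0..1 verifiability score to an autonomy tier.

    The 0.4 and 0.8 cut-offs are placeholder assumptions.
    """
    if not 0.0 <= verifiability <= 1.0:
        raise ValueError("verifiability must be between 0 and 1")
    if verifiability >= 0.8:
        return Autonomy.SPOT_CHECK
    if verifiability >= 0.4:
        return Autonomy.APPROVE
    return Autonomy.SUGGEST


# Hypothetical scores for three kinds of task.
for task, score in [("document extraction", 0.9),
                    ("follow-up email draft", 0.5),
                    ("open-ended advice", 0.1)]:
    print(task, "->", autonomy_for(score).name)
```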

Let's recap the landscape:

Most frontier models are amazing at following instructions and using tools
Open models are not far behind in capability
Proprietary data and verifiable feedback unlock orders-of-magnitude gains in performance, cost, and latency through post-training

Frontier model labs will continue to put out models with increasing ability to follow instructions, use tools and reason. But it is our responsibility to engineer the models' goals, context and feedback loop (just like a human employee).

"Before asking for more headcount and resources, teams must demonstrate why they cannot get what they want done using AI. What would this area look like if autonomous AI agents were already part of the team?"
— Lutke, 2025¹

We engineer environments that are interoperable across models. We provide goals, tools, and feedback loops for AI apps to take on increasingly long-running back-office tasks.

Orchestrate AI agent workflows reliably
Manage versions of prompts and available tools
Give access to tools and code environments (code-gen)
Give access to instant feedback (e.g. type analysis / calculator)
Evaluate model generations during run or after completion
Observe model generations and reasoning traces
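
A minimal sketch of what 'interoperable across models' could look like in code, assuming every provider is wrapped behind one small interface. The `Model` protocol, the `Environment` class, and the `CALL <tool> <arg>` convention are hypothetical and stand in for whatever orchestration layer we actually adopt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class Model(Protocol):
    """Anything that turns a prompt into text; one adapter per provider."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class Environment:
    goal: str
    prompt_version: str                      # versioned prompt identifier
    tools: Dict[str, Callable[[str], str]]   # tools the agent may call
    check: Callable[[str], bool]             # feedback on the final answer
    trace: list = field(default_factory=list)  # every generation, for observability

    def run(self, model: Model, max_steps: int = 5) -> bool:
        prompt = f"[{self.prompt_version}] {self.goal}"
        for _ in range(max_steps):
            output = model.complete(prompt)
            self.trace.append(output)
            # Hypothetical convention: "CALL <tool> <arg>" requests a tool.
            if output.startswith("CALL "):
                _, name, arg = output.split(" ", 2)
                prompt += f"\n{name}({arg}) = {self.tools[name](arg)}"
                continue
            return self.check(output)
        return False  # ran out of steps without a final answer


class ScriptedModel:
    """Stand-in for a real provider adapter, used only to exercise the loop."""
    def __init__(self, replies):
        self._replies = iter(replies)

    def complete(self, prompt: str) -> str:
        return next(self._replies)


def multiply(arg: str) -> str:
    a, b = arg.split()
    return str(int(a) * int(b))


env = Environment(
    goal="What is 19 * 21?",
    prompt_version="v1",
    tools={"multiply": multiply},
    check=lambda answer: answer.strip() == "399",
)
passed = env.run(ScriptedModel(["CALL multiply 19 21", "399"]))
print(passed, env.trace)
```

Swapping providers then means writing a new adapter with a `complete` method; the goal, tools, check, and trace stay the same.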

Specific engineering 'techniques' are less important as they are constantly changing. What's important is engineering these outcomes:

Improving AI app / agent pass-rate
Decreasing rate of human intervention
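
Both outcomes are simple ratios once each run is logged. A minimal sketch, assuming a hypothetical `Run` record with one flag for the evaluation verdict and one for whether a human had to step in:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    passed: bool      # did the run pass its evaluation?
    intervened: bool  # did a human have to step in?


def pass_rate(runs):
    """Fraction of runs that passed evaluation."""
    if not runs:
        raise ValueError("no runs to score")
    return sum(r.passed for r in runs) / len(runs)


def intervention_rate(runs):
    """Fraction of runs that needed a human."""
    if not runs:
        raise ValueError("no runs to score")
    return sum(r.intervened for r in runs) / len(runs)


# Hypothetical log of four runs.
runs = [
    Run(passed=True, intervened=False),
    Run(passed=True, intervened=False),
    Run(passed=True, intervened=True),
    Run(passed=False, intervened=False),
]
print(pass_rate(runs))          # 0.75
print(intervention_rate(runs))  # 0.25
```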

Even if GPT-5 can one-shot everything, we still need our observability and test / eval harness, at minimum for compliance!

Improving AI agents in back-office tasks is in large part data management.

The frontier model labs have already baked 99.9% of the intelligence into the weights. Our job is humdrum - building data models, preparing data to feed to the models, reviewing data output by the models.

Each of the following ways to improve AI agents boils down to data management:

What should the AI model see right now to complete its task?
- Retrieval - What content should be surfaced?
- Formatting - What should the retrieved content look like?
- Ordering - What steps should come first? How to manage context rot?
- Tool calls - What are the available tools? When should they be used?
- Memory - What should be surfaced from user memory?
How do we improve what the model sees over time?
- Evaluation - Are our decisions helping or hurting performance?
- Collection - What traces do we save? What metrics are we surfacing?
- Learning from use - Which traces to distill into new prompts, new examples, new finetunes?
- Drift - Are new versions helping or hurting performance?
- Feedback - Are we collecting human feedback on failures that only humans see? Or ground-truth data?
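
The first group of questions (retrieval, formatting, ordering, memory) can be sketched as one context-assembly function. This is a minimal illustration under stated assumptions: the `Snippet` type, the relevance `score`, and the character budget (a crude stand-in for a token budget) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str
    text: str
    score: float  # hypothetical relevance score from whatever retriever is used


def build_context(task, snippets, memory, max_chars=2000):
    """Assemble what the model sees: task first, then memory, then the
    highest-scoring snippets that still fit in the budget."""
    parts = [f"TASK: {task}"]
    parts += [f"MEMORY: {m}" for m in memory]
    used = sum(len(p) for p in parts)
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        block = f"[{s.source}] {s.text}"
        if used + len(block) > max_chars:
            continue  # skip anything that would overflow the budget
        parts.append(block)
        used += len(block)
    return "\n".join(parts)


# Placeholder content only.
snippets = [
    Snippet("blog", "x" * 5000, 0.5),            # too long, gets skipped
    Snippet("faq", "Placeholder FAQ answer.", 0.9),
    Snippet("policy", "Placeholder clause.", 0.7),
]
context = build_context(
    task="Answer the user's question",
    snippets=snippets,
    memory=["placeholder user preference"],
    max_chars=500,
)
print(context)
```

Every decision in the function (what to include, in what order, in what format) is something we can version and evaluate, which is what the second group of questions is about.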

Agent	Business impact	Data availability
Customer service agent - help customers manage account and purchases	Customer satisfaction	Past customer service call and email transcripts; synthetic multilingual customer queries
Insurance recommendation agent - discuss with user best product bundle	New customer growth	Past customer service call and email transcripts; synthetic / simulated user personas, goals, use cases
Inpatient claims agent - help customers navigate arcane hospital requirements	Customer satisfaction and operations efficiency	Hospitals mapped to known requirements and document sources
Underwriting agent - follow up with customers on cases requiring additional info	New customer growth and operations efficiency	Past underwriting decision call and email transcripts; synthetic cases that require human intervention
Document processing - help customers and back office with OCR and extraction	User satisfaction and operations efficiency	Past user-submitted documents mapped to extracted data

In general, we map each model generation and reasoning trace to its evaluated result and real-world result or human feedback if available. For example:

Customer satisfaction score
Completed conversion events
Problem resolution rate
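
A minimal sketch of that mapping as a record type, plus one check it enables: how often the automated evaluation agrees with the real-world result. The `TraceRecord` fields are hypothetical and would follow whatever our observability stack actually stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRecord:
    trace_id: str
    model: str
    output: str
    eval_passed: Optional[bool] = None  # automated evaluation, if run
    converted: Optional[bool] = None    # real-world result, if known
    csat: Optional[int] = None          # human feedback score, if given


def eval_agreement(records):
    """Share of records where the eval verdict matched the real-world result,
    counting only records that have both. Returns None if none do."""
    both = [r for r in records
            if r.eval_passed is not None and r.converted is not None]
    if not both:
        return None
    return sum(r.eval_passed == r.converted for r in both) / len(both)


# Hypothetical records.
records = [
    TraceRecord("t1", "model-a", "...", eval_passed=True, converted=True, csat=5),
    TraceRecord("t2", "model-a", "...", eval_passed=True, converted=False),
    TraceRecord("t3", "model-b", "...", converted=True),  # eval not run yet
]
print(eval_agreement(records))  # 0.5
```

A low agreement rate suggests the evaluation is measuring something other than what the business cares about, and should be fixed before it is used to gate autonomy.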

As alluded to above, the harder it is to verify the thought process and/or result of an AI agent, the less autonomy we give each instance of that agent.

We engineer feedback loops to evaluate the context each AI agent gets to complete its task, and evaluate its thought process, actions, and result over time.

As of mid-July 2025, more frontier models are expected to share their full reasoning traces, especially with DeepSeek R2 and OpenAI's open model coming soon.

We shall prototype and benchmark AI agent performance on frontier and near-frontier API models. We push prompt engineering and context engineering to the limit, and benchmark gains against human-only workflows.

Then we can manage performance and costs by leveraging new developments in open-source post-training techniques, like multi-turn RL suited to long-running traces with tool-use.

We lock in performance and manage cost and latency by post-training open-source models. In the long run we will reward correct and efficient tool-use, tune our AI agents' vibe and personality, penalize hallucinations, and reward real-world impact.
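
The reward terms named above can be sketched as one scalar function per episode. This is an illustration, not a training recipe: the `Episode` fields, the tool-call budget, and every weight are placeholder assumptions, and it is not tied to any particular RL library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    task_succeeded: bool     # verifiable outcome of the task
    tool_calls: int          # total tool calls made
    failed_tool_calls: int   # malformed or erroring tool calls
    unsupported_claims: int  # hallucinations flagged by an evaluator


def reward(ep: Episode, tool_budget: int = 5) -> float:
    """Scalar reward for one episode. All weights are placeholder assumptions."""
    r = 1.0 if ep.task_succeeded else 0.0
    r -= 0.1 * ep.failed_tool_calls                   # reward correct tool-use
    r -= 0.05 * max(0, ep.tool_calls - tool_budget)   # reward efficient tool-use
    r -= 0.5 * ep.unsupported_claims                  # penalize hallucinations
    return r


clean = Episode(task_succeeded=True, tool_calls=3, failed_tool_calls=0,
                unsupported_claims=0)
sloppy = Episode(task_succeeded=True, tool_calls=8, failed_tool_calls=1,
                 unsupported_claims=1)
print(reward(clean), reward(sloppy))
```

Real-world impact (e.g. a completed conversion) would enter as a further term once it can be attributed to an episode; it is left out here because that attribution is the hard part.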

Probability of positive ROI from post-training AI models increases with variability of task outcomes:

PhD-level math proofs - low probability, just wait for GPT-5
ACME corp GAAP audit report - medium probability, task outcomes standardise to small number of common formats
Manual underwriting insurance approval - high probability, nuanced decision tree, requires tacit knowledge, some human interaction, and verifiable task outcome

¹ @tobi, "Reflexive AI usage is now a baseline expectation at Shopify", X (2025).