CTO Mode


By CTOs, for CTOs

Editor’s Primer

OpenAI makes multi-agent orchestration nearly free with GPT-5.4 Nano, Mistral drops an enterprise training platform and a formal verification agent on the same day, and the Anthropic-Pentagon legal fight just got 150 retired judges involved. In today's brief: I spent a week pointing Perplexity Computer at real workflows, and where it earns its $200/month versus where it burns your credits isn't where you'd guess.

 

Today’s Signal

01

OpenAI Ships GPT-5.4 Mini and Nano, Purpose-Built for Multi-Agent Architectures

Nano at $0.20/M input tokens makes subagent orchestration almost free. If you're building tiered agent systems, the cost barrier to routing simple tasks to cheap, fast models just disappeared. Time to rethink which tasks actually need the flagship.
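That cost math is easy to sketch. A back-of-envelope comparison, assuming a router that sends simple tasks to Nano and everything else to a flagship model (only Nano's $0.20/M input rate comes from the announcement; the flagship rate and the task mix are illustrative assumptions):

```python
# Back-of-envelope cost of tiered routing vs. flagship-only.
# Only Nano's rate is announced; the flagship rate is assumed.
NANO_PER_M = 0.20       # $ per 1M input tokens (announced)
FLAGSHIP_PER_M = 10.00  # $ per 1M input tokens (illustrative)

def monthly_cost(tokens_m: float, nano_share: float) -> float:
    """Monthly spend when `nano_share` of input traffic routes to Nano."""
    nano = tokens_m * nano_share * NANO_PER_M
    flagship = tokens_m * (1 - nano_share) * FLAGSHIP_PER_M
    return nano + flagship

# 500M input tokens/month, 80% of tasks simple enough for Nano:
all_flagship = monthly_cost(500, 0.0)  # $5,000
tiered = monthly_cost(500, 0.8)        # ~$1,080
print(f"flagship-only: ${all_flagship:,.0f}, tiered: ${tiered:,.0f}")
```

Under those assumptions, routing 80% of traffic to Nano cuts the bill by roughly 4-5x, which is why the flagship-for-everything default stops making sense.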

AI / ML

02

Mistral Launches Forge, a Full-Stack Platform for Training Models on Enterprise Data

Not fine-tuning: the full pipeline, from pre-training through RLHF on your proprietary data, supported by forward-deployed engineers. For regulated industries where data sovereignty is non-negotiable, this is the first credible alternative to building an internal ML team from scratch.

Platform

03

Mistral Releases Leanstral, Open-Source Agent for Formal Code Verification in Lean 4

Beats Claude Sonnet on proof engineering tasks at 1/15th the cost. Formal verification of AI-generated code is the missing piece in the 'vibe coding' story. Apache 2.0, 6B active params, MCP-native. Worth tracking as a signal of where trustworthy AI coding is heading.

Open Source

04

150 Retired Judges File Brief Supporting Anthropic as Pentagon Hearing Approaches March 24

Bipartisan judges say the Pentagon misused the supply-chain statute. If this designation stands, any frontier AI vendor faces new procurement risk from policy disagreements. Every CTO building on Claude should be watching the March 24 hearing.

Regulation

05

Nvidia Confirms H200 Production Restart for China with Orders from ByteDance, Alibaba, Tencent

400k+ chips approved across three Chinese giants. The compute gap that export controls created is about to narrow. If you're competing against Chinese-built models or benchmarking against them, factor in a capability bump within the next two quarters.

Business

The Brief

What Perplexity Computer Is Actually Good For

By Marcus Chen  ·  3 min read  ·  OPINION

I pointed Perplexity Computer at a competitive landscape analysis last week - three companies, their recent product moves, pricing changes, hiring patterns, anything signaling strategic direction. It ran seven search types in parallel, cross-referenced public filings, and delivered a structured brief with citations in about four minutes. The last time I asked someone to pull this together, it took most of a week.

That's where this tool actually lives. Not as the "AI operating system" Aravind Srinivas keeps pitching, but as a genuinely powerful research and synthesis engine that orchestrates 19 different models behind the scenes. Gemini for deep research, Opus for reasoning, Grok for speed. It reads full source pages, not snippets, and holds context across long sessions better than anything else I've used. The multi-model routing sounds like marketing until you watch it handle a query that would choke any single model.

The non-obvious value isn't the flashy demos. It's the recurring intelligence workflows your team already does every month and nobody enjoys. A pricing and feature tracker that monitors competitors and flags changes automatically. An API cost simulator that models what happens when you hit the next vendor pricing tier. Meeting transcripts turned into Linear tickets with actual acceptance criteria. These work because they fit Computer's real strengths: multi-source research, structured synthesis, and native integrations with Slack, Notion, Snowflake, and dozens more. If the job is "gather information from five places, synthesize it, put the output somewhere useful" - this thing is legitimately great.
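The API cost simulator is the kind of workflow that's easy to picture in code. A minimal sketch of graduated tier pricing, where each tier's rate applies only to usage within that tier (the tier boundaries and rates below are invented for illustration, not any real vendor's):

```python
# Hypothetical graduated vendor pricing:
# (monthly request ceiling, $ per request). Tiers are illustrative.
TIERS = [
    (1_000_000, 0.0010),    # first 1M requests
    (10_000_000, 0.0008),   # next 9M
    (float("inf"), 0.0005), # everything beyond 10M
]

def monthly_bill(requests: int) -> float:
    """Graduated pricing: each tier's rate covers only usage inside it."""
    bill, prev_ceiling = 0.0, 0
    for ceiling, rate in TIERS:
        in_tier = max(0, min(requests, ceiling) - prev_ceiling)
        bill += in_tier * rate
        prev_ceiling = ceiling
        if requests <= ceiling:
            break
    return bill

# What happens when growth pushes you past the next tier boundary?
print(monthly_bill(900_000))    # all usage in tier 1
print(monthly_bill(2_000_000))  # 1M at tier 1 + 1M at tier 2
```

The point of the workflow is running this against next quarter's projected volume before the invoice does it for you.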

 

It replaces time, not judgment.

But here's the critical frame: it replaces time, not judgment. One agency handed it a six-month brand strategy project and got deliverables in two hours. They still had to quality-check everything. If you're evaluating this for team deployment, understand that you're shifting the bottleneck from production to review, not eliminating it.

And don't write code with it. A Builder.io reviewer burned 10,000 credits on a basic website because npm install silently failed in the sandbox and the agent kept pushing broken builds to Vercel without ever reporting the error. The credit system makes this worse - you genuinely don't know what a workflow costs until it finishes running. Connector reliability is spotty enough that handing it production credentials to GitHub or Salesforce deserves real security scrutiny before you commit.

One reviewer nailed it: "Expensive, occasionally infuriating, and genuinely useful in ways that single-model tools aren't." Point it at research and synthesis, and it's the best tool available at $200 a month. Point it at code, and you're paying to watch an agent chase its tail. Pick one recurring intelligence workflow this week and try it. You'll know within an hour if it earns a spot.

Hidden Gem


Thanks for reading today’s edition of CTO Mode. If you’d like to advertise to our readers, please reach out.

Meme

Keep Reading