About the role
Bytown runs a router agent (Gemini 2.5 Flash) that classifies user intent and dispatches to specialized agents — Claude Haiku for tool-heavy work, Claude Sonnet for creative and reporting. Agents use MCP to read Google Ads data and the Google Ads SDK, Meta Business SDK, and LinkedIn/Reddit REST clients to write. Responses stream back to the frontend over SSE. You'd own that whole stack.
This is a production role, not a research role. Real customers are running real ad spend through the agents you touch. You'll iterate on prompts, build evals from customer transcripts, trade off latency and cost deliberately, and make the human-review step catch what it needs to catch before anything ships.
What you'll do
- Own the agent framework end to end — router, specialized agents, tools.
- Run multi-model routing: Gemini as dispatcher, Claude Haiku for tools, Claude Sonnet for creative.
- Build tool-calling integrations into Google Ads, Meta, LinkedIn and Reddit.
- Ship SSE streaming from agents back to the frontend.
- Build evals. Debug regressions with real customer transcripts, not vibes.
- Own per-agent cost, latency and quality — and trade them off deliberately.
- Add guardrails to the human-review step so marketers catch what they need to.
What we're looking for
- Shipped LLM-powered features in production — not just demos.
- Comfort with tool-calling, evals, streaming, and multi-provider routing.
- Strong Python. Bonus points for
FastAPIexperience. - Opinions on when to fine-tune, when to prompt, when to build a real tool.
- Comfort in a small remote team with no ticket factory.
Nice to have
- Hands-on MCP work, especially with Google Ads or similar.
- Past experience owning an LLM eval framework in production.
- Side projects — open source, notebooks, or public write-ups.
What we offer
- Remote — US & Canada.
- Competitive comp + meaningful equity.
- Real API budget for experiments and evals.
- Direct access to customers — their transcripts are your eval set.
How to apply
Email [email protected] with a short note: the role, a link to LLM work we can read, and one production regression you debugged recently. No cover letter theater.