I built my own job-search agent.
After 20+ years in operations, I'm in Act 2, looking for what comes next. I could've used a generic resume builder and a job board. Instead I built the system I would've wanted at every transition: compounding company research, human-in-the-loop (HITL) review at every gate, three deliverables per submission, and an agent that runs from my Mac because the cloud scrapers kept hitting Cloudflare blocks.
The problem
Job search is broken in a specific way: every application starts from zero. You research the company on Tuesday, write a tailored resume on Wednesday, submit it on Thursday. Two weeks later you spot a different role at the same company, and your Tuesday research is gone. Or worse, you have it but it's buried in a Notion page you can't find.
RAG doesn't fix this. Vector search retrieves chunks but synthesizes nothing permanent. Every query starts fresh; nothing accumulates. That works for chatbot Q&A. It doesn't work for you, applying to 200 jobs over six months. By application 50 you should be smarter than at application 1, and the system should make that automatic.
And then there's the application package itself. Most people send a tailored resume. Done. That's a 1-out-of-3 shot. A real submission needs three things ready before you press send: a tailored resume, a cover letter that explains the transition, and an outreach sequence so a hiring manager knows you exist. Mismatch any of the three and the whole submission gets weaker.
The design constraints
- Compounding knowledge, not RAG. Every pipeline run produces a permanent markdown artifact in wiki/. The 10th application is generated with dramatically better training data than the 1st, without any extra effort from me.
- Human-in-the-Loop at every gate. No autonomous sending, submitting, or scheduling. The Klarna case study sits over my shoulder: AI that optimized for the measurable metric (speed) while destroying the unmeasurable one (trust). I stay in the loop until the agent has absorbed enough of my taste to be trusted semi-autonomously.
- Three deliverables per submission. No job moves to outreach_sent until tailored resume, cover letter, AND outreach sequence are all generated and approved. The three either land together or the submission is incomplete.
- Atelier, not SaaS. The UI is a workbench for craft work: newspaper-front-page references, engineering notebooks, gold accents on warm paper. Not a dashboard. The aesthetic reinforces the register.
The stack
How a job becomes an application
1. Discovery (autonomous)
The local agent runs on a 22-hour throttle, fetching listings from Welcome to the Jungle and RemoteOK. Each scrape produces structured rows in jobs_raw: title, company, URL, snippet, source. No Apify, no cloud workers, no Cloudflare 403s. Just a real browser on a residential IP doing what a person would do.
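A minimal sketch of the throttle and row shape described above. The 22-hour window and the five jobs_raw fields come from the text; the function names, the source labels, and the URL-based dedupe step are assumptions for illustration, not the agent's actual code.

```typescript
// Hypothetical shape of a jobs_raw row; field names follow the list in the text.
interface JobRaw {
  title: string;
  company: string;
  url: string;
  snippet: string;
  source: string; // assumed labels, e.g. "welcometothejungle" or "remoteok"
}

const THROTTLE_MS = 22 * 60 * 60 * 1000; // 22-hour throttle

// True when at least 22 hours have passed since the last scrape (or none has run yet).
function shouldScrape(lastRunAt: Date | null, now: Date): boolean {
  if (lastRunAt === null) return true;
  return now.getTime() - lastRunAt.getTime() >= THROTTLE_MS;
}

// Assumption: rows are deduplicated by URL so repeat scrapes do not re-insert a listing.
function dedupeByUrl(existingUrls: Set<string>, scraped: JobRaw[]): JobRaw[] {
  const seen = new Set(existingUrls);
  const fresh: JobRaw[] = [];
  for (const row of scraped) {
    if (seen.has(row.url)) continue;
    seen.add(row.url);
    fresh.push(row);
  }
  return fresh;
}
```

Keeping the throttle check a pure function of two timestamps means it can be tested without a browser or a database.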
2. Qualification (LLM-driven)
The qualifyJd processor pulls each new job, hits Claude with the job description and a fit-scoring prompt, and produces a numerical score plus rationale. Scores ≥ 70 enter the pipeline; below that, archived with reasons. Crucially: the rationale is permanent, written to wiki/companies/[company].md. If I see a different role at the same company two weeks later, the company context is already there.
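The routing step after scoring could look roughly like this. The threshold of 70 and the wiki/companies/[company].md path come from the text; the slug rule, the markdown entry layout, the type names, and the choice to log archived jobs to the same company page are assumptions. The Claude call itself is left out, so the sketch starts from an already-returned score and rationale.

```typescript
type QualificationResult = { score: number; rationale: string };

type Routing = {
  status: "pipeline" | "archived";
  wikiPath: string;  // file the rationale is appended to
  wikiEntry: string; // markdown block to append
};

const FIT_THRESHOLD = 70; // scores >= 70 enter the pipeline

// Hypothetical slug rule: lowercase, non-alphanumeric runs collapsed to "-".
function companySlug(company: string): string {
  return company
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Decides where the job goes and builds the permanent wiki entry for the company page.
function routeQualifiedJob(
  company: string,
  title: string,
  result: QualificationResult,
  scoredOn: string, // ISO date string, passed in so the function stays pure
): Routing {
  const status = result.score >= FIT_THRESHOLD ? "pipeline" : "archived";
  const wikiEntry = [
    `## ${title} (${scoredOn})`,
    `- Score: ${result.score}`,
    `- Status: ${status}`,
    `- Rationale: ${result.rationale}`,
    "",
  ].join("\n");
  return { status, wikiPath: `wiki/companies/${companySlug(company)}.md`, wikiEntry };
}
```

Because the entry is appended rather than overwritten, a second role at the same company lands under the same file with the earlier rationale still in place.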
3. Resume generation (HITL)
This is the hardest part. The naïve approach is "compare Alfonso's history to the JD, generate bullets." That produces resumes constrained by how I originally described my work. The better approach: start from the job-title family profile (what does a strong Operations Analyst resume look like?), backward-engineer my real story into that template, then run Coach Me only to fill the gaps the profile leaves open.
Resume v1 is the HITL trigger point. Coach Me is the extraction mechanism, not the starting point. The system asks targeted questions ("what was the team size?", "what was the metric?") instead of open-ended ones. Every approved bullet writes back to wiki/bullets/[achievement-type].md with metadata. Status ladder: baseline → approved → exemplar.
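The status ladder can be modeled as a one-way promotion. The three statuses and the wiki/bullets/[achievement-type].md path are from the text; the Bullet fields and function names are hypothetical, and the text does not say what triggers the step from approved to exemplar, so the sketch only encodes the order.

```typescript
type BulletStatus = "baseline" | "approved" | "exemplar";

// Hypothetical metadata carried by each bullet in the library.
interface Bullet {
  text: string;
  achievementType: string; // used as the wiki file name
  status: BulletStatus;
}

// Moves a bullet one rung up the ladder; exemplar is the top and stays put.
function promote(bullet: Bullet): Bullet {
  const next: BulletStatus = bullet.status === "baseline" ? "approved" : "exemplar";
  return { ...bullet, status: next };
}

// File an approved bullet is written back to.
function bulletWikiPath(bullet: Bullet): string {
  return `wiki/bullets/${bullet.achievementType}.md`;
}
```

Returning a new object instead of mutating the bullet keeps each approval an explicit event that can be logged.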
4. Three-deliverable package
For each job that progresses past resume approval, the pipeline produces a cover letter (anchored to the same JD context) and an outreach sequence (3 to 5 messages targeting a specific hiring manager LinkedIn profile). All three are reviewed in the same surface (JobDocsEditor) before anything moves to "ready to send."
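The all-or-nothing rule can be expressed as a small gate. This is a sketch under assumptions: the state names and the SubmissionPackage shape are hypothetical, and only the rule itself (resume, cover letter, and outreach sequence all approved before a job moves to outreach_sent) is taken from the post.

```typescript
type DeliverableState = "missing" | "generated" | "approved";

// Hypothetical per-job record of the three deliverables.
interface SubmissionPackage {
  resume: DeliverableState;
  coverLetter: DeliverableState;
  outreachSequence: DeliverableState;
}

// Names of deliverables that still block the submission.
function pendingDeliverables(pkg: SubmissionPackage): string[] {
  const entries: [string, DeliverableState][] = [
    ["resume", pkg.resume],
    ["coverLetter", pkg.coverLetter],
    ["outreachSequence", pkg.outreachSequence],
  ];
  return entries.filter(([, state]) => state !== "approved").map(([name]) => name);
}

// A job may move to outreach_sent only when nothing is pending.
function canMoveToOutreachSent(pkg: SubmissionPackage): boolean {
  return pendingDeliverables(pkg).length === 0;
}
```

Returning the list of pending items, not just a boolean, lets a review surface show exactly which deliverable is holding the job back.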
5. The Wire (live activity)
Across the top of the app, a ticker shows the agent's heartbeat: jobs scraped, scores assigned, drafts ready, my own approvals. It runs at the masthead level so I can see the system breathing without opening any specific surface. The masthead and ticker timestamps recently moved from UTC to Central time. When my watch says 19:30, the Desk says 19:30, and I don't have to do timezone math in my head.
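One way to pin display times to Central time is the standard Intl.DateTimeFormat API with an explicit IANA zone. The formatter and function names below are hypothetical; the zone America/Chicago and the 24-hour "19:30" style follow the post.

```typescript
// Formatter pinned to Central time, 24-hour clock.
// hourCycle "h23" keeps midnight as 00 rather than 24.
const deskClock = new Intl.DateTimeFormat("en-US", {
  timeZone: "America/Chicago",
  hour: "2-digit",
  minute: "2-digit",
  hourCycle: "h23",
});

// Formats any instant as HH:MM on the Central-time wall clock.
function formatDeskTime(instant: Date): string {
  const parts = deskClock.formatToParts(instant);
  const hour = parts.find((p) => p.type === "hour")?.value ?? "00";
  const minute = parts.find((p) => p.type === "minute")?.value ?? "00";
  return `${hour}:${minute}`;
}

// 01:30 UTC in January is 19:30 the previous evening in Chicago (UTC-6).
const winter = formatDeskTime(new Date("2024-01-15T01:30:00Z")); // "19:30"
// 00:30 UTC in July is also 19:30 in Chicago, because daylight time is UTC-5.
const summer = formatDeskTime(new Date("2024-07-15T00:30:00Z")); // "19:30"
```

Naming the zone instead of hard-coding a UTC offset means the daylight-saving switch is handled by the runtime's timezone data.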
The architectural decisions that mattered
Career Desk has 10 ADRs (architectural decision records). Five of them carry most of the weight:
The build, week by week
What's built
What success looks like
Pre-launch: I've not yet sent the first HITL-approved package. What I'm watching for:
- Coverage: 30+ qualified jobs in the pipeline within the first 14 days of running. The throttle is conservative; if real-world drift drops me below 30, the discovery layer is the bottleneck.
- Velocity: resume + cover letter + outreach package generated, reviewed, and approved in under 30 minutes per job. Faster than that and I'm not reading carefully; slower and the system isn't pulling its weight.
- Match score: 85-90% match on Teal's 15-check gate after Coach Me fills placeholders. The job-title family profile architecture is supposed to start drafts at 90-100% match. If first drafts land at 70%, the profile is the problem; if at 95%, the model's working.
- Compounding: the 10th application meaningfully better than the 1st without extra effort. Bullet library status ladder (baseline → approved → exemplar) should fill out organically.
- Outcome: first response to outreach within 14 days of first send. First interview within 30. First offer when it lands. The outcome metric is downstream of all four above.
↻ I'll add a "30 days in" section once first applications go out.
What I'd do differently
- Skip Apify entirely from day 1. I built two full ships on Apify before accepting that the architecture failed too often to be reliable: Cloudflare 403s on data-center IPs, selector drift in cloud Puppeteer, version-build env-var caching surprises. The local agent on a residential IP just works. The cost of "Mac off = no scrape" is acceptable for a one-person system; the cost of fighting cloud bot mitigation isn't.
- Lock the time-display layer earlier. Six weeks of UTC timestamps in the UI made me do mental tz math on every job. Should have pinned to America/Chicago in week 1, not week 6.
- Don't ship duplicate utility code. Two separate relTime implementations existed in two components for weeks before consolidation. Single source of truth from the start would have caught the Math.round → Math.floor bug earlier.
- Build the bullet library wiki structure on day 1. ADR-005 was day 2; the first ten bullets weren't structured to the status ladder. Retrofitting metadata onto existing bullets was lower-priority work that piled up.
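The post does not show either relTime implementation, so the following is a hypothetical single shared version that illustrates why the rounding mode matters. With Math.round, 90 seconds of elapsed time would display as "2m ago"; with Math.floor it displays as "1m ago", which matches how elapsed time is usually read. The label strings and unit cut-offs are assumptions.

```typescript
// Shared relative-time helper: floors at every unit so a label never
// claims more elapsed time than has actually passed.
function relTime(then: Date, now: Date): string {
  const seconds = Math.max(0, Math.floor((now.getTime() - then.getTime()) / 1000));
  if (seconds < 60) return "just now";
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `now` as a parameter, instead of reading the clock inside the function, keeps the helper deterministic and easy to test from one place.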
What unlocked the speed
- The Wiki Paradigm itself. Once it clicked that the LLM is a compiler managing markdown (not a chat assistant retrieving context), every architectural decision after became simpler. Every output goes somewhere permanent.
- ADRs from day 1. The Decision Log reached 10 entries by week 2. When a question comes up ("why did I pick this?"), the answer is one grep away. Decision drift would have eaten weeks otherwise.
- Claude Code's plan/review skills. /plan-eng-review, /cso, /investigate on every meaningful change. The Apify-to-local migration was a 6-hour refactor with full review coverage instead of a 3-day debug spiral.
- Open Brain in place before Career Desk. The qualitative memory layer existed before the structured layer. Bullets had a place to call from before the wiki was rebuilt around them.
- Atelier-not-SaaS aesthetic locked early. The design system constraints (warm paper, Playfair Display, no gradients, gold reserved for accent) meant I never spent time picking a button style. The reference is "engineering notebook"; everything else falls out.
LLM Wiki Paradigm · Open Brain · HITL · Job-search agent · Puppeteer stealth · Supabase · Claude API · React · AI-assisted dev