Skill IACompany ResearchSales

Build a scored prospect list in minutes — drop 50 accounts, get back research + CSV — Claude Skill

Une compétence Claude pour Claude Code par Browserbase — exécuter /company-research dans Claude·Mis à jour le 21 mai 2026·v1.1.0

Compatible avec·Claude

Scored prospect list with research and ICP fit, CSV-ready for Apollo.

  • Scored CSV — ICP-fit 1-10 per account with the evidence, ready to sort and import
  • Plan → Research → Synthesize loop — same pattern a senior SDR would run, but at scale
  • Three depth modes — Quick (~100 accounts), Deep (~50), Deeper (~25 with full intelligence)
  • Anti-hallucination guards — refuses to invent product descriptions from a company's fonts (a real failure mode in account research tools)
  • Output on your Desktop — scored CSV + HTML overview, opens in browser on completion
VousAujourd'hui

You open LinkedIn, Crunchbase, Google. For each account you check funding date, employee count, tech stack, recent posts. Type findings into a Notion doc. Repeat for 50 accounts. 90+ minutes per batch — before you've sent a single email. Quality drops by account 20 because you're tired.

I spent half my Monday building the target list. By the time I started sending, half the morning was gone.

SDR at Series A SaaS
Avec /company-research

One prompt with your ICP. The skill runs the same research loop you would — but in parallel across 50 accounts. You get a scored CSV and an HTML overview on your Desktop in ~6 minutes. You spend the rest of Monday on outbound, not research.

1 Define ICP once (the skill learns it from your company + targeting input2 Drop in a request: '50 mid-market fintech Series B3 Get a scored CSV + HTML overview on your Desktop4 Open the top 10 accounts in browser, draft outreac5 Reuse the same workflow tomorrow for the next segment

Pour qui

Ce qu'il fait

Daily prospecting batch — 50 fresh accounts before lunch

Drop your ICP definition in once. Every morning ask for 50 new accounts matching it. Get a scored CSV before standup, work down the list during the day.

New segment expansion

Sales lead says 'try US fintech Series B'. Run a Deeper-mode pass to get 25 fully-researched accounts with hiring signals, tools they already use, and recent funding — enough to draft your first 10 emails the same afternoon.

Pre-meeting account research

Big prospect just booked a discovery call. Run a single-account Deeper research to get every public signal (recent news, customer wins, hiring spikes) before the call — instead of opening 15 tabs yourself.

Quarterly territory refresh

AE asks 'what's changed in my territory since last quarter?'. Run a Quick-mode 100-account pass on existing accounts — flag the ones with new funding rounds, new hires, or product launches that justify re-outreach.

Fonctionnement

1

Tell it your company + which segments you're targeting (e.g. 'mid-market e-commerce, Series A+, US')

2

It researches your own ICP first — what you sell, who you sell to, what competitors look like — and confirms with you before going wider

3

Discovers target accounts via search queries tuned to your segments (industry + stage, tools they already use, competitor adjacencies)

4

Researches each account with depth matching your budget — homepage, funding, hiring, customers — and scores ICP fit 1-10 with evidence

5

Outputs a scored CSV + HTML overview on your Desktop — opens in browser, ready to drop into Apollo or your CRM

Options d'entrée

Quick mode (~100 accounts)

Homepage + 1-2 supplementary searches per account. Best for broad scans when you need volume over depth.

Exemple

Your ask
Find 50 mid-market e-commerce SaaS companies (Series A+, US-based) that look like they could use AI inventory automation. Score by ICP fit and give me a CSV.
What you get
Top 5 in chat (sorted by score)
Acme Inc — score 9/10 — Series A e-commerce SaaS, uses Selenium for scraping, expanding to EU
Bolt Commerce — score 9/10 — recently raised $14M Series A, hiring 3 data engineers
Flux Inventory — score 8/10 — DTC-focused, customer-facing pricing page mentions multi-warehouse coordination problems
GridShop — score 8/10 — public roadmap mentions inventory automation as Q2 priority
Helio Brands — score 7/10 — adjacent vertical (subscription DTC), some signal overlap
Files on your Desktop
~/Desktop/yourco_research_2026-05-21/
  ├ index.html              ← opens automatically: scored overview table
  ├ results.csv             ← drop into Apollo / ZoomInfo / Sheets
  ├ companies/acme-inc.html ← per-account deep page
  └ companies/...
.md per company with: product summary, ICP fit reasoning, evidence with confidence levels and source URLs
The methodology that drove the scores
Plan: 4 sub-questions per account (product, tools they already use, growth signals, problem fit)
Research: 5-8 web searches + page fetches per account in Deep mode
Synthesize: ICP score with cited evidence — refuses to score above 3 if homepage was unreadable (no hallucination on thin signal)

Métriques améliorées

Conversion Rate
Reps work down a pre-scored list (top 8-10s first) instead of round-robin, lifting first-touch → meeting conversion.
Sales
ICP Clarity
Scored ICP fit 1-10 per account with cited evidence — replaces gut-feel sorting and reveals which accounts actually match definition.
Sales
Meeting Rate
Higher booking rate per account because outreach is grounded in specific researched signals (funding, hiring, customer wins) — not 'I see you're at Acme'.
Sales

Compatible avec

Envie d'utiliser Company Research ?

Choisissez comment commencer.

Exécuter dans Claude Code
Gratuit. Open source.

Installez et exécutez ce skill localement sur votre ordinateur.

1
Installer Claude Code

Ouvrez un terminal sur votre ordinateur et collez cette commande :

2
Installer le skill

Cela télécharge le skill avec tous ses fichiers sur votre ordinateur :

Ajoutez -g à la fin pour le rendre disponible dans tous vos projets.

3
Lancez-le

Démarrez Claude Code, puis tapez la commande :

puis
Voir la source sur GitHub
Utiliser sur ElasticFlow
Fonctionnalités d'équipe et de collaboration

Exécutez les skills depuis votre navigateur. Partagez les résultats, gérez les accès, collaborez avec votre équipe. Sans terminal.

Essai gratuit de 14 jours. Annulez à tout moment.

Company Research

Discover and deeply research companies to sell to. Uses Browserbase Search API for discovery and a Plan→Research→Synthesize pattern for deep enrichment — outputting a scored research report and CSV.

Required: BROWSERBASE_API_KEY env var and browse CLI installed.

First-run setup: On the first run you'll be prompted to approve browse cloud fetch, browse cloud search, cat, mkdir, sed, etc. Select "Yes, and don't ask again for: browse cloud fetch:*" (or equivalent) for each to auto-approve for the session. To permanently approve, add these to your ~/.claude/settings.json under permissions.allow:

"Bash(browse:*)", "Bash(bunx:*)", "Bash(bun:*)", "Bash(node:*)",
"Bash(cat:*)", "Bash(mkdir:*)", "Bash(sed:*)", "Bash(head:*)", "Bash(tr:*)", "Bash(rm:*)"

Path rules: Always use the full literal path in all Bash commands — NOT ~ or $HOME (both trigger "shell expansion syntax" approval prompts). Resolve the home directory once and use it everywhere. When constructing subagent prompts, replace {SKILL_DIR} with the full literal path.

Output directory: All research output goes to ~/Desktop/{company_slug}_research_{YYYY-MM-DD}/. This directory contains one .md file per researched company plus a final .csv. The user gets both the scored spreadsheet and the full research files on their Desktop.

CRITICAL — Tool restrictions (applies to main agent AND all subagents):

  • All web searches: use browse cloud search. NEVER use WebSearch.
  • All page content extraction: use node {SKILL_DIR}/scripts/extract_page.mjs "<url>". This script fetches via browse cloud fetch --output, parses title + meta tags + visible body text, and automatically falls back to browse get markdown when fetch fails or returns thin JS-rendered content. NEVER hand-roll a browse cloud fetch | sed pipeline — it strips meta tags and doesn't parse the stdout JSON envelope. NEVER use WebFetch.
  • All research output: subagents write one markdown file per company to {OUTPUT_DIR}/{company-slug}.md using bash heredoc. NEVER use the Write tool or python3 -c. See references/example-research.md for the file format.
  • Report + CSV compilation: use node {SKILL_DIR}/scripts/compile_report.mjs {OUTPUT_DIR} --open — generates HTML report and CSV in one step, opens overview in browser.
  • URL deduplication: use node {SKILL_DIR}/scripts/list_urls.mjs /tmp after discovery.
  • Subagents must use ONLY the Bash tool. No other tools allowed.
  • Main agent NEVER reads raw discovery JSON batch files. Use list_urls.mjs for dedup.

CRITICAL — Anti-hallucination rules (applies to main agent AND all subagents):

  • NEVER infer product_description, industry, or target_audience from a site's fonts, framework (Framer/Next.js/React), design system, or typography. These are cosmetic and say nothing about what the company sells.
  • NEVER let the user's own ICP leak into a target's description. If you don't know what the target does, write Unknown — do not pattern-match them onto the ICP.
  • product_description MUST quote or paraphrase a specific phrase from extract_page.mjs output (TITLE, META_DESCRIPTION, OG_DESCRIPTION, HEADINGS, or BODY). If none of those fields yield a recognizable product statement, write Unknown — homepage content not accessible.
  • If product_description is Unknown, cap icp_fit_score at 3 and set icp_fit_reasoning to Insufficient evidence — homepage returned no readable content.

CRITICAL — Minimize permission prompts:

  • Subagents MUST batch ALL file writes into a SINGLE Bash call using chained heredocs. One Bash call = one permission prompt.
  • Batch ALL searches and ALL fetches into single Bash calls using && chaining.

Pipeline Overview

Follow these 5 steps in order. Do not skip steps or reorder.

  1. Company Research — Deeply understand the user's company, product, and who they sell to
  2. Depth Mode Selection — Choose research depth based on how many targets they want
  3. Discovery — Find target companies using diverse search queries
  4. Deep Research & Scoring — Research each company, score ICP fit
  5. Report & CSV — Present findings, compile scored CSV

Step 0: Setup Output Directory

Before starting, create the output directory on the user's Desktop:

OUTPUT_DIR=~/Desktop/{company_slug}_research_{YYYY-MM-DD}
mkdir -p "$OUTPUT_DIR"

Replace {company_slug} with the user's company name (lowercase, hyphenated) and {YYYY-MM-DD} with today's date. Pass {OUTPUT_DIR} (as a full literal path, not with ~) to all subagent prompts so they write research files there.

Also clean up discovery batch files from prior runs:

rm -f /tmp/company_discovery_batch_*.json

Step 1: Deep Company Research

This is the most important step. The quality of everything downstream depends on deeply understanding the user's company.

  1. Ask the user for their company name or URL

  2. Check for an existing profile:

    • List files in {SKILL_DIR}/profiles/ (ignore example.json)
    • If a matching profile exists → load it, present to user: "I have your profile from {researched_at}. Still accurate?" If yes → skip to Step 2.
    • If no profile exists → proceed with deep research below.
  3. Run a full deep research on the user's company using the Plan→Research→Synthesize pattern. See references/research-patterns.md for sub-question templates and research methodology.

    Key research steps:

    • Search: browse cloud search "{company name}" --num-results 10
    • Fetch homepage: node {SKILL_DIR}/scripts/extract_page.mjs "{company website}"
    • Discover site pages via sitemap (do NOT hardcode paths like /about or /customers):
      1. browse cloud fetch --allow-redirects "{company website}/sitemap.xml" — sitemap is small, raw browse cloud fetch is fine
      2. Scan for URLs with keywords: customer, case-stud, pricing, about, use-case, industry, solution
      3. Optionally also fetch /llms.txt for page descriptions
      4. Pick 3-5 most relevant URLs and extract with extract_page.mjs (NOT raw browse cloud fetch)
    • Search for external context and competitors
    • Accumulate findings with confidence levels

    Synthesize into a profile: Company, Product, Existing Customers, Competitors, Use Cases. Do NOT include ICP or sub-verticals — those are per-run decisions.

  4. Present the profile to the user for confirmation. Do not proceed until confirmed.

  5. Save the confirmed profile to {SKILL_DIR}/profiles/{company-slug}.json

  6. Ask clarifying questions using AskUserQuestion with checkboxes:

    • "Which segments are you targeting?" with options derived from the company research
    • "Company stage?" — Startups, Mid-market, Enterprise, All
    • "How many companies / depth?" — Quick (~100), Deep (~50), Deeper (~25)
    • This is the ONLY user interaction. After this, execute silently until results are ready.

Step 2: Depth Mode Selection

ModeResearch per companyBest for
quickHomepage + 1-2 searches~100 companies, broad scan
deep2-3 sub-questions, 5-8 tool calls~50 companies, solid research
deeper4-5 sub-questions, 10-15 tool calls~25 companies, full intelligence

Step 3: Discovery

Formula: ceil(requested_companies / 35) search queries needed. Over-discover by ~2-3x because filtering typically drops 50-70%.

Generate search queries with these patterns:

  • Industry + company stage + geography ("fintech startups series A Bay Area")
  • Technology stack + use case ("companies using Selenium for web scraping")
  • Competitor adjacency ("alternatives to {known company in ICP}")
  • Buyer persona + pain point ("engineering teams struggling with browser automation")

Process:

  1. Launch ALL discovery subagents at once (up to ~6 per message). Each runs its queries in a SINGLE Bash call:
    browse cloud search "{query}" --num-results 25 --output /tmp/company_discovery_batch_{N}.json
    
  2. After all waves complete, deduplicate: node {SKILL_DIR}/scripts/list_urls.mjs /tmp
  3. Filter the URL list — remove:
    • Blog posts, news articles (globenewswire.com, techcrunch.com, etc.)
    • Directories/aggregators (tracxn.com, crunchbase.com, g2.com)
    • The user's own competitors and existing customers (from profile) Keep only company homepages.

See references/workflow.md for subagent prompt templates and wave management.

Step 4: Deep Research & Scoring

Launch subagents to research companies in parallel. See references/workflow.md for the enrichment subagent prompt template. See references/research-patterns.md for the full research methodology.

Process:

  1. Split filtered URLs into groups per subagent (quick: ~10, deep: ~5, deeper: ~2-3)

  2. Launch ALL enrichment subagents at once (up to ~6 per message)

  3. Each subagent uses ONLY Bash — for each company:

    Phase A — Plan (skip in quick mode): Decompose into 2-5 sub-questions based on ICP and enrichment fields.

    Phase B — Research Loop: Search and fetch pages, extract findings. Respect step budget (quick: 2-3, deep: 5-8, deeper: 10-15).

    Phase C — Synthesize: Score ICP fit 1-10 with evidence. Fill enrichment fields from findings.

  4. Subagents write ALL markdown files in a SINGLE Bash call using chained heredocs to {OUTPUT_DIR}/

  5. After ALL subagents complete, proceed to Step 5

Critical: Include the confirmed ICP description verbatim in every subagent prompt. Pass the full literal {OUTPUT_DIR} path to every subagent.

Step 5: Report & CSV

  1. Generate HTML report + CSV (opens overview in browser automatically):

    node {SKILL_DIR}/scripts/compile_report.mjs {OUTPUT_DIR} --open
    

    This generates:

    • {OUTPUT_DIR}/index.html — overview page with scored table (opens in browser)
    • {OUTPUT_DIR}/companies/*.html — individual company pages (linked from overview)
    • {OUTPUT_DIR}/results.csv — scored spreadsheet for import into sheets/CRM
  2. Present a summary in chat too:

## Company Research Complete

- **Total companies researched**: {count}
- **Depth mode**: {mode}
- **Score distribution**:
  - Strong fit (8-10): {count}
  - Partial fit (5-7): {count}
  - Weak fit (1-4): {count}
- **Report opened in browser**: ~/Desktop/{company_slug}_research_{date}/index.html
  1. Show the top companies sorted by ICP score in a table:
| Company | Score | Product | Industry | Fit Reasoning |
|---------|-------|---------|----------|---------------|
| Acme | 9 | AI inventory management | E-commerce SaaS | Series A, uses Selenium, expanding to EU |
  1. For the top 3-5 companies, show a brief research summary — key findings, why they're a good fit, and what specific angle to approach them with.

Offer to dig deeper into specific companies, adjust scoring criteria, or re-run discovery with different queries.

Reference documents

Example Company Research File

Each research subagent writes one markdown file per company to {OUTPUT_DIR}/{company-slug}.md, where {OUTPUT_DIR} is the per-run Desktop directory set up by the main agent in Step 0 (e.g., ~/Desktop/acme_research_2026-04-23/). The YAML frontmatter contains structured fields for report + CSV compilation. The body contains human-readable research.

Template

---
company_name: Acme Inc
website: https://acme.com
product_description: AI-powered inventory management for e-commerce brands
industry: E-commerce / SaaS
target_audience: Mid-market e-commerce brands
key_features: demand forecasting | automated reordering | multi-warehouse sync
icp_fit_score: 8
icp_fit_reasoning: Series A e-commerce SaaS, uses Selenium for scraping, expanding to EU — strong fit
employee_estimate: 50-100
funding_info: Series A, $12M
headquarters: San Francisco, CA
---

## Product
AI-powered inventory management for e-commerce brands. Helps DTC brands
automate reordering and sync across multiple warehouses.

## Research Findings
- **[high]** Checkout optimization for Shopify stores, serving mid-market DTC brands with $5M-$50M revenue (source: acme.com/about)
- **[high]** Series A, $12M raised in Q3 2025 from Sequoia (source: TechCrunch)
- **[medium]** Recently hired 3 data engineers, expanding platform team (source: LinkedIn job posts)
- **[medium]** Uses Selenium for web scraping in their data pipeline (source: careers page)

Field Rules

  • YAML frontmatter: All structured fields go here. These are extracted for CSV compilation.
  • key_features: Pipe-separated (|) list in YAML, not a JSON array.
  • icp_fit_score: Integer 1-10.
  • icp_fit_reasoning: One line, references specific findings.
  • Body sections: ## Product, ## Research Findings.
  • Findings format: - **[confidence]** fact (source: url or description)
  • Filename: {OUTPUT_DIR}/{company-slug}.md where slug is lowercase, hyphenated (e.g., acme-inc.md).
  • Deduplication: One file per company. If a subagent encounters a company that already has a file, overwrite with richer data.

Writing via Bash Heredoc

Subagents write these files using bash heredoc to avoid security prompts. Use the full literal {OUTPUT_DIR} path — no ~ or $HOME:

cat << 'COMPANY_MD' > {OUTPUT_DIR}/acme-inc.md
---
company_name: Acme Inc
website: https://acme.com
...
---

## Product
...

## Research Findings
...
COMPANY_MD

Use 'COMPANY_MD' (quoted) as the delimiter to prevent shell variable expansion.

IMPORTANT: Write ALL company files in a SINGLE Bash call using chained heredocs to minimize permission prompts.

Company Research — Deep Research Patterns

Overview

This reference defines two research contexts:

  1. Self-Research (Step 1) — Deep research on the user's own company to build a strong ICP foundation
  2. Target Research (Step 6) — Research each discovered company using Plan→Research→Synthesize

Both use the same 3-phase pattern but with different sub-questions and goals.

Self-Research (User's Company)

This is the most important research in the pipeline. Every downstream decision depends on it.

Sub-Questions

  • "What does {company} sell and what specific problem does it solve?"
  • "Who are {company}'s existing customers? What industries, company sizes, and use cases?"
  • "Who are {company}'s competitors and what differentiates them?"
  • "What pricing model does {company} use and who is the typical buyer persona?"
  • "What use cases and pain points does {company}'s marketing emphasize?"

Page Discovery

Discover site pages dynamically — do NOT hardcode paths like /about or /customers:

  1. Fetch browse cloud fetch --allow-redirects "{company website}/sitemap.xml" — primary source, has ALL pages
  2. Scan sitemap URLs for keywords: customer, case-stud, pricing, about, use-case, blog, docs, industry, solution
  3. Optionally fetch browse cloud fetch --allow-redirects "{company website}/llms.txt" for page descriptions
  4. Pick the 3-5 most relevant URLs from the sitemap and fetch those
  5. Sitemap is the source of truth. llms.txt is bonus context but often incomplete.

External Research

  • Search: "{company} customers use cases reviews"
  • Search: "{company} alternatives competitors vs"
  • Fetch 1-2 of the most informative third-party results (G2, blog posts, comparisons)

Synthesis Output

From all findings, produce a company profile:

  • Company: name
  • Product: what they sell, how it works, key capabilities (2-3 sentences, specific)
  • Existing Customers: named customers or customer types found
  • Competitors: who they compete with, key differentiators
  • Use Cases: broad list of use cases the product serves (NOT tied to one vertical)

Do NOT include ICP, pitch angle, or sub-verticals in the profile. Those are per-run targeting decisions made in Step 2 after the profile is confirmed. The profile is a general-purpose company fact sheet that works regardless of which vertical you target next.

Why This Matters

A thin profile produces generic search queries, weak lead scoring, and cookie-cutter emails. A rich profile with specific customers, competitors, and use cases produces targeted queries, accurate scoring, and emails that reference real pain points.


Target Company Research (Step 6)

Sub-Question Templates

Generate sub-questions from these categories based on the ICP and enrichment fields requested. Not every category applies to every company — pick the most relevant.

Priority 1 (Always ask)

  • Product/Market: "What does {company} sell and who are their customers?"
  • ICP Fit: "How does {company}'s product/market relate to {sender's ICP description}?"

Priority 2 (Ask in deep/deeper)

  • Tech Stack: "What technologies, frameworks, or infrastructure does {company} use?"
  • Growth Signals: "Has {company} raised funding, launched products, or expanded recently?"
  • Pain Points: "What challenges might {company} face that {sender's product} addresses?"

Priority 3 (Ask in deeper only)

  • Decision Makers: "Who leads engineering, product, or growth at {company}?"
  • Competitive Landscape: "Who are {company}'s competitors and how are they differentiated?"
  • Customers/Case Studies: "Who are {company}'s notable customers and what results do they highlight?"

Search Query Patterns

For each sub-question, generate 2-3 search query variations:

# Product/Market
"{company name} what they do"
"{company name} product features customers"

# Tech Stack
"{company name} tech stack engineering blog"
"{company name} careers software engineer" (job posts reveal stack)

# Growth Signals
"{company name} funding round 2025 2026"
"{company name} launch announcement"
"{company name} hiring"

# Pain Points
"{company name} challenges {relevant domain}"
"{company name} {problem sender solves}"

# Decision Makers
"{company name} VP engineering CTO LinkedIn"
"{company name} head of growth product"

Finding Format

Each finding is a self-contained factual statement tied to a source:

{
  "subQuestion": "What does Acme sell and who are their customers?",
  "fact": "Acme provides checkout optimization for Shopify stores, serving mid-market DTC brands with $5M-$50M revenue",
  "sourceUrl": "https://acme.com/about",
  "sourceTitle": "About Acme - Checkout Optimization",
  "confidence": "high"
}

Confidence levels:

  • high: Directly stated on the company's own website or official press
  • medium: Inferred from job postings, third-party articles, or indirect signals
  • low: Speculative based on industry/category, or from outdated sources

Research Loop Rules

  1. Process sub-questions by priority — Priority 1 first, then 2, then 3
  2. 3-5 findings per sub-question, then move on — Don't exhaust a topic
  3. Use parallel tool calls — Search multiple queries simultaneously when possible
  4. Rephrase, don't retry — If a search returns poor results, try different keywords
  5. Fetch selectively — Don't fetch every URL from search results. Pick the 1-2 most relevant based on title and URL
  6. Stop at step limit — Respect the depth mode's step budget per company
  7. Homepage first — Always fetch the company's homepage before branching to other pages
  8. Deduplicate findings — Don't record the same fact twice from different sources

Depth Mode Behavior

Quick Mode (100+ leads)

  • Skip Phase A — No sub-question decomposition
  • Phase B: Fetch the company homepage. Run 1-2 supplementary searches if homepage data is thin.
  • Phase C: Extract available data, score ICP, write email from what's available
  • Budget: 2-3 total tool calls per company
  • Trade-off: Fast and cheap, but emails may be less personalized

Deep Mode (25-50 leads)

  • Phase A: Decompose into 2-3 sub-questions (Priority 1 + selected Priority 2)
  • Phase B: For each sub-question, run 2-3 searches + fetch 1-2 URLs. Target 3-5 findings per sub-question.
  • Phase C: Synthesize from all findings. ICP reasoning references specific evidence. Email uses the most specific/compelling finding.
  • Budget: 5-8 total tool calls per company
  • Trade-off: Good balance of depth and scale

Deeper Mode (10-25 leads)

  • Phase A: Decompose into 4-5 sub-questions (Priority 1 + 2 + selected Priority 3)
  • Phase B: Research exhaustively. Fetch multiple pages per company (homepage, about, blog, careers, product pages). Target 3-5 findings per sub-question.
  • Phase C: Synthesize with cited evidence. ICP reasoning is detailed. Email references multiple specific signals.
  • Budget: 10-15 total tool calls per company
  • Trade-off: High quality intelligence, but slow and expensive

Synthesis Instructions

After the research loop completes for a company, synthesize findings into the output record:

ICP Scoring

Score 1-10 using ALL accumulated findings as evidence:

  • 8-10: Strong match. Multiple high-confidence findings confirm right industry, company stage, and clear pain point alignment. The pitch angle directly addresses a visible need supported by evidence.
  • 5-7: Partial match. Some findings suggest relevance but key signals are missing or low-confidence. Adjacent industry or unclear pain point.
  • 1-4: Weak match. Findings indicate wrong segment, too large/small, or no apparent connection to sender's product.

Write icp_fit_reasoning referencing specific findings: "Series A fintech (from Crunchbase), uses Selenium for scraping (from job posting), expanding to EU market (from blog) — strong fit for browser infrastructure."

Email Personalization

Use the richest, most specific findings for email context:

  • Opening: Use the most concrete finding (a specific product feature, a recent launch, a job posting)
  • Bridge: Connect a finding about their challenges/stack to the sender's pitch angle
  • If only low-confidence findings exist, keep the email shorter and more general — don't fabricate specificity

Enrichment Fields

Map findings to enrichment fields:

  • product_description → from Product/Market findings
  • industry → inferred from Product/Market
  • employee_estimate → from LinkedIn search or careers page findings
  • funding_info → from Growth Signals findings
  • headquarters → from company homepage or about page
  • target_audience → from Product/Market findings
  • key_features → from product page findings

If a field has no supporting findings, leave it empty rather than guessing.

Anti-Hallucination Rules

Apply these at synthesis time. They exist because the failure mode — especially on Framer/Next.js landing pages with little server-rendered copy — is for the subagent to pattern-match visual cues onto the sender's ICP and fabricate a plausible-sounding description:

  1. Typography is not a product. Never infer product_description, industry, or target_audience from fonts, design system, framework choice (Framer, Next.js, React), or site polish. "Framer-built" and "uses Geist Mono" are observations about tooling, not signals of what the company sells.
  2. No ICP leakage. If the homepage is thin and external search turns up nothing, do NOT default the target's description toward the sender's ICP. Manufacturing AI ≠ browser automation just because both use AI.
  3. Quote, don't paraphrase from memory. product_description must quote or closely paraphrase a specific phrase from extract_page.mjs output (TITLE / META_DESCRIPTION / OG_DESCRIPTION / HEADINGS / BODY) or from an external search result. If no such phrase exists, write Unknown — homepage content not accessible.
  4. Cap scores on thin evidence. If product_description is Unknown, set icp_fit_score ≤ 3 and icp_fit_reasoning: Insufficient evidence — homepage returned no readable content. Do not justify a higher score on inferred signals alone.

Company Research — Workflow Reference

Discovery Batch JSON Schema

File: /tmp/company_discovery_batch_{N}.json

browse cloud search --output writes a JSON object (NOT a flat array):

{
  "requestId": "abc123",
  "query": "AI data extraction startups",
  "results": [
    { "url": "https://example.com", "title": "Example Corp", "author": null, "publishedDate": null },
    ...
  ]
}

The list_urls.mjs script handles both formats (flat array and { results: [...] }).

Company Research Markdown Format

File: {OUTPUT_DIR}/{company-slug}.md

Where {OUTPUT_DIR} is the per-run directory on the user's Desktop (e.g., ~/Desktop/acme_research_2026-04-23/). The main agent sets this up in Step 0 and passes the full literal path to every subagent.

Each research subagent writes one markdown file per company. See references/example-research.md for the full template.

YAML frontmatter fields (used for report + CSV compilation):

  • company_name (required)
  • website (required)
  • product_description
  • industry
  • target_audience
  • key_features (pipe-separated: feature1 | feature2 | feature3)
  • icp_fit_score (integer 1-10, required)
  • icp_fit_reasoning
  • employee_estimate
  • funding_info
  • headquarters

Body sections:

  • ## Product — what they do
  • ## Research Findings — evidence with confidence levels and sources

CRITICAL: Use consistent field names across all files. The compile_report.mjs script reads these fields.

Extracting Page Content

Use extract_page.mjs for all homepage/product-page content extraction. It fetches via browse cloud fetch --output, parses title + meta + visible body text, and falls back to browse get markdown automatically when fetch fails or returns thin JS-rendered content:

node {SKILL_DIR}/scripts/extract_page.mjs "https://example.com" --max-chars 3000

Output is a structured block:

URL: https://example.com
FETCH_OK: true|false
FALLBACK_TO_BROWSE: true|false
TITLE: ...
META_DESCRIPTION: ...
OG_TITLE: ...
OG_DESCRIPTION: ...
HEADINGS: h1/h2/h3 joined by " | "
BODY_CHARS: N
BODY:
<cleaned visible text, max N chars>

Why not a raw browse cloud fetch | sed pipeline? Without --output, browse cloud fetch returns a JSON envelope with the HTML embedded as an escaped string. A naive sed pipeline strips <> from the wrapper and content, and it removes <meta> tags, which on Framer/Next.js SPAs are often the only readable content. extract_page.mjs uses --output to parse raw HTML directly.

When to use raw browse cloud fetch: Only for small structured files where you want the JSON envelope intact — e.g. sitemap.xml, robots.txt, llms.txt. For any HTML page you'd feed to a model, use extract_page.mjs.

Verifying content is real (not hallucinated)

Before writing product_description, industry, or target_audience into a company file, confirm the claim is grounded in extract_page.mjs output. Quote or closely paraphrase from TITLE, META_DESCRIPTION, OG_DESCRIPTION, HEADINGS, or BODY.

If extract_page.mjs returns FETCH_OK: false AND FALLBACK_TO_BROWSE: false (or BODY_CHARS < 50), the homepage is inaccessible. Do not fabricate. Write:

  • product_description: Unknown — homepage content not accessible
  • icp_fit_score: 3 (or lower)
  • icp_fit_reasoning: Insufficient evidence — homepage returned no readable content

A classic failure mode this prevents: a Framer/Next.js landing page with no server-rendered copy, where the subagent pattern-matches visual cues ("design-forward", "Geist Mono", "Framer-built") onto the user's own ICP. Typography is not a product.

Discovery Subagent Prompt Template

You are a company discovery subagent. Run search queries and save results.

TOOL RULES — CRITICAL, FOLLOW EXACTLY:
1. You may ONLY use the Bash tool. No exceptions.
2. Run ALL searches in a SINGLE Bash call using && chaining.
3. BANNED TOOLS: WebFetch, WebSearch, Write, Read, Glob, Grep — ALL BANNED.
   If you use ANY banned tool, the entire run fails. Use ONLY Bash.
4. NEVER use ~ or $HOME in paths — use full literal paths.

TASK:
Run ALL of the following searches in ONE Bash command:

browse cloud search "{query1}" --num-results 25 --output /tmp/company_discovery_batch_{N1}.json && \
browse cloud search "{query2}" --num-results 25 --output /tmp/company_discovery_batch_{N2}.json && \
browse cloud search "{query3}" --num-results 25 --output /tmp/company_discovery_batch_{N3}.json && \
echo "Discovery complete"

After the command completes, report back ONLY the count of results found per batch.
Do NOT analyze, summarize, or return the actual results.

Research Subagent Prompt Template

You are a company research subagent. For each company URL, research the company and score ICP fit.

CONTEXT:
- User's company: {user_company}
- User's product: {user_product}
- ICP description: {icp_description}
- Depth mode: {depth_mode}
- Output directory: {OUTPUT_DIR}   ← write research files HERE, as a full literal path

URLS TO PROCESS:
{url_list}

TOOL RULES — CRITICAL, FOLLOW EXACTLY:
1. You may ONLY use the Bash tool. No exceptions.
2. All searches: Bash → browse cloud search "..." --num-results 10
3. All homepage/product-page content extraction:
   Bash → node {SKILL_DIR}/scripts/extract_page.mjs "URL" --max-chars 3000
   This returns structured TITLE / META_DESCRIPTION / OG_DESCRIPTION / HEADINGS / BODY and auto-falls back to browse get markdown when fetch fails or returns thin JS-rendered content.
   DO NOT hand-roll a `browse cloud fetch | sed` pipeline — it strips meta tags and doesn't parse the stdout JSON envelope. Use `browse cloud fetch` raw only for sitemap.xml, robots.txt, llms.txt.
4. BATCH all file writes: Write ALL markdown files in a SINGLE Bash call using chained heredocs (one permission prompt, not one per file).
5. BANNED TOOLS: WebFetch, WebSearch, Write, Read, Glob, Grep — ALL BANNED.
   If you use ANY banned tool, the entire run fails. Use ONLY Bash.
6. NEVER use ~ or $HOME in paths — use full literal paths.

ANTI-HALLUCINATION RULES — CRITICAL:
- NEVER infer product_description, industry, or target_audience from fonts, framework (Framer/Next.js/React), design system, or visual style. Typography is not a product.
- NEVER let the sender's ICP leak into a target's description. If you don't know what the target does, write "Unknown" — do not pattern-match them onto the ICP.
- product_description MUST quote or closely paraphrase a phrase from extract_page.mjs output. If none of TITLE/META/OG/HEADINGS/BODY yield a recognizable product statement, write "Unknown — homepage content not accessible" and cap icp_fit_score at 3.

RESEARCH PATTERN (per company):

Phase A — Plan (skip in quick mode):
Decompose what you need to know into sub-questions based on ICP and enrichment fields.

Phase B — Research Loop:
For each sub-question (or just the homepage in quick mode):
1. Run browse cloud search with relevant query
2. Pick 1-2 most relevant URLs from results
3. Extract page content: node {SKILL_DIR}/scripts/extract_page.mjs "URL" --max-chars 3000
   (uses `--output` to avoid the stdout JSON envelope, preserves meta tags, and falls back to browse get markdown when needed)
4. Smart page discovery: use `browse cloud fetch --allow-redirects` on /sitemap.xml or /llms.txt to find relevant URLs — these are small XML/text files where the raw JSON envelope is fine. For the actual HTML pages you discover, use extract_page.mjs.
5. Extract findings: factual statements with source, confidence level
6. Accumulate findings, move to next sub-question
7. Respect step budget: quick=2-3 calls, deep=5-8, deeper=10-15

Phase C — Synthesize:
From accumulated findings:
1. Score ICP fit 1-10 (see rubric below)
2. Fill enrichment fields from findings
3. Reference specific findings in icp_fit_reasoning

ICP SCORING RUBRIC:
- 8-10: Strong match. Multiple high-confidence findings confirm fit.
- 5-7: Partial match. Some findings suggest relevance but key signals missing.
- 1-4: Weak match. Wrong segment or no apparent connection.

OUTPUT — write ALL company files in a SINGLE Bash call using chained heredocs directly to {OUTPUT_DIR}:

cat << 'COMPANY_MD' > {OUTPUT_DIR}/{slug1}.md
---
company_name: {name}
website: {url}
product_description: {description}
industry: {industry}
target_audience: {audience}
key_features: {feature1} | {feature2} | {feature3}
icp_fit_score: {score}
icp_fit_reasoning: {reasoning}
employee_estimate: {estimate}
funding_info: {funding}
headquarters: {location}
---

## Product
{product description paragraph}

## Research Findings
- **[{confidence}]** {finding} (source: {url})
COMPANY_MD
cat << 'COMPANY_MD' > {OUTPUT_DIR}/{slug2}.md
---
...
---
...
COMPANY_MD

Use 'COMPANY_MD' (quoted) as the heredoc delimiter to prevent shell variable expansion.

Report back ONLY: "Batch {batch_id}: {succeeded}/{total} researched, {findings_count} total findings."
Do NOT return raw data to the main conversation.

Wave Management

Key Principle: Maximize Parallelism, Minimize Prompts

Launch as many subagents as possible in a single message (up to ~6 Agent tool calls per message). Each subagent MUST batch all its Bash operations to minimize permission prompts.

Discovery Phase

  • Launch up to 6 discovery subagents in a single message
  • Each subagent runs ALL its queries in a SINGLE Bash call using && chaining
  • After all waves complete, run node {SKILL_DIR}/scripts/list_urls.mjs /tmp
  • Filter URLs: Remove blog posts, news articles, directories, competitors, and existing customers. Keep only company homepages.

Research Phase

  • Companies per subagent varies by depth:
    • quick: ~10 companies per subagent
    • deep: ~5 companies per subagent
    • deeper: ~2-3 companies per subagent
  • Each subagent writes ALL its markdown files in a SINGLE Bash call (chained heredocs) directly to {OUTPUT_DIR}

Sizing Formula

search_queries = ceil(requested_companies / 35)
discovery_subagents = search_queries
expected_urls = search_queries * 20

quick:  research_subagents = ceil(expected_urls / 10)
deep:   research_subagents = ceil(expected_urls / 5)
deeper: research_subagents = ceil(expected_urls / 3)

Error Handling

  • If a subagent fails, log the error and continue with remaining batches
  • If >50% of subagents fail in a wave, pause and inform the user
  • extract_page.mjs already handles the browse cloud fetch → browse get markdown fallback internally. If it still returns FETCH_OK: false with empty BODY, skip the company and mark product_description as Unknown (do not guess).

Report + CSV Compilation

After all research subagents complete, compile the HTML report and CSV in one command:

node {SKILL_DIR}/scripts/compile_report.mjs {OUTPUT_DIR} --open

The script:

  • Reads all .md files in {OUTPUT_DIR}
  • Parses YAML frontmatter + body sections
  • Deduplicates by normalized company name (keeps highest ICP score)
  • Generates {OUTPUT_DIR}/index.html — scored overview page
  • Generates {OUTPUT_DIR}/companies/{slug}.html — one page per company
  • Generates {OUTPUT_DIR}/results.csv — spreadsheet for sheets/CRM
  • Opens index.html in the default browser (--open flag)
  • Prints a JSON summary to stderr