ElasticFlow
HubAll SkillsBy DepartmentBy RoleBy ToolBy MetricMCPsPublishers
WebsiteLoginSign Up
ElasticFlow

Transform your business with AI-powered workflow automation. One unified platform for all your enterprise needs.

Follow us

Platform

  • Features
  • Benefits
  • Use Cases
  • Workflow Library

Use Cases

  • Sales
  • Marketing
  • Finance & Legal
  • HR

Catalogue

  • Departments
  • Roles
  • Tools
  • Metrics
  • Platforms

Growth

  • Referral Program
  • Partners

Legal

  • Privacy Policy
  • Terms of Service
  • Cookie Policy
  • Acceptable Use
  • Security
  • SLA

© 2026 ElasticFlow. All rights reserved.

A/B Test Analysis
AI Skill · Marketing

Decide whether an experiment should ship, stop, or keep running.

A Claude Skill for Claude Code by Paweł Huryn · Run /ab-test-analysis in Claude · Updated Jun 12, 2026 · phuryn/pm-skills@ab-test-analysis

Compatible with: ChatGPT, Claude, Claude Code, Claude Desktop, Codex / Codex CLI, Cursor, Gemini, Hermes (via Continue / Cline), OpenClaw, Windsurf

Reads experiment results, sample size, conversion changes, guardrail metrics, and business context to recommend a clear ship, stop, or continue decision.

  • Explains experiment results in plain language instead of only reporting a p-value or dashboard screenshot.
  • Checks primary metric, sample size, segment differences, and guardrail metrics before recommending a decision.
  • Separates meaningful lift from noise, novelty effects, broken tracking, or mixed segment behavior.
  • Returns a decision memo with evidence, risk, next test idea, and what a human should confirm.
Today

A growth marketer screenshots the experiment dashboard, says the test is up, and debates confidence in a meeting.

With /ab-test-analysis

Run /ab-test-analysis with the result table and context. The skill returns a decision, evidence, risks, and follow-up test.

  1. Paste result table
  2. Check guardrails
  3. Interpret decision risk
  4. Write ship/stop/continue memo

Who this is for

Growth Marketer

Turn experiment results into clear launch, stop, or continue decisions.

Product Manager

Understand experiment impact on user behavior, product risk, and next iteration.

Analytics Engineer

Spot tracking, sample, and guardrail issues before stakeholders trust the readout.


What it does

Growth experiment readout

Turn Optimizely, Amplitude, or GA results into a decision memo.

Guardrail review

Check whether a conversion lift came with revenue, support, speed, or retention risk.

Experiment design critique

Find tracking, segment, sample size, or timing problems before trusting the result.

How it works

  1. Share the experiment goal, variants, dates, traffic, sample size, and metric results.
  2. Add guardrail metrics such as churn, revenue, refund rate, support tickets, or page speed if available.
  3. The skill interprets lift, confidence, practical significance, and business risk.
  4. It recommends ship, stop, keep running, or re-run with a cleaner design.
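
The recommendation in step 4 follows the outcome table in the skill's instructions further down this page (a significant positive lift with clean guardrails means ship, and so on). The sketch below shows one way that mapping could look in code. The function name `recommend` and the `trend_threshold_pct` cutoff that separates a "positive trend" from "flat" are assumptions for illustration and are not part of the skill. The "re-run with a cleaner design" outcome is not modeled, because it depends on setup problems rather than on the numbers.

```python
def recommend(p_value, relative_lift_pct, guardrails_ok,
              alpha=0.05, trend_threshold_pct=1.0):
    """Map a test readout to one of the skill's recommendation labels.

    trend_threshold_pct is a hypothetical cutoff: a non-significant lift
    at or above it counts as a "positive trend", anything below as "flat".
    """
    significant = p_value < alpha
    if significant and relative_lift_pct > 0:
        # A significant win still needs clean guardrails before shipping.
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift_pct < 0:
        return "Don't ship"
    # Not significant: extend only if there is a visible positive trend.
    if relative_lift_pct >= trend_threshold_pct:
        return "Extend"
    return "Stop"


print(recommend(p_value=0.01, relative_lift_pct=18.0, guardrails_ok=False))
```

A human reviewer would still decide what counts as a guardrail concern, so the boolean `guardrails_ok` stands in for that judgment.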

Input options

Experiment setup

Hypothesis, variants, dates, traffic split, audience, and success metric.

Example

What the user pastes
Experiment: Pricing page CTA copy.
Variant A: Start free trial.
Variant B: Build my plan.
Dates: June 1-14.

Results:
- A: 24,100 visitors, 1,084 trials, 4.5% conversion.
- B: 23,900 visitors, 1,267 trials, 5.3% conversion.
- Confidence shown in tool: 96%.

Guardrails:
- Paid conversion after trial: A 18.4%, B 17.9%.
- Support questions about pricing increased 11% for B.

Need: ship, stop, or continue, and what to tell leadership.
Useful result

Decision: Ship to 50% first, not 100%. Variant B improves trial start rate from 4.5% to 5.3%, but paid conversion is slightly lower and pricing questions increased.

Why: The lift is likely real for trial starts. The business impact is not proven until trial quality and support load are watched for one more week.

Guardrail risk: Pricing confusion may be rising. Add a pricing FAQ link near the CTA before full rollout.

Leadership wording: The new CTA increases trial starts by about 18% relative, but we will roll out gradually while monitoring paid conversion and pricing support tickets.

Human review: Confirm attribution window, whether paid conversion is mature enough, and whether support ticket tagging is consistent.
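
The arithmetic behind this example can be checked with a short script. The sketch below computes the conversion rates, the relative lift, a two-tailed two-proportion z-test, and a 95% confidence interval for the difference, using the visitor and trial counts pasted above. The function name `two_proportion_ztest` is an assumption for illustration. The p-value it prints need not match the 96% confidence shown in the tool, since experimentation tools may use sequential or Bayesian methods rather than a fixed-sample z-test.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Unpooled standard error for the 95% interval on the difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "relative_lift_pct": diff / p_a * 100,
        "z": z,
        "p_value": p_value,
        "ci_95": (diff - 1.96 * se_diff, diff + 1.96 * se_diff),
    }


# Counts from the pasted example: A is the control, B is the variant.
result = two_proportion_ztest(1084, 24100, 1267, 23900)
print(f"A: {result['rate_a']:.2%}  B: {result['rate_b']:.2%}")
print(f"Relative lift: {result['relative_lift_pct']:.1f}%")
print(f"z = {result['z']:.2f}, p = {result['p_value']:.5f}")
print(f"95% CI for difference: {result['ci_95'][0]:.4f} to {result['ci_95'][1]:.4f}")
```

The interval is reported on the absolute difference in trial start rate, which is the quantity the "95% CI for the difference" line in the skill's instructions asks for.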

Metrics this improves

Conversion Rate: +5-20% (Marketing)
Statistical Significance: decision risk reduced (Marketing)
Metric Trust: +20-40% (Marketing)

Works with

Google Sheets
manual

Compare result tables and write the decision memo.

Optimizely
manual

Use experiment results, variants, confidence, and traffic allocation.

Amplitude
manual

Check product behavior, activation, retention, and segment impact.

Google Analytics
manual

Use traffic, conversion, and acquisition context.

Works anywhere

Standalone
No setup required

Paste the notes, exports, screenshots, or summaries you already have. The skill works without a connected system.

Connected
CRM + tools integrated

Connect the relevant support, analytics, CRM, or data tool when you want fresher source evidence.

Similar skills

Auto-suggested by attribute overlap. Side-by-side comparison shows what differs.


Programmatic SEO Page Planner

by Gooseworks
↳ text, tool-access vs text, url (What you provide) · markdown, csv vs markdown (Output formats) · review-required vs approval-required (Human review)

Topical Authority Mapper

by Gooseworks
↳ text, tool-access vs text (What you provide) · markdown, csv vs markdown (Output formats) · review-required vs approval-required (Human review)

Competitor Intelligence

by Gooseworks
↳ text, tool-access vs text, api-credentials (What you provide) · markdown, csv vs markdown, email (Output formats) · review-required vs none (Human review)
Sorted by attribute overlap × differentiation. A/B Test Analysis shares 12+ attributes with each.

Want to use A/B Test Analysis?

Choose how to get started.

Run in Claude Code
Free. Open source.

Install and run this skill locally on your computer.

1. Install Claude Code

Open a terminal on your computer and paste the Claude Code install command.

2. Install the skill

This downloads the skill with all its files to your computer: npx skills add phuryn/pm-skills@ab-test-analysis

Add -g at the end to make it available in all your projects.

3. Run it

Start Claude Code, then type the command: /ab-test-analysis

View source on GitHub
Use on ElasticFlow
Team and collaboration features

Run skills from your browser. Share results, manage access, collaborate with your team. No terminal needed.

Free 14-day trial. Cancel anytime.


A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

Context

You are analyzing A/B test results for $ARGUMENTS.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.

Instructions

  1. Understand the experiment:

    • What was the hypothesis?
    • What was changed (the variant)?
    • What is the primary metric? Any guardrail metrics?
    • How long did the test run?
    • What is the traffic split?
  2. Validate the test setup:

    • Sample size: Is the sample large enough for the expected effect size?
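      • Use the formula (per variant): n = ((Z_{α/2} + Z_β)² × 2 × p × (1 − p)) / MDE², where p is the baseline conversion rate, MDE is the absolute minimum detectable effect, Z_{α/2} ≈ 1.96 for α = 0.05, and Z_β ≈ 0.84 for 80% power. Leaving out Z_β sizes the test for only 50% power.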
      • Flag if the test is underpowered (<80% power)
    • Duration: Did the test run for at least 1-2 full business cycles?
    • Randomization: Any evidence of sample ratio mismatch (SRM)?
    • Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
  3. Calculate statistical significance:

    • Conversion rate for control and variant
    • Relative lift: (variant - control) / control × 100
    • p-value: Using a two-tailed z-test or chi-squared test
    • Confidence interval: 95% CI for the difference
    • Statistical significance: Is p < 0.05?
    • Practical significance: Is the lift meaningful for the business?

    If the user provides raw data, generate and run a Python script to calculate these.

  4. Check guardrail metrics:

    • Did any guardrail metrics (revenue, engagement, page load time) degrade?
    • A winning primary metric with degraded guardrails may not be a true win
  5. Interpret results:

    | Outcome | Recommendation |
    |---|---|
    | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
    | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
    | Not significant, positive trend | Extend the test: need more data or larger effect |
    | Not significant, flat | Stop the test: no meaningful difference detected |
    | Significant negative lift | Don't ship: revert to control, analyze why |
  6. Provide the analysis summary:

    ## A/B Test Results: [Test Name]
    
    **Hypothesis**: [What we expected]
    **Duration**: [X days] | **Sample**: [N control / M variant]
    
    | Metric | Control | Variant | Lift | p-value | Significant? |
    |---|---|---|---|---|---|
    | [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
    | [Guardrail] | ... | ... | ... | ... | ... |
    
    **Recommendation**: [Ship / Extend / Stop / Investigate]
    **Reasoning**: [Why]
    **Next steps**: [What to do]
    

Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
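
The setup checks in step 2 can be sketched with the standard library alone. The block below estimates the per-variant sample size with the power term included, and runs a chi-squared goodness-of-fit test for sample ratio mismatch (SRM). The function names, the 4.5% baseline with a 0.5 percentage point MDE, the 50/50 intended split, and the 0.001 SRM alert threshold are all assumptions for illustration and are not taken from the skill.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test.

    baseline is the control conversion rate; mde_abs is the absolute
    minimum detectable effect (0.005 means 0.5 percentage points).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * 2 * baseline * (1 - baseline) / mde_abs ** 2
    return math.ceil(n)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for SRM."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical inputs: 4.5% baseline, 0.5 point MDE, intended 50/50 split.
n_needed = required_sample_size(baseline=0.045, mde_abs=0.005)
srm_p = srm_p_value(24100, 23900)
print(f"Visitors needed per variant: {n_needed}")
print(f"SRM p-value: {srm_p:.3f}")
# 0.001 is a commonly used, deliberately strict SRM alert threshold.
print("Possible SRM" if srm_p < 0.001 else "No evidence of SRM")
```

A very small SRM p-value suggests the assignment or tracking is broken, in which case the significance results in step 3 should not be trusted until the cause is found.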


Further Reading

  • A/B Testing 101 + Examples
  • Testing Product Ideas: The Ultimate Validation Experiments Library
  • Are You Tracking the Right Metrics?

Reference documents


name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant."


Source marketplace page: https://github.com/phuryn/pm-skills/blob/HEAD/pm-data-analytics/skills/ab-test-analysis/SKILL.md

Install command: npx skills add phuryn/pm-skills@ab-test-analysis
