A/B Test Calculator - Statistical Significance & Conversion Rate

Free A/B test calculator with statistical significance testing. Calculate conversion rates, p-values, and confidence intervals, and detect winning variants using a two-proportion z-test. Useful for marketers, product managers, and data scientists running online experiments and conversion optimization campaigns.

A/B Test Formula (Two-Proportion Z-Test)

Z = (p₂ − p₁) / √[p(1 − p)(1/n₁ + 1/n₂)]

Variables:

  • p₁: Conversion rate of Variant A (Control), e.g. 10% (0.10)
  • p₂: Conversion rate of Variant B (Experiment), e.g. 13% (0.13)
  • n₁: Sample size of Variant A (total visitors), e.g. 1000
  • n₂: Sample size of Variant B (total visitors), e.g. 1000
  • p: Pooled proportion, (x₁ + x₂) / (n₁ + n₂), where x₁ and x₂ are the conversion counts of Variants A and B, e.g. 0.115
  • Z: Z-score (test statistic used to judge statistical significance), e.g. 2.10
  • Lift: Relative percentage increase or decrease of Variant B over Variant A, e.g. +30%
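
The formula and variables above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function names `two_proportion_z` and `lift_percent` are hypothetical, and the Z-score is signed as Variant B minus Variant A so that a positive value means B converts better, matching the worked examples on this page.

```python
from math import sqrt


def two_proportion_z(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Pooled two-proportion z-score, signed as Variant B minus Variant A.

    x_a, x_b: conversion counts; n_a, n_b: total visitors per variant.
    Raises ZeroDivisionError if the pooled rate is exactly 0 or 1.
    """
    p_a = x_a / n_a                      # conversion rate of A (p1)
    p_b = x_b / n_b                      # conversion rate of B (p2)
    pooled = (x_a + x_b) / (n_a + n_b)   # pooled proportion p
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def lift_percent(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Relative change of B's conversion rate over A's, in percent."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    return (p_b - p_a) / p_a * 100


if __name__ == "__main__":
    # Inputs taken from the variable examples above: 100/1000 vs. 130/1000.
    print(round(two_proportion_z(100, 1000, 130, 1000), 2))
    print(round(lift_percent(100, 1000, 130, 1000), 1))
```

With the example inputs (100 of 1,000 versus 130 of 1,000), the sketch gives a Z-score of about 2.10 and a lift of about 30%.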

How to Use the KalkuLab A/B Test Calculator

  1. Enter Variant A (Control) Data

    Enter the number of conversions (e.g., 100) and total sample size or visitors (e.g., 1,000) for the control variant.

  2. Enter Variant B (Experiment) Data

    Enter the same metrics for the experiment (challenger) variant, e.g., 130 conversions from 1,000 visitors.

  3. Select Confidence Level

    Choose your confidence level: 90% (|Z| > 1.645), 95% (|Z| > 1.96, industry standard), or 99% (|Z| > 2.576).

  4. Analyze the Results

    Review conversion rate, lift, Z-score, and significance status. If the absolute value of the Z-score exceeds the critical value, the difference is statistically significant.
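
The decision in steps 3 and 4 can be sketched as follows. This is an illustration under the assumption of a two-sided test, which is what the critical values 1.645, 1.96, and 2.576 correspond to; `critical_value` and `is_significant` are hypothetical helper names, and `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical Z for a confidence level such as 0.90, 0.95, 0.99."""
    alpha = 1 - confidence
    # Split alpha across both tails of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)


def is_significant(z: float, confidence: float = 0.95) -> bool:
    """True when |z| exceeds the two-sided critical value."""
    return abs(z) > critical_value(confidence)


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(level, round(critical_value(level), 3))
```

Using the absolute value means a significantly worse Variant B (large negative Z) is flagged just like a significantly better one.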

Examples

Example 1: Variant B Wins Significantly (E-commerce)

Problem:

An online store tests a 'Buy Now' button. Variant A (blue): 100 conversions from 1,000 visitors. Variant B (red): 130 conversions from 1,000 visitors. Test at 95% confidence.

Solution:
  1. Conversion Rate Variant A: p₁ = 100/1000 = 0.10 (10%)
  2. Conversion Rate Variant B: p₂ = 130/1000 = 0.13 (13%)
  3. Lift: (13% − 10%) / 10% × 100% = +30%
  4. Pooled proportion: p = (100+130)/(1000+1000) = 0.115
  5. Standard error ≈ 0.01427
  6. Z-score: (0.13 − 0.10) / 0.01427 ≈ 2.10
  7. Critical value at 95%: 1.96. Since 2.10 > 1.96, result is significant.
Result: Z-Score: 2.10 | Lift: +30% | Status: SIGNIFICANT (95% confidence)

Variant B (red button) wins: its conversion rate is 30% higher in relative terms (13% vs. 10%), and the difference is statistically significant at the 95% confidence level.
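
For readers who want to see where the standard error in this example comes from, the intermediate arithmetic can be written out as below. It only expands the numbers already given in the solution, using the pooled standard error from the formula section.

```latex
\begin{aligned}
p &= \frac{100 + 130}{1000 + 1000} = 0.115 \\
\mathrm{SE} &= \sqrt{0.115 \times 0.885 \times \left(\tfrac{1}{1000} + \tfrac{1}{1000}\right)}
             = \sqrt{0.00020355} \approx 0.01427 \\
Z &= \frac{0.13 - 0.10}{0.01427} \approx 2.10
\end{aligned}
```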

Example 2: Result Not Significant (Need More Sample)

Problem:

Landing page headline test. Variant A: 50 conversions from 1,000 visitors. Variant B: 55 conversions from 1,000 visitors. Confidence level 95%.

Solution:
  1. CR A: 5%, CR B: 5.5%
  2. Lift: +10%
  3. Z-score ≈ 0.50
  4. Critical value 95% = 1.96. Since 0.50 < 1.96, NOT significant.
Result: Z-Score: 0.50 | Lift: +10% | Status: NOT SIGNIFICANT

Although Variant B shows a 10% relative lift, there is not enough statistical evidence yet. Increase the sample size, or test a change likely to produce a larger difference.
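
To illustrate why more data helps, the sketch below recomputes the Z-score for the same 5% and 5.5% rates at larger sample sizes. It assumes equal group sizes and, importantly, that the observed rates would stay exactly the same as traffic grows, which real experiments do not guarantee; `z_for_rates` is a hypothetical helper name.

```python
from math import sqrt


def z_for_rates(rate_a: float, rate_b: float, n_per_variant: int) -> float:
    """Pooled z-score for two rates measured on equally sized groups."""
    pooled = (rate_a + rate_b) / 2  # valid because both groups have the same n
    se = sqrt(pooled * (1 - pooled) * (2 / n_per_variant))
    return (rate_b - rate_a) / se


if __name__ == "__main__":
    # Same rates as Example 2, with hypothetical larger samples.
    for n in (1_000, 4_000, 16_000):
        print(n, round(z_for_rates(0.05, 0.055, n), 2))
```

Because the standard error shrinks with the square root of the sample size, quadrupling the visitors per variant roughly doubles the Z-score for a fixed difference in rates.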

Example 3: Variant B Performs Worse (Negative Lift)

Problem:

Checkout form redesign. Variant A (old): 200 conversions from 2,000 visitors. Variant B (new): 180 conversions from 2,000 visitors. Confidence level 95%.

Solution:
  1. CR A: 10%, CR B: 9%
  2. Lift: −10%
  3. Z-score ≈ −1.08
  4. |−1.08| < 1.96, NOT significant.
Result: Z-Score: −1.08 | Lift: −10% | Status: NOT SIGNIFICANT

Variant B shows a 10% relative drop, but it is not statistically significant at 95% confidence. The data cannot yet distinguish a genuinely worse design from random variation.

Example 4: Test at 99% Confidence (High Stakes)

Problem:

A hospital tests a new treatment. Variant A (standard): 500 recoveries from 5,000 patients. Variant B (new): 550 recoveries from 5,000 patients. Test at 99% confidence (Z > 2.576).

Solution:
  1. CR A: 10%, CR B: 11%
  2. Lift: +10%
  3. Z-score ≈ 1.63
  4. Critical value 99% = 2.576. Since 1.63 < 2.576, NOT significant at 99%.
Result: Z-Score: 1.63 | Lift: +10% | Status: NOT SIGNIFICANT (99% confidence)

At the strict 99% standard for medical decisions, more data or a larger effect is needed before the new treatment can be called superior.

Frequently Asked Questions

What is an A/B test and how does it work?
An A/B test (split test) compares two versions—Variant A as control and Variant B as experiment—to see which performs better on a metric such as conversion rate. Users are randomly assigned to each group, and performance is compared using statistical tests.
What is the Z-score in an A/B test?
The Z-score measures how far the difference in conversion rates between two variants deviates from what you would expect if both variants were truly the same. A high Z-score (above the critical value) suggests the difference is unlikely due to chance. At 95% confidence, Z > 1.96 is considered significant.
What confidence level should I use?
90% (Z > 1.645): quick tests or low-risk decisions. 95% (Z > 1.96): industry standard for most A/B tests. 99% (Z > 2.576): high-stakes decisions such as medical treatments or major product changes. Higher confidence requires larger samples.
How much sample size do I need for a valid A/B test?
Sample size depends on baseline conversion rate, minimum detectable effect (MDE), and confidence level. Smaller effects require larger samples. Use our Sample Size Calculator to estimate requirements before starting an experiment.
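
As a rough illustration of how these inputs interact, the sketch below uses a common textbook approximation for comparing two proportions. It is not necessarily the method used by the Sample Size Calculator mentioned above: the function name `sample_size_per_variant` is hypothetical, the MDE is expressed as a relative lift, and the 80% statistical power default is an assumption that does not appear elsewhere on this page.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(
    baseline: float,
    mde_relative: float,
    confidence: float = 0.95,
    power: float = 0.80,  # assumed default, not taken from this page
) -> int:
    """Approximate visitors needed per variant for a two-sided test.

    baseline: control conversion rate, e.g. 0.10
    mde_relative: minimum detectable effect as a relative lift, e.g. 0.30
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    print(sample_size_per_variant(0.10, 0.30))  # large effect
    print(sample_size_per_variant(0.10, 0.10))  # smaller effect, larger sample
```

The squared difference in the denominator is why halving the detectable effect roughly quadruples the required sample.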
What does negative lift mean and what should I do?
Negative lift means Variant B performs worse than Variant A. If the result is statistically significant, stop the experiment and revert to Variant A. If not significant, the change may have no real impact or the sample may be too small to detect a difference.
How long should I run an A/B test?
Run at least 1–2 full business cycles (typically 2–4 weeks) to account for daily and weekly fluctuations and to reach adequate sample size. Avoid stopping early (peeking), which can lead to wrong conclusions.
What is the difference between statistical and practical significance?
Statistical significance confirms the difference is unlikely due to chance. Practical significance asks whether the difference is large enough to matter for the business. A 0.1% lift may be statistically significant with a huge sample but not worth implementing if costs exceed benefits.
What is the p-value and how does it relate to the Z-score?
The p-value is the probability of observing a difference at least as extreme as yours if the null hypothesis (both variants convert at the same rate) is true. A p-value below the significance level (< 0.05 at 95% confidence) means you can reject the null hypothesis. A larger |Z| corresponds to a smaller p-value; for a two-sided test, |Z| = 1.96 corresponds to p ≈ 0.05.
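
The relationship between the Z-score and the p-value can be sketched with the standard normal distribution. This assumes a two-sided test, consistent with the critical values used on this page; `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability of a |Z| at least this large if both variants are equal."""
    # Area in both tails of the standard normal beyond |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Z-scores from the worked examples on this page.
    for z in (2.10, 0.50, -1.08, 1.63):
        print(z, round(two_sided_p_value(z), 4))
```

For the Z-score of 2.10 from Example 1, this gives a p-value of roughly 0.036, which is below the 0.05 threshold used at 95% confidence.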
