Facebook Ads A/B Testing: How to Run Tests That Actually Tell You Something
Most "A/B tests" on Facebook are just two ads running with no statistical conclusion possible. Here's how to set up tests that produce decisions instead of opinions.
The phrase "I'm A/B testing creatives" usually means "I have two ads running and I'm looking at the dashboard." That's not a test. That's an observation. Real tests have a hypothesis, a method that isolates the variable, and a sample size that reaches significance.
Below: how I structure tests that produce real conclusions, and the common setups that look scientific but produce noise.
What you're actually testing
Three things you can test on Facebook (in order of how much they matter):
- Creative: different videos, images, hooks. 80% of test value lives here.
- Audience: different lookalikes, interests, broad vs narrow. 15% of value.
- Bid/budget setup: bid strategies, CBO vs ABO. 5% of value.
If you're burning test budget on bid strategies before nailing creative, you're optimizing the wrong layer.
The proper test setup
Use Facebook's built-in A/B Test tool (Experiments tab). It splits the audience into mutually exclusive groups so impressions don't overlap. Without that, you're running two ad sets in the same auction competing against each other, and your "results" are partly determined by which one Facebook decided to favor in the bidder.
Setup: identical everything except the variable you're testing. Same audience definition, same budget, same bid strategy, same placements. Only the variable differs.
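One way to sanity-check the "only the variable differs" rule before launch is to diff the two variant definitions. The sketch below is illustrative only: the dictionaries and their field names (audience, daily_budget, and so on) are hypothetical stand-ins for however you record your settings, not Ads Manager API fields.

```python
_MISSING = object()  # sentinel so an absent key never equals a real value


def differing_fields(variant_a, variant_b):
    """Return the sorted list of setting names whose values differ."""
    keys = set(variant_a) | set(variant_b)
    return sorted(
        key for key in keys
        if variant_a.get(key, _MISSING) != variant_b.get(key, _MISSING)
    )


def is_clean_test(variant_a, variant_b, tested_field):
    """True only if the tested field is the single difference."""
    return differing_fields(variant_a, variant_b) == [tested_field]


# Hypothetical settings for two variants of a creative test.
variant_a = {
    "audience": "lookalike_1pct",
    "daily_budget": 100,
    "bid_strategy": "lowest_cost",
    "placements": "automatic",
    "creative": "founder_pov_video",
}
variant_b = dict(variant_a, creative="ugc_review_video")

print(differing_fields(variant_a, variant_b))
print(is_clean_test(variant_a, variant_b, "creative"))
```

If the diff returns more than one field, whatever result you get cannot be attributed to the variable you meant to test.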
Sample size: the part everyone skips
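To detect a 20% difference in CPA with 90% confidence, you need roughly 60-100 conversions per variant. That's the realistic minimum, and a smaller real difference needs far more: the required sample grows with the inverse square of the effect size.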
If your offer generates 5 conversions/day per variant at test budget, that's 12-20 days minimum to call a winner. Most operators stop at day 4 because "B is clearly winning." Then they're shocked when B reverses.
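To see where numbers like these come from, here is a rough sketch of the arithmetic, not a definitive power calculator. It assumes both variants get equal spend, treats conversions as Poisson-like counts, and uses a normal approximation. The function names and the power parameter are illustrative assumptions. Under those assumptions, a one-sided test at 90% confidence with only a coin-flip chance of catching a real 20% lift lands inside the 60-100 range, while also asking for 80% power pushes the requirement several times higher.

```python
import math
from statistics import NormalDist


def conversions_needed(lift, confidence=0.90, power=0.80, two_sided=True):
    """Approximate conversions needed in the baseline variant.

    Assumes equal spend per variant and Poisson-like conversion counts,
    so `lift` is the relative gain in conversions for the same spend.
    """
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - (alpha / 2 if two_sided else alpha))
    z_power = NormalDist().inv_cdf(power)
    ratio = 1.0 + lift
    # Variance of the difference of two Poisson counts is the sum of their means.
    return math.ceil((z_alpha + z_power) ** 2 * (1.0 + ratio) / (ratio - 1.0) ** 2)


def days_to_reach(target_conversions, conversions_per_day):
    """Whole days needed to collect the target at a steady daily rate."""
    return math.ceil(target_conversions / conversions_per_day)


lenient = conversions_needed(0.20, power=0.50, two_sided=False)
strict = conversions_needed(0.20, power=0.80, two_sided=True)
print("lenient:", lenient, "conversions,", days_to_reach(lenient, 5), "days at 5/day")
print("strict: ", strict, "conversions,", days_to_reach(strict, 5), "days at 5/day")
```

The gap between the two results is the practical point: the fewer conversions you settle for, the more often a real winner will fail to show up as one.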
Quick math: with fewer than 30 conversions per variant, you almost never have statistical significance. With 100+, you usually do (assuming a real difference exists). With 200+, the test is essentially conclusive.
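A minimal sketch of how you might put a number on "significance" from raw conversion counts. It uses a normal approximation and assumes conversions are independent events, so treat the output as a rough guide rather than an exact figure. The function name and the optional spend arguments are my own for illustration.

```python
import math
from statistics import NormalDist


def confidence_of_difference(conv_a, conv_b, spend_a=1.0, spend_b=1.0):
    """Two-sided confidence that conversions per dollar differ between variants.

    Under "no real difference", each conversion falls to A with
    probability spend_a / (spend_a + spend_b); we measure how far the
    observed split sits from that expectation.
    """
    total = conv_a + conv_b
    if total == 0:
        return 0.0
    share_a = spend_a / (spend_a + spend_b)
    expected_a = total * share_a
    std_dev = math.sqrt(total * share_a * (1.0 - share_a))
    z = abs(conv_a - expected_a) / std_dev
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    return 1.0 - p_value


# Same 30% gap between variants, very different sample sizes.
print(f"{confidence_of_difference(20, 26):.0%}")
print(f"{confidence_of_difference(100, 130):.0%}")
```

With equal spend, 20 vs 26 conversions comes out at roughly 62% confidence, while 100 vs 130 comes out at roughly 95%. The gap is identical; only the sample size changed.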
Common setups that look like tests but aren't
Two ads in the same ad set. Facebook's bidder decides which to show. If it shows A 80% of the time, A's "result" is just whatever A did when over-served. Not a test.
Two ad sets in the same campaign with overlapping audiences. They cannibalize each other in the auction. Use the Experiments tool or non-overlapping audiences.
Sequential testing ("ran A last week, ran B this week"). A different week means different auction conditions and different seasonal effects. You're comparing apples to oranges.
Vanity metric tests. "B has higher CTR." OK, but does it produce more revenue? CTR alone is not a winning metric for paid ads.
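To make the vanity-metric trap concrete, here is a small sketch that computes CTR, CPA, and ROAS side by side. All figures are made up for illustration, and the VariantResult class is a hypothetical container, not an Ads Manager export format.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    name: str
    impressions: int
    clicks: int
    spend: float
    conversions: int
    revenue: float

    @property
    def ctr(self):
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions

    @property
    def cpa(self):
        """Cost per acquisition: spend per conversion."""
        return self.spend / self.conversions

    @property
    def roas(self):
        """Return on ad spend: revenue per dollar spent."""
        return self.revenue / self.spend


# Made-up numbers for illustration only.
a = VariantResult("A", 100_000, 1_000, 500.0, 25, 1_500.0)
b = VariantResult("B", 100_000, 1_500, 500.0, 20, 1_100.0)

for v in (a, b):
    print(f"{v.name}: CTR {v.ctr:.2%}, CPA ${v.cpa:.2f}, ROAS {v.roas:.2f}")
```

In this invented case B wins on CTR while A wins on both CPA and ROAS, which is exactly the situation where picking the winner by CTR costs money.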
What to test, in priority order
1. Creative concept (different angles, formats). Highest variance, biggest wins. Test 3-5 angles in parallel: founder POV, UGC review, demo, problem/solution, social proof. Winner stays, losers get killed.
2. Hook (first 5-7 words of copy or first 2 seconds of video). Same creative, different opening. Most leverage per test.
3. CTA / offer framing. "Try free" vs "Start trial" vs "Get instant access." Small effect size; test only after creative is dialed.
4. Audience. Lookalike vs broad, 1% vs 5%. Easy to test, but creative usually swings results more than audience.
5. Landing page. This is where the biggest dollars hide. A 0.3 percentage-point lift in LP conversion rate often beats a 50% better creative, because it applies to every click from every ad. Test in your LP tool, not in Ads Manager.
How long tests run
Minimum: 7 days and 50 conversions per variant, whichever takes longer to reach. Anything shorter and you're reading noise.
Maximum: 21 days. After that, ad fatigue starts confounding results.
If at day 14 you don't have enough data for significance, you've probably under-budgeted the test. Either accept inconclusive and move on, or rerun with more budget.
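The duration rules above can be sketched as a simple status check. This is one possible encoding, assuming you track days run and conversions per variant yourself; the function name and the status strings are hypothetical.

```python
def run_status(days_run, conversions_by_variant,
               min_days=7, min_conversions=50, max_days=21):
    """Classify a running test using the minimum and maximum rules.

    A test is readable only once BOTH the day minimum and the
    per-variant conversion minimum are met.
    """
    enough_days = days_run >= min_days
    enough_data = min(conversions_by_variant.values()) >= min_conversions
    if enough_days and enough_data:
        return "ready to evaluate"
    if days_run >= max_days:
        # Past this point ad fatigue starts confounding the comparison.
        return "stop: inconclusive"
    return "keep running"


print(run_status(4, {"A": 18, "B": 25}))
print(run_status(5, {"A": 60, "B": 66}))
print(run_status(9, {"A": 62, "B": 71}))
print(run_status(21, {"A": 31, "B": 40}))
```

Note the second case: plenty of conversions by day 5 still returns "keep running", because a full week is needed to cover day-of-week swings.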
FAQ
Should I use Facebook's Experiments tool or just split into two ad sets?
The Experiments tool is cleaner: it actually splits the audience. Two ad sets in one campaign overlap in the auction. Use the tool unless you have a reason not to.
Can I test two variables at once?
Technically yes (multivariate), but you need 4x the conversions to get a clean signal. Single-variable tests almost always pay off better in practice.
What confidence level should I use?
90% is fine for most decisions. 95% if the decision is expensive (e.g. switching all creative). Less than 80% confidence means "we don't actually know."
Bottom line
Real A/B testing requires isolation, sample size, and patience. Most "tests" people run are just two ads they're watching. Use the Experiments tool, run for at least 7 days, get to 50+ conversions per variant, and accept that some tests will be inconclusive. Decisions made on noise are how accounts go sideways.