When a powerful A/B testing tool felt like a maze
How seven non-technical users exposed the hidden friction inside Optimizely's A/B testing platform, and what we did about it.

MY ROLE
Lead Researcher (4/7 sessions)
Project Manager
TEAM
Collaborated with:
2 Product Managers (Optimizely)
4 UX Researchers (UW)
METHODOLOGY
Moderated Usability Testing
Think-Aloud
System Usability Scale (SUS)
DURATION
8 Weeks
JAN – MAR 2025
IMPACT
First usability testing of the product's onboarding experience
Presented findings to a product team of 10+ members
Directly informed the 2025 Q2–Q3 product roadmap
Impacting 10,000+ future experiments
01 CONTEXT
The platform was designed to be powerful,
but few people could actually figure it out.
Optimizely's Feature Experimentation platform gives product managers, marketers, and engineers the ability to run A/B tests and manage feature rollouts, without touching a single line of code.
That was the vision, but in practice, something was getting in the way. During internal business meetings, a pattern kept surfacing: users found the UI confusing and felt they had to spend too much effort just to complete core actions.
The complaints centered on two workflows: setting up feature rollouts and experiments, and understanding the overall information hierarchy of the system. No one had ever formally tested the onboarding experience from a usability perspective at Optimizely, and that gap became our entry point.
"Through the platform, users wouldn't need an engineer beside them just to run a test."
— OPTIMIZELY PRODUCT GOAL
↳ Feature flag setup confusion
↳ High cognitive load during onboarding
↳ Unclear information hierarchy
QUICK PRIMER
What is Feature Experimentation?
Feature experimentation combines feature flags (switches that turn a feature on or off for specific users) with A/B testing: a team ships a change behind a flag, exposes it to a slice of traffic, and measures the impact before rolling it out to everyone.
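To make that concrete, here's a minimal, purely illustrative sketch of the core mechanic, deterministic user bucketing. This is not Optimizely's SDK; the function `get_variation` and its traffic-split argument are hypothetical names invented for this example:

```python
import hashlib

# Illustrative toy only (NOT Optimizely's SDK): the heart of feature
# experimentation is assigning each user to a variation deterministically,
# so the same user always sees the same experience.
def get_variation(flag_key: str, user_id: str,
                  traffic_split: dict[str, float] | None = None) -> str:
    """Deterministically bucket a user into a variation of an experiment."""
    traffic_split = traffic_split or {"control": 0.5, "treatment": 0.5}
    # Hash the flag + user pair to a stable, uniform value in [0, 1].
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    cumulative = 0.0
    for variation, share in traffic_split.items():
        cumulative += share
        if bucket < cumulative:
            return variation
    return "control"  # fallback if shares don't sum to 1

print(get_variation("new_checkout_flow", "user_42"))  # e.g. "treatment"
```

A real platform layers targeting rules, environments (such as Development and Production), and results analytics on top of this primitive, which is exactly the surface area our participants had to navigate.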
02 STAKEHOLDER INVOLVEMENT
My role & who I worked with
This project was a collaboration between our UX research team and Optimizely's product team. I led the study design with my team, facilitated the pilot test, and moderated 4 of the 7 usability sessions. I then led data synthesis and co-presented the findings.
The Optimizely product team was our client. We worked with 2 product managers to define the problems and product goals, outline the critical user workflows, and narrow down the target audience. The entire product team attended the final presentation.
Me
as a Lead UX Researcher
Study design, pilot test, moderated 4 sessions, data synthesis, deck creation, presented findings
4
Researchers worked with me
Collaborated with 4 researchers from problem definition through testing and synthesis
10+
Optimizely Product Team members participated
VP, PMs, researchers, designers, engineers who attended the final presentation readout
03 RESEARCH OBJECTIVE
What we wanted to answer, and why it mattered
Our goal was specific: identify the exact friction points that cause non-technical users to fail or struggle when setting up an A/B test independently.
This mattered beyond UX scores: the product markets itself as an A/B testing tool that non-technical users can run self-sufficiently, without an engineer stepping in. Every engineer called in to help, and every extra minute spent on a task, was a cost that eroded that promise and, over time, the business case behind it.
How intuitive is the product?
Specifically around setting up and managing A/B tests end-to-end without prior training
Where do users break?
What challenges and frustrations emerge when navigating core features in a real-world workflow
Does terminology help or hurt?
How well do users understand the language of the interface while running experiments
04 RESEARCH PLAN
Eight weeks was all we had. Here's how we planned it:
WEEKS 1–4
Study Planning + Recruitment
Product deep-dive, becoming the testers ourselves, interview protocol design, screener + recruitment
WEEKS 4–7
Facilitation + Data Collection
1 pilot test + 7 moderated sessions, think-aloud protocol, quantitative + qualitative data capture
WEEKS 7–8
Synthesis + Readout
Affinity mapping, severity rating, recommendations, live presentation to 10+ stakeholders
05 METHODOLOGY
Why moderated usability testing?
Approach: Moderated Usability Testing
We conducted seven 60-minute remote sessions using a think-aloud protocol. Each session included a pre-test interview to calibrate experience level, six task-based scenarios set in a hypothetical business context, post-task probing questions, and a post-test SUS questionnaire.
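For context on the scoring: SUS is a 10-item questionnaire on a 5-point agreement scale, where odd items are positively worded and even items negatively worded, and the adjusted sum is scaled to 0–100. A minimal sketch of that standard arithmetic (the sample responses below are invented for illustration):

```python
def sus_score(responses: list[int]) -> float:
    """Score a System Usability Scale questionnaire (10 items, 1-5 each).

    Odd-numbered items are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are multiplied by 2.5 to reach 0-100.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based i, so even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# A hypothetical participant with middling responses:
print(sus_score([3, 2, 4, 3, 3, 2, 4, 3, 3, 2]))  # 62.5
```

A study-level number like the 58 reported below is typically the aggregate of these per-participant scores.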
WHY THIS METHOD
→
The product goal promised non-technical independence. We needed to see exactly where and why non-technical users broke down.
→
Think-aloud surfaces implicit confusion that survey data misses entirely
→
Moderated sessions let us probe in real time when something unexpected surfaced
OUR CONSTRAINTS
No A/B testing background on the team
Narrow recruitment profile
Complex technical product
8-week timeline
06 PROCESS
Before testing users, we tested ourselves

1
Product walkthrough & flow mapping
Mapped every interaction, feature, rule, and term in FigJam to understand the product's architecture.
2
Became the usability testers
Our team ran through every task scenario ourselves, surfacing confusion before participants ever saw the product.
3
Task design grounded in real workflow
Tasks were designed to mirror how A/B testers actually work, so findings would reflect genuine user problems.
07 RECRUITMENT
Who we tested
We recruited 7 participants representing Optimizely's target user: non-technical, with A/B testing experience but less than one year of experience using Optimizely.
We wanted to mirror the realistic onboarding scenario: someone who understands experimentation conceptually but is new to this platform. Little-to-no programming experience was key, since the product's promise was non-technical independence.
Two participants (P4, P6) deviated slightly from the target profile. Their findings that overlapped with other participants were included; their unique outlier observations were logged as low-severity and noted separately.
Age range
20–40 years old
Technical background
Little to no programming experience (3 participants had SQL familiarity)
A/B testing experience
1 experienced · 5 limited · 1 none
Roles represented
Product managers, content specialists, and CEOs
08 FINDINGS
What we found
While participants appreciated the platform's customizability, five recurring friction themes emerged. Two were high-severity, causing direct task failure. The SUS score of 58 fell below the commonly cited benchmark of 68, confirming below-average usability.
58 — overall SUS score
6/7 — never noticed the "Copy Rule" option
4/7 — couldn't figure out how to run the test
5 — recurring friction themes

HIGH SEVERITY
6/7 participants never noticed the "Copy Rule" option

WHAT WE FOUND
The "Copy Rule" feature, buried inside a dropdown under "Add Rule", was the intended path for transferring experiment setups from Development to Production. Almost no one found it without prompting. Users were also unsure what copying a rule would actually do, creating a second layer of uncertainty even after discovery.
DESIGN RECOMMENDATIONS
Separate "Copy Rule" as a standalone visible option rather than nesting it in a dropdown.
Add a visual indicator to "Add Rule" that signals the options nested inside it.
Consider a "Push to Production" feature that lets users deploy rules directly from development, eliminating the copy step entirely.
"Didn't notice the Copy Rule function until I was prompted." — P5
"Navigating from development to production was confusing. I would like to push the rule directly from development." — P2
HIGH SEVERITY
4/7 participants couldn't figure out how to run the test

WHAT WE FOUND
There are two distinct "Run" buttons: one at the Rule level, one at the Ruleset level. The sequence matters, but nothing in the UI communicates this hierarchy.
The status "Ready to Run" appeared after running the Rule, but participants interpreted it as confirmation the test was already running, not as a prompt to run the Ruleset next.
DESIGN RECOMMENDATIONS
Replace "Ready to Run" with action-oriented language: "Run Ruleset to Apply Changes", making the next step explicit.
Add step indicators (Step 1: Run Rule → Step 2: Run Ruleset) to create visible progression.
"I'm not sure how to run the test." — P2
MEDIUM SEVERITY / EXTRA TIME ON TASK
Development and Production environments were indistinguishable

WHAT WE FOUND
The two environments looked nearly identical. Several participants set up rules in the wrong environment and had to repeat steps, losing time and confidence.
DESIGN RECOMMENDATIONS
Add distinct visual differentiation between environments: color-coded headers, environment badges, or a persistent banner that makes the current environment immediately obvious at all times.
"I'm unclear about whether the test should be in Development or Production." — P1
"Maybe just tell me the environment I'm in?" — P2
09 IMPACT
What changed because of this work
This was the first formal usability study ever conducted on Optimizely's Feature Experimentation onboarding flow. The findings directly helped shape the Q2–Q3 2025 product roadmap and future design decisions.
First
Usability study to surface onboarding experience pain points
10,000+
Future experiments impacted by the recommended UI improvements
Q2/Q3
2025 design decisions directly informed by these findings
10+
Stakeholders at the final readout: PMs, designers, and engineers
“I'm impressed by how your team navigated this in just a few weeks without an A/B testing background; the insights about the product are accurate and on-point.”
OPTIMIZELY PRODUCT TEAM, POST PRESENTATION
10 REFLECTION
What I'd do differently,
and what I'd carry forward
Navigating a complex product with no A/B testing background was challenging. The heavy jargon left us lost at first, but through collaborative problem-solving, proactive questioning, workflow mapping, and rebuilding our understanding step by step, we gradually built clarity. Looking back, there are three takeaways I'd carry forward:
⚡
Become the user before designing: remove everything I thought I knew and rebuild!
Walking through the product ourselves first wasn't just preparation. It changed how we framed every task and probe. I'll make this a non-negotiable step in every evaluative study.
📊
Handle outliers intentionally, not by default
When P4 and P6 deviated from the target profile, deciding explicitly which of their observations to include, and at what severity, kept the findings honest. I'd write that decision rule into the study plan up front rather than improvising it during synthesis.
🎯
Know the presentation style before I start
Knowing the stakeholder’s timeline and preferred format in advance made the process more efficient. I shared findings in small sections as the PM needed them.