EVALUATIVE USABILITY STUDY @ OPTIMIZELY

When a powerful A/B testing tool felt like a maze

How seven non-technical users exposed the hidden friction inside Optimizely's A/B testing platform, and what we did about it.

MY ROLE

Lead Researcher (4/7 sessions)

Project Manager

TEAM

Collaborated with:
2 Product Managers (Optimizely)

4 UX Researchers (UW)

METHODOLOGY

Moderated Usability Testing

Think-Aloud

System Usability Scale (SUS)

DURATION

8 Weeks

JAN – MAR 2025

IMPACT

  • First usability test of the product's onboarding experience

  • Presented findings to a product team of 10+ members

  • Directly informed the Q2–Q3 2025 product roadmap

  • Impacting 10,000+ future experiments


01 CONTEXT

The platform was designed to be powerful,
but few people could actually figure it out.

Optimizely's Feature Experimentation platform gives product managers, marketers, and engineers the ability to run A/B tests and manage feature rollouts, without touching a single line of code.

That was the vision, but in practice, something was getting in the way. During internal business meetings, a pattern kept surfacing: users found the UI confusing and felt they had to spend too much effort just to complete core actions.

The complaints centered on two workflows: setting up feature rollouts and experiments, and understanding the overall information hierarchy of the system. No one at Optimizely had ever formally tested the onboarding experience from a usability perspective, and that gap became our entry point.

"Through the platform, users wouldn't need an engineer beside them just to run a test."

— OPTIMIZELY PRODUCT GOAL

Feature flag setup confusion

High cognitive load during onboarding

Unclear information hierarchy

QUICK PRIMER

What is Feature Experimentation?

TL;DR: Imagine testing two different versions of your website without coding, controlled by a single button; this platform is where you set up and control that button. Now you can fix a color without getting an engineer to code it for you.

A feature flag is a technique that lets users toggle product features on or off without deploying new code. Optimizely's platform is built on this concept. Users create flags, attach rulesets, and run A/B tests or targeted deliveries against specific audience segments.
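To make the concept concrete, here is a minimal sketch of the feature-flag pattern in Python. This illustrates the general technique only, not Optimizely's actual SDK; the flag store, flag name, and rollout logic below are all hypothetical.

import hashlib

# Hypothetical in-memory flag store: illustrative of the concept,
# not Optimizely's actual SDK or data model.
FLAGS = {
    "new_checkout_button": {"enabled": True, "rollout_percent": 50},
}

def is_enabled(flag_key: str, user_id: str) -> bool:
    """Return True if the flag is on for this user."""
    flag = FLAGS.get(flag_key)
    if flag is None or not flag["enabled"]:
        return False
    # Deterministic hashing buckets each user into 0-99, so the same
    # user always sees the same variation across sessions.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

# Shipped code simply branches on the flag; toggling or re-targeting it
# from the platform UI changes behavior without deploying new code.
if is_enabled("new_checkout_button", user_id="user-123"):
    print("variation B: show the new checkout button")
else:
    print("control: show the existing checkout button")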

The user base spans technical and non-technical roles: PMs, marketers, content specialists, and CEOs. The product aspiration was that any of them could independently launch an experiment, with zero engineer dependency.


02 STAKEHOLDER INVOLVEMENT

My role & who I worked with

This project was a collaboration between our UX research team and Optimizely's product team. I worked with my team to lead the study design, facilitated the pilot test, and moderated 4 out of 7 usability sessions. I led the data synthesis and collaborated with my team to present findings.

The Optimizely product team was our client. We worked with 2 product managers to define the problems and product goals, outline the critical user workflows, and narrow down the target audience. The entire product team attended the final presentation of this project.

Me

as a Lead UX Researcher

Study design, pilot test, moderated 4 sessions, data synthesis, deck creation, presented findings

4

Researchers worked with me

Collaborated with 4 researchers from study definition through testing and synthesis

10+

Optimizely Product Team members participated

VPs, PMs, researchers, designers, and engineers who attended the final readout

03 RESEARCH OBJECTIVE

What we wanted to answer, and why it mattered

Our goal was specific: identify the exact friction points that cause non-technical users to fail or struggle when setting up an A/B test independently.

This mattered beyond UX scores: the product markets itself as an A/B testing tool that non-technical users can run self-sufficiently, without an engineer stepping in to fix things. Every engineer called in to help, and every extra minute spent on a task, was a cost that eroded that promise and could eventually hurt the business.

How intuitive is the product?

Specifically around setting up and managing A/B tests end-to-end without prior training

Where do users break?

What challenges and frustrations emerge when navigating core features in a real-world workflow

Does terminology help or hurt?

How well do users understand the language of the interface while running experiments

04 RESEARCH PLAN

When 8 weeks was all we had, here's how we planned:

WEEKS 1–4

Study Planning + Recruitment

Product deep-dive, becoming the testers ourselves, interview protocol design, screener + recruitment

WEEKS 4–7

Facilitation + Data Collection

1 pilot test + 7 moderated sessions, think-aloud protocol, quantitative + qualitative data capture

WEEKS 7–8

Synthesis + Readout

Affinity mapping, severity rating, recommendations, live presentation to 10+ stakeholders

05 METHODOLOGY

Why moderated usability testing?

Approach: Moderated Usability Testing

We conducted seven 60-minute remote sessions using a think-aloud protocol. Each participant completed six task-based scenarios in a hypothetical business context, followed by a post-test SUS questionnaire.

Sessions included a pre-test interview to calibrate experience level, six structured tasks, post-task probing questions, and a final SUS score.

WHY THIS METHOD

The product goal promised non-technical independence. We needed to see exactly where and why non-technical users got stuck.

Think-aloud surfaces implicit confusion that survey data misses entirely

Moderated sessions let us probe in real time when something unexpected surfaced

OUR CONSTRAINTS

No A/B testing background

Recruitment

Complex technical product

8-week timeline

06 PROCESS

Before testing users, we tested ourselves

Scariest thing before doing usability tests?

We were just as confused.

Optimizely's Feature Experimentation platform is dense and full of technical jargon. Before we could design meaningful tasks with clear intention, we had to deeply understand the product: its terminology, hidden logic, information hierarchy, and how rules, flags, environments, and rulesets interact, all as newcomers to A/B testing and coding environments.


We started with the step-by-step developer manual on Optimizely's website, mapped the interaction flows, flagged every piece of terminology we didn't understand, and walked through the platform as users ourselves. Still confused and lost, we brought all our questions to the client, verified our assumptions, and asked for a demo walkthrough; that was when the mental map finally started to click. Question after question, clarity followed.

(Kudos to our client for running every info session we requested!)


1

Product walkthrough & flow mapping

Mapped every interaction, feature, rule, and term in FigJam to understand the product's architecture

2

Became the usability testers

Our team ran through every task scenario ourselves, surfacing confusion before participants ever saw the product.

3

Task design grounded in real workflow

Tasks were designed to mirror how A/B testers actually work, so findings would reflect genuine user problems

07 RECRUITMENT

Who we tested

We recruited 7 participants representing Optimizely's target user: non-technical, with some exposure to A/B testing but less than one year of experience using Optimizely.

We wanted to mirror the realistic onboarding scenario: someone who understands experimentation conceptually but is new to this platform. Little-to-no programming experience was key, since the product's promise was non-technical independence.

Two participants (P4, P6) deviated slightly from the target profile. Findings of theirs that overlapped with other participants were included; their unique outlier observations were logged as low-severity and noted separately.

Age range

20–40 years old

Technical background

Little to no programming experience (3 participants had SQL familiarity)

A/B testing experience

1 experienced · 5 limited · 1 none

Roles represented

Product managers, content specialists, and business CEOs

08 FINDINGS

What we found

While participants appreciated the platform's customizability, five recurring themes of friction emerged. Two were high-severity, causing direct task failure. The SUS score of 58 confirmed usability was below acceptable thresholds.
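For readers unfamiliar with the metric: SUS is computed from a ten-item, 1–5 Likert questionnaire. Odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the summed total is multiplied by 2.5 to yield a 0–100 score. A minimal Python sketch of the standard formula (the sample responses are illustrative, not our participants' actual data):

def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses."""
    assert len(responses) == 10, "SUS uses exactly ten items"
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded; even items are negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to a 0-100 score

print(sus_score([4, 2, 3, 3, 4, 3, 3, 2, 4, 2]))  # -> 65.0 (illustrative)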

58

SUS Score below 68 indicates usability concerns.

6/7

Participants missed the "Copy Rule" option entirely

4/7

Failed to run the test due to confusing "Run" status

5

Distinct usability friction themes identified

HIGH SEVERITY

6/7 participants never noticed the "Copy Rule" option

WHAT WE FOUND

The "Copy Rule" feature, buried inside a dropdown under "Add Rule", was the intended path for transferring experiment setups from Development to Production. Almost no one found it without prompting. Users were also unsure what copying a rule would actually do, creating a second layer of uncertainty even after discovery.

DESIGN RECOMMENDATIONS

  • Separate "Copy Rule" into a standalone, visible option rather than nesting it in a dropdown.

  • Add a visual indicator to "Add Rule" signaling that more options live inside.

  • Consider a "Push to Production" feature that lets users deploy rules directly from development, eliminating the copy step entirely.

"Didn't notice the Copy Rule function until I was prompted." — P5

"Navigating from development to production was confusing. I would like to push the rule directly from development." — P2

HIGH SEVERITY

4/7 participants couldn't figure out how to run the test

WHAT WE FOUND

There are two distinct "Run" buttons: one at the Rule level, one at the Ruleset level. The sequence matters, but nothing in the UI communicates this hierarchy.

The status "Ready to Run" appeared after running the Rule, but participants interpreted it as confirmation the test was already running, not as a prompt to run the Ruleset next.

DESIGN RECOMMENDATIONS

  • Replace "Ready to Run" with action-oriented language: "Run Ruleset to Apply Changes", making the next step explicit.

  • Add step indicators (Step 1: Run Rule → Step 2: Run Ruleset) to create visible progression.

"I'm not sure how to run the test." — P2

MEDIUM SEVERITY / EXTRA TIME ON TASK

Development and Production environments were indistinguishable

WHAT WE FOUND

The two environments looked nearly identical. Several participants set up rules in the wrong environment and had to repeat steps, losing time and confidence.

DESIGN RECOMMENDATIONS

Add distinct visual differentiation between environments: color-coded headers, environment badges, or a persistent banner that makes the current environment immediately obvious.

"I'm unclear about whether the test should be in Development or Production." — P1

"Maybe just tell me the environment I'm in?" — P2

09 IMPACT

What changed because of this work

This was the first formal usability study ever conducted on Optimizely's Feature Experimentation onboarding flow. The findings directly helped shape the Q2–Q3 2025 product roadmap and future design decisions.

First

Usability study to surface onboarding experience pain points

10,000+

Future experiments impacted by the recommended UI improvements

Q2/Q3

2025 design decisions directly informed by these findings

10+

Stakeholders at the final readout: PMs, designers, and engineers

"I'm impressed by how your team navigated this in just a few weeks without an A/B testing background, the insights about the product are accurate and on-point."

— OPTIMIZELY PRODUCT TEAM, POST-PRESENTATION

10 REFLECTION

What I'd do differently,
and what I'd carry forward

Navigating a complex product with no A/B testing background was challenging. The heavy jargon left us lost at first, but through collaborative problem-solving, taking the initiative to ask questions, mapping workflows, and rebuilding our understanding step by step, we gradually built clarity. Looking back, there are three takeaways I'd carry forward:

Become the user before designing: Remove everything I thought I knew and rebuild!

Walking through the product ourselves first wasn't just preparation. It changed how we framed every task and probe. I'll make this a non-negotiable step in every evaluative study.

📊

Handle outliers intentionally, not by default

In future studies, I'd build the outlier decision, why participants are included or excluded, into the analysis plan from the start rather than addressing it after data collection.

🎯

Know the presentation style before I start

Knowing the stakeholder’s timeline and preferred format in advance made the process more efficient. I shared findings in small sections as the PM needed them.
