EVALUATIVE USABILITY STUDY @ OPTIMIZELY

Improving the usability and information hierarchy of a complex A/B testing platform

As lead UX researcher, I took on this complex platform as a newcomer to A/B testing and led testing sessions with 7 non-technical users in under 8 weeks to uncover hidden friction and improve the platform's design.

MY ROLE

Lead Researcher (4/7 sessions)

Project Manager

TEAM

Collaborated with:
2 Product Managers

4 UX Researchers

METHODOLOGY

Moderated Usability Testing

Think-Aloud

System Usability Scale (SUS)

DURATION

Jan. – Mar. 2025

THE PROBLEM

The UI is confusing for users and they spend too much time onboarding just to complete core actions

THE GOAL

  • First usability testing of the product onboarding experience

  • Presented and delivered to product team with 10+ members

  • Directly informed 2025 Q2-Q3 product roadmap

  • Impacting 10,000+ future experiments

OUTCOME & IMPACT

First-ever UX study that helped shape product roadmap

This was the first formal usability study on Optimizely's Feature Experimentation onboarding flow. The findings went directly to the research team, informing the Q2–Q3 2025 product roadmap and future design decisions.

First

Usability study to surface onboarding experience pain points

10,000+

Future experiments impacted by the recommended UI improvements

10+

Stakeholders at the final readout including PMs, designers, and engineers

5

Distinct usability friction themes identified

"I'm impressed by how your team navigated this in just a few weeks without an A/B testing background, the insights about the product are accurate and on-point."

——— OPTIMIZELY PRODUCT TEAM, POST PRESENTATION FEEDBACK

CONTEXT

An A/B testing tool designed to make things easier for non-technical users, but it ended up confusing them

Optimizely's Feature Experimentation platform gives product managers, marketers, and engineers the ability to run A/B tests and manage feature rollouts, without touching a single line of code.

However, we found that users struggled with the UI and felt they had to spend too much effort just to complete core actions.

User complaints centered around two workflows:

  1. Setting up feature rollouts and experiments

  2. Understanding the overall information hierarchy of the system

No one had ever formally tested Optimizely's onboarding experience from a usability perspective, and that gap became our entry point.


"Through the platform, users wouldn't need an engineer beside them just to run a test."

— OPTIMIZELY PRODUCT GOAL

Feature flag setup confusion

High cognitive load during onboarding

Unclear information hierarchy

What is Feature Experimentation?

TL;DR: Imagine you could test two different versions of your website with a single button and no coding. This platform is where you set up and control that button. Now you can change a color without asking an engineer to code it for you.

A feature flag is a technique that lets users toggle product features on or off without deploying new code. Optimizely's platform is built on this concept. Users create flags, attach rulesets, and run A/B tests or targeted deliveries against specific audience segments.

The user base spans technical and non-technical roles: PMs, marketers, content specialists, and CEOs. The product aspiration was that any of them could independently launch an experiment, with zero engineer dependency.
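To make the flag, ruleset, and audience vocabulary concrete, here is a minimal Python sketch of the general feature-flag idea. It is not Optimizely's API: the `Flag` and `Rule` classes, the `decide` method, the flag key, and the audience and variation names are all hypothetical, chosen only to illustrate the concept.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One rule in a ruleset: who it targets and which variations it splits them into."""
    audience: str            # hypothetical audience segment name
    variations: list[str]    # e.g. ["blue", "green"]


@dataclass
class Flag:
    """A toggle plus the rules that apply while it is on (illustrative only)."""
    key: str
    enabled: bool = False
    ruleset: list[Rule] = field(default_factory=list)

    def decide(self, user_id: str, segment: str) -> str:
        """Return the variation for a user, or "off" if nothing applies."""
        if not self.enabled:
            return "off"
        for rule in self.ruleset:
            if rule.audience == segment:
                # Hash the user id so the same user always lands in the same variation.
                digest = hashlib.sha256(f"{self.key}:{user_id}".encode()).hexdigest()
                return rule.variations[int(digest, 16) % len(rule.variations)]
        return "off"


flag = Flag(key="new_checkout_button")
flag.ruleset.append(Rule(audience="returning_customers", variations=["blue", "green"]))

print(flag.decide("user-42", "returning_customers"))  # "off": the flag is not toggled on yet
flag.enabled = True  # flipping the toggle changes behavior without deploying new code
print(flag.decide("user-42", "returning_customers"))  # one of the two variations, stable per user
```

The point of the sketch is the separation of concerns: the toggle and the targeting rules live in configuration, so a non-technical user can change them without touching the application code.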

STAKEHOLDER INVOLVEMENT

My role & who I worked with

This project was a collaboration between our UX research team and Optimizely's product team. I worked with my team to lead the study design, facilitated the pilot test, and moderated 4 out of 7 usability sessions. I led the data synthesis and collaborated with my team to present findings.

The Optimizely product team was our client. We worked with 2 product managers to define the problems and product goals, outline the critical user workflows, and narrow down the target audience. The entire product team was present for the final presentation of this project.

Me

as a Lead UX Researcher

Study design, pilot test, moderated 4 sessions, data synthesis, deck creation, presented findings

4

Researchers worked with me

Collaborated with 4 researchers from the define phase through testing and research

10+

Optimizely Product Team members

VP, PMs, researchers, designers, and engineers who attended the final readout

RESEARCH OBJECTIVE

Identify the exact friction points that cause non-technical users to fail or struggle when setting up an A/B test independently

This mattered beyond UX scores: the product markets itself as an A/B testing tool that is easy for non-technical users, one that lets them be self-sufficient without an engineer stepping in.

Every engineer called in to help, and every extra minute spent, is a cost that erodes that promise and may eventually affect the business.

How intuitive is the product?

Specifically around setting up and managing A/B tests end-to-end without prior training

Where do users break?

What challenges and frustrations emerge when navigating core features in a real-world workflow

Does terminology help or hurt?

How well do users understand the language of the interface while running experiments

RESEARCH PLAN

Planning with constraints

WEEKS 1–4

Study Planning + Recruitment

Product deep-dive, becoming the testers ourselves, interview protocol design, screener + recruitment

WEEKS 4–7

Facilitation + Data Collection

1 pilot test + 7 moderated sessions, think-aloud protocol, quantitative + qualitative data capture

WEEKS 7–8

Synthesis + Readout

Affinity mapping, severity rating, recommendations, live presentation to 10+ stakeholders

METHODOLOGY

Why moderated usability testing?

Approach: Moderated Usability Testing

We conducted seven 60-minute remote sessions using a think-aloud protocol. Each participant completed six task-based scenarios in a hypothetical business context, followed by a post-test SUS questionnaire.

Sessions included a pre-test interview to calibrate experience level, six structured tasks, post-task probing questions, and a final SUS score.
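For readers unfamiliar with SUS, the sketch below shows the standard scoring arithmetic for the 10-item questionnaire: odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to land on a 0 to 100 scale. The function names and the sample answers are hypothetical and for illustration only; they are not data from this study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's 10 SUS answers (each 1-5) on the 0-100 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly 10 answers, each from 1 to 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - answer.
            total += 5 - answer
    return total * 2.5


def mean_sus(all_responses: list[list[int]]) -> float:
    """Average the per-participant scores into a single study-level score."""
    return sum(sus_score(r) for r in all_responses) / len(all_responses)


# Hypothetical answers for one participant, not taken from the study.
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))  # 75.0
```

A study-level figure is typically the mean of the per-participant scores, which is what `mean_sus` computes.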

WHY THIS METHOD

The product goal required non-technical independence. We needed to see exactly where and why they broke.

Think-aloud surfaces implicit confusion that survey data misses entirely

Moderated sessions let us probe in real time when something unexpected surfaced

OUR CONSTRAINTS

No A/B testing background

Recruitment

Complex technical product

8 week timeline

PROCESS

Onboarded as real users before testing to build product fluency and map the mental model

Scariest thing before tests?

Being just as confused.

The Feature Experimentation platform is dense and full of technical jargon. Before we could design meaningful tasks with clear intent, we had to deeply understand the product: the jargon, hidden logic, information hierarchy, and how rules, flags, environments, and rulesets interact, all as newcomers to A/B testing and coding environments.

Starting with the step-by-step developer manual on Optimizely's website, our team mapped the interaction flows, flagged every piece of terminology we didn't understand, and walked through the platform as users ourselves.

We then brought all our questions to our client to clear things up, verified our assumptions, and asked for a demo walkthrough. That was when the mental model finally started to click. Question after question. (Kudos to our client for running every info session we requested!)

1

Product walkthrough & flow mapping

Mapped every interaction, feature, rule, and terminology in FigJam to understand the product's architecture

2

Became the users ourselves

Ran through every task scenario ourselves, surfacing confusion before participants ever saw the product and seeing the product the way they would

3

Task design grounded in real workflow

7 tasks were designed to mirror how A/B testers actually work, so findings would reflect genuine user problems

RECRUITMENT

Testing the non-technicals

We recruited 7 participants representing Optimizely's target user: non-technical, with A/B testing experience but less than one year of experience using Optimizely.

We wanted to mirror the realistic onboarding scenario: someone who understands experimentation conceptually but is new to this platform. Little-to-no programming experience was key, since the product's promise was non-technical independence.

Two participants (P4, P6) deviated slightly from the target profile. Their overlapping findings were included; unique outlier observations were logged as low-severity and noted separately.

Age range

20–40 years old

Technical background

Little to no programming experience (3 participants had SQL familiarity)

A/B testing experience

1 experienced · 5 limited · 1 none

Roles represented

Product managers, content specialists, business CEOs

FINDINGS

What we found

While participants appreciated the platform's customizability, five recurring themes of friction emerged. Two were high-severity, causing direct task failure. The SUS score of 58 confirmed usability was below acceptable thresholds.

58

A SUS score below 68, the commonly cited average, indicates usability concerns.

6/7

Participants missed the "Copy Rule" option entirely

4/7

Failed to run the test due to confusing "Run" status

5

Distinct usability friction themes identified

HIGH SEVERITY

6/7 participants never noticed the "Copy Rule" option

WHAT WE FOUND

The "Copy Rule" feature, buried inside a dropdown under "Add Rule", was the intended path for transferring experiment setups from Development to Production. Almost no one found it without prompting. Users were also unsure what copying a rule would actually do, creating a second layer of uncertainty even after discovery.

DESIGN RECOMMENDATIONS

  • Separate "Copy Rule" as a standalone visible option rather than nesting it in a dropdown.

  • Add a visual indicator to "Add Rule". Consider a "Push to Production" feature that lets users deploy rules directly from development, eliminating the copy step entirely.

"Didn't notice the Copy Rule function until I was prompted." — P5

"Navigating from development to production was confusing. I would like to push the rule directly from development." — P2

HIGH SEVERITY

4/7 participants couldn't figure out how to run the test

WHAT WE FOUND

There are two distinct "Run" buttons: one at the Rule level, one at the Ruleset level. The sequence matters, but nothing in the UI communicates this hierarchy.

The status "Ready to Run" appeared after running the Rule, but participants interpreted it as confirmation the test was already running, not as a prompt to run the Ruleset next.

DESIGN RECOMMENDATIONS

  • Replace "Ready to Run" with action-oriented language: "Run Ruleset to Apply Changes", making the next step explicit.

  • Add step indicators (Step 1: Run Rule → Step 2: Run Ruleset) to create visible progression.
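The recommended wording can be sketched as a small status function. This is only an illustration of the two-level sequence described above, not the platform's actual logic: the function name, the boolean inputs, and the final "Running" label are assumptions.

```python
def status_label(rule_running: bool, ruleset_running: bool) -> str:
    """Return an action-oriented status for the two-level run sequence."""
    if not rule_running:
        # Nothing is live until the rule itself has been run.
        return "Step 1: Run Rule"
    if not ruleset_running:
        # Replaces the ambiguous "Ready to Run" with the explicit next action.
        return "Step 2: Run Ruleset to Apply Changes"
    return "Running"  # hypothetical wording for the final state


for state in [(False, False), (True, False), (True, True)]:
    print(state, "->", status_label(*state))
```

Each label names the next action rather than a state, so a user is never left guessing whether the test is already live.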

"I'm not sure how to run the test." — P2

MEDIUM SEVERITY / EXTRA TIME ON TASK

Development and Production environments were indistinguishable

WHAT WE FOUND

The two environments looked nearly identical. Several participants set up rules in the wrong environment and had to repeat steps, losing time and confidence.

DESIGN RECOMMENDATIONS

Add distinct visual differentiation between environments: color-coded headers, persistent environment badges, or a persistent banner that makes the current environment immediately obvious at all times.

"I'm unclear about whether the test should be in Development or Production." — P1

"Maybe just tell me the environment I'm in?" — P2

TAKEAWAYS

Become the user before designing: setting aside everything I thought I knew and rebuilding from scratch allowed me to frame and probe questions in a way more aligned with how users see this product

Navigating a complex product with no A/B testing background and heavy jargon was challenging. Through collaborative problem-solving, asking questions, mapping workflows, and rebuilding my understanding step by step, I gradually built clarity and overcame the confusion.

In future studies, I'd like to handle outliers more intentionally: building the decision on why outliers are included or excluded into the planning from the start, rather than addressing it after data collection.

More dots? Let's connect!

© 2026 CRAFTED BY CELIA