A success metric is a measurable outcome an experiment is trying to change. userjourneys.ai supports four types, each answering a different product question with a different formula. This guide covers what each type computes, when to use it, and how the results are produced.

Metric types at a glance

| Type | Answers | Formula | Typical use |
| --- | --- | --- | --- |
| Conversion | Did the user trigger the event at least once? | triggered ÷ exposed | Activation, signup, first action |
| Events per user | How many events per user? | Σ events ÷ exposed | Volume: clicks, views, purchases |
| Events per user per active day | How intensely when engaged? | Σ(events ÷ active days) ÷ exposed | Engagement depth, session quality |
| Retention | Did the user clear a threshold within a window? | threshold met ÷ exposed | Habit formation, stickiness |
All four denominators are the total number of exposed users — users who never triggered the event contribute 0. This is intent-to-treat analysis, the correct statistical frame for A/B testing.

Conversion

Measures the fraction of exposed users who fired the event at least once.

Formula

Rate = users_who_triggered ÷ users_exposed

Example

A variant exposes 1,200 users. 340 of them fire signup_completed at least once.
Conversion rate = 340 ÷ 1,200 = 28.3%
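As a sketch in Python (the set-based helper below is illustrative, not the platform's API), the computation is a simple ratio over exposed users:

```python
def conversion_rate(exposed_users: set, triggered_users: set) -> float:
    """Fraction of exposed users who fired the event at least once.
    Repeat firings don't matter; intersecting with exposed_users keeps
    the intent-to-treat denominator honest."""
    return len(triggered_users & exposed_users) / len(exposed_users)

exposed = {f"u{i}" for i in range(1200)}  # hypothetical exposure log
fired = {f"u{i}" for i in range(340)}     # users who fired signup_completed
rate = conversion_rate(exposed, fired)    # 340 / 1200 ≈ 0.283
```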

When to use

Activation funnels, first-time actions, any binary “did it happen” outcome. Don’t use it when volume matters. A user firing the event ten times contributes the same as a user firing it once. For volume, use Events per user.

Events per user

Measures the average number of events each exposed user fired.

Formula

Mean = Σ(events across all users) ÷ users_exposed
Non-participants count as 0 in the numerator but remain in the denominator — a zero-filled mean taken over everyone exposed, not only those who engaged.

Example

A variant exposes 1,200 users. 200 fire 600 events total; the remaining 1,000 fire none.
Mean = 600 ÷ 1,200 = 0.5 events / user
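A sketch of the zero-filled mean (function and variable names are illustrative):

```python
def events_per_user(event_counts: dict, users_exposed: int) -> float:
    """Zero-filled mean: users missing from event_counts contribute 0
    to the numerator but stay in the denominator."""
    return sum(event_counts.values()) / users_exposed

counts = {f"u{i}": 3 for i in range(200)}  # 200 participants, 600 events total
mean = events_per_user(counts, 1200)       # 600 / 1200 = 0.5
```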

When to use

Volume metrics: clicks, page views, messages sent, purchases. A user firing the event ten times contributes ten times as much as a user firing it once. Don’t use it when engagement intensity matters more than cumulative volume. A user firing ten events on one day and a user firing one event on each of ten days contribute the same here. For intensity, use Events per user per active day.

Events per user per active day

Measures the average per-active-day event rate, computed per user, then averaged across the exposed population.

Formula

For each user:   rate = SUM(events) ÷ COUNT(DISTINCT active_days)
Mean           = Σ(rate) ÷ users_exposed
Each user contributes their own daily rate rather than their raw total. Non-participants contribute 0.

Example

Two users, same event, same experiment window:
| User | Events | Active days | Per-user rate |
| --- | --- | --- | --- |
| Alice | 10 | 1 | 10 |
| Bob | 10 | 10 | 1 |
Alice and Bob fired the same number of events, but their per-day rates differ by 10×. This metric captures that asymmetry; Events per user does not.
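The Alice/Bob example can be reproduced with a small sketch (the dict-of-dicts layout is an assumption for illustration, not the platform's storage format):

```python
def events_per_user_per_active_day(daily_events: dict, users_exposed: int) -> float:
    """daily_events maps user -> {day: event_count}. Each participant
    contributes total events ÷ distinct active days; non-participants
    (absent from the dict) contribute 0."""
    total_rate = sum(sum(days.values()) / len(days)
                     for days in daily_events.values())
    return total_rate / users_exposed

data = {
    "alice": {1: 10},                     # 10 events on 1 day  -> rate 10
    "bob": {d: 1 for d in range(1, 11)},  # 1 event on 10 days  -> rate 1
}
mean_rate = events_per_user_per_active_day(data, users_exposed=2)  # (10 + 1) / 2
```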

When to use

Engagement intensity, session quality, and any question where “when users come, they come hard” matters more than “how many total interactions.” Don’t use it when you care about total volume; use Events per user instead.
When these two mean metrics show similar numbers, the numerators still differ — total events vs. a sum of per-user daily rates — and meaningful differences typically appear at the 2nd or 3rd decimal. Low-participation experiments can make them look numerically close even when they answer different questions.

Retention

Measures the fraction of exposed users who met a frequency threshold within a specified post-exposure window.

Formula

Rate = users_who_met_threshold ÷ users_exposed
Configure the threshold as “at least N events in days X through Y after exposure.”

Example

Threshold: at least 2 events in days 0–7. Out of 1,200 exposed users, 180 meet it.
Retention rate = 180 ÷ 1,200 = 15.0%
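A sketch of the threshold check (the per-user list of days-since-exposure is an illustrative data shape):

```python
def retention_rate(event_days: dict, users_exposed: int,
                   min_events: int = 2, window: tuple = (0, 7)) -> float:
    """event_days maps user -> list of days-since-exposure for each event.
    A user counts as retained when at least min_events events fall inside
    the inclusive [start, end] window."""
    start, end = window
    retained = sum(
        1 for days in event_days.values()
        if sum(start <= d <= end for d in days) >= min_events
    )
    return retained / users_exposed

# u1 fires twice inside days 0-7; u2 fires only outside the window.
rate = retention_rate({"u1": [0, 3], "u2": [9, 10]}, users_exposed=4)
```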

When to use

Habit formation, stickiness, and any question about whether users keep engaging within a specific time window. Don’t use it for one-time actions (Conversion is simpler) or when you need the number of interactions (Events per user).

How results are computed

The sections below document the statistical machinery. Read them to interpret edge cases; skip them to just use the numbers.

Zero-filling

All four metric types divide by the total exposed user count, not just participants. Exposed users who never trigger the event contribute 0 to the numerator. This is intent-to-treat (ITT) analysis: the experiment measures the effect of assigning users to a variant, not the effect on users who engaged. Restricting to engaged users selects on outcome and biases the result. Practical consequence: with low participation (say 3%), the median pins to 0 because more than half of the variant contributes 0. The Participating column shows how many users contributed non-zero values.
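The gap between the ITT mean and a participants-only mean is easy to see in a toy sketch (numbers are invented):

```python
participant_counts = [3, 3, 3]  # per-user event counts for the 3 engaged users
users_exposed = 100             # the other 97 exposed users never fired the event

itt_mean = sum(participant_counts) / users_exposed                # 9 / 100 = 0.09
engaged_mean = sum(participant_counts) / len(participant_counts)  # 3.0, biased
# The engaged-only mean selects on outcome: it answers "how active are
# active users?", not "what did assignment to this variant do?"
```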

Winsorization

Events per user and Events per user per active day cap each user’s per-user value at the variant’s 99.9th percentile before summing. This bounds the influence of extreme outliers without removing them entirely. Conversion and Retention are booleans per user; there’s no value to cap.
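A sketch of the cap with NumPy (the percentile interpolation method may differ from the production pipeline):

```python
import numpy as np

def winsorize_upper(per_user_values, pct=99.9):
    """Cap each per-user value at the variant's upper percentile before
    summing. Outliers are bounded, not dropped, so they still count."""
    cap = np.percentile(per_user_values, pct)
    return np.minimum(per_user_values, cap)

values = np.array([1, 2, 2, 3, 10_000])  # one extreme outlier
capped = winsorize_upper(values)         # outlier pulled down to the cap
```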

Significance testing

| Metric type | Test |
| --- | --- |
| Conversion, Retention | Two-proportion Z-test (two-sided) |
| Events per user, Events per user per active day | Welch’s t-test (two-sided) |
The q-value column shows p-values adjusted via Benjamini–Hochberg, which controls the false-discovery rate across all metrics on the experiment. Treat q < 0.05 as statistically significant.
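The BH adjustment itself is a small step-up procedure. A self-contained sketch (SciPy ships an equivalent in `scipy.stats.false_discovery_control`):

```python
def benjamini_hochberg(p_values):
    """Step-up BH: sort p-values, scale the k-th smallest by m/k, then
    enforce monotonicity from the largest rank down. Returns q-values
    in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
# Only the 0.01 p-value survives at q < 0.05; 0.03 and 0.04 do not.
```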

CUPED variance reduction

CUPED (Controlled-experiment Using Pre-Experiment data) uses each user’s pre-exposure behavior as a covariate to shrink variance. Typical reduction is 10–40%, which means experiments reach significance with fewer samples. Enable CUPED per metric by setting cuped_pre_exposure_days to the number of pre-exposure days to use (e.g. 7 or 14). Only applies to Events per user and Events per user per active day.
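The core adjustment fits one coefficient from pre-exposure data. A sketch of the standard CUPED formula on synthetic numbers (variable names are illustrative):

```python
import numpy as np

def cuped_adjust(post, pre):
    """CUPED-adjust post-exposure values with a pre-exposure covariate:
    theta = cov(pre, post) / var(pre), then subtract theta * (pre - mean).
    The mean is preserved; variance shrinks by the squared correlation."""
    theta = np.cov(pre, post, ddof=1)[0, 1] / np.var(pre, ddof=1)
    return post - theta * (pre - pre.mean())

rng = np.random.default_rng(0)
pre = rng.normal(size=1_000)   # each user's pre-exposure metric value
post = 2.0 + 0.8 * pre + rng.normal(scale=0.5, size=1_000)
adjusted = cuped_adjust(post, pre)
# Same mean as `post`, but noticeably smaller variance.
```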

Troubleshooting

Low participation. If 3% of exposed users engaged, the remaining 97% contribute 0: the mean is diluted and the median pins to 0. Check the Participating column. When it’s a small fraction of Users, the metric is dominated by zero-filled non-participants. Options:
  • Run the experiment longer to accumulate more participants.
  • Switch to Retention if the question is “how many users crossed a threshold.”
  • Switch to Conversion if a binary “did they engage” is enough.
The two mean metrics look identical. Events per user and Events per user per active day differ only when users vary in how many days they were active. If every engaged user fires the event on exactly one day, the two reduce to the same quantity (events ÷ 1 = events). To see a meaningful difference, pick an event that can repeat across days — session-level activity, daily check-ins, repeated clicks.
Nothing reaches significance. The experiment is underpowered: either the sample is too small or the effect size is too small relative to variance. Options:
  • Increase traffic allocation or extend the run to accumulate more exposures.
  • Enable CUPED on mean metrics to reduce variance.
  • Pick a metric with less variance — Conversion is typically less noisy than Events per user.
  • Verify the metric is one the treatment is expected to move.