A success metric is a measurable outcome an experiment is trying to change. userjourneys.ai supports four types, each answering a different product question with a different formula. This guide covers what each type computes, when to use it, and how the results are produced.

Metric types at a glance

| Type | Answers | Formula | Typical use |
| --- | --- | --- | --- |
| Conversion | Did the user trigger the event at least once? | triggered ÷ exposed | Activation, signup, first action |
| Events per user | How many events per user? | Σ events ÷ exposed | Volume: clicks, views, purchases |
| Events per user per active day | How intensely when engaged? | Σ(events ÷ active days) ÷ exposed | Engagement depth, session quality |
| Retention | Did the user clear a threshold within a window? | threshold met ÷ exposed | Habit formation, stickiness |
All four denominators are the total number of exposed users — users who never triggered the event contribute 0. This is intent-to-treat analysis, the correct statistical frame for A/B testing.

Conversion

Measures the fraction of exposed users who fired the event at least once.

Formula

Rate = users_who_triggered ÷ users_exposed

Example

A variant exposes 1,200 users. 340 of them fire signup_completed at least once.
Conversion rate = 340 ÷ 1,200 = 28.3%
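As a sketch in Python (the set-based helper below is illustrative, not the platform's API), the computation is a simple ratio over exposed users:

```python
def conversion_rate(exposed_users: set, triggered_users: set) -> float:
    """Fraction of exposed users who fired the event at least once.
    Repeat firings don't matter; intersecting with exposed_users keeps
    the intent-to-treat denominator honest."""
    return len(triggered_users & exposed_users) / len(exposed_users)

exposed = {f"u{i}" for i in range(1200)}  # hypothetical exposure log
fired = {f"u{i}" for i in range(340)}     # users who fired signup_completed
rate = conversion_rate(exposed, fired)    # 340 / 1200 ≈ 0.283
```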

When to use

Activation funnels, first-time actions, any binary “did it happen” outcome. Don’t use it when volume matters. A user firing the event ten times contributes the same as a user firing it once. For volume, use Events per user.

Events per user

Measures the average number of events each exposed user fired.

Formula

Mean = Σ(events across all users) ÷ users_exposed
Non-participants count as 0 in the numerator but remain in the denominator — a zero-filled mean taken over everyone exposed, not only those who engaged.

Example

A variant exposes 1,200 users. 200 fire 600 events total; the remaining 1,000 fire none.
Mean = 600 ÷ 1,200 = 0.5 events / user
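A sketch of the zero-filled mean (function and variable names are illustrative):

```python
def events_per_user(event_counts: dict, users_exposed: int) -> float:
    """Zero-filled mean: users missing from event_counts contribute 0
    to the numerator but stay in the denominator."""
    return sum(event_counts.values()) / users_exposed

counts = {f"u{i}": 3 for i in range(200)}  # 200 participants, 600 events total
mean = events_per_user(counts, 1200)       # 600 / 1200 = 0.5
```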

When to use

Volume metrics: clicks, page views, messages sent, purchases. A user firing the event ten times contributes ten times as much as a user firing it once. Don’t use it when engagement intensity matters more than cumulative volume. A user firing ten events on one day and a user firing one event on each of ten days contribute the same here. For intensity, use Events per user per active day.

Events per user per active day

Measures the average per-active-day event rate, computed per user, then averaged across the exposed population.

Formula

For each user:   rate = SUM(events) ÷ COUNT(DISTINCT active_days)
Mean           = Σ(rate) ÷ users_exposed
Each user contributes their own daily rate rather than their raw total. Non-participants contribute 0.

Example

Two users, same event, same experiment window:
| User | Events | Active days | Per-user rate |
| --- | --- | --- | --- |
| Alice | 10 | 1 | 10 |
| Bob | 10 | 10 | 1 |
Alice and Bob fired the same number of events, but their per-day rates differ by 10×. This metric captures that asymmetry; Events per user does not.
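The Alice/Bob example can be reproduced with a small sketch (the dict-of-dicts layout is an assumption for illustration, not the platform's storage format):

```python
def events_per_user_per_active_day(daily_events: dict, users_exposed: int) -> float:
    """daily_events maps user -> {day: event_count}. Each participant
    contributes total events ÷ distinct active days; non-participants
    (absent from the dict) contribute 0."""
    total_rate = sum(sum(days.values()) / len(days)
                     for days in daily_events.values())
    return total_rate / users_exposed

data = {
    "alice": {1: 10},                     # 10 events on 1 day  -> rate 10
    "bob": {d: 1 for d in range(1, 11)},  # 1 event on 10 days  -> rate 1
}
mean_rate = events_per_user_per_active_day(data, users_exposed=2)  # (10 + 1) / 2
```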

When to use

Engagement intensity, session quality, and any question where “when users come, they come hard” matters more than “how many total interactions.” Don’t use it when you care about total volume; use Events per user instead.
When these two mean metrics show similar numbers, the numerators still differ — total events vs. a sum of per-user daily rates — and meaningful differences typically appear at the 2nd or 3rd decimal. Low-participation experiments can make them look numerically close even when they answer different questions.

Retention

Measures the fraction of exposed users who met a frequency threshold within a specified post-exposure window.

Formula

Rate = users_who_met_threshold ÷ users_exposed
Configure the threshold as “at least N events in days X through Y after exposure.”

Example

Threshold: at least 2 events in days 0–7. Out of 1,200 exposed users, 180 meet it.
Retention rate = 180 ÷ 1,200 = 15.0%
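A sketch of the threshold check (the per-user list of days-since-exposure is an illustrative data shape):

```python
def retention_rate(event_days: dict, users_exposed: int,
                   min_events: int = 2, window: tuple = (0, 7)) -> float:
    """event_days maps user -> list of days-since-exposure for each event.
    A user counts as retained when at least min_events events fall inside
    the inclusive [start, end] window."""
    start, end = window
    retained = sum(
        1 for days in event_days.values()
        if sum(start <= d <= end for d in days) >= min_events
    )
    return retained / users_exposed

# u1 fires twice inside days 0-7; u2 fires only outside the window.
rate = retention_rate({"u1": [0, 3], "u2": [9, 10]}, users_exposed=4)
```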

When to use

Habit formation, stickiness, and any question about whether users keep engaging within a specific time window. Don’t use it for one-time actions (Conversion is simpler) or when you need the number of interactions (Events per user).

How results are computed

The sections below document the statistical machinery. Read them to interpret edge cases; skip them to just use the numbers.

Zero-filling

All four metric types divide by the total exposed user count, not just participants. Exposed users who never trigger the event contribute 0 to the numerator. This is intent-to-treat (ITT) analysis: the experiment measures the effect of assigning users to a variant, not the effect on users who engaged. Restricting to engaged users selects on outcome and biases the result. Practical consequence: with low participation (say 3%), the median pins to 0 because more than half of the variant contributes 0. The Participating column shows how many users contributed non-zero values.
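The gap between the ITT mean and a participants-only mean is easy to see in a toy sketch (numbers are invented):

```python
participant_counts = [3, 3, 3]  # per-user event counts for the 3 engaged users
users_exposed = 100             # the other 97 exposed users never fired the event

itt_mean = sum(participant_counts) / users_exposed                # 9 / 100 = 0.09
engaged_mean = sum(participant_counts) / len(participant_counts)  # 3.0, biased
# The engaged-only mean selects on outcome: it answers "how active are
# active users?", not "what did assignment to this variant do?"
```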

Winsorization

Events per user and Events per user per active day cap each user’s per-user value at the variant’s 99.9th percentile before summing. This bounds the influence of extreme outliers without removing them entirely. Conversion and Retention are booleans per user; there’s no value to cap.
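A sketch of the cap with NumPy (the percentile interpolation method may differ from the production pipeline):

```python
import numpy as np

def winsorize_upper(per_user_values, pct=99.9):
    """Cap each per-user value at the variant's upper percentile before
    summing. Outliers are bounded, not dropped, so they still count."""
    cap = np.percentile(per_user_values, pct)
    return np.minimum(per_user_values, cap)

values = np.array([1, 2, 2, 3, 10_000])  # one extreme outlier
capped = winsorize_upper(values)         # outlier pulled down to the cap
```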

Significance testing

| Metric type | Test |
| --- | --- |
| Conversion, Retention | Two-proportion Z-test (two-sided) |
| Events per user, Events per user per active day | Welch’s t-test (two-sided) |
The q-value column shows p-values adjusted via Benjamini–Hochberg, which controls the false-discovery rate across all metrics on the experiment. Treat q < 0.05 as statistically significant.
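The BH adjustment itself is a small step-up procedure. A self-contained sketch (SciPy ships an equivalent in `scipy.stats.false_discovery_control`):

```python
def benjamini_hochberg(p_values):
    """Step-up BH: sort p-values, scale the k-th smallest by m/k, then
    enforce monotonicity from the largest rank down. Returns q-values
    in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
# Only the 0.01 p-value survives at q < 0.05; 0.03 and 0.04 do not.
```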

CUPED variance reduction

CUPED (Controlled-experiment Using Pre-Experiment data) uses each user’s pre-exposure behavior as a covariate to shrink variance. Typical reduction is 10–40%, which means experiments reach significance with fewer samples. Enable CUPED per metric by setting cuped_pre_exposure_days to the number of pre-exposure days to use (e.g. 7 or 14). Only applies to Events per user and Events per user per active day.
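The core adjustment fits one coefficient from pre-exposure data. A sketch of the standard CUPED formula on synthetic numbers (variable names are illustrative):

```python
import numpy as np

def cuped_adjust(post, pre):
    """CUPED-adjust post-exposure values with a pre-exposure covariate:
    theta = cov(pre, post) / var(pre), then subtract theta * (pre - mean).
    The mean is preserved; variance shrinks by the squared correlation."""
    theta = np.cov(pre, post, ddof=1)[0, 1] / np.var(pre, ddof=1)
    return post - theta * (pre - pre.mean())

rng = np.random.default_rng(0)
pre = rng.normal(size=1_000)   # each user's pre-exposure metric value
post = 2.0 + 0.8 * pre + rng.normal(scale=0.5, size=1_000)
adjusted = cuped_adjust(post, pre)
# Same mean as `post`, but noticeably smaller variance.
```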

Troubleshooting

Low participation. If 3% of exposed users engaged, the remaining 97% contribute 0: the mean is diluted and the median pins to 0. Check the Participating column. When it’s a small fraction of Users, the metric is dominated by zero-filled non-participants. Options:
  • Run the experiment longer to accumulate more participants.
  • Switch to Retention if the question is “how many users crossed a threshold.”
  • Switch to Conversion if a binary “did they engage” is enough.
The two mean metrics look identical. Events per user and Events per user per active day differ only when users vary in how many days they were active. If every engaged user fires the event on exactly one day, the two reduce to the same quantity (events ÷ 1 = events). To see a meaningful difference, pick an event that can repeat across days — session-level activity, daily check-ins, repeated clicks.
Nothing reaches significance. The experiment is underpowered: either the sample is too small or the effect size is too small relative to variance. Options:
  • Increase traffic allocation or extend the run to accumulate more exposures.
  • Enable CUPED on mean metrics to reduce variance.
  • Pick a metric with less variance — Conversion is typically less noisy than Events per user.
  • Verify the metric is one the treatment is expected to move.