Rewards Causality Analysis Plan

The Problem

Why Current Numbers Don't Prove Causality

The 7.4x retention and 3.2x transaction multipliers suffer from two fatal confounders that must be addressed before any conclusions about rewards effectiveness can be drawn.

Selection Bias

?

Users who seek out and engage with rewards are likely already more active, more motivated, and more likely to transact. They self-select into the feature. Comparing them to the general population is comparing apples to oranges.

Mechanical Inflation

?

Missions like SWAP1 require a swap to complete. If a user who never swapped now does one swap to complete the mission, swap activity mechanically goes from 0 to 1. That is not "increased activity" - it is the mission requirement itself.

The question we must answer: After stripping out the mission-completing transactions themselves, do rewards users show more subsequent activity than comparable users who never engaged with rewards? If someone completes SWAP1, do they go on to do a second, third, fourth swap organically? Or do they stop?

What "Proving Causality" Means Here

We need to demonstrate that the rewards feature generates incremental activity. Specifically:

Post-mission organic activity - After a user completes a mission and claims the reward, do they perform additional actions of the same type beyond what the mission required?
Activity vs. matched baseline - Is this post-mission activity higher than what a comparable user (same tenure, same pre-rewards activity, same geography) does in the same time period without rewards?
Net-new behavior - Did rewards activate users who were previously inactive? Or did it only reward users who were going to transact anyway?

Cohort Definitions

Treatment vs. Control Groups

The analysis requires carefully defined cohorts. The feature flag [Experiment] rewards-feature and Ramp Network app platform filters are the primary segmentation tools.

Treatment Group: Rewards Engagers

Users who actively engaged with the rewards feature, subdivided by depth of engagement.

Cohort	Definition	Identification Method	Current Size (est.)
T1: Mission Completers	Users who clicked `Claim rewards mission reward` for any mission_code	`click` event where `label = "Claim rewards mission reward"`	~250 users
T2: Mission Actors	Users who clicked `Rewards mission action` (started a mission CTA) but may not have completed	`click` event where `label = "Rewards mission action"`	~500 users
T3: Rewards Visitors	Users who viewed the Rewards screen but did not interact with any mission	`page_view` where `screen_name = "Rewards"`, excluding T1 and T2	~530 users

Control Groups

Cohort	Definition	Purpose
C1: Flag-On, No Visit	Users with `[Experiment] rewards-feature = "on"` who never viewed the Rewards screen	Best available control. Same flag exposure, same platform (mobile app, non-EU, non-partner), but no rewards engagement. Controls for platform but not for motivation.
C2: Propensity-Matched	From C1, select users matched on pre-rewards activity (transaction count, volume, tenure, transaction flow mix in the 30 days before March 16)	Gold standard control. Same observable characteristics, different treatment status. Controls for both platform and baseline activity level.
C3: Pre-Period Self	Each treatment user compared against their own behavior in the 30 days before their first rewards screen visit	Difference-in-differences baseline. Each user is their own control. Eliminates between-user confounders but vulnerable to time trends.

The flag is our best natural experiment. The rewards-feature flag (ID 10188193) is a release flag that rolls out to integration_name = "Ramp Landing - iOS" or "Ramp Landing - Android" users outside the EU. Within that eligible population, ~567 daily users have the flag on, while ~12,000+ do not. Flag-on users who never visit rewards (C1) are the closest available control.

Flag Targeting Details (From Amplitude Experiment)

The rewards-feature flag targets users matching ALL of these conditions:

residence_country is NOT any EU/EEA country (31 countries excluded including UK, Norway, Switzerland)
integration_name IS "Ramp Landing - iOS" or "Ramp Landing - Android"

The default (everyone else) gets no variant. There is also a separate ramp.network employee override segment.

Coverage limitation. This means the analysis is restricted to non-EU mobile app users. Web users (35K) and EU users are excluded from the rewards feature entirely. Any findings apply only to this specific population segment.

Metrics Framework

Isolating Post-Mission Incremental Activity

For each mission type, we must precisely define: what counts as the mission requirement (to subtract), and what counts as incremental organic activity (to measure).

Mission	Requirement	What to Subtract	Incremental Metric	Claims to Date
TOPUP1	Top up $25+	First on-ramp transaction >= $25 after mission action click	Count and volume of on-ramp transactions after claim, excluding the first one	~67
TX1	Complete 1 transaction	First transaction_completed after mission action click	Count and volume of all transactions after claim, excluding the first one	~31
SWAP1	First swap of $50+	First swap (transaction_flow = SWAP) >= $50 after mission action click	Count of additional swaps after claim. Does completing SWAP1 lead to organic SWAP3?	~19
SWAP3	Complete 3 swaps	First 3 swaps after mission action click	Count of swaps beyond the 3rd. Does the habit persist?	~3
EARN10	Earn $25 in stablecoins	Earn deposits up to $25 threshold after mission action click	Additional earn deposits beyond mission threshold. Does user continue earning?	~3
WALLET	Connect wallet	The wallet connection event itself	Any transactions within 30 days of wallet connection. Wallet is a gateway action.	~131
INV3 / INV7	Invite 3/7 friends	The invite actions themselves	Do referred users become active? Do inviters increase their own activity?	~1 / ~0
SEND5	Send to 5 friends	First 5 send transactions after mission action click	Send transactions beyond the 5th. Does send become a habit?	~1

Critical sample size constraint. Most missions have fewer than 30 completions. Only WALLET (~131 claims), TOPUP1 (~67), and TX1 (~31) have potentially viable sample sizes for statistical analysis. SWAP1 (~19) is borderline. All others are too small for reliable inference. This analysis will need to wait for more data accumulation, or focus on the top 2-3 missions.

Core Metrics Per User

For each treatment user, compute these metrics in the 7, 14, and 30-day windows after their mission claim timestamp.

Post-claim transaction count (incremental) - Total transaction_completed events after claim, minus the mission-completing transaction(s)
Post-claim transaction volume (EUR) - Sum of fiat_amount_eur on post-claim transaction_completed events, excluding mission-completing transaction(s)
Time to next organic transaction - Time between mission claim and the first transaction_completed event that is NOT the mission-completing transaction
Return rate - Did the user return to the app (any page_view) at D7 / D14 / D30 after claim?
Transaction flow diversity - Did the user try a new transaction flow (SWAP, OFFRAMP, SEND) they hadn't used before?
Cross-mission progression - After completing one mission, did the user go on to complete additional missions?

For each control user (C2, propensity-matched), compute the same metrics in the identical calendar window, anchored to the matched treatment user's claim date.

Methodology

Three Approaches to Establishing Causality

No single method is definitive given the observational nature of this data. We apply three complementary approaches; if all three agree, we have strong evidence.

Method 1: Propensity Score Matching (PSM)

PSM Workflow

Flag-On Users

→

Compute Pre-Period Features

→

Logistic Regression: P(visits rewards)

→

Match Treatment to Control on P-score

→

Compare Post-Period Outcomes

Pre-Period Features (Covariates)

For each user in the flag-on population, compute these features for the 30-day window before the rewards feature launch (Feb 14 - Mar 15, 2026):

Transaction count - Number of transaction_completed events
Transaction volume (EUR) - Sum of fiat_amount_eur
Transaction flow mix - Proportion of transactions that are ONRAMP, SWAP, OFFRAMP, SEND
Session count - Number of distinct days with page_view events
Account tenure - Days since first ever page_view
Platform - iOS vs Android
Country - User property country
User type - user_type = "new" vs "returning"

Matching Procedure

Fit a logistic regression: P(visited_rewards = 1) ~ pre_tx_count + pre_tx_volume + pre_session_count + tenure + platform + country_group
For each treatment user, find the nearest control user (from C1) on propensity score, using 1:1 nearest-neighbor matching with caliper = 0.2 * SD of logit(propensity score)
Verify covariate balance post-matching: standardized mean differences should be < 0.1 for all covariates
Compare post-period outcomes between matched pairs

PSM limitation. Propensity score matching can only control for observable confounders. If unobserved factors (like crypto enthusiasm, portfolio size, or external market conditions) drive both rewards engagement and activity, PSM will not fully eliminate bias. This is why we run multiple methods.

Method 2: Difference-in-Differences (DiD)

DiD Design

Treatment: Rewards Engagers

→

Pre-Period Activity (Feb 14 - Mar 15)

→

Post-Period Activity (Mar 16 - Apr 15)

Control: PSM-Matched Non-Engagers

→

Pre-Period Activity (Feb 14 - Mar 15)

→

Post-Period Activity (Mar 16 - Apr 15)

The DiD estimator is:

// DiD causal effect estimate
effect = (treatment_post - treatment_pre) - (control_post - control_pre)

This controls for: (a) time-invariant differences between groups (treatment users are "more active" people), and (b) common time trends (crypto market conditions affecting everyone). It requires the parallel trends assumption: absent treatment, both groups would have followed similar activity trajectories.

Parallel Trends Validation

Plot weekly transaction counts for both groups in the 4-6 weeks before rewards launch. If the trends are approximately parallel, DiD is valid. If they diverge before the launch date, the assumption fails.

Method 3: Within-User Pre/Post Analysis

For each treatment user individually, compare their activity rate in the 30 days before their first rewards screen visit vs. 30 days after their mission claim. This eliminates all between-user confounders but is vulnerable to regression to the mean and concurrent time effects.

Key variant: split users into activity quartiles based on pre-period behavior:

Q1 (inactive) - 0 transactions in pre-period. This is the most important group. If rewards activates Q1 users, that is strong evidence of causal impact.
Q2 (light) - 1 transaction in pre-period
Q3 (moderate) - 2-3 transactions in pre-period
Q4 (heavy) - 4+ transactions in pre-period

The Q1 test is the most powerful. If we find that previously-inactive users (0 pre-period transactions) go on to make 2+ post-mission transactions (beyond the mission requirement itself), that is strong evidence that rewards caused new behavior. These users had no baseline activity, so there is nothing to select on.

Query Designs

Amplitude and Omni Queries

These are the concrete queries needed to execute the analysis. Each query references actual event names, property names, and filter values verified against the Amplitude schema (project 100017324, "New Widget - Production") and Omni model (Ramp Data Platform, 8d49baba-ff1d-4ab8-a234-0b1797646c0e).

Query 1: Build Treatment Cohort with Claim Timestamps

Source: Amplitude (project 100017324)

// Amplitude Event Segmentation
// Extract users who claimed rewards, with their claim timestamp and mission code
// Export as user-level CSV from Amplitude

Event: click
Filter: label = "Claim rewards mission reward"
Group By: mission_code
Metric: Event Totals (to see volume), then switch to User List export
Range: Mar 1, 2026 - Apr 1, 2026

// Required output per user:
//   - amplitude_user_id
//   - mission_code (WALLET, TOPUP1, TX1, SWAP1, etc.)
//   - claim_timestamp (event_time of the click)
//   - [Experiment] rewards-feature value

Implementation note. Amplitude's UI does not allow direct export of per-event timestamps with user IDs in a single view. You will need either: (a) the Amplitude Export API (Behavioral Cohorts + Profile API), (b) an Amplitude SQL query via the Data Tables feature, or (c) the Amplitude CDP/warehouse sync if available. The chart-based segmentation confirms the volumes but does not give per-user claim timestamps.

Query 2: Build Control Cohort (Flag-On, No Rewards Visit)

Source: Amplitude (project 100017324)

// Amplitude Cohort Definition
// Users who have the rewards flag but never visited the Rewards screen

Cohort: "Flag-On No Rewards"
Include: Users where user property
    [Experiment] rewards-feature = "on"
Exclude: Users who performed page_view
    where screen_name = "Rewards"
    in the last 30 days

// Estimated size: ~12,000 daily actives with flag (none), 
// vs ~567 daily with flag "on"
// After excluding rewards visitors (~1,279), control pool: ~10,500+

Query 3: Pre-Period Activity for All Flag-On Users

Source: Omni (transactions_simplified topic)

// Omni getData query
// Pull transaction history for PSM covariate computation

Prompt: "For all users from Ramp landing partner, show me per-user
transaction counts and total settled volume in USD, grouped by user ID
and transaction type, for transactions created between February 14
and March 15, 2026. Only include settled transactions."

Topic: transactions_simplified
Model: 8d49baba-ff1d-4ab8-a234-0b1797646c0e

// This gives us pre-period baselines per user for PSM matching
// Key fields: User ID, transaction type (ONSWAP, etc.),
//   settled transaction count, settled volume USD

Query 4: Post-Period Activity for Treatment Users

Source: Omni (transactions_simplified topic)

// Omni getData query
// Pull post-claim transactions for treatment users
// Must be joined with claim timestamps from Query 1

Prompt: "For a specific list of user IDs from Ramp landing partner,
show me all settled transactions after March 16, 2026, including
user ID, transaction ID, created time, transaction type, settled
volume USD, crypto asset symbol, and crypto chain symbol."

Topic: transactions_simplified
Model: 8d49baba-ff1d-4ab8-a234-0b1797646c0e

// Post-processing required:
//   1. Join with claim timestamps from Query 1
//   2. For each user, identify the mission-completing transaction
//      (first transaction of matching type after mission_action click)
//   3. Exclude mission-completing transaction(s) from incrementals
//   4. Count remaining transactions = incremental activity

Query 5: Session Activity for Retention Analysis

Source: Omni (fct_widget_events topic)

// Omni getData query
// Session-level activity for return rate computation

Prompt: "For Ramp landing partner users, show me the count of unique
users with sessions per day, grouped by user type (new vs returning),
for the period March 1 to April 1, 2026."

Topic: fct_widget_events
Model: 8d49baba-ff1d-4ab8-a234-0b1797646c0e

// Available fields from fct_widget_events:
//   - Merged Amplitude ID (user identifier)
//   - Foreign Keys User ID (email-based user ID)
//   - Session Start Time
//   - User Properties Partner Name
//   - User Properties User Type (new/returning)
//   - User Properties User Status
//   - Common Properties Screen Name
//   - Derived Transaction Flow
//   - Transaction Context Fiat Amount Eur

Query 6: Amplitude Retention Chart (Treatment vs Control)

Source: Amplitude (project 100017324)

// Amplitude Retention Analysis
// Compare D7/D14/D30 retention: rewards claimers vs matched control

Chart Type: Retention
Starting Event: click
    label = "Claim rewards mission reward"
Return Event: transaction_completed
Retention Type: N-Day
Range: Last 30 Days

// Segment 1 (Treatment): User property
//   [Experiment] rewards-feature = "on"
//   AND performed click where label = "Claim rewards mission reward"

// Segment 2 (Control): Build from saved cohort (Query 2)
//   Starting event for control = first transaction_completed
//   (since they have no claim event, anchor on their first 
//   transaction in the post-period)

// IMPORTANT: The return event must be transaction_completed,
// not page_view, because we want to measure transactional
// retention, not just app opens

Query 7: SWAP1 Deep Dive (Mission-Specific)

Source: Amplitude + Omni joined analysis

// The canonical causal question for SWAP1:
// "After completing their first $50+ swap for the SWAP1 mission,
//  did users go on to complete additional swaps?"

// Step 1: Identify SWAP1 claimers (Amplitude)
Event: click
Filter: label = "Claim rewards mission reward"
        AND mission_code = "SWAP1"
// ~19 users. Extract user IDs + claim timestamps

// Step 2: Pull their full swap history (Omni)
Prompt: "For these specific user IDs, show all settled transactions
where transaction type contains SWAP, ordered by created time,
from January 1 2026 to April 1 2026"

// Step 3: For each user, count:
//   - Swaps before SWAP1 claim (pre-baseline)
//   - The mission-completing swap (subtract this)
//   - Swaps after SWAP1 claim (incremental activity)
//   - Time gap between claim and next organic swap

// Step 4: Compare to matched control:
//   Control = flag-on users who did their first swap in the
//   same calendar week but without rewards engagement
//   How many of THEM did a second swap?

Instrumentation Gap Analysis

What Can Be Answered Now vs. What Needs New Tracking

Data Point	Status	Source	Notes
Rewards screen visits	Available	Amplitude `page_view` where `screen_name = "Rewards"`	Working since early March. Good coverage.
Mission claim events	Available	Amplitude `click` where `label = "Claim rewards mission reward"` + `mission_code`	Has mission_code for per-mission breakdown. This is the treatment timestamp anchor.
Mission action clicks	Available	Amplitude `click` where `label = "Rewards mission action"` + `mission_code`	Indicates user started pursuing a mission (clicked the CTA).
Feature flag assignment	Available	Amplitude user property `[Experiment] rewards-feature`	Values: "on" or (none). Used for treatment/control segmentation.
Transaction data	Available	Amplitude `transaction_completed` + Omni `transactions_simplified`	Both sources have transaction_flow, fiat_amount_eur, transaction_id. Omni has richer historical data.
Session activity	Available	Omni `fct_widget_events`	Has Merged Amplitude ID, session timestamps, screen names, user type.
Mission completion timestamp (backend)	Partial	Amplitude click event timestamp is a proxy	The click on "Claim rewards mission reward" is the closest available proxy for mission completion. Backend Prometheus counters track completions but are not user-attributed.
Per-user claim timestamps with user IDs	Partial	Requires Amplitude Export API or Data Tables	The Amplitude chart UI shows aggregate counts. Per-user timestamps need an API export, User Lookup, or warehouse sync.
Mission completion event (dedicated)	Not Tracked	Would need new Amplitude event	A dedicated `mission_completed` event with `mission_code`, `reward_amount`, and `completion_timestamp` would be cleaner than inferring from click labels.
Mission progress tracking	Not Tracked	Would need new Amplitude event	A `mission_progress` event (e.g., "SWAP3: 2 of 3 completed") would let us track partial completions and abandonment.
Reward amount / type per claim	Not Tracked	Not in Amplitude events	We do not know the reward value (XP? Tokens? Discount?) associated with each claim. This is needed to compute reward ROI.
Transaction-to-mission attribution	Not Tracked	Would need new property on transaction events	A property like `triggered_by_mission` on `transaction_completed` events would let us directly identify which transactions were mission-driven vs organic.
User ID mapping (Amplitude to Omni)	Partial	Both systems have user IDs but in different formats	Amplitude uses `amplitude_user_id` and `user_id`. Omni uses `User ID` (integer) and `Merged Amplitude ID`. Need to verify the join key.

New Instrumentation Recommendations

P0

Add mission_completed event to Amplitude

Fire a dedicated event when a mission is completed (backend-confirmed, not just UI click). Properties: mission_code, reward_type, reward_value, completion_timestamp, time_to_complete_seconds. This replaces the current reliance on click label inference.

P0

Add triggered_by_mission property to transaction_completed

When a transaction settles and the backend detects it completes a mission requirement, tag the transaction event with triggered_by_mission = "SWAP1" (or whichever mission). This eliminates the need for heuristic transaction-to-mission matching.

P1

Add mission_progress event for multi-step missions

For SWAP3, SEND5, INV3, INV7: fire a progress event at each step (e.g., "SWAP3: step 2 of 3"). Properties: mission_code, current_step, total_steps. This lets us measure abandonment at each step and understand friction points.

P1

Verify and document user ID join keys

Confirm that Amplitude's user_id maps to Omni's User ID or Merged Amplitude ID. Without a reliable join key, cross-system analysis (Amplitude behavioral data + Omni transaction data) is impossible. Test on a sample of 10 known users.

P2

Run a proper A/B holdout experiment

The gold standard for causality: within the eligible population (non-EU, mobile, Ramp Network landing), randomly assign 50% to see rewards and 50% to not see rewards. Run for 4-6 weeks. Compare transaction counts, volumes, and retention. This eliminates all confounders. The existing rewards-feature flag infrastructure supports this, though it is currently set to 100% rollout, not a 50/50 split.

Challenges and Limitations

What Could Go Wrong

Sample sizes are very small for most missions

WALLET has ~131 claims, TOPUP1 ~67, TX1 ~31, SWAP1 ~19. Everything else is in single digits. With n=19 for SWAP1, detecting a 20% difference in post-mission swap rates requires a control effect size of ~50% (unrealistically large). Practically, only WALLET and TOPUP1 have enough power for standalone analysis. The others should be pooled into a combined "any mission completed" cohort, sacrificing per-mission granularity for statistical power. Alternatively, wait 4-8 more weeks for data accumulation.

The feature is only 2 weeks old at scale

Rewards effectively launched the week of March 16, reaching ~567 daily flag-on users by end of March. That is barely 2 full weeks of data. Long-term behavioral changes (habit formation, increased lifetime value) take 30-90 days to manifest. A 14-day analysis window will capture short-term novelty effects but may miss sustained behavioral change. Plan for a Phase 2 analysis at the 60-day mark (mid-May 2026).

Rewards is mobile-only and non-EU only

The flag targets only integration_name = "Ramp Landing - iOS" or "Ramp Landing - Android" users outside 31 EU/EEA countries. This limits external validity. Findings cannot be generalized to web users (35K population) or EU users. If rewards is later expanded to web/EU, a separate analysis would be needed.

No randomized holdout exists

The rewards-feature flag is a release flag at 100% rollout for the eligible segment, not a randomized experiment. All eligible users get the feature; those who engage are self-selected. This makes true causal inference difficult with observational methods alone. The strongest recommendation is to run a proper 50/50 holdout for 4-6 weeks before concluding anything about causality.

Mission-completing transaction identification is heuristic

Without a triggered_by_mission property on transactions, we must infer which transaction completed a mission. For SWAP1, the heuristic is: "the first swap of $50+ after the user clicked 'Rewards mission action' for SWAP1." This works for simple missions but breaks for multi-step missions (SWAP3) where we cannot be certain which 3 swaps were mission-driven and which were organic. The P0 instrumentation recommendation (adding triggered_by_mission) would eliminate this ambiguity.

Crypto market conditions are a confounder

If the crypto market rallied in late March, ALL users (rewards and non-rewards) might increase activity. DiD controls for this if both groups share the trend, but if rewards users are disproportionately exposed to specific assets that rallied, the effect could be confounded. Include market benchmarks (BTC/ETH price) as a time-varying covariate in the DiD regression.

Omni and Amplitude user IDs may not join cleanly

Amplitude identifies users via amplitude_user_id (integer, e.g., 89562739732) and optionally user_id (email). Omni's transactions_simplified uses User ID (integer, e.g., 10534249) and fct_widget_events uses Merged Amplitude ID (integer matching Amplitude's ID). The join path is likely Merged Amplitude ID in Omni = amplitude_id in Amplitude, but this must be validated before any cross-system queries.

Execution Roadmap

Phased Analysis Plan

Phase 1: Quick Wins (This Week)

Answerable today with existing data. No new instrumentation required.

1a

WALLET + TOPUP1 post-claim transaction audit

For the ~131 WALLET claimers and ~67 TOPUP1 claimers, manually pull their full transaction history from Omni. Count how many completed a transaction AFTER their claim that was NOT the mission-completing transaction. This is a simple spreadsheet exercise using Amplitude User Lookup + Omni queries. Expected timeline: 1-2 days.

1b

Pre-period activity quartile analysis

For all rewards screen visitors (~1,279), classify them by pre-period activity (0, 1, 2-3, 4+ transactions before March 16). Report the distribution. If 80% of rewards engagers were already in Q4 (heavy users), selection bias is confirmed. If meaningful numbers come from Q1 (inactive), the feature may be activating new users.

1c

SWAP1 individual user audit

With only ~19 SWAP1 claimers, manually look up each user. For each one: did they swap before? Did they swap again after the mission-completing swap? This gives a directional answer with n=19 individual case studies. Not statistically significant but informative.

Phase 2: Rigorous Analysis (Weeks 2-3)

Requires new instrumentation (P0 items) and data engineering support.

2a

Implement P0 instrumentation

Ship mission_completed event and triggered_by_mission property. Validate data flow with 48 hours of production data. Coordinate with the rewards backend team (flag owners: jakub.jastrzebski2@ and marek.rycerski@).

2b

Propensity score matching analysis

Build PSM pipeline: extract pre-period covariates, fit propensity model, match treatment to control, compute post-period outcome differences with confidence intervals. Requires data science support and access to Amplitude's export API or warehouse sync.

2c

Difference-in-differences with parallel trends validation

Plot weekly activity for treatment and control groups from February 1 through present. Validate parallel trends assumption. Compute DiD estimator with standard errors clustered at the user level.

Phase 3: Definitive Answer (Week 4+)

3a

A/B holdout experiment

Modify the rewards-feature flag (ID 10188193) from 100% rollout to 50/50 split within the eligible segment. Run for 4-6 weeks. This is the only way to definitively prove causality. Requires product/eng buy-in since it means withholding rewards from half the eligible users.

3b

60-day follow-up analysis

Re-run the Phase 2 analysis at the 60-day mark (mid-May). Larger sample sizes, longer observation windows, and new instrumentation data will all improve the analysis quality. Check if early behavioral changes persist or fade (novelty effect).

Decision Framework

How to Interpret the Results

When the analysis is complete, map the findings to one of these four outcomes. Each outcome has a different strategic implication.

Outcome	Evidence Pattern	Strategic Implication
Strong Causal Effect	Post-mission incremental activity is significantly higher than control (p < 0.05). Q1 (inactive) users activate. Effect persists at D30.	Double down. Expand to web, EU, partner channels. Invest in mission design and reward economics. Rewards is a growth lever.
Moderate Causal Effect	Some incremental activity observed but small effect size. Mainly driven by already-active users. Q1 users do not activate meaningfully.	Proceed with caution. Rewards reinforces existing behavior but does not create new behavior. Focus on retention value rather than activation. Optimize mission design toward lower-activity users.
Selection Effect Only	No incremental post-mission activity beyond mission requirements. Treatment and matched control show same post-period behavior. Rewards visitors were already heavy users.	Rewards is a loyalty feature, not a growth feature. It rewards existing behavior but does not change it. Re-frame the ROI case: is the retention uplift among already-engaged users worth the reward cost?
Insufficient Data	Sample sizes too small for statistical significance. Confidence intervals span zero. Cannot reject the null hypothesis.	Wait. Accumulate 60+ days of data. Implement the A/B holdout. Do not make resource allocation decisions on inconclusive evidence.

The honest expectation. Given that the feature is 2 weeks old with ~250 total mission completions, the most likely Phase 1 outcome is "Insufficient Data" for rigorous statistical inference, combined with "directional signals" from individual user audits. Phase 2 and Phase 3 are where the real answers come from. Do not over-interpret early results.