Predictive Transaction Model — Money Transfer Case Study

Configuration & Data Generation 01

Every key assumption is adjustable at the top of the notebook. In production, you'd tune these constantly. PREDICTABLE_PCT = 0.08 seeds 8% of customers as predictable — conservative but realistic for money transfer data where 5–15% of customers transact on a schedule (payday remittances, recurring supplier payments, expat bill coverage).

MIN_TRADES_FOR_PREDICTION = 3 is the bare minimum for computing a standard deviation of intervals. Two trades give a single interval, with no way to measure consistency. Three give two intervals, just enough to say "this person's spacing is consistent."
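To see why three trades is the floor, here is a minimal sketch with made-up dates. The helper name `interval_std` is hypothetical and not part of the notebook; it only assumes pandas' default sample standard deviation (ddof=1).

```python
import pandas as pd

def interval_std(trade_dates):
    """Sample std (ddof=1) of the day gaps between consecutive trades."""
    dates = pd.to_datetime(pd.Series(trade_dates)).sort_values()
    gaps = dates.diff().dt.days.dropna()
    return gaps.std()  # pandas defaults to ddof=1

# Illustrative dates, not taken from the dataset
std_two = interval_std(["2024-01-01", "2024-01-15"])                  # one gap
std_three = interval_std(["2024-01-01", "2024-01-15", "2024-01-30"])  # gaps of 14 and 15 days

print(std_two)    # NaN: a single interval has no sample standard deviation
print(std_three)  # defined once there are two intervals
```

With two trades the result is NaN, so the CV is undefined and the customer would fall through to "Insufficient Data" anyway.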

Python In [1]

from datetime import datetime, timedelta

import pandas as pd

# ══════════════════════════════════════════════════════════
# ADJUSTABLE PARAMETERS: modify these to tune the model
# ══════════════════════════════════════════════════════════
RANDOM_SEED            = 42
TOTAL_TRANSACTIONS     = 10000
DATE_START             = datetime(2024, 1, 1)
DATE_END               = datetime(2025, 12, 31)  # 24-month window
CURRENCIES             = ['USD', 'EUR', 'JPY', 'GBP', 'CHF', 'AUD', 'NZD']
REVENUE_PCT            = 0.015   # 1.5% of send amount
PREDICTABLE_PCT        = 0.08    # ~8% of customers are predictable
MIN_TRADES_FOR_PREDICTION = 3   # Minimum trades to evaluate
CV_THRESHOLD_HIGH      = 0.10    # CV ≤ this = Highly Predictable
CV_THRESHOLD_MOD       = 0.20    # CV ≤ this = Moderately Predictable

Output

✅ Configuration loaded
Transactions: 10,000
Date range: 2024-01-01 → 2025-12-31
Predictable customer target: 8%
CV thresholds: High ≤ 0.10, Moderate ≤ 0.20
Min trades for prediction: 3

Data Design: Predictable customers are assigned a cadence (weekly, biweekly, or monthly) and generate trades at that interval with ±1 day jitter, weighted toward zero. Irregular customers get randomly distributed trades with an exponential frequency distribution, creating a realistic long tail where most customers trade a handful of times but some are very active without being predictable. The data spans 42 directed currency pairs across 7 currencies (7 × 6), with 20 weighted as popular corridors to mirror real-world remittance flows.
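The generator itself isn't shown here, so the following is only a sketch of how the two customer types could be produced. The function names, the jitter weights (0.2 / 0.6 / 0.2), and reading "exponential frequency distribution" as an exponentially distributed trade count per customer are all assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def predictable_trade_dates(start, end, cadence_days, rng):
    """Trades every `cadence_days`, each shifted by -1, 0, or +1 day (0 most likely)."""
    dates = pd.date_range(start, end, freq=f"{cadence_days}D")
    jitter = rng.choice([-1, 0, 1], size=len(dates), p=[0.2, 0.6, 0.2])  # assumed weights
    return (dates + pd.to_timedelta(jitter, unit="D")).sort_values()

def irregular_trade_dates(start, end, mean_trades, rng):
    """Uniformly scattered trade dates; the trade count is drawn from an exponential."""
    n = max(1, int(rng.exponential(mean_trades)))
    span_days = (pd.Timestamp(end) - pd.Timestamp(start)).days
    offsets = rng.integers(0, span_days + 1, size=n)
    return (pd.Timestamp(start) + pd.to_timedelta(offsets, unit="D")).sort_values()

weekly = predictable_trade_dates("2024-01-01", "2025-12-31", 7, rng)
irregular = irregular_trade_dates("2024-01-01", "2025-12-31", 30, rng)

# Gaps for the weekly customer stay within 7 ± 2 days (two jitters can stack)
weekly_gaps = weekly.to_series().diff().dt.days.dropna()
print(weekly_gaps.min(), weekly_gaps.max())
```

Because each trade is jittered independently around a fixed grid, errors don't accumulate: the customer drifts by at most a day from the schedule no matter how long the history is.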

CV Calculation & Scoring 02

The analytical core. For each customer, sort trades chronologically, compute days between consecutive trades, then aggregate to get mean interval, standard deviation, and Coefficient of Variation (std / mean). A CV of 0.10 means the standard deviation is only 10% of the average interval — a weekly trader varies by ~0.7 days, a monthly trader by ~3 days. Both are highly predictable in context.
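A quick worked example of why CV, rather than raw standard deviation, is the right yardstick. The gap values are invented for illustration and `coefficient_of_variation` is a hypothetical helper using only the standard library.

```python
from statistics import mean, stdev

def coefficient_of_variation(gaps_in_days):
    """CV = sample standard deviation / mean of the intervals."""
    return stdev(gaps_in_days) / mean(gaps_in_days)

# The same ±1 day wobble applied to two different cadences
weekly_gaps = [6, 7, 8]
monthly_gaps = [29, 30, 31]

cv_weekly = coefficient_of_variation(weekly_gaps)
cv_monthly = coefficient_of_variation(monthly_gaps)

print(round(cv_weekly, 3))   # 0.143
print(round(cv_monthly, 3))  # 0.033
```

Both customers have a standard deviation of exactly one day, but relative to their cadence the monthly trader is far steadier. Dividing by the mean is what makes the thresholds comparable across weekly, biweekly, and monthly senders.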

Python In [4]

# Calculate days between consecutive trades
df = df.sort_values(['customer_id', 'date'])
df['days_since_last_trade'] = df.groupby('customer_id')['date'].diff().dt.days

# Customer-level aggregation
customer_stats = df.groupby('customer_id').agg(
    total_trades=('transaction_id', 'count'),
    first_trade=('date', 'min'),
    last_trade=('date', 'max'),
    total_revenue=('revenue', 'sum'),
    avg_send_amount=('send_amount', 'mean'),
).reset_index()

# Coefficient of variation: std / mean
intervals = df.dropna(subset=['days_since_last_trade'])
interval_stats = intervals.groupby('customer_id')['days_since_last_trade'].agg(
    mean_interval='mean', std_interval='std',
    median_interval='median', n_intervals='count'
)
interval_stats['cv'] = interval_stats['std_interval'] / interval_stats['mean_interval']
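The later cells read `cv` and `mean_interval` directly from `customer_stats`, so the two tables presumably get joined somewhere between the cells shown. One way that join could look, sketched on toy data (the values and customer IDs are placeholders):

```python
import pandas as pd

# Toy stand-ins for the notebook's two tables
customer_stats = pd.DataFrame({
    "customer_id": ["CUST_A", "CUST_B"],
    "total_trades": [4, 1],
})
interval_stats = pd.DataFrame(
    {"mean_interval": [14.0], "std_interval": [0.7], "cv": [0.05]},
    index=pd.Index(["CUST_A"], name="customer_id"),
)

# Left join keeps single-trade customers; their interval columns become NaN
customer_stats = customer_stats.merge(
    interval_stats, left_on="customer_id", right_index=True, how="left"
)
print(customer_stats)
```

A left join matters here: customers with a single trade have no intervals at all, and keeping them with a NaN CV lets the tiering step label them "Insufficient Data" instead of silently dropping them.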

Output — CV Distribution (customers with 3+ trades)

✅ Interval statistics calculated for 287 customers with 2+ trades

CV Distribution (customers with 3+ trades):
count    277
mean     0.87
std      0.32
min      0.01
25%      0.78
50%      0.93
75%      1.04
max      1.53

Validation: The CV distribution confirms clean separation — predictable customers cluster below 0.15 while irregular customers spread from 0.3 to 1.5+. The median CV of 0.93 shows most customers are genuinely erratic, which is exactly what you'd expect in real money transfer data.

Predictability Tiers 03

Rather than a binary flag, three tiers plus "Insufficient Data" allow differentiated treatment. A customer with CV=0.05 (monthly trader who's never off by more than a day) gets confident email timing. One with CV=0.18 gets a wider send window or softer call-to-action.

Python In [5]

# Classify tiers based on CV thresholds
def classify_tier(row):
    if pd.isna(row['cv']) or row['total_trades'] < MIN_TRADES_FOR_PREDICTION:
        return 'Insufficient Data'
    elif row['cv'] <= CV_THRESHOLD_HIGH:
        return 'Highly Predictable'
    elif row['cv'] <= CV_THRESHOLD_MOD:
        return 'Moderately Predictable'
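    else:
        return 'Irregular'

# Apply row-wise; assumes the interval columns (cv) are already joined onto customer_stats
customer_stats['predictability_tier'] = customer_stats.apply(classify_tier, axis=1)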

# Cadence estimation via interval bucketing
# ≤10d = weekly, ≤20d = biweekly, ≤40d = monthly
# Predicted next trade = last_trade + mean_interval
# Email trigger = predicted_next - 2 days
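The cadence and trigger rules in the comments above can be sketched as two small functions. The bucket edges and the 2-day lead come from those comments; the function names, the "Unknown" label for missing intervals, and the example date are assumptions.

```python
import pandas as pd

def estimate_cadence(mean_interval_days):
    """Bucket a mean interval: <=10d weekly, <=20d biweekly, <=40d monthly, else other."""
    if pd.isna(mean_interval_days):
        return "Unknown"  # assumed label for customers with no intervals
    if mean_interval_days <= 10:
        return "Weekly"
    if mean_interval_days <= 20:
        return "Biweekly"
    if mean_interval_days <= 40:
        return "Monthly"
    return "Other"

def schedule_outreach(last_trade, mean_interval_days, lead_days=2):
    """Predicted next trade = last trade + mean interval; email goes out `lead_days` earlier."""
    predicted = pd.Timestamp(last_trade) + pd.to_timedelta(mean_interval_days, unit="D")
    return predicted, predicted - pd.Timedelta(days=lead_days)

# Hypothetical monthly customer whose last trade was on 2025-12-29
predicted_next, email_trigger = schedule_outreach("2025-12-29", 30)
print(estimate_cadence(30), predicted_next.date(), email_trigger.date())
# Monthly 2026-01-28 2026-01-26
```

Customers with long but steady gaps (quarterly or semiannual senders, for instance) land in "Other": still predictable by CV, just outside the three named cadences.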

Highly Predictable: 6.0%

18 customers with CV ≤ 0.10. Clockwork consistency: these are payday senders and recurring payment customers. Trigger emails 2 days before the predicted trade with high confidence.

Moderately Predictable: 2.7%

8 customers with 0.10 < CV ≤ 0.20. Slightly looser timing but still actionable. Use a wider outreach window and a softer call-to-action. Often biweekly traders with occasional slippage.

Irregular: 83.7%

251 customers with CV > 0.20. Not candidates for timing-based outreach; better served by event-triggered or promotional campaigns. High frequency ≠ predictability.

The remaining 23 of the 300 customers have fewer than 3 trades and fall into Insufficient Data.

Output — Tier Distribution

Cadence Breakdown (Predictable Only): 26 predictable customers, 8.7% of total. Chart categories: Biweekly (~14d), Weekly (~7d), Monthly (~30d), Other.

Output — Top 10 Most Predictable Customers: Trade Timelines

Trade timelines showing clockwork patterns for top 10 most predictable customers

Visual Proof: Each subplot shows one customer's trade history over 24 months. The evenly-spaced lollipops confirm clockwork cadence — biweekly traders show tight ~14-day gaps, monthly traders hit ~30-day intervals. The red dashed line marks the predicted next trade date. This is the chart that makes the model tangible for stakeholders.

Revenue Comparison 04

Predictable customers aren't just consistent — they're disproportionately valuable. They trade more frequently and accumulate higher total revenue even if individual transaction sizes are similar.

Avg Total Revenue: $2,433 (predictable customers) vs. $1,758 (irregular customers)

Avg Trades: 58.5 (predictable customers) vs. 33.7 (irregular customers)

Output — Revenue Comparison

Metric	Predictable	Irregular	All (3+ trades)
Count	26	251	277
Avg Total Revenue	$2,433	$1,758	$1,821
Median Total Revenue	$1,245	$570	$618
Avg Annual Revenue	$1,266	$944	$974
Avg Trades	58.5	33.7	36.0

Output — Revenue Analysis Charts

Revenue comparison: box plot, bar chart, and send amount histogram

The Insight: Predictable customers represent 8.7% of the customer base (26 of 300) but 12.1% of total revenue. Their median total revenue ($1,245) is more than double the irregular median ($570). The disproportionate contribution is the core argument for investing in proactive retention.

Cohort & Corridor Analysis 05

This shifts from "who are our predictable customers" to "how do we find more of them." Over-indexing analysis compares corridor distribution between predictable and general populations. An over-index of 164 means predictable customers are 64% more concentrated in that corridor than average — actionable for targeted acquisition.
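As a rough illustration of the over-index calculation, here is a sketch on toy data. The column names `customer_id` and `corridor`, the function name, and the sample rows are assumptions; the formula is the one described above (share among predictable trades divided by share among all trades, times 100).

```python
import pandas as pd

def corridor_over_index(trades, predictable_ids):
    """Over-index = 100 * (corridor share of predictable trades) / (share of all trades)."""
    pct_all = trades["corridor"].value_counts(normalize=True)
    is_pred = trades["customer_id"].isin(predictable_ids)
    pct_pred = trades.loc[is_pred, "corridor"].value_counts(normalize=True)
    out = pd.DataFrame({"pct_pred": pct_pred, "pct_all": pct_all}).fillna(0.0)
    out["over_index"] = (100 * out["pct_pred"] / out["pct_all"]).round()
    return out.sort_values("over_index", ascending=False)

# Toy trades: P1 is the only predictable customer
trades = pd.DataFrame({
    "customer_id": ["P1", "P1", "P1", "P1", "X1", "X1", "X2", "X2"],
    "corridor": ["GBP→USD"] * 3 + ["USD→EUR"] + ["GBP→USD"] + ["USD→EUR"] * 3,
})
result = corridor_over_index(trades, predictable_ids={"P1"})
print(result)
```

In this toy set GBP→USD carries 75% of predictable trades but 50% of all trades, giving an over-index of 150; USD→EUR comes out at 50. Values above 100 flag corridors where predictable customers are over-represented.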

Output — Corridor Over-Index

Corridor	% of Pred Trades	% of All Trades	Over-Index	Total Revenue
GBP → USD	18.0%	11.0%	164	$6,496
EUR → USD	16.2%	10.7%	151	$3,666
USD → EUR	17.4%	14.6%	119	$8,871
USD → JPY	10.1%	8.8%	115	$5,475
USD → GBP	10.1%	12.6%	80	$4,756

Output — Cohort Analysis Charts

Cohort analysis: corridor over-indexing, day-of-week patterns, transaction size buckets, acquisition cohorts

Acquisition Signal: GBP→USD and EUR→USD over-index heavily (164 and 151 respectively). If you're buying ads or doing outreach, target people who need to send GBP or EUR to USD recipients — they're statistically more likely to become predictable, high-value customers.

LTV & Revenue at Risk 06

Three-year lifetime value is projected by annualizing observed revenue and multiplying by three. A critical finding: moderately predictable customers have dramatically higher LTV ($7,554 over 3 years) compared to highly predictable ($2,131). This seems counterintuitive until you look at cadence: moderately predictable customers skew weekly/biweekly, so they average roughly 2.5× the trades (101.1 vs. 39.6) and about 3.5× the annual revenue ($2,518 vs. $710), despite slightly higher CV.
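The projection is simple enough to sketch in a few lines. How the notebook measures each customer's observed period isn't shown, so measuring tenure from first trade to the end of the data window is an assumption, as are the function names and the $2,000 example customer.

```python
from datetime import date

def annualized_revenue(total_revenue, first_trade, window_end):
    """Scale observed revenue to a per-year rate (tenure definition is an assumption)."""
    tenure_years = max((window_end - first_trade).days, 1) / 365.25
    return total_revenue / tenure_years

def project_ltv(annual_revenue, years=3):
    """Flat projection: no churn, growth, or discounting."""
    return annual_revenue * years

# Tier average from the LTV table: $2,518/year for moderately predictable customers
ltv_moderate = project_ltv(2_518)
print(ltv_moderate)  # 7554

# Hypothetical customer with $2,000 observed over the full 24-month window
example_annual = annualized_revenue(2_000, date(2024, 1, 1), date(2025, 12, 31))
print(round(example_annual, 2))
```

A flat multiple like this ignores churn and discounting, so it is better read as an upper-bound ranking tool across tiers than as a revenue forecast.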

Output — LTV by Tier

Tier	Customers	Avg Annual Rev	Avg Trades	3-Year LTV
Highly Predictable	18	$710	39.6	$2,131
Moderately Predictable	8	$2,518	101.1	$7,554
Irregular	251	$944	33.7	$2,832

Output — LTV Analysis Charts

LTV analysis: tier comparison and corridor breakdown

Prioritization: Start the email program with weekly and biweekly traders. They're slightly harder to predict (CV 0.11–0.15) but represent far more revenue per customer. The highest-value corridor for predictable customers is GBP→EUR at $12,892 projected 3-year LTV — worth aggressive retention investment.

Annual Revenue (All): $271K across 300 customers

Annual Revenue at Risk: $33K (predictable segment)

Revenue Concentration: 12.1% from 8.7% of customers

3-Year Revenue at Risk: $99K if these customers churn
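These headline figures can be roughly reproduced from the rounded numbers in the tables above. The inputs below are those published roundings, so the results only match to the nearest thousand; the variable names are illustrative.

```python
# Rounded inputs taken from the revenue comparison and dashboard figures
predictable_customers = 26
avg_annual_revenue_predictable = 1_266   # dollars per customer per year
total_annual_revenue = 271_000           # dollars, all 300 customers

annual_at_risk = predictable_customers * avg_annual_revenue_predictable
three_year_at_risk = annual_at_risk * 3
revenue_concentration = annual_at_risk / total_annual_revenue

print(annual_at_risk)                         # 32916, about $33K
print(three_year_at_risk)                     # 98748, about $99K
print(round(revenue_concentration * 100, 1))  # 12.1
```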

Executive Dashboard

Four-panel view designed for stakeholder consumption. The pie chart gives the high-level distribution, the CV histogram shows where threshold lines fall, the scatter plot maps predictability against revenue, and the monthly volume chart shows that predictable customer trading volume is steady while irregular customers fluctuate — critical for revenue forecasting.

Output — Executive Dashboard

Executive dashboard: pie chart, CV histogram, predictability vs revenue scatter, monthly volume

Email Trigger List 07

The final deliverable handed to the marketing team. Every predictable customer, sorted by next expected trade date, with an email send date 2 days prior. In production, this feeds into your CRM or marketing automation platform as a dynamic list that recalculates after each new transaction.

Python In [10]

# Predicted next trade = last_trade + mean_interval
# Email trigger = predicted_next - 2 days
customer_stats['predicted_next_trade'] = (
    customer_stats['last_trade'] +
    pd.to_timedelta(customer_stats['mean_interval'], unit='D')
)
customer_stats['email_trigger_date'] = (
    customer_stats['predicted_next_trade'] - timedelta(days=2)
)

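# Filter to predictable customers, sort by next trade
# (mask defined here in case an earlier cell has not already created it)
predictable_mask = customer_stats['predictability_tier'].isin(
    ['Highly Predictable', 'Moderately Predictable']
)
email_list = customer_stats[predictable_mask].sort_values('predicted_next_trade')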
email_list[['customer_id', 'predictability_tier', 'estimated_cadence',
            'cv', 'mean_interval', 'predicted_next_trade',
            'email_trigger_date']]

Output — Email Trigger List (sample)

Customer	Tier	Cadence	CV	Mean Interval	Predicted Next	Email Trigger
CUST_0211	Highly Pred.	Other	0.01	185d	2026-03-19	2026-03-17
CUST_0246	Highly Pred.	Monthly	0.03	30d	2026-01-28	2026-01-26
CUST_0281	Highly Pred.	Other	0.03	93d	2026-03-14	2026-03-12
CUST_0157	Mod. Pred.	Biweekly	0.11	14d	2026-01-09	2026-01-07
CUST_0093	Mod. Pred.	Weekly	0.14	7d	2026-01-04	2026-01-02

Production Path: This deliberately simple prediction (last trade + mean interval) is surprisingly effective for this use case because these customers are, by definition, consistent. In production, you might layer on exponential smoothing or ARIMA, but for a triggered email, the mean interval approach handles the 80/20 — correct enough to time the message right, simple enough to explain to stakeholders and maintain without a data science team.
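If you did want to try the exponential smoothing variant mentioned above, one possible shape is to smooth the interval series and add the latest smoothed gap to the last trade date. This is a sketch, not part of the notebook: the function name and `alpha=0.3` are arbitrary choices, and it uses pandas' `ewm` with `adjust=False`.

```python
import pandas as pd

def smoothed_next_trade(trade_dates, alpha=0.3):
    """Predict the next trade from an exponentially weighted mean of past gaps."""
    dates = pd.to_datetime(pd.Series(trade_dates)).sort_values()
    gaps = dates.diff().dt.days.dropna()
    # Recent gaps weigh more than old ones; alpha controls how quickly old gaps fade
    smoothed_gap = gaps.ewm(alpha=alpha, adjust=False).mean().iloc[-1]
    return dates.iloc[-1] + pd.to_timedelta(smoothed_gap, unit="D")

# Hypothetical biweekly customer with perfectly even spacing
steady = ["2025-01-01", "2025-01-15", "2025-01-29", "2025-02-12"]
next_steady = smoothed_next_trade(steady)
print(next_steady.date())
```

For a perfectly steady customer this agrees with the mean-interval rule. The difference only shows up when a customer's cadence shifts, for example from biweekly to monthly, where the smoothed gap adapts faster than the all-time mean.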

ROI Anchor: $99K in 3-year revenue at risk from the predictable segment. If the email program prevents even 10–20% of churn from these customers, it pays for itself many times over. The email list recalculates dynamically after each new transaction — no ongoing manual work once deployed.
