Predictive Analytics · Python · Money Transfer

Predictive Transaction Model

Identifying customers with clockwork transaction patterns in money transfer data to trigger proactive outreach before their next expected trade. Built from real-world experience, recreated with synthetic data that mirrors production volume and corridor distribution.

Python · pandas · SciPy · Customer Segmentation · Predictive Modeling · LTV Analysis
The Problem

Batch marketing misses timing

Marketing teams blast the entire customer list at fixed intervals. But customers who transact on predictable schedules — remittances on payday, recurring supplier payments — are best reached at the right moment, not at random.

The Approach

Find the rhythm in the data

Calculate inter-trade intervals for each customer, measure their consistency with the coefficient of variation (CV), predict the next transaction date, and trigger an email 2 days before it. Simple, effective, production-ready.
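The four steps can be sketched for a single customer in plain Python. This is a minimal illustration, not the notebook's code. `plan_outreach` is a hypothetical helper. It reuses the CV thresholds from the configuration cell and the sample standard deviation, which is the same convention pandas uses by default.

```python
import statistics
from datetime import date, timedelta

CV_THRESHOLD_HIGH = 0.10  # same thresholds as the configuration cell
CV_THRESHOLD_MOD = 0.20
EMAIL_LEAD_DAYS = 2       # email goes out 2 days before the predicted trade


def plan_outreach(trade_dates):
    """Hypothetical helper: intervals -> CV -> tier -> predicted next trade -> email date."""
    dates = sorted(trade_dates)
    if len(dates) < 3:  # fewer than two intervals: no spread to measure
        return None
    intervals = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    mean_interval = statistics.mean(intervals)
    if mean_interval == 0:  # all trades on one day: cadence undefined
        return None
    cv = statistics.stdev(intervals) / mean_interval  # sample std, like pandas' default
    if cv > CV_THRESHOLD_MOD:
        return {"tier": "Irregular", "cv": cv}
    tier = "Highly Predictable" if cv <= CV_THRESHOLD_HIGH else "Moderately Predictable"
    predicted_next = dates[-1] + timedelta(days=round(mean_interval))
    return {
        "tier": tier,
        "cv": cv,
        "predicted_next": predicted_next,
        "email_date": predicted_next - timedelta(days=EMAIL_LEAD_DAYS),
    }


payday_sender = [date(2025, 1, 1), date(2025, 1, 15), date(2025, 1, 29), date(2025, 2, 12)]
print(plan_outreach(payday_sender))
```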

Why CV?

Normalizing for cadence

Raw standard deviation doesn't account for frequency. It would rate a weekly trader with a 1-day std and a monthly trader with a 1-day std as equally predictable, yet a 1-day wobble is about 14% of a 7-day interval and only about 3% of a 30-day one. CV normalizes by dividing std by mean, which makes comparison across cadences meaningful.
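To make the normalization concrete, here is a small illustration with made-up intervals. Two customers' gaps wobble by the same ±1 day, one around a 7-day cadence and one around a 30-day cadence. It uses Python's `statistics.stdev`, which is a sample standard deviation and follows the same ddof=1 convention as pandas' `.std()`.

```python
import statistics


def coefficient_of_variation(intervals):
    """CV = sample standard deviation / mean of the inter-trade intervals."""
    return statistics.stdev(intervals) / statistics.mean(intervals)


# Same ±1-day wobble around two different cadences (illustrative numbers)
weekly_intervals = [6, 7, 8, 7, 6, 8]
monthly_intervals = [29, 30, 31, 30, 29, 31]

weekly_std = statistics.stdev(weekly_intervals)
monthly_std = statistics.stdev(monthly_intervals)
weekly_cv = coefficient_of_variation(weekly_intervals)
monthly_cv = coefficient_of_variation(monthly_intervals)

print(f"std: weekly {weekly_std:.2f} d, monthly {monthly_std:.2f} d")
print(f"CV:  weekly {weekly_cv:.3f}, monthly {monthly_cv:.3f}")
```

Both standard deviations come out at about 0.89 days. The weekly CV is about 0.128, against about 0.030 for the monthly trader. The same 1-day jitter therefore pushes the weekly trader just past the 0.10 "Highly Predictable" threshold.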

The Data

10,000 synthetic transactions

300 customers, 42 currency pairs, 24-month window. 8% seeded as predictable with weekly/biweekly/monthly cadences and realistic ±1 day jitter. Mirrors real money transfer volume distribution.

Notebook Contents

01 · Configuration & Data Generation

Every key assumption is adjustable at the top of the notebook. In production, you'd revisit these regularly. PREDICTABLE_PCT = 0.08 seeds 8% of customers as predictable. That is a conservative but realistic figure for money transfer data, where 5–15% of customers transact on a schedule (payday remittances, recurring supplier payments, expat bill coverage).

MIN_TRADES_FOR_PREDICTION = 3 is the bare minimum for a meaningful standard deviation. Two trades give a single interval, so there is no way to measure consistency. Three trades give two intervals. That is the fewest from which a sample standard deviation can be computed, and enough for a first read on whether a customer's spacing is consistent.
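The minimum-trades rule follows from how pandas computes a sample standard deviation (ddof=1). A single interval has no spread to measure, so `.std()` returns NaN. The check below uses illustrative dates, and `interval_std` is a hypothetical helper rather than a function from the notebook.

```python
import pandas as pd


def interval_std(trade_dates):
    """Sample std (ddof=1, the pandas default) of the gaps between trades, in days."""
    dates = pd.Series(pd.to_datetime(trade_dates)).sort_values()
    intervals = dates.diff().dt.days.dropna()  # first trade has no previous trade
    return intervals.std()


two_trades = ["2024-01-05", "2024-02-05"]
three_trades = ["2024-01-05", "2024-02-05", "2024-03-05"]

print(interval_std(two_trades))    # one interval: std is undefined (NaN)
print(interval_std(three_trades))  # intervals of 31 and 29 days (2024 is a leap year)
```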

Python In [1]
from datetime import datetime

# ══════════════════════════════════════════════════════════
# ADJUSTABLE PARAMETERS — Modify these to tune the model
# ══════════════════════════════════════════════════════════
RANDOM_SEED               = 42
TOTAL_TRANSACTIONS        = 10000
DATE_START                = datetime(2024, 1, 1)
DATE_END                  = datetime(2025, 12, 31)  # 24-month window
CURRENCIES                = ['USD', 'EUR', 'JPY', 'GBP', 'CHF', 'AUD', 'NZD']
REVENUE_PCT               = 0.015   # 1.5% of send amount
PREDICTABLE_PCT           = 0.08    # ~8% of customers are predictable
MIN_TRADES_FOR_PREDICTION = 3       # Minimum trades to evaluate (gives 2 intervals)
CV_THRESHOLD_HIGH         = 0.10    # CV ≤ this = Highly Predictable
CV_THRESHOLD_MOD          = 0.20    # CV ≤ this = Moderately Predictable
Output
✅ Configuration loaded
Transactions: 10,000
Date range: 2024-01-01 → 2025-12-31
Predictable customer target: 8%
CV thresholds: High ≤ 0.10, Moderate ≤ 0.20
Min trades for prediction: 3

Data Design: Predictable customers are assigned a cadence (weekly, biweekly, or monthly) and generate trades at that interval with ±1 day jitter, weighted toward zero. Irregular customers get randomly distributed trades with an exponential frequency distribution — creating a realistic long tail where most customers trade a handful of times, but some are very active without being predictable. 42 currency pairs across 7 currencies, with 20 weighted as popular corridors to mirror real-world remittance flows.
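Below is a minimal sketch of the two generators described above, using NumPy's `default_rng`. The function names are hypothetical. The jitter weights (25% / 50% / 25% for −1 / 0 / +1 days), the random start offset and the `mean_trades` parameter are illustrative assumptions, not the notebook's actual values.

```python
import numpy as np
import pandas as pd


def predictable_trade_dates(start, end, cadence_days, rng):
    """Trades every `cadence_days`, with each gap nudged by -1/0/+1 days (0 most likely)."""
    dates = []
    current = start + pd.Timedelta(days=int(rng.integers(0, cadence_days)))
    while current <= end:
        dates.append(current)
        jitter = int(rng.choice([-1, 0, 1], p=[0.25, 0.5, 0.25]))  # weighted toward zero
        current += pd.Timedelta(days=cadence_days + jitter)
    return dates


def irregular_trade_dates(start, end, mean_trades, rng):
    """Exponentially distributed trade count, with dates scattered uniformly over the window."""
    n_trades = max(1, int(rng.exponential(mean_trades)))  # long tail: most small, a few large
    span_days = (end - start).days
    offsets = np.sort(rng.integers(0, span_days + 1, size=n_trades))
    return [start + pd.Timedelta(days=int(o)) for o in offsets]


rng = np.random.default_rng(42)  # matches RANDOM_SEED in the configuration cell
start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2025-12-31")
biweekly = predictable_trade_dates(start, end, 14, rng)
erratic = irregular_trade_dates(start, end, 30, rng)
print(len(biweekly), len(erratic))
```

Applying the jitter to each gap, rather than to a fixed calendar, keeps every interval within a day of the cadence. This is what produces a low CV for the seeded customers.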

02 · CV Calculation & Scoring

The analytical core. For each customer, sort trades chronologically, compute days between consecutive trades, then aggregate to get mean interval, standard deviation, and Coefficient of Variation (std / mean). A CV of 0.10 means the standard deviation is only 10% of the average interval — a weekly trader varies by ~0.7 days, a monthly trader by ~3 days. Both are highly predictable in context.

Python In [4]
# Calculate days between consecutive trades
df = df.sort_values(['customer_id', 'date'])
df['days_since_last_trade'] = df.groupby('customer_id')['date'].diff().dt.days

# Customer-level aggregation
customer_stats = df.groupby('customer_id').agg(
    total_trades=('transaction_id', 'count'),
    first_trade=('date', 'min'),
    last_trade=('date', 'max'),
    total_revenue=('revenue', 'sum'),
    avg_send_amount=('send_amount', 'mean'),
).reset_index()

# Interval statistics: each customer's first trade has no previous trade (NaN), so drop it
intervals = df.dropna(subset=['days_since_last_trade'])
interval_stats = intervals.groupby('customer_id')['days_since_last_trade'].agg(
    mean_interval='mean', std_interval='std',
    median_interval='median', n_intervals='count'
)
# Coefficient of variation: std / mean (NaN when a customer has only one interval)
interval_stats['cv'] = interval_stats['std_interval'] / interval_stats['mean_interval']

# Attach interval stats to the customer table; single-trade customers get NaN
customer_stats = customer_stats.merge(
    interval_stats.reset_index(), on='customer_id', how='left'
)
Output — CV Distribution (customers with 3+ trades)
✅ Interval statistics calculated for 287 customers with 2+ trades
CV Distribution (customers with 3+ trades):
count  277
mean   0.87
std    0.32
min    0.01
25%    0.78
50%    0.93
75%    1.04
max    1.53

Validation: The CV distribution confirms clean separation — predictable customers cluster below 0.15 while irregular customers spread from 0.3 to 1.5+. The median CV of 0.93 shows most customers are genuinely erratic, which is exactly what you'd expect in real money transfer data.

03 · Predictability Tiers

Rather than a binary flag, three tiers plus "Insufficient Data" allow differentiated treatment. A customer with CV = 0.05 gets confident email timing; for a monthly trader, that CV corresponds to a standard deviation of roughly 1.5 days. One with CV = 0.18 gets a wider send window or a softer call-to-action.

Python In [5]
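# Classify tiers based on CV thresholds
def classify_tier(row):
    if pd.isna(row['cv']) or row['total_trades'] < MIN_TRADES_FOR_PREDICTION:
        return 'Insufficient Data'
    elif row['cv'] <= CV_THRESHOLD_HIGH:
        return 'Highly Predictable'
    elif row['cv'] <= CV_THRESHOLD_MOD:
        return 'Moderately Predictable'
    else:
        return 'Irregular'

customer_stats['predictability_tier'] = customer_stats.apply(classify_tier, axis=1)

# Cadence estimation via interval bucketing
# ≤10d = weekly, ≤20d = biweekly, ≤40d = monthly, anything longer = other
def estimate_cadence(mean_interval):
    if pd.isna(mean_interval):
        return 'Unknown'
    if mean_interval <= 10:
        return 'Weekly'
    if mean_interval <= 20:
        return 'Biweekly'
    if mean_interval <= 40:
        return 'Monthly'
    return 'Other'

customer_stats['estimated_cadence'] = customer_stats['mean_interval'].apply(estimate_cadence)
predictable_mask = customer_stats['predictability_tier'].isin(
    ['Highly Predictable', 'Moderately Predictable'])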
Highly Predictable · 6.0%
18 customers with CV ≤ 0.10. Clockwork consistency — these are payday senders and recurring payment customers. Trigger emails 2 days before predicted trade with high confidence.

Moderately Predictable · 2.7%
8 customers with CV 0.10–0.20. Slightly looser timing but still actionable. Wider outreach window, softer call-to-action. Often biweekly traders with occasional slippage.

Irregular · 83.7%
251 customers with CV > 0.20. Not candidates for timing-based outreach. Better served by event-triggered or promotional campaigns. High frequency ≠ predictability.
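The tier descriptions translate into a simple outreach rule. In this minimal sketch, `outreach_plan` is a hypothetical helper. The window widths are illustrative choices, not values from the notebook: a single send date for the highly predictable tier, and a window that opens about one interval standard deviation earlier for the moderate tier.

```python
import math
from datetime import date, timedelta


def outreach_plan(tier, predicted_next, std_interval, lead_days=2):
    """Hypothetical tier -> send-window rule; the window widths are illustrative assumptions."""
    latest_send = predicted_next - timedelta(days=lead_days)
    if tier == "Highly Predictable":
        # Confident timing: one send date, lead_days before the predicted trade
        return {"window_start": latest_send, "window_end": latest_send, "call_to_action": "direct"}
    if tier == "Moderately Predictable":
        # Looser timing: open the window earlier by roughly one std of the interval
        spread = timedelta(days=math.ceil(std_interval))
        return {"window_start": latest_send - spread, "window_end": latest_send, "call_to_action": "soft"}
    # Irregular / Insufficient Data: not candidates for timing-based outreach
    return None


print(outreach_plan("Moderately Predictable", date(2026, 1, 9), std_interval=1.5))
```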
Output — Tier Distribution
Cadence Breakdown (Predictable Only)
26 Predictable Customers (8.7% of Total)
Biweekly (~14d): 12
Weekly (~7d): 8
Monthly (~30d): 4
Other: 2
Output — Top 10 Most Predictable Customers: Trade Timelines
Trade timelines showing clockwork patterns for top 10 most predictable customers

Visual Proof: Each subplot shows one customer's trade history over 24 months. The evenly spaced lollipops confirm a clockwork cadence: biweekly traders show tight ~14-day gaps, and monthly traders hit ~30-day intervals. The red dashed line marks the predicted next trade date. This is the chart that makes the model tangible for stakeholders.

04 · Revenue Comparison

Predictable customers aren't just consistent — they're disproportionately valuable. They trade more frequently and accumulate higher total revenue even if individual transaction sizes are similar.

Avg Total Revenue (Predictable customers): $2,433
Avg Total Revenue (Irregular customers): $1,758
Avg Trades (Predictable customers): 58.5
Avg Trades (Irregular customers): 33.7
Output — Revenue Comparison
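| Metric               | Predictable | Irregular | All (3+ trades) |
|----------------------|-------------|-----------|-----------------|
| Count                | 26          | 251       | 277             |
| Avg Total Revenue    | $2,433      | $1,758    | $1,821          |
| Median Total Revenue | $1,245      | $570      | $618            |
| Avg Annual Revenue   | $1,266      | $944      | $974            |
| Avg Trades           | 58.5        | 33.7      | 36.0            |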
Output — Revenue Analysis Charts
Revenue comparison: box plot, bar chart, and send amount histogram

The Insight: Predictable customers represent 8.7% of the customer base (26 of 300) but 12.1% of total revenue. Their median total revenue ($1,245) is more than double the irregular median ($570). This disproportionate contribution is the core argument for investing in proactive retention.
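The comparison table and the revenue share can be reproduced with a single `groupby`. This minimal sketch assumes the `customer_stats` columns built earlier (`predictability_tier`, `total_revenue`, `total_trades`). The `segment_revenue_summary` helper and the toy data are hypothetical.

```python
import pandas as pd


def segment_revenue_summary(customer_stats):
    """Predictable vs Irregular summary for customers with enough trades to be scored."""
    scored = customer_stats[customer_stats["predictability_tier"] != "Insufficient Data"]
    # Collapse Highly/Moderately Predictable into one "Predictable" segment
    segment = scored["predictability_tier"].where(
        scored["predictability_tier"] == "Irregular", "Predictable"
    )
    summary = scored.groupby(segment).agg(
        count=("total_revenue", "count"),
        avg_total_revenue=("total_revenue", "mean"),
        median_total_revenue=("total_revenue", "median"),
        avg_trades=("total_trades", "mean"),
    )
    # Share of revenue across ALL customers, including Insufficient Data
    summary["revenue_share"] = (
        scored.groupby(segment)["total_revenue"].sum() / customer_stats["total_revenue"].sum()
    )
    return summary


toy = pd.DataFrame({
    "predictability_tier": ["Highly Predictable", "Moderately Predictable",
                            "Irregular", "Irregular", "Insufficient Data"],
    "total_revenue": [2000.0, 3000.0, 500.0, 1500.0, 100.0],
    "total_trades": [60, 90, 20, 40, 2],
})
print(segment_revenue_summary(toy))
```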

05 · Cohort & Corridor Analysis

This shifts from "who are our predictable customers" to "how do we find more of them." Over-indexing analysis compares corridor distribution between predictable and general populations. An over-index of 164 means predictable customers are 64% more concentrated in that corridor than average — actionable for targeted acquisition.
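Over-index is a corridor's share of predictable customers' trades divided by its share of all trades, multiplied by 100. For GBP → USD in the table below, 18.0 / 11.0 × 100 ≈ 164. The pandas sketch here assumes a transaction-level frame with a `corridor` column and a boolean `is_predictable` flag per transaction. Both column names are hypothetical, and the notebook may build corridors from separate send and receive currency columns.

```python
import pandas as pd


def corridor_over_index(transactions):
    """Over-index = (% of predictable trades in corridor) / (% of all trades in corridor) * 100."""
    all_share = transactions["corridor"].value_counts(normalize=True)
    pred_share = transactions.loc[transactions["is_predictable"], "corridor"].value_counts(normalize=True)
    table = pd.DataFrame({"pct_pred_trades": pred_share, "pct_all_trades": all_share}).fillna(0.0)
    table["over_index"] = table["pct_pred_trades"] / table["pct_all_trades"] * 100
    return table.sort_values("over_index", ascending=False)


toy = pd.DataFrame({
    "corridor": ["GBP → USD"] * 2 + ["USD → EUR"] * 4 + ["USD → GBP"] * 4,
    "is_predictable": [True, True, True, False, False, False, True, False, False, False],
})
print(corridor_over_index(toy))
```

In the toy data, GBP → USD carries 20% of all trades but 50% of predictable trades. That gives it an over-index of 250.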

Output — Corridor Over-Index
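| Corridor  | % of Pred. Trades | % of All Trades | Over-Index | Total Revenue |
|-----------|-------------------|-----------------|------------|---------------|
| GBP → USD | 18.0%             | 11.0%           | 164        | $6,496        |
| EUR → USD | 16.2%             | 10.7%           | 151        | $3,666        |
| USD → EUR | 17.4%             | 14.6%           | 119        | $8,871        |
| USD → JPY | 10.1%             | 8.8%            | 115        | $5,475        |
| USD → GBP | 10.1%             | 12.6%           | 80         | $4,756        |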
Output — Cohort Analysis Charts
Cohort analysis: corridor over-indexing, day-of-week patterns, transaction size buckets, acquisition cohorts

Acquisition Signal: GBP→USD and EUR→USD over-index heavily (164 and 151 respectively). If you're buying ads or doing outreach, target people who need to send GBP or EUR to USD recipients. These corridors are over-represented in predictable customers' trades, which makes them the most promising place to find more predictable, high-value customers.

06 · LTV & Revenue at Risk

The 3-year lifetime value is projected by annualizing observed revenue. A critical finding: moderately predictable customers have dramatically higher LTV ($7,554 over 3 years) than highly predictable ones ($2,131). This seems counterintuitive until you look at cadence. Moderately predictable customers skew weekly/biweekly, so despite a slightly higher CV they average 101.1 trades versus 39.6 (roughly 2.5× as many) and about 3.5× the annual revenue ($2,518 vs $710).
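The projection itself is two lines of arithmetic. The notebook's exact annualization is not shown, so this minimal sketch makes two assumptions. First, annual revenue is total revenue divided by each customer's observed tenure, measured from first to last trade. Second, tenure has a 30-day floor, so a customer seen for only a few days isn't annualized from a tiny denominator. `add_ltv` is a hypothetical helper.

```python
import pandas as pd

LTV_YEARS = 3  # projection horizon used in the text


def add_ltv(customer_stats, min_tenure_days=30):
    """Annualize observed revenue over observed tenure, then project LTV_YEARS forward."""
    tenure_days = (customer_stats["last_trade"] - customer_stats["first_trade"]).dt.days
    tenure_years = tenure_days.clip(lower=min_tenure_days) / 365.25  # assumed 30-day floor
    out = customer_stats.copy()
    out["annual_revenue"] = out["total_revenue"] / tenure_years
    out["ltv_3yr"] = out["annual_revenue"] * LTV_YEARS
    return out


toy = pd.DataFrame({
    "first_trade": pd.to_datetime(["2024-01-01", "2025-06-01"]),
    "last_trade": pd.to_datetime(["2025-01-01", "2025-06-10"]),
    "total_revenue": [1000.0, 50.0],
})
print(add_ltv(toy)[["annual_revenue", "ltv_3yr"]])
```

Averaging `ltv_3yr` by `predictability_tier` would then give a tier table in the shape of the one below. The figures will only match if the annualization matches the notebook's.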

Output — LTV by Tier
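| Tier                   | Customers | Avg Annual Rev | Avg Trades | 3-Year LTV |
|------------------------|-----------|----------------|------------|------------|
| Highly Predictable     | 18        | $710           | 39.6       | $2,131     |
| Moderately Predictable | 8         | $2,518         | 101.1      | $7,554     |
| Irregular              | 251       | $944           | 33.7       | $2,832     |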
Output — LTV Analysis Charts
LTV analysis: tier comparison and corridor breakdown

Prioritization: Start the email program with weekly and biweekly traders. They're slightly harder to predict (CV 0.11–0.15, inside the moderately predictable band) but represent far more revenue per customer. The highest-value corridor for predictable customers is GBP→EUR, at $12,892 projected 3-year LTV, which makes it worth aggressive retention investment.

Annual Revenue (All): $271K · 300 customers
Annual Revenue at Risk: $33K · Predictable segment
Revenue Concentration: 12.1% · From 8.7% of customers
3-Year Revenue at Risk: $99K · If these customers churn
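These dashboard cards are simple ratios of segment totals. This minimal sketch uses a hypothetical `revenue_at_risk` helper. It plugs in the aggregates implied by the tables: 26 predictable customers × $1,266 average annual revenue ≈ $32,916, against roughly $271K in total. That reproduces the $33K, $99K and 12.1% figures.

```python
def revenue_at_risk(annual_revenue, is_predictable, horizon_years=3):
    """Annual and multi-year revenue tied to the predictable segment, plus its share of the total."""
    segment = sum(rev for rev, pred in zip(annual_revenue, is_predictable) if pred)
    total = sum(annual_revenue)
    return {
        "annual_at_risk": segment,
        "horizon_at_risk": segment * horizon_years,
        "concentration": segment / total,
    }


# Two aggregate buckets derived from the tables: predictable segment vs everyone else
result = revenue_at_risk([32_916, 271_000 - 32_916], [True, False])
print(result)
```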

Executive Dashboard

Four-panel view designed for stakeholder consumption. The pie chart gives the high-level distribution, the CV histogram shows where threshold lines fall, the scatter plot maps predictability against revenue, and the monthly volume chart shows that predictable customer trading volume is steady while irregular customers fluctuate — critical for revenue forecasting.

Output — Executive Dashboard
Executive dashboard: pie chart, CV histogram, predictability vs revenue scatter, monthly volume

07 · Email Trigger List

The final deliverable handed to the marketing team. Every predictable customer, sorted by next expected trade date, with an email send date 2 days prior. In production, this feeds into your CRM or marketing automation platform as a dynamic list that recalculates after each new transaction.

Python In [10]
from datetime import timedelta

import pandas as pd

# Predicted next trade = last_trade + mean_interval
# Email trigger = predicted_next - 2 days
customer_stats['predicted_next_trade'] = (
    customer_stats['last_trade'] +
    pd.to_timedelta(customer_stats['mean_interval'], unit='D')
)
customer_stats['email_trigger_date'] = (
    customer_stats['predicted_next_trade'] - timedelta(days=2)
)

# Filter to predictable customers (Highly + Moderately Predictable), sort by next trade
email_list = customer_stats[predictable_mask].sort_values('predicted_next_trade')
email_list[['customer_id', 'predictability_tier', 'estimated_cadence',
            'cv', 'mean_interval', 'predicted_next_trade',
            'email_trigger_date']]
Output — Email Trigger List (sample)
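| Customer  | Tier         | Cadence  | CV   | Mean Interval | Predicted Next | Email Trigger |
|-----------|--------------|----------|------|---------------|----------------|---------------|
| CUST_0211 | Highly Pred. | Other    | 0.01 | 185d          | 2026-03-19     | 2026-03-17    |
| CUST_0246 | Highly Pred. | Monthly  | 0.03 | 30d           | 2026-01-28     | 2026-01-26    |
| CUST_0281 | Highly Pred. | Other    | 0.03 | 93d           | 2026-03-14     | 2026-03-12    |
| CUST_0157 | Mod. Pred.   | Biweekly | 0.11 | 14d           | 2026-01-09     | 2026-01-07    |
| CUST_0093 | Mod. Pred.   | Weekly   | 0.14 | 7d            | 2026-01-04     | 2026-01-02    |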
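Because the list recalculates after each new transaction, the interval statistics can be updated incrementally instead of being recomputed from full history. One option is Welford's online algorithm for mean and variance. This is a swapped-in technique, not necessarily how the notebook does it. The `IntervalStats` class is hypothetical and assumes trades arrive in date order.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IntervalStats:
    """Running mean/variance of inter-trade intervals (Welford's algorithm)."""
    last_trade: Optional[date] = None
    n: int = 0          # number of intervals seen so far
    mean: float = 0.0
    m2: float = 0.0     # running sum of squared deviations from the mean

    def add_trade(self, trade_date: date) -> None:
        """Fold one new trade (assumed to be the latest so far) into the running stats."""
        if self.last_trade is not None:
            x = (trade_date - self.last_trade).days
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        self.last_trade = trade_date

    @property
    def cv(self) -> float:
        """Sample std / mean; NaN until there are at least two intervals (three trades)."""
        if self.n < 2 or self.mean == 0:
            return math.nan
        return math.sqrt(self.m2 / (self.n - 1)) / self.mean


stats = IntervalStats()
for d in (date(2024, 1, 5), date(2024, 2, 5), date(2024, 3, 5)):
    stats.add_trade(d)
print(stats.n, stats.mean, stats.cv)
```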

Production Path: This deliberately simple prediction (last trade + mean interval) is surprisingly effective for this use case because these customers are, by definition, consistent. In production, you might layer on exponential smoothing or ARIMA, but for a triggered email, the mean interval approach handles the 80/20 — correct enough to time the message right, simple enough to explain to stakeholders and maintain without a data science team.
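For comparison, simple exponential smoothing (one of the options mentioned above) weights recent intervals more heavily than the plain mean does. This is a minimal sketch. `alpha = 0.5` is an arbitrary illustrative value, and seeding the level with the first interval is one common convention among several.

```python
from datetime import date, timedelta
from statistics import mean


def smoothed_interval(intervals, alpha=0.5):
    """Simple exponential smoothing: level = alpha * x + (1 - alpha) * level."""
    level = float(intervals[0])  # seed with the first observed interval
    for x in intervals[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def predict_next(last_trade, intervals, method="mean"):
    """Last trade + either the mean interval or the smoothed interval, rounded to whole days."""
    step = mean(intervals) if method == "mean" else smoothed_interval(intervals)
    return last_trade + timedelta(days=round(step))


# A customer who drifted from a 14-day to a 16-day rhythm
intervals = [14, 14, 14, 14, 16, 16, 16]
last = date(2026, 1, 1)
print(predict_next(last, intervals, "mean"), predict_next(last, intervals, "ses"))
```

The mean interval (about 14.9 days) still reflects the old rhythm, while the smoothed level (15.75 days) has mostly moved to the new one. The difference only matters for customers whose cadence shifts.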

ROI Anchor: $99K in 3-year revenue is at risk from the predictable segment. If the email program prevents even 10–20% of churn from these customers, it keeps roughly $10K–$20K of revenue, likely several times the running cost of an automated trigger email. The email list recalculates dynamically after each new transaction, so there is no ongoing manual work once deployed.
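The ROI estimate can be checked with a few lines of arithmetic. This minimal sketch assumes, as a worst case, that the full $99K would otherwise be lost. The source gives no program cost, so `program_cost` is a hypothetical input you would supply.

```python
def retained_revenue(revenue_at_risk, churn_prevented):
    """Revenue kept if the program prevents `churn_prevented` (a 0-1 share) of at-risk churn."""
    return revenue_at_risk * churn_prevented


def roi_multiple(revenue_at_risk, churn_prevented, program_cost):
    """Retained revenue per dollar spent; program_cost is a hypothetical input, not from the source."""
    return retained_revenue(revenue_at_risk, churn_prevented) / program_cost


REVENUE_AT_RISK_3Y = 99_000  # from the "3-Year Revenue at Risk" card

for share in (0.10, 0.20):
    kept = retained_revenue(REVENUE_AT_RISK_3Y, share)
    print(f"{share:.0%} of churn prevented: ${kept:,.0f} retained over 3 years")
```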