Identifying customers with clockwork transaction patterns in money transfer data to trigger proactive outreach before their next expected trade. Built from real-world experience, recreated with synthetic data that mirrors production volume and corridor distribution.
Marketing teams blast the entire customer list at fixed intervals. But customers who transact on predictable schedules (remittances on payday, recurring supplier payments) are best reached just before their next expected transaction, not whenever the next blast happens to land.
Calculate inter-trade intervals for each customer, measure consistency via Coefficient of Variation, predict next transaction date, and trigger an email 2 days before. Simple, effective, production-ready.
Raw standard deviation ignores cadence. A weekly trader and a monthly trader with the same 1-day standard deviation look equally predictable by that measure, yet a 1-day wobble is about 14% of a 7-day interval and only about 3% of a 30-day one. CV normalizes by dividing std by mean, which makes comparisons across cadences meaningful.
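A quick worked check of that arithmetic, as a minimal sketch: the two interval lists are made-up histories with an identical 1-day sample standard deviation, and `coefficient_of_variation` is a hypothetical helper rather than a function from the notebook.

```python
import statistics

# Hypothetical interval histories (days between trades), both with sample std = 1 day
weekly_intervals = [6, 7, 8]
monthly_intervals = [29, 30, 31]

def coefficient_of_variation(intervals):
    """Sample standard deviation divided by the mean interval (hypothetical helper)."""
    return statistics.stdev(intervals) / statistics.mean(intervals)

for name, ivals in [("weekly", weekly_intervals), ("monthly", monthly_intervals)]:
    print(f"{name}: std={statistics.stdev(ivals):.2f}d, CV={coefficient_of_variation(ivals):.3f}")
# weekly: std=1.00d, CV=0.143
# monthly: std=1.00d, CV=0.033
```

Against the notebook's thresholds (0.10 and 0.20), the weekly history would land in Moderately Predictable and the monthly one in Highly Predictable, even though their raw standard deviations are identical.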
300 customers, 42 currency pairs, 24-month window. 8% seeded as predictable with weekly/biweekly/monthly cadences and realistic ±1 day jitter. Mirrors real money transfer volume distribution.
Every key assumption is adjustable at the top of the notebook. In production, you'd tune these constantly. PREDICTABLE_PCT = 0.08 seeds 8% of customers as predictable — conservative but realistic for money transfer data where 5–15% of customers transact on a schedule (payday remittances, recurring supplier payments, expat bill coverage).
MIN_TRADES_FOR_PREDICTION = 3 is the bare minimum for a meaningful standard deviation. Two trades give a single interval, and the sample standard deviation of one value is undefined (pandas returns NaN), so there is no way to measure consistency. Three trades give two intervals, enough for a first read on whether a customer's spacing is consistent.
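A small pandas check of that edge case, using two hypothetical customers with made-up dates: with the default `ddof=1`, a group holding a single interval yields NaN for `std`, which is why the tiering step treats a missing CV as Insufficient Data.

```python
import pandas as pd

# Hypothetical customers: A has 2 trades (1 interval), B has 3 trades (2 intervals)
trades = pd.DataFrame({
    "customer_id": ["A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2025-01-01", "2025-01-08",
                            "2025-01-01", "2025-01-08", "2025-01-15"]),
})
trades["interval"] = trades.groupby("customer_id")["date"].diff().dt.days

# Sample std (ddof=1) per customer over the non-missing intervals
interval_std = trades.dropna(subset=["interval"]).groupby("customer_id")["interval"].std()
print(interval_std)
# A: NaN (one interval), B: 0.0 (two identical 7-day intervals)
```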
```python
from datetime import datetime

# ══════════════════════════════════════════════════════════
# ADJUSTABLE PARAMETERS: Modify these to tune the model
# ══════════════════════════════════════════════════════════
RANDOM_SEED = 42
TOTAL_TRANSACTIONS = 10000
DATE_START = datetime(2024, 1, 1)
DATE_END = datetime(2025, 12, 31)  # 24-month window
CURRENCIES = ['USD', 'EUR', 'JPY', 'GBP', 'CHF', 'AUD', 'NZD']
REVENUE_PCT = 0.015  # 1.5% of send amount
PREDICTABLE_PCT = 0.08  # ~8% of customers are predictable
MIN_TRADES_FOR_PREDICTION = 3  # Minimum trades to evaluate
CV_THRESHOLD_HIGH = 0.10  # CV ≤ this = Highly Predictable
CV_THRESHOLD_MOD = 0.20  # CV ≤ this = Moderately Predictable
```
Data Design: Predictable customers are assigned a cadence (weekly, biweekly, or monthly) and generate trades at that interval with ±1 day jitter, weighted toward zero. Irregular customers get randomly distributed trades with an exponential frequency distribution — creating a realistic long tail where most customers trade a handful of times, but some are very active without being predictable. 42 currency pairs across 7 currencies, with 20 weighted as popular corridors to mirror real-world remittance flows.
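A minimal sketch of how data with that shape could be generated. The cadence day counts, the 0.2/0.6/0.2 jitter weights, and the exponential mean of 20 trades are assumptions chosen for illustration; `predictable_trades` and `irregular_trades` are hypothetical helper names, not the notebook's code, and corridor assignment and send amounts are left out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # mirrors RANDOM_SEED

# Assumed base gaps and jitter weights (hypothetical values)
CADENCE_DAYS = {"weekly": 7, "biweekly": 14, "monthly": 30}
JITTER_DAYS = np.array([-1, 0, 1])
JITTER_WEIGHTS = np.array([0.2, 0.6, 0.2])  # most gaps land exactly on cadence

def predictable_trades(customer_id, start, end, cadence):
    """Hypothetical helper: one trade per cadence interval, ±1 day jitter weighted toward zero."""
    base = CADENCE_DAYS[cadence]
    dates, current = [], start
    while current <= end:
        dates.append(current)
        jitter = int(rng.choice(JITTER_DAYS, p=JITTER_WEIGHTS))
        current = current + pd.Timedelta(days=base + jitter)
    return pd.DataFrame({"customer_id": customer_id, "date": dates})

def irregular_trades(customer_id, start, end, mean_trades=20):
    """Hypothetical helper: exponentially distributed trade count, dates uniform over the window."""
    n_trades = max(1, int(rng.exponential(mean_trades)))
    span_days = (end - start).days
    offsets = np.sort(rng.integers(0, span_days + 1, size=n_trades))
    return pd.DataFrame({"customer_id": customer_id,
                         "date": start + pd.to_timedelta(offsets, unit="D")})

start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2025-12-31")
demo = pd.concat([predictable_trades("CUST_P1", start, end, "biweekly"),
                  irregular_trades("CUST_I1", start, end)], ignore_index=True)
print(demo.groupby("customer_id").size())
```

The exponential draw is what produces the long tail: most irregular customers get a handful of trades, a few get many, but their dates stay uniformly scattered, so high activity does not imply a low CV.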
The analytical core. For each customer, sort trades chronologically, compute days between consecutive trades, then aggregate to get mean interval, standard deviation, and Coefficient of Variation (std / mean). A CV of 0.10 means the standard deviation is only 10% of the average interval — a weekly trader varies by ~0.7 days, a monthly trader by ~3 days. Both are highly predictable in context.
```python
import pandas as pd

# Calculate days between consecutive trades ('date' must be datetime64)
df = df.sort_values(['customer_id', 'date'])
df['days_since_last_trade'] = df.groupby('customer_id')['date'].diff().dt.days

# Customer-level aggregation
customer_stats = df.groupby('customer_id').agg(
    total_trades=('transaction_id', 'count'),
    first_trade=('date', 'min'),
    last_trade=('date', 'max'),
    total_revenue=('revenue', 'sum'),
    avg_send_amount=('send_amount', 'mean'),
).reset_index()

# Coefficient of variation: std / mean
# (each customer's first trade has no prior trade, so its NaN interval is dropped)
intervals = df.dropna(subset=['days_since_last_trade'])
interval_stats = intervals.groupby('customer_id')['days_since_last_trade'].agg(
    mean_interval='mean',
    std_interval='std',  # sample std (ddof=1); NaN when there is only one interval
    median_interval='median',
    n_intervals='count',
)
interval_stats['cv'] = interval_stats['std_interval'] / interval_stats['mean_interval']

# Attach interval stats to the customer table; single-trade customers get NaN
customer_stats = customer_stats.merge(interval_stats.reset_index(), on='customer_id', how='left')
```
Validation: The CV distribution confirms clean separation — predictable customers cluster below 0.15 while irregular customers spread from 0.3 to 1.5+. The median CV of 0.93 shows most customers are genuinely erratic, which is exactly what you'd expect in real money transfer data.
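Because the data is synthetic, the seeded labels double as ground truth, so "clean separation" can be measured rather than eyeballed. A minimal sketch, assuming the generator kept a boolean `seeded_predictable` column (a hypothetical name) next to each customer's CV; the six rows are illustrative values, not the notebook's output.

```python
import pandas as pd

CV_THRESHOLD_MOD = 0.20  # same cutoff as the parameter block

# Illustrative values only; 'seeded_predictable' is a hypothetical ground-truth column
validation = pd.DataFrame({
    "cv": [0.04, 0.12, 0.18, 0.45, 0.93, 1.60],
    "seeded_predictable": [True, True, False, False, False, False],
})

flagged = validation["cv"] <= CV_THRESHOLD_MOD  # Highly or Moderately Predictable
truth = validation["seeded_predictable"]
true_pos = int((flagged & truth).sum())
precision = true_pos / int(flagged.sum())  # flagged customers that were really seeded
recall = true_pos / int(truth.sum())       # seeded customers that were recovered

print(validation.groupby("seeded_predictable")["cv"].median())
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The false positive at CV 0.18 shows the failure mode to watch for: an irregular customer whose few intervals happen to be tight.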
Rather than a binary flag, three tiers plus "Insufficient Data" allow differentiated treatment. A customer with CV=0.05 (monthly trader who's never off by more than a day) gets confident email timing. One with CV=0.18 gets a wider send window or softer call-to-action.
```python
import pandas as pd

# Classify tiers based on CV thresholds
def classify_tier(row):
    if pd.isna(row['cv']) or row['total_trades'] < MIN_TRADES_FOR_PREDICTION:
        return 'Insufficient Data'
    elif row['cv'] <= CV_THRESHOLD_HIGH:
        return 'Highly Predictable'
    elif row['cv'] <= CV_THRESHOLD_MOD:
        return 'Moderately Predictable'
    else:
        return 'Irregular'

customer_stats['predictability_tier'] = customer_stats.apply(classify_tier, axis=1)

# Cadence estimation via interval bucketing
# ≤10d = weekly, ≤20d = biweekly, ≤40d = monthly
# Predicted next trade = last_trade + mean_interval
# Email trigger = predicted_next - 2 days
```
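The bucketing comments can be made concrete with a small helper. This sketch assumes the buckets apply to `mean_interval` (the comments don't say which statistic), returns "Other" above 40 days as the email-list output does, and uses "Unknown" (a hypothetical label) when no interval exists.

```python
import pandas as pd

def estimate_cadence(mean_interval):
    """Hypothetical helper: bucket a mean inter-trade interval (days) into a cadence label."""
    if pd.isna(mean_interval):
        return "Unknown"  # hypothetical label for customers with no intervals
    if mean_interval <= 10:
        return "Weekly"
    if mean_interval <= 20:
        return "Biweekly"
    if mean_interval <= 40:
        return "Monthly"
    return "Other"

demo = pd.Series([7.1, 14.0, 30.2, 93.0, 185.0, float("nan")])
print(demo.apply(estimate_cadence).tolist())
# ['Weekly', 'Biweekly', 'Monthly', 'Other', 'Other', 'Unknown']
```

In the notebook this would plausibly become `customer_stats['estimated_cadence'] = customer_stats['mean_interval'].apply(estimate_cadence)`, supplying the `estimated_cadence` column the email list selects.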
Visual Proof: Each subplot shows one customer's trade history over 24 months. The evenly-spaced lollipops confirm clockwork cadence — biweekly traders show tight ~14-day gaps, monthly traders hit ~30-day intervals. The red dashed line marks the predicted next trade date. This is the chart that makes the model tangible for stakeholders.
Predictable customers aren't just consistent — they're disproportionately valuable. They trade more frequently and accumulate higher total revenue even if individual transaction sizes are similar.
| Metric | Predictable (Highly + Moderately) | Irregular | All (3+ trades) |
|---|---|---|---|
| Count | 26 | 251 | 277 |
| Avg Total Revenue | $2,433 | $1,758 | $1,821 |
| Median Total Revenue | $1,245 | $570 | $618 |
| Avg Annual Revenue | $1,266 | $944 | $974 |
| Avg Trades | 58.5 | 33.7 | 36.0 |
The Insight: Predictable customers make up under 9% of the customer base (26 of 300) but 12.1% of total revenue. Their median total revenue ($1,245) is more than double the irregular median ($570). That disproportionate contribution is the core argument for investing in proactive retention.
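Both headline ratios can be computed from a customer-level table. A minimal sketch on made-up rows: "predictable" means the Highly and Moderately Predictable tiers combined, and in this sketch the revenue denominator includes every customer, Insufficient Data included.

```python
import pandas as pd

# Made-up customer-level rows; in the notebook this would be customer_stats
customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4", "C5"],
    "predictability_tier": ["Highly Predictable", "Moderately Predictable",
                            "Irregular", "Irregular", "Insufficient Data"],
    "total_revenue": [1200.0, 2500.0, 500.0, 700.0, 100.0],
})
PREDICTABLE_TIERS = ["Highly Predictable", "Moderately Predictable"]
is_pred = customers["predictability_tier"].isin(PREDICTABLE_TIERS)
is_irregular = customers["predictability_tier"] == "Irregular"

customer_share = is_pred.mean()  # share of all customers
revenue_share = customers.loc[is_pred, "total_revenue"].sum() / customers["total_revenue"].sum()
median_ratio = (customers.loc[is_pred, "total_revenue"].median()
                / customers.loc[is_irregular, "total_revenue"].median())

print(f"{customer_share:.1%} of customers, {revenue_share:.1%} of revenue, "
      f"median ratio {median_ratio:.2f}x")
```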
This shifts the question from "who are our predictable customers" to "how do we find more of them." Over-indexing analysis compares each corridor's share of predictable customers' trades with its share of all trades, scaled so that 100 means parity. An over-index of 164 (GBP → USD: 18.0% vs 11.0%) means predictable customers are 64% more concentrated in that corridor than the overall trade mix, which is actionable for targeted acquisition.
| Corridor | % of Pred Trades | % of All Trades | Over-Index | Total Revenue |
|---|---|---|---|---|
| GBP → USD | 18.0% | 11.0% | 164 | $6,496 |
| EUR → USD | 16.2% | 10.7% | 151 | $3,666 |
| USD → EUR | 17.4% | 14.6% | 119 | $8,871 |
| USD → JPY | 10.1% | 8.8% | 115 | $5,475 |
| USD → GBP | 10.1% | 12.6% | 80 | $4,756 |
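The over-index is a ratio of two shares. A minimal pandas sketch, assuming a trade-level frame with a `corridor` column (a hypothetical name; the notebook may store send and receive currencies separately) and a list of predictable customer IDs. Applied to the first table row, 18.0 / 11.0 × 100 ≈ 163.6, which rounds to 164.

```python
import pandas as pd

def over_index(trades, predictable_ids, corridor_col="corridor"):
    """Hypothetical helper: 100 x (share of predictable trades) / (share of all trades)."""
    is_pred = trades["customer_id"].isin(predictable_ids)
    pct_pred = trades.loc[is_pred, corridor_col].value_counts(normalize=True) * 100
    pct_all = trades[corridor_col].value_counts(normalize=True) * 100
    # Corridors with no predictable trades get 0% rather than NaN
    out = pd.DataFrame({"pct_pred": pct_pred, "pct_all": pct_all}).fillna(0.0)
    out["over_index"] = (out["pct_pred"] / out["pct_all"] * 100).round()
    return out.sort_values("over_index", ascending=False)

# Toy data: P is the only predictable customer
trades = pd.DataFrame({
    "customer_id": ["P"] * 4 + ["X"] * 6,
    "corridor": ["GBP→USD", "GBP→USD", "USD→EUR", "USD→EUR",
                 "USD→EUR", "USD→EUR", "USD→EUR", "USD→GBP", "USD→GBP", "USD→GBP"],
})
print(over_index(trades, ["P"]))
```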
Acquisition Signal: GBP→USD and EUR→USD over-index heavily (164 and 151 respectively). If you're buying ads or doing outreach, target people who need to send GBP or EUR to USD recipients; in this data they are more likely to become predictable, high-value customers.
Projected 3-year lifetime value annualizes each customer's observed revenue and multiplies by 3. A critical finding: moderately predictable customers have dramatically higher LTV ($7,554 over 3 years) than highly predictable ones ($2,131). This seems counterintuitive until you look at cadence: moderately predictable customers skew weekly/biweekly, so they average about 2.5× as many trades (101.1 vs 39.6) and about 3.5× the annual revenue ($2,518 vs $710), despite a slightly higher CV.
| Tier | Customers | Avg Annual Rev | Avg Trades | 3-Year LTV |
|---|---|---|---|---|
| Highly Predictable | 18 | $710 | 39.6 | $2,131 |
| Moderately Predictable | 8 | $2,518 | 101.1 | $7,554 |
| Irregular | 251 | $944 | 33.7 | $2,832 |
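The projection step is one multiplication: each tier's 3-year LTV is its average annual revenue × 3 (for example $2,518 × 3 = $7,554). How observed revenue gets annualized isn't spelled out, so this sketch assumes revenue divided by first-to-last-trade tenure in years, with a hypothetical 30-day floor so a short history doesn't produce an inflated annual rate.

```python
import pandas as pd

HORIZON_YEARS = 3
DAYS_PER_YEAR = 365.25

def project_ltv(customers, min_tenure_days=30):
    """Hypothetical helper: annualize revenue over observed tenure, then project over HORIZON_YEARS."""
    tenure_days = (customers["last_trade"] - customers["first_trade"]).dt.days
    tenure_years = tenure_days.clip(lower=min_tenure_days) / DAYS_PER_YEAR  # assumed 30-day floor
    annual = customers["total_revenue"] / tenure_years
    return customers.assign(annual_revenue=annual, ltv_3y=annual * HORIZON_YEARS)

# Made-up rows for illustration
demo = pd.DataFrame({
    "predictability_tier": ["Highly Predictable", "Moderately Predictable"],
    "total_revenue": [1000.0, 4000.0],
    "first_trade": pd.to_datetime(["2024-01-01", "2024-01-01"]),
    "last_trade": pd.to_datetime(["2025-12-31", "2024-12-31"]),
})
ltv = project_ltv(demo)
print(ltv.groupby("predictability_tier")[["annual_revenue", "ltv_3y"]].mean().round(0))
```

Annualizing over tenure rather than the fixed 24-month window keeps customers who joined late from looking artificially low-value.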
Prioritization: Start the email program with weekly and biweekly traders. They're slightly harder to predict (CV 0.11–0.15) but represent far more revenue per customer. The highest-value corridor for predictable customers is GBP→EUR at $12,892 projected 3-year LTV — worth aggressive retention investment.
Four-panel view designed for stakeholder consumption. The pie chart gives the high-level distribution, the CV histogram shows where threshold lines fall, the scatter plot maps predictability against revenue, and the monthly volume chart shows that predictable customer trading volume is steady while irregular customers fluctuate — critical for revenue forecasting.
The final deliverable handed to the marketing team. Every predictable customer, sorted by next expected trade date, with an email send date 2 days prior. In production, this feeds into your CRM or marketing automation platform as a dynamic list that recalculates after each new transaction.
```python
import pandas as pd
from datetime import timedelta

# Predicted next trade = last_trade + mean_interval
customer_stats['predicted_next_trade'] = (
    customer_stats['last_trade'] + pd.to_timedelta(customer_stats['mean_interval'], unit='D')
)
# Email trigger = predicted_next - 2 days
customer_stats['email_trigger_date'] = (
    customer_stats['predicted_next_trade'] - timedelta(days=2)
)

# Filter to predictable customers, sort by next trade
predictable_mask = customer_stats['predictability_tier'].isin(
    ['Highly Predictable', 'Moderately Predictable']
)
email_list = customer_stats[predictable_mask].sort_values('predicted_next_trade')
email_list[['customer_id', 'predictability_tier', 'estimated_cadence', 'cv',
            'mean_interval', 'predicted_next_trade', 'email_trigger_date']]
```
| Customer | Tier | Cadence | CV | Mean Interval | Predicted Next | Email Trigger |
|---|---|---|---|---|---|---|
| CUST_0211 | Highly Pred. | Other | 0.01 | 185d | 2026-03-19 | 2026-03-17 |
| CUST_0246 | Highly Pred. | Monthly | 0.03 | 30d | 2026-01-28 | 2026-01-26 |
| CUST_0281 | Highly Pred. | Other | 0.03 | 93d | 2026-03-14 | 2026-03-12 |
| CUST_0157 | Mod. Pred. | Biweekly | 0.11 | 14d | 2026-01-09 | 2026-01-07 |
| CUST_0093 | Mod. Pred. | Weekly | 0.14 | 7d | 2026-01-04 | 2026-01-02 |
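A CRM or automation hook typically just needs "who gets an email today." A minimal sketch, assuming a daily batch run (the scheduling and the `emails_due` name are hypothetical). Because `mean_interval` is fractional, `predicted_next_trade` and `email_trigger_date` carry a time of day, so the comparison normalizes both sides to midnight. The sample rows are copied from the table above.

```python
import pandas as pd

def emails_due(email_list, run_date):
    """Hypothetical helper: IDs whose email trigger falls on run_date's calendar day."""
    run_day = pd.Timestamp(run_date).normalize()
    due = email_list["email_trigger_date"].dt.normalize() == run_day
    return email_list.loc[due, "customer_id"].tolist()

sample = pd.DataFrame({
    "customer_id": ["CUST_0093", "CUST_0157", "CUST_0246"],
    "email_trigger_date": pd.to_datetime(["2026-01-02", "2026-01-07", "2026-01-26"]),
})
print(emails_due(sample, "2026-01-07"))  # ['CUST_0157']
```

An exact-day match silently skips customers if a run is missed; selecting triggers after the last successful run and up to today avoids that.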
Production Path: This deliberately simple prediction (last trade + mean interval) is surprisingly effective for this use case because these customers are, by definition, consistent. In production, you might layer on exponential smoothing or ARIMA, but for a triggered email, the mean interval approach handles the 80/20 — correct enough to time the message right, simple enough to explain to stakeholders and maintain without a data science team.
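To make the upgrade path concrete: simple exponential smoothing (the most basic member of the exponential-smoothing family mentioned above, not ARIMA) weights recent gaps more heavily than the plain mean does. The customer below, who switched from weekly to biweekly, is hypothetical, and `alpha=0.5` is an illustrative choice rather than a tuned value.

```python
def predict_gap_mean(intervals):
    """Baseline from the notebook: the mean of all observed gaps."""
    return sum(intervals) / len(intervals)

def predict_gap_ses(intervals, alpha=0.5):
    """Simple exponential smoothing: level <- alpha * gap + (1 - alpha) * level."""
    level = float(intervals[0])
    for gap in intervals[1:]:
        level = alpha * gap + (1 - alpha) * level
    return level

shifted = [7, 7, 7, 7, 14, 14, 14]  # hypothetical: weekly, then biweekly
print(predict_gap_mean(shifted))  # 10.0
print(predict_gap_ses(shifted))   # 13.125
```

For truly clockwork customers the two predictions agree closely; they diverge only when a cadence drifts, which is exactly when the mean-interval approach would mistime the email.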
ROI Anchor: $99K in projected 3-year revenue is at risk from the predictable segment (18 × $2,131 + 8 × $7,554 ≈ $98.8K). If the email program prevents even 10–20% of churn from these customers, it pays for itself many times over. The email list recalculates dynamically after each new transaction, so there is no ongoing manual work once deployed.
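Recalculating after each new transaction doesn't have to mean re-scanning a customer's full history. One option (an assumption, not the notebook's implementation) is to keep a running mean and variance per customer with Welford's online algorithm; the `RunningIntervals` class and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RunningIntervals:
    """Hypothetical per-customer state, updated in O(1) per new trade."""
    last_trade: datetime
    n: int = 0         # intervals observed so far
    mean: float = 0.0  # running mean gap in days
    m2: float = 0.0    # running sum of squared deviations (Welford)

    def add_trade(self, trade_time):
        """Fold one new trade into the running statistics."""
        gap = (trade_time - self.last_trade).days
        self.n += 1
        delta = gap - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (gap - self.mean)
        self.last_trade = trade_time

    def cv(self):
        """Sample std / mean; NaN until there are two intervals (three trades)."""
        if self.n < 2 or self.mean == 0:
            return float("nan")
        return math.sqrt(self.m2 / (self.n - 1)) / self.mean

    def email_trigger(self, lead_days=2):
        """last_trade + mean gap - lead_days, matching the batch calculation."""
        return self.last_trade + timedelta(days=self.mean - lead_days)

stats = RunningIntervals(last_trade=datetime(2025, 12, 1))
for t in [datetime(2025, 12, 8), datetime(2025, 12, 15), datetime(2025, 12, 23)]:
    stats.add_trade(t)
print(round(stats.mean, 3), round(stats.cv(), 4), stats.email_trigger())
```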