Case 02 / 2024

FX Predict — finding the customers who trade on a schedule.

Thesis: a small slice of FX customers trade on a rhythm you can predict — weekly, biweekly, monthly. Find them, score them, and email two days before their next trade instead of blasting everyone on Tuesday.

Built for a money-transfer company. Synthetic dataset (10,000 trades, 300 customers, 24 months) stands in for the proprietary real one; the method and code are production-faithful.

Python · Pandas · Coefficient of Variation · Cohort analysis · LTV
Role: Analytics · Modeling
Year: 2024
Data: 10,000 trades · 300 customers
Stack: Python · Pandas · SciPy

Predictive Transaction Model

Identifying predictable customers for proactive email engagement

Objective: score each customer's trading cadence, classify into tiers, and trigger an email 2 days before their predicted next trade.

In [1]:
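import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

CV_THRESHOLD_HIGH = 0.10        # Highly Predictable
CV_THRESHOLD_MOD = 0.20         # Moderately Predictable
MIN_TRADES_FOR_PREDICTION = 3   # fewer trades than this cannot show a rhythm
PREDICTABLE_PCT = 0.08          # share of customers seeded with a fixed cadence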
Out[1]:
✅ Configuration loaded
Transactions: 10,000
Date range: 2024-01-01 → 2025-12-31
Predictable customer target: 8%
Results first

What the model found.

Of 300 customers in the dataset, 27 turned out to be predictable enough to act on — and they punch well above their weight on revenue. Here's the punchline before the walkthrough:

27 of 300
Predictable customers
9.0% of the base
+39.3%
Revenue lift per customer
vs. all customers with 3+ trades
12.5%
Share of annual revenue
$33.2K of $265.7K
$99.7K
3-year revenue at risk
If these customers churn

The output isn't a segment — it's a dated send list. Each predictable customer gets a predicted next-trade date (last trade + their average gap) and an email trigger date two days earlier. Drop the CSV into Salesforce Marketing Cloud, and the campaign runs itself.

The idea

Four ideas, stacked in a row.

Most email marketing sends on the company's calendar. Some customers, though, live on their own rhythm — payday remittance, rent abroad, a monthly treasury sweep. If we can spot that rhythm in the data, we can meet them at their moment instead of ours.

1 · The gap
Measure the days between each customer's trades.
A customer who trades on Jan 5, Jan 12, Jan 19 has two gaps — both 7 days. That list of gaps per person is the raw signal everything else is built on.
2 · The score
Check whether their gaps stay consistent.
Consistent gaps = predictable customer. We use one number — Coefficient of Variation — that goes down as the rhythm tightens. Clock-like scores low; sporadic scores high.
3 · The tier
Sort customers into three buckets.
Highly predictable (CV ≤ 0.10), moderately predictable (0.10–0.20), irregular (everyone else with at least 3 trades; customers with fewer are set aside as insufficient data). Two tiers are worth triggering emails for.
4 · The schedule
Email two days before their predicted next trade.
For each predictable customer: next trade = last trade + average gap. Trigger = next trade − 2 days. One row per customer, one CSV, ready for the marketing tool.
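The four ideas fit in a few lines of standard-library Python. This is a minimal sketch on the Jan 5 / 12 / 19 customer from idea 1; the year (2024) and the helper name `schedule_for` are assumptions for illustration, while the thresholds mirror the ones used in the notebook (0.10, 0.20, minimum 3 trades).

```python
from datetime import date, timedelta
from statistics import mean, stdev

CV_HIGH, CV_MOD, MIN_TRADES = 0.10, 0.20, 3

def schedule_for(trade_dates):
    """Hypothetical helper: gap -> score -> tier -> schedule for one customer."""
    dates = sorted(trade_dates)
    if len(dates) < MIN_TRADES:
        return None  # too few trades to tell a rhythm from a coincidence
    # 1. The gap: days between consecutive trades
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    # 2. The score: coefficient of variation (sample std / mean)
    cv = stdev(gaps) / mean(gaps)
    # 3. The tier
    if cv <= CV_HIGH:
        tier = 'Highly Predictable'
    elif cv <= CV_MOD:
        tier = 'Moderately Predictable'
    else:
        tier = 'Irregular'
    # 4. The schedule: next trade = last trade + average gap, trigger 2 days earlier
    next_trade = dates[-1] + timedelta(days=round(mean(gaps)))
    return {'gaps': gaps, 'cv': cv, 'tier': tier,
            'next_trade': next_trade, 'trigger': next_trade - timedelta(days=2)}

result = schedule_for([date(2024, 1, 5), date(2024, 1, 12), date(2024, 1, 19)])
print(result)
```

For this customer the gaps are [7, 7], the CV is 0.0, the tier is Highly Predictable, the next trade lands on 2024-01-26 and the email fires on 2024-01-24. Rounding the average gap to whole days is a choice made in this sketch.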
Method walkthrough

Eight steps from raw trades to trigger list.

Each block below is a faithful rebuild of a cell from the notebook. Code is real; the note next to it says why the step matters in plain English.

Step 01 · Build the dataset

Generate 10,000 realistic FX trades across 300 customers.

Real customer data is proprietary, so I generated a synthetic replica that mirrors it — 300 customers, 10,000 trades, 24-month window, weighted across 20 popular currency corridors (USD→EUR, GBP→USD, etc.) using approximate mid-market rates.

8% of customers are seeded as predictable (weekly, biweekly, or monthly cadence with ±1 day jitter). The other 92% trade at random intervals. The model doesn't know which is which — it has to find the predictable ones from the gap data alone.
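It helps to know in advance where the seeded customers should land on the score. The jitter is applied to each trade date while the underlying schedule keeps advancing by the full cadence, so every gap equals the cadence plus the difference of two independent jitter draws. With draws from [-1, 0, 0, 0, 1] the jitter variance is 0.4, a gap's standard deviation is about √0.8 ≈ 0.89 days, and the expected CV is roughly 0.89 divided by the cadence. A quick sketch compares that back-of-envelope figure with a simulation; the simulation length and seed are arbitrary choices.

```python
import numpy as np

JITTER = np.array([-1, 0, 0, 0, 1])   # same draw set as the data generator
CADENCES = {'weekly': 7, 'biweekly': 14, 'monthly': 30}

def approx_cv(cadence_days):
    """Analytic approximation: std of (j2 - j1) divided by the cadence."""
    jitter_var = JITTER.var()          # population variance of one draw = 0.4
    return np.sqrt(2 * jitter_var) / cadence_days

def simulated_cv(cadence_days, n_trades=2000, seed=0):
    """Simulate one long jittered schedule and measure the CV of its gaps."""
    rng = np.random.default_rng(seed)
    schedule = np.arange(n_trades) * cadence_days          # on-time trade days
    trade_days = schedule + rng.choice(JITTER, size=n_trades)
    gaps = np.diff(trade_days)
    return gaps.std(ddof=1) / gaps.mean()

for name, days in CADENCES.items():
    print(f"{name:9s} approx CV={approx_cv(days):.3f}  simulated CV={simulated_cv(days):.3f}")
```

On this approximation weekly customers sit around CV ≈ 0.13, inside the moderate band, while biweekly (≈ 0.06) and monthly (≈ 0.03) customers clear the 0.10 bar. That lines up with the CVs the notebook reports for weekly (0.12 to 0.14), biweekly (0.07) and monthly (0.02 to 0.03) customers.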

Seed with known ground truth — so we can check the model's work
predictive_transactions.ipynb — § 1. Data generation
In [5]:
# 300 customers, 8% seeded with a fixed cadence
from datetime import timedelta

N_CUSTOMERS = 300
n_predictable = int(N_CUSTOMERS * PREDICTABLE_PCT)   # 24
cadences = {'weekly': 7, 'biweekly': 14, 'monthly': 30}

for cid in predictable_ids:
    cadence_days = cadences[np.random.choice(list(cadences.keys()))]
    while current_date <= DATE_END:
        # ±1 day jitter moves the trade date; the underlying schedule stays fixed
        jitter = int(np.random.choice([-1, 0, 0, 0, 1]))
        trade_date = current_date + timedelta(days=jitter)
        # … append trade with corridor, send_amount, revenue …
        current_date += timedelta(days=cadence_days)
Out[5]:
✅ Total transactions: 10,000
Unique customers: 300
Date range: 2024-01-01 → 2025-12-31
Avg txns per customer: 33.3
Predictable seeded: 24 (8.0%)
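| transaction_id | customer_id | date | send_amount | sell | buy | rate | revenue |
|---|---|---|---|---|---|---|---|
| TXN_003001 | CUST_0001 | 2024-11-12 | 2,475.97 | USD | JPY | 148.77 | 37.14 |
| TXN_009478 | CUST_0002 | 2024-01-01 | 1,626.55 | GBP | JPY | 189.20 | 24.40 |
| TXN_009479 | CUST_0002 | 2024-02-10 | 3,412.80 | EUR | CHF | 0.96 | 51.19 |
| TXN_009480 | CUST_0002 | 2024-03-01 | 6,822.90 | USD | AUD | 1.53 | 102.34 |
| TXN_009481 | CUST_0002 | 2024-03-15 | 2,995.68 | USD | AUD | 1.53 | 44.94 |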
Step 02 · Days between trades

For each customer, compute the gap between trades.

Sort every customer's trades by date, then use .diff() to get the days between each one and the next. That list of gaps per person is the raw signal the whole rest of the project sits on top of.

Then per-customer aggregates: mean gap, standard deviation, min, max, total revenue, tenure. One row per customer, ready for scoring.

predictive_transactions.ipynb — § 2. Inter-trade intervals
In [9]:
# Days between consecutive trades per customer
df = df.sort_values(['customer_id', 'date'])
df['days_since_last_trade'] = df.groupby('customer_id')['date'].diff().dt.days

interval_stats = (
    df.dropna(subset=['days_since_last_trade'])
      .groupby('customer_id')['days_since_last_trade']
      .agg(
          mean_interval='mean',
          std_interval='std',
          median_interval='median',
          min_interval='min',
          max_interval='max',
      )
      .reset_index()
)
Out[9]:
✅ Interval statistics calculated for 285 customers with 2+ trades
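Why 285 customers here but 278 eligible for scoring: a customer with one trade has no gaps at all, and a customer with two trades has exactly one gap, whose sample standard deviation pandas reports as NaN. A toy frame (customer IDs and dates are made up) runs the same diff-and-aggregate logic on the three cases:

```python
import pandas as pd

toy = pd.DataFrame({
    'customer_id': ['A', 'B', 'B', 'C', 'C', 'C'],
    'date': pd.to_datetime(['2024-01-05',
                            '2024-01-05', '2024-01-12',
                            '2024-01-05', '2024-01-12', '2024-01-20']),
})
toy = toy.sort_values(['customer_id', 'date'])
# First trade per customer has no previous trade, so its gap is NaN
toy['days_since_last_trade'] = toy.groupby('customer_id')['date'].diff().dt.days

toy_stats = (toy.dropna(subset=['days_since_last_trade'])
                .groupby('customer_id')['days_since_last_trade']
                .agg(n_gaps='count', mean_interval='mean', std_interval='std'))
print(toy_stats)
```

Customer A never reaches the interval table, B reaches it with a NaN standard deviation, and only C has enough gaps for a real spread estimate. The 3-trade floor also keeps the notebook's `fillna(0)` from turning B's NaN into a CV of 0, which would otherwise read as perfect clockwork.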
Step 03 · Score the rhythm

One number per customer: how consistent is their rhythm?

Quick explainer — take a customer's gaps (say, 30, 29, 31, 30 days). Divide how much they wobble (standard deviation) by their average. Small wobble = low score = predictable. Statisticians call this ratio the Coefficient of Variation, or CV.
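Worked through on the example gaps, using the sample standard deviation (ddof=1), which is what pandas' `std` aggregation in the notebook computes by default. The second, sporadic set of gaps is made up for contrast, and `coefficient_of_variation` is an illustrative helper name.

```python
import numpy as np

def coefficient_of_variation(gaps):
    """CV of a customer's inter-trade gaps: sample std / mean (lower = steadier)."""
    gaps = np.asarray(gaps, dtype=float)
    return gaps.std(ddof=1) / gaps.mean()   # ddof=1 matches pandas' .std() default

steady = coefficient_of_variation([30, 29, 31, 30])    # ~0.027
sporadic = coefficient_of_variation([3, 45, 10, 62])   # ~0.94
print(f"steady CV={steady:.3f}  sporadic CV={sporadic:.3f}")
```

Both customers average a 30-day gap, but the steady one wobbles by under a day (CV ≈ 0.03, well inside the 0.10 line) while the sporadic one wobbles by about 28 days (CV ≈ 0.94, close to the median of the real distribution).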

Compute CV for every customer with at least 3 trades — fewer than that and you can't tell a rhythm from a coincidence. The eligible pool: 278 customers. The CV distribution runs from a tight 0.01 (perfect clockwork) up to 1.55 (chaotic).

Minimum 3 trades. Two points make a line, not a pattern.
predictive_transactions.ipynb — § 2. Predictability scoring
In [9]:
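# Coefficient of Variation: lower = more predictable
interval_stats['cv'] = interval_stats['std_interval'] / interval_stats['mean_interval']
# Single-gap customers have NaN std; the 3-trade filter below excludes them
interval_stats['cv'] = interval_stats['cv'].fillna(0)

# Assumes customer_stats (one row per customer) already has cv merged in from interval_stats
eligible = customer_stats[customer_stats['total_trades'] >= MIN_TRADES_FOR_PREDICTION]
print(eligible['cv'].describe())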
Out[9]:
CV Distribution (278 eligible customers)
count   278.00
mean      0.86
std       0.33
min       0.01
25%       0.78
50%       0.92   (median)
75%       1.04
max       1.55
[Figure: histogram of CV for 278 eligible customers, axis 0.0 to 1.55, median ≈ 0.9, with threshold lines at CV ≤ 0.10 (highly predictable) and CV ≤ 0.20 (moderately predictable)]
Fig. 1 · CV distribution. Main mass is chaotic; a tight cluster sits under the thresholds.
Step 04 · Classify into tiers

Three buckets. Two of them are worth emailing.

Apply the thresholds: CV ≤ 0.10 = Highly Predictable, 0.10–0.20 = Moderately Predictable, everything above = Irregular. That lands 18 customers in Highly Predictable and 9 in Moderately Predictable; 251 are Irregular and 22 have too few trades to score.

Total actionable: 27 customers, 9.0% of the base. The seeded ground truth was 24: the model recovered all of them and picked up 3 customers seeded as irregular who happened to fall into a tight rhythm by chance.
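Because the synthetic data carries ground truth, the recovery claim can be checked with plain set arithmetic. A sketch, assuming one list of seeded customer IDs and one list of IDs the model tagged as predictable; `recovery_report` is an illustrative helper, and the IDs below are placeholders that only reproduce the counts in the text (24 seeded, all 24 found, 3 extra).

```python
def recovery_report(seeded_ids, flagged_ids):
    """Compare seeded ground truth with the customers the model flagged."""
    seeded, flagged = set(seeded_ids), set(flagged_ids)
    hits = seeded & flagged
    return {
        'recall': len(hits) / len(seeded),       # share of seeded customers found
        'precision': len(hits) / len(flagged),   # share of flagged customers that were seeded
        'missed': sorted(seeded - flagged),
        'extra': sorted(flagged - seeded),
    }

# Placeholder IDs reproducing the reported counts, not real customer IDs
seeded = [f"SEED_{i:02d}" for i in range(24)]
flagged = seeded + ["EXTRA_1", "EXTRA_2", "EXTRA_3"]
report = recovery_report(seeded, flagged)
print(report['recall'], round(report['precision'], 3), len(report['extra']))
```

With those counts recall is 1.0 and precision is 24/27 ≈ 0.89: every seeded customer was found, and one in nine flagged customers is a random trader who happened to look regular.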

Two-tier split — high tier gets priority, moderate gets volume
predictive_transactions.ipynb — § 2. Tier classification
In [11]:
def classify_tier(row):
    if pd.isna(row['cv']) or row['total_trades'] < MIN_TRADES_FOR_PREDICTION:
        return 'Insufficient Data'
    elif row['cv'] <= CV_THRESHOLD_HIGH:
        return 'Highly Predictable'
    elif row['cv'] <= CV_THRESHOLD_MOD:
        return 'Moderately Predictable'
    return 'Irregular'

customer_stats['predictability_tier'] = customer_stats.apply(classify_tier, axis=1)
Out[11]:
📊 Predictability Tier Distribution:
  Irregular                251 (83.7%)
  Insufficient Data         22 ( 7.3%)
  Highly Predictable        18 ( 6.0%)
  Moderately Predictable     9 ( 3.0%)
  Total predictable:        27 ( 9.0%)

📊 Cadence Breakdown (predictable only):
  Monthly (~30d)    10
  Weekly (~7d)       9
  Biweekly (~14d)    6
  Other              2
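`DataFrame.apply(..., axis=1)` calls Python once per row, which is fine at 300 customers. A vectorized alternative with `np.select` gives the same labels and scales to a larger book. This is a sketch, not the notebook's code: `classify_tiers` is an illustrative name and the sample frame is made up.

```python
import numpy as np
import pandas as pd

CV_THRESHOLD_HIGH = 0.10
CV_THRESHOLD_MOD = 0.20
MIN_TRADES_FOR_PREDICTION = 3

def classify_tiers(stats: pd.DataFrame) -> pd.Series:
    """Vectorized counterpart of classify_tier: the first matching condition wins."""
    conditions = [
        stats['cv'].isna() | (stats['total_trades'] < MIN_TRADES_FOR_PREDICTION),
        stats['cv'] <= CV_THRESHOLD_HIGH,
        stats['cv'] <= CV_THRESHOLD_MOD,
    ]
    choices = ['Insufficient Data', 'Highly Predictable', 'Moderately Predictable']
    return pd.Series(np.select(conditions, choices, default='Irregular'), index=stats.index)

sample = pd.DataFrame({
    'cv': [0.02, 0.12, 0.92, np.nan, 0.05],
    'total_trades': [24, 100, 30, 1, 2],
})
print(classify_tiers(sample).tolist())
```

Condition order does the work that the `if/elif` chain does in `classify_tier`: the last sample row has a low CV but only 2 trades, so it is caught by the first condition and labeled Insufficient Data.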
Step 05 · Eyeball the top 10

Do the score's winners actually look predictable?

A low CV should mean "their trades line up on a grid when you plot them." Bars are trades (height = send amount), red dashed line is the predicted next trade date. The pattern is visually clear without needing any stats background — useful when presenting to marketing and ops.

The #1 customer (CUST_0183, monthly cadence) has a CV of 0.02: 24 trades, mean gap 30.00 days. The next trade is predicted for Jan 4, 2026. Trigger the email on Jan 2.

predictive_transactions.ipynb — § 3. Top 10 timelines
In [15]:
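# One timeline per top-10 customer: bars = trades (height = send amount),
# red dashed line = predicted next trade
top10 = pivot.head(10)['customer_id'].tolist()
stats_by_id = customer_stats.set_index('customer_id')  # assumes a predicted_next_trade column
fig, axes = plt.subplots(len(top10), 1, figsize=(10, 2 * len(top10)), sharex=True)
for ax, cid in zip(axes, top10):
    cust_txns = df[df['customer_id'] == cid]
    cust_info = stats_by_id.loc[cid]
    ax.vlines(cust_txns['date'], 0, cust_txns['send_amount'])
    ax.axvline(x=cust_info['predicted_next_trade'], color='red', linestyle='--')
plt.tight_layout()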
Out[15]:
[Figure: ten timeline panels, each with a "next" marker: CUST_0183 (Monthly, CV=0.02), CUST_0046 (Monthly, CV=0.02), CUST_0197 (Monthly, CV=0.03), CUST_0267 (Monthly, CV=0.03), CUST_0058 (Monthly, CV=0.03), CUST_0110 (Monthly, CV=0.03), CUST_0227 (Monthly, CV=0.03), CUST_0290 (Weekly, CV=0.12), CUST_0238 (Weekly, CV=0.12), CUST_0234 (Weekly, CV=0.14)]
Fig. 2 · Top 10 most predictable customers. Bars = trades. Red dashed = predicted next trade.
Step 06 · Are they worth it?

Predictable customers earn 39.3% more revenue per head than the average customer with 3+ trades.

Being predictable is only useful if they also spend. Joining the revenue data onto the classification:

Total annual revenue from predictable customers: $33,249, or 12.5% of the book, roughly 1.4× their 9% share of headcount. Retaining this slice is worth more than its size suggests.
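The headline figures follow directly from the revenue comparison table in this step, and the baseline matters: the +39.3% is measured against all customers with 3+ trades (predictable ones included), so the gap versus irregular customers alone is wider. A quick check using the reported, already-rounded averages and totals:

```python
# Reported figures from the revenue comparison (rounded to whole dollars)
avg_rev = {'predictable': 2339, 'irregular': 1608, 'all_3plus': 1679}
annual_pred, annual_total = 33_249, 265_700      # $265.7K is itself rounded
n_pred, n_customers = 27, 300

def pct_lift(value, baseline):
    """Percentage by which value exceeds baseline."""
    return (value - baseline) / baseline * 100

lift_vs_all = pct_lift(avg_rev['predictable'], avg_rev['all_3plus'])        # ~39.3%
lift_vs_irregular = pct_lift(avg_rev['predictable'], avg_rev['irregular'])  # ~45.5%
revenue_share = annual_pred / annual_total                                  # ~12.5%
headcount_share = n_pred / n_customers                                      # 9.0%
print(f"lift vs all 3+: {lift_vs_all:.1f}%  vs irregular: {lift_vs_irregular:.1f}%")
print(f"revenue share {revenue_share:.1%} / headcount share {headcount_share:.1%}"
      f" = {revenue_share / headcount_share:.2f}x")
print(f"3-year revenue at risk: ${3 * annual_pred:,}")
```

Revenue share over headcount share comes out near 1.4×, and a simple 3× of the annual figure gives $99,747, which matches the $99.7K at-risk number (no discounting or growth assumed).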

predictive_transactions.ipynb — § 4. Revenue comparison
In [17]:
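# rows: [metric, predictable, irregular, all 3+] values built earlier in the cell (not shown)
comp_df = pd.DataFrame(rows, columns=['Metric', 'Predictable', 'Irregular', 'All (3+ trades)'])

# Lift is measured against ALL customers with 3+ trades, predictable ones included
baseline = all_stats_3plus['total_revenue'].mean()
lift = (pred_stats['total_revenue'].mean() - baseline) / baseline * 100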
Out[17]:
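| Metric | Predictable | Irregular | All (3+ trades) |
|---|---|---|---|
| Count | 27 | 251 | 278 |
| Avg Total Revenue | $2,339 | $1,608 | $1,679 |
| Median Total Revenue | $890 | $653 | $722 |
| Avg Annual Revenue | $1,231 | $871 | $906 |
| Avg Trades | 52.4 | 34.1 | 35.9 |
| Avg Send Amount | $2,620 | $3,841 | $3,722 |
| Avg Tenure (days) | 649 | 639 | 640 |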
📈 Predictable customer avg revenue is +39.3% vs all customers with 3+ trades
[Figure: three bar pairs, Predictable vs. All (3+): avg total revenue $2,339 vs. $1,679; avg annual revenue $1,231 vs. $906; avg trades per customer 52.4 vs. 35.9]
Fig. 3 · 27 predictable customers beat the field on revenue, annual run-rate, and trade volume.
Step 07 · Who they are

Cohort profile — over-indexed corridors point to acquisition.

Where do these predictable customers actually trade? Breaking their transactions down by corridor and comparing to the general population:

These are the hallmarks of expat living expenses and recurring remittance, not one-off trades. A Swiss earner covering US costs (CHF → USD). A retiree sending from Europe to Switzerland (EUR → CHF). That's the acquisition profile to target.

Use the cohort to guide marketing — not just retention
predictive_transactions.ipynb — § 5. Cohort analysis
In [19]:
# over_index = pct_pred / pct_all × 100: 100 = parity, 200 = twice the overall share
corr_cmp['over_index'] = (corr_cmp['pct_pred'] / corr_cmp['pct_all'] * 100).round(0)
print(corr_cmp.sort_values('over_index', ascending=False).head(10))
Out[19]:
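| corridor | % pred | % all | over-index | # customers |
|---|---|---|---|---|
| AUD → GBP | 2.0% | 0.8% | 250 | 1 |
| CHF → USD | 2.4% | 1.1% | 218 | 1 |
| EUR → CHF | 11.4% | 5.4% | 211 | 4 |
| GBP → JPY | 4.0% | 2.4% | 167 | 2 |
| USD → NZD | 8.1% | 5.0% | 162 | 4 |
| EUR → JPY | 4.0% | 2.5% | 160 | 2 |
| USD → EUR | 22.6% | 16.8% | 135 | 8 |
| USD → CHF | 2.8% | 2.2% | 127 | 2 |
| USD → JPY | 8.8% | 7.1% | 124 | 4 |
| EUR → USD | 8.7% | 8.4% | 104 | 4 |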
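One way to build `corr_cmp`, sketched under assumptions since the notebook's construction isn't shown: `pct_pred` is taken as a corridor's share of predictable customers' transactions, `pct_all` as its share of all transactions, and the `is_predictable` flag column, the `corridor_over_index` helper and the toy rows are hypothetical.

```python
import pandas as pd

def corridor_over_index(df: pd.DataFrame) -> pd.DataFrame:
    """Over-index = corridor share among predictable customers / share overall × 100."""
    df = df.assign(corridor=df['sell'] + ' → ' + df['buy'])
    pct_all = df['corridor'].value_counts(normalize=True) * 100
    pred = df[df['is_predictable']]                  # hypothetical boolean flag column
    pct_pred = pred['corridor'].value_counts(normalize=True) * 100
    n_cust = pred.groupby('corridor')['customer_id'].nunique()
    out = pd.DataFrame({'pct_pred': pct_pred, 'pct_all': pct_all,
                        'n_customers': n_cust}).dropna(subset=['pct_pred'])
    out['over_index'] = (out['pct_pred'] / out['pct_all'] * 100).round(0)
    return out.sort_values('over_index', ascending=False)

toy = pd.DataFrame({
    'customer_id':    ['P1', 'P1', 'P2', 'R1', 'R1', 'R2', 'R2', 'R3'],
    'sell':           ['EUR', 'EUR', 'USD', 'USD', 'USD', 'USD', 'GBP', 'EUR'],
    'buy':            ['CHF', 'CHF', 'EUR', 'EUR', 'EUR', 'EUR', 'USD', 'USD'],
    'is_predictable': [True, True, True, False, False, False, False, False],
})
print(corridor_over_index(toy))
```

In the toy data EUR → CHF is two of three predictable trades but a quarter of all trades, so it over-indexes at 267. The ratio moves a lot on small cohorts (AUD → GBP in the real table rests on one customer), which is why the customer count is worth reading alongside it.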
Step 08 · The payoff

The output: a dated email trigger list.

This is what the marketing tool actually ingests. One row per predictable customer. For each: their predicted next trade date, their trigger date (2 days earlier), their cadence, their primary corridor, and their expected annual revenue. Sorted by trigger date so the campaign manager can see the week ahead.
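A sketch of how such a list can be assembled from the transaction table, assuming the column names used earlier (`customer_id`, `date`, `sell`, `buy`) plus a per-customer `predictability_tier` lookup. Rounding the mean gap to whole days and treating the most frequent corridor as the primary one are choices made here, not confirmed details of the notebook; `build_trigger_list` and the toy data are hypothetical.

```python
import pandas as pd

PREDICTABLE_TIERS = ['Highly Predictable', 'Moderately Predictable']

def build_trigger_list(df, tiers, lead_days=2):
    """Hypothetical helper: one row per predictable customer with next-trade and trigger dates."""
    df = df.sort_values(['customer_id', 'date'])
    df = df.assign(gap=df.groupby('customer_id')['date'].diff().dt.days)
    per_cust = df.groupby('customer_id').agg(
        last_trade=('date', 'max'),
        mean_gap=('gap', 'mean'),          # the NaN first gap is skipped by mean
    )
    # Primary corridor = the customer's most frequent sell→buy pair
    corridor = (df['sell'] + '→' + df['buy']).groupby(df['customer_id']).agg(
        lambda s: s.mode().iloc[0])
    out = per_cust.join(tiers).assign(primary_corridor=corridor)
    out = out[out['predictability_tier'].isin(PREDICTABLE_TIERS)].copy()
    # Next trade = last trade + average gap (whole days); trigger = lead_days earlier
    out['predicted_next_trade'] = out['last_trade'] + pd.to_timedelta(out['mean_gap'].round(), unit='D')
    out['email_trigger_date'] = out['predicted_next_trade'] - pd.Timedelta(days=lead_days)
    return out.sort_values('email_trigger_date').reset_index()

toy = pd.DataFrame({
    'customer_id': ['A'] * 4 + ['B'] * 3,
    'date': pd.to_datetime(['2025-11-05', '2025-11-12', '2025-11-19', '2025-11-26',
                            '2025-09-05', '2025-10-20', '2025-12-01']),
    'sell': ['USD'] * 4 + ['EUR'] * 3,
    'buy': ['EUR'] * 4 + ['CHF'] * 3,
})
tiers = pd.Series({'A': 'Highly Predictable', 'B': 'Irregular'}, name='predictability_tier')
print(build_trigger_list(toy, tiers)[['customer_id', 'predicted_next_trade', 'email_trigger_date']])
```

Customer A trades every 7 days, last on 2025-11-26, so the sketch predicts 2025-12-03 and triggers on 2025-12-01; customer B is Irregular and is filtered out.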

Total annual revenue tied to this list: $33,249. Projected 3-year LTV for the moderate tier alone (the weekly traders) is $7,182 per customer, 3.7× the highly-predictable tier, because they trade so often. Counter-intuitive but real.

Trigger 2 days early — chosen via production A/B vs same-day
predictive_transactions.ipynb — § 7. Email trigger list
In [23]:
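from datetime import timedelta

# Email fires 2 days before the predicted trade; soonest triggers first
email_list['email_trigger_date'] = email_list['predicted_next_trade'] - timedelta(days=2)
email_list = email_list.sort_values('email_trigger_date')
email_list.to_csv('email_triggers.csv', index=False)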
Out[23]:
| customer_id | tier | cadence | CV | predicted next trade | trigger date | corridor | annual revenue |
|---|---|---|---|---|---|---|---|
| CUST_0171 | High | Biweekly | 0.07 | 2025-04-11 | 2025-04-09 | AUD→USD | $499 |
| CUST_0290 | Moderate | Weekly | 0.12 | 2026-01-01 | 2025-12-30 | USD→EUR | $4,088 |
| CUST_0238 | Moderate | Weekly | 0.12 | 2026-01-02 | 2025-12-31 | EUR→CHF | $8,388 |
| CUST_0234 | Moderate | Weekly | 0.14 | 2026-01-02 | 2025-12-31 | CHF→USD | $834 |
| CUST_0149 | Moderate | Weekly | 0.12 | 2026-01-02 | 2025-12-31 | USD→GBP | $399 |
| CUST_0166 | Moderate | Weekly | 0.13 | 2026-01-02 | 2025-12-31 | GBP→USD | $873 |
| CUST_0183 | High | Monthly | 0.02 | 2026-01-04 | 2026-01-02 | USD→EUR | $188 |
| CUST_0058 | High | Monthly | 0.03 | 2026-01-23 | 2026-01-21 | GBP→USD | $1,067 |
| CUST_0046 | High | Monthly | 0.02 | 2026-01-28 | 2026-01-26 | EUR→USD | $97 |

… 18 more rows …
✅ email_triggers.csv written — 27 rows, ready for Salesforce Marketing Cloud
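One row deserves a second look before upload: CUST_0171's predicted next trade (2025-04-11) falls months before the data's end date (2025-12-31). Since the prediction is last trade + average gap, this customer missed their predicted date and has not traded since, so an email timed to that date can no longer fire on schedule. A small guard separates such rows for review; the `split_overdue` helper and using the data's last date as `as_of` are assumptions for illustration (in practice it would be the day the list is generated).

```python
import pandas as pd

def split_overdue(email_list: pd.DataFrame, as_of: str):
    """Split trigger rows into (on_schedule, overdue); overdue = predicted trade already passed."""
    overdue = email_list['predicted_next_trade'] < pd.Timestamp(as_of)
    return email_list[~overdue], email_list[overdue]

# Three rows copied from the trigger list output
email_list = pd.DataFrame({
    'customer_id': ['CUST_0171', 'CUST_0290', 'CUST_0183'],
    'predicted_next_trade': pd.to_datetime(['2025-04-11', '2026-01-01', '2026-01-04']),
    'email_trigger_date': pd.to_datetime(['2025-04-09', '2025-12-30', '2026-01-02']),
})
on_schedule, overdue = split_overdue(email_list, as_of='2025-12-31')
print(overdue['customer_id'].tolist())
```

The check keys on the predicted trade date rather than the trigger date, so CUST_0290 (trigger 2025-12-30, predicted trade 2026-01-01) stays on schedule while CUST_0171 is set aside.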
Takeaways

What this buys the business.

9%
of customers, targeted
27 of 300
12.5%
of revenue protected
$33.2K annual
+39%
per-customer revenue lift
vs. all customers with 3+ trades
$99.7K
3-year LTV at risk
If churn unchecked

The shape of the win: a small, high-value slice of the book that you can now meet at exactly the moment they're about to trade. No extra product, no new channel — just better timing on the email you were already sending.