Case 02 / 2024

FX Predict — finding the customers who trade on a schedule.

Thesis: a small slice of FX customers trade on a rhythm you can predict — weekly, biweekly, monthly. Find them, score them, and email two days before their next trade instead of blasting everyone on Tuesday.

Built for a money-transfer company. Synthetic dataset (10,000 trades, 300 customers, 24 months) stands in for the proprietary real one; the method and code are production-faithful.

Python · Pandas · Coefficient of Variation · Cohort analysis · LTV

Role: Analytics · Modeling

Year: 2024

Data: 10,000 trades · 300 customers

Stack: Python · Pandas · SciPy

predictive_transactions.ipynb — jupyter


Predictive Transaction Model

Identifying predictable customers for proactive email engagement

Objective: score each customer's trading cadence, classify into tiers, and trigger an email 2 days before their predicted next trade.

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

CV_THRESHOLD_HIGH = 0.10  # Highly Predictable
CV_THRESHOLD_MOD = 0.20   # Moderately Predictable
MIN_TRADES_FOR_PREDICTION = 3

Out[1]:

✅ Configuration loaded
Transactions: 10,000
Date range: 2024-01-01 → 2025-12-31
Predictable customer target: 8%

Results first

What the model found.

Of 300 customers in the dataset, 27 turned out to be predictable enough to act on — and they punch well above their weight on revenue. Here's the punchline before the walkthrough:

27 of 300

Predictable customers

9.0% of the base

+39.3%

Revenue lift per customer

vs. all customers with 3+ trades

12.5%

Share of annual revenue

$33.2K of $265.7K

$99.7K

3-year revenue at risk

If these customers churn

The output isn't a segment — it's a dated send list. Each predictable customer gets a predicted next-trade date (last trade + their average gap) and an email trigger date two days earlier. Drop the CSV into Salesforce Marketing Cloud, and the campaign runs itself.

The idea

Four ideas, stacked in a row.

Most email marketing sends on the company's calendar. Some customers, though, live on their own rhythm — payday remittance, rent abroad, a monthly treasury sweep. If we can spot that rhythm in the data, we can meet them at their moment instead of ours.

1 · The gap

Measure the days between each customer's trades.

A customer who trades on Jan 5, Jan 12, Jan 19 has two gaps — both 7 days. That list of gaps per person is the raw signal everything else is built on.

2 · The score

Check whether their gaps stay consistent.

Consistent gaps = predictable customer. We use one number — Coefficient of Variation — that goes down as the rhythm tightens. Clock-like scores low; sporadic scores high.

3 · The tier

Sort customers into three buckets.

Highly predictable (CV ≤ 0.10), moderately predictable (0.10 < CV ≤ 0.20), irregular (everyone else). The first two tiers are worth triggering emails for.

4 · The schedule

Email two days before their predicted next trade.

For each predictable customer: next trade = last trade + average gap. Trigger = next trade − 2 days. One row per customer, one CSV, ready for the marketing tool.
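The four ideas can be sketched end to end on a toy table. This is a minimal illustration, not the notebook's code: the two customers and their dates are made up, and rounding the average gap to whole days before adding it is an assumption.

```python
import pandas as pd

# Toy trades: one clock-like customer (A) and one sporadic customer (B). Made-up data.
trades = pd.DataFrame({
    "customer_id": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-12", "2024-01-19", "2024-01-26",
        "2024-01-03", "2024-01-04", "2024-02-20", "2024-03-01",
    ]),
}).sort_values(["customer_id", "date"])

# 1. The gap: days between consecutive trades, per customer.
trades["gap"] = trades.groupby("customer_id")["date"].diff().dt.days

# 2. The score: CV = standard deviation of gaps / mean of gaps.
summary = trades.groupby("customer_id").agg(
    last_trade=("date", "max"),
    mean_gap=("gap", "mean"),
    std_gap=("gap", "std"),
)
summary["cv"] = summary["std_gap"] / summary["mean_gap"]

# 3. The tier: the same cut-offs as the text (0.10 and 0.20).
def tier(cv):
    if cv <= 0.10:
        return "Highly Predictable"
    if cv <= 0.20:
        return "Moderately Predictable"
    return "Irregular"

summary["tier"] = summary["cv"].apply(tier)

# 4. The schedule: next trade = last trade + average gap (rounded to whole
#    days here, which is an assumption); the email goes out 2 days earlier.
summary["predicted_next_trade"] = summary["last_trade"] + pd.to_timedelta(
    summary["mean_gap"].round(), unit="D"
)
summary["email_trigger_date"] = summary["predicted_next_trade"] - pd.Timedelta(days=2)

print(summary[["cv", "tier", "predicted_next_trade", "email_trigger_date"]])
```

Customer A trades every 7 days, so the CV is 0 and the next trade lands on 2024-02-02 with a trigger on 2024-01-31. Customer B's gaps of 1, 47, and 10 days push the CV well above 0.20.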

Step 01 · Build the dataset

Generate 10,000 realistic FX trades across 300 customers.

Real customer data is proprietary, so I generated a synthetic replica that mirrors it — 300 customers, 10,000 trades, 24-month window, weighted across 20 popular currency corridors (USD→EUR, GBP→USD, etc.) using approximate mid-market rates.

8% of customers are seeded as predictable (weekly, biweekly, or monthly cadence with ±1 day jitter). The other 92% trade at random intervals. The model doesn't know which is which — it has to find the predictable ones from the gap data alone.

Seed with known ground truth — so we can check the model's work

predictive_transactions.ipynb — § 1. Data generation

In [5]:

from datetime import timedelta

# 300 customers, 8% seeded with a fixed cadence.
# PREDICTABLE_PCT, predictable_ids, DATE_END and the starting current_date
# for each customer are defined outside this excerpt.
N_CUSTOMERS = 300
n_predictable = int(N_CUSTOMERS * PREDICTABLE_PCT)
cadences = {'weekly': 7, 'biweekly': 14, 'monthly': 30}

for cid in predictable_ids:
    cadence_days = cadences[np.random.choice(list(cadences.keys()))]
    while current_date <= DATE_END:
        # Jitter shifts the trade date only; the underlying schedule does not drift
        jitter = int(np.random.choice([-1, 0, 0, 0, 1]))
        trade_date = current_date + timedelta(days=jitter)
        # … append trade with corridor, send_amount, revenue …
        current_date += timedelta(days=cadence_days)

Out[5]:

✅ Total transactions: 10,000
Unique customers: 300
Date range: 2024-01-01 → 2025-12-31
Avg txns per customer: 33.3
Predictable seeded: 24 (8.0%)

transaction_id	customer_id	date	send_amount	sell	buy	rate	revenue
TXN_003001	CUST_0001	2024-11-12	2,475.97	USD	JPY	148.77	37.14
TXN_009478	CUST_0002	2024-01-01	1,626.55	GBP	JPY	189.20	24.40
TXN_009479	CUST_0002	2024-02-10	3,412.80	EUR	CHF	0.96	51.19
TXN_009480	CUST_0002	2024-03-01	6,822.90	USD	AUD	1.53	102.34
TXN_009481	CUST_0002	2024-03-15	2,995.68	USD	AUD	1.53	44.94
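It helps to know roughly what CV the seeded customers should score before looking at results. The sketch below works that out analytically. It assumes, as the generation excerpt suggests, that each trade date gets its own independent jitter while the schedule itself advances by the exact cadence; these are long-run values, and any one customer's sample CV will wobble around them.

```python
import numpy as np

# Jitter distribution implied by np.random.choice([-1, 0, 0, 0, 1]).
values = np.array([-1, 0, 1])
probs = np.array([0.2, 0.6, 0.2])

jitter_mean = np.sum(probs * values)
jitter_var = np.sum(probs * values**2) - jitter_mean**2

# A gap is cadence + (this trade's jitter) - (previous trade's jitter),
# so two independent jitters contribute to its variance.
gap_std = np.sqrt(2 * jitter_var)

cadences = {"weekly": 7, "biweekly": 14, "monthly": 30}
expected_cv = {name: gap_std / days for name, days in cadences.items()}

for name, cv in expected_cv.items():
    print(f"{name:9s} expected CV ~ {cv:.3f}")
```

Under these assumptions the weekly cadence sits near 0.13, biweekly near 0.06, and monthly near 0.03. The same ±1 day of jitter is a larger fraction of a 7-day gap than of a 30-day gap, so weekly customers should be expected above the 0.10 line even though they were seeded as predictable.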

Step 02 · Days between trades

For each customer, compute the gap between trades.

Sort every customer's trades by date, then use .diff() to get the days between each trade and the one before it. That list of gaps per person is the raw signal the rest of the project is built on.

Then per-customer aggregates: mean gap, standard deviation, min, max, total revenue, tenure. One row per customer, ready for scoring.

predictive_transactions.ipynb — § 2. Inter-trade intervals

In [9]:

# Days between consecutive trades per customer
df = df.sort_values(['customer_id', 'date'])
df['days_since_last_trade'] = df.groupby('customer_id')['date'].diff().dt.days

interval_stats = (
    df.dropna(subset=['days_since_last_trade'])
      .groupby('customer_id')['days_since_last_trade']
      .agg(
          mean_interval='mean',
          std_interval='std',
          median_interval='median',
          min_interval='min',
          max_interval='max',
      )
      .reset_index()
)

Out[9]:

✅ Interval statistics calculated for 285 customers with 2+ trades
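The interval statistics cover only customers with at least one gap, while revenue and tenure exist for everyone. One way to get a single row per customer is sketched below. The toy data and the names `totals`, `tenure_days`, `first_trade`, and `last_trade` are illustrative assumptions, not the notebook's own.

```python
import pandas as pd

# Toy trades: CUST_A has three trades, CUST_B has only one (made-up data).
df = pd.DataFrame({
    "customer_id": ["CUST_A", "CUST_A", "CUST_A", "CUST_B"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-31", "2024-03-01", "2024-02-10"]),
    "revenue": [37.14, 24.40, 51.19, 102.34],
})

# Totals that exist for every customer, regardless of trade count.
totals = df.groupby("customer_id").agg(
    total_trades=("date", "count"),
    total_revenue=("revenue", "sum"),
    first_trade=("date", "min"),
    last_trade=("date", "max"),
).reset_index()
totals["tenure_days"] = (totals["last_trade"] - totals["first_trade"]).dt.days

# Gap statistics, which need at least two trades.
df = df.sort_values(["customer_id", "date"])
df["days_since_last_trade"] = df.groupby("customer_id")["date"].diff().dt.days
interval_stats = (
    df.dropna(subset=["days_since_last_trade"])
      .groupby("customer_id")["days_since_last_trade"]
      .agg(mean_interval="mean", std_interval="std")
      .reset_index()
)

# A left join keeps single-trade customers; their interval columns stay NaN.
customer_stats = totals.merge(interval_stats, on="customer_id", how="left")
print(customer_stats)
```

The left join is what keeps single-trade customers visible in later steps, so that a classifier can label them as lacking data rather than silently dropping them.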

Step 03 · Score the rhythm

One number per customer: how consistent is their rhythm?

Quick explainer — take a customer's gaps (say, 30, 29, 31, 30 days). Divide how much they wobble (standard deviation) by their average. Small wobble = low score = predictable. Statisticians call this ratio the Coefficient of Variation, or CV.

Compute CV for every customer with at least 3 trades — fewer than that and you can't tell a rhythm from a coincidence. The eligible pool: 278 customers. The CV distribution runs from a tight 0.01 (perfect clockwork) up to 1.55 (chaotic).
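As a worked example, here is the CV for the four gaps quoted above. The one subtlety is which standard deviation to use: pandas defaults to the sample version (ddof=1), NumPy to the population version (ddof=0). The sketch prints both so the difference is visible.

```python
import numpy as np

gaps = np.array([30, 29, 31, 30])  # the example gaps from the text, in days

mean_gap = gaps.mean()
std_sample = gaps.std(ddof=1)      # sample std, the pandas default
std_population = gaps.std(ddof=0)  # population std, the NumPy default

cv_sample = std_sample / mean_gap
cv_population = std_population / mean_gap

print(f"mean gap: {mean_gap:.1f} days")
print(f"CV (ddof=1): {cv_sample:.4f}")
print(f"CV (ddof=0): {cv_population:.4f}")
```

Both versions land below 0.03, far under the 0.10 threshold, so the choice does not change this customer's tier. It matters more for customers with only a handful of gaps, where the two estimates diverge.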

Minimum 3 trades. Two points make a line, not a pattern.

predictive_transactions.ipynb — § 2. Predictability scoring

In [9]:

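# Coefficient of Variation: lower = more predictable
interval_stats['cv'] = interval_stats['std_interval'] / interval_stats['mean_interval']
# A customer with a single gap has an undefined std; fillna(0) turns that into CV = 0,
# so the minimum-trades filter below is what keeps them out of the predictable tiers.
interval_stats['cv'] = interval_stats['cv'].fillna(0)

# customer_stats is the per-customer table carrying cv and total_trades
# (its construction is not shown in this excerpt).
eligible = customer_stats[customer_stats['total_trades'] >= MIN_TRADES_FOR_PREDICTION]
print(eligible['cv'].describe())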

Out[9]:

CV Distribution (278 eligible customers)
count   278.00
mean      0.86
std       0.33
min       0.01
25%       0.78
50%       0.92  (median)
75%       1.04
max       1.55

Fig. 1 · CV distribution. Main mass is chaotic; a tight cluster sits under the thresholds.

Step 04 · Classify into tiers

Three buckets. Two of them are worth emailing.

Apply the thresholds: CV ≤ 0.10 = Highly Predictable, 0.10–0.20 = Moderately Predictable, everything above = Irregular. That lands:

18 Highly Predictable (6.0%) — near-perfect cadence
9 Moderately Predictable (3.0%) — reliable but looser
251 Irregular (83.7%) — too noisy to trigger on
22 Insufficient Data (7.3%) — fewer than 3 trades

Total actionable: 27 customers, 9.0% of the base. And the seeded ground truth was 24 — the model recovered all of them and picked up 3 more irregulars who happened to fall into a tight rhythm by chance.
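Because the seeded customers are known, the recovery claim can be expressed as precision and recall. The sketch below uses placeholder IDs that only reproduce the counts reported above (24 seeded, 27 flagged, all 24 recovered); the real comparison would use the notebook's actual ID lists.

```python
def precision_recall(seeded_ids, flagged_ids):
    """Compare the customers flagged as predictable against the seeded ground truth."""
    seeded, flagged = set(seeded_ids), set(flagged_ids)
    true_positives = len(seeded & flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(seeded) if seeded else 0.0
    return precision, recall

# Placeholder IDs: 24 seeded customers, all flagged, plus 3 extra flagged by chance.
seeded = [f"SEEDED_{i:02d}" for i in range(24)]
flagged = seeded + ["EXTRA_1", "EXTRA_2", "EXTRA_3"]

precision, recall = precision_recall(seeded, flagged)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

With those counts, recall is 1.0 and precision is 24/27, roughly 0.89. On real data there is no ground truth, so numbers like these are only available on the synthetic set.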

Two-tier split — high tier gets priority, moderate gets volume

predictive_transactions.ipynb — § 2. Tier classification

In [11]:

def classify_tier(row):
    if pd.isna(row['cv']) or row['total_trades'] < MIN_TRADES_FOR_PREDICTION:
        return 'Insufficient Data'
    elif row['cv'] <= CV_THRESHOLD_HIGH:
        return 'Highly Predictable'
    elif row['cv'] <= CV_THRESHOLD_MOD:
        return 'Moderately Predictable'
    return 'Irregular'

customer_stats['predictability_tier'] = customer_stats.apply(classify_tier, axis=1)

Out[11]:

📊 Predictability Tier Distribution:
Irregular                251 (83.7%)
Insufficient Data         22 ( 7.3%)
Highly Predictable        18 ( 6.0%)
Moderately Predictable     9 ( 3.0%)
Total predictable: 27 (9.0%)

📊 Cadence Breakdown (predictable only):
Monthly (~30d)   10
Weekly (~7d)      9
Biweekly (~14d)   6
Other             2
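For larger customer tables, the same tiering can be done without a row-wise apply. The sketch below is an alternative using `np.select`, not the notebook's method; the five-row table is made up to exercise each branch.

```python
import numpy as np
import pandas as pd

CV_THRESHOLD_HIGH = 0.10
CV_THRESHOLD_MOD = 0.20
MIN_TRADES_FOR_PREDICTION = 3

# Made-up rows: D has CV 0.00 from a single gap, E has no CV at all.
customer_stats = pd.DataFrame({
    "customer_id": ["A", "B", "C", "D", "E"],
    "total_trades": [24, 50, 12, 2, 5],
    "cv": [0.02, 0.12, 0.95, 0.00, np.nan],
})

insufficient = customer_stats["cv"].isna() | (
    customer_stats["total_trades"] < MIN_TRADES_FOR_PREDICTION
)

# np.select takes the first matching condition, so the order mirrors the if/elif chain.
conditions = [
    insufficient,
    customer_stats["cv"] <= CV_THRESHOLD_HIGH,
    customer_stats["cv"] <= CV_THRESHOLD_MOD,
]
labels = ["Insufficient Data", "Highly Predictable", "Moderately Predictable"]
customer_stats["predictability_tier"] = np.select(conditions, labels, default="Irregular")

print(customer_stats)
```

Customer D shows why the data-sufficiency check has to come first: a CV of 0.00 from only two trades would otherwise be labeled Highly Predictable.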

Step 05 · Eyeball the top 10

Do the score's winners actually look predictable?

A low CV should mean "their trades line up on a grid when you plot them." Bars are trades (height = send amount), red dashed line is the predicted next trade date. The pattern is visually clear without needing any stats background — useful when presenting to marketing and ops.

The #1 customer (CUST_0183, monthly cadence) has a CV of 0.02: 24 trades, mean gap 30.00 days. The next trade is predicted for Jan 4, 2026. Trigger the email on Jan 2.

predictive_transactions.ipynb — § 3. Top 10 timelines

In [15]:

# ax (the subplot for each customer) and cust_info (that customer's summary row)
# are set up outside this excerpt.
top10 = pivot.head(10)['customer_id'].tolist()
for cid in top10:
    cust_txns = df[df['customer_id'] == cid]
    ax.vlines(cust_txns['date'], 0, cust_txns['send_amount'])
    ax.axvline(x=cust_info['predicted_next_trade'], color='red', linestyle='--')

Out[15]:

Fig. 2 · Top 10 most predictable customers. Bars = trades. Red dashed = predicted next trade.

Step 06 · Are they worth it?

Predictable customers generate 39.3% more revenue per head.

Being predictable is only useful if they also spend. Joining the revenue data onto the classification:

Average total revenue: $2,339 (predictable) vs $1,679 (all 3+ trade customers) — +39.3%
Average annual revenue: $1,231 vs $906
Average trades per customer: 52 vs 36
Interesting wrinkle: average send amount is actually lower ($2,620 vs $3,722). They send smaller amounts, but much more often. Volume beats size.

Total annual revenue from predictable customers: $33,249, or 12.5% of the book. That is about 1.4× their 9.0% headcount share, so retention on this slice is worth more than its size suggests.

predictive_transactions.ipynb — § 4. Revenue comparison

In [17]:

comp_df = pd.DataFrame(rows, columns=['Metric', 'Predictable', 'Irregular', 'All (3+ trades)'])

# Lift is measured against all customers with 3+ trades, predictable ones included
baseline = all_stats_3plus['total_revenue'].mean()
lift = (pred_stats['total_revenue'].mean() - baseline) / baseline * 100

Out[17]:

Metric	Predictable	Irregular	All (3+ trades)
Count	27	251	278
Avg Total Revenue	$2,339	$1,608	$1,679
Median Total Revenue	$890	$653	$722
Avg Annual Revenue	$1,231	$871	$906
Avg Trades	52.4	34.1	35.9
Avg Send Amount	$2,620	$3,841	$3,722
Avg Tenure (days)	649	639	640

📈 Predictable customer avg revenue is +39.3% vs all customers with 3+ trades

Fig. 3 · 27 predictable customers beat the field on revenue, annual run-rate, and trade volume.
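The headline figures can be re-derived from the comparison table. The sketch below hard-codes the rounded values printed above, so results agree with the notebook only to the displayed precision. It also shows the lift against the Irregular group alone, which is a different baseline from the one the notebook reports.

```python
# Rounded averages from the comparison table (total revenue per customer, USD).
avg_rev_predictable = 2339
avg_rev_irregular = 1608
avg_rev_all_3plus = 1679

def pct_lift(value, baseline):
    """Percentage by which value exceeds baseline."""
    return (value - baseline) / baseline * 100

lift_vs_all = pct_lift(avg_rev_predictable, avg_rev_all_3plus)
lift_vs_irregular = pct_lift(avg_rev_predictable, avg_rev_irregular)

# Revenue share vs headcount share of the 27 predictable customers.
revenue_share = 33_249 / 265_700 * 100   # annual revenue, predictable / total
headcount_share = 27 / 300 * 100
share_ratio = revenue_share / headcount_share

print(f"lift vs all 3+ trade customers: {lift_vs_all:.1f}%")
print(f"lift vs irregular only:         {lift_vs_irregular:.1f}%")
print(f"revenue share {revenue_share:.1f}% on headcount share {headcount_share:.1f}% "
      f"(ratio {share_ratio:.2f}x)")
```

The reported +39.3% uses the All (3+ trades) column as its baseline, and that column includes the predictable customers themselves. Against the Irregular column alone the gap is wider, about 45.5%.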

Step 07 · Who they are

Cohort profile — over-indexed corridors point to acquisition.

Where do these predictable customers actually trade? Breaking their transactions down by corridor and comparing to the general population:

AUD→GBP over-indexes at 250: the corridor's share of predictable customers' trades is 2.5× its share across the whole base
CHF→USD over-indexes 218%
EUR→CHF over-indexes 211%
GBP→JPY and USD→NZD both over-index around 160–167%

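These are the hallmarks of expat living expenses and recurring remittance, not one-off trades. An Australian earner paying UK bills (AUD→GBP). A retiree sending from the eurozone to Switzerland (EUR→CHF). That's the acquisition profile to target, with one caveat: the top two corridors each rest on a single customer, so treat them as leads rather than findings.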

Use the cohort to guide marketing — not just retention

predictive_transactions.ipynb — § 5. Cohort analysis

In [19]:

corr_cmp['over_index'] = (corr_cmp['pct_pred'] / corr_cmp['pct_all'] * 100).round(0)
print(corr_cmp.sort_values('over_index', ascending=False).head(10))

Out[19]:

corridor	% pred	% all	over-index	# customers
AUD → GBP	2.0%	0.8%	250	1
CHF → USD	2.4%	1.1%	218	1
EUR → CHF	11.4%	5.4%	211	4
GBP → JPY	4.0%	2.4%	167	2
USD → NZD	8.1%	5.0%	162	4
EUR → JPY	4.0%	2.5%	160	2
USD → EUR	22.6%	16.8%	135	8
USD → CHF	2.8%	2.2%	127	2
USD → JPY	8.8%	7.1%	124	4
EUR → USD	8.7%	8.4%	104	4
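A table like the one above can be built from the raw trades as follows. This is a sketch under assumptions: it takes shares by transaction count, treats "all" as every transaction including those from predictable customers, and uses a made-up `corridor` column and toy customers. The function name `corridor_over_index` is hypothetical.

```python
import pandas as pd

def corridor_over_index(txns, predictable_ids):
    """Corridor share among predictable customers relative to the whole base (100 = parity)."""
    is_pred = txns["customer_id"].isin(predictable_ids)
    pct_all = txns["corridor"].value_counts(normalize=True) * 100
    pct_pred = txns.loc[is_pred, "corridor"].value_counts(normalize=True) * 100

    out = pd.DataFrame({"pct_pred": pct_pred, "pct_all": pct_all}).fillna(0)
    out["over_index"] = (out["pct_pred"] / out["pct_all"] * 100).round(0)

    # How many predictable customers stand behind each corridor's figure.
    n_customers = txns[is_pred].groupby("corridor")["customer_id"].nunique()
    out["n_customers"] = n_customers
    out["n_customers"] = out["n_customers"].fillna(0).astype(int)
    return out.sort_values("over_index", ascending=False)

# Toy data: P1 is predictable and only trades EUR→CHF.
txns = pd.DataFrame({
    "customer_id": ["P1"] * 4 + ["R1"] * 4 + ["R2"] * 2,
    "corridor": ["EUR→CHF"] * 4
                + ["USD→EUR", "USD→EUR", "USD→EUR", "EUR→CHF"]
                + ["USD→EUR", "USD→EUR"],
})

result = corridor_over_index(txns, predictable_ids=["P1"])
print(result)
```

Keeping the customer count next to the index makes thin corridors easy to spot: an index of 250 backed by one customer carries much less weight than an index of 135 backed by eight.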

Step 08 · The payoff

The output: a dated email trigger list.

This is what the marketing tool actually ingests. One row per predictable customer. For each: their predicted next trade date, their trigger date (2 days earlier), their cadence, their primary corridor, and their expected annual revenue. Sorted by trigger date so the campaign manager can see the week ahead.

Total annual revenue tied to this list: $33,249. Projected 3-year LTV for the moderate tier alone (the weekly traders) is $7,182 per customer, 3.7× the highly predictable tier, because they trade so often. Counter-intuitive but real.

Trigger 2 days early — chosen via production A/B vs same-day

predictive_transactions.ipynb — § 7. Email trigger list

In [23]:

# pd.Timedelta avoids needing a separate datetime import
email_list['email_trigger_date'] = email_list['predicted_next_trade'] - pd.Timedelta(days=2)
email_list = email_list.sort_values('email_trigger_date')
email_list.to_csv('email_triggers.csv', index=False)

Out[23]:

customer_id	tier	cadence	CV	predicted next	trigger date	corridor	annual rev
CUST_0171	High	Biweekly	0.07	2025-04-11	2025-04-09	AUD→USD	$499
CUST_0290	Moderate	Weekly	0.12	2026-01-01	2025-12-30	USD→EUR	$4,088
CUST_0238	Moderate	Weekly	0.12	2026-01-02	2025-12-31	EUR→CHF	$8,388
CUST_0234	Moderate	Weekly	0.14	2026-01-02	2025-12-31	CHF→USD	$834
CUST_0149	Moderate	Weekly	0.12	2026-01-02	2025-12-31	USD→GBP	$399
CUST_0166	Moderate	Weekly	0.13	2026-01-02	2025-12-31	GBP→USD	$873
CUST_0183	High	Monthly	0.02	2026-01-04	2026-01-02	USD→EUR	$188
CUST_0058	High	Monthly	0.03	2026-01-23	2026-01-21	GBP→USD	$1,067
CUST_0046	High	Monthly	0.02	2026-01-28	2026-01-26	EUR→USD	$97
… 18 more rows …

✅ email_triggers.csv written — 27 rows, ready for Salesforce Marketing Cloud
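The first row of the list (CUST_0171) has a predicted trade date months before the end of the data window, which suggests that customer stopped trading. A send list may want to separate such rows before upload. The sketch below shows one possible filter; the `AS_OF` run date and the `sendable`/`lapsed` split are assumptions, not part of the notebook.

```python
import pandas as pd

# Three rows copied from the trigger list above.
email_list = pd.DataFrame({
    "customer_id": ["CUST_0171", "CUST_0290", "CUST_0183"],
    "predicted_next_trade": pd.to_datetime(["2025-04-11", "2026-01-01", "2026-01-04"]),
})
email_list["email_trigger_date"] = (
    email_list["predicted_next_trade"] - pd.Timedelta(days=2)
)

# Hypothetical run date: the day the campaign file is generated.
AS_OF = pd.Timestamp("2025-12-29")

# A trigger date already in the past cannot be sent on schedule.
is_stale = email_list["email_trigger_date"] < AS_OF
sendable = email_list[~is_stale].sort_values("email_trigger_date")
lapsed = email_list[is_stale]

print("Send on schedule:", sendable["customer_id"].tolist())
print("Review as lapsed:", lapsed["customer_id"].tolist())
```

Lapsed rows are arguably a separate signal: a customer who kept a steady rhythm and then missed it may be a better fit for a win-back message than for a timing nudge.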

Takeaways

What this buys the business.

9.0%

of customers, targeted

27 of 300

12.5%

of revenue protected

$33.2K annual

+39%

per-customer revenue lift

vs. all customers with 3+ trades

$99.7K

3-year LTV at risk

If churn goes unchecked

The shape of the win: a small, high-value slice of the book that you can now meet at exactly the moment they're about to trade. No extra product, no new channel — just better timing on the email you were already sending.
