The CoT Backtest: 20 Years of Verified Positioning Signals
A rigorous 20-year backtest of CFTC Commitment of Traders signals across 10 markets. Hit rates, bootstrap confidence intervals, and the conditions that make the contrarian read actually work.
Before You Read Any Further
This page is educational research, not investment advice. The hit rates, Sharpes, edges, and forward-direction phrases below describe what the underlying did historically when CFTC positioning sat at similar conditions — they are not predictions about what it will do next time. Past performance does not guarantee or even reliably predict future results. Even with bootstrap confidence intervals, every backtest carries residual survivorship, look-ahead, and selection risk. Nothing on this page constitutes a recommendation to buy, sell, or hold any security. If you act on positioning data, do so at your own risk and consult a licensed financial advisor first.
Why We Ran This Backtest
The textbook story on CoT positioning is simple: when speculators are at a 3-year extreme, they're wrong. Crowded long means a top is in; crowded short means a bottom is in. It's been repeated in macro commentary for decades. We wanted to know whether it's actually true. The answer turns out to be 'sometimes': the contrarian read works cleanly in some markets, fails completely in others, and is supercharged by conditions almost no commentator mentions. This page is the full audit: methodology, results, and the four findings that genuinely change how to read a CoT report.

Methodology
We pull the full CFTC archive for 10 actively traded futures contracts from 2006 to 2026. For each weekly release we compute the speculator group's net position ranked 0–100 against its own trailing 3-year range, which is the standard COT Index. A signal fires when the index enters a tail (≤ 20 = extreme short, ≥ 80 = extreme long), but only on the first week of entry: sitting at the extreme for eight weeks counts as one signal, not eight, which removes the autocorrelation that inflates naive backtests. We then look up the underlying asset's price on the report date and at 1, 4, 12, 26, and 52 weeks afterwards, computing the forward return and whether the price moved in the textbook contrarian direction. Statistical significance is decided by a 1,000-sample bootstrap on the Sharpe ratio: a signal counts as 'real' only if the 95% confidence interval excludes zero. Everything is split by macro regime (risk-on vs risk-off) using the Alphameter's episode-aware classification, which carries the ongoing directional episode through transient neutral days. The conditioning is therefore effectively a two-bucket test, the form most useful for trading the contrarian setups.
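The index, the entry-only filter, and the signed forward return described above can be sketched in a few lines of Python. This is a minimal illustration rather than the production pipeline: the function names are hypothetical, the 3-year window is assumed to be 156 weekly reports, and the index is assumed to be a min-max scaling that includes the current week.

```python
from typing import List, Optional, Tuple

WINDOW_WEEKS = 156  # assumption: "trailing 3-year range" taken as 156 weekly reports


def cot_index(net: List[float], window: int = WINDOW_WEEKS) -> List[Optional[float]]:
    """Scale each week's net position to 0-100 within its trailing window."""
    out: List[Optional[float]] = []
    for i, value in enumerate(net):
        if i + 1 < window:
            out.append(None)  # not enough history to rank yet
            continue
        trailing = net[i + 1 - window : i + 1]  # includes the current week
        lo, hi = min(trailing), max(trailing)
        out.append(50.0 if hi == lo else 100.0 * (value - lo) / (hi - lo))
    return out


def entry_signals(
    index: List[Optional[float]], low: float = 20.0, high: float = 80.0
) -> List[Tuple[int, str]]:
    """Keep only the first week the index enters a tail (entry-only filter)."""
    signals: List[Tuple[int, str]] = []
    prev_zone: Optional[str] = None
    for week, value in enumerate(index):
        zone: Optional[str] = None
        if value is not None:
            if value <= low:
                zone = "extreme_short"
            elif value >= high:
                zone = "extreme_long"
        if zone is not None and zone != prev_zone:
            signals.append((week, zone))  # consecutive weeks in the tail are skipped
        prev_zone = zone
    return signals


def contrarian_return(entry_price: float, exit_price: float, zone: str) -> float:
    """Forward return signed so that positive means the textbook contrarian read worked."""
    raw = exit_price / entry_price - 1.0
    return raw if zone == "extreme_short" else -raw
```

A stay of eight weeks in the tail yields one tuple from `entry_signals`, not eight, which is the property the methodology relies on.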
| Step | What we control for |
|---|---|
| Entry-only filter | Eliminates auto-correlation from consecutive weeks at extreme |
| Bootstrap 95% CI on Sharpe | Distinguishes real signal from sampling luck |
| Regime conditioning | Tests whether signal works in risk-on vs risk-off macro |
| Multi-horizon scoring | 1W, 4W, 12W, 26W, 52W — real signals decay coherently |
| 10-market basket | Cross-market alignment tests for solo vs broad capitulation |
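The bootstrap step in the table can be illustrated with a short percentile bootstrap. This is a sketch under stated assumptions, not the exact procedure behind the published numbers: the Sharpe here is the plain mean divided by the sample standard deviation of per-signal returns (no annualisation), resampling is with replacement, and `bootstrap_sharpe_ci` is a hypothetical name.

```python
import random
import statistics
from typing import List, Tuple


def sharpe(returns: List[float]) -> float:
    """Mean over sample standard deviation; 0.0 if the sample has no variance."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def bootstrap_sharpe_ci(
    returns: List[float], n_samples: int = 1000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the Sharpe ratio."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate a Sharpe ratio")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(returns)
    # Resample the signal returns with replacement and recompute the Sharpe each time.
    stats = sorted(sharpe(rng.choices(returns, k=n)) for _ in range(n_samples))
    lo_idx = int((1.0 - level) / 2.0 * n_samples)
    hi_idx = int((1.0 + level) / 2.0 * n_samples) - 1
    return stats[lo_idx], stats[hi_idx]
```

With only a few dozen entries per signal, the interval is wide by construction, which is why the backtest requires it to exclude zero before a signal counts as real.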
Finding 1: The Textbook Read Fails on Half the Markets
Across 10 markets and 20 years, only a few of the textbook contrarian signals survive statistical scrutiny. Nasdaq 100 at cotIndex ≤ 20 (specs at extreme shorts) bounces 78% of the time over 12 weeks at Sharpe 0.51, which is a real signal. 10Y Treasury at extreme shorts has a higher Sharpe (0.68) at a similar 77% hit rate. But fading gold longs is brutal: at cotIndex ≥ 80, gold rallies an average of 4% over the next 12 weeks against the textbook view, with Sharpe -0.57. By 52 weeks the fade is down 20% on average. Bitcoin, copper, silver, EUR/USD, JPY, and WTI crude all classify as noise on level alone: their forward returns are no different from random when positioning is at an extreme. Most of macro Twitter is reading positioning wrong, half the time.
| Market | Direction | 12W Sharpe | 95% CI | Verdict |
|---|---|---|---|---|
| Nasdaq 100 | Buy shorts (≤20) | 0.51 | [0.22, 0.95] | SIGNAL |
| 10Y Treasury | Buy shorts (≤20) | 0.68 | [0.39, 1.14] | SIGNAL |
| Gold | Fade longs (≥80) | -0.57 | [-1.01, -0.25] | ANTI-SIGNAL — DON'T FADE |
| Nasdaq 100 | Fade longs (≥80) | -0.37 | [-0.80, +0.04] | ANTI (mild; CI includes zero) |
| Bitcoin | Either direction | ≈0 | wide | NOISE at level |
| EUR/USD, JPY, Silver, Copper, WTI crude | Either direction | ≈0 | wide | NOISE at level |
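The verdict column follows from the methodology's rule that a signal is real only when the 95% interval excludes zero. A minimal sketch of that rule, with a hypothetical function name and label strings:

```python
def classify_verdict(ci_low: float, ci_high: float) -> str:
    """Label a signal from its bootstrap confidence interval on the Sharpe ratio."""
    if ci_low > 0:
        return "SIGNAL"  # whole interval above zero: the contrarian read worked
    if ci_high < 0:
        return "ANTI-SIGNAL"  # whole interval below zero: the contrarian read lost
    return "NOISE"  # interval straddles zero: indistinguishable from luck


# Intervals taken from the table above.
print(classify_verdict(0.22, 0.95))    # Nasdaq 100 buy shorts
print(classify_verdict(-1.01, -0.25))  # Gold fade longs
```

Under this strict rule, an interval such as [-0.80, +0.04] straddles zero and lands in the noise bucket, even though its point estimate is negative.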
Finding 2: Open Interest Direction Supercharges the Signal
The single most overlooked variable in CoT analysis is what open interest is doing. When NDX cotIndex hits ≤ 20 with OI contracting over the prior 13 weeks (meaning specs are unwinding and no new positions are arriving), the 12-week bounce rate is 100% across all 26 historical entries, with Sharpe 1.44 (95% CI [1.16, 1.92]). When OI is expanding instead (specs piling into new shorts), the same threshold produces Sharpe 0.02, which is pure noise. The same logic strengthens the anti-signals: gold's already-bad fade gets worse with contracting OI (Sharpe -1.39, 10% hit rate), and natural gas extreme shorts with contracting OI hit 0%: the underlying never bounces and falls 15% on average. We call the favorable subset 'confirmed washout' on the dashboard. Extreme positioning plus OI capitulation is institutional-grade, not 'macro Twitter'-grade.
| Signal | All entries | OI contracting | OI expanding |
|---|---|---|---|
| NDX buy shorts | Sharpe 0.51, hit 78% | Sharpe 1.44, hit 100%, N=26 | Sharpe 0.02, noise |
| 10Y buy shorts | Sharpe 0.68, hit 77% | Sharpe 0.74, hit 80%, N=20 | Sharpe 0.46, weaker |
| Gold fade longs | Sharpe -0.57 | Sharpe -1.39, hit 10% | Sharpe -0.42 (still anti) |
| NatGas buy shorts | Sharpe -0.44 (anti) | Sharpe -1.63, hit 0% | Sharpe 0, noise |
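The split in the table can be reproduced by tagging each entry week with the direction of open interest over the prior 13 weeks. The sketch below assumes "contracting" means open interest is lower than it was 13 reports earlier (a simple point-to-point difference); the backtest may use a different measure, and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def oi_direction(open_interest: List[float], week: int, lookback: int = 13) -> Optional[str]:
    """Direction of open interest versus `lookback` weeks earlier."""
    if week < lookback:
        return None  # not enough history
    change = open_interest[week] - open_interest[week - lookback]
    if change < 0:
        return "contracting"
    if change > 0:
        return "expanding"
    return None  # flat: left out of both buckets


def split_by_oi(
    signal_weeks: List[int], open_interest: List[float], lookback: int = 13
) -> Dict[str, List[int]]:
    """Group entry weeks into OI-contracting and OI-expanding buckets."""
    buckets: Dict[str, List[int]] = {"contracting": [], "expanding": []}
    for week in signal_weeks:
        direction = oi_direction(open_interest, week, lookback)
        if direction is not None:
            buckets[direction].append(week)
    return buckets
```

Each bucket's forward returns can then be scored separately, which is how a single threshold can show Sharpe 1.44 in one column and 0.02 in the other.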
Finding 3: Smart-Money Counter-Positioning Is the Final Filter
The CFTC reports both sides of every futures market. When speculators are crowded short, somebody has to be net long against them. That somebody is the commercial group: dealers in financial futures, producers and merchants in commodities. They're the genuine smart money, with information edges and balance sheets that specs don't have. When NDX specs are at extreme shorts AND dealers are simultaneously at their own 3-year extreme long, the historical hit rate jumps to 100% over 12 weeks with Sharpe 1.79 (N=15). That's the highest single-condition Sharpe in the entire backtest. Without commercial confirmation, NDX shorts still work but much more weakly (Sharpe 0.28). 10Y Treasury shows the same pattern at smaller N. Gold doesn't benefit because its commercials are mostly hedging miners, not taking discretionary positions, which confirms that gold positioning is structurally different from financials.
Finding 4: Alternative Trader Categories Unlock Hidden Signals
Every CFTC report breaks positioning down into multiple trader categories, but the standard backtest only looks at one: Leveraged Funds in financials, Managed Money in commodities. We tested whether the slower-moving institutional categories carry signal where the primary spec doesn't: Asset Managers in financials, Other Reportables in commodities. The result was bigger than expected. Gold via Other Reportables runs a buy-shorts signal at Sharpe 0.71 (95% CI [0.43, 1.04]) with 74% 12-week hit rate, while the same threshold via Managed Money is statistical noise. Silver, previously marked as noise across the board, surfaces a real Other Reportables buy signal at Sharpe 0.36. Natural Gas, an anti-signal market under Managed Money, runs a verified fade-longs signal via Other Reportables at Sharpe 0.73 with 81% 12-week hit rate. On the equity/crypto side, Asset Managers add anti-signals the Leveraged Funds read misses: NDX Asset Managers at extreme longs has a 16% 12-week hit rate, and Bitcoin Asset Managers at extreme longs shows an average 12-week edge of -18% for the fade. The takeaway: the textbook 'speculator' category is not always the right one for each market. Three markets that showed noise or an anti-signal under the primary proxy carry verified signal under a different one.
| Market | Alt actor | Direction | Sharpe (95% CI) | Hit rate |
|---|---|---|---|---|
| Gold | Other Reportables | Buy shorts | 0.71 [0.43, 1.04] | 74%, N=34 |
| Natural Gas | Other Reportables | Fade longs | 0.73 [0.31, 1.44] | 81%, N=26 |
| Silver | Other Reportables | Buy shorts | 0.36 [0.04, 0.69] | 66%, N=35 |
| Bitcoin | Asset Managers | Fade longs (anti) | -0.50 [-0.95, -0.14] | 40%, N=20, edge -18% |
| NDX | Asset Managers | Fade longs (anti) | -0.47 [-0.99, -0.12] | 16%, N=37 |
The Highest-Conviction Tier: Dual Confirmation
The strongest signal we can produce stacks both filters on top of the positioning extreme: speculators at extreme shorts, open interest contracting (washout), AND commercials at extreme longs (smart money buying). When all three conditions fire on NDX simultaneously, the backtest hit rate is 100% with the cleanest Sharpe in the entire system. This combination fires roughly once a year on average, rare enough to never be a primary trading strategy on its own but a powerful confirmation when it does. The dashboard surfaces this as the 'Dual Confirmation' alert tier above the regular CoT panel, and it triggers an explicit email to subscribers when it activates.
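The stacking logic can be written as a small tiering function. This is an illustrative sketch only: the tier names other than 'confirmed washout' and 'Dual Confirmation' are invented here, the commercial extreme is assumed to use the same ≥ 80 threshold as the speculator index, and OI contraction is assumed to be a negative 13-week change.

```python
def alert_tier(
    spec_index: float,
    oi_change_13w: float,
    commercial_index: float,
    low: float = 20.0,
    high: float = 80.0,
) -> str:
    """Classify a buy-shorts setup by how many confirming filters are present."""
    if spec_index > low:
        return "none"  # speculators are not at an extreme short
    washout = oi_change_13w < 0  # open interest contracting
    smart_money = commercial_index >= high  # commercials at their own extreme long
    if washout and smart_money:
        return "dual_confirmation"
    if washout:
        return "confirmed_washout"
    if smart_money:
        return "commercial_confirmed"  # hypothetical label for the Finding 3 subset
    return "extreme_only"


print(alert_tier(spec_index=12.0, oi_change_13w=-4200.0, commercial_index=91.0))
```

Ordering the checks from strictest to loosest means each setup is reported at its highest applicable tier.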
What Doesn't Work
Three negative findings worth surfacing. First, cross-market alignment — when many markets are simultaneously at extreme positioning — has a directionally correct but statistically weak relationship with forward changes in the macro regime composite. The correlation is real but the standard errors are too wide for it to be a standalone signal; it's context, not an alert. Second, week-over-week change in speculator net positioning produces some signal on Bitcoin and natural gas that the level-based test misses, but it's redundant with cross-market alignment for the markets where it works. Third, applying the same backtest infrastructure to news sentiment and Polymarket-implied probabilities — both of which are surfaced on the dashboard — produces materially weaker results than CoT, partly because the historical samples are smaller and partly because both sources carry survivorship and selection biases that positioning data doesn't. We label these honestly rather than pretending every input is equally rigorous.
Where to See It Live
The verified scorecard, the alert widget, and the strength badges all run on the live Alphamancy dashboard. The full per-market table — split by regime, with hit rates, average edges, and 95% confidence intervals at every horizon — sits below the CoT panel under 'CoT Positioning Scorecard.' New CFTC data hits the system every Saturday morning UTC (CFTC publishes Friday for Tuesday's snapshot), and the backtest re-scores any new entries automatically. The methodology is open: every signal classification in the system can be traced back to a specific subset of the 20-year sample with its own N, Sharpe, and CI.

Frequently Asked Questions
Does the textbook 'extremes are contrarian' rule work?
Only sometimes. Across our 20-year backtest of 10 markets, the contrarian buy works cleanly on Nasdaq and 10Y Treasury when specs are at extreme shorts. The contrarian fade fails on gold (the trend continues, and fading loses 20% over 52W on average) and is noise on NDX longs. Most other markets show no actionable signal at level alone.
What's the strongest single signal in your backtest?
Nasdaq 100 dual confirmation: specs at cotIndex ≤ 20, open interest contracting over the prior 13 weeks, AND commercials (dealers) at their own 3-year extreme long. That subset shows a 100% 12-week bounce rate across 15 historical occurrences at Sharpe 1.79. It fires roughly once per year.
Why not use the COT Index level on its own?
Because it's noisy across most markets. The level alone gets you Sharpe 0.51 on NDX, which is modest. Adding the open-interest direction filter raises it to Sharpe 1.44 on the same trade. Adding commercial counter-positioning raises it further to 1.79. The level is the entry point; the conditioning variables are what separate signal from noise.
What about week-over-week positioning change?
Week-over-week change produces real signal on Bitcoin and gold that the level-based test misses, but for the markets where it works it's largely redundant with the OI and commercials filters above. We compute it and surface it in the underlying scorecard table for transparency, but the headline alerts use the stronger combined filter.
Why is gold different from equities?
Gold positioning is dominated by mining hedgers on the commercial side and trend-followers on the speculator side. There's no significant discretionary 'smart money' standing against the speculative crowd, so commercials at extremes don't carry information the way they do for index futures. The contrarian read also fails outright on gold: when spec longs are extreme, the price tends to keep rallying rather than reverse. Gold rewards trend-following, not fading.
Track These Indicators Live
The Alphameter synthesizes six macro indicators into a single regime score — updated daily. See the current reading and full indicator breakdown on the dashboard.