The CoT Backtest: 20 Years of Verified Positioning Signals

A rigorous 20-year backtest of CFTC Commitment of Traders signals across 10 markets. Hit rates, bootstrap confidence intervals, and the conditions that make the contrarian read actually work.

10 markets tested: NDX, BTC, gold, crude, 10Y, EUR, JPY, silver, copper, NatGas
20 years of CFTC history: 2006 → 2026, ~10,000 weekly observations
2 trader groups per market: primary spec + alt actor (Asset Managers / Other Reportables)
1.79 best Sharpe: NDX dual-confirmation (CI excludes zero)

Before You Read Any Further

This page is educational research, not investment advice. The hit rates, Sharpes, edges, and forward-direction phrases below describe what the underlying did historically when CFTC positioning sat at similar conditions — they are not predictions about what it will do next time. Past performance does not guarantee or even reliably predict future results. Even with bootstrap confidence intervals, every backtest carries residual survivorship, look-ahead, and selection risk. Nothing on this page constitutes a recommendation to buy, sell, or hold any security. If you act on positioning data, do so at your own risk and consult a licensed financial advisor first.

Why We Ran This Backtest

The textbook story on CoT positioning is simple: when speculators are at a 3-year extreme, they're wrong. Crowded long means a top is in; crowded short means a bottom is in. It's been repeated in macro commentary for decades. We wanted to know whether it's actually true. The answer turns out to be 'sometimes' — the contrarian read works cleanly in some markets, fails completely in others, and is supercharged by conditions almost no commentator mentions. This page is the full audit: methodology, results, and the three findings that genuinely change how to read a CoT report.

Methodology

We pull the full CFTC archive for 10 actively traded futures contracts from 2006 to 2026. For each weekly release we compute the speculator group's net position ranked 0–100 against its own trailing 3-year range, the standard COT Index. A signal fires when the index enters a tail (≤ 20 = extreme short, ≥ 80 = extreme long), but only on the first week of entry: sitting at the extreme for eight weeks counts as one signal, not eight, which removes the autocorrelation that inflates naive backtests.

We then look up the underlying asset's price on the report date and at horizons of +1W, +4W, +12W, +26W, and +52W, computing the forward return and whether it moved in the textbook contrarian direction. Statistical significance is decided by a 1,000-sample bootstrap on the Sharpe ratio: a signal counts as 'real' only if the 95% confidence interval excludes zero.

Everything is split by macro regime (risk-on vs risk-off) using the Alphameter's episode-aware classification, which carries the ongoing directional episode through transient neutral days. The conditioning is therefore effectively a two-bucket test, the form most useful for trading the contrarian setups.
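As a rough illustration of the index and the entry-only filter described above, here is a minimal Python sketch. The 156-week window (standing in for "3 years"), the choice to include the current week in the trailing range, and the names `cot_index` and `entry_signals` are assumptions made for illustration; this is not the production implementation.

```python
from typing import List, Optional, Tuple

WINDOW = 156  # assumed: ~3 years of weekly CFTC reports


def cot_index(net_positions: List[float], window: int = WINDOW) -> List[Optional[float]]:
    """Rank each week's net position 0-100 within its trailing range."""
    out: List[Optional[float]] = []
    for i, value in enumerate(net_positions):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        span = net_positions[i + 1 - window : i + 1]  # includes the current week
        lo, hi = min(span), max(span)
        # A flat range has no meaningful rank; 50 is an arbitrary midpoint.
        out.append(50.0 if hi == lo else 100.0 * (value - lo) / (hi - lo))
    return out


def entry_signals(
    index: List[Optional[float]], low: float = 20.0, high: float = 80.0
) -> List[Tuple[int, str]]:
    """Return (week, side) only for the first week the index enters a tail."""
    signals: List[Tuple[int, str]] = []
    prev_zone: Optional[str] = None
    for week, value in enumerate(index):
        if value is None:
            zone = None
        elif value <= low:
            zone = "extreme_short"
        elif value >= high:
            zone = "extreme_long"
        else:
            zone = None
        # Fire only on a transition into a tail, not on every week spent there.
        if zone is not None and zone != prev_zone:
            signals.append((week, zone))
        prev_zone = zone
    return signals
```

With this filter, a run of consecutive weeks at or below 20 produces a single `extreme_short` entry, and a new entry fires only after the index has left the tail and come back.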

| Step | What we control for |
| --- | --- |
| Entry-only filter | Eliminates auto-correlation from consecutive weeks at extreme |
| Bootstrap 95% CI on Sharpe | Distinguishes real signal from sampling luck |
| Regime conditioning | Tests whether signal works in risk-on vs risk-off macro |
| Multi-horizon scoring | 1W, 4W, 12W, 26W, 52W: real signals decay coherently |
| 10-market basket | Cross-market alignment tests for solo vs broad capitulation |

Each control matters: dropping any one can fabricate signals that don't replicate.
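The bootstrap check can be sketched in a few lines of standard-library Python. Several details here are assumptions, since the page does not specify them: Sharpe is taken as the mean over the sample standard deviation of per-signal forward returns (not annualized), the interval uses the simple percentile method, and the names `bootstrap_sharpe_ci` and `excludes_zero` are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def sharpe(returns: List[float]) -> float:
    """Mean over sample standard deviation; not annualized (an assumption)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def bootstrap_sharpe_ci(
    returns: List[float], n_samples: int = 1000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the Sharpe ratio."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    stats: List[float] = []
    for _ in range(n_samples):
        # Resample the signal returns with replacement, same sample size.
        sample = [rng.choice(returns) for _ in returns]
        if len(set(sample)) < 2:
            continue  # zero variance: Sharpe is undefined, skip this draw
        stats.append(sharpe(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


def excludes_zero(returns: List[float], **kwargs) -> bool:
    """True when the whole interval sits on one side of zero."""
    lo, hi = bootstrap_sharpe_ci(returns, **kwargs)
    return lo > 0 or hi < 0
```

Note that `excludes_zero` is true for an interval entirely below zero as well, which is how an anti-signal such as the gold fade would be flagged under this sketch.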

Finding 1: The Textbook Read Fails on Half the Markets

Across 10 markets and 20 years, only two of the textbook contrarian signals survive statistical scrutiny at the level alone. Nasdaq 100 cotIndex ≤ 20 (specs at extreme shorts) bounces 78% of the time over 12 weeks at Sharpe 0.51, which is a real signal. 10Y Treasury at extreme shorts has even better statistics (Sharpe 0.68, 77% hit). But gold fade-the-longs is brutal: at cotIndex ≥ 80, gold rallies an average of 4% over the next 12 weeks against the textbook view, with Sharpe -0.57. By 52 weeks the fade is down 20% on average. Bitcoin, copper, silver, EUR/USD, JPY, and WTI crude all classify as noise at the level: their forward returns are no different from random when positioning is at an extreme. Most of macro Twitter is reading positioning wrong, half the time.

| Market | Direction | 12W Sharpe | 95% CI | Verdict |
| --- | --- | --- | --- | --- |
| Nasdaq 100 | Buy shorts (≤20) | 0.51 | [0.22, 0.95] | SIGNAL |
| 10Y Treasury | Buy shorts (≤20) | 0.68 | [0.39, 1.14] | SIGNAL |
| Gold | Fade longs (≥80) | -0.57 | [-1.01, -0.25] | ANTI-SIGNAL: DON'T FADE |
| NDX | Fade longs (≥80) | -0.37 | [-0.80, +0.04] | ANTI (mild) |
| Bitcoin | Either direction | ≈0 | wide | NOISE at level |
| EUR, JPY, Silver, Copper | Either | ≈0 | wide | NOISE at level |
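To make the sign convention in the table concrete, here is a small sketch of how a direction-adjusted return and a hit rate might be scored. It assumes that returns are measured in the contrarian direction (long after an extreme-short entry, short after an extreme-long entry), which is consistent with the negative Sharpe on the gold fade; the function names and the tuple layout are hypothetical.

```python
from typing import List, Tuple


def contrarian_return(side: str, entry_price: float, exit_price: float) -> float:
    """Forward return measured in the textbook contrarian direction."""
    raw = exit_price / entry_price - 1.0
    # Extreme shorts imply a contrarian long; extreme longs imply a contrarian short.
    return raw if side == "extreme_short" else -raw


def hit_rate(trades: List[Tuple[str, float, float]]) -> float:
    """Share of entries where price moved in the contrarian direction."""
    adjusted = [contrarian_return(side, p0, p1) for side, p0, p1 in trades]
    return sum(r > 0 for r in adjusted) / len(adjusted)
```

Under this convention, a market that keeps rallying after specs reach extreme longs shows a negative average return and a low hit rate for the fade, which is what the table labels an anti-signal.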

Finding 2: Open Interest Direction Supercharges the Signal

The single most overlooked variable in CoT analysis is what open interest is doing. When NDX cotIndex hits ≤ 20 with OI contracting over the prior 13 weeks (meaning specs are unwinding without new positions arriving), the 12-week bounce rate is 100% across all 26 historical entries, with Sharpe 1.44 (95% CI [1.16, 1.92]). When OI is expanding instead (specs are still piling into new shorts), the same threshold produces Sharpe 0.02, which is pure noise. The same filter sharpens the anti-signals: gold's already-bad fade gets worse with contracting OI (Sharpe -1.39, 10% hit rate), and natural gas extreme shorts with contracting OI hit 0%, meaning the underlying never bounces and falls 15% on average. We call the favorable subset 'confirmed washout' on the dashboard: extreme positioning plus OI capitulation is institutional-grade, not 'macro Twitter'-grade.

| Signal | All entries | OI contracting | OI expanding |
| --- | --- | --- | --- |
| NDX buy shorts | Sharpe 0.51, hit 78% | Sharpe 1.44, hit 100%, N=26 | Sharpe 0.02, noise |
| 10Y buy shorts | Sharpe 0.68, hit 77% | Sharpe 0.74, hit 80%, N=20 | Sharpe 0.46, weaker |
| Gold fade longs | Sharpe -0.57 | Sharpe -1.39, hit 10% | Sharpe -0.42 (still anti) |
| NatGas buy shorts | Sharpe -0.44 (anti) | Sharpe -1.63, hit 0% | Sharpe 0, noise |

OI direction is a first-order filter, not a tiebreaker.
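The OI split in the table can be sketched as a simple bucketing step applied to the signal entry weeks. The definition of "contracting" used here (open interest lower than it was 13 weeks earlier) is an assumption; the page does not say whether the production test uses a point-to-point change, a slope, or something else. The names `oi_direction` and `split_by_oi` are hypothetical.

```python
from typing import Dict, List, Optional


def oi_direction(open_interest: List[float], week: int, lookback: int = 13) -> Optional[str]:
    """Classify OI at a signal week against its level `lookback` weeks earlier."""
    if week < lookback:
        return None  # not enough history to compare
    change = open_interest[week] - open_interest[week - lookback]
    if change < 0:
        return "contracting"
    if change > 0:
        return "expanding"
    return "flat"


def split_by_oi(
    signal_weeks: List[int], open_interest: List[float], lookback: int = 13
) -> Dict[str, List[int]]:
    """Bucket signal entry weeks by the direction of open interest."""
    buckets: Dict[str, List[int]] = {"contracting": [], "expanding": [], "flat": []}
    for week in signal_weeks:
        direction = oi_direction(open_interest, week, lookback)
        if direction is not None:
            buckets[direction].append(week)
    return buckets
```

Each bucket would then be scored separately (hit rate, Sharpe, bootstrap interval), which is what produces the three result columns above.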

Finding 3: Smart-Money Counter-Positioning Is the Final Filter

CFTC reports both sides of every futures market. When speculators are crowded short, somebody has to be net long against them. That somebody is the commercial group — dealers in financial futures, producers and merchants in commodities. They're the genuine smart money, with information edges and balance sheets that specs don't have. When NDX specs are at extreme shorts AND dealers are simultaneously at their own 3-year extreme long, the historical hit rate jumps to 100% over 12 weeks with Sharpe 1.79 (N=15). That's the highest single-condition Sharpe in the entire backtest. Without commercial confirmation, NDX shorts still work but much more weakly (Sharpe 0.28). 10Y Treasury shows the same pattern at smaller N. Gold doesn't benefit — its commercials are mostly hedging miners, not taking discretionary positions — confirming that gold positioning is structurally different from financials.

| Condition | Sharpe | Note |
| --- | --- | --- |
| NDX buy + commercials long | 1.79 | N=15, 100% 12W hit rate |
| NDX buy without commercial confirmation | 0.28 | N=34, the leftover |
| Gold buy + commercials long | ≈0 | Commercials don't help here |

Finding 4: Alternative Trader Categories Unlock Hidden Signals

Every CFTC report breaks positioning down into multiple trader categories, but the standard backtest only looks at one: Leveraged Funds in financials, Managed Money in commodities. We tested whether the slower-moving institutional categories carry signal where the primary spec doesn't: Asset Managers in financials, Other Reportables in commodities. The result was bigger than expected.

Gold via Other Reportables runs a buy-shorts signal at Sharpe 0.71 (95% CI [0.43, 1.04]) with a 74% 12-week hit rate; the same threshold via Managed Money is statistical noise. Silver, previously marked as noise across the board, surfaces a real Other Reportables buy signal at Sharpe 0.36. Natural Gas, an anti-signal market under Managed Money, runs a verified fade-longs signal via Other Reportables at Sharpe 0.73 with an 81% 12-week hit rate.

On the equity/crypto side, Asset Managers add anti-signals the Leveraged Funds read misses: NDX Asset Managers at extreme longs has a 16% 12-week hit rate, and Bitcoin Asset Managers at extreme longs precedes an average -18% drop over 12 weeks.

The takeaway: the textbook 'speculator' category is not always the right one for each market. Three markets that looked like noise or worse under the primary category (gold, silver, and natural gas) carry verified signal under a different proxy.

| Market | Alt actor | Direction | Sharpe (95% CI) | Hit rate |
| --- | --- | --- | --- | --- |
| Gold | Other Reportables | Buy shorts | 0.71 [0.43, 1.04] | 74%, N=34 |
| Natural Gas | Other Reportables | Fade longs | 0.73 [0.31, 1.44] | 81%, N=26 |
| Silver | Other Reportables | Buy shorts | 0.36 [0.04, 0.69] | 66%, N=35 |
| Bitcoin | Asset Managers | Fade longs (anti) | -0.50 [-0.95, -0.14] | 40%, N=20, edge -18% |
| NDX | Asset Managers | Fade longs (anti) | -0.47 [-0.99, -0.12] | 16%, N=37 |

The Highest-Conviction Tier: Dual Confirmation

The strongest signal we can produce stacks both filters on top of the positioning extreme: speculators at extreme shorts, open interest contracting (washout), and commercials at extreme longs (smart money buying). When all three conditions fire on NDX simultaneously, the backtest hit rate is 100% with the cleanest Sharpe in the entire system. The combination fires roughly once a year on average: too rare to be a primary trading strategy on its own, but a powerful confirmation when it does fire. The dashboard surfaces this as the 'Dual Confirmation' alert tier above the regular CoT panel and sends subscribers an explicit email when it activates.
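The stacking logic can be sketched as a small classifier over one weekly snapshot. This is an illustration under stated assumptions: the commercial "3-year extreme long" is taken to mean a commercial COT Index ≥ 80 (the page does not give that threshold), OI contraction is a point-to-point comparison against 13 weeks earlier, and the `WeeklySnapshot` fields and the tier names other than 'Dual Confirmation' and 'confirmed washout' are hypothetical. The entry-only filter from the methodology is not applied here.

```python
from dataclasses import dataclass


@dataclass
class WeeklySnapshot:
    spec_index: float        # speculator COT Index, 0-100
    commercial_index: float  # commercial (dealer) COT Index, 0-100
    oi_now: float            # open interest this week
    oi_13w_ago: float        # open interest 13 weeks earlier


def confirmation_tier(s: WeeklySnapshot, low: float = 20.0, high: float = 80.0) -> str:
    """Classify a buy-side setup by how many confirming filters are present."""
    if s.spec_index > low:
        return "none"  # specs are not at an extreme short
    washout = s.oi_now < s.oi_13w_ago        # assumed definition of contracting OI
    smart_money = s.commercial_index >= high  # assumed threshold for commercials
    if washout and smart_money:
        return "dual_confirmation"
    if washout:
        return "confirmed_washout"
    if smart_money:
        return "commercial_confirmation"  # hypothetical tier name
    return "level_only"  # hypothetical tier name
```

For example, `confirmation_tier(WeeklySnapshot(15, 85, 900, 1000))` falls in the top tier, while the same snapshot with expanding open interest drops to the commercial-only tier.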

What Doesn't Work

Three negative findings worth surfacing. First, cross-market alignment — when many markets are simultaneously at extreme positioning — has a directionally correct but statistically weak relationship with forward changes in the macro regime composite. The correlation is real but the standard errors are too wide for it to be a standalone signal; it's context, not an alert. Second, week-over-week change in speculator net positioning produces some signal on Bitcoin and natural gas that the level-based test misses, but it's redundant with cross-market alignment for the markets where it works. Third, applying the same backtest infrastructure to news sentiment and Polymarket-implied probabilities — both of which are surfaced on the dashboard — produces materially weaker results than CoT, partly because the historical samples are smaller and partly because both sources carry survivorship and selection biases that positioning data doesn't. We label these honestly rather than pretending every input is equally rigorous.

Where to See It Live

The verified scorecard, the alert widget, and the strength badges all run on the live Alphamancy dashboard. The full per-market table — split by regime, with hit rates, average edges, and 95% confidence intervals at every horizon — sits below the CoT panel under 'CoT Positioning Scorecard.' New CFTC data hits the system every Saturday morning UTC (CFTC publishes Friday for Tuesday's snapshot), and the backtest re-scores any new entries automatically. The methodology is open: every signal classification in the system can be traced back to a specific subset of the 20-year sample with its own N, Sharpe, and CI.

Frequently Asked Questions

Does the textbook 'extremes are contrarian' rule work?

Only sometimes. Across our 20-year backtest of 10 markets, the contrarian buy works cleanly on Nasdaq and 10Y Treasury when specs are at extreme shorts. The contrarian fade fails on gold (the trend continues, and fading loses 20% over 52W on average) and is at best a mild, statistically unverified anti-signal on NDX longs (the 95% CI includes zero). Most other markets show no actionable signal at level alone.

What's the strongest single signal in your backtest?

Nasdaq 100 dual confirmation: specs at cotIndex ≤ 20, open interest contracting over the prior 13 weeks, AND commercials (dealers) at their own 3-year extreme long. That subset shows a 100% 12-week bounce rate across 15 historical occurrences at Sharpe 1.79. It fires roughly once per year.

Why don't you use just COT Index level on its own?

Because it's noisy across most markets. The level alone gets you Sharpe 0.51 on NDX — modest. Adding the open-interest direction filter raises it to Sharpe 1.44 on the same trade. Adding commercials counter-positioning raises it further to 1.79. The level is the entry point; the conditioning variables are what separate signal from noise.

What about week-over-week positioning change?

The WoW lens picks up real signal on Bitcoin and gold that the level-based test misses, but for the markets where it works it's largely redundant with the OI/commercials filters above. We compute it and surface it in the underlying scorecard table for transparency, but the headline alerts use the stronger combined filter.

Why is gold different from equities?

Gold positioning is dominated by mining hedgers on the commercial side and trend-followers on the speculator side. There's no significant discretionary 'smart money' standing against the speculative crowd, so commercials at extremes don't carry information the way they do for index futures. The contrarian read also fails outright on gold — extreme spec longs continue rallying, not reversing. Gold rewards trend-following, not fading.

Track These Indicators Live

The Alphameter synthesizes six macro indicators into a single regime score — updated daily. See the current reading and full indicator breakdown on the dashboard.
