APEX QQQ Trader — Signal Accuracy Under the Hood

A question worth asking about APEX is whether the machine learning model is actually good at predicting direction, or whether the algo's returns mostly come from holding a leveraged instrument during a bull market. A 3x leveraged ETF in a market that rose for most of the last decade will produce impressive numbers regardless of timing skill. So I went back through four years of backtest data (2020–2023) and scored every weekly signal the algo produced, broken out by the market regime that was active at the time.

APEX operates in three distinct modes depending on what the market is doing. It is not always the ML model making the call. Understanding which mechanism is in control at any given time — and how accurate each one is — matters more than looking at a single headline accuracy number.

The Three Regimes

Bear Market. When broad trend conditions deteriorate beyond a sustained threshold, the algo moves entirely to cash and stays there until the environment structurally improves. This is the blunt instrument. It does not try to pick bottoms or trade around the weakness. It sits out. During the 2020–2023 backtest window, this regime was active for about 13 weeks, concentrated in early 2022 as the Nasdaq rolled over from its November 2021 highs. The algo moved to BIL in early February 2022 and stayed there until a separate mechanism took over in May.
Oversold Shocks and Recoveries. When the market suffers an acute single-day crash — the kind of move that can devastate a leveraged position — the algo liquidates immediately, waits for volatility to decelerate, and then systematically ramps back into TQQQ over several days. Once fully re-entered, it holds through the recovery until conditions normalize. The COVID crash in February 2020 triggered the first episode, which ran through August 2020. A secondary shock in September 2020 kept the algo in recovery mode through February 2021. Then in May 2022, another shock triggered a hold that persisted through the end of the backtest in late 2023. In each case, the algo was holding TQQQ through the recovery — not because the ML model was calling direction, but because the shock system was executing its own logic.
All Other Periods. This is when the ML model is in control. The market is not in a sustained bear trend, and no shock event is active. The model evaluates conditions each Friday, produces a probability estimate, and the algo goes 100% TQQQ or 100% BIL based on whether that probability clears the threshold.
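
To make the handoff between the three regimes concrete, here is a minimal sketch of the dispatch logic in Python. It is not APEX's actual code. The names `MarketState`, `choose_regime`, and `target_allocation` are hypothetical, the threshold value is a placeholder (the real one is not stated here), and the priority of shock over bear is an assumption based on the shock system taking over from the bear regime in May 2022. The shock branch is also simplified: it returns the recovery hold and skips the liquidate-and-ramp phase.

```python
from dataclasses import dataclass

# Placeholder value: the real threshold is not stated in this post.
ML_THRESHOLD = 0.5


@dataclass
class MarketState:
    bear_trend_active: bool     # sustained trend deterioration flag
    shock_episode_active: bool  # an oversold shock/recovery is in progress
    ml_probability: float       # the model's Friday probability estimate


def choose_regime(state: MarketState) -> str:
    """Return which mechanism is in control this week.

    Assumption: an active shock episode takes priority over the bear flag.
    """
    if state.shock_episode_active:
        return "shock_recovery"
    if state.bear_trend_active:
        return "bear_market"
    return "ml_active"


def target_allocation(state: MarketState) -> str:
    """Map the active regime to the instrument held."""
    regime = choose_regime(state)
    if regime == "bear_market":
        return "BIL"   # sit in cash until the trend improves
    if regime == "shock_recovery":
        return "TQQQ"  # simplified: the hold phase after the ramp back in
    # ML regime: all in above the threshold, all out below it.
    return "TQQQ" if state.ml_probability >= ML_THRESHOLD else "BIL"
```

The point of the structure is that the ML probability is only consulted in the last branch. In the other two regimes it is ignored, whatever its value.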

Time in Each Regime

Regime	Weeks	% of Backtest	Allocation
Oversold Shocks and Recoveries	131	65%	TQQQ (recovery hold)
All Other Periods (ML active)	57	28%	TQQQ or BIL per model
Bear Market	13	7%	BIL (cash)

The 2020–2023 window included two major crash-and-recovery cycles (COVID and the 2022 drawdown), which meant the shock recovery system was active for an unusually large share of the backtest. In a calmer market, the ML model would be in control more often. Regardless of the split, each regime can be assessed on its own terms.

Accuracy by Regime

Regime	Metric	Right	Wrong	Total	Accuracy
ML Active	1-week direction	43	14	57	75.4%
ML Active	4-week direction	36	21	57	63.2%
Shock Recovery	1-week direction	7	8	15	46.7%
Shock Recovery	Episode win/loss	2	0	2	100%
Bear Market	4-week direction	6	8	14	42.9%

The ML-controlled weeks are where the algo's skill is most directly measurable. Each Friday signal is scored against the following week's actual QQQ return: did the market move in the direction the algo called?
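
The scoring rule can be sketched in a few lines of Python. The function names are hypothetical, the sample records are toy values rather than backtest data, and counting a flat week as correct for a BIL call is my assumption for the sketch.

```python
from collections import Counter


def is_correct(signal: str, next_week_qqq_return: float) -> bool:
    """A TQQQ (bullish) call is right if QQQ rose the following week.

    A BIL (bearish) call is right if it did not (flat counts as right here,
    which is an assumption).
    """
    if signal == "TQQQ":
        return next_week_qqq_return > 0
    return next_week_qqq_return <= 0


def accuracy_by_signal(records):
    """records: iterable of (signal, next_week_qqq_return) pairs."""
    right, total = Counter(), Counter()
    for signal, ret in records:
        total[signal] += 1
        right[signal] += is_correct(signal, ret)  # True adds 1, False adds 0
    return {signal: right[signal] / total[signal] for signal in total}


# Toy data for illustration only, not taken from the backtest.
toy_records = [("TQQQ", 0.021), ("TQQQ", -0.008), ("BIL", -0.015), ("BIL", 0.004)]
print(accuracy_by_signal(toy_records))
```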

Breaking down the ML's 57 weeks by signal direction reveals an asymmetry worth noting.

ML Signal	Right	Wrong	Total	1-Week Accuracy
TQQQ (bullish)	22	4	26	84.6%
BIL (bearish)	21	10	31	67.7%

Both signals carry a meaningful edge, though the model is particularly sharp on bullish calls. The bearish misses are typically weeks where the algo sat in cash during a market that continued grinding higher — uncomfortable, but structurally less damaging than the alternative error of being leveraged 3x into a downturn.
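
One rough way to sanity-check whether these hit rates are distinguishable from guessing is an exact one-sided binomial tail, sketched below using only the counts from the tables above. The 50% baseline is a simplification, since the base rate of up weeks need not be 50% and consecutive weekly signals are not independent. Treat the output as a rough guide, not a formal test.

```python
from math import comb


def binomial_tail(right: int, total: int, p: float = 0.5) -> float:
    """P(X >= right) for X ~ Binomial(total, p).

    This is the chance of doing at least this well by guessing with
    per-week hit probability p.
    """
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(right, total + 1)
    )


# Counts taken from the accuracy tables in this post.
for label, right, total in [
    ("ML overall", 43, 57),
    ("ML bullish", 22, 26),
    ("ML bearish", 21, 31),
]:
    tail = binomial_tail(right, total)
    print(f"{label}: {right}/{total}, tail probability vs coin flip = {tail:.4f}")
```

The smaller the tail probability, the harder it is to attribute the hit rate to luck under the coin-flip assumption. With 26 and 31 signals, the per-direction figures rest on much smaller samples than the overall one.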

By year, ML accuracy held in the 70–75% range in 2020 and 2021, the two years with a meaningful sample. The 100% figure for 2022 rests on only four signals.

Year	Right	Wrong	Total	Accuracy	Notes
2020	7	3	10	70.0%	Brief ML window before shock took over
2021	32	11	43	74.4%	Primary ML period
2022	4	0	4	100.0%	Last few weeks before bear regime activated

The shock recovery regime is harder to score on a weekly basis because it is not making a directional prediction each week. It is executing a mechanical process: enter after deceleration, hold through recovery. On any given week during the hold, TQQQ might be up or down. The weekly hit rate was 46.7% (7 of the 15 scored weeks), essentially a coin flip. But the regime is not designed for weekly accuracy. It is designed to capture the full recovery arc. Both shock episodes in this backtest window were net wins when measured from entry to exit.
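
A weekly hit rate below 50% and a winning episode are not in conflict, because the episode result depends on the size of the moves as well as their direction. The sketch below uses synthetic weekly returns (not backtest data) with 7 up weeks and 8 down weeks to show the arithmetic. The function names are hypothetical.

```python
def hit_rate(weekly_returns):
    """Fraction of weeks with a positive return."""
    return sum(r > 0 for r in weekly_returns) / len(weekly_returns)


def episode_return(weekly_returns):
    """Compounded return from entry to exit."""
    value = 1.0
    for r in weekly_returns:
        value *= 1.0 + r
    return value - 1.0


# Synthetic example: 7 up weeks of +6% and 8 down weeks of -3%.
weeks = [0.06, -0.03] * 7 + [-0.03]

print(f"weekly hit rate: {hit_rate(weeks):.1%}")
print(f"episode return:  {episode_return(weeks):.1%}")
```

With these made-up numbers the hit rate is 7 of 15, yet the compounded episode return is positive because the up weeks are larger than the down weeks.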

The bear market regime's implicit call (that the market will continue lower) was correct about 43% of the time when scored at the four-week mark. This looks weak in isolation, but the regime's value is not in predicting weekly direction. It is in avoiding sustained drawdowns in a 3x leveraged instrument, where a 30% decline in QQQ can translate to a 70–80% loss in TQQQ once the daily 3x exposure compounds through a volatile decline. The bear regime only needs to be right about the big picture, and in 2022 it was.
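
The leverage arithmetic can be illustrated with two synthetic paths that both take the unleveraged index down 30%. This is a sketch under stated assumptions: a fund that delivers exactly 3x the daily return, no fees or financing costs, a 250-day window, and invented daily return patterns. None of it is actual QQQ or TQQQ data.

```python
def compound(daily_returns, leverage=1.0):
    """Cumulative return of a position rebalanced to `leverage` each day."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + leverage * r
    return value - 1.0


DAYS = 250     # roughly one trading year (assumed)
TARGET = 0.70  # the index ends at 70% of its starting value

# Path 1: a perfectly smooth decline, same small loss every day.
smooth = [TARGET ** (1 / DAYS) - 1] * DAYS

# Path 2: alternating +2% up days and offsetting down days, same -30% overall.
up = 0.02
pair_factor = TARGET ** (2 / DAYS)  # growth over each up/down pair
down = pair_factor / (1 + up) - 1
choppy = [up, down] * (DAYS // 2)

for name, path in [("smooth", smooth), ("choppy", choppy)]:
    print(f"{name}: 1x = {compound(path):.1%}, 3x = {compound(path, 3):.1%}")
```

Under these assumptions the smooth path loses roughly two thirds at 3x and the choppy path roughly three quarters, even though the unleveraged loss is 30% in both. The leveraged outcome depends on the path, which is why the range is stated as 70–80% and not as a single number.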

What Confidence Indicates (and What It Does Not)

One of the model's more interesting dynamics is that its probability output — its own measure of confidence — does not meaningfully distinguish correct calls from incorrect ones.

Outcome	N	Mean Confidence	Median Confidence
Correct calls	43	0.792	0.851
Incorrect calls	14	0.779	0.808

The model is not more confident on its good calls. It is approximately equally confident on all of them.

This has a direct practical implication. It is the reason I tested and ultimately rejected proportional allocation — scaling TQQQ exposure to the model's probability level. If higher confidence meant higher accuracy, proportional sizing would improve risk-adjusted returns. It does not, because the relationship is not there. The binary approach — all in above the threshold, all out below it — works better precisely because it does not try to extract information from the confidence signal.
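
The two sizing rules can be written side by side. This is a sketch. The function names and the threshold value are hypothetical, and feeding the table's mean confidence figures in as bullish probabilities is a simplification for illustration.

```python
# Placeholder value: the real threshold is not stated in this post.
THRESHOLD = 0.5


def binary_weight(probability: float, threshold: float = THRESHOLD) -> float:
    """All in TQQQ at or above the threshold, all out below it."""
    return 1.0 if probability >= threshold else 0.0


def proportional_weight(probability: float) -> float:
    """TQQQ exposure scaled to the model's probability, clipped to [0, 1]."""
    return min(max(probability, 0.0), 1.0)


# Mean confidence on correct vs incorrect calls, from the table above.
mean_correct, mean_incorrect = 0.792, 0.779

gap = proportional_weight(mean_correct) - proportional_weight(mean_incorrect)
print(f"proportional sizing gap: {gap:.3f}")
print(f"binary weights: {binary_weight(mean_correct)}, {binary_weight(mean_incorrect)}")
```

At the averages, proportional sizing would hold only about 1.3 percentage points more TQQQ on the calls that turned out right than on the ones that turned out wrong. It would also hold less than full exposure on both.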

The Bigger Picture

The 2020–2023 period included two major crashes and two recoveries, which gave the shock system more to do than a quieter market would have. In a period with fewer acute shocks and more gradual trend shifts, the ML model will spend more time in control. That is the environment where the model's directional accuracy matters most, and where 75% at the weekly level, with an 85% hit rate on bullish calls specifically, should compound meaningfully in a 3x leveraged instrument. The three regimes are designed to complement each other across the full range of market conditions, and the numbers suggest each one is doing its job when judged against what it was built for: weekly direction for the ML model, full-episode outcomes for the shock system, and drawdown avoidance for the bear regime.