Does Dual N-Back Training Work? A Skeptic's Look at the Evidence

Direct answer

Dual n-back training reliably makes you better at the n-back task and at closely related working-memory tasks. That part is settled. The claim that sold the protocol to the public — that it raises fluid intelligence (your ability to reason and solve novel problems) — does not hold up under rigorous testing. When studies use an active control group that also does a demanding task, the gain in fluid intelligence shrinks to roughly zero.

So the practical answer is: yes, it trains a skill, and no, that skill does not appear to spread to general reasoning, IQ, or everyday cognition. If your goal is a higher score on n-back itself or a similar memory test, practice works. If your goal is to get smarter, the controlled evidence does not support spending dozens of hours on it.

What the dual n-back protocol actually is

N-back is a continuous working-memory task. You see a stream of stimuli and respond whenever the current item matches the one shown n steps earlier. At 2-back you compare each item to the one two before it; the level adapts upward as you improve, which is what makes it demanding.
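The matching rule described above can be sketched in a few lines. This is a minimal illustration rather than any particular app's implementation; the function name `nback_targets` and the sample letter stream are made up for the example.

```python
from typing import List, Sequence


def nback_targets(stream: Sequence[str], n: int) -> List[bool]:
    """For each position, report whether the item matches the one n steps earlier.

    The first n items can never be targets because nothing precedes them
    far enough back.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]


# Hypothetical 2-back stream: the third, fifth and sixth items repeat
# whatever appeared two steps before them.
stream = ["K", "T", "K", "R", "K", "R", "Q"]
print(nback_targets(stream, 2))
# [False, False, True, False, True, True, False]
```

A correct response is a key press on every `True` position and no press on every `False` one, so the player has to keep the last n items in mind and update them on every step.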

The 'dual' version, used in the original 2008 study by Jaeggi and colleagues, runs two streams at once: a square appears in one of eight screen positions while a consonant plays through headphones, every three seconds. You track position-matches and audio-matches simultaneously and independently. Holding and updating two interleaved sequences is what makes dual n-back so taxing — and why advocates argued it should engage the same neural circuitry as fluid reasoning.
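To make "simultaneously and independently" concrete, here is a rough sketch of how one block of dual n-back could be scored and how the level might adapt. The `Trial` class, the function names, and the error thresholds (`raise_below`, `lower_above`) are illustrative assumptions, not the rule used by any specific study or app.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Trial:
    position: int  # 0-7, one of the eight screen positions
    letter: str    # the spoken consonant


def errors_per_stream(
    trials: Sequence[Trial],
    responses: Sequence[Tuple[bool, bool]],
    n: int,
) -> Tuple[int, int]:
    """Count errors (misses plus false alarms) separately for each stream.

    responses[i] is (pressed_position_key, pressed_audio_key) on trial i.
    """
    pos_errors = 0
    audio_errors = 0
    for i, (trial, (pos_resp, audio_resp)) in enumerate(zip(trials, responses)):
        pos_target = i >= n and trial.position == trials[i - n].position
        audio_target = i >= n and trial.letter == trials[i - n].letter
        pos_errors += pos_resp != pos_target
        audio_errors += audio_resp != audio_target
    return pos_errors, audio_errors


def next_level(n: int, pos_errors: int, audio_errors: int,
               raise_below: int = 3, lower_above: int = 5) -> int:
    """Adapt n from the worse of the two streams (thresholds are placeholders)."""
    worst = max(pos_errors, audio_errors)
    if worst < raise_below:
        return n + 1
    if worst > lower_above:
        return max(1, n - 1)
    return n


# Hypothetical five-trial block at 2-back.
trials = [Trial(0, "K"), Trial(3, "T"), Trial(0, "R"), Trial(5, "T"), Trial(0, "R")]
responses = [(False, False), (False, False), (True, False), (False, False), (True, True)]
print(errors_per_stream(trials, responses, 2))  # (0, 1): one missed audio match
print(next_level(2, 0, 1))                      # 3
```

Scoring the two streams separately is the point of the sketch: a player who tracks positions perfectly but loses the letters still fails to advance, because the level follows the weaker stream.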

If you want to feel the task before reading the debate over it, the Deadline n-back test runs the standard adaptive protocol in your browser.

The study that started it: Jaeggi 2008

In 2008, Jaeggi, Buschkuehl, Jonides and Perrig published a paper in PNAS reporting that healthy young adults who trained on dual n-back improved on tests of fluid intelligence, with more training producing larger gains. The trained task was completely different from the IQ test, so the result looked like genuine far transfer — training one ability and improving another.

It was an exciting finding and it spread fast, spawning a generation of free n-back apps and paid 'brain training' products. But the study had design problems that became the center of the next decade of argument. Critics noted that the intelligence tests were given under a compressed time limit (about 10 minutes instead of the standard ~45), which can distort scores, and that the comparison groups received no alternative activity at all.

Near transfer vs. far transfer: the distinction that decides the debate

Cognitive scientists separate two outcomes. Near transfer means improvement on tasks very similar to the trained one — here, untrained n-back variants or other working-memory tests. Far transfer means improvement on something genuinely different, such as fluid intelligence, reading, or arithmetic. Near transfer is common and unremarkable; far transfer is the prize, and the thing brain-training marketing implicitly promises.

Soveri and colleagues' 2017 multi-level meta-analysis of 33 randomized controlled n-back studies drew the line cleanly. They found a medium-sized transfer effect to untrained n-back tasks — real near transfer. But transfer to other working-memory tasks, to fluid intelligence, and to cognitive control was very small. Notably, single versus dual n-back made no difference, undercutting the idea that the 'dual' format is special.

Outcome	Type of transfer	What the evidence shows
Trained n-back task	Direct practice	Large, reliable gains
Untrained n-back variants	Near transfer	Medium effect (Soveri 2017)
Other working-memory tasks	Intermediate	Small, often fades after training
Fluid intelligence (IQ)	Far transfer	Near zero with active controls
Reading, arithmetic, attention	Far transfer	No convincing evidence

The replications that deflated the IQ claim

The strongest single rebuttal is Redick and colleagues (2013). They ran the proper experiment: 20 sessions of adaptive dual n-back, an active placebo-control group doing adaptive visual search, and a no-contact control. They found no evidence that more dual n-back practice produced more fluid-intelligence gain — the dose-response signal that made the 2008 paper persuasive simply did not appear.

Meta-analysis tells the same story once you account for control type. Au and colleagues (2015), generally read as pro-training, reported an overall effect on fluid intelligence of g = 0.24 — small but significant. The catch is in the breakdown: studies with passive (do-nothing) controls showed g = 0.44, while studies with active controls showed g = 0.06, essentially nothing. The gain rides almost entirely on weaker comparison groups.
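For readers unfamiliar with the g values quoted above, Hedges' g is a standardized mean difference: the gap between two group means divided by their pooled standard deviation, with a small-sample correction. The sketch below uses the common textbook formula; the input numbers are invented for illustration and are not taken from any of the studies discussed, and real meta-analyses of training studies often work from pre-to-post gains rather than raw post-test scores.

```python
import math


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Standardized mean difference between training (t) and control (c) groups."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Small-sample correction factor, which shrinks d slightly toward zero.
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


# Made-up example: the training group gains 1 point more than controls,
# both groups have SD 2 and 20 participants each.
print(round(hedges_g(1.0, 2.0, 20, 0.0, 2.0, 20), 2))  # 0.49
```

On this scale, g = 0.44 means the training group ended up a little under half a standard deviation ahead of a passive control, while g = 0.06 is a difference of a few hundredths of a standard deviation.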

Melby-Lervåg, Redick and Hulme (2016) made that point definitively across 87 publications and 145 comparisons: for far-transfer measures (nonverbal and verbal ability, decoding, reading comprehension, arithmetic) there was no convincing evidence of reliable improvement when training was compared against a treated control. Their reanalysis of Au's data put the effect at g = 0.13, which loses significance once studies without a treated control are excluded.

Why the early results looked better than they were

The pattern — big effects against passive controls, vanishing effects against active ones — is the fingerprint of expectancy and placebo effects rather than real cognitive change. If your control group sits idle while the training group shows up, gets coaching, and believes it is getting smarter, the difference at post-test can come from motivation, test familiarity, and demand characteristics alone. Reviews of educational research have estimated expectancy effects at up to 0.3 standard deviations — large enough to manufacture the entire 'transfer' signal.
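A toy simulation shows how this fingerprint can arise with no real transfer at all. Everything here is an assumption made for illustration: true transfer is set to zero, any group that believes it is being trained gets an expectancy boost of 0.3 standard deviations (the upper figure quoted above), scores are unit-variance normal noise, and the function names are hypothetical.

```python
import math
import random
import statistics
from typing import List


def simulate_gains(n: int, expectancy: float, true_transfer: float,
                   rng: random.Random) -> List[float]:
    """Simulated gains on a reasoning test, in standard-deviation units."""
    return [expectancy + true_transfer + rng.gauss(0.0, 1.0) for _ in range(n)]


def cohens_d(a: List[float], b: List[float]) -> float:
    """Standardized mean difference for two equally sized groups."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000               # large groups, so sampling noise is small

training = simulate_gains(n, expectancy=0.3, true_transfer=0.0, rng=rng)
active = simulate_gains(n, expectancy=0.3, true_transfer=0.0, rng=rng)
passive = simulate_gains(n, expectancy=0.0, true_transfer=0.0, rng=rng)

d_passive = cohens_d(training, passive)
d_active = cohens_d(training, active)
print(f"training vs passive control: d = {d_passive:.2f}")
print(f"training vs active control:  d = {d_active:.2f}")
```

Under these assumptions the comparison against the passive group comes out near 0.3 and the comparison against the active group near zero, even though the training itself does nothing for reasoning in the model. Only the active-control comparison cancels the expectancy term.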

This is why the field now treats active, treated control groups as the minimum bar. Most of the early enthusiasm came from designs that could not separate genuine transfer from belief and engagement.

Deadline first-party data

Coming soon. We are aggregating anonymized scores from real users to show how n-back performance is distributed across levels and how quickly people improve with repeated practice. This will quantify the practice effect everyone agrees is real, your own learning curve on the task, without overclaiming the far-transfer benefits the research does not support.

So should you bother?

Train n-back if you enjoy it, want a measurable working-memory challenge, or like watching your own score climb — those benefits are real and yours to keep. Just hold the right expectation: you are getting better at n-back, not rewiring your general intelligence. No controlled study has shown that the skill reliably exports to reasoning, school performance, or work.

If your aim is to gauge or track working memory rather than 'boost IQ', a battery of distinct tasks is more informative than grinding one. Pair n-back with the digit span test for a verbal-memory contrast, and treat each as a measurement of a specific capacity rather than a lever on the whole mind.

Frequently asked questions

Does dual n-back increase IQ? The best-controlled studies say no. When training is compared against an active control group that also performs a demanding task, the effect on fluid intelligence drops to near zero (g = 0.06 in Au et al., 2015). The apparent IQ gains in early studies largely reflect expectancy and placebo effects from weak control conditions.

Is dual n-back better than single n-back? For transfer, no. Soveri et al. (2017) found that single versus dual n-back made no difference to outcomes, which undercuts the idea that running two simultaneous streams produces a special cognitive benefit. Dual is simply a harder version of the same task.

What does dual n-back actually improve? It produces large gains on the trained task and a medium-sized improvement on untrained n-back variants — genuine near transfer. What it does not reliably improve is fluid intelligence, reading, arithmetic, or attention, the far-transfer outcomes that brain-training marketing implies.

Why do some studies show n-back works? Studies showing strong effects tend to use passive control groups who do nothing. Against such groups the training group's motivation, test familiarity, and belief in the program inflate the result. Studies with active control groups — the rigorous design — show little to no far transfer.

How long would I need to train to see results? You will see your n-back score improve within a handful of sessions. The original studies ran roughly 20 sessions over four to five weeks. But the apparent dose-response link to intelligence in early work did not replicate in Redick et al. (2013), so more hours buy a better n-back score, not measurably better reasoning.

Sources and notes

https://www.pnas.org/doi/10.1073/pnas.0801268105
https://pubmed.ncbi.nlm.nih.gov/28116702/
https://link.springer.com/article/10.3758/s13423-016-1217-0
https://englelab.gatech.edu/articles/2013/redick-et-al-20132c-wm-training2c-jepg.pdf
https://pubmed.ncbi.nlm.nih.gov/22708717/
https://link.springer.com/article/10.3758/s13423-014-0699-x
https://pubmed.ncbi.nlm.nih.gov/25102926/
https://journals.sagepub.com/doi/10.1177/1745691616635612
https://pmc.ncbi.nlm.nih.gov/articles/PMC4968033/