The famous psychology test gets roasted in the new era of replication.
Here’s some good news: Your fate cannot be determined solely by a test of your ability at age 5 to resist the temptation of one marshmallow for 15 minutes to get two marshmallows.
This relieving bit of insight comes to us from a paper published recently in the journal Psychological Science that revisited one of the most famous studies in social science, known as “the marshmallow test.”
The idea behind the new paper was to see whether research from the late 1980s and early ’90s could be replicated. That research suggested that a simple delay of gratification (holding off on eating a marshmallow) at ages 4 through 6 could predict future achievement in school and life.
What the researchers found: Delaying gratification at age 5 doesn’t say much about your future. Rather, there are more important — and frustratingly stubborn — forces at work that push or pull us from our greatest potential.
The marshmallow test story is important. The original studies inspired a surge in research into how character traits could influence educational outcomes (think grit and growth mindset). They also influenced schools to teach delaying gratification as part of “character education” programs.
It’s also a story about psychology’s “replication crisis,” in which classic findings are being reevaluated (and often failing) under more rigorous methodology. It teaches a lesson on a frustrating truth that pervades much of educational achievement research: There is not a quick fix, no single lever to pull to close achievement gaps in America. Trendy pop psychology ideas often fail to grapple with the bigger problems keeping achievement gaps wide open.
“People are desperately searching for an easy, quick, apparently effective answer for how we can transform the lives of people who are under distress,” Brent Roberts, a personality psychologist who edited the new Psychological Science paper, says. “And what’s more frustrating than anything else is that another feature of human nature is that we get fooled by overemphasizing the quick and easy answers to the more complex ones.”
The marshmallow test, explained
How often as a child were you told to sit still and wait? As a kid, being told to sit quietly while your parent is off talking to an adult, or told to turn off the TV for just a few seconds, or to hold off on eating those cupcakes before the guests arrive are some of the hardest challenges in a young life. A huge part of growing up is learning how to delay gratification, to sit patiently in the hope that our reward will be worth it.
Plotting the how, when, and why children develop this essential skill was the original goal of the famous “marshmallow test” study. Pioneered by psychologist Walter Mischel at Stanford in the 1970s, the marshmallow test presented a lab-controlled version of what parents tell young kids to do every day: sit and wait.
In the test, a marshmallow (or some other desirable treat) was placed in front of a child, and the child was told they could get a second treat if they just resisted temptation for 15 minutes. If they succumbed to the devilish pull of sugar, they only got the one.
The test was a tool to chart the development of a young mind and to see how kids use their cognitive tools to conquer a tough willpower challenge.
Mischel learned that the subjects who performed the best often used creative strategies to avoid temptation (like imagining the marshmallow isn’t there). Follow-up work showed that kids could learn to wait longer for their treat. And further research revealed that circumstances matter: If a kid is led to mistrust the experimenter, they’ll grab the treat earlier.
But that work isn’t what rocketed the “marshmallow test” to become one of the most famous psychological tests of all time. It was the follow-up work, in the late ’80s and early ’90s, that found a stunning correlation: The longer kids were able to hold off on eating a marshmallow, the more likely they were to have higher SAT scores and fewer behavioral problems, the researchers said. The results were taken to mean that if only we could teach kids to be more patient, to have greater self-control, perhaps they’d achieve these benefits as well.
But the studies from the ’90s were small, and the subjects were the kids of educated, wealthy parents.
In fairness to Mischel and his colleagues, their findings, as written in 1990, were not so sweeping. In the study linking delay of gratification to SAT scores, the researchers acknowledged the possibility that with a bigger sample size, the magnitude of their correlation could decrease. They also mentioned that the stability of the home environment may play a more important role than their test was designed to reveal. And the study wasn’t an experiment, so the results didn’t necessarily mean that teaching kids to delay their gratification would cause these benefits later on.
“The findings of that study were never intended to be prescriptions for an application,” Yuichi Shoda, a co-author on the 1990 paper linking delay of gratification to SAT scores, says in an email. “Our paper does not mention anything about interventions or policies.” And they readily admit that the delay task is the result of a whole host of factors in a child’s life. “‘Controlling out’ those variables, which contribute to the diagnostic value of the delay measure, would be expected to reduce their correlations,” Mischel, who says he welcomes the new paper, writes. In an interview with PBS in 2015, he said “the idea that your child is doomed if she chooses not to wait for her marshmallows is really a serious misinterpretation.”
Yet their findings have been interpreted to be a prescription by school districts and policy wonks. “If you’re a policy maker and you are not talking about core psychological traits like delayed gratification skills, then you’re just dancing around with proxy issues,” the New York Times’s David Brooks wrote in 2006. It’s not hard to find studies on interventions to increase delaying gratification in schools or examples of schools adopting these lessons into their curricula. Sesame Street’s Cookie Monster has even been used to teach the lesson.
How the new study changes the story
Over the years, the marshmallow test papers have received a lot of criticism. The biggest one is that delay of gratification might be primarily a middle- and upper-class value. Does it make sense for a child growing up in poverty to delay their gratification when they’re so used to instability in their lives? Also, there’s the case that some kids are just less interested in candy and treats than others.
It’s been nearly 30 years since the show-stopping marshmallow test papers came out. And what’s astounding is that it’s only now that researchers have bothered to replicate the long-term findings in a new data set. That’s more of an indictment of the incentives and practices of psychological science — namely, favoring flashy new findings over replicating old work — than of flaws in the original work. (Though, be assured, psychology is in the midst of a reform movement.)
Tyler Watts, the NYU psychology professor who is the lead author on the new replication paper, got lucky. He and his colleagues found that in the 1990s, a large NIH study gave a version of the test to nearly 1,000 children at age 4, and the study collected a host of data on the subjects’ behavior and intelligence through their teenage years. But no one had used this data to try to replicate the earlier marshmallow studies.
The new paper isn’t an exact replication of the original. The marshmallow test in the NIH data was capped at seven minutes, whereas the original study had kids wait for a max of 15. Nevertheless, it should test the same underlying concept.
And there are some other key differences. The original studies in the 1960s and ’70s recruited subjects from Stanford’s on-campus nursery school, and many of the kids were children of Stanford students or professors. That’s not exactly a representative bunch. The new study included 10 times as many subjects as the old papers and focused on children whose mothers did not attend college.
Here’s what they found, and the nuance is important.
While success on the marshmallow test at age 4 did predict achievement at age 15, the size of the correlation was half that of the original paper. And the correlation almost vanished when Watts and his colleagues controlled for factors like family background and intelligence.
That means “if you have two kids who have the same background environment, they get the same kind of parenting, they are the same ethnicity, same gender, they have a similar home environment, they have similar early cognitive ability,” Watts says. “Then if one of them is able to delay gratification, and the other one isn’t, does that matter? Our study says, ‘Eh, probably not.’”
In other words: Delay of gratification is not a unique lever to pull to positively influence other aspects of a person’s life. It’s a consequence of bigger-picture, harder-to-change components of a person, like their intelligence and the environment they live in.
The results imply that if you can teach a kid to delay gratification, it won’t necessarily lead to benefits later on. Their background characteristics have already put them on that path.
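To see how a correlation can nearly vanish once a shared background factor is held constant, here is a minimal sketch using the standard first-order partial correlation formula. The three input correlations are hypothetical numbers chosen for illustration; they are not taken from the Watts paper, which used regression models with many controls rather than this single-variable formula.

```python
from math import sqrt

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after holding a third variable z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical inputs (assumptions for illustration, not the study's estimates):
r_delay_achievement = 0.28       # raw link between waiting time and later achievement
r_delay_background = 0.50        # link between waiting time and family background
r_achievement_background = 0.50  # link between later achievement and family background

adjusted = partial_correlation(
    r_delay_achievement, r_delay_background, r_achievement_background
)
print(f"raw correlation:      {r_delay_achievement:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

With these made-up inputs, a raw correlation of .28 drops to about .04 once background is accounted for, because most of the raw link was riding on the factor both measures share.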
What’s more, the study found no correlation — even without controls — between delaying gratification and behavioral outcomes later in life. “In that sense, that’s the one piece of the paper that’s really a failure to replicate,” Watts says.
The paper also found something its authors still can’t make sense of. Most of the predictive power of the marshmallow test can be accounted for by kids just making it 20 seconds before they decide to eat the treat. “So being able to wait for two minutes, five minutes, or seven minutes, the max, it didn’t really have any additional benefits over being able to wait for 20 seconds.”
That makes it hard to imagine the kids are engaging in some sort of complex cognitive trick to stay patient, and that the test is revealing something deep and lasting about their potential in life. And perhaps it’s an indication that the marshmallow experiment is not a great test of delay of gratification or some other underlying measure of self-control.
Their study doesn’t completely reverse the finding of the original marshmallow paper. But it reduces the findings to a point where it’s right to wonder if they have any practical meaning.
It’s also worth mentioning that research on self-control as a whole is going through a reevaluation. Namely, the idea that people have self-control because they’re good at willpower (i.e., effortful restraint) is looking more and more like a myth. People who say they are good at self-control are often people who live in environments with fewer temptations. Similarly, the idea that willpower is finite, known in the academic literature as ego depletion, has also failed in more rigorous recent testing. Overall, we know less about the benefits of restraint and delaying gratification than the academic literature has let on.
Educational interventions often fail
Education research often calls traits like delaying gratification “noncognitive” factors. These are personal traits not related to intelligence that many researchers believe can be molded to enhance outcomes. The marshmallow test is the foundational study in this work. And today, you can see its influence in ideas like growth mindset and grit, which are also popular psychology ideas that have influenced school curricula (namely in the guise of “character education” programs).
Growth mindset is the idea that if students believe their intelligence is malleable, they’ll be more likely to achieve greater success for themselves. A lot of research and money has gone into teaching this mindset to kids, in the hope that it can be an intervention to decrease achievement gaps in America.
The state of the evidence on this idea is frustrating. Recently, a huge meta-analysis on 365,915 subjects revealed a tiny positive correlation between growth mindset and educational achievement (in science speak, the correlation was .10, with 0 meaning no correlation and 1 meaning a perfect correlation).
“That’s inconsequentially small,” Roberts says. Interventions to increase mindset were also shown to work, but limply. The average effect size (meaning the average difference between the experimental and control groups) was just .08 standard deviations. That’s barely a nudge. It’s hard to know if the time and money that go into growth mindset interventions are worth it.
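For a rough sense of how small these numbers are, the sketch below converts the two figures reported above (a correlation of .10 and an effect size of .08 standard deviations) into more intuitive quantities. It assumes both groups follow normal curves with equal spread, which is a textbook simplification and not something the meta-analysis itself states; the function names are made up for this illustration.

```python
from statistics import NormalDist

def variance_explained(r: float) -> float:
    """Share of the variation in one measure accounted for by the other (r squared)."""
    return r ** 2

def probability_of_superiority(d: float) -> float:
    """Chance that a random person from the intervention group outscores a
    random person from the control group, assuming equal-variance normal curves."""
    return NormalDist().cdf(d / 2 ** 0.5)

def overlap(d: float) -> float:
    """Fraction of area shared by two equal-variance normal curves whose
    means sit d standard deviations apart."""
    return 2 * NormalDist().cdf(-abs(d) / 2)

r = 0.10  # correlation reported in the article
d = 0.08  # effect size reported in the article, in standard deviations

print(f"variance explained:         {variance_explained(r):.1%}")
print(f"probability of superiority: {probability_of_superiority(d):.1%}")
print(f"overlap of the two groups:  {overlap(d):.1%}")
```

Under those assumptions, a correlation of .10 accounts for about 1 percent of the variation in achievement. An effect of .08 standard deviations means a randomly chosen student who got the intervention would outscore a randomly chosen control student about 52 percent of the time, barely better than a coin flip, and the two groups' score distributions would overlap by roughly 97 percent.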
There’s less comprehensive data on grit, an idea popularized by University of Pennsylvania psychologist Angela Duckworth. Grit, a measure of perseverance (which critics charge is very similar to the established personality trait of conscientiousness), is correlated with some measures of achievement. But the long-term work on whether grit can be taught, and whether teaching it can lead to academic improvements, is still lacking.
Also consider that these studies take place over a short period of time. Researchers find that interventions to increase school performance — even intensive ones like early preschool programs — often show a strong fadeout: that initially, interventions show strong results, but then over the course of a few years, the effects disappear. “Most interventions targeting children’s cognitive, social or emotional development fail to follow their subjects beyond the end of their programs,” a 2018 literature review finds. “When they do, complete fadeout is common.”
What’s more important: teaching patience or reducing income inequality?
It’s not that these noncognitive factors are unimportant. No one doubts delaying gratification is an important life skill, and one that squirmy kids need to master. And it’s obviously nice if kids believe in the possibility of their own growth.
What the latest marshmallow test paper shows is that home life and intelligence are very important for determining both delaying gratification and later achievement. These are factors that are constantly influencing a child.
Their influence may be growing in an increasingly unequal society. As income inequality has increased in America, so have achievement gaps. Today, the largest achievement gaps in education are not between white Americans and minorities, but between the rich and poor. Research from Stanford economist Sean Reardon finds that the school achievement gap between the richest and poorest Americans is twice the size of the achievement gap between black and white Americans and has been growing for decades.
Reducing poverty could go a long way to improving the educational attainment and well-being of kids. “It’s very hard to find psychological effects that are not explained by the socioeconomic status of families,” says Pamela Davis-Kean, a developmental psychologist at the University of Michigan. Nothing changes a kid’s environment like money.
Money buys good food, quiet neighborhoods, safe homes, less stressed and healthier parents, books, and time to spend with children. Teaching kids how to delay gratification or have patience “may not be the primary thing that’s going to change their situation,” Davis-Kean says.
Economic security possibly can. Greg Duncan, a UC Irvine economist and co-author of the new marshmallow paper, has been thinking for decades about which educational interventions actually work. And, he says, “I’m not exactly sure I’m further along than I was 30 years ago.”
So he’s trying to find out what happens when a kid’s home environment is dramatically altered. Duncan is currently running an experiment asking whether giving a mother $333 a month for the first 40 months of her baby’s life aids the child’s cognitive development. If successful, the study could clarify the power reducing poverty has on educational attainment.
Reducing income inequality is a more daunting task than teaching kids patience. Increasing IQ is a more daunting task than teaching kids patience (though, helpfully, the research finds each year of schooling a person receives leads to a small boost in IQ). But if a simple, widely effective intervention for educational attainment exists, social scientists have yet to find it.
Even interventions to boost kids’ understanding of academic skills like math often yield lackluster findings. In other work, Watts and Duncan have found that mathematics ability in preschool strongly predicts math ability at age 15. From that work, you’d think that by boosting math ability in preschool, you’d put kids on a surer course. Yet programs aimed at increasing math ability in preschool don’t work as powerfully as the correlation studies imply they should, and they show a strong fadeout effect.
Watts says his new marshmallow test study doesn’t mean it’s impossible to design preschool interventions that have long-lasting effects. Or that “delay of gratification can’t or couldn’t be a piece of that,” he says.
But if the recent history of social science has taught us anything, it’s that experiments that find quick, easy, and optimistic findings about improving people’s lives tend to fail under scrutiny. Harder work remains. Studies that find exciting correlations need to be followed up with long-term experimental research. This research is expensive and hard to conduct. But without rigorous studies, we’re going to remain prone to research hype.
“Our ability to test some of the things that we think are really fundamental has never been greater,” Watts says. “We have a unique opportunity now to go back to some of the findings we take for granted and test them. That doesn’t mean we need to go out to disprove everything.”
But it does mean we may get closer to the truth.
By Brian Resnick