Why no one really knows how bad Facebook’s vaccine misinformation problem is

Is Facebook “killing people” by enabling the spread of Covid-19 misinformation, as President Joe Biden said a few weeks ago? Or is the social media company efficiently purging Covid-19 misinformation from its platform and showing millions of people information about where to get vaccinated, as the company argued a day later in its response to the president?

Biden partially walked back his comments, but the reality is we simply don’t know the true size or effect of Covid-19 misinformation on Facebook and Facebook-owned Instagram. That’s in large part because Facebook isn’t giving researchers enough of the real-time data they need to figure out exactly how much Covid-19 misinformation is on the platform, who’s seeing it, and how it’s impacting their willingness to get vaccinated. Researchers say they need this kind of data to understand the scope of the misinformation problem, which misleading messages are resonating with people, and how public health officials can counter them.

“Right now, we’re guessing [on] a lot of stuff,” said Katherine Ognyanova, an associate professor of communications at Rutgers University who participates in the Covid States project, a research group that surveys people about their social media use and Covid-19 behaviors. “We can ask people questions. But Facebook truly has the data about what people have seen and how their attention is being devoted on the platform.”

Over a dozen independent researchers who regularly study Facebook, including six who are specifically researching the spread of information about Covid-19, told Recode that the company makes it difficult for people studying the platform to access vital information, including how many times people viewed Covid-related articles, what health misinformation Facebook takes down, and what’s being shared on private pages and groups.

Facebook does have some programs, like the Social Science One data-sharing initiative, to give researchers more detailed information than is publicly available. But some say that the process for receiving that data takes too long to keep up with the ever-changing Covid-19 situation. This has led researchers to adopt workarounds such as logging posts manually, running opt-in user studies, or designing independent surveys, and Facebook has sometimes disputed the results these workarounds produce.

Researchers aren’t just clamoring for more information about Facebook, either. YouTube, Twitter, and other social media networks also have troves of data about Covid-19 misinformation that could help researchers. But because Facebook is the largest social media platform for sharing news — one where many posts are private — the company is central to the debate about transparency in Big Tech and the societal impacts of its products.

Facebook VP of global affairs Nick Clegg said that the company is “already committed to providing unprecedented data sets to independent researchers” and that “everyone always wants more, and we will always seek to do more,” when asked about the issue of researcher data access at a recent event hosted by the nonprofit Freedom House.

Meanwhile, several academics Recode spoke with say that a lack of access to Facebook data is limiting their ability to understand how many people are seeing Covid-19 misinformation that could be causing vaccine hesitancy in the US. It’s an increasingly urgent issue as the delta variant of the virus spreads across the country, infecting tens of thousands of new people daily. Only about half the population is fully vaccinated, and an estimated 20 percent of Americans remain unwilling to get the shot.

Researcher access to data on how misinformation spreads on social media is “profoundly important” to overcoming vaccine hesitancy in the US, according to Surgeon General Vivek Murthy, whose office recently put out a report calling misinformation a threat to public health.

“The data gap means we are flying blind. We don’t know the extent of the problem. We don’t know what’s working to solve the problem. We don’t know who’s most impacted by the problem,” Murthy told Recode.

More accurate research data is “absolutely essential for us to be able to take targeted effective action to address misinformation,” he added. “The fact that you don’t have it is hampering us at a time when misinformation is actively harming people’s health.”

Facebook’s contentious relationship with researchers recently attracted headlines, after the company cut off access to the accounts of a group of outside researchers at NYU’s Ad Observatory, which was monitoring political ads on the platform. Facebook said it revoked the group’s access because of privacy concerns, but the Ad Observatory argued that the study’s participants were all opt-in volunteers, who willingly shared information about what ads they were seeing on Facebook for research purposes. The group’s leader said Facebook is “silencing” research that “calls attention to problems” with how the company handles political ads. The Ad Observatory was also helping with some Covid-19 misinformation research.

There are, however, legitimate privacy reasons for Facebook to be hesitant about giving researchers carte blanche to study user data. Since the Cambridge Analytica scandal, which became public in 2018 and revealed that a psychology researcher had exploited the private information of up to 87 million Facebook users for political purposes, Facebook has been more guarded about how it shares information with academics. But researchers say there are still ways for Facebook to share anonymized data, such as a list of the most viewed articles in real time or aggregated information about which Covid-19 topics are popular with certain demographics of people.

“It’s defensible on the part of Facebook that they want to protect the data of an everyday person,” Rachel Moran, a researcher studying Covid-19 misinformation on social media at the University of Washington’s Information School, told Recode. “But in trying to understand actually how much misinformation is on Facebook, and how it’s being interacted with on a daily basis, we need to know more.”

While preserving user privacy is a laudable goal, the concern among the academic community is that Facebook is effectively using this rationale as a shield against critics who want more open access to the platform. And now more than ever, this access could be critical in helping researchers and public health experts understand what kinds of false narratives about Covid-19 are affecting vulnerable communities and how to allocate resources to help them.

How researchers are getting around the data gap

Facebook offers a few tools to people studying the platform, like the real-time analytics platform Crowdtangle and regular survey results about Facebook users’ Covid-19 symptoms and attitudes about Covid-19, including vaccines. The company also supplies a special data set to the Social Science One consortium of academics.

But these resources, while helpful, aren’t enough to keep up with the constantly evolving barrage of Covid-19 misinformation or to truly understand how it affects people’s behavior, according to several leading social media researchers.

So academics have devised their own manual methods to gather data, including independent surveys and opt-in user experiments.

“We often try and take an embedded approach where we’re like, ‘Okay, so if I was an average Facebook user, how would I encounter this information?’” said Moran. “I have a poor research assistant who literally is charged with manually capturing each story, each video that comes up, because there’s no way of accessing that information otherwise.”

Moran and her staff can spend “hours and hours” poring over Instagram stories of popular misinformation influencers, where users are slipping in bogus claims about Covid-19. While useful in understanding the tactics that influencers use to deceive their audiences, that kind of time-consuming research is ultimately just a small snapshot of the larger Facebook ecosystem.

To get a grasp on what Covid-19 misinformation may be going viral, many researchers use Crowdtangle as a starting point. This Facebook-owned tool lets researchers look up how many times a specific URL has been shared or reacted to on Facebook. Crowdtangle does not give researchers certain key metrics, though, like how many people view a post or what’s circulating on people’s private Facebook profiles as opposed to public pages. These details can be more important than how many people share or react to a post.

Facebook itself acknowledges the limitations of Crowdtangle data but still declines to share more accurate data about what the most popular content is on its platform. It would be “extremely easy,” for example, for Facebook to release an up-to-date list of the most viewed websites that people link to on its platform, without raising any concerns over user privacy, according to David Rothschild, an economist at Microsoft Research. But Facebook has historically refused to release even high-level, aggregate data like this.

“It’s baffling,” Rothschild said. “Just baffling.”

Without more data access from Facebook about what people are seeing and what’s being taken down, researchers say they’re trying to crack open a black box. Making matters more difficult, Facebook and other social media companies are constantly changing their features and tweaking their algorithms, which can render researchers’ homegrown methods for studying the social network useless.

“Just when you think that you have a set of tools and scripts and codes coming from these platforms, they make some changes and you have to start over,” said Rutgers’s Ognyanova. “So that’s kind of the plight of social media researchers.”

Facebook’s history of criticizing outside research

David Lazer co-leads the Covid States Project, one of the top research groups trying to understand, in part, why so many Americans don’t want to get vaccinated. The well-respected team’s survey findings are regularly used by politicians, health experts, and other researchers to better inform public policy.

The Covid States Project put out a report in late July showing that Facebook news consumers were less likely to get vaccinated than Fox News viewers. Facebook promptly attacked the study’s methodology. A company spokesperson told Gizmodo that the results were “sensationalized” and “overstated,” in part because they relied on self-reported survey data over a short time window. Instead, Facebook argued, researchers should have used better data, such as measures of people’s actual reliance on the social network for news rather than self-reported survey answers. That is data only Facebook can access.

Lazer says he could have asked Facebook to collaborate on designing an experiment that would yield better data about how people used the platform, but that would take time. Last year, Lazer was one of several academics selected to work with Facebook on a separate, ongoing elections-related research project, for which he’s receiving special access to user behavior data. But that model wouldn’t work for the Covid States Project, since his team needed real-time data to study quickly shifting messaging on Covid-19 vaccines.

“[Facebook] is saying: ‘You can’t answer this question unless you have data like that. Oh, and by the way, we have a monopoly on data like that,’” said Lazer. “That’s a problem.”

The back-and-forth represents a longstanding issue between Facebook and outside researchers who study social media. For years, researchers have requested more detailed information about how people use the site, including links they’ve clicked on and emotion-based reactions to posts. They want this data so they can better understand how content in people’s Facebook and Instagram feeds informs their opinions. More granular data could help them answer, for example, whether people who view one piece of misinformation are more likely to click on another, or whether a certain demographic is more susceptible to sharing Covid-19 hoaxes than others.

“Facebook can say, ‘Oh, you saw this story? Oh, you lingered on it,’” Lazer suggested. “So Facebook has the dream machine for understanding human behavior.”

Facebook has also disputed the findings of an influential report cited by Biden and Sen. Amy Klobuchar (D-MN) that claimed only 12 users (a so-called “Disinformation Dozen”) were responsible for 65 percent of vaccine misinformation on Facebook and Twitter. Facebook told Recode that the report left out key facts about how the company had disabled many popular accounts responsible for spreading misinformation. But rather than critiquing outside studies, Facebook should be opening its books to researchers about how it prioritizes content people see in their News Feed, says Imran Ahmed, the CEO of the Center for Countering Digital Hate, which authored the report.

“It is extraordinary that companies whose core defense is that they need to provide open spaces are actually some of the most controlling and opaque organizations on the planet,” Ahmed told Recode. “They control the communications and knowledge architecture of the world and will not provide insight into their algorithms and what they want to amplify.”

Facebook even questioned the credibility of data coming from its own tool, Crowdtangle, after New York Times journalist Kevin Roose used the analytics platform to compile daily lists of the 10 most shared Facebook links, which were often dominated by right-wing pages. Facebook disputed these findings, arguing that the Crowdtangle data shows a distorted view of what’s really popular on Facebook. Last month, Roose reported that some executives within the company were considering cutting off journalists’ access to Crowdtangle data altogether because of the negative PR repercussions, although Facebook has said it has no plans to shut down Crowdtangle.

Nevertheless, the incident has left some researchers worried that Facebook may be limiting one of the few direct data sources they have to work with. And it’s problematic that one of the most useful tools that journalists and researchers currently have to understand misinformation on the platform can be disabled whenever Facebook wants.

When Facebook effectively shut down the NYU Ad Observatory in early August, similar concerns spread not only in the academic community but also with lawmakers and the Federal Trade Commission. To critics, Facebook’s handling of the Ad Observatory incident was just another example of the company trying to silence those attempting to hold it accountable.

“For several years now, I have called on social media platforms like Facebook to work with, and better empower, independent researchers, whose efforts consistently improve the integrity and safety of social media platforms by exposing harmful and exploitative activity,” Sen. Mark Warner (D-VA) said in a statement the day after Facebook took action against the Ad Observatory. “Instead, Facebook has seemingly done the opposite.”

The limitations of Facebook’s outside research partnerships

To its credit, Facebook grants some researchers permission to access more detailed data sets about user behavior through the Facebook Open Research and Transparency (FORT) program. The problem is, researchers say, those data sets largely haven’t been useful so far in studying posts about Covid-19.

Social Science One is one of the most ambitious academic partnership projects Facebook has participated in through FORT to date. Started by Stanford law professor Nate Persily and Harvard political science professor Gary King in 2018, the group intended to set up a system for outside academics to study internal data generated by Facebook’s 2.2 billion users, like how many times a URL has been viewed across the platform and which demographics viewed it. Establishing such a workflow was initially expected to take two months but ended up taking two years, after Facebook raised legal concerns over sharing too much user data and potentially violating people’s privacy. (Facebook ultimately applied a “differential privacy” technique to anonymize the data, which some researchers say makes it less accurate and more difficult to parse.)
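The accuracy trade-off researchers describe can be illustrated with a minimal sketch of one textbook differential privacy method, the Laplace mechanism. This is not a description of Facebook’s actual procedure, which the company has not detailed here. The function names, the URL labels, the view counts, and the privacy setting `epsilon` are all hypothetical, and the sketch assumes a single user can change any view count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a noisy version of a count (textbook Laplace mechanism).

    Assumes sensitivity 1: one user changes the count by at most 1.
    Smaller epsilon means stronger privacy and more noise.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical view counts for two made-up URLs.
true_views = {"url_a": 12, "url_b": 48_000}

# Hypothetical privacy setting; noise has a typical size of about 1/epsilon = 10.
noisy_views = {
    url: privatize_count(count, epsilon=0.1, rng=rng)
    for url, count in true_views.items()
}

for url, count in true_views.items():
    print(f"{url}: true={count}, released={noisy_views[url]:.1f}")
```

Under these assumptions, noise of roughly the same size is added to every count, so a link viewed tens of thousands of times is barely distorted, while a link viewed a dozen times can be off by a large fraction of its true value. That is one reason privacy-protected data can be harder to parse for rarely seen content.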

Since the original data set was released in February 2020, researchers have published eight academic papers using Social Science One data, according to Facebook. Their topics range from the influence of political campaigns on Facebook in Chile to the prevalence of fake news on the platform. There are currently 22 draft academic papers using Social Science One data. Only one involves research about Covid-19 misinformation.

Although the mission of Social Science One is laudable, several researchers say it offers only a static snapshot of Facebook’s data universe, one that isn’t particularly useful for understanding the constantly evolving world of Covid-19 misinformation. And until earlier this summer, the data set only included data until July 2019, though it has since been updated to include data up to March 2021. Something as simple as “speeding up” the process by which researchers apply for and get access to updated data via Social Science One, Lazer says, would be a big improvement.

Despite Facebook’s massive computing power, preparing data sets like the ones used in Social Science One can take significant time: up to a month and a half of work for data covering a three-month time period, the company said. According to researchers, that lag can render Covid-19 information outdated, so Facebook needs to find a way to get this information to them more quickly.

Data transparency through regulation

Some academics believe that government intervention is the only way to get Facebook and other social media companies to share more data with researchers.

Persily, the Stanford law professor who co-founded Social Science One, resigned from the organization ahead of the 2020 elections and is now advocating for new laws to address issues between social media companies and researchers. Such legislation would force companies like Facebook to share more data with researchers and loosen the privacy laws that govern such sharing. This could resolve the longstanding debate between researchers and social media companies about whether companies can legally share user data without violating privacy laws.

“Unless you create some kind of legal immunity for companies sharing data, and a legal compulsion for them to share that data, you can’t win the argument, because all it looks like is risk,” Persily said. “I think that sharing data is legal, but I’m not the one paying $5 billion if I’m fined.”

Persily added that Social Science One was a substantial step forward in getting Facebook to give researchers more freedom to study its platform. He commended Facebook for taking part in it.

But ultimately, Persily said, companies like Facebook need more incentive to participate in such projects without fear of getting in trouble with regulators, who also don’t want to see Facebook repeat the Cambridge Analytica scandal. Some lawmakers, like Klobuchar and Warner, have criticized Facebook for not sharing enough data with researchers. At the same time, they have also called for these companies to do a better job protecting user privacy.

“The spread of misinformation about the coronavirus vaccine has had dire consequences,” Klobuchar said in a statement to Recode. “These are some of the biggest, richest companies in the world and it is vital that they are transparent about the misinformation on their platforms so researchers and policymakers can better assess and tackle this problem.”

For Persily and many others in the academic community, getting researchers access to better data is a key step before regulators can solve other questions.

“Whether we can answer the question about whether Facebook is killing people with Covid misinformation depends on if outsiders are able to assess how much misinformation actually exists on Facebook,” said Persily. “Data access is the linchpin for all other social media issues.”