Longtime Wikipedians joke that Wikipedia can exist only in practice, not in theory.
In theory, it’s unclear why many thousands of volunteers would devote their unpaid time to building an encyclopedia. In practice, however, Wikipedia is one of the most widely read web resources in the world and the largest encyclopedia in human history, with almost 60 million articles in 300+ languages.
Given Wikipedia’s reach and impact on the one hand, and its openness to new contributors on the other – anyone can modify most articles, and new contributors aren’t even required to disclose their real names – Wikipedia looks, hypothetically, quite vulnerable to manipulation.
At the same time, Wikipedia has largely been able to fend off large-scale manipulation, thanks to its army of volunteer editors and administrators, as well as machine learning tools that help spot malicious changes. Reliance on crowdsourcing to ensure the quality of content seems to work; Wikipedia – or at least its English-language edition – has been getting good press for its high-quality coverage of the COVID-19 pandemic and other topics.
Smaller language editions are sometimes more problematic, perhaps most notably in the case of Croatian Wikipedia, which for many years was controlled by a group of volunteers promoting far-right views and failing to adhere to Wikipedia’s neutrality goals. Still, such failures have been the exception rather than the rule.
However, with new world challenges like the rise of Chinese authoritarianism and Russia’s all-out invasion of Ukraine, Wikipedia might become an increasingly coveted target of information warfare by malicious actors. A recent research report by the UK-based Institute for Strategic Dialogue (ISD) and the Centre for the Analysis of Social Media (CASM) looks at information warfare on Wikipedia, including a case study of the English Wikipedia entry on the Russo-Ukrainian war.
The research hasn’t uncovered proof of a coordinated disinformation campaign on Wikipedia, but it provides a useful framework for thinking about protecting one of the most valuable treasures of the online age. The Fix spoke with CASM’s research director Carl Miller, the report’s lead author, about their key practical findings and about broader theoretical risks Wikipedia might face from autocratic regimes. (The interview has been edited for length and clarity).
Editor’s note: Anton Protsiuk is a long-time Wikipedia volunteer editor. He was involved in preparing the report as one of the authors’ interviewees.
Topics:
why Wikipedia is such an important target of information warfare | background to the report | the report’s key findings | what protection mechanisms Wikipedia needs | theoretical risks for manipulating Wikipedia coverage | delicate manipulation vs. brute-force repression | the risks to non-English Wikipedia editions | how journalists should think about Wikipedia and information warfare
The Fix: In the report you assert that Wikipedia is “arguably the most epistemically consequential of all the possible targets of information warfare.” Why is Wikipedia such an important battleground?
Carl Miller: Wikipedia fits into the information ecosystem in a completely different way to any of the social media platforms. It’s relied on by the other social media platforms for their own fact checking and their own way of navigating around falsehoods.
Its visibility is massive. It’s deliberately up-ranked by Google and many of the other search engines. We all know the enormous, astonishing viewership that Wikipedia gets. [Editor’s note – according to the Wikistats tool, Wikipedia and its sister projects collectively received over 220 billion pageviews in 2021; out of that, the English-language edition of Wikipedia accounts for 95 billion views].
It has become increasingly an important way for people to learn about and keep track of live events as they develop, which I think is an evolving role for Wikipedia, but a very important one.
But most importantly, it’s trusted as an impartial source of truth. And that is completely different from the kind of information that anyone would encounter on Twitter or Facebook or any of the other social media. People often turn to Wikipedia to resolve disputes and arguments they’re having on the other social media platforms.
And of course, it’s also the greatest achievement of the digital age so far, the largest collection of human knowledge ever assembled, all in one place, available to everyone for free. It is probably the greatest thing that the internet has ever created. So it’s worth protecting.
The Fix: In your 2018 article for The New Statesman, you laid out a brief outline of how Russia, or another state for that matter, could manipulate Wikipedia. In the four years since, do you think the Russian government has taken steps in this direction? How so? Or why not?
Carl Miller: [Let’s talk through the timeline].
In the 2018 article, it was purely me speculating at that point. It basically said, if I were an autocratic state, my line of attack would be entryism, not vandalism. I was worried… that [potential covert state interference, as opposed to casual vandalism] seemed to be a bit of a blind spot for the Wikipedia community.
You start thinking about an information operation against Wikipedia that would operate at the scale that states are capable of. The idea that you could just simply pay Wikipedia editors to make completely legitimate edits on completely different topics… Over time, they would gain community respect, recognition, run for office, assume office. And that would give the state the strategic capability to change not only the actual pages, but also the policies and processes and community composition…
That seemed to me to be really dangerous. And I couldn’t see back in 2018 a real way that Wikipedia would have of knowing that was happening or a way of stopping it if it did.
The next step actually wasn’t Russia, it was China. In 2019, I did an investigation for BBC Click where we looked at systematic edits on various kinds of politically sensitive pages related to China, some to do with the Hong Kong protests, some to do with Taiwan, and others.
Then, we also basically made contact with a series of sources who passed us documents, one written by a Chinese party official and another written by a Beijing-linked academic, that laid out literally a strategy for systematically manipulating Wikipedia to make it more favourable to China. It was a strategy which was absolutely the one that I was warning about in 2018, which was that “we should be funding an active editor base that would then have enough recognition to go out and right the wrongs, all the lies that were on Wikipedia about the PRC”.
That brings us to now, so we can go through the findings of the [2022] paper. I wouldn’t say that we found – or would have been able to find – a really direct connection between the Russian state and systematic editing on the pages that we looked at.
But I think the early steps are that we obviously have seen quite a high amount of activity coming from editors who were subsequently removed by Wikipedians as a result of policies which imply that they were engaged in ideologically driven or commercially driven editing… So we can’t link it to the state. But to be honest, we almost can never link [online activity to the state officially].
That Chinese example is literally probably the only real investigation I’ve done in my whole career where I was able to really quite directly link that back to the state. Normally all you can say is “pro-Kremlin”. It seems systematic and it’s occurring on topics which the Russian government cares about. And that’s probably about as good as we can get to.
The Fix: Could you take us through the report’s approach and key findings?
Carl Miller: We begin with a single page, the English-language page for the Russo-Ukrainian War. Our research looks at it from its beginning up to March 1, 2022.
On that page since 2014, when it was first [created], there have been just under 8,000 edits by over 1,700 users. 86 of them – so only a very small number – have subsequently been banned by the Wikipedia community for a rule violation of the kind that an ideologically motivated editor would commit.
The case study focuses on those and basically asks – can we use some data science techniques to answer useful questions around what these editors have been up to and how they might link with everyone else?
Point number one – these editors have made quite a lot of edits [across all Wikipedia articles]. In total, if you look at all the edits that these 86 blocked editors have done, it’s 794,000 revisions across over 300,000 pages, so very heavy editing behaviour. And we then drew a map of the pages that these banned editors have edited in common with each other.
As a result, we identified the pages that have been most persistently targeted, either by one editor many times or by a number of editors acting collectively… We found three clusters of pages edited by a number of these banned editors – around airports and commercial aviation; around Judaism and Poland; and around Iraq, Libya and the so-called Islamic State. So, our finding number one is that there are tons of edits coming from these 86 editors, and there’s a substantial degree of overlap in the pages they’re targeting, although plenty of them are just targeting individual pages as well.
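[Editor’s note – for readers curious how such a co-editing map can be built, below is a minimal sketch in Python. It assumes the relevant edit histories have already been collected (for example via the public MediaWiki API) into simple (editor, page) pairs; the usernames, sample data and library choice are illustrative assumptions, not the report’s actual dataset or method.]

```python
# Minimal sketch: project banned editors' edit histories onto a page co-editing graph.
# Assumes `edits` holds (editor, page) pairs already collected, e.g. via the public
# MediaWiki API; the usernames and sample records below are purely illustrative.
from collections import defaultdict
from itertools import combinations

import networkx as nx  # third-party graph library

banned_editors = {"EditorA", "EditorB", "EditorC"}       # hypothetical usernames
edits = [("EditorA", "Page_X"), ("EditorB", "Page_X"),
         ("EditorB", "Page_Y"), ("EditorC", "Page_Y")]   # placeholder edit records

# Which banned editors have touched which page
editors_per_page = defaultdict(set)
for editor, page in edits:
    if editor in banned_editors:
        editors_per_page[page].add(editor)

# Connect pages that share at least one banned editor; edge weight = size of the overlap
graph = nx.Graph()
for (page_a, eds_a), (page_b, eds_b) in combinations(editors_per_page.items(), 2):
    shared = eds_a & eds_b
    if shared:
        graph.add_edge(page_a, page_b, weight=len(shared))

# Clusters of commonly co-edited pages, analogous to the aviation, Judaism-and-Poland
# and Iraq/Libya/Islamic State groupings described above
clusters = [component for component in nx.connected_components(graph)]
print(clusters)
```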
Technique number two is state-affiliated sourcing. We count 681 edits on the Russo-Ukrainian War page – 8% of the total – made by the 86 blocked editors. Across these edits they added almost 350 links.
We filtered all of those links for ones which are linked to autocratic state-sponsored or affiliated news or information channels of various kinds. Overall, 22 edits added such sources.
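[Editor’s note – as a rough illustration of that filtering step, a sketch like the following could flag edits that introduce links to watched outlets. The domain watchlist and the `added_links` records are assumptions for the example, not the report’s actual list or data.]

```python
# Sketch: flag edits whose added external links point at state-affiliated outlets.
# The watchlist and the `added_links` records are illustrative assumptions only.
from urllib.parse import urlparse

STATE_AFFILIATED_DOMAINS = {"rt.com", "sputniknews.com", "tass.com"}  # example watchlist


def is_state_affiliated(url: str) -> bool:
    """True if the URL's host is a watched domain or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in STATE_AFFILIATED_DOMAINS)


# (edit_id, url) pairs extracted from revision diffs that added external links
added_links = [
    (1234, "https://www.rt.com/news/example-story"),
    (1235, "https://example.org/unrelated-report"),
]
flagged_edits = {edit_id for edit_id, url in added_links if is_state_affiliated(url)}
print(flagged_edits)  # edit IDs that introduced state-affiliated sources
```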
Of course, you can do outbound links for a number of reasons, some of which will be perfectly legitimate. So, the next stage in the research was to analyse the actual edits themselves. For this report, we did this manually, but for the future we’d like to automate this analysis and do it at scale.
Broadly, we drew out some themes around these edits – casting doubt on objectivity; creating pro-Kremlin historical narratives; trying to add in alternative reporting, which has generally come from sources that support the Kremlin’s descriptions of ongoing situations; and adding in Kremlin quotes and press releases to make sure its voice is represented.
Lastly – and I think this is one of the most promising avenues that we want to go down more as well [in future research] – is that we looked at patterns where editors consistently add the same links to the same pages. This is where a link might have been added by one editor, then taken out by someone else, and then added by another editor – like a revision war.
When you create a map of that, you can see that this technique can be used to discover new editors that haven’t yet been banned, but have very consistent patterns in terms of the way that they’re editing links into pages. Often it might be like two banned editors have added one link, and then there’s an anonymous edit that added the same link in. This might help discover additional bad actors [not discovered yet by the Wikipedia community].
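[Editor’s note – a minimal Python sketch of that pattern-matching idea is below: it flags accounts that re-add the same (page, link) combination already added by banned editors. It assumes the link-addition records have been extracted from revision diffs beforehand; the usernames, IP address and URL are hypothetical, and a match is only a signal for human review, not proof of wrongdoing.]

```python
# Sketch: surface not-yet-banned accounts that add the same (page, link) pairs as
# already-banned editors. All records below are hypothetical placeholders.
from collections import defaultdict

banned = {"BannedA", "BannedB"}  # hypothetical banned usernames

# (editor, page, link) triples for edits that added an external link
link_additions = [
    ("BannedA", "Russo-Ukrainian_War", "https://example-outlet.example/story"),
    ("BannedB", "Russo-Ukrainian_War", "https://example-outlet.example/story"),
    ("203.0.113.7", "Russo-Ukrainian_War", "https://example-outlet.example/story"),  # anonymous IP edit
]

editors_per_link = defaultdict(set)
for editor, page, link in link_additions:
    editors_per_link[(page, link)].add(editor)

# Flag unbanned editors who share a (page, link) addition with at least one banned editor
suspects = set()
for editors in editors_per_link.values():
    if editors & banned:
        suspects |= editors - banned

print(suspects)  # candidates for closer human review, not proof of wrongdoing
```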
There was a final piece of work where we looked at [other] pages that were being targeted by these 86 accounts in terms of adding state-affiliated domains. Weirdly, the top one [the Wikipedia article which saw most state-affiliated links added by the cluster of 86 banned editors] is the list of traffic collisions… Afterwards, it’s a lot of pages about the clashes in North Caucasus in 2010 and 2012, the Crimean Crisis, the war in Donbass, Russo-Ukrainian war, and so on.
So, this work isn’t complete enough and wasn’t ambitious enough for me to say we have smoking-gun evidence of a conspiracy by the Russian government to systematically manipulate Wikipedia. But what it shows is that starting from a pretty narrow base [one Wikipedia article in one language edition]… we have a number of editors that have already been banned… doing a substantial number of edits on overlapping pages, some of which are introducing state-affiliated domains and links, and many of which are trying to push the pro-Kremlin narrative in different ways.
This is a starting point now for us to scale those techniques up – to see how we can increase the detection rate, how we can automate various parts of it, how we can get this into the hands of Wikipedia editors if it proves useful, and what we can do to actually use it to defend information spaces like Wikipedia.
The Fix: You mentioned that you were worried in 2018 about the potential for manipulating Wikipedia. Based not only on this specific report but also on your general observations, have there been any improvements in the mechanisms Wikipedia has for protecting itself from manipulation over the past four years? And do you have specific suggestions or thoughts on what should be done further to protect Wikipedia from this kind of manipulation?
Carl Miller: [My team and I are not part of the Wikipedia community, and although we’ve done a lot of research], I’m not sure whether we’re fully across all of [the evolution in defence mechanisms since 2018]. Also, I know there’s probably bits of it that aren’t easily spoken about or shouldn’t be published. So, my response is definitely couched in recognising that I might not really know everything that happens.
That said, I don’t know whether I have seen either a capability or a body of policy which would allow Wikipedia to respond to [manipulation] at scale. The Wikipedia community has sockpuppet investigations which are well and good, and which can uncover, both technically and behaviourally, the use of fictitious accounts in order to do editing. [Editor’s note – sockpuppet investigations are technical investigations typically conducted by volunteer Wikipedia functionaries that uncover cases of one person operating multiple Wikipedia accounts, which in many cases violates the website’s rules].
Firstly, I don’t know whether that would truly uncover the kind of entryism that I am worried about. I think that you could easily imagine an attack on Wikipedia that wouldn’t require the use of fictitious accounts.
Secondly, what I understand to be the ways that are used to technically understand this, especially the use of IPs, are trivially easy for states to spoof. States spoof IP ranges all the time – this is how any social media platform is attacked. Unless there is another internal, secret, potent technical capability I’m not aware of, I don’t know whether Wikipedia has one.
What I worry about is two things. First, I worry that Wikipedia’s volunteer editors will become exhausted, disenfranchised, feel lonely and isolated, and ultimately give up. Because they will very much be in the firing line for any state[-backed manipulation] attempt. This won’t just be about sponsoring state-friendly editors; it will also be about targeting, harassing, threatening, and ultimately clearing out of Wikipedia all the editors who, on a purely voluntary basis, stand in their way.
Using Taiwan as a case in point, what I see there is a very committed, heroic team of people who are trying to defend that information space, but I don’t think they feel particularly supported by Wikimedia, frankly, and I think they feel ground down and just very tired from doing this on a daily and weekly basis. I worry that harassment, threats and the denial of the space to genuine editors could be really damaging over the long term, and could be very successful.
The other capability… when I go to the tech giants and I talk to their health teams and platform integrity teams, each of them has spun up a team of dedicated data scientists doing very dynamic detection and mitigation work. This is a constantly moving battlefield, and the ways in which bad actors are detected constantly change. But then the people on the other side of the equation learn and develop, and they try and work out how they were detected, and they develop more camouflage techniques and their tradecraft develops. Subsequently, the defensive teams in the tech giants also responsively evolve their own tradecraft and their own technologies. This happens dynamically every day. These data scientists are spinning up new heuristics, doing more research, more measurement every single day as their models change.
In a way, we need to find a way of replicating this for Wikipedia – one that preserves all the things Wikipedia is so great at. Wikipedia shouldn’t become more like a tech giant; we don’t want Wikipedia to become closed, we know that it’s fueled by volunteers, and it has to remain porous and open for people to join. But in a way we do have to try and learn from the tech giants – look at how they are doing platform defence and see what we can do to build an open-source, community-based version of that.
Because in the long term the tradecraft that can be used to attack Wikipedia is only going to get better, and whatever sockpuppet investigations will reveal, that’s always going to be a learning point for the people on the other side. There has to be a kind of engine room of dynamic evolution on the defensive side, on the Wikipedia side as well.
[Editor’s note: over the past couple of years the Wikimedia Foundation, the nonprofit that runs Wikipedia, announced a number of initiatives designed to combat disinformation at scale, notably ahead of the 2020 election in the United States. Read more in a recent overview by researcher Tilman Bayer, who analyses the “Information Warfare and Wikipedia” report]
The Fix: Could you give an example of specific tools or specific ways that can potentially be used to subtly manipulate Wikipedia?
Carl Miller: I think the means to actually manipulate Wikipedia wouldn’t necessarily have to be technically aided at all…
On a purely manual level, the attacks would be around sourcing, to try and draw equivalence between Western and autocratic media sources. It would either be to try and narrow down the number of free media sources that would be allowable, or to widen the range of autocratic or state media sources that would be allowable.
Largely, when we’ve looked at this with China, so much of it is about the subtle moulding of language to create quite subtle implications or connotations about a certain topic or theme. So it can be a pruning of language as much as simply labelling Taiwan a breakaway or renegade state or province.
But if you want to talk about technologies, one that I’m very worried about is virtual agents and chatbots.
A lot of my colleagues work in the field of natural language processing, which is essentially a field about teaching computers how to understand and, in the case of chatbots, respond to human language. Some of these systems are getting extremely good. I think we’re getting close to passing Turing tests now when it comes to training a machine intelligence to talk to you in a way which is indistinguishable from a human being.
Now, if I wanted to, say, exhaust a particular community defending a particular page, I could start to build automated or semi-automated processes to drag as many members of that community as I could into a whole series of bruising, nasty, exhausting, draining discussions in an automated way. You could build an elaborate network of accounts which would be challenging people on their edits whenever they make an edit, drag them into a discussion, go back and forth in a useless way. All of that is simply to drain their energies and distract them from the other things that they should be doing, the other edits that they should be making.
Two other kinds of technologies that would evolve from an attacker’s point of view… One would be around the mass analysis of Wikipedia pages to identify language which they want to change. In a way, that would be mimicking the defensive stuff we’ve done here. Say, well, where are the sources that I want to really reduce the prominence of on Wikipedia? Where are the specific claims on which pages that I want to try and deal with en masse? And then you can gather up all those pages and make edits on all of those pages together to begin to scale up the editing footprint that you want to make.
Lastly, camouflage – I don’t know the extent to which attackers do this, this is purely speculative – but if I were a really sophisticated information operative working for an autocratic government, I would be trying to detect myself. I would be running a red team detection effort on myself, doing as good a job as I could to mimic what I think Wikimedia would be doing or Wikipedia communities would be doing – and then learning ways to obfuscate and inject randomness into those patterns to make them less detectable.
That might be super simple stuff. Like just a rule to say, okay, every third page I won’t edit something on the Ukraine war. I’ll edit something on plants or botany to build up a more realistic legend. It might be more sophisticated stuff as well where you actually automate that too. For example, you might automate some accounts you’re wanting to camouflage to make a series of legitimate edits, small-scale edits on completely unrelated topics.
This is all very speculative; I’m not sure that states are necessarily there yet, to be honest. I think these are all questions for the future. I’m not seeing any evidence that states would be using any of these techniques [now], but those are just some top of mind things about what I would do if I was thinking through how to subvert Wikipedia.
The Fix: Building on what you’ve just said that states are not there yet – basically what we see is that just plain old-fashioned repression is easier for authoritarian states. We’ve seen that at least two Wikipedia editors were detained in Belarus, and several others have seen their personal details linked to pro-government Telegram channels. Although we don’t have evidence of specific manipulation efforts on Wikipedia content, we do have a lot of evidence of targeting Wikipedia volunteers in authoritarian states.
That leaves me thinking about this dichotomy between delicate manipulation, which is maybe beyond the reach of governments, and just brute-force shutdowns and repression. Might it be the case that just blocking Wikipedia and cracking down on its volunteers is easier and eventually enough for authoritarian regimes? Or do you think the Russian government and the likes of it will be interested in developing a more sophisticated approach to manipulating Wikipedia?
Carl Miller: There’s definitely truth in what you say. Autocratic states sometimes act in far more blunt and, frankly, stupid ways than we often think they will.
The Russia case is so interesting in the way that we all thought they were masters in a postmodern dance of disinformation, but actually they pirouetted into a retro Soviet style of information control extremely quickly.
Especially in a context where there’s a huge brain drain going on in Russia, are they necessarily deploying their increasingly scarce machine learning talent on this question yet? Probably not. They’ve probably got other things to worry about, especially seeing as they can at least exert domestic control across their own population’s interactions with Wikipedia without having to worry about any of that.
But I would say that states tend to think in both short-term and long-term ways. And I’ve rarely known a state to willingly pass up longer-term capability development for aspects of power it thinks are interesting or important.
I think it’s just as conceivable that there are people within the Russian government right now who are basically wielding very blunt instruments to control, shut down, threaten and harass Wikipedians. But on the other hand, there will also be people thinking about what this looks like in 10 or 20 years, who would of course be as interested in manipulating Wikipedia for its capacity to reach everyone else around the world as for its reach among Russians. I mean, Russia really does care – it wouldn’t be doing influence operations if it didn’t – about how the rest of the world views the invasion, how they view Russia and its regime; especially the Global South – Africa and India.
They do care about that, and they can see the public around the world that they care about using Wikipedia. So I don’t think this issue is going away. And I don’t think that Russia in the short term resorting to harassment and blocking techniques will mean that the threat in any other sense has gone away.
The Fix: You just mentioned the Global South, and you’ve recently done a lot of work [outside of Wikipedia] showing that Russia’s disinformation campaigns are much more successful in the Global South than in the West – Russia is winning the information war on another battlefield, so to speak. With that in mind, do you think that Russia, China and other authoritarian regimes are more likely to succeed in translating that into Wikipedia coverage targeting the Global South – targeting non-English language editions of Wikipedia, maybe editions in languages other than the major European ones?
Carl Miller: That to me is definitely a worry.
I think there are two different things happening here. On the one hand, people that we interviewed for this report definitely pointed to non-English language Wikipedias as being especially vulnerable to a kind of community capture. In some cases that has happened – not necessarily directly sponsored by a state, but particularly ideologically motivated communities have essentially captured smaller language Wikipedias and cleared out opposing or contrarian information to stamp their view on that Wikipedia [language edition]. So on the one hand, my understanding is that if you step outside of English-language Wikipedia, it can look much worse.
On the other hand, in terms of attention capture, you have to see English-language Wikipedia as the juiciest target just because of the sheer size of it and the sheer number of people who see it… The influence operations that we saw [outside of Wikipedia] targeting the Global South – quite a lot of that was in English as well. Whether it was in India or South Africa, quite a lot of the key countries at the heart of this speak English. For that reason as well, you’ve got to think that English-language Wikipedia is going to be in the crosshairs.
The Fix: The Fix is an outlet for journalists and media managers, particularly in Europe. Would you have any advice for journalists and reporters on how to think about disinformation and coordinated manipulation on Wikipedia, and how not to fall prey to manipulation when using Wikipedia? How should they think about it?
Carl Miller: Firstly, write more about possible manipulation on Wikipedia, it’s important. I think Wikipedia is massively overlooked as a topic for mainstream journalism. I think it’s crazy that there isn’t more mainstream journalism and more mainstream journalists looking at Wikipedia and the things that happen there. It’s weirdly under the radar.
I work as a journalist in this area as well. And I would say that we, the journalists that cover how information and online influence works, we’ve massively over-focused on Facebook and Twitter. There’s all these other information sources out there.
Partly the whole point of this work was to make that point and to say – what about Quora? What about GitHub? What about small niche forums that directly access the specific audience you might want to reach? There’s so many information sources out there that we need to try and cover. We can’t just think that Facebook and Twitter are the places where information manipulation is happening.
Of course, in terms of actually using Wikipedia as a source, journalists shouldn’t be [doing that]. Wikipedia isn’t an original primary source and therefore journalists should only be tracking it back to the sources which it gathers together. I understand that journalists are busy, but that’s always the way that journalists have needed to use it.
One of the problems more generally is when journalists write something wrong that they’ve seen on Wikipedia, and their article then gets cited back on Wikipedia as a way of substantiating that claim. That’s a circular loop that can be quite unhelpful.
But broadly speaking, the fact that any journalist is reading this article is a good sign, because simply being interested in this topic and in Wikipedia is the first step – we really need more coverage of information spaces as precious as this one.