Researchers Track Influenza Using Wikipedia
(ISNS) -- Wikipedia isn’t just a website that helps students with their homework and settles debates between friends. It can also help researchers track influenza in real time.
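A new study released in April in the journal PLOS Computational Biology describes an algorithm that uses Wikipedia page views to estimate the prevalence of influenza-like illness.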
Influenza-like illness is an umbrella term for illnesses that present with symptoms like those of influenza, such as a fever. These illnesses may be caused by the influenza virus, but they can have other causes as well. The Centers for Disease Control and Prevention publishes data on the prevalence of influenza-like illness based on a number of factors, such as hospital visits, but the data takes two weeks to come out, so it’s of little use to governments and hospitals that want to prepare for influenza outbreaks.
The researchers compared the results from their algorithm to past data from the CDC and found that it predicted the incidence of influenza-like illness in America within 1 percent of the CDC data from 2007 to 2013.
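The article does not say exactly how "within 1 percent" was measured. One plausible reading is the average absolute gap, in percentage points, between the model's weekly estimates and the CDC's reported figures. The sketch below illustrates that reading only; the function name and all numbers are made up for illustration and are not taken from the study.

```python
def mean_absolute_difference(estimates, reference):
    """Average absolute gap, in percentage points, between two weekly series."""
    if len(estimates) != len(reference):
        raise ValueError("series must cover the same weeks")
    if not estimates:
        raise ValueError("series must not be empty")
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(estimates)


# Hypothetical weekly influenza-like illness percentages (not real data).
model_estimates = [1.2, 1.8, 2.9, 4.1, 3.0]
cdc_reported = [1.0, 2.0, 3.0, 4.5, 2.8]

gap = mean_absolute_difference(model_estimates, cdc_reported)
print(f"Average gap: {gap:.2f} percentage points")
```

Under this reading, a model would be "within 1 percent" if the average gap stayed below 1.0.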
The algorithm monitored page views from 35 different Wikipedia articles, including “influenza” and “common cold.”
“We also included a few things such as ‘CDC’ and the Wikipedia main page so we could glean the background level of Wikipedia usage,” said David McIver, one of the authors of the study and a researcher at Harvard Medical School. Those terms helped make the algorithm more accurate, even during the 2009 swine flu pandemic.
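The background-usage idea can be sketched in a few lines. The article does not describe the form of the researchers' model, so the example below is an assumption made for illustration: it divides views of a flu-related article by views of the Wikipedia main page, then fits a single-predictor straight line to CDC figures. The function names and all numbers are hypothetical, and the published model tracks 35 articles rather than one.

```python
def normalize(article_views, background_views):
    """Express article views as a share of background (main page) views."""
    return [a / b for a, b in zip(article_views, background_views)]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly counts (not real data). Raw flu-article views rise
# partly because overall Wikipedia traffic rises, which normalizing removes.
flu_views = [1000, 3000, 3000, 8000]
main_page_views = [100000, 150000, 100000, 200000]
cdc_ili_percent = [1.0, 2.0, 3.0, 4.0]

shares = normalize(flu_views, main_page_views)
slope, intercept = fit_line(shares, cdc_ili_percent)

# Estimate the current week from page views alone, before CDC data arrives.
this_week_share = 5000 / 200000
estimate = slope * this_week_share + intercept
print(f"Estimated influenza-like illness: {estimate:.1f} percent")
```

Dividing by main page views means a week with heavy overall traffic does not look like a flu spike by itself, which is the role the article attributes to the background pages.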
Google Flu Trends
McIver’s model attempts to account for this by assessing the background usage of Wikipedia. Additionally, a recent paper in Science
Some also criticized Google for keeping its algorithms for Google Flu Trends a trade secret. McIver and his colleague, John Brownstein, wanted their algorithm to be entirely open source.
“We initially decided to go with Wikipedia because all of their data is open and free for everyone to use. We really wanted to make a model where everyone could look at the data going in and change it as they saw fit for other applications,” McIver said.
The benefits of tracking influenza-like illness in real time are huge, McIver added.
“The idea is the quicker we can get the information out, the easier it is for officials to make choices about all the resources they have to handle,” he said.
Such choices involve increasing vaccine production and distribution, increasing hospital staff, and general readiness “so we can be prepared for when the epidemic does hit,” McIver said.
The Wikipedia model is one of many such tools, but is not without its limitations. Firstly, it can only track illness at the national level because Wikipedia only provides page views by nation.
The model also assumes that one visitor will not make multiple visits to one Wikipedia article. There is also no way to tell whether someone is visiting an article out of general interest or because they really have the flu.
Nonetheless, the model still matches past CDC data on the prevalence of influenza-like illness in the U.S.
“This is another example of these types of algorithms that are trying to glean signals from using social media,” said Jeffrey Shaman, professor of environmental health sciences at Columbia University, in New York. “There are all these ways that we might get some lines on what’s going on.”
He said he was interested to see how well the model would predict future flu seasons, especially compared to Google Flu Trends.
Shaman and his colleagues use data from past influenza seasons to try to predict future ones, using models similar to those used by weather forecasters.
“They’re not any sort of replacement for the basic surveillance that needs to be done,” he said of the Wikipedia model, Google Flu Trends, and similar tools. “I like them and they’re great tools and I use them all the time, but we still don’t have a gold standard of monitoring influenza.”
“Right now the attitude is the more the merrier so long as they’re done well,” Shaman said.
McIver echoed that sentiment: “People need to remember that these sorts of technologies are not designed to be replacements for the traditional methods. We’re designing them to work together – we’d rather combine all the information.”
Cynthia McKelvey is a science writer based in Santa Cruz, California. She tweets at @NotesofRanvier