Stephen Boyer has spent most of his career helping IBM Research develop software for big data problems. He subtitled his talk “How to Deal with Too Much Content and Not Enough Discovery,” a description that suits any student, researcher, scholar, or casual keystroker searching the massive amount of information available online. In the 1990s, Boyer’s IBM colleagues developed machine-readable tools for the world’s patent literature and later expanded their efforts to include machine-readable chemical compounds. With IBM’s Blue Gene supercomputer, billions of pages of tagged text could be analyzed in minutes. The project has since moved to pharmaceuticals: a consortium of drug companies has helped fund schemas for extensive tagging of the biomedical literature, so that drug chemistry can be tied to the myriad of trade names and effectively associated with the consequences of clinical drug use. The latest incarnation runs on IBM’s Watson supercomputer, with the near-term goal of giving medical doctors and clinicians manageable access to this complex literature. One intriguing question posed to Boyer concerned IBM’s legal liability if Watson’s data were used for a presumed misdiagnosis. Boyer, being an astute computer scientist, promptly referred the question to the IBM legal department.

Hans Pfeiffenberger, of the Alfred Wegener Institute for Polar and Marine Research, looked at several examples of the big data problem arising in the basic sciences. The likely discovery of a Higgs boson among the thousands of terabytes of data collected by CERN’s Large Hadron Collider, for example, has been well publicized as a massive big data problem. It is also, however, a prime example of one of the world’s largest networks of computers performing both the analysis and the archiving of the data. Pfeiffenberger offered two other examples on the same scale.
In the geosciences, a worldwide network of subsurface ocean buoys collects and transmits data on the ocean’s temperature and salinity; research institutions around the world gather and analyze these data. The new Beijing Genomics Institute (BGI) runs 180 gene sequencers, has its own supercomputer running in the cloud, and copublishes its own journal, GigaScience. None of this existed a few years ago.

Stefan Winkler-Nees of the German Research Foundation discussed the explicit connection between the big data problem and the infrastructure and protocols set up by the scholarly publishing community for web-based journals. Winkler-Nees observed that the scientific community needs to address this problem at the front end, when an experiment or massive theoretical problem is first being planned: we need to design systems that make data easier to manage and share. He noted that our funding and research institutions currently offer few rewards for data management. The National Science Foundation’s (NSF) requirement that all grant applications include data management plans is a start, but NSF managers will be the first to admit that few good models or standards are in place. Winkler-Nees called for the adoption of persistent identifiers, which can encode more value than the simple provenance of the data; the development of peer review methods for data, to provide more confidence than the author’s endorsement alone; and essential links to the researchers and institutions providing the data. All of these protocols are well established for scholarly publications and can form a basis for putting some order into the chaos of big data. Massive data sets need linked publications to provide the essential protocols for quality assurance, as well as established tools for discovery and archiving (the underlying metadata).
He felt that publications can provide the “linking hubs” in our increasingly “digital ecosystem.” I am pleased to note that one of our Member Societies, the American Astronomical Society, has teamed up with AIP to explore some of the key aspects of linking data with publications. In early October we were notified by NSF that our partnership had been awarded a grant to explore data linking in two journals published by AAS and one by AIP. The project will first examine author attitudes toward linking data sets with publications, and then develop protocols that will be tested by volunteer authors from the candidate journals. What better way to solve a daunting problem than to test potential solutions with a series of experiments?
CrossMark implemented on AIP Journals to track papers' update status
CrossMark is a service of CrossRef, a nonprofit corporation formed by a group of scholarly publishers in 2000. The CrossMark logo signals to researchers that publishers are committed to maintaining the scientific accuracy and integrity of their scholarly content. Many scholarly publishers, including Elsevier, Oxford University Press, and The Royal Society, already display the CrossMark logo on some of their published content. You can learn more about CrossMark from the CrossRef website.
Inside Science TV experiences significant growth in TV stations
In a major development, Inside Science TV recently finalized an agreement with Gray Television, Inc., a company that owns stations affiliated with CBS, NBC, ABC, and Fox in many television markets across the United States, from Colorado Springs, CO to Tallahassee, FL. The agreement increases syndication of ISTV segments to 33 local news stations in the U.S. ISTV is continuing its efforts to add more U.S. television markets throughout the year and is also pursuing sales of the program in international markets. After the segments are provided to television stations, they are posted on the web, where they can be easily shared through social media outlets such as Facebook and Twitter. In addition, the National Science Foundation will soon be showcasing Inside Science TV segments on its Science 360 website as well as the Knowledge Network, an Internet video feed that it sends to universities. We encourage readers to visit Inside Science TV on the InsideScience.org website and YouTube channel to sample the science video content that we are providing to the broad general public, and to spread the word about ISTV to your video-viewing friends and loved ones.
Students petition Congress to protect funding for science
Mandatory cuts were to take place on January 2 if Congress did not take action: funding for civilian science programs would have been cut by 8.2% and for defense science programs by 9.4%. In the first hours of 2013, Congress officially delayed most sequestration decision making, setting a new deadline of March 1. (See FYI #3: No Resolution in Sight.) Science continues to need strong advocacy from the community.