On Thursday, March 29, 2012, the White House Office of Science and Technology Policy (OSTP), in collaboration with several federal departments and agencies, announced the creation of a Big Data Research and Development Initiative to a packed auditorium at the American Association for the Advancement of Science.
The goals of this initiative are “to advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering; strengthen our national security; and transform teaching and learning; and to expand the workforce needed to develop and use Big Data technologies.”
Last year the President’s Council of Advisors on Science and Technology (PCAST) concluded that the Federal Government is under-investing in research and development related to sharing and storing large quantities of data. In response, OSTP launched a Big Data Senior Steering Group to coordinate and expand the Government’s investments in this area.
OSTP Director John Holdren began the event by emphasizing “it’s not the data per se that create value, what really matters is our ability to derive from them new insights, to recognize relationships, to make increasingly accurate predictions. Our ability, that is, to move from data, to knowledge, to action.”
Though the private sector will take the lead in developing Big Data systems, Holdren stated that the government will play a large role in supporting Big Data research and development by investing in a Big Data workforce, using new Big Data approaches to make progress on key national challenges, and shaping policies on issues such as electronic privacy.
Subra Suresh, Director of the National Science Foundation (NSF), outlined the strategies NSF is implementing to derive knowledge from Big Data; to develop infrastructure to manage, curate, and serve data to communities; and to build education and workforce opportunities.
NSF’s interdisciplinary Big Data efforts include a collaborative project between NSF and the National Institutes of Health (NIH) to advance big data science and engineering, and funding for a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate human knowledge, computer algorithms, and machines to develop a new understanding of Big Data. NSF will also encourage research universities to develop interdisciplinary graduate programs in Big Data and will provide the first round of grants to support “EarthCube,” a system that will allow geoscientists to access, analyze, and share information about Earth. In addition, NSF will issue a $2 million award for undergraduate training in complex data, provide $1.4 million to support a group of statisticians and biologists studying protein structures and biological pathways, and create an “Ideas Lab” forum to enhance efforts to understand teaching and learning environments.
NIH Director Francis Collins spoke enthusiastically about the need for Big Data projects in the biological sciences community. He described a new collaboration among the National Human Genome Research Institute, the National Center for Biotechnology Information, and the European Bioinformatics Institute to put the largest set of data on human genetic variation, produced by the international 1000 Genomes Project, on the Amazon Web Services cloud. The project's data set had grown to 200 terabytes, so large that user access had become very challenging. Placing the data in the cloud and making it freely available gives the science community improved access to it.
Marcia McNutt, Director of the US Geological Survey (USGS), announced the 2012 awards for eight grant proposals selected through its John Wesley Powell Center for Analysis and Synthesis. These projects will focus on areas of research including climate change, earthquake recurrence rates, and ecological indicators.
Zach Lemnios, Assistant Secretary of Defense for Research and Engineering at the Department of Defense (DOD), stated that the DOD will invest approximately $250 million annually, with $60 million available for new initiatives. He described Big Data challenges such as making use of the large amounts of data being generated, and how scientists perform computations and use data capacity. The Department’s work on Big Data has three areas of focus: data-to-decision projects centered on reasoning and inference, autonomy research to develop ways to adapt to “real world” scenarios, and human-system research addressing needs such as new technological interfaces.
Ken Gabriel, Acting Director of the Defense Advanced Research Projects Agency (DARPA), announced that the agency is beginning the XDATA program, which will invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data. The goals of this project are to develop scalable algorithms for processing data and to create effective human-computer interaction tools.
William Brinkman, Director of the Department of Energy (DOE) Office of Science (SC), spoke about the need to store, analyze, and use Big Data. Brinkman described one of the roles of SC, which is to operate and maintain facilities at National Laboratories, including supercomputers, x-ray light sources, advanced light sources, and nanoscience and systems biology laboratories. These facilities generate data rapidly, and better ways to manage it are needed. Brinkman announced that SC is establishing the Scalable Data Management, Analysis and Visualization (SDAV) Institute, which will bring together six national laboratories and seven universities to develop tools that help scientists manage and visualize data on DOE’s supercomputers.
The announcements from federal agency staff were followed by a panel discussion with industry and academic leaders. The panelists discussed how universities such as Stanford and MIT, which offer large-scale online courses, can use Big Data to study student learning. Other topics of discussion included how Big Data is used to detect patterns in pathology, how Big Data affects human resources at large companies, and what skills challenges a Big Data workforce faces.