At a July hearing, the House Science Committee discussed promising applications of machine learning techniques to scientific research and the Department of Energy’s role in supporting advanced computing infrastructure.
This summer, the House Science Committee has turned its attention to the implications of advanced computing, holding two hearings on the subject. The first focused on artificial intelligence, while the second dealt with the opportunities and challenges posed by big data and machine learning techniques.
At the second hearing, committee members explored the role the Department of Energy and its national laboratories play in developing cutting-edge computing methods and infrastructure to help advance science. Expressing optimism about future applications of machine learning to the analysis of scientific data, Energy Subcommittee Chair Randy Weber (R-TX) remarked that DOE is “uniquely equipped to fund robust fundamental research in machine learning.”
Witnesses highlighted applications of machine learning in a number of disciplines, including computer science, neuroscience, materials science, and astronomy. They also emphasized the competition in advanced computing posed by other countries.
Witnesses point to variety of applications, DOE’s role
Katherine Yelick, associate laboratory director for computing sciences at Lawrence Berkeley National Laboratory, said that machine learning “requires three things: large amounts of data, fast computers, and good algorithms,” adding “DOE has all of these.”
Bobby Kasthuri, a neuroscience professor at the University of Chicago and researcher at Argonne National Laboratory, pointed to his field as one that could benefit from DOE support, saying it suffers from a lack of tools and computing infrastructure needed to map the human brain.
“Although many fields of science have learned how to leverage the expertise and the resources available in the national labs system, neuroscientists have not,” Kasthuri remarked. “A national center for brain mapping situated within the DOE labs system could actually be a sophisticated clearinghouse to ensure that the correct physics and engineering and computer science tools are vetted and accessible for measuring brain structure and brain function.”
Contrasting the situation with other disciplines, Kasthuri suggested that access to advanced computing resources should be regarded as analogous to access to telescopes in astronomy. “If I was a young astrophysicist … no one would expect me to get money and build my own space telescope,” he said.
Yelick observed that coping with the increasingly expansive data output from the latest generation of scientific facilities also demands investment in machine learning. She pointed out that the National Science Foundation-sponsored Large Synoptic Survey Telescope, currently under construction in Chile, will produce 15 terabytes of data every night. “So you can imagine why scientists are interested in using machine learning to help them analyze that data,” she said.
Anthony Rollett, a materials science professor at Carnegie Mellon University, pointed to the smaller-scale task of identifying distinctive patterns in slices of steel that can resemble “a Jackson Pollock painting.” He said that training machine learning algorithms to identify such patterns would not eliminate the need for expert intervention but that it would allow experts to allocate their efforts more effectively.
Kasthuri also predicted machine learning will impact the scientific workforce. Given that much research currently relies on “getting relatively cheap labor to produce data and to analyze data,” he argued machine learning will give younger researchers more time to focus on other tasks. Researchers could even expand into other fields that could benefit from scientific education, “like the legal system or Congress,” he mused.
Committee eyes state of international competition
Weber and Research and Technology Subcommittee Chair Barbara Comstock (R-VA) both asked about the state of international competition in advanced computing and the importance of maintaining U.S. leadership.
Yelick recounted a recent visit to China, where she saw what were then the fastest and third-fastest supercomputers in the world. She noted the machines are used in part to “draw talent back to China from the U.S. or to keep talent.” While DOE’s Summit supercomputer at Oak Ridge National Laboratory is now the fastest in the world, Yelick emphasized that China has more computers on the authoritative list of the 500 fastest computers compiled semiannually by a group of leading computer science researchers.
Rollett also testified to the prowess of international competitors. “Having traveled abroad extensively, I can assure you that the competition is serious. Countries that we used to dismiss out of hand are publishing more than we are and securing more patents than we do,” he said.
Kasthuri pointed to Germany and China as the closest competitors in neuroscience. However, he also observed that “in the scientific world there is tension between collaboration and competition independent of whether the scientist lives in America or doesn’t live in America.”
He continued, “I think the good news is that for us, at least in neuroscience, we realized that the scale of the problem is so enormous and has so much opportunity, there’s plenty of food for everyone to eat.”
Need for federal support for scientific risk-taking emphasized
Because machine learning is now applied so widely, Rollett said federal agencies should create programs in areas “where it’s not so obvious how to apply the new tools, and to instantiate programs in communities where data, machine learning, and advanced computing are not yet prevalent.”
Rep. Bill Foster (D-IL) asked whether Congress is being too cautious in funding science and should move toward the “commercial model of move fast, take risks, and break things.” Yelick replied, “As a scientist I absolutely want to be able to take risks and I want to be able to fail,” but declined to weigh in on what Congress’ approach should be.
Rep. Elizabeth Esty (D-CT) asked panelists what they thought were the “most critical tasks” for federal research investment. Kasthuri replied that the federal government is well suited to support shared research infrastructure. Yelick emphasized the importance of support for fundamental research and argued that just as machine learning can improve science, machine learning algorithms will be strengthened by the scrutiny of scientific peer review.
Stressing the importance of scientific risk-taking and community input in setting research priorities, Rollett stated, “I think it’s important that program managers in the federal government have some discretion over what they fund and take risks. It’s also important that the agencies have effective means of getting community input, and I do not want to name names, but some agencies have far more effective mechanisms of that than others.”
“We might want to follow up with that last point,” Esty replied.