Virtual Nuke Testing System Data at Risk
GAO Questions Effectiveness of Disaster Recovery Plans
In a report to the House Energy and Commerce Committee made public Thursday, the Government Accountability Office questioned the effectiveness of contingency and disaster recovery plans employed by the National Nuclear Security Administration, the agency that oversees the management of the supercomputers situated at three Energy Department laboratories: Los Alamos, Sandia and Lawrence Livermore.
"Until the agency fully implements a contingency and disaster recovery planning program for its weapons laboratories, it has limited assurance that vital information can be recovered and made available to meet national security priorities and requirements," says the GAO report, titled National Nuclear Security Administration Needs to Improve Contingency Planning for Its Classified Supercomputing Operations.
GAO says all three labs have implemented some components of a contingency planning and disaster recovery program, but the National Nuclear Security Administration hasn't provided effective oversight to ensure that they have comprehensive and effective contingency and disaster recovery planning and testing. "Due to lack of planning and analysis by NNSA and the laboratories, the impact of a system outage is unclear," GAO says.
One lab, Los Alamos, had conducted a business impact analysis to assess the criticality of resources and acceptable outage time frames; yet, as GAO points out, the agency and all three laboratories consider the consequence associated with the loss of system availability to be low impact and do not consider the classified supercomputers to be mission critical.
Unclear Oversight Roles
Still, GAO says, shortcomings exist. For instance, all laboratories had backup processes in place and had developed contingency plans, but the plans were not comprehensive. One plan didn't address the supercomputing operations, and none of the plans had been tested. "These shortcomings existed, at least in part, because NNSA's component organizations, including the Office of the Chief Information Officer, were unclear about their roles and responsibilities for providing oversight in the laboratories' implementation of contingency and disaster recovery planning," the congressional auditors write.
GAO says the three labs have the technological capability to share supercomputing capacity, yet barriers exist that could impede recovery operations. For instance, should a disruption occur, the labs don't know the minimum supercomputing capacity needed to meet program requirements, such as simulating the effects of changes to weapons systems. The labs also haven't tested the technological capability to share the capacity on an on-demand basis for recovery operations. "Without having an understanding of capacity needs and subsequent testing," GAO says, "the laboratories have little assurance that they could effectively share capacity if needed."
Congressional auditors also say NNSA earmarked some $1.7 billion to help implement its classified supercomputing program from fiscal years 2007 through 2009, but it has not tracked costs for contingency and disaster recovery planning and is uncertain how much was actually spent on these efforts.
GAO recommends, among other things, that NNSA clearly define roles and responsibilities for its component organizations in providing oversight for contingency and disaster recovery planning for the classified supercomputing environment. NNSA Associate Administrator Gerald Talbot Jr., in a written response to the audit, generally agreed with most of GAO's recommendations but didn't concur with the recommendation relating to capacity planning and cost tracking.
"Almost all classified supercomputing contingency and disaster recovery planning leverages computing resources and activities funded as part of a production simulation environment for weapons designers and engineers," Talbot says. "These expenses are integral to ASC's (advanced simulation and computing) facilities operations and user support program element and tracking them separately would not add significant value to managing contingency and disaster recovery."