Science Data May Soon Vanish From Government Websites.

Amid the torrent of executive orders signed by President Trump were directives that affect the language on government web pages and the public’s access to government data touching on climate change, the environment, energy and public health.
In the past two months, hundreds of terabytes of digital resources analyzing data have been taken off government websites, and more are feared to be at risk of deletion. While in many cases the underlying data still exists, the tools that make it possible for the public and researchers to use that data have been removed.
But now, hundreds of volunteers are working to collect and download as much government data as possible and to recreate the digital tools that allow the public to access that information.
So far, volunteers working on a project called Public Environmental Data Partners have retrieved more than 100 data sets that were removed from government sites, and they have a growing list of 300 more they hope to preserve.
The project echoes efforts that began in 2017, during Mr. Trump’s first term, when volunteers downloaded as much climate, environmental, energy and public health data as possible because they feared its fate under a president who has called climate change a hoax.
Little federal information disappeared then. But this time is different. And so, too, is the response.
“We should not be in this position where the Trump administration can literally take down every government website if it wants to,” said Gretchen Gehrke, an environmental scientist who helped found the Environmental Data and Governance Initiative in 2017 to conserve federal data. “We’re not prepared for having resilient public information in the digital age and we need to be.”
While a lot of data generated by agencies, like climate measurements collected by the National Oceanic and Atmospheric Administration, is required by Congress, the digital tools that allow the public to view that data are not.
“This is a campaign to remove public access,” said Jessie Mahr, the director of technology at the Environmental Policy Innovation Center, a member group of the data partnership. “And at the end of the day, American taxpayers paid for these tools.”
The Public Environmental Data Partners coalition has received frequent requests for two data tools: the Climate and Economic Justice Screening Tool, or CEJST, and the Environmental Justice Screening Tool, or EJScreen.
The first was developed under a Biden administration initiative to make sure that 40 percent of federal climate and infrastructure investments go to disadvantaged communities. It was taken offline in January. EJScreen, developed under the Obama administration and once available through the E.P.A., was removed in early February.
“The very first thing across the executive branch was to remove references to equity and environmental justice and to remove equity tools from all agencies,” Dr. Gehrke said. “It really impairs the public’s ability to demonstrate structural racism and its disproportionate impacts on communities of color.”
Just a dozen years ago, the E.P.A. defined environmental justice as “the fair treatment and meaningful involvement of all people regardless of race, color, national origin, or income.” The E.P.A.’s new administrator, Lee Zeldin, recently equated environmental justice to “forced discrimination.”
Nonprofit organizations used both screening tools to apply for federal grants related to environmental justice and climate change. But the E.P.A. closed all of its environmental justice offices last week, ending three decades of work to mitigate the effects on poor and minority communities often disproportionately burdened by industrial pollution. It also canceled hundreds of grants already promised to nonprofit groups trying to improve conditions in those communities.
“You can’t possibly solve a problem until you can articulate it, so it was an important source of data for articulating the problem,” said Harriet Festing, executive director of the nonprofit group Anthropocene Alliance.
Christina Gosnell, co-founder and president of Catalyst Cooperative, a member of the environmental data cooperative, said her main concern was not that the data won’t be archived before it disappears, but that it won’t be updated.
Preserving the current data sets is the first step, but they could become irrelevant if data collection stops, she said.
More than 100 tribal nations, cities, and nonprofits used CEJST to show where and why their communities needed trees, which can reduce urban heat, and then applied for funds from the Arbor Day Foundation, a nonprofit organization that received a $75 million grant from the Inflation Reduction Act. The Arbor Day Foundation was on track to plant over a quarter of a million new trees before its grant was terminated in February.
How hard it is to reproduce complex tools depends on how the data was created and maintained. CEJST was “open source,” meaning the raw data and information that backed it up were already publicly accessible for coders and researchers. It was put back together by three people within 24 hours, according to Ms. Mahr.
But EJScreen was not an open source tool, and recreating it was more complicated.
“We put a lot of pressure on the last weeks of the Biden administration to make EJScreen open source, so they released as much code and documentation as they could,” Dr. Gehrke said.
It took at least seven people more than three weeks to make a version of EJScreen that was close to its original functionality, and Ms. Mahr said they’re still tinkering with it. It’s akin to recreating a recipe with an ingredient list but no assembly instructions. Software engineers have to try to remember how the “dish” tasted last time, and then use trial and error to reassemble it from memory.
Now, the coalition is working to conserve even more complicated data sets, like climate data from NOAA, which hosts many petabytes — think a thousand terabytes, or more than a million gigabytes — of weather observations and climate models in its archives.
“People may not understand just how much data that is,” Dr. Gehrke said in an email. It could cost hundreds of thousands of dollars per month just in storage fees, she said, without including the cost of any sort of access. She said they were talking to NOAA personnel to prioritize the most vulnerable and highest impact data to preserve as soon as possible.
So far, the data they’ve collected is largely stored in the cloud and backed up using servers around the globe; they’ve worked out pro bono agreements to avoid having to pay to back it up.
Some data have, so far, been left alone, like statistics from the Energy Information Administration, among other agencies. Zane Selvans, a fellow co-founder of Catalyst Cooperative, said the group had worked for the past eight years to aggregate U.S. energy system data and research in the form of open source tools. The goal is to increase access to federal data that is technically available but not necessarily easy to use.
“So far we’ve been lucky,” Mr. Selvans said. “Folks working on environmental justice haven’t been as lucky.”