Site: Newtown Creek
Author: Brooke Singer
Submitted: February 11, 2016
Tags: EPA, data, opengov
This is an interview I conducted with an EPA spokeswoman over email in December 2015 in preparation for a paper I wrote for the journal "Big Data & Society". I thought I would post the interview in full here because not all of it was quoted in my final piece. -- Brooke Singer
1. Does the EPA have a specific federal mandate to make public its Superfund data?
The National Oil and Hazardous Substances Pollution Contingency Plan (NCP), Superfund’s implementing regulation, does not mandate the sharing of programmatic data with the public, although EPA is required to annually update the National Priorities List. EPA accomplishes this through rulemaking actions published in the Federal Register. However, the Agency is committed to releasing data in order to promote transparency and support external parties’ independent research and analysis. This approach is also consistent with federal standards for Open Government and Open Data.
2. Is the Superfund data that is publicly reported complete? If not, what kinds of data are not reported publicly and why?
The Agency does not share sensitive data with external parties. Sensitive data include confidential business information (pursuant to 40 CFR Part 2 Subpart B), procurement sensitive information and planned enforcement/compliance activities, among others. These types of data are considered pre-decisional for planning purposes and release of the information could jeopardize Superfund site enforcement negotiations. We generally do not share planned resource expenditures or planned site activity schedules because this information is often deliberative in nature and frequently subject to change based on risk, site conditions, resource fluctuations and other factors beyond the program’s control.
3. I have heard from people who access Superfund information regularly through the EPA’s website that documents that used to be available detailing Superfund sites and the clean-ups are no longer online, and as the transition to SEMS takes place it appears to them that less information is available, not more. Can you comment on this?
EPA deployed a significant redesign of its entire website on October 1, 2015.
The new approach for sharing site documents has actually significantly increased the number of documents available on the website, but it will take some time for web visitors to become familiar with the new layout. This effort is still a work in progress and we continue to improve the website. Specifically, in the Superfund web area, we are now loading documents to the web from SEMS, while site-specific documents are primarily housed on the Superfund Site Progress Profile. Previously, our regional offices shared site documents in a variety of ways, so our goal with the new approach is to consolidate information and present a single source of site-specific information.
One particular example of the redesigned website’s benefits is the Administrative Records for the Web (ARWeb) project. EPA made a regulatory change in 2013 to allow for electronic sharing of site documents rather than relying solely on physical copies at Information Repositories. Approximately 1,000 administrative records collections are now available electronically, and this number increases weekly. The online availability of administrative records has greatly expanded public access to key Superfund site decision documents, while also potentially reducing the burdens and costs of maintaining physical copies of site documents at an information repository.
As we continue to upload documents, a web visitor can request a document they are having trouble finding. For site-specific information, the best approach is to contact the Remedial Project Manager identified in each site’s Site Progress Profile.
4. Is the data reported in a timely manner or as quickly as necessary to preserve the value of the data? Please explain and include EPA’s definition of the timely release of data.
EPA generally enters data into SEMS on a periodic basis as site conditions change. However, Superfund programmatic data guidance requires that, at a minimum, EPA update data on a quarterly basis. After each quarterly update, the program takes a snapshot of the data, which it makes publicly available. These periodic frozen data sets can be preferable to a live data feed in many circumstances because the data can be preserved and used to analyze trends over time in a more effective manner. This approach also allows for repeat and follow-up analyses from a discrete data set rather than having to account for data that may change daily. As described in the response to Question 6, some site data such as site name, location and project personnel are shared directly from SEMS rather than a frozen data set, and these data change much less frequently.
5. Is all data reported machine processable or in a format that allows automatic processing? Please explain.
EPA provides data extract files in TXT, CSV or XLS file formats so that the data can be imported into a number of software programs for analysis. We intend to offer automated data sharing (e.g., web services) in the future.
6. Over two years ago the EPA announced SEMS, and during early 2015 the EPA website stated it would launch in April 2015, yet it is still not completely online. I have been told by an EPA official that SEMS has taken longer than anticipated to launch because Superfund data is entered by hundreds of users spread across regional offices and the EPA is conducting quality assurance checks on the data. What exactly are quality assurance checks, and are there other reasons for the slowdown of the SEMS launch? Will there be new efficiencies built into the system to prevent future delays in releasing Superfund data?
There is an important distinction between the status of SEMS as a system and the program’s decision on when to share data from the system. SEMS became fully operational in 2014, and it serves as the Superfund program’s official record and data source system. There have been no SEMS launch delays. However, there have been some delays in sharing SEMS data extracts on the public website. The Agency takes data sharing very seriously. While our goal was to have a broad range of updated program data posted, the Agency feels there is greater risk in sharing data that have not been fully reviewed for accuracy and completeness.
In particular, the schedule of activities at more than 4,000 sites has been the most complex data to review. The data were migrated over from the legacy system, CERCLIS, but manual modifications were needed due to the migration from a custom-built system to a commercial off-the-shelf software tool. Several hundred data entry staff were trained to learn the new software, modify the data and update them to reflect the current status of site activities. Conducting this migration, quality review and updates has been a labor-intensive process and has taken longer than expected. This process is not a reflection on SEMS’ functional status, but, rather, it is a reflection of the workload associated with a substantial system migration that affected tens of thousands of data points reflecting 35 years of detailed Superfund site history.
The Agency began releasing SEMS data to the public on October 1, 2015, with the launch of the Site Progress Profiles. Site data such as location, site status, contaminants and key personnel are fed to the Internet directly from SEMS in real time. On October 30, 2015, EPA shared the first SEMS data files, which provide complete lists of the inventory of SEMS sites, site status and locational information. EPA will update these files quarterly until web services are established.
Once the data quality and completeness review of site schedule data is completed in 2016, EPA will add these data to the currently available data sets and update them at the frequency described above.
7. One of the purported purposes of SEMS was to enhance data quality. What specifically is being enhanced? Will there be new types of data available to the public that were not previously available via CERCLIS?
The Superfund program has well-documented data quality standards contained in the Superfund Program Implementation Manual as well as Data Entry Control Plans developed by each regional office. These standards did not change with the SEMS migration; therefore, data are expected to be of the same high quality as that of CERCLIS data. SEMS’ data quality improvements are a result of the system’s integrated nature. For example, the site schedule information previously contained in CERCLIS is now connected to the site documentation previously contained in the Superfund Document Management System. As a result, there is a direct link between site activities and the supporting documentation associated with those activities.
When the program was sharing data from CERCLIS, EPA publicly shared all data that the Agency did not consider internal or sensitive in nature. It is our intent to do the same with SEMS. Once we complete the quality assurance review of site schedules, the SEMS data sets will provide all the same data as those found in CERCLIS.
8. Specifically, the contaminants of concern at a site are presented as a list via CERCLIS, but with nothing more detailed, such as in which media the contaminants are present and which contaminants are most abundant or most concerning for human health. Will this level of detail regarding contaminants of concern be available in the future? Why or why not? Similarly, the Responsible Party list does not include any more details than who is on it. Will more details be made available in the new system regarding PRPs? If not, why?
With respect to media and contaminant information, EPA will release the additional data described after the Agency completes its quality assurance review. These data are connected to site activities within the system architecture, and, therefore, EPA will time the data release with the site schedule data release. With respect to responsible party information, SEMS will continue to provide data similar to CERCLIS, including the name and address of parties associated with a site, whether the party has received a special or general notice letter from EPA, as well as completed enforcement actions where they exist. Sensitive data such as planned enforcement and compliance activities that are held for internal purposes will not be shared with external parties.