Data Standard

All of the data collected is transformed into a common data standard. This section introduces the standard, defines the schema, and reviews the transformation process and services.


1 Introducing Common Environmental KML (CEK)

The Cleanup Site Map Service (CSMS) aggregates data from many sources to report site information using common terms. This process relies on EPA and other federal data standards (where available), which are translated into a common data standard for mapping called Common Environmental KML (CEK). The name was inspired by the service's initial application to Google Earth, which uses KML (Keyhole Markup Language).

The CSMS relies on federal and state agencies to serve as data repositories; the CSMS itself serves as a map-enabled pointer that links to richer data from the data owners (e.g., the states that provided the data). By design, therefore, the CEK is a simple schema with far fewer data elements than an agency's database.

2 The CEK Data Schema

The collected data is currently transformed into two tables: a facility table and a dataset table. A third table, for geometry, is planned for when the CSMS collects perimeters (i.e., polygons) of institutional or engineering controls. The schema may be viewed as a web spreadsheet in which each tab represents a separate table. The tables are as follows:

2.1 Facility Information Table

This table has one record per facility, so a single dataset may contain upwards of 45,000 records. In general, the facility table covers the facility name; its location, both as a physical address and as latitude and longitude; the agency's URL for a web page with facility information; institutional or engineering control information; and various facility identification numbers, including both state and federal numbers. A few system data elements (shown in gray) augment the state data. Although the table contemplates upwards of 25 data elements, typically 10 fields are collected from an agency.
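As an illustration only, one facility record could be modeled along the following lines. The field names are hypothetical and are not taken from the CEK schema itself; they simply mirror the kinds of data elements described above. Most fields are optional, since an agency typically supplies only a subset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FacilityRecord:
    """One row of a facility table. All field names are hypothetical."""
    facility_name: str
    street_address: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    info_url: Optional[str] = None             # agency web page for the facility
    control_info: Optional[str] = None         # institutional or engineering controls
    state_facility_id: Optional[str] = None
    federal_facility_id: Optional[str] = None

    def has_coordinates(self) -> bool:
        """True when both latitude and longitude are present and in range."""
        if self.latitude is None or self.longitude is None:
            return False
        return -90.0 <= self.latitude <= 90.0 and -180.0 <= self.longitude <= 180.0
```

A record without usable coordinates (`has_coordinates()` returns `False`) is the kind of record that would need geocoding during processing.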

2.2 Dataset Information Table

This table has one record per dataset. It provides a description of the dataset, the agency contacts associated with it, and applicable agency URLs, including one for feedback.

2.3 Facility Multi-Geometry Information Table

This is a preliminary rendering of the schema that would be applied to introduce additional geometric features. The table anticipates that one facility may have multiple geographic features. For example, it is common for a facility to incorporate multiple institutional or engineering controls to protect an environmental remedy.
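The one-to-many relationship between a facility and its geographic features might be sketched as follows. Because the geometry table is still preliminary, every name here (`GeometryFeature`, `feature_type`, `perimeter`, `group_by_facility`) is an assumption made for illustration, not part of the published schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GeometryFeature:
    """One geographic feature tied to a facility (hypothetical layout)."""
    facility_id: str
    feature_type: str                      # e.g. "institutional_control"
    perimeter: List[Tuple[float, float]]   # (longitude, latitude) vertices


def group_by_facility(features: List[GeometryFeature]) -> Dict[str, List[GeometryFeature]]:
    """Collect all features that belong to each facility."""
    grouped: Dict[str, List[GeometryFeature]] = defaultdict(list)
    for feature in features:
        grouped[feature.facility_id].append(feature)
    return dict(grouped)
```

Keeping geometry in its own table, keyed by a facility identifier, lets a facility carry any number of control perimeters without widening the facility table.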

2.4 Next Steps in Schema Development

An understanding of user scenarios will guide the development of this schema. Structuring Activity and Use Limitations within Institutional Controls will guide local government, while structuring Chemicals of Concern will aid use of the service by contractors developing health and safety plans.



3 Data Transformation and Processing

Although the core facility data is not modified, it is transformed as it is entered into the CEK. An incoming dataset may arrive in any of several formats, including ESRI shapefiles, web pages, and Microsoft Excel or Access files. These datasets are imported into a common MSSQL format. Once imported, the data is mapped to the CEK schema, the simplified red, yellow, and green status codes are generated, and geocoding is performed when necessary.
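The three post-import steps (schema mapping, status-code generation, and geocoding when coordinates are missing) can be sketched for a single row as shown below. This is a minimal sketch under stated assumptions: the source column names, the CEK-style field names, and the status-to-color rules are all hypothetical, and the geocoder is passed in as a plain function rather than tied to any particular service.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical mapping from one agency's column names to CEK-style names.
COLUMN_MAP: Dict[str, str] = {
    "SITE_NM": "facility_name",
    "ADDR": "address",
    "LAT": "latitude",
    "LON": "longitude",
    "STATUS": "agency_status",
}

# Hypothetical rules; the actual status-code rules are not given here.
STATUS_COLORS: Dict[str, str] = {
    "active cleanup": "red",
    "under review": "yellow",
    "closed": "green",
}

Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def map_columns(row: dict, column_map: Dict[str, str]) -> dict:
    """Copy source values into a new record keyed by CEK-style names."""
    return {cek_name: row.get(source) for source, cek_name in column_map.items()}


def status_color(agency_status: Optional[str]) -> Optional[str]:
    """Reduce an agency status to red, yellow, or green (None if unknown)."""
    if agency_status is None:
        return None
    return STATUS_COLORS.get(agency_status.strip().lower())


def transform(row: dict, geocode: Geocoder) -> dict:
    """Map one source row, derive its status color, and geocode if needed."""
    record = map_columns(row, COLUMN_MAP)
    record["status_color"] = status_color(record.get("agency_status"))
    if record.get("latitude") is None or record.get("longitude") is None:
        coords = geocode(record.get("address") or "")
        if coords is not None:
            record["latitude"], record["longitude"] = coords
    return record
```

The source row is never altered; `transform` builds a new record, which matches the principle that the core facility data itself is left unmodified.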

3.1 Mapping to the CEK Standard

3.2 Formation of Status Codes

3.3 Geocoding When Latitude and Longitude Are Missing