Five Safes Framework
The ADRF contains only agency-approved projects that have been proposed and agreed upon by project and dataset stewards. Approved projects require signed agreements, and only approved members can access the project workspace within the ADRF. Project workspaces are isolated from one another, with access to resources controlled through individual and group memberships. This isolation ensures that projects do not share an environment or resources.
All standard tools that are pre-installed as part of the Microsoft Windows 10 operating system are available. In addition, we have installed the following: Chrome Web Browser; DBeaver – Community Edition Database IDE; PyCharm; Python 3.x; LibreOffice; JupyterLab; R & RStudio; and Stata. We regularly add new libraries and packages as requested. A code collaboration repository is among the software likely to be available soon.
The ADRF is designed to provide secure methods of data transfer for agency microdata, specifically datasets that include Personally Identifiable Information (PII). Only personnel identified and authorized by the agency are invited to perform data transfers. Data is transferred into the ADRF through the FedRAMP Authorized, FIPS 140-2 validated Kiteworks Secure Environment, which is restricted to upload operations. Files do not need to be encrypted or password protected before the transfer is initiated. Additional security protocols include regular system and application vulnerability scanning and third-party penetration testing.
Data Hashing Application
In collaboration with our data curators and technology partners, we’ve developed a stand-alone Windows-based application that simplifies the hashing of data prior to transmission to the ADRF. The application can be downloaded directly to the operator’s desktop and has no dependencies on external resources. It guides the user through four steps: identifying the source file (with unhashed data), selecting the fields to hash, selecting basic data validation, and identifying the target file to create (with hashed data). The user may apply the default ADRF “salt” or provide a custom salt.
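The workflow above can be illustrated with a short Python sketch. It is not the ADRF application's actual implementation: the hash algorithm that application uses is not stated here, so this example assumes HMAC-SHA256 keyed with the salt. The function names `hash_value` and `hash_fields`, the normalization step, and the column names in the usage note are all hypothetical.

```python
import csv
import hashlib
import hmac
import io


def hash_value(value: str, salt: str) -> str:
    """Return a hex digest of `value` keyed with `salt`.

    Assumption: HMAC-SHA256 and trim/upper-case normalization are
    illustrative choices, not the ADRF application's documented behavior.
    """
    normalized = value.strip().upper()
    return hmac.new(
        salt.encode("utf-8"), normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def hash_fields(source, target, fields, salt):
    """Copy CSV rows from `source` to `target`, hashing the named fields.

    `source` and `target` are text file objects. Returns the number of
    data rows written.
    """
    reader = csv.DictReader(source)
    header = reader.fieldnames or []

    # Basic validation: every field selected for hashing must exist.
    missing = [name for name in fields if name not in header]
    if missing:
        raise ValueError(f"fields not found in source file: {missing}")

    writer = csv.DictWriter(target, fieldnames=header)
    writer.writeheader()

    rows_written = 0
    for row in reader:
        for name in fields:
            # Leave empty values empty so missing data stays visible.
            if row[name]:
                row[name] = hash_value(row[name], salt)
        writer.writerow(row)
        rows_written += 1
    return rows_written
```

As a usage example, passing `io.StringIO("ssn,name\n123-45-6789,Ann\n")` as the source with `fields=["ssn"]` and any salt writes a target in which `ssn` is a 64-character hex digest and `name` is unchanged. A keyed hash is used in this sketch so that the same identifier hashed with the same salt always yields the same token, which is what allows hashed files to be linked on that field.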
Data Stewardship Application
The Data Stewardship web-based application is positioned primarily as the management and monitoring console for project and data stewards. It provides detailed insight into project configurations, user activity, user onboarding status, and the overall cost of a project on the ADRF. The application is organized around the four primary pillars of information that project and data stewards most often need:
- People – Who the members of each project are, how often they use the ADRF, which exports they have requested and the status of each, the estimated cost per person and per project for the current month and since project inception, and detailed usage metrics.
- Projects – Project start and end dates, abstract description, number of members onboarded and pending, and the resources the project has access to (e.g., datasets).
- Datasets – Description of the dataset, location on the ADRF (database or file system), size, name of the data steward(s), and a link to the Enterprise Data Catalog (Informatica) describing the dataset and its metadata.
- Agreements – Which agreements relate to each project, each member’s signing status, members pending signature, and the term (dates) covered by the agreement(s).
The ADRF prevents users from removing any information from the secure environment without authorization. Researchers seeking to export their work (summary data, analysis output, supporting code, etc.) must do so through the export module within the ADRF. The export module allows researchers to verify that they are not requesting intermediate output and to provide the documentation needed for Coleridge staff and external data stewards to conduct a thorough disclosure review of the requested materials.
Once an export request is initiated, Coleridge staff perform an initial review in accordance with the agency guidelines for each dataset used to generate the output. During the review, staff ensure that proper cell suppression has been applied, that there are no complementary disclosures, that rounding and noise have been applied where appropriate, and that there are no references to specific observations or counts that would be disclosive. After the export request has passed internal review, the appropriate external data stewards give it a final review before it is released to the researcher. Coleridge staff maintain a log of export requests for auditing purposes and to evaluate subsequent requests by the same researcher for complementary disclosure.
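Two of the checks named above, cell suppression and complementary disclosure, can be sketched in Python for a single row of counts. This is only an illustration of the general idea, not the review procedure Coleridge staff follow. The threshold of 10 is an assumed placeholder, since the real minimum cell size comes from each agency's guidelines, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Assumption: an illustrative threshold; actual values are set by the
# agency guidelines for each dataset.
MIN_CELL_SIZE = 10


def primary_suppression(
    counts: Dict[str, int], min_cell_size: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Replace every count below the threshold with None (suppressed)."""
    return {
        cell: (count if count >= min_cell_size else None)
        for cell, count in counts.items()
    }


def complementary_risk(
    suppressed: Dict[str, Optional[int]], total_published: bool
) -> bool:
    """Flag a row whose single suppressed cell can be recovered.

    If the row total is published and exactly one cell is hidden, the
    hidden value equals the total minus the visible cells.
    """
    hidden = [cell for cell, count in suppressed.items() if count is None]
    return total_published and len(hidden) == 1
```

For example, a row with counts `{"A": 120, "B": 4, "C": 57}` has cell `B` suppressed, and if the row total of 181 is also published, `complementary_risk` returns `True` because `B` can be recovered as 181 - 120 - 57. A reviewer would then need a second cell suppressed or the total withheld.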