Collaborators in the ADRF
Addressing today's biggest economic and social issues requires confidential data from multiple organizations. For example, examining how access to jobs and neighborhood characteristics affects the earnings and employment outcomes of ex-offenders and social benefit recipients, and in turn their subsequent recidivism or retention on welfare, requires data from at least four different agencies (Corrections, Human Services, Labor, and Housing). The Coleridge Initiative and its collaborators have developed technical and human approaches that enable community access to and use of data on human subjects, making this kind of research possible. Join our collaboration to help address pressing social problems.
Collaborating agencies can:
- Have secure remote access to their administrative data in the ADRF
- Make use of the Data Stewardship web application that manages data workflows, including reports on access and use
- Participate in class training in data science methods
- Securely combine data with data from other agencies for approved projects
Become a Collaborator
Joining this effort will allow your agency to help lead the federal data strategy, including its practical implementation and value, at both the state and national levels. You will host your data on the ADRF, which guarantees secure access to administrative records and related data. You will analyze your data on the ADRF to generate the evidence required to develop policies, put procedures in place, share derived data, and take other actions that enable secure data sharing. You can use the reporting tool to support credible outreach to your various constituencies.
Please contact us at [email protected]. All collaborations are based on a data sharing agreement. You can find a template here. You can use this template for your internal processes and share it with your legal counsel to ensure it captures all of your requirements. One of our staff will work with you on the data sharing agreement until it is fully executed.
The primary ADRF contact is Chris Angelosanto, Director, Research Administration NYU Wagner – Coleridge Initiative
Address: 370 Jay Street, 12th Floor, Brooklyn, NY 11201
Email: [email protected]
You may also direct questions to Julia Lane ([email protected]), Coleridge Initiative Director.
Data transfer is handled through Secure File Transfer Protocol (SFTP). Our Data Transfer Team will generate login credentials specific to your agency. After receiving the welcome email, you will be required to change the password. If you prefer a graphical interface to initiate and monitor file transfers, you may download a tool such as WinSCP or Cyberduck, both of which are free and publicly available. Otherwise, you can initiate the transfer through any SFTP command line tool installed with your operating system. Once your transfer is complete, please inform our Data Transfer Team.
You can find a form that contains the metadata schema on the ADRF here. Please return the completed form when you transfer data to the ADRF. On the worksheet titled Dataset Core Description, please fill in all relevant mandatory metadata fields in the Value column. On the second worksheet, titled Data Fields, describe the fields in your data. Information from the metadata form will be made available in the Data Explorer on the ADRF.
Your data will be encrypted with GPG for transmission to the ADRF, using a unique public-private key pair for each transfer. This ensures that the data remains encrypted both in transit and at rest until it is decrypted on the ADRF. To encrypt data, you may download the GPG software (for Mac or Windows). This software has a graphical user interface (GUI) that allows you to select the public key you received from the Coleridge Initiative, as well as the file you wish to encrypt. Once encrypted, the data is ready for transfer.
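If you prefer the command line to the GUI, the encryption step can be sketched roughly as follows. This is only an illustration: the file name `wage_records.csv`, the key file `coleridge_public_key.asc`, and the recipient `TRANSFER_KEY_ID` are hypothetical placeholders, and the helper functions are not part of any Coleridge tooling. The sketch builds standard `gpg` commands (`--import`, `--encrypt`, `--recipient`, `--output`) and runs them only when `encrypt_file` is called.

```python
import shutil
import subprocess
from typing import List


def build_import_command(public_key_file: str) -> List[str]:
    """Command that adds the public key you received to your GPG keyring."""
    return ["gpg", "--import", public_key_file]


def build_encrypt_command(data_file: str, recipient: str) -> List[str]:
    """Command that encrypts data_file for the given recipient key."""
    return [
        "gpg",
        "--output", data_file + ".gpg",  # encrypted copy written next to the original
        "--encrypt",
        "--recipient", recipient,        # key ID or email attached to the public key
        data_file,
    ]


def encrypt_file(data_file: str, public_key_file: str, recipient: str) -> str:
    """Import the key, encrypt the file, and return the encrypted file's name."""
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg was not found on PATH")
    subprocess.run(build_import_command(public_key_file), check=True)
    subprocess.run(build_encrypt_command(data_file, recipient), check=True)
    return data_file + ".gpg"


# Hypothetical names, shown only to illustrate the resulting command line.
print(" ".join(build_encrypt_command("wage_records.csv", "TRANSFER_KEY_ID")))
```

Depending on your GPG configuration, `gpg` may ask you to confirm that you trust the imported key before it encrypts, so the first run is best done interactively.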
The data you provide will be de-identified by applying an HMAC (Hash-based Message Authentication Code) algorithm to key variables, such as first name, middle name, last name, and social security number. The ADRF HMAC first uses a “salt” to create a secret key, which is then applied to the value of the variable that needs to be hashed. The salt is created by generating 32 random hexadecimal digits, which are converted to integers and then hashed using SHA256. The keyed value is then hashed using SHA256. The hashing is one-way and cannot be reversed (“decrypted”); however, a given value hashed with the same key always produces the same hash value. This allows hashed values in two different tables (for example, data from different agencies) to be joined. The Coleridge Initiative will provide you with the hashing program, unit tests, an end-to-end test harness, and a sample data file used in the tests before hashing.
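To make the idea concrete, here is a minimal sketch of keyed hashing with Python's standard `hmac` and `hashlib` modules. It is not the hashing program that the Coleridge Initiative provides. The key derivation (SHA256 over the salt bytes), the normalization step, and the function names `make_key` and `hash_value` are assumptions made for illustration, and the identifiers are made-up values.

```python
import hashlib
import hmac
import secrets


def make_key(salt_hex: str) -> bytes:
    """Derive a 32-byte key by hashing the salt with SHA256 (assumed derivation)."""
    return hashlib.sha256(bytes.fromhex(salt_hex)).digest()


def hash_value(value: str, key: bytes) -> str:
    """Return the HMAC-SHA256 of an identifier as a 64-character hex string."""
    normalized = value.strip().upper()  # assumed normalization before hashing
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# 32 random hexadecimal digits; the same salt must be used for every table
# that is meant to be joined later.
salt = secrets.token_hex(16)
key = make_key(salt)

# Two made-up tables from different agencies, keyed by a fake identifier.
corrections = {hash_value(i, key): "corrections row" for i in ["000-00-0001", "000-00-0002"]}
labor = {hash_value(i, key): "labor row" for i in ["000-00-0002", "000-00-0003"]}

# The join happens on hashed values only; the raw identifiers are never compared.
matched = set(corrections) & set(labor)
print(len(matched))
```

Because the same key yields the same hash for the same input, the one person present in both tables is matched, while anyone without the key cannot recompute or reverse the hashes.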
By default, you and all Coleridge staff named in the agreement will have access to the data. If your data is used for teaching purposes, class participants will have access to the data for the duration of the class once they have signed the respective agreements. If you allow researchers to access your data, you will have full control over who can access it. Your dataset will be assigned a data steward within your organization who will be the point of contact for all access requests. Access to data will only be granted after approval from the data steward.
The ADRF is FedRAMP certified and follows an extensive security protocol. You can find a short description of the data management plan here. You can request the technical FedRAMP documentation on all the security protocols in place here: https://marketplace.fedramp.gov/#/product/administrative-data-research-facility-adrf?sort=productName&productNameSearch=ADRF
General requirements for data stewardship are specified in Title III of the Evidence-Based Policymaking Act of 2018, “PART D – ACCESS TO DATA FOR EVIDENCE, § 3583. Application to access data assets for developing evidence, (a) Standard Application Process” and in the NIST Special Publication 800-53 Revision 4 standards document, “Security and Privacy Controls for Federal Information Systems and Organizations”. These documents state that agencies shall follow a defined process to ensure full transparency. That process is guided by an agency official with statutory or operational authority for specified information and with responsibility for establishing the controls for its generation, collection, processing, dissemination, and disposal.
The ADRF is designed to address the core data stewardship functionalities: meeting the information security requirements and operational responsibilities of data stewards, streamlining the data request and approval process, and monitoring and reporting on the usage of sensitive data.
The initial step in implementing a data governance framework involves defining the owners or custodians of data assets within an agency, in a process called data stewardship. Processes and workflows are defined to formalize how the data will be stored, archived, backed up, and protected from mishaps, theft, or attacks. A set of standards and procedures is developed that defines how data is to be used by authorized personnel. Controls and audit procedures are put into place to ensure ongoing compliance with internal data policies and external government regulations, and to guarantee that data is used in a consistent manner across multiple applications.
The Data Stewardship module is implemented as a web portal that can be accessed by approved users. A user submits a project proposal using the Project Request workflow; the proposal includes the datasets to be used, the project members, and other information such as start and end dates. Before a request is submitted to data stewards, members of the project must sign and upload any required non-disclosure agreements (NDAs) for their requested datasets. The request is then routed to the designated data stewards for evaluation. If it is approved, ADRF staff ensure that each user has completed the required security trainings before activating the project. Once a project is active, the Data Stewardship module provides an additional workflow for Monitoring and Reporting. These monitoring tools give data providers visibility into how their data is being used. Currently, the ADRF platform logs all data access so that data owners can request to see how many people, on which projects, have accessed their data over a given period of time. Please refer to the official documentation for further details here: https://adrf.readthedocs.io/en/latest/
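The order of steps in the Project Request workflow can be pictured with a small model. The sketch below is purely illustrative and does not reflect how the Data Stewardship module is implemented: the class, the status names, and the sample values are all hypothetical, and NDAs are tracked per member rather than per dataset for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Set


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    ACTIVE = "active"


@dataclass
class ProjectRequest:
    title: str
    datasets: List[str]
    members: List[str]
    start: date
    end: date
    status: Status = Status.DRAFT
    ndas_signed: Set[str] = field(default_factory=set)        # members with NDAs uploaded
    trained: Set[str] = field(default_factory=set)            # members with security training
    approved_datasets: Set[str] = field(default_factory=set)  # datasets approved by stewards

    def submit(self) -> None:
        """Route the request to data stewards once every member has signed an NDA."""
        missing = set(self.members) - self.ndas_signed
        if missing:
            raise ValueError(f"NDAs missing for: {sorted(missing)}")
        self.status = Status.SUBMITTED

    def review(self, dataset: str, approved: bool) -> None:
        """Record one data steward's decision for one requested dataset."""
        if self.status is not Status.SUBMITTED:
            raise ValueError("only submitted requests can be reviewed")
        if not approved:
            self.status = Status.REJECTED
            return
        self.approved_datasets.add(dataset)
        if self.approved_datasets == set(self.datasets):
            self.status = Status.APPROVED

    def activate(self) -> None:
        """Activate the project after staff confirm that all members are trained."""
        if self.status is not Status.APPROVED:
            raise ValueError("only approved requests can be activated")
        untrained = set(self.members) - self.trained
        if untrained:
            raise ValueError(f"security training missing for: {sorted(untrained)}")
        self.status = Status.ACTIVE


# Made-up example values.
request = ProjectRequest(
    title="Example study",
    datasets=["wage_records"],
    members=["analyst_1"],
    start=date(2024, 1, 1),
    end=date(2024, 12, 31),
)
request.ndas_signed.add("analyst_1")
request.submit()
request.review("wage_records", approved=True)
request.trained.add("analyst_1")
request.activate()
print(request.status.value)
```

The point of the model is the gating: a request cannot reach data stewards without signed NDAs, and an approved project cannot become active until every member has completed security training.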