Exporting Results

The ADRF is a FedRAMP-authorized remote access environment that uses the “five safes” approach (“safe projects, safe people, safe settings, safe data, and safe outputs”) to protect confidential data. As a condition of accessing the ADRF, all users sign a Terms of Use agreement asserting that they will not attempt to re-identify or re-disclose information about any individual or entity represented in the data they encounter within the ADRF. Before being allowed to access certain datasets, users may also be required to sign additional confidentiality agreements from the agencies that provided those data.

ADRF users are not allowed to upload data into, nor download data from, the ADRF. Further, when signing the Terms of Use, users agree not to extract data from the ADRF by other means, even inadvertently. Examples include taking handwritten notes of specific numbers or dates, taking screenshots or pictures, talking to unauthorized colleagues about empirical results, and working in the ADRF while in a public space.

The point of these requirements is to tightly control what goes into, and comes out of, the ADRF, so that the agencies and organizations that steward the data that the ADRF contains can more effectively protect those data.

With this in mind, users are able to extract information from the ADRF through a formal review process, which takes into account the requirements of FedRAMP, the data providers, and the ADRF security team more generally. These guidelines are meant to limit disclosure from data in the ADRF, whether alone or in combination with other datasets. Put differently, these guidelines help to limit the ability of ADRF users or the consumers of their research to identify or learn specific information about individual people or organizations using any ADRF dataset. Our Export Review process is described in detail below.

If you have any questions about this process or its guidelines, please reach out to the ADRF support team at [email protected].

Requesting an Export Review

We are currently transitioning ADRF projects into a new system, accessible through https://adrf.okta.com. Because not all ADRF projects have been transitioned, the documentation for both systems is temporarily included below.

For users who log into the ADRF at https://adrf.okta.com (“ADRF 3”)

To request an export review, please follow the instructions below:

  1. From the ADRF desktop, open Google Chrome.
  2. Click “Export Request” on the Google Chrome homepage, or navigate to export.adrf.net.
    (Note: export.adrf.net is an address that will only work within the ADRF desktop.)
  3. Click “My Requests,” or the top (Person-shaped) icon, at the left side of the window.
  4. Click “New Item”.
    1. You will be asked to select the project to which your export relates. If you do not see the correct project listed in the dropdown list, please reach out to our support team at [email protected].
    2. After selecting a project, click Continue.
  5. Read through the entirety of the page that loads. This page, titled “Create Export Request,” asks you to rename and organize your files to help the reviewer, and to document the context of each file that you would like to export.
    When you have read through and followed the page’s instructions, and are ready to proceed:
    1. Move the slider at the bottom of the page to indicate that you have followed the page’s guidelines.
    2. At the bottom of the page, upload each of the files that you have prepared.
    3. Click “Submit Request…” to create the export request.
  6. You can click “Pending Reviews,” or the bottom (Page-shaped) icon, at the left side of the window, to view your current and previous export requests.

For users who log into the ADRF at https://workspace.adrf.cloud (“ADRF 2”)

To request an export review, please follow the instructions below. This process may differ from what you were shown in your initial ADRF training. The process changed recently to facilitate reviews as we migrate projects into a new, upgraded ADRF system.

  1. Within the ADRF desktop, create a subfolder in your home folder. Title this new folder “For-Export”.
  2. Inside this new folder:
    1. Create two new directories: one named “Input”, and one named “Output”.
    2. There is a blank copy of a form (the “Export Request Memo”) in the ADRF project’s Shared folder. Copy this form into the Input folder. Open the copy in the Input folder, and read through and complete it.
      This form asks you to rename and organize your files to help the reviewer, and to document the context of each file that you would like to export.
    3. When you have read through the form and are ready to proceed:
      1. Copy the underlying code and any required supplemental statistics (such as underlying counts, depending on the types of files you are requesting to export) into the Input folder.
      2. Copy the files you would like to move out of the ADRF into the Output folder.
  3. Once you have completed the above steps, send an email to [email protected] to notify the ADRF review team that you have submitted an export request.
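
The folder layout in steps 1 and 2 can also be created from code rather than by hand. The following is a minimal sketch, assuming Python is available inside the ADRF desktop; the function name `prepare_export_folders` is hypothetical, and the caller supplies the home folder path. Only the folder names (“For-Export”, “Input”, “Output”) come from the steps above.

```python
from pathlib import Path


def prepare_export_folders(home_folder: Path) -> dict:
    """Create For-Export/Input and For-Export/Output under `home_folder`.

    Hypothetical helper: only the three folder names come from the
    instructions; everything else is an illustrative assumption.
    """
    export_root = home_folder / "For-Export"
    folders = {
        "input": export_root / "Input",    # memo, code, supplemental statistics
        "output": export_root / "Output",  # files you want moved out of the ADRF
    }
    for folder in folders.values():
        # exist_ok avoids an error if the folders remain from an earlier request
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `prepare_export_folders(Path.home())` would create the folders in your home folder. Copying the Export Request Memo, code, and output files into them remains a manual step.
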

Export Guidelines

How Review Guidelines are Set

For each review, the ADRF Export Review team follows guidelines that are set by the agency that provided the data involved in that review. Prior to moving data into the ADRF from the agency, the Export Review team suggests default guidelines to implement, based on standard statistical approaches in the U.S. government [1, 2] as well as international standards [3, 4, 5]. The data steward from the agency supplying the data works with the team to amend these default rules in line with the agency’s requirements. If you are unsure about the review guidelines for the data you are using in the ADRF, please reach out to [email protected] before submitting an export request.

To learn more about limiting disclosure more generally, please refer to our textbook or view our videos.

General Guidelines for ADRF Users

  • Currently, the review process is highly manual: Reviewers will read your code and view your output files, which may be time-consuming. For this reason, and because each additional release adds disclosure risk and therefore limits subsequent releases, we ask that users limit the number of files they request to export to just the outputs necessary to produce a particular report or paper. If you are requesting the export of more than 10 files, there may be an additional charge. Please email [email protected] for more information about those charges.
  • The reviewers may ask you to make changes to your code or output to meet the requirements that we have been given by the providers of the data in the ADRF. Thus, we strongly encourage you to produce all output files (tables with rounded numbers, graphs with titles, etc.) through code, rather than manually. Please reach out to [email protected] for help.
  • We ask that users only request review of “final” versions of output files, rather than “in-progress” versions; and that users not request the release of intermediate output.
  • Every code file should have a header describing the content of the file, including a summary of the data manipulation that takes place in the file (e.g., regression, table or figure creation, etc.).
  • Documenting code using comments throughout is helpful for disclosure reviews. The better the documentation, the faster the turnaround of export requests. If data files are aggregated, please provide documentation on the level of aggregation and on where in the code the aggregation takes place.
  • In order to help reviewers, who may not have seen your code before, we ask that users use meaningful variable names. For instance, if you are calculating outflows, it is better to name the variable “outflows” than to name it “var1”.
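
As an illustration of the header, documentation, and naming guidelines above, here is a short sketch of what a reviewer-friendly code file might look like. The file name, the record fields (`quarter`, `separated`), and the function name are all hypothetical; they are not part of any ADRF dataset or template.

```python
"""
File: quarterly_outflows.py  (hypothetical file name)
Purpose: Count job outflows per quarter.
Data manipulation: filters person-quarter records to separations, then
    aggregates to the quarter level (see count_outflows_by_quarter).
Unit of analysis: person-quarter records in; one count per quarter out.
"""
from collections import Counter


def count_outflows_by_quarter(employment_records):
    """Aggregate person-quarter records to one outflow count per quarter."""
    outflows = Counter()  # a meaningful name, rather than e.g. "var1"
    for record in employment_records:
        # Aggregation step: each separation adds one to its quarter's count
        if record["separated"]:
            outflows[record["quarter"]] += 1
    return dict(outflows)
```

The header tells the reviewer where aggregation happens and at what level, so they do not have to infer it from the code.
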

Specific Guidelines

  • When describing samples, please clearly describe the construction of the sample, the datasets used, the time period, and the sample frame. Please also clearly document the unit of analysis (individuals, regions, etc.).
  • When reporting descriptive statistics, always report the total number of observations.
  • Do not include actual numbers in code comments and logs. Similarly, do not hard-code specific numbers into code (for example, peak_unemployment_percentage = 6.7) and do not include actual, specific numbers in table titles or figures (for example, “Unemployment peaked at 6.7%”).
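
One way to follow the last guideline is to derive values from the data at run time and to keep titles free of data values. The sketch below is illustrative only; the function names are hypothetical, and `monthly_rates` stands in for whatever series your own code computes.

```python
def peak_unemployment_percentage(monthly_rates):
    """Derive the peak from the data at run time instead of typing it into the code."""
    return max(monthly_rates)


def figure_title(measure_name):
    """Build a title that names the measure but contains no data values."""
    return f"{measure_name} over the study period"
```

With this approach, the value appears only in the output file that goes through review, not in the code, comments, logs, or titles.
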

Tables

Cell Sizes
  • Each agency has specific disclosure review guidelines, especially with respect to the minimum allowable cell sizes for tables. Please refer to these guidelines when preparing export requests. If you are unsure of what guidelines are in place for the dataset with which you are working in the ADRF, please reach out to [email protected].
  • For individual-level data, please report the number of observations from each cell.
    The default rule applied by the team for individual-level data is to suppress cells with fewer than 10 observations, unless otherwise directed by the guidelines of the agency that provided the data.
  • If your table includes row or column totals, or if it is dependent on a preceding or subsequent table, reviewers will need to take into account complementary disclosure risks — i.e., whether the tables’ totals, or the separate tables when read together, might disclose information about individuals in the data in a way that a single, simpler table would not. Reviewers will work with you by offering guidance on implementing any necessary complementary suppression techniques.
  • If you are working with data about businesses, please report the proportion of the cell count or value accounted for by the top four (4) businesses in a cell.
    The default rule applied by the Export Review team for business data is to suppress counts or values where more than 80% is accounted for by the top four (4) businesses in the cell.
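
The two default rules above can be sketched in code as follows. This is a minimal illustration, assuming the default thresholds stated in the text (fewer than 10 observations; more than 80% held by the top four businesses). The function names are hypothetical, the agency's own guidelines take precedence, and the sketch does not attempt complementary suppression.

```python
from collections import Counter

MIN_CELL_SIZE = 10     # default from the guidelines; agency rules may differ
MAX_TOP4_SHARE = 0.80  # default from the guidelines; agency rules may differ


def suppress_small_cells(categories, min_cell_size=MIN_CELL_SIZE):
    """Count observations per cell; cells below the minimum become None."""
    counts = Counter(categories)
    return {cell: (n if n >= min_cell_size else None) for cell, n in counts.items()}


def top4_share(business_values):
    """Share of a cell's total accounted for by its four largest businesses."""
    total = sum(business_values)
    if total == 0:
        return 0.0
    largest_four = sorted(business_values, reverse=True)[:4]
    return sum(largest_four) / total


def passes_concentration_rule(business_values, max_share=MAX_TOP4_SHARE):
    """True if the top four businesses account for no more than max_share."""
    return top4_share(business_values) <= max_share
```

Reporting the output of `top4_share` alongside each business cell gives reviewers the proportion the guidelines ask for.
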
Cell Values
  • Round all reported values to the nearest sensible units (e.g., do not report earnings of $45,675 – report $45,000; do not report employment at 12,345 – report 12,000).
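
Rounding is easiest to apply consistently when done in code. The sketch below rounds to the nearest multiple of a chosen unit. The units used in the comments (1,000 and 5,000) are assumptions picked so that the results match the two examples in the guideline; the appropriate unit for your data is a judgment call.

```python
import math


def round_to_unit(value, unit):
    """Round `value` to the nearest multiple of `unit`, with halves rounding up."""
    return math.floor(value / unit + 0.5) * unit


# round_to_unit(12345, 1000) returns 12000
# round_to_unit(45675, 5000) returns 45000
```
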
Weighted Data
  • If weighted results are to be exported, you must report both weighted and unweighted counts.
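
A small sketch of producing both counts side by side is shown below. It assumes each record is a dictionary holding a grouping field and a weight field; the function and field names are hypothetical.

```python
def weighted_and_unweighted_counts(records, group_key, weight_key):
    """Return, per group, the number of records and the sum of their weights."""
    summary = {}
    for record in records:
        cell = summary.setdefault(
            record[group_key], {"unweighted_n": 0, "weighted_n": 0.0}
        )
        cell["unweighted_n"] += 1                  # raw number of observations
        cell["weighted_n"] += record[weight_key]   # weighted count
    return summary
```

Keeping both figures in the same table lets the reviewer apply cell-size rules to the unweighted counts.
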
Ratios
  • If ratios are reported, please report the number of valid cases for both the numerator and the denominator (e.g., the number of women in state X and the number of men in state X, in addition to the ratio of women to men in state X).
Percentiles
  • Do not report exact percentiles. Instead, for example, you may calculate a “fuzzy median,” by averaging the true 45th and 55th percentiles.
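
The “fuzzy median” described above can be sketched with Python's standard library. The function name is hypothetical, and the choice of the `inclusive` interpolation method is an assumption; other percentile definitions would give slightly different values.

```python
import statistics


def fuzzy_median(values):
    """Average of the 45th and 55th percentiles, used in place of the exact median."""
    # quantiles(n=100) returns 99 cut points; index 0 is the 1st percentile
    percentiles = statistics.quantiles(values, n=100, method="inclusive")
    p45 = percentiles[44]
    p55 = percentiles[54]
    return (p45 + p55) / 2
```

The same pattern extends to other percentiles, for example averaging the 20th and 30th percentiles in place of the 25th.
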
Maxima and Minima
  • Suppress maximum and minimum values in general.
  • You may replace an exact maximum or minimum with a top-coded value.
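
Top-coding replaces values beyond a chosen limit with the limit itself, so that the exact extreme is not revealed. The sketch below applies the same idea at both ends. The function name is hypothetical, and how the `floor` and `cap` limits are chosen is left to the user and the relevant agency guidelines.

```python
def clamp_extremes(values, floor, cap):
    """Replace values below `floor` with `floor` and values above `cap` with `cap`."""
    return [min(max(value, floor), cap) for value in values]
```

After clamping, the reported maximum is the cap (read as “cap or more”) rather than the true maximum, and likewise for the minimum.
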

Graphs and Other Figures

  • Graphs are representations of tables. Thus, for each graph (which may have, e.g., a jpg, pdf, png, or tif extension), please provide information about the source data underlying the graph, following the guidelines for tables above.
  • Because graphs and other figures take the most time to review, the number of generated graphs should be as low as possible. Please consider the possibility that you could export the underlying table instead, and generate the graph in another package. 
  • If a graph is produced from aggregated data, or from tables that have been disclosure-proofed following the guidelines above (e.g., bar charts of magnitudes), please provide the underlying tables.
  • If a graph is produced directly from unit-record data but aggregated in the visualization (e.g., frequency histograms), please provide the underlying tables.
  • If a graph is produced directly from unit-record data and displays unit-record values (e.g., scatterplots, plots of residuals), the graph can be released only after ensuring that individuals cannot be reidentified and that values can only be estimated with a high level of uncertainty.
    • Further processing to meet this requirement can include, but is not restricted to, cutting off the tails of a distribution, removing outliers, jittering the actual values, and removing or modifying axis values.
  • If a graph is produced from the results of modeling or derivation and uses the unit-record data (e.g., regression curves), the graph can be released only if the values cannot be used to find original data values.
    • Graphs of this type are generally automatically cleared.
    • For precision/recall graphs, you will need to report the sample size used to generate your model(s).
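
Two of the processing steps mentioned above for graphs of unit-record values, cutting off the tails of a distribution and jittering the actual values, can be sketched as follows. This is an illustration under stated assumptions, not a guarantee that the result meets a given agency's requirements: the function names, the tail fraction, and the noise scale are all hypothetical choices.

```python
import random


def trim_tails(points, tail_fraction):
    """Drop (x, y) points whose y value lies in the lowest or highest tail."""
    ys = sorted(y for _, y in points)
    k = int(len(ys) * tail_fraction)  # number of points to cut from each end
    if k == 0:
        return list(points)
    low, high = ys[k], ys[-k - 1]
    return [(x, y) for x, y in points if low <= y <= high]


def jitter_y(points, relative_noise, rng):
    """Add uniform noise to each y value, scaled to the range of y (non-empty input)."""
    ys = [y for _, y in points]
    half_width = relative_noise * (max(ys) - min(ys))
    return [(x, y + rng.uniform(-half_width, half_width)) for x, y in points]
```

A typical use would be `jitter_y(trim_tails(points, 0.05), 0.02, random.Random())` before plotting. Removing or coarsening the axis values, also mentioned above, would be done in the plotting code itself.
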

Modeled Output

  • Output from regression or machine learning models is generally non-disclosive, as long as it is not based on small samples.
  • Only request the release of the key coefficients, and suppress the coefficients of control variables.
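
Restricting model output to the key coefficients can be done in code before the output file is written. The sketch below assumes the fitted coefficients are already available as a dictionary keyed by variable name; the function name and the dictionary layout are hypothetical and not tied to any particular modeling library.

```python
def select_key_coefficients(coefficients, key_variables, n_observations):
    """Keep only the coefficients named in `key_variables`, plus the sample size."""
    missing = [name for name in key_variables if name not in coefficients]
    if missing:
        raise KeyError(f"Not found in model output: {missing}")
    return {
        "n_observations": n_observations,  # lets reviewers judge sample size
        "coefficients": {name: coefficients[name] for name in key_variables},
    }
```

Control-variable coefficients are simply left out of the returned table, so they never appear in the file submitted for export.
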

References

[1] Confidential Information Protection and Statistical Efficiency Act of 2002. (2002). Washington, D.C.: U.S. G.P.O.

[2] Federal Committee on Statistical Methodology (FCSM). (2005). “Report on Statistical Disclosure Limitation Methodology.” Statistical Policy Working Paper 22 (Second version, 2005). https://nces.ed.gov/fcsm/pdf/spwp22.pdf

[3] How to use microdata properly: Self-study material for the Users of Eurostat microdata sets. (2018). Retrieved from https://ec.europa.eu/eurostat/web/microdata/overview/self-study-material-for-microdata-users

[4] Research Data Centre of the German Federal Employment Agency at the Institute for Employment Research. (2020, December 8). Remote Data Access and On-Site Use at the FDZ of the BA at the IAB. Retrieved from http://doku.iab.de/fdz/access/Vorgaben_DAFE_EN.PDF

[5] Welpton, Richard (2019): SDC Handbook. figshare. Book. https://doi.org/10.6084/m9.figshare.9958520.v1