Coleridge Initiative

Show US the Data

October 20, 2021
Washington, D.C.

Keck Center of the National Academies

500 5th St NW, Washington, DC 20001


Agencies and chief data officers (CDOs) are repeatedly asked how their data are used. In Title II of the Foundations for Evidence-Based Policymaking Act (the OPEN Government Data Act), Congress instructed agencies to provide the public with information about the usage of their data, to engage the public in prioritizing data assets, and to expand the use of public data. Doing so is currently time-consuming and burdensome for agencies. Several agencies (NSF, USDA, and NOAA), as well as CHORUS, partnered with the Coleridge Initiative to develop easy-to-use, evidence-based AI tools and approaches that government agencies can use to document and understand public use of their data for research.

This conference will demonstrate the value of existing tools and of state-of-the-art approaches developed through a Kaggle data science competition, and will identify next steps for agencies and for collaboration among them. It will highlight the 7 winning Kaggle challenge submissions and bring together some of the challenge winners, scientific journal publishers, the philanthropic foundations and government agencies that supported the competition, the US General Services Administration (GSA), the federal CDO Council, and the research community that uses federal data.


All times are EDT

10:00 AM - 10:10 AM

Welcome and Context

Suzette Kent – former Federal CIO

Nancy Potok – former US Chief Statistician

Federal Data Strategy

Modernizing Data Infrastructure

10:10 AM - 10:20 AM

Keynote Speech

Speaker Paul Ryan: Data, Evidence and Policy

Commission on Evidence-Based Policymaking

Evidence Act 

10:20 AM - 10:30 AM

10:30 AM - 10:40 AM

The Winning Methods

10:40 AM - 11:00 AM

The Agency Scorecards

11:00 AM - 11:30 AM

Reactions from Stakeholders

  1. Federal Agencies
  2. Research Community
  3. Publishers
  4. Institutions

11:30 AM - 11:45 AM

Audience Discussion

11:45 AM - 12:00 PM

Next Steps


Khôi and Minh are colleagues at VNG. Their areas of expertise include natural language and speech processing, as well as deploying machine learning models to real-world applications. To view this model, click here.

Presentation Slides

Chun Ming Lee - Singaporean NLP Data Scientist

Transformer-Enhanced Heuristic Search

Chun Ming is a data scientist active in the Singaporean startup scene. He has also worked as a management consultant at McKinsey & Co. and as a software developer in finance. He earned his MBA from London Business School and his bachelor’s degree in computer science from Carnegie Mellon University. To view this model, click here.

Presentation Slides

Mikhail Arkhipov

Pure Pattern Matching

Mikhail Arkhipov is from Moscow, Russia. Since 2017, he has worked on open-source NLP tools and conducted research on multilingual transfer learning and named entity recognition. To view this model, click here.

Presentation Slides


NOAA: Sea, Lake, and Overland Surges from Hurricanes

NSF: Survey of Earned Doctorates

USDA: ARMS Farm Financial and Crop Production Practices


In addition to the expert panelists sharing ideas, call-in participants from the stakeholder community may submit questions for panelists. Below are examples of dataset usage scorecards and a word cloud of how the data have been used for research purposes.