Rich Context Workshop

Summary

The focus of the Rich Context workshop is to build a scientific basis for the empirical foundations of data science in government. Empirical research relies critically on knowing how data have been produced and used before: what the data measure, what research has been done by which researchers, with what code, and with what results. The interest of funders in supporting data science, combined with the recent passage of the Evidence-Based Policymaking Act and the launch of the federal data strategy, makes this an opportune time for such a workshop.

Logistics

When:  November 15-16, 2019
Where: The National Press Club
              529 14th St. NW, 13th Floor
              Washington, DC 20045
Hotel: We have a room block at the Sofitel Washington DC Lafayette Square, located at 806 15th St NW, Washington, DC 20005. The room rate is $259 per night plus tax. Please contact the Sofitel directly at (202) 730-8800 no later than Tuesday, October 15. Ask for group reservations and identify yourself as a participant in the Rich Context Workshop. A first-night room deposit will be required at the time of reservation.

Outcomes

We expect the outcomes of the workshop to be:
1. A roadmap that will identify the opportunities, gaps, and necessary investments.
2. The development of an interdisciplinary community of computer scientists, life scientists, and social scientists who can work together to address the problems.
3. The engagement of key stakeholders, notably funding agencies and government agencies.

Agenda

The Friday and Saturday sessions will begin at 9 a.m., and social activities will be planned around the sessions. Full agenda to follow!

Motivation

There is substantial interest in building an empirical basis for evidence-based policy. Doing so involves learning what data have been used by which experts to examine which topics, building better search and discovery tools for finding those datasets and experts, and building a platform that will improve those tools and disseminate the knowledge to both the scientific and policy-making communities.


The technical capacity to build such a platform now exists, as demonstrated by a successful recent competition. Computer scientists and domain experts in the life sciences have developed the scientific underpinnings necessary to build each component: document corpus development, ontology development for dataset entity classification, natural language processing and machine learning models for dataset entity extraction, graph models for improving search and discovery, and telemetry to capture dataset engagement and use. This workshop will bring together scientists who are working at the cutting edge of knowledge in each of these areas, policy makers who need the results of their work, and funding agencies that have historically supported these efforts.


The scientific committee includes: Stefan Bender, Deutsche Bundesbank; Julia Lane, New York University; Paco Nathan, Derwen.ai; Ian Mulvany, SAGE Publications; Christine Borgman, UCLA; and Waleed Ammar, Allen Institute for Artificial Intelligence. The workshop will be convened as an unconference in the “Foo Camp” style.


This work focuses mainly on complex data governance for data science workflows that involve sensitive data in regulated environments. We believe that current efforts in open source projects are contributing to much-improved dataset modeling and knowledge graph applications for these purposes.


Sponsors

This workshop is generously funded by Eric and Wendy Schmidt, at the recommendation of Schmidt Sciences; the Alfred P. Sloan Foundation; and the Overdeck Family Foundation.