Sponsored by the USDA Economic Research Service (ERS) and run in partnership with Westat, this Coleridge Initiative data challenge invited researchers and data scientists to use machine learning and natural language processing to help USDA link data on supermarket products to nutrient databases more efficiently.

The outcome of the challenge will give USDA modern tools that will help the agency develop timely, objective, and high-quality linked data and facilitate its research on the costs of a healthy diet.

Read the Food for Thought: Competition and Challenge Design Case Study

Challenges & Objectives

The main challenge of this competition was to provide USDA with new ways to produce a crucial data resource: the Purchase to Plate Crosswalk (PPC). The PPC combines retail scanner data from Information Resources, Inc. with nutrition information from the USDA Agricultural Research Service Food and Nutrient Database for Dietary Studies. The PPC gives agencies the ability to measure diet quality and assess USDA Food Plan costs.

Process & Work

The Challenge used confidential data in a secure data enclave hosted inside the Administrative Data Research Facility (ADRF). Over the course of the Challenge, teams competed to implement innovative machine learning and natural language processing approaches.

The data challenge was advertised widely. We received applications from both U.S. and international universities as well as U.S. research institutions and private companies. After careful review, the scientific review board selected 12 teams to participate in the challenge. The board comprised faculty from computer science and engineering departments of top universities and subject matter experts from USDA and Westat.


Results & Outcomes

The data challenge concluded successfully with three winning teams: one from Auburn University, one from Loyola Marymount University, and a collaborating team from Indiana University Bloomington and Worcester Polytechnic Institute. The solutions the teams submitted were innovative and performed well, with highly accurate match rates and efficient run-time cost. To see challenge winners, click here.