Building policy based on evidence and science is at the center of new legislation and presidential executive orders aimed at restoring trust in government. But how can agencies demonstrate the impact of their data? The Rich Context project, which a former US chief statistician has called a game changer, uses AI and machine learning to find out what data are being used, by whom, and for what purpose. We have shown how to do this in two machine learning competitions, a workshop, and a book. Our goals are to develop an open source platform for discovering datasets in research publications, to build a catalog of datasets produced with agency funding, and to create a scorecard for those data that shows the relative value of each dataset based on
The approach is to apply machine learning and natural language processing (NLP) techniques that search publications to:
- Find which datasets are used in the publications
- Show how they’ve been used
- Find other experts who have used the data
- Identify other related datasets
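As a rough illustration of three of these tasks (finding dataset mentions, finding the people who used a dataset, and identifying related datasets), here is a minimal Python sketch. It swaps in plain dictionary string matching against a small catalog of dataset names instead of the project's machine learning models, and it treats publication authors as candidate experts and co-mentioned datasets as related. The catalog, the `Publication` record, the sample publications, and all function names are hypothetical, chosen for illustration only.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Publication:
    """Hypothetical minimal publication record."""
    pub_id: str
    authors: list[str]
    text: str


# Hypothetical, illustrative catalog; a real one would come from agency records.
KNOWN_DATASETS = [
    "National Education Longitudinal Study",
    "Baccalaureate and Beyond",
    "Agricultural Resource Management Survey",
]


def normalize(s: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


def find_datasets(text: str, catalog: list[str]) -> set[str]:
    """Return catalog names that appear as whole-word phrases in the text."""
    padded = f" {normalize(text)} "
    return {name for name in catalog if f" {normalize(name)} " in padded}


def build_index(pubs: list[Publication], catalog: list[str]):
    """Map each dataset to the publications and authors that mention it,
    and count how often pairs of datasets are mentioned together."""
    dataset_pubs: dict[str, set[str]] = defaultdict(set)
    dataset_authors: dict[str, set[str]] = defaultdict(set)
    cooccur: Counter = Counter()
    for pub in pubs:
        found = find_datasets(pub.text, catalog)
        for name in found:
            dataset_pubs[name].add(pub.pub_id)
            dataset_authors[name].update(pub.authors)  # candidate "experts"
        # Sorting makes each pair key order-independent.
        for a, b in combinations(sorted(found), 2):
            cooccur[(a, b)] += 1
    return dataset_pubs, dataset_authors, cooccur


def related_datasets(name: str, cooccur: Counter) -> list[str]:
    """Rank other datasets by how often they are mentioned alongside `name`."""
    scores: Counter = Counter()
    for (a, b), n in cooccur.items():
        if a == name:
            scores[b] += n
        elif b == name:
            scores[a] += n
    return [other for other, _ in scores.most_common()]


pubs = [
    Publication("p1", ["Lee", "Diaz"],
                "We use the National Education Longitudinal Study (NELS) "
                "and Baccalaureate and Beyond data."),
    Publication("p2", ["Okafor"], "Estimates draw on the Baccalaureate and Beyond survey."),
    Publication("p3", ["Diaz"], "Farm income comes from the Agricultural Resource Management Survey."),
]
dataset_pubs, dataset_authors, cooccur = build_index(pubs, KNOWN_DATASETS)
print(sorted(dataset_pubs["Baccalaureate and Beyond"]))       # ['p1', 'p2']
print(sorted(dataset_authors["Baccalaureate and Beyond"]))    # ['Diaz', 'Lee', 'Okafor']
print(related_datasets("Baccalaureate and Beyond", cooccur))  # ['National Education Longitudinal Study']
```

Exact matching like this misses abbreviations used on their own (such as "NELS") and paraphrased dataset names, which is where learned NLP models would be expected to do better than a fixed list.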
Follow our Kaggle competition.