The goal of the Rich Context project is to create a new platform that enables empirical analysts to search for and discover datasets.
The challenge that empirical researchers face is that, for a given dataset, it is difficult to find out who has worked with the data before, what methods and code were used, and what results were produced.
Some estimates suggest that more than a third of analysts’ time is spent finding out about data rather than on model development and production, and the Federal Data Strategy has directed government agencies to streamline access to federal data assets.
The scientific capacity now exists to build a platform that will automate search and discovery. A competition brought together computer scientists from around the globe to develop models that would identify and extract dataset mentions from full-text publications. The finalists convened in New York City in February 2019 to present their work. A follow-on workshop brought together more than 70 international experts to identify gaps and develop an operational roadmap.
A forthcoming SAGE Publications book provides an overview of initial work in each area.
Much work has already been done
The Coleridge Initiative, in cooperation with Project Jupyter, the Deutsche Bundesbank, several federal agencies, and Derwen.ai, has developed the following set of key inputs.
Read more about this work on RePEc's blog:
New initiative to help with discovery of dataset use in scholarly work, by Christian Zimmermann
Are you a researcher? Tell us about your recent publications and the datasets
you used by filling out our
Rich Context Publication Submission Form.
Researchers from any domain are encouraged to submit. Your submission will be added to our Knowledge Graph
and will help other researchers and data users benefit from your work.
Questions? Want to get involved in other ways? Get in touch!