Coleridge Initiative


Big Data and Social Science Book

The goal of this book is to provide social scientists with an understanding of the key elements of this new science, its value, and the opportunities for doing better work. The goal is also to identify the many ways in which the analytical toolkits possessed by social scientists can be brought to bear to enhance the generalizability of the work done by computer scientists.

We take a pragmatic approach, drawing on our experience of working with data. Most social scientists set out to solve a real- world social or economic problem: they frame the problem, identify the data, do the analysis, and then draw inferences. At all points, of course, the social scientist needs to consider the ethical ramifications of their work, particularly respecting privacy and confidentiality. The book follows the same structure. We chose a particular problem—the link between research investments and innovation— because that is a major social science policy issue, and one in which social scientists have been addressing using big data techniques. While the example is specific and intended to show how abstract concepts apply in practice, the approach is completely generalizable. The web scraping, linkage, classification, and text analysis methods on display here are canonical in nature. The inference and privacy and confidentiality issues are no different than in any other study involving human subjects, and the communication of results through visualization is similarly generalizable.

This book will serve as the text book for the class. For more information on the text, visit