Diego García: Leeds School

Repository of datasets from my research.

Data and code from the project with Xiaowen Hu and Max Rohrer, "The color of finance words" (updated 20211201): the README file describes the contents of the shared data. The dictionaries data (RData file) contains the dictionaries; the associated code (part 1) includes, among other things, a disambiguation function (as in Table 6 of the paper). The metadata and document-term matrices (RData file) include public returns (from Kaggle) that allow for a partial replication of our main results. The associated code (part 2) presents some simple commands to manipulate document-term matrices (dtms) and create sentiment scores, and shows how we fit the MNIR model. If you are only interested in the lists of words, here are the ML+LM dictionaries from the paper (zip file with txt files).
Data on geographic dispersion of US firms (1994-2008), from my paper with Oyvind Norli, "Geographic dispersion and stock returns," published in the Journal of Financial Economics (2012): datafile (tar.gz with README file).
Data on media content from the New York Times, from my paper in the Journal of Finance (2013), "Sentiment during recessions": datafile (csv.gz, see the paper's Technical appendix for details).
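
For readers unfamiliar with document-term matrices, the idea of turning a dtm and a pair of word lists into a sentiment score (as mentioned for "The color of finance words" above) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code shared with the project: the toy vocabulary, the counts, the word lists, and the function name `sentiment_scores` are all hypothetical, and the score used here (positive minus negative counts, divided by total counts) is one common choice rather than necessarily the measure used in the paper.

```python
def sentiment_scores(dtm, vocab, positive, negative):
    """Net tone per document: (positive count - negative count) / total count.

    dtm is a list of rows, one per document; each row holds the counts of
    the words in vocab, in the same order.
    """
    # Column positions of the dictionary words inside the vocabulary.
    pos_cols = [j for j, word in enumerate(vocab) if word in positive]
    neg_cols = [j for j, word in enumerate(vocab) if word in negative]

    scores = []
    for row in dtm:
        total = sum(row)
        if total == 0:
            # Empty document: treat as neutral instead of dividing by zero.
            scores.append(0.0)
            continue
        pos = sum(row[j] for j in pos_cols)
        neg = sum(row[j] for j in neg_cols)
        scores.append((pos - neg) / total)
    return scores


# Hypothetical toy data, for illustration only.
vocab = ["growth", "loss", "strong", "decline", "quarter"]
dtm = [
    [2, 0, 1, 0, 1],  # document 1
    [0, 3, 0, 1, 0],  # document 2
]
positive = {"growth", "strong"}
negative = {"loss", "decline"}

scores = sentiment_scores(dtm, vocab, positive, negative)
print(scores)  # [0.75, -1.0]
```

With the actual data, the rows would come from the shared document-term matrices and the word lists from the ML+LM dictionaries (the txt files in the zip archive).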