Diego García: Leeds School

Content

Repository for various datasets and code.

Data and code from the project with Xiaowen Hu and Max Rohrer, "The colour of finance words," version from 20221108. README file describing the contents of the shared data. ML dictionaries from the paper (txt files). Robust MNIR output, containing the main robust MNIR results from the paper (csv files). Code from the project, used to produce all tables in the paper. Data shared with the profession, including the metadata files needed to reproduce our results, as well as the document-term matrices (dtms) from the three corpora we study in the paper. This is a link to the 2022 paper version.
Below is the data we shared in the version from 20211201. The methodology and output are very similar, but we use more stringent criteria for inclusion in the 2022 version (smaller dictionaries, avoiding overfitting as much as possible). This is a link to the 2021 paper version, which has the robust MNIR discussion in Section 6 and a simpler implementation of the MNIR model in the body of the draft (slightly easier to replicate). README file describing the contents of the shared data. The dictionaries data file (RData file) contains the dictionaries, with associated code (part 1), which, among other things, includes a disambiguation function (as in Table 6 of the paper). The metadata and document-term matrices (RData file) contain public returns (from Kaggle) that allow for a partial replication of our main results. The associated code (part 2) presents some simple commands to manipulate dtms and create sentiment scores, and shows how we fit the MNIR model. If you are only interested in the lists of words, here are the ML+LM dictionaries from the paper (zip file with txt files).
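For readers who only want a rough idea of how a dictionary-based sentiment score can be computed from a document-term matrix, here is a minimal sketch. It is not the code shipped with the paper (the shared data come as RData files; Python is used here purely for illustration). The scoring rule, (positive count minus negative count) divided by total word count, is an assumption chosen for simplicity, and the function name, vocabulary, and word lists are hypothetical.

```python
def sentiment_scores(dtm, vocab, positive, negative):
    """Score each document as (positive - negative) / total word count.

    dtm      : list of rows, one per document, with term counts per column
    vocab    : list of terms, one per dtm column (hypothetical toy vocabulary)
    positive : set of positive dictionary words (assumed, not from the paper)
    negative : set of negative dictionary words (assumed, not from the paper)
    """
    scores = []
    for row in dtm:
        total = sum(row)
        pos = sum(count for word, count in zip(vocab, row) if word in positive)
        neg = sum(count for word, count in zip(vocab, row) if word in negative)
        # Empty documents get a score of 0.0 instead of a division error.
        scores.append((pos - neg) / total if total else 0.0)
    return scores


# Toy example: two documents, four vocabulary terms.
vocab = ["growth", "loss", "strong", "the"]
dtm = [
    [2, 0, 1, 7],  # mostly positive words
    [0, 3, 0, 3],  # mostly negative words
]
scores = sentiment_scores(dtm, vocab, {"growth", "strong"}, {"loss"})
print(scores)
```

The first toy document scores positive and the second negative; with the real data, the dictionary txt files and the dtm columns would take the place of the toy word sets and vocabulary.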
Data on geographic dispersion of US firms (1994-2008), from my paper with Oyvind Norli "Geographic dispersion and stock returns," published in the Journal of Financial Economics (2012), datafile (tar.gz with README file).
Data on media content from the New York Times, from my paper in the Journal of Finance (2013), "Sentiment during recessions," datafile (csv.gz, see the paper's Technical appendix for details).