Depository for various datasets and code.





  1. Data and code from the project with Xiaowen Hu and Max Rohrer, "The color of finance words," version from 20220614. README file describing the contents of the shared data. ML+LM dictionaries containing the main robust MNIR output from the paper, as well as some simple code to produce dictionaries using different stringency criteria. ML+LM dictionaries from the paper (txt files). This is a link to the 2022 paper version.

    Below is the data we shared in the version from 20211201, which we will keep online until we write a more extensive R package for fitting the robust MNIR. The methodology and output are very similar, but the 2022 version uses more stringent criteria for inclusion (smaller dictionaries, avoiding overfitting as much as possible). This is a link to the 2021 paper version, which has the robust MNIR discussion in Section 6 and a simpler implementation of the MNIR model in the body of the draft (slightly easier to replicate). README file describing the contents of the shared data. The file dictionaries data (RData file) contains the dictionaries, with associated code (part 1), which, among other things, includes a disambiguation function (as in Table 6 of the paper). The metadata and document-term matrices (RData file) contain public returns (from Kaggle) that allow for a partial replication of our main results. The associated code (part 2) presents some simple commands to both manipulate DTMs and create sentiment scores, and shows how we fit the MNIR model. If you are only interested in the lists of words, here are the ML+LM dictionaries from the paper (zip file with txt files).

  2. Data on geographic dispersion of US firms (1994-2008), from my paper with Oyvind Norli, "Geographic dispersion and stock returns," published in the Journal of Financial Economics (2012), data file (tar.gz with README file).
  3. Data on media content from the New York Times, from my paper in the Journal of Finance (2013), "Sentiment during recessions," datafile (csv.gz, see the paper's Technical appendix for details).
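
As a rough illustration of the kind of sentiment scoring mentioned under item 1 (combining a document-term matrix with positive and negative word lists), here is a minimal Python sketch. It is not the shared code, which accompanies RData files. The function name `sentiment_scores`, the toy vocabulary, and the word lists are all hypothetical, and the score used here (positive count minus negative count, divided by total words) is one common convention that may differ from the definition in the paper.

```python
def sentiment_scores(dtm, vocab, positive, negative):
    """Return one net-tone score per document.

    dtm      : list of rows, one per document, with word counts per vocab term
    vocab    : list of terms, aligned with the columns of dtm
    positive : iterable of positive words (hypothetical list, not the paper's)
    negative : iterable of negative words (hypothetical list, not the paper's)
    """
    positive, negative = set(positive), set(negative)
    scores = []
    for row in dtm:
        total = sum(row)
        pos = sum(count for word, count in zip(vocab, row) if word in positive)
        neg = sum(count for word, count in zip(vocab, row) if word in negative)
        # Documents with no words get a neutral score instead of dividing by zero.
        scores.append((pos - neg) / total if total else 0.0)
    return scores


# Toy data for illustration only.
vocab = ["gain", "loss", "market", "strong"]
dtm = [
    [2, 0, 3, 1],  # mostly positive document
    [0, 3, 1, 0],  # mostly negative document
    [0, 0, 0, 0],  # empty document
]
scores = sentiment_scores(dtm, vocab, {"gain", "strong"}, {"loss"})
print(scores)
```

To use the actual dictionaries, the word lists would be read from the txt files and the counts taken from the shared document-term matrices; the README describes their exact layout.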