#############################################################
## This data repository contains all code/data necessary
## to replicate "The Colour of Finance Words" by Diego
## García, Xiaowen Hu, and Maximilian Rohrer, forthcoming
## in the Journal of Financial Economics.
##
## Dated: 20221108.
#############################################################

#############################################################
## We share our output in four different files, each with a
## different purpose.
##
## 1. ML dictionaries (dictionary.zip). We provide the lists
## of unigrams and bigrams in our final specifications, using
## both the training and the full samples. These are small
## files; the unigram lists should be plug-and-play, while for
## the bigrams you should use our text-normalizing routines.
##
## 2. Robust MNIR output (robustMNIR.zip). We provide the
## loadings estimated by the robust MNIR model for all
## unigrams/bigrams, both for the training sample and for the
## full sample. These are also small files, and researchers
## can choose different cutoffs as inclusion criteria for
## their own final dictionaries.
##
## 3. Code (code.zip). We provide the code we use in the
## paper, from the estimation of the robust MNIR model and
## the construction of the dictionaries to the generation of
## the tables we present in the final version of the paper.
##
## 4. Data (data.zip). We provide document-term matrices
## (dtms) for all the analyses in the paper (earnings calls,
## 10-Ks, and WSJ articles), as well as a blueprint of the
## metadata we use (without variables that we cannot share,
## such as stock prices). We provide GVKEYs/PERMNOs, so
## researchers should be able to link the files easily. We
## also include the LM dictionaries we use in the paper. Note
## that this file is large (~3 GB).
#############################################################

#############################################################
## 1.
## ML dictionaries (dictionary.zip)
#############################################################
## This file contains 8 different dictionaries, which form
## the core of our output. The non-dated files contain the
## dictionaries we produce using the full sample. The files
## dated *20151231* contain the dictionaries we produce using
## the pre-2016 data.
#############################################################

#############################################################
## 2. Robust MNIR output (robustMNIR.zip)
#############################################################
## This file contains 4 different csv files with the robust
## MNIR scores (% positive/negative, freq). The non-dated
## files contain the MNIR scores we produce using the full
## sample. The files dated *20151231* contain the scores we
## produce using the pre-2016 data.
#############################################################

#############################################################
## 3. Code (code.zip)
#############################################################
## This file contains 3 different R scripts.
##
## The first shows how we estimate the robust MNIR model.
##
## The second converts the MNIR output into dictionaries.
##
## The third generates the tables in the paper.
#############################################################

#############################################################
## 4. Data (data.zip)
#############################################################
## This file contains 10 different files.
##
## The six files named dtm* contain the document-term
## matrices for the different corpora we study.
##
## The three meta* files contain the metadata we use in our
## code. Note that we are not sharing fields that come from
## proprietary datasets (e.g. CRSP/Compustat), but we trust
## that researchers with access to such data can find them
## easily.
##
## The meta* and dtm* files match on rows (for each of the
## corpora).
##
## The LM_2021* file contains the Loughran and McDonald
## (2011) dictionaries we use (downloaded in 2021).
#############################################################
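## As an illustration of how the robust MNIR output in
## robustMNIR.zip might be turned into a dictionary by
## choosing a cutoff, here is a minimal Python sketch. The
## column names (term, pct_positive, pct_negative, freq) and
## the toy rows are assumptions for illustration only; check
## the headers of the actual csv files before use.

```python
# Hypothetical sketch: filtering robust MNIR scores into positive and
# negative word lists using a researcher-chosen cutoff. The csv layout
# below is an assumption, not the authors' guaranteed format.
import csv
import io

# Toy stand-in for a few rows of a robust MNIR output csv.
toy_csv = """term,pct_positive,pct_negative,freq
strong,0.92,0.03,1500
weak,0.04,0.88,900
quarter,0.51,0.49,120000
"""

def build_dictionary(rows, cutoff=0.8):
    """Keep terms whose positive (negative) score share exceeds the cutoff."""
    positive, negative = [], []
    for row in rows:
        if float(row["pct_positive"]) >= cutoff:
            positive.append(row["term"])
        elif float(row["pct_negative"]) >= cutoff:
            negative.append(row["term"])
    return positive, negative

rows = csv.DictReader(io.StringIO(toy_csv))
pos, neg = build_dictionary(rows, cutoff=0.8)
print(pos, neg)  # ['strong'] ['weak']
```

## Raising the cutoff yields smaller, higher-conviction word
## lists; lowering it trades precision for coverage.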
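## Because the dtm* and meta* files match on rows, documents
## can be scored and linked to their metadata by position
## alone. A minimal Python sketch under toy assumptions (the
## term list, counts, metadata fields, and positive word list
## are invented for illustration):

```python
# Hypothetical sketch: row-aligning a document-term matrix with its
# metadata and computing a simple positive-word share per document.
# All values below are toy data, not from the actual data.zip files.

# Toy document-term matrix: one row per document, one column per term.
terms = ["strong", "weak", "quarter"]
dtm = [
    [3, 0, 5],   # document 1
    [0, 2, 4],   # document 2
]

# Toy metadata, matching the dtm row for row (as in the repository).
meta = [
    {"gvkey": "001234", "date": "2015-02-01"},
    {"gvkey": "005678", "date": "2015-02-03"},
]
assert len(dtm) == len(meta)  # dtm* and meta* files match on rows

positive = {"strong"}  # an illustrative positive word list

def positive_share(counts):
    """Share of a document's tokens that fall in the positive list."""
    total = sum(counts)
    pos_count = sum(c for c, t in zip(counts, terms) if t in positive)
    return pos_count / total if total else 0.0

scores = [positive_share(row) for row in dtm]
print(scores)  # [0.375, 0.0]
```

## The scores can then be merged onto the metadata by row
## index, and from there onto outside data via GVKEY/PERMNO.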