############################################################
## README file for the paper "The colour of finance words,"
## by Garcia, Hu and Rohrer (2021).
## Dated 20210426.
############################################################

############################################################
## FinWordColour01.zip
############################################################
## If you are interested only in the dictionaries, the
## files in FinWordColour01.zip are small and give you some
## flexibility to tweak the choices we make in our paper.
##
## The csv files are the dictionaries discussed in Section
## 5.1 of the paper.
##
## The RData object, together with the metadata file and the
## R code, allows the reader to adapt the choices we make
## when constructing the dictionaries (the default settings
## reproduce the csv files described above).
############################################################

Dictionary files, metadata and code:
- ML_positive_bigram.csv: ML positive bigram dictionary, 12,130 tokens.
- ML_negative_bigram.csv: ML negative bigram dictionary, 13,330 tokens.
- ML_positive_unigram.csv: ML positive unigram dictionary, 617 tokens.
- ML_negative_unigram.csv: ML negative unigram dictionary, 727 tokens.
- ML_dictionaries.RData: RData object with tons of colour (see the *.R file for details).
- meta_public.rds: metadata for all 61,041 earnings calls in our dataset.
- dictionary_construction.R: shows how we construct the unigram and bigram dictionaries presented in Section 5.1.
- replication_kaggle.R: replicates Table 1, column 2 and Table 2, columns 1 and 4 of the paper, using the document-term matrices provided in FinWordColour02.zip (listed below).

############################################################
## FinWordColour02.zip
############################################################
## If you are interested in testing the training algorithm
## and seeing how it performs out of sample, the files in
## FinWordColour02.zip should provide some colour.
##
## We matched our database to public price and volume data
## from Kaggle (see
## https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs).
##
## The data from this match, in particular the abnormal
## returns, are in the meta_public.rds file included in
## FinWordColour01.zip.
##
## Note that the Kaggle data end in 2017, so the sample is
## slightly smaller than the one used in our paper.
##
## We provide the document-term matrix with 2^16 (65,536)
## terms, with both tf and tf-idf weights.
##
## The script replication_kaggle.R (see above) runs the
## training up to a given cutoff date and then evaluates
## out-of-sample performance.
############################################################

Document-term matrices/metadata:
- dtm_bigram.rds: document-term matrix (tf weights) for all 61,041 earnings calls in our dataset.
- dtm_bigram_tfidf.rds: document-term matrix with tf-idf weights for all 61,041 earnings calls in our dataset.
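The two document-term matrices differ only in how each cell is weighted. As a rough illustration, the sketch below builds raw bigram counts (tf) for three made-up sentences. It then reweights those counts by idf = log(N / df), one common tf-idf variant. Python is used here only for the example; the replication code itself is in R. This README does not document the exact weighting variant, tokenisation or 2^16-term vocabulary behind dtm_bigram_tfidf.rds. Treat the helper functions (bigrams, term_frequencies, tf_idf) and the underscore-joined bigram format as hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Lower-case, split on whitespace, and join adjacent words with '_'
    (a hypothetical token format chosen only for this illustration)."""
    tokens = text.lower().split()
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def term_frequencies(docs):
    """Raw bigram counts per document: the 'tf' weighting."""
    return [Counter(bigrams(d)) for d in docs]


def tf_idf(tf_rows):
    """Reweight raw counts by idf = log(N / df), where N is the number of
    documents and df the number of documents containing the term.
    This is one common variant; the one used in the paper may differ."""
    n_docs = len(tf_rows)
    df = Counter()
    for row in tf_rows:
        df.update(row.keys())  # each term counted once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in row.items()} for row in tf_rows]


# Toy "earnings call" snippets (made up for illustration).
docs = [
    "revenue growth was strong this quarter",
    "revenue growth was weak this quarter",
    "margins were strong",
]
tf = term_frequencies(docs)
weighted = tf_idf(tf)
```

With this variant, a bigram that appears in every document gets weight log(1) = 0, so terms common to all calls are down-weighted. In the toy example, "was_strong" (in 1 of 3 documents) gets weight log 3, while "revenue_growth" (in 2 of 3) gets only log 1.5.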