The two primary sources of textual information about the performance of public firms are the EDGAR database of public filings, maintained by the Securities and Exchange Commission, and transcripts of firms' earnings calls. The goal of this project is to identify the factual content of these sources and separate it from the opinions expressed in them. BERT models trained on these sources already exist, so that aspect of the project should not require much work. What will require thought and effort is the design of an algorithm that can distinguish factual statements from opinion statements in our corpus. Our understanding is that some work has been done on this problem at a general level, but not in the financial and legal context we plan to examine. The ultimate aim of the project is to understand whether and how changes in the legal consequences of stating facts and opinions have changed the way firms speak about their performance.
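
To make the fact-versus-opinion task concrete, the sketch below shows one possible starting point: a simple cue-word baseline that labels a sentence as opinion, fact, or unknown. It is not the project's algorithm. The cue list, the function name `label_sentence`, and the rule that a sentence containing a digit is treated as factual are all assumptions made for illustration; a baseline like this might mainly serve as a point of comparison for a fine-tuned BERT classifier.

```python
import re

# Hypothetical cue words that often signal opinion or forward-looking
# language. This list is an illustrative assumption, not a vetted lexicon.
OPINION_CUES = {
    "believe", "expect", "anticipate", "think", "feel",
    "confident", "optimistic", "likely", "should", "may", "could",
}

# Assumption: a sentence that contains a digit is reporting a figure.
HAS_DIGIT = re.compile(r"\d")


def label_sentence(sentence: str) -> str:
    """Return 'opinion', 'fact', or 'unknown' for a single sentence."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    # Opinion cues take priority, so "We expect revenue of $5 billion"
    # is labeled as opinion even though it contains a number.
    if tokens & OPINION_CUES:
        return "opinion"
    if HAS_DIGIT.search(sentence):
        return "fact"
    return "unknown"


if __name__ == "__main__":
    examples = [
        "Revenue was $4.2 billion in the third quarter.",
        "We believe demand will remain strong next year.",
        "Thank you for joining the call.",
    ]
    for text in examples:
        print(label_sentence(text), "|", text)
```

Sentences that the baseline marks as unknown, or that mix reported figures with hedging language, are the cases where a learned model and careful legal definitions of fact and opinion would be expected to matter most.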

This would be a great project for undergraduate students who are interested in going to law school, and it should be especially interesting for those who want to learn more about the rapidly growing area at the intersection of machine learning and law.

Separating Financial Fact from Opinion using Bidirectional Encoder Representations from Transformers (BERT) - Spring 2023 Discovery Project
Term: Spring 2023
Topic: Humanities, Social Sciences
Technical Area(s): Natural language processing (NLP)