We use several off-the-shelf sentiment analysis tools to analyze the Fed's Beige Book reports from 1970 to 2020.
The raw text data is scraped from the Minneapolis Fed by `scrape.py` and stored in the `txt/` directory. Do whatever you want with this data (I clearly do not own the copyright).
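`scrape.py` fetches pages with `requests` and parses them with `beautifulsoup4`. To illustrate only the tag-stripping step, here is a minimal sketch that uses the standard library's `html.parser` as a stand-in for BeautifulSoup. The function name `html_to_text` and the set of tags treated as line breaks are assumptions, not taken from `scrape.py`.

```python
from html.parser import HTMLParser

# Assumption: these tags separate lines of report text.
BREAK_TAGS = {"br", "p", "div", "li"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, emitting a newline at each line-breaking tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <br /> also lands here via HTMLParser's default
        # handle_startendtag, so both <br> and <br /> produce a break.
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text inside inline tags such as <strong> is kept; only the tags vanish.
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and return one line per text block, whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # str.split() with no argument also splits on non-breaking spaces (U+00A0).
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Treating `<br>` and `<br />` alike and keeping the text inside `<strong>` mirrors the parsing issues listed further down; inside `scrape.py` itself, BeautifulSoup's `get_text()` would play the same role.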
The reports are scored in `sentiment.py` using three pre-trained models. I am in the process of adding more; so far the models used are:
- VADER from NLTK
- Pattern analyzer from TextBlob
- LSTM text classifier from flair
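A minimal sketch of how the three models could sit behind one interface, assuming each scorer maps a report's text to a single float. The wrapper names (`vader_scorer`, `textblob_scorer`, `flair_scorer`, `flair_signed`, `score_report`) are hypothetical, and the flair model name `"en-sentiment"` is an assumption that may differ across flair versions. Imports happen lazily inside each factory, so the module loads even without the packages installed. `flair_signed` shows one way (not necessarily the fix used in `analysis.py`) to turn flair's label plus confidence into a continuous score: for a two-class softmax classifier the predicted label's confidence is at least 0.5, which is why naive signed scores never fall between -0.5 and 0.5.

```python
from typing import Callable, Dict

# Hypothetical interface: every scorer maps a report's text to one float.
Scorer = Callable[[str], float]


def vader_scorer() -> Scorer:
    """VADER compound score in [-1, 1]; needs nltk.download("vader_lexicon") once."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def textblob_scorer() -> Scorer:
    """Polarity in [-1, 1] from TextBlob's default PatternAnalyzer."""
    from textblob import TextBlob

    return lambda text: TextBlob(text).sentiment.polarity


def flair_signed(label: str, confidence: float) -> float:
    """Turn flair's (label, confidence) into a continuous score in [-1, 1].

    With two classes the predicted label's confidence is >= 0.5, so using
    +confidence / -confidence directly leaves a gap around zero. Converting
    to P(positive) first and rescaling with 2p - 1 closes that gap.
    """
    p_positive = confidence if label == "POSITIVE" else 1.0 - confidence
    return 2.0 * p_positive - 1.0


def flair_scorer(model_name: str = "en-sentiment") -> Scorer:
    """flair text classifier; the default model name is an assumption."""
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load(model_name)

    def score(text: str) -> float:
        sentence = Sentence(text)
        classifier.predict(sentence)
        label = sentence.labels[0]  # assumes a single predicted label
        return flair_signed(label.value, label.score)

    return score


def score_report(text: str, scorers: Dict[str, Scorer]) -> Dict[str, float]:
    """Apply every scorer to one report and collect the results by name."""
    return {name: scorer(text) for name, scorer in scorers.items()}
```

With the packages installed, a report could then be scored with `score_report(text, {"vader": vader_scorer(), "textblob": textblob_scorer(), "flair": flair_scorer()})`.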
The analysis, including the graphs, is done in `analysis.py`.
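VADER, TextBlob, and flair report on different scales, so one simple way to compare them (and to handle the "Normalize values" item in the TODO list) is to standardize each model's series. This sketch is hypothetical: it assumes report identifiers follow the `YYYY-MM-region` pattern seen in names such as `1971-01-bo`, and it uses z-scores, which may not be the normalization `analysis.py` actually applies.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def parse_report_id(report_id: str) -> Tuple[int, int, str]:
    """Split an identifier assumed to look like 'YYYY-MM-region'."""
    year, month, region = report_id.split("-", 2)
    return int(year), int(month), region


def zscore_by_model(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each model's scores to mean 0 and (population) std 1."""
    normalized = {}
    for model, values in scores.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant series has no variation; map it to zeros rather than divide by 0.
        normalized[model] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return normalized
```

The parsed `region` field can then separate the national series from the regional ones before any regression.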
Dependencies are listed in `requirements.txt`; the packages each script needs are below. Developed and tested on Python 3.8.
- `scrape.py`
  - `requests`
  - `beautifulsoup4`
- `files.py`
  - `pandas`
- `clean.py`
  - `cleantext`
- `sentiment.py`
  - `pandas`
  - `nltk`
  - `textblob`
  - `flair`
  - `transformers`
- `analysis.py`
  - `numpy`
  - `pandas`
  - `statsmodels`
  - `matplotlib`
- Fix parsing errors
  - Bug with `<br>` tag instead of `<br />` (no more breaks)
  - Remove `<strong>` (ignored)
  - problem (check if this gets removed)
  - Delete "learn more" `<p>` at the bottom (`grep -RIl "www\." txt/`)
- Find missing/incomplete files
  - Some files are empty
  - Analyze `errors.txt`
  - Grab missing files
    - Grab missing `2016-0(4|6)-su` files
    - Grab missing `2015-07-*` files
    - Try to find missing `1971-01-bo`
- Clean text
  - Replace `&%-+` with text?
  - Replace numbers with words
  - Check that text is ASCII
- Run sentiment analysis
  - Check out `flair` package
    - `flair` gives values `x < -0.5 | x > 0.5` (fixed in analysis)
    - Check if all text is used or just the first `n` words
  - `transformers` package
    - Just extract numbers (bigger is better)
- Get exact dates of publication
- Generate histograms
- Normalize values
- Check out outlier (`1971-01-bo` missing doc)
- Regress national sentiment on regional sentiments
  - Do you add a constant here?
  - See if coefficients sum to 1
- Create proxy measure including all regions
- Graph time series
- Pretty up plots (title+legend)
- Get GDP data
- Check stock market data
- Think about timing of Beige Book data
- By region in a grid
- Bond yields
- Time series regression
- Investigate discrepancies between sentiment scores
  - In `su`, `TextBlob` is high during the 1974 recession and higher during the 1990s boom
- Add info + pictures to `README.md`
- GDP growth by region
- Aggregate state data
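The regression items above (whether to add a constant, whether the coefficients sum to 1) can be written as one specification. This is a sketch, not necessarily what `analysis.py` estimates, and the notation is introduced here: $s_t$ is the national sentiment score at report date $t$, $s_{r,t}$ the score for region $r = 1, \dots, R$, and $\varepsilon_t$ an error term. Substituting $\beta_R = 1 - \sum_{r<R} \beta_r$ into the unrestricted model gives the restricted one.

```math
\begin{aligned}
\text{unrestricted:}\quad & s_t = \alpha + \sum_{r=1}^{R} \beta_r\, s_{r,t} + \varepsilon_t, \qquad H_0:\ \sum_{r=1}^{R} \beta_r = 1,\\
\text{restricted:}\quad & s_t - s_{R,t} = \alpha + \sum_{r=1}^{R-1} \beta_r\,(s_{r,t} - s_{R,t}) + \varepsilon_t.
\end{aligned}
```

Keeping $\alpha$ lets the national series sit at a different level from the regional ones; if the data also support $\alpha = 0$ and all $\beta_r \ge 0$, the fitted national score is a weighted average of the regional scores. Note that `statsmodels`' `OLS` does not add an intercept by default, so the constant has to be included explicitly, e.g. with `sm.add_constant`.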