

Disaster Tweets: Exploratory Data Analysis

Development · Feb 15, 2020
Parsing the top 30 keywords from the Disaster Tweets corpus.

My first foray into NLP! In this notebook I conduct basic EDA to tease out important characteristics of the data, including keyword prevalence, tweet length, and author locations. I also use the NLTK package to separate common words from stopwords.
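To give a flavor of the keyword and common-word counts, here is a minimal sketch using only the standard library. It is not the notebook's code: the `tweets` rows are made-up samples, the field names (`keyword`, `text`, `target`) are assumed to match the dataset, and the tiny `STOPWORDS` set is a hypothetical stand-in for NLTK's much longer English stopword list.

```python
from collections import Counter
import re

# Hypothetical sample rows; the real notebook works on the full corpus.
tweets = [
    {"keyword": "wildfire", "text": "Wildfire spreading near the highway", "target": 1},
    {"keyword": "wildfire", "text": "The wildfire is out of control", "target": 1},
    {"keyword": "crush", "text": "I have a crush on this song", "target": 0},
]

# Tiny stand-in stopword set (assumption); NLTK's English list is far longer.
STOPWORDS = {"the", "is", "of", "a", "on", "this", "i", "have", "near", "out"}


def top_keywords(rows, n=30):
    """Return the n most frequent non-empty values of the keyword field."""
    return Counter(r["keyword"] for r in rows if r["keyword"]).most_common(n)


def top_words(rows, n=10):
    """Return the n most frequent lowercase tokens that are not stopwords."""
    counts = Counter()
    for r in rows:
        tokens = re.findall(r"[a-z']+", r["text"].lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


print(top_keywords(tweets))
print(top_words(tweets))
```

Swapping the stand-in set for NLTK's stopword corpus should only change which tokens get filtered, not the counting logic.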

A few things that stood out to me:

Disaster tweets originate from different locations than non-disaster tweets.

Location frequency for disaster vs. non-disaster tweets.
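A location comparison like this one can be sketched by counting locations separately per class. The snippet below is an illustration, not the notebook's code: the rows are invented, the `location` and `target` field names are assumptions, and the lowercase/strip normalization is my own simplification.

```python
from collections import Counter

# Hypothetical sample rows; target 1 = disaster, 0 = non-disaster (assumed encoding).
rows = [
    {"location": "USA", "target": 1},
    {"location": "usa ", "target": 1},
    {"location": "London", "target": 0},
    {"location": "", "target": 0},
    {"location": "London", "target": 1},
]


def location_counts(rows):
    """Count normalized locations separately for each target class."""
    counts = {0: Counter(), 1: Counter()}
    for r in rows:
        loc = r["location"].strip().lower()
        if loc:  # skip rows with no location
            counts[r["target"]][loc] += 1
    return counts


counts = location_counts(rows)
print(counts[1].most_common(5))
print(counts[0].most_common(5))
```

Free-text locations are messy, so real data would likely need more normalization than this before the two distributions are comparable.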

Disaster tweets also tend to be slightly longer.

Tweet length comparison for disaster vs. non-disaster tweets.
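For the length comparison, one simple approach is to average character counts per class. This is a rough sketch under the same assumptions as before (hypothetical rows, assumed `text` and `target` fields); the notebook may well measure length differently, for example in words.

```python
from statistics import mean

# Hypothetical sample rows; target 1 = disaster, 0 = non-disaster (assumed encoding).
sample = [
    {"text": "Evacuation ordered as flood waters rise downtown", "target": 1},
    {"text": "Bridge closed after the earthquake", "target": 1},
    {"text": "This traffic is a disaster", "target": 0},
]


def mean_length_by_class(rows):
    """Return the mean tweet length in characters for each target class."""
    lengths = {}
    for r in rows:
        lengths.setdefault(r["target"], []).append(len(r["text"]))
    return {label: mean(values) for label, values in lengths.items()}


print(mean_length_by_class(sample))
```

A mean hides the shape of the distribution, so a histogram per class is usually the more informative view.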

Check out the full notebook for more interesting findings:

View Notebook