The growth of Web 2.0 has produced a large amount of user-generated content on web pages. Users are free to express their opinions about products, places and events. This research project aims to apply sentiment analysis to tourist attractions. To begin with, we scrape TripAdvisor reviews of the most visited tourist attraction in Spain, the Alhambra. We then create two sentiment labels: the expert sentiment, which is the rating given by the reviewer, and the machine sentiment, which is extracted with a Natural Language Processing toolkit developed at Stanford University. After that, we build classification models to predict sentiment polarity. Finally, we develop a subgroup discovery method to extract valuable information about negative reviews.
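
As a minimal sketch of the two labels, assuming a 1-5 reviewer rating and a CoreNLP sentiment score on its 0-4 scale (very negative to very positive), the base R snippet below maps both to a three-class polarity and measures how often they agree. The cut-offs, the helper names and the toy data frame are illustrative assumptions, not the thresholds or data used in the project's scripts.

```r
# Hypothetical helper: map a 1-5 reviewer rating to a polarity label.
# Assumed cut-offs: 1-2 negative, 3 neutral, 4-5 positive.
rating_to_polarity <- function(rating) {
  as.character(cut(rating,
                   breaks = c(0, 2, 3, 5),
                   labels = c("negative", "neutral", "positive")))
}

# Hypothetical helper: map a CoreNLP-style 0-4 sentiment score to polarity.
# Assumed cut-offs: 0-1 negative, 2 neutral, 3-4 positive.
corenlp_to_polarity <- function(score) {
  as.character(cut(score,
                   breaks = c(-1, 1, 2, 4),
                   labels = c("negative", "neutral", "positive")))
}

# Toy data (invented for illustration only)
reviews <- data.frame(
  rating        = c(5, 4, 1, 3, 5),
  corenlp_score = c(3, 1, 0, 2, 4)
)

reviews$expert_sentiment  <- rating_to_polarity(reviews$rating)
reviews$machine_sentiment <- corenlp_to_polarity(reviews$corenlp_score)

# Share of reviews where both labels coincide
agreement <- mean(reviews$expert_sentiment == reviews$machine_sentiment)

# Cross-tabulation of expert vs. machine labels
print(table(expert = reviews$expert_sentiment,
            machine = reviews$machine_sentiment))
```

Reviews where the two labels disagree (here, a 4-rated review scored as negative by the toolkit) are the cases where a rating alone may hide what the text actually says.
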
A brief description of each script follows:
- scrapTripAdvisorLoop_anony_ENG_complete.R: Scraper for the Alhambra's English-language TripAdvisor reviews.
- scrapTripAdvisorLoop_anony_SPA_complete.R: Scraper for the Alhambra's Spanish-language TripAdvisor reviews.
- coreNLP.R: Applies the CoreNLP sentiment analysis method to the reviews.
- UFSM.R: Unigram Feature Selection Method (UFSM) script.
- BFSM.R: Bigram Feature Selection Method (BFSM) script.
- WordCloud.R: Draws a word cloud of the reviews.
- WordCloudBigram.R: Draws a word cloud of the reviews using bigrams.
- ReviewGraphics.R: Plots for analyzing the Alhambra's TripAdvisor data (number of reviews by language, reviews per month, average ratings, ...).
- SplitDataTrainTest.R: Splits the reviews into training and test sets.
- ModelPerformance.R: Different models for sentiment classification.
- Models_xgboost_FINAL.R: XGBoost models for sentiment classification.
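
The train/test split is described above only in one line. One common way to do such a split in base R is a stratified split that keeps the share of each polarity class the same in both sets, sketched below. The function name, the 75/25 ratio, the seed and the toy data are assumptions for illustration and are not taken from SplitDataTrainTest.R.

```r
# Hypothetical helper: stratified train/test split on a label column.
# Note: set.seed() is called inside for reproducibility, which also
# resets the global random number generator state.
stratified_split <- function(data, label_col, train_frac = 0.75, seed = 42) {
  set.seed(seed)
  # Row indices grouped by class (empty classes are dropped)
  idx_by_class <- split(seq_len(nrow(data)), data[[label_col]], drop = TRUE)
  # Sample the training rows separately within each class
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- floor(length(idx) * train_frac)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  # Everything not sampled for training goes to the test set
  test_idx <- setdiff(seq_len(nrow(data)), train_idx)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

# Toy data (invented for illustration only): 8 positive, 4 negative reviews
reviews <- data.frame(
  text     = paste("review", 1:12),
  polarity = rep(c("positive", "negative"), times = c(8, 4))
)

parts <- stratified_split(reviews, "polarity")
print(table(parts$train$polarity))
print(table(parts$test$polarity))
```

Stratifying matters when negative reviews are a small minority, since a purely random split can leave too few of them in the test set to evaluate a classifier on that class.
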
License: CC BY-NC-SA