Sentimental Analysis of Airbnb Customer Reviews using Machine Learning and DL

Published on Nov 24, 2022

Page 1 (0s)

Sentimental Analysis of Airbnb Customer Reviews using Machine Learning and DL.

Page 2 (25s)

Agenda: Introduction, Literature Review, Problem Statement, Research Methodology, Implementation, Results and Discussion, Conclusions and Recommendations.

Page 3 (2m 3s)

Introduction. The lodging industry has contributed substantially to economic growth. This growth has benefited the hospitality and tourism sectors by encouraging travel among the global population and increasing the number of hotel nights available for both leisure and business travel. Beyond its contribution to economic growth, the lodging industry has created many employment opportunities and is a major driver of global value creation. Airbnb has grown in popularity since 2008 because its relatively low prices let guests stay in a more homely atmosphere. Guests can rate and review properties and interact directly with property owners. Customer experience and perception are the core focus areas of Airbnb's business model. Property owners, like businesses in many other service-oriented industries, need to review customers' points of view and use this information to improve the quality of their services, plan new business strategies, achieve higher customer satisfaction and ratings, and understand the customer behaviour that leads to higher property ratings. As a result, the analysis of online customer reviews is the subject of many studies and forms the basis of our research.

Page 4 (3m 20s)

Literature Review. Chiny et al. (2021b) created a client-centric evaluation process to assess guest satisfaction on Airbnb using machine learning and NLP. The researchers collected customer reviews of Airbnb listings and pre-processed the input data set. The data was then segmented to divide customers into different categories. Next, a model was trained to calculate a score from the key columns that contribute to higher customer reviews and to predict the output score. After cleaning and filtering, the opinions were segmented using NLP, and regression algorithms such as multiple linear regression and support vector regression were used to calculate coefficients that take the elementary scores given by customers as input and describe how they influence the overall score. The study identifies the key indicators of customer satisfaction scores.
Chang et al. (2021) developed a hotel room occupancy forecasting model using LSTM with sentiment analysis and scores of online customer reviews. The main premise of the study was that customer scores and reviews on the platform are a key source of information about customer experience and can therefore strongly influence hotel occupancy forecasts. The study used three data sets, namely the sentiment analysis of customer reviews, the customer score ratings, and a combination of the two, to forecast hotel occupancy. The forecasting models compared were LSTM, BPNN, GRNN, LSSVR, RF, and GPR, and the study concluded that LSTM provided more precise occupancy forecasts than the other algorithms.
Singla et al. (2017) analysed Amazon review data, using the NRC sentiment dictionary to extract emotions such as anger, anticipation, fear, and positive or negative polarity.
The models were trained and tested using Naïve Bayes, SVM, and Decision Tree classifiers, and the accuracy obtained was validated using 10-fold cross-validation. The researchers suggest that future research could use other models, such as Word2Vec.
K and S (2022b) proposed a recommender system based on sentiment analysis using several deep learning models: Hybrid CNN-SVM, PSO-based CNN-SVM, and PSO-based CBiLSTM-SVM were applied to the input data to calculate the sentiment score.
Čumlievski et al. (2022) presented a travel industry case study using machine learning models based on accommodation features and online customer reviews. The study aimed to show that accommodation factors such as satisfaction with comfort, value, and cleanliness have the greatest influence on guest satisfaction, unlike more general factors such as region or county, price, and permission for parties, which have no direct connection to the accommodation category (and review). It concluded that machine learning algorithms can predict guest satisfaction from specific accommodation properties (such as personnel, cleanliness, and comfort) in addition to the more general accommodation size, type, and number of stars.
Singh and Sarraf (2020) gathered customer reviews from the internet. On popular e-commerce websites, a customer's rating of a product is frequently inconsistent with the review text the same customer has written. The review system does not always give an accurate picture, as customers may buy a product without fully understanding its reviews. The authors argue that reviews collected from a more trustworthy website, such as Flipkart, can be more accurate, and recommend trying other machine learning models to improve performance.
Rezazadeh Kalehbasti et al. (2021) studied price prediction on an Airbnb dataset, running experiments with several machine learning algorithms and comparing the models using evaluation metrics. To validate and generalize their findings, the authors recommend further research into the factors that influence customer experience in shared accommodation, and into how those factors relate to one another in other cities and countries.

Page 5 (6m 3s)

Problem Statement. The primary goal of this research is to classify the opinions customers express in their reviews to determine whether they had an overall positive or negative experience of their Airbnb stay. The online reviews of Airbnb customers are a clear indication of how guests feel about the service, which areas impressed them, and which areas need improvement. We target a popular destination visited by tourists from all over the world. Such a destination always has competition among hotels and short-term rentals, as visitors weigh a variety of factors before choosing where to stay. Online reviews act as a key marketing channel for any service provider and as guidance for improving services based on review comments; for a business owner focused on attracting more online customers, they are effectively free marketing. The research focuses on how the industry can analyze customer reviews to determine whether the comments are positive or negative. Sentiment analysis is the technique of automatically extracting information that reflects a user's viewpoint on a particular topic. Its simplest form classifies customer opinions as positive, negative, or neutral; other forms predict customer feedback on a scale. The research objectives, formulated from the aim of the study, are as follows: To select a popular tourist destination for Airbnb customer reviews; we chose Bangkok. To select a data set time frame that captures customer reviews from both the pre-COVID and post-COVID periods, so that there is enough review information.
To perform data pre-processing, a key objective, so that the relevant features of customer reviews can be selected for sentiment analysis. To categorize the thoughts customers have expressed in brief reviews and determine whether each review's sentiment toward the experience is positive or negative. To determine the optimal technique for sentiment analysis, building models with both machine learning and deep learning algorithms. To assess the effectiveness of both kinds of model and recommend the better performer. To draw conclusions based on accuracy and finalize the best model, which tourism managers can use to gauge the overall sentiment of reviews from tourists staying in Airbnb listings in Bangkok.

Page 6 (8m 41s)

Research Methodology. The first module of the framework is the data collection and pre-processing stage. In data collection, the initial stage of the framework, a large sample of customer reviews is loaded from the Airbnb dataset. Pre-processing then cleans and treats the data to make it readable, removing stop words, punctuation, whitespace, and digits, and applying stemming. The sentiment analysis phase involves feature selection and determining the sentiment orientation of the customer reviews. In the model-building stage, we trained and tested various machine learning models, such as Naïve Bayes, Support Vector Machine (SVM), and Decision Tree. We then compared the accuracy of these models to determine the best-performing ML model for predicting customer satisfaction. Finally, we built a deep learning model to predict customer satisfaction.

Page 7 (10m 8s)

Implementation. Airbnb Bangkok Data Set – reviews.csv.
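The loading step can be sketched with pandas as below. This is a minimal illustration, not the authors' code: the column names `date` and `comments` are assumptions about the layout of reviews.csv (the text only names the comments column), the three-row CSV is hypothetical, and the date window stands in for whatever pre-COVID/post-COVID time frame was actually chosen.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the Bangkok reviews.csv file.
# The `date` and `comments` column names are assumptions.
SAMPLE_CSV = """listing_id,date,comments
1,2019-05-01,Great location and very clean room!
1,2020-03-15,
2,2021-11-20,The host was rude and the room was dirty.
"""


def load_reviews(source, start=None, end=None):
    """Load reviews, drop rows without comment text, optionally filter by date."""
    df = pd.read_csv(source, parse_dates=["date"])
    df = df.dropna(subset=["comments"]).copy()         # empty comments are read as NaN
    df["comments"] = df["comments"].astype(str).str.strip()
    df = df[df["comments"] != ""]
    if start is not None:
        df = df[df["date"] >= pd.Timestamp(start)]     # keep reviews on/after start
    if end is not None:
        df = df[df["date"] <= pd.Timestamp(end)]       # keep reviews on/before end
    return df.reset_index(drop=True)


# Hypothetical window spanning pre- and post-COVID reviews.
reviews = load_reviews(io.StringIO(SAMPLE_CSV), start="2019-01-01", end="2022-12-31")
print(len(reviews))
```

With the real file, `source` would simply be the path to reviews.csv.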

Page 8 (11m 29s)

Text Processing. Text processing is performed on the data set to make the data easier to analyze. The text processing stage involved the following steps.
Case Folding: We performed case folding on the comments column, converting all uppercase letters to lowercase to make the reviews more standardized and simpler to process. We then filtered special characters out of the strings.
Tokenization: Tokenization breaks each piece of text, in our case each customer review, into simpler components such as individual words or keywords, called tokens. We tokenized the reviews to split paragraphs and sentences into smaller units that can more easily be assigned meaning.
Removing Stop Words: Stop words are words that occur commonly across all documents in the corpus, typically articles and pronouns; removing them can improve the accuracy of the models. To remove stop words from the comments, we tokenized each sentence and dropped any word that appears in NLTK's stop word list.
Lemmatization: Lemmatization is a text normalization technique in Natural Language Processing (NLP) that reduces each form of a word to its base form, or lemma. We applied lemmatization to the comments_cleaned feature to obtain the tokenized input for modeling.
Sentiment Analysis: Since we can assume that guests' reviews are closely related to how they felt about their stay, we first derived sentiment features. We used VADER, a sentiment analysis tool available in the NLTK library, to calculate a compound score for each review, which we then mapped to a sentiment label: positive, negative, or neutral.
Topic Modeling: Topic modeling is an unsupervised machine learning technique that processes documents to discover the topics they contain.
It is an important concept in traditional natural language processing because it can capture the semantic relationships between words in document clusters. We used scikit-learn's NMF, which performs topic modeling through Non-negative Matrix Factorization, and trained the NMF model on TF-IDF feature vectors.

Page 9 (13m 38s)

Modeling. The classification involved building and training models such as Decision Trees, Naïve Bayes, Logistic Regression, Support Vector Machines, and Neural Networks. Before building the models, we performed the following steps.
Label the Target: We encoded the target variable, sentiment, with scikit-learn's LabelEncoder, which converts the levels of a categorical feature into numeric values between 0 and n_classes - 1, where n_classes is the number of distinct labels.
Set the Variables: After labeling, we set the comments as the independent (input) variable and the encoded sentiment as the dependent (target) variable.
Train-Test Split: We split the data frame into training and test sets, then vectorized and embedded the reviews. To prevent overfitting, we balanced the training set and reduced its number of features. Finally, we created three metric functions to use with our models.
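A minimal sketch of the preparation steps, assuming scikit-learn's `LabelEncoder`, `train_test_split`, and `TfidfVectorizer`. The six reviews and labels are hypothetical, and the 50/50 stratified split is only chosen so the toy data can be split; the slides do not give the split ratio used for the ML models. Note that the vectorizer is fitted on the training texts only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical cleaned comments and their VADER-derived sentiment labels.
comments = ["clean room great host", "dirty room rude host", "room was okay",
            "great location clean", "noisy street dirty bed", "average stay okay"]
sentiment = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Neutral"]

encoder = LabelEncoder()
y = encoder.fit_transform(sentiment)  # classes are sorted: Negative=0, Neutral=1, Positive=2
X = comments                          # independent variable: the review text

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)

vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)  # learn the vocabulary from training data
X_test_vec = vectorizer.transform(X_test)        # reuse it for the test data
print(list(encoder.classes_), X_train_vec.shape, X_test_vec.shape)
```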

Page 10 (14m 41s)

Decision Tree Classifier. The first algorithm we used was the Decision Tree. A decision tree is a flowchart-like structure in which each internal node represents a test on a feature (or attribute), each branch represents the outcome of that test (a decision rule), and each leaf node represents a class label. The topmost node is the root node, and the tree learns to partition the data based on attribute values. The paths from the root to the leaves represent classification rules. To build this model, we imported and instantiated DecisionTreeClassifier from sklearn.tree. We fitted the model on x_train and y_train and computed the accuracy score, confusion matrix, and classification report.
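The fit-and-evaluate loop described above might look like this sketch. The four training reviews and two test reviews are hypothetical, and the labels use the LabelEncoder order (0 = Negative, 2 = Positive); `random_state=42` is an arbitrary choice to make tie-breaking between equally good splits reproducible.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; labels follow LabelEncoder order (0 = Negative, 2 = Positive).
train_texts = ["clean room great host", "dirty room rude host",
               "great clean stay", "rude dirty stay"]
y_train = [2, 0, 2, 0]
test_texts = ["great host", "rude host"]
y_test = [2, 0]

vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(train_texts)  # sparse TF-IDF matrix
x_test = vectorizer.transform(test_texts)

tree = DecisionTreeClassifier(random_state=42)   # decision trees accept sparse input
tree.fit(x_train, y_train)
y_pred = tree.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)        # rows: actual, columns: predicted
print(accuracy)
print(matrix)
print(classification_report(y_test, y_pred, zero_division=0))
```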

Page 11 (15m 18s)

Naïve Bayes Classifier. Naive Bayes can be extended to real-valued attributes, most commonly by assuming a Gaussian distribution; this extension is called Gaussian Naive Bayes. Other functions can be used to estimate the distribution of the data, but the Gaussian (normal) distribution is the easiest to work with, because we only need to estimate the mean and standard deviation of each feature for each class from the training data. We used the Gaussian Naive Bayes classifier, which is appropriate when predictor values are continuous and expected to follow a Gaussian distribution. To build this model, we imported and instantiated GaussianNB from sklearn.naive_bayes. We fitted the model on x_train and y_train and computed the accuracy score, confusion matrix, and classification report.
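One practical detail when pairing GaussianNB with TF-IDF features: `TfidfVectorizer` returns a sparse matrix, while scikit-learn's `GaussianNB` only accepts dense arrays, so the sketch below converts with `.toarray()`. The toy reviews and labels are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy reviews; labels follow LabelEncoder order (0 = Negative, 2 = Positive).
train_texts = ["clean room great host", "dirty room rude host",
               "great clean stay", "rude dirty stay"]
y_train = np.array([2, 0, 2, 0])
test_texts = ["great clean host", "rude dirty host"]
y_test = np.array([2, 0])

vectorizer = TfidfVectorizer()
# GaussianNB needs dense input, so convert the sparse TF-IDF matrices.
x_train = vectorizer.fit_transform(train_texts).toarray()
x_test = vectorizer.transform(test_texts).toarray()

nb = GaussianNB()
nb.fit(x_train, y_train)   # estimates a mean and variance per class and feature
y_pred = nb.predict(x_test)

print(nb.theta_)           # per-class feature means, shape (n_classes, n_features)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```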

Page 12 (15m 47s)

SMOTE. A problem with imbalanced classification is that there are too few examples of the minority class for a model to learn the decision boundary effectively. One way to address this is to oversample the minority class. The simplest approach duplicates minority-class examples in the training set before fitting a model; this balances the class distribution but gives the model no additional information. SMOTE (Synthetic Minority Over-sampling Technique) instead creates new synthetic minority examples by interpolating between a minority example and one of its nearest minority-class neighbours. We imported SMOTE from imblearn.over_sampling, instantiated it with sampling_strategy set to "minority", and applied it to x_train and y_train to obtain the resampled training set.
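To make the interpolation idea concrete, here is a minimal NumPy sketch of SMOTE's core step. It is an illustration only, not imblearn's implementation (the authors used `SMOTE(...).fit_resample` from imbalanced-learn); the function name `smote_samples`, the neighbour count `k=2`, and the three-point minority set are hypothetical.

```python
import numpy as np


def smote_samples(X_minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority samples
    and their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of k nearest neighbours
    synthetic = np.empty((n_new, X_minority.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_minority))          # pick a random minority sample
        neighbour = X_minority[rng.choice(neighbours[i])]
        gap = rng.random()                         # random position in [0, 1)
        synthetic[j] = X_minority[i] + gap * (neighbour - X_minority[i])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_samples(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies, rather than being exact duplicates.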

Page 13 (16m 28s)

Logistic Regression. Logistic regression is a supervised learning method used to estimate the probability that a binary (yes/no) event occurs. For example, logistic regression could be used to determine whether a person is likely to be infected with COVID-19. We imported and instantiated LogisticRegression from sklearn.linear_model, fitted it on x_train and y_train, and computed the accuracy score, confusion matrix, and classification report.
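The probability in logistic regression comes from the logistic (sigmoid) function applied to a weighted sum of the features. The sketch below shows the binary case in plain Python; the weights, bias, and feature values are hypothetical, and scikit-learn's `LogisticRegression` learns the weights from data and extends the model to multi-class problems such as positive/neutral/negative.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """P(y = 1 | x) = sigmoid(w . x + b) for a binary logistic regression model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights for two features (e.g. TF-IDF of "clean" and "dirty").
p = predict_probability([1.5, -2.0], 0.25, [1.0, 0.5])
label = int(p >= 0.5)  # 0.5 threshold turns the probability into a yes/no decision
print(p, label)
```

Here the score is 1.5 × 1.0 − 2.0 × 0.5 + 0.25 = 0.75, which the sigmoid maps to a probability of about 0.68.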

Page 14 (17m 7s)

Support Vector Machines (SVM).

Page 15 (18m 9s)

Neural Network Model. Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning, and learning can be supervised, semi-supervised, or unsupervised. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems, a behaviour required in complex problem domains such as machine translation and speech recognition. LSTMs are a complex area of deep learning.
Embedding: We converted the text inputs into embedded vectors so that machine learning could be applied to them. In word embeddings, every word is represented as an n-dimensional dense vector, and similar words have similar vectors. To embed our text, we first transformed the reviews into vector representations, using the Tokenizer class from keras.preprocessing.text to vectorize the text corpus and pad_sequences from keras.preprocessing.sequence to ensure that all sequences have the same length.
Balancing: Because the data set is highly imbalanced, we balanced it with imblearn's RandomOverSampler. A balanced data set generally yields a classification model with higher balanced accuracy and a more even detection rate across classes, so balancing is important for a classification model. We also applied balanced class weights in the cost function so that the neural network pays more attention to the minority classes.
Feature Selection: To prevent overfitting, we reduced the number of features in the data frame. One of the simplest and most common ways to select relevant features for classification is to calculate the F-score of each feature, which is the ratio of the variance between the class means to the variance within the classes.
A small F-score usually means that a feature is less important than one with a high F-score. We calculated the F-score of the features per sentiment with scikit-learn's SelectKBest and f_classif, which returns the ANOVA F-value.
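The balancing and feature-selection steps can be sketched as below. To avoid an imbalanced-learn dependency, `random_oversample` is a hypothetical NumPy re-implementation of the idea behind `RandomOverSampler` (duplicating random minority rows); `SelectKBest`, `f_classif`, and `compute_class_weight` are real scikit-learn functions. The 8×5 data set, in which only feature 0 carries class information, and `k=2` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.utils.class_weight import compute_class_weight


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until every class has as
    many rows as the largest class (the idea behind RandomOverSampler)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        keep.extend(cls_idx)
        keep.extend(extra)
    keep = np.asarray(keep)
    return X[keep], y[keep]


# Hypothetical data: 8 reviews, 5 numeric features, imbalanced labels.
rng = np.random.default_rng(42)
y = np.array([0, 0, 0, 0, 0, 1, 1, 2])
X = rng.random((8, 5))
X[:, 0] = y * 10 + rng.normal(0, 0.1, size=8)  # feature 0 tracks the class

X_bal, y_bal = random_oversample(X, y)

# ANOVA F-test per feature: between-class variance / within-class variance.
selector = SelectKBest(f_classif, k=2)
X_selected = selector.fit_transform(X_bal, y_bal)
selected_features = selector.get_support(indices=True)

# "Balanced" weights: n_samples / (n_classes * class_count), usable as a
# class_weight dict when training a neural network.
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(selected_features, class_weight)
```

As a design note, oversampling is usually applied to the training portion only, so that duplicated rows do not also appear in the validation data.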

Page 16 (19m 40s)

Neural Network Model.
Model Summary: To predict the text sentiment, the model has an input layer of shape 20 (the number of features); an Embedding layer that takes the vocabulary size and the previously created embedding matrix; a Long Short-Term Memory (LSTM) layer with 128 units; and finally an output layer with 3 neurons, one for each label: Positive, Neutral, and Negative. We used the Keras models module and the Dense, LSTM, and Embedding layers from keras.layers. As metrics, we measured accuracy, F1, precision, and recall.
Model Architecture: To visualize the architecture of the neural network, we printed it with plot_model from keras.utils.
Model Training: We trained the model on 70% of the randomly oversampled data frame, with a batch size of 32 for 15 epochs, and used the remaining 30% for validation.
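The size of this architecture can be worked out by hand, which is a useful check against Keras's model summary. The sketch uses the standard parameter-count formulas for an Embedding layer, a Keras LSTM with biases (four gates, each with an input kernel, a recurrent kernel, and a bias), and a Dense layer. The LSTM width (128) and the 3 output classes come from the model summary; the vocabulary size of 5,000 and embedding dimension of 100 are hypothetical, because the slides do not state them.

```python
def embedding_params(vocab_size, embedding_dim):
    """One vector of length embedding_dim per vocabulary entry."""
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    """Keras LSTM with use_bias=True: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per output neuron."""
    return input_dim * units + units


VOCAB_SIZE = 5000     # hypothetical vocabulary size
EMBEDDING_DIM = 100   # hypothetical embedding dimension
LSTM_UNITS = 128      # from the model summary
N_CLASSES = 3         # Positive, Neutral, Negative

layer_params = {
    "embedding": embedding_params(VOCAB_SIZE, EMBEDDING_DIM),
    "lstm": lstm_params(EMBEDDING_DIM, LSTM_UNITS),
    "dense": dense_params(LSTM_UNITS, N_CLASSES),
}
total_params = sum(layer_params.values())
print(layer_params, total_params)
```

The input length of 20 does not affect the count, because the LSTM reuses the same weights at every time step. If the embedding matrix is frozen, its parameters are reported as non-trainable but still appear in the total.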

Page 17 (21m 5s)

Results and Discussions. A confusion matrix is a table used in classification problems to show where a model makes errors. The rows represent the actual classes, and the columns represent the predicted classes, which makes it easy to see which predictions are wrong. The following evaluation metrics describe model performance.
Accuracy: Accuracy is a good measure only when the data set is symmetric, that is, when the numbers of false positives and false negatives are roughly the same. Accuracy = (TP + TN) / (TP + FP + FN + TN).
Precision: Precision is the ratio of correctly predicted positive observations to all predicted positive observations. Precision = TP / (TP + FP).
Recall: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. Recall = TP / (TP + FN).
F1 Score: The F1 score is the harmonic mean of precision and recall. The harmonic mean is an alternative to the more common arithmetic mean that is often useful when averaging rates. F1 Score = 2 × (Recall × Precision) / (Recall + Precision).
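The four formulas can be checked on concrete numbers with the short sketch below. The counts are hypothetical and treat one class (say, Positive) as one-vs-rest; for the three-class problem, scikit-learn's `precision_score`, `recall_score`, and `f1_score` compute these per class and can average them with, for example, `average="macro"`.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for one class treated as one-vs-rest.
m = binary_metrics(tp=8, fp=2, fn=4, tn=6)
print(m)  # accuracy 0.7, precision 0.8, recall ~0.667, f1 ~0.727
```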

Page 18 (21m 33s)

Decision Tree Model Results.

Page 19 (21m 54s)

Naïve Bayes Model Results. We computed the accuracy score for the training and test data sets to check the model accuracy. As per the diagram, we obtained a training accuracy of 72.25% and a testing accuracy of 72.89%. The training and testing scores are very close, so there is no sign of overfitting. Based on this analysis, however, the classification accuracy is not very good, and the model does not predict the class labels well.

Page 20 (22m 10s)

SMOTE Results. We computed the accuracy score for the training and test data sets to check the model accuracy. As per the diagram, we obtained a testing accuracy of 72.89%. Based on this analysis, the classification accuracy is not very good, and the model does not predict the class labels well.

Page 21 (22m 31s)

Logistic Regression Results.

Page 22 (22m 47s)

Support Vector Machines Results.

Page 23 (23m 0s)

Ensembling Results. Using the ensembling method, we achieved a training accuracy of 97.28% and a testing accuracy of 93.92%.

Page 24 (23m 32s)

Neural Network Model Results.

Page 25 (23m 48s)

Neural Network Model Results.

Page 26 (24m 19s)

Model Results Summary. The following table summarizes the results of all models.

Page 27 (24m 41s)

Data Visualization. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. The term is often used interchangeably with others, including information graphics, information visualization, and statistical graphics. Exploring the data in this way gives us an idea of the model we will build and guides our methods of analyzing the data.

Page 28 (25m 5s)

Data Visualization. Sentiment Word Cloud Representation: We used the well-known word cloud graph, an image composed of the words used in the Airbnb reviews, in which the size of each word indicates its frequency. To build it, we created three data frames, Positive, Neutral, and Negative, containing the words that make up the reviews of each sentiment, and counted how often each word occurs using value_counts().
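The per-sentiment word counts behind the word clouds can be sketched with pandas: split each cleaned review into words, `explode` to one word per row, and call `value_counts()`. The `comments_cleaned` column name comes from the text-processing step; the `sentiment` column name and the four reviews are hypothetical. The resulting counts could then be passed, as a dictionary, to a word-cloud renderer such as the `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
import pandas as pd

# Hypothetical cleaned reviews with their sentiment labels.
df = pd.DataFrame({
    "comments_cleaned": ["great clean room", "great host", "dirty room", "okay stay"],
    "sentiment": ["Positive", "Positive", "Negative", "Neutral"],
})


def word_counts(frame, label):
    """Word frequencies across all reviews with the given sentiment label."""
    words = frame.loc[frame["sentiment"] == label, "comments_cleaned"].str.split().explode()
    return words.value_counts()  # sorted from most to least frequent


counts = {label: word_counts(df, label) for label in ["Positive", "Neutral", "Negative"]}
print(counts["Positive"])
```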

Page 29 (25m 39s)

Data Visualization. Sentiment N-Grams Plot: To get more context, we used n-grams to identify what reviewers talk about in each sentiment; an n-gram plot is a useful way to see the most common topics per sentiment. We built one function that counts the 1-, 2-, 3-, and 4-grams per sentiment and another that builds the data frame for each sentiment by calling the first function.
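The two functions described above could be sketched as follows, using only the standard library. The names `ngram_counts` and `ngrams_by_sentiment` and the tokenized example reviews are hypothetical, and this version returns counters rather than data frames; each counter's `most_common()` output is what an n-gram bar plot would display.

```python
from collections import Counter


def ngram_counts(token_lists, n):
    """Count contiguous n-grams (as tuples) across a list of tokenized reviews."""
    counts = Counter()
    for tokens in token_lists:
        # zip over n shifted copies of the token list yields each n-gram once.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def ngrams_by_sentiment(reviews, max_n=4):
    """reviews: iterable of (tokens, sentiment). Returns {sentiment: {n: Counter}}."""
    grouped = {}
    for tokens, label in reviews:
        grouped.setdefault(label, []).append(tokens)
    return {label: {n: ngram_counts(docs, n) for n in range(1, max_n + 1)}
            for label, docs in grouped.items()}


reviews = [(["great", "clean", "room"], "Positive"),
           (["great", "clean", "bed"], "Positive"),
           (["dirty", "room"], "Negative")]
result = ngrams_by_sentiment(reviews)
print(result["Positive"][2].most_common(1))
```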

Page 30 (26m 16s)

Conclusion and Recommendations. The experiments performed in this thesis show that our deep learning model is more accurate than the other machine learning algorithms. We recommend that future studies build on this work as follows. Apply the model to other data sets to compare results across studies; for example, the model could perform sentiment analysis on customer reviews of different products from e-commerce service providers such as Lazada and Amazon, or on reviews from travel industry service providers such as Trivago and Booking.com. Add a feature that translates non-English review comments into English, and evaluate the translation step with appropriate metrics. Public Airbnb datasets contain more positive reviews than negative or neutral ones, so a well-balanced data set would help sentiment analysis. Since our experiments used the Bangkok data set, future work could evaluate the model's performance on data sets from other countries.

Page 31 (28m 26s)

Thank You.