Amazon products reviews classification based on machine learning, deep learning methods and BERT
Saman Iftikhar, Bandar Alluhaybi, Mohammed Suliman, Ammar Saeed, Kiran Fatima
In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave their reviews about the product. These reviews are very helpful for the store owners and the product’s manufacturers for the betterment of their work process as well as product quality. An automated system is proposed in this work that operates on two datasets D1 and D2 obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW) and global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models support vector machines (SVM), logistic regression (RF), logistic regression (LR), multinomial Naïve Bayes (MNB), two deep learning (DL) models convolutional neural network (CNN), long-short term memory (LSTM), and standalone bidirectional encoder representations (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML, DL models and BERT are evaluated using certain performance evaluation measures. BERT turns out to be the best-performing model in the case of D1 with an accuracy of 90% on features derived by word embedding models while the CNN provides the best accuracy of 97% upon word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
deep learning; feature extraction; machine learning; sentiment analysis; transformer technique;