Analysis of Indian Railway customer reviews using NLP


Project period

05/04/2020 - 06/01/2020





Customer sentiment analysis is a technique for processing information, typically text from social media sources, to evaluate customer opinions and responses. Analyzing customer reviews in real time helps resolve customer problems quickly. Sentiment analysis is an active area of research that focuses on the opinions or emotions of users and classifies their reviews as positive or negative. In this project, I propose a new approach to sentiment classification of railway train reviews using deep learning concepts. The reviews are gathered from social media sites, where they are distributed across different locations.

Why: Problem statement

Nowadays, many people have to travel long distances for jobs and other work, and most of them prefer trains for these journeys. Most railway departments serve unhealthy food and run uncleaned, unhygienic coaches. Such unhygienic conditions can lead to serious health issues for passengers.

How: Solution description

People now use social media to post about their problems. In order to reduce the problems that occur in railways, we collect the passenger reviews posted on social media and classify them using machine learning models.

Data collection:

I collected Indian railway customer reviews from the site attached here.

The collected data was read using the Pandas library. The independent variables are username, subject, rating, reviews and sentiment. The dependent variable is Sentiment_data (0 if the sentiment is negative, 1 if it is positive). The image below shows the first five rows of the dataset.
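As a minimal sketch of this step, the snippet below reads a few review rows with Pandas and derives the Sentiment_data column. The sample rows are invented for illustration, and the assumption that the sentiment column holds the strings "negative" and "positive" is mine; the source does not say how the labels are stored.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real project reads the collected review file instead.
SAMPLE_CSV = """username,subject,rating,reviews,sentiment
user_a,Food quality,1,The food served on the train was stale,negative
user_b,Cleanliness,2,Coach was not cleaned during the journey,negative
user_c,Punctuality,5,Train arrived on time and staff were helpful,positive
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumed encoding of the text label into the binary target:
# 0 = negative sentiment, 1 = positive sentiment.
df["Sentiment_data"] = df["sentiment"].map({"negative": 0, "positive": 1})

# Show the first five rows (here, all three sample rows).
print(df.head())
```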

Here you can find the histogram of the review ratings. In the given data, 85 people gave the railway department a rating of 1, 38 people gave a rating of 3, and 22 people gave a rating of 2. Only 18 people gave a rating of 5, and 15 people gave a rating of 4. So the analysis shows that ratings 4 and 5 were given by the fewest people.
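The rating counts quoted above can be checked with a few lines of Python. This sketch uses only the five counts from the text; the crude text histogram is an illustrative stand-in for the plotted one.

```python
# Rating counts exactly as reported in the text.
rating_counts = {1: 85, 2: 22, 3: 38, 4: 15, 5: 18}

total = sum(rating_counts.values())

# Share of each rating as a percentage of all reviews.
shares = {r: round(100 * c / total, 1) for r, c in rating_counts.items()}

# Ratings ordered from least to most frequent.
least_to_most = sorted(rating_counts, key=rating_counts.get)

for rating in sorted(rating_counts):
    bar = "#" * (rating_counts[rating] // 5)  # one '#' per 5 reviews
    print(f"rating {rating}: {rating_counts[rating]:>3} ({shares[rating]}%) {bar}")

print("total reviews:", total)
print("least frequent ratings:", least_to_most[:2])
```

The counts sum to 178 reviews, and ratings 4 and 5 come out as the two least frequent, which matches the conclusion drawn from the histogram.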


A stop word is a commonly used word that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.

NLTK (Natural Language Toolkit) in Python has a list of stop words stored in the nltk_data directory. Using this list, words such as "the", "because", "a", "was" and "is" were deleted from the reviews.
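The idea can be sketched as follows. To keep the example runnable without downloading any corpus, it uses a small hand-written stop-word set, which is an assumption of this sketch; the project itself relies on the NLTK list. The function name and the sample sentence are hypothetical.

```python
import re
from typing import List

# Small illustrative stop-word set (an assumption for this sketch).
# With NLTK installed and its stopwords corpus downloaded, it could be
# replaced by set(nltk.corpus.stopwords.words("english")).
STOP_WORDS = {"the", "because", "a", "was", "is", "were", "and", "of", "in", "to"}


def remove_stop_words(review: str) -> List[str]:
    """Lowercase the review, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(remove_stop_words("The food was cold because the pantry car is dirty"))
# prints ['food', 'cold', 'pantry', 'car', 'dirty']
```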

The image below is produced with wordcloud. It displays the important words extracted from the reviews of the whole dataset.
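A word cloud scales each word by how often it occurs, so the underlying data is just a frequency table. The sketch below builds that table with the standard library; the cleaned reviews are invented for illustration, and the rendering step itself is left out.

```python
from collections import Counter

# Hypothetical reviews, already tokenized and with stop words removed.
cleaned_reviews = [
    ["food", "cold", "pantry", "dirty"],
    ["coach", "dirty", "toilet", "dirty"],
    ["food", "stale", "coach", "late"],
]

# Count every word across the dataset; a word cloud would draw
# the most frequent words in the largest font.
word_freq = Counter(word for review in cleaned_reviews for word in review)

for word, count in word_freq.most_common(3):
    print(word, count)
```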

Machine Learning Models:

The main libraries used in the project are NLTK, wordcloud, stopwords, etc. I applied two naive Bayes variants, Multinomial and Bernoulli, as well as logistic regression.
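To illustrate how the Multinomial variant works, here is a from-scratch sketch with Laplace smoothing on word counts. It is not the project's code: the project presumably used a library implementation, which the source does not name, and the training reviews, function names and smoothing value below are all assumptions. Bernoulli naive Bayes and logistic regression are not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tokenized reviews with labels: 0 = negative, 1 = positive.
train = [
    (["food", "stale", "dirty"], 0),
    (["coach", "dirty", "late"], 0),
    (["toilet", "dirty", "smell"], 0),
    (["clean", "coach", "helpful", "staff"], 1),
    (["train", "punctual", "clean"], 1),
]


def fit(data, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_docs = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    for tokens, label in data:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in data for w in tokens}
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label, n_docs in class_docs.items():
        model["log_prior"][label] = math.log(n_docs / len(data))
        # Denominator: words seen in this class plus alpha per vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        # Words outside the training vocabulary are ignored.
        scores[label] = prior + sum(
            model["log_lik"][label][w] for w in tokens if w in model["vocab"]
        )
    return max(scores, key=scores.get)


model = fit(train)
print(predict(model, ["dirty", "coach"]))  # prints 0 (negative)
print(predict(model, ["clean", "staff"]))  # prints 1 (positive)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.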

How is it different from competition

The existing systems of Indian Railways don't classify and analyze reviews into positive and negative sentiments. Additionally, there is no automatic classification of complaints or reviews by department for follow-up action. I address this issue by developing an approach for sentiment classification using ML concepts.

Who are your customers

People who work in the Indian railway department. This method can also be used for other kinds of customer review analysis.

Project Phases and Schedule

Phase 1: Data collection

Phase 2: Data cleaning

Phase 3: Feature extraction

Phase 4: Training using Naive Bayes

Resources Required

Anaconda with Python 3.7

Installation of required libraries

Jupyter notebook
