DISEASE PREDICTION BASED ON SYMPTOMS USING MACHINE LEARNING TOOLS

Project period

01/04/2020 - 02/02/2020

Industries today generate data at a rapidly growing rate, and researchers and organizations use this information to make and predict important decisions. Sectors that create huge amounts of data include healthcare (for example, hospitals), education, and many other industries, but healthcare is considered the industry that generates the most. Machine learning algorithms are commonly used to manage and analyze the full set of hospital data. Machine learning lets us build models that quickly analyze data and interpret results, using both historical and real-time data. With machine learning methods, doctors can make better-informed decisions about a patient’s diagnosis and treatment options, which leads to improved healthcare services.

Big data has three dimensions: velocity, variety, and volume. Healthcare is a good example of all three. Healthcare data is spread across many healthcare systems, health insurers, researchers, government agencies, and so on. An important challenge is how to extract information from this data, since the amount is very large; for this we use data mining and machine learning techniques. We also expect that if a disease is predicted early, patients can be treated sooner, which reduces the risk and cost involved and can save patients’ lives.

Why: Problem statement

Big data has a major impact on the healthcare sector: it has the capacity to reduce treatment costs, predict outcomes, avoid many preventable diseases, and improve quality of life. Accurate analysis of medical data supports early disease detection and better patient care. The accuracy of the analysis drops when the data is incomplete. In this project, machine learning algorithms are used to predict diseases. A latent factor model is used to overcome the difficulty of missing data. A new convolutional neural network-based multimodal disease risk prediction (CNN-MDRP) algorithm is proposed in this paper. It uses both structured and unstructured data from hospitals for effective prediction of diseases.

Much research has been done to improve the accuracy of risk classification from big data. Existing models use only structured data. For unstructured data, a convolutional neural network (CNN) is used to extract text features automatically. However, none of the previous work handles medical text data with a CNN, and the prevalence of diseases differs between regions because of differences in climate and living habits.

How: Solution description

Using machine learning algorithms, we predict a disease from the patient’s symptoms.

DATA COLLECTION:

Data is collected from various websites, research papers, and articles. The features are symptoms such as headache, vomiting, constipation, stomach pain, and dizziness, and the target is the type of disease.

DATA CLEANING:

  • One-hot encoding

  • Format conversion: xlsx to csv
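
The two cleaning steps can be illustrated with a minimal sketch that uses only the Python standard library. The records, the column order, and the helper names `one_hot` and `to_csv` are hypothetical and are not taken from the project's dataset. Reading the original .xlsx file would need a spreadsheet library and is left out here.

```python
import csv
import io

# Hypothetical raw records: each patient has a list of symptoms and a diagnosis.
records = [
    {"symptoms": ["headache", "dizziness"], "prognosis": "Migraine"},
    {"symptoms": ["vomit", "stomach pain"], "prognosis": "Gastroenteritis"},
]

# Fixed column order: every symptom seen in the data, sorted for reproducibility.
symptom_columns = sorted({s for r in records for s in r["symptoms"]})


def one_hot(symptoms, columns):
    """Return a 0/1 list with a 1 for each column that appears in `symptoms`."""
    present = set(symptoms)
    return [1 if c in present else 0 for c in columns]


def to_csv(records, columns):
    """Write the one-hot rows plus the target column as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns + ["prognosis"])
    for r in records:
        writer.writerow(one_hot(r["symptoms"], columns) + [r["prognosis"]])
    return buffer.getvalue()


csv_text = to_csv(records, symptom_columns)
print(csv_text)
```

Each symptom becomes its own 0/1 column, which is the layout the classifiers in this project expect as input.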

        

Steps in the algorithms:

  • Decision tree

The decision tree algorithm belongs to the family of supervised learning algorithms. Unlike some other supervised learning algorithms, it can be used for both regression and classification problems. The goal of using a decision tree is to create a model that predicts the class or value of the target variable by learning simple decision rules inferred from prior data (training data). To predict a class label for a record, we start from the root of the tree and compare the value of the root attribute with the record’s attribute. Based on the comparison, we follow the branch corresponding to that value and move to the next node, repeating until we reach a leaf.
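
The root-to-leaf walk described above can be sketched as follows. The tree here is written by hand purely for illustration: it is not learned from the project's data, its rules have no clinical meaning, and the names `toy_tree` and `predict` are assumptions.

```python
# A hand-written toy tree (hypothetical). Internal nodes test one symptom;
# leaves hold a disease label.
toy_tree = {
    "feature": "headache",
    "yes": {
        "feature": "dizziness",
        "yes": {"label": "Migraine"},
        "no": {"label": "Common Cold"},
    },
    "no": {"label": "Gastroenteritis"},
}


def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while "label" not in node:
        # A symptom missing from the record is treated as absent (0).
        branch = "yes" if record.get(node["feature"], 0) == 1 else "no"
        node = node[branch]
    return node["label"]


print(predict(toy_tree, {"headache": 1, "dizziness": 1}))
```

A trained tree works the same way at prediction time; the difference is that the tests at each node are chosen by the learning algorithm rather than by hand.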

  • Random forest

Random forest is a supervised learning algorithm used for both classification and regression, although it is mainly used for classification problems. Just as a forest is made up of trees, and more trees make a more robust forest, a random forest builds decision trees on random samples of the data, gets a prediction from each tree, and selects the final answer by voting. It is an ensemble method that usually does better than a single decision tree because combining the trees’ results reduces overfitting.
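
The two ingredients named above, sampling the data for each tree and voting over the trees' predictions, can be sketched with the standard library alone. The helper names and the list of per-tree predictions are hypothetical; no trees are actually trained here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as each tree in a forest would see."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
rows = list(range(10))          # stand-in for ten training rows
sample = bootstrap_sample(rows, rng)

# Hypothetical predictions from five trees for one patient.
tree_predictions = ["Malaria", "Dengue", "Malaria", "Typhoid", "Malaria"]
print(majority_vote(tree_predictions))
```

Note that scikit-learn's RandomForestClassifier, which the notebook uses, combines trees by averaging their predicted class probabilities rather than by a hard vote; the sketch shows the simpler voting scheme described in the text.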

  • Naive Bayes

Naive Bayes is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can sometimes perform comparably to more sophisticated classification methods.
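
The independence assumption can be made concrete with a small sketch for 0/1 symptom features. This is a Bernoulli-style variant with Laplace smoothing written for illustration; the notebook itself uses scikit-learn's GaussianNB, and the toy data and function names below are assumptions.

```python
import math
from collections import defaultdict


def train(rows, labels):
    """Estimate class priors and per-class P(feature = 1) with Laplace smoothing."""
    n_features = len(rows[0])
    counts = defaultdict(int)                      # number of rows per class
    ones = defaultdict(lambda: [0] * n_features)   # per-class count of feature == 1
    for row, label in zip(rows, labels):
        counts[label] += 1
        for j, value in enumerate(row):
            ones[label][j] += value
    model = {}
    for label, n in counts.items():
        prior = n / len(rows)
        probs = [(ones[label][j] + 1) / (n + 2) for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict(model, row):
    """Pick the class with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(row, probs):
            score += math.log(p if value == 1 else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: columns are [headache, vomit]; the labels are hypothetical.
rows = [[1, 0], [1, 0], [0, 1], [0, 1]]
labels = ["Migraine", "Migraine", "Gastroenteritis", "Gastroenteritis"]
model = train(rows, labels)
print(predict(model, [1, 0]))
```

Because each feature contributes its own term to the score, the per-feature probabilities can be estimated separately, which is why the model is cheap to build.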

Among the three algorithms used, the decision tree yielded the highest accuracy, about 95%.
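
The notebook output included with this project prints 0.9512195121951219 followed by 39, which is consistent with 39 correct predictions out of 41 test rows. The row count is an inference from those two numbers, not a figure the project states. A minimal sketch of the accuracy calculation, with hypothetical labels arranged to reproduce that ratio:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels arranged so that 39 of 41 predictions are correct.
y_true = list(range(41))
y_pred = y_true[:39] + [0, 0]   # the last two predictions are wrong
print(accuracy(y_true, y_pred))
```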

How is it different from competition

Previous models used only the decision tree algorithm and did not provide a user interface. In this project, we used three algorithms and built the user interface with Tkinter. With this model, we achieved 95% accuracy.

Who are your customers

The general public and patients

Project Phases and Schedule

Phase 1 Data collection process

Phase 2 Algorithm development

Phase 3 User interface development

Resources Required

Anaconda tool – Python 3.7 Version

Project Code
/* Your file Name : Predict.ipynb */
/* Your coding Language : python */
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9512195121951219\n",
      "39\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\PRIYA PALVANNAN\\Anaconda3\\lib\\site-packages\\sklearn\\ensemble\\forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.\n",
      "  \"10 in version 0.20 to 100 in 0.22.\", FutureWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9512195121951219\n",
      "39\n",
      "0.9512195121951219\n",
      "39\n",
      "0.9512195121951219\n",
      "39\n",
      "0.9512195121951219\n",
      "39\n"
     ]
    }
   ],
   "source": [
    "from tkinter import *\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "# from gui_stuff import *\n",
    "\n",
    "l1=['back_pain','constipation','abdominal_pain','diarrhoea','mild_fever','yellow_urine',\n",
    "'yellowing_of_eyes','acute_liver_failure','fluid_overload','swelling_of_stomach',\n",
    "'swelled_lymph_nodes','malaise','blurred_and_distorted_vision','phlegm','throat_irritation',\n",
    "'redness_of_eyes','sinus_pressure','runny_nose','congestion','chest_pain','weakness_in_limbs',\n",
    "'fast_heart_rate','pain_during_bowel_movements','pain_in_anal_region','bloody_stool',\n",
    "'irritation_in_anus','neck_pain','dizziness','cramps','bruising','obesity','swollen_legs',\n",
    "'swollen_blood_vessels','puffy_face_and_eyes','enlarged_thyroid','brittle_nails',\n",
    "'swollen_extremeties','excessive_hunger','extra_marital_contacts','drying_and_tingling_lips',\n",
    "'slurred_speech','knee_pain','hip_joint_pain','muscle_weakness','stiff_neck','swelling_joints',\n",
    "'movement_stiffness','spinning_movements','loss_of_balance','unsteadiness',\n",
    "'weakness_of_one_body_side','loss_of_smell','bladder_discomfort','foul_smell_of urine',\n",
    "'continuous_feel_of_urine','passage_of_gases','internal_itching','toxic_look_(typhos)',\n",
    "'depression','irritability','muscle_pain','altered_sensorium','red_spots_over_body','belly_pain',\n",
    "'abnormal_menstruation','dischromic _patches','watering_from_eyes','increased_appetite','polyuria','family_history','mucoid_sputum',\n",
    "'rusty_sputum','lack_of_concentration','visual_disturbances','receiving_blood_transfusion',\n",
    "'receiving_unsterile_injections','coma','stomach_bleeding','distention_of_abdomen',\n",
    "'history_of_alcohol_consumption','fluid_overload','blood_in_sputum','prominent_veins_on_calf',\n",
    "'palpitations','painful_walking','pus_filled_pimples','blackheads','scurring','skin_peeling',\n",
    "'silver_like_dusting','small_dents_in_nails','inflammatory_nails','blister','red_sore_around_nose',\n",
    "'yellow_crust_ooze']\n",
    "\n",
    "disease=['Fungal infection','Allergy','GERD','Chronic cholestasis','Drug Reaction',\n",
    "'Peptic ulcer diseae','AIDS','Diabetes','Gastroenteritis','Bronchial Asthma','Hypertension',\n",
    "' Migraine','Cervical spondylosis',\n",
    "'Paralysis (brain hemorrhage)','Jaundice','Malaria','Chicken pox','Dengue','Typhoid','hepatitis A',\n",
    "'Hepatitis B','Hepatitis C','Hepatitis D','Hepatitis E','Alcoholic hepatitis','Tuberculosis',\n",
    "'Common Cold','Pneumonia','Dimorphic hemmorhoids(piles)',\n",
    "'Heartattack','Varicoseveins','Hypothyroidism','Hyperthyroidism','Hypoglycemia','Osteoarthristis',\n",
    "'Arthritis','(vertigo) Paroymsal  Positional Vertigo','Acne','Urinary tract infection','Psoriasis',\n",
    "'Impetigo']\n",
    "\n",
    "l2=[]\n",
    "for x in range(0,len(l1)):\n",
    "    l2.append(0)\n",
    "\n",
    "# TESTING DATA df -------------------------------------------------------------------------------------\n",
    "df=pd.read_csv(\"Training.csv\")\n",
    "\n",
    "df.replace({'prognosis':{'Fungal infection':0,'Allergy':1,'GERD':2,'Chronic cholestasis':3,'Drug Reaction':4,\n",
    "'Peptic ulcer diseae':5,'AIDS':6,'Diabetes ':7,'Gastroenteritis':8,'Bronchial Asthma':9,'Hypertension ':10,\n",
    "'Migraine':11,'Cervical spondylosis':12,\n",
    "'Paralysis (brain hemorrhage)':13,'Jaundice':14,'Malaria':15,'Chicken pox':16,'Dengue':17,'Typhoid':18,'hepatitis A':19,\n",
    "'Hepatitis B':20,'Hepatitis C':21,'Hepatitis D':22,'Hepatitis E':23,'Alcoholic hepatitis':24,'Tuberculosis':25,\n",
    "'Common Cold':26,'Pneumonia':27,'Dimorphic hemmorhoids(piles)':28,'Heart attack':29,'Varicose veins':30,'Hypothyroidism':31,\n",
    "'Hyperthyroidism':32,'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,\n",
    "'(vertigo) Paroymsal  Positional Vertigo':36,'Acne':37,'Urinary tract infection':38,'Psoriasis':39,\n",
    "'Impetigo':40}},inplace=True)\n",
    "\n",
    "# print(df.head())\n",
    "\n",
    "X= df[l1]\n",
    "\n",
    "y = df[[\"prognosis\"]]\n",
    "np.ravel(y)\n",
    "# print(y)\n",
    "\n",
    "# TRAINING DATA tr --------------------------------------------------------------------------------\n",
    "tr=pd.read_csv(\"Testing.csv\")\n",
    "tr.replace({'prognosis':{'Fungal infection':0,'Allergy':1,'GERD':2,'Chronic cholestasis':3,'Drug Reaction':4,\n",
    "'Peptic ulcer diseae':5,'AIDS':6,'Diabetes ':7,'Gastroenteritis':8,'Bronchial Asthma':9,'Hypertension ':10,\n",
    "'Migraine':11,'Cervical spondylosis':12,\n",
    "'Paralysis (brain hemorrhage)':13,'Jaundice':14,'Malaria':15,'Chicken pox':16,'Dengue':17,'Typhoid':18,'hepatitis A':19,\n",
    "'Hepatitis B':20,'Hepatitis C':21,'Hepatitis D':22,'Hepatitis E':23,'Alcoholic hepatitis':24,'Tuberculosis':25,\n",
    "'Common Cold':26,'Pneumonia':27,'Dimorphic hemmorhoids(piles)':28,'Heart attack':29,'Varicose veins':30,'Hypothyroidism':31,\n",
    "'Hyperthyroidism':32,'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,\n",
    "'(vertigo) Paroymsal  Positional Vertigo':36,'Acne':37,'Urinary tract infection':38,'Psoriasis':39,\n",
    "'Impetigo':40}},inplace=True)\n",
    "\n",
    "X_test= tr[l1]\n",
    "y_test = tr[[\"prognosis\"]]\n",
    "np.ravel(y_test)\n",
    "# ------------------------------------------------------------------------------------------------------\n",
    "\n",
    "def DecisionTree():\n",
    "\n",
    "    from sklearn import tree\n",
    "\n",
    "    clf3 = tree.DecisionTreeClassifier()   # empty model of the decision tree\n",
    "    clf3 = clf3.fit(X,y)\n",
    "\n",
    "    # calculating accuracy-------------------------------------------------------------------\n",
    "    from sklearn.metrics import accuracy_score\n",
    "    y_pred=clf3.predict(X_test)\n",
    "    print(accuracy_score(y_test, y_pred))\n",
    "    print(accuracy_score(y_test, y_pred,normalize=False))\n",
    "    # -----------------------------------------------------\n",
    "\n",
    "    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]\n",
    "\n",
    "    for k in range(0,len(l1)):\n",
    "        # print (k,)\n",
    "        for z in psymptoms:\n",
    "            if(z==l1[k]):\n",
    "                l2[k]=1\n",
    "\n",
    "    inputtest = [l2]\n",
    "    predict = clf3.predict(inputtest)\n",
    "    predicted=predict[0]\n",
    "\n",
    "    h='no'\n",
    "    for a in range(0,len(disease)):\n",
    "        if(predicted == a):\n",
    "            h='yes'\n",
    "            break\n",
    "\n",
    "\n",
    "    if (h=='yes'):\n",
    "        t1.delete(\"1.0\", END)\n",
    "        t1.insert(END, disease[a])\n",
    "    else:\n",
    "        t1.delete(\"1.0\", END)\n",
    "        t1.insert(END, \"Not Found\")\n",
    "\n",
    "\n",
    "def randomforest():\n",
    "    from sklearn.ensemble import RandomForestClassifier\n",
    "    clf4 = RandomForestClassifier()\n",
    "    clf4 = clf4.fit(X,np.ravel(y))\n",
    "\n",
    "    # calculating accuracy-------------------------------------------------------------------\n",
    "    from sklearn.metrics import accuracy_score\n",
    "    y_pred=clf4.predict(X_test)\n",
    "    print(accuracy_score(y_test, y_pred))\n",
    "    print(accuracy_score(y_test, y_pred,normalize=False))\n",
    "    # -----------------------------------------------------\n",
    "\n",
    "    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]\n",
    "\n",
    "    for k in range(0,len(l1)):\n",
    "        for z in psymptoms:\n",
    "            if(z==l1[k]):\n",
    "                l2[k]=1\n",
    "\n",
    "    inputtest = [l2]\n",
    "    predict = clf4.predict(inputtest)\n",
    "    predicted=predict[0]\n",
    "\n",
    "    h='no'\n",
    "    for a in range(0,len(disease)):\n",
    "        if(predicted == a):\n",
    "            h='yes'\n",
    "            break\n",
    "\n",
    "    if (h=='yes'):\n",
    "        t2.delete(\"1.0\", END)\n",
    "        t2.insert(END, disease[a])\n",
    "    else:\n",
    "        t2.delete(\"1.0\", END)\n",
    "        t2.insert(END, \"Not Found\")\n",
    "\n",
    "\n",
    "def NaiveBayes():\n",
    "    from sklearn.naive_bayes import GaussianNB\n",
    "    gnb = GaussianNB()\n",
    "    gnb=gnb.fit(X,np.ravel(y))\n",
    "\n",
    "    # calculating accuracy-------------------------------------------------------------------\n",
    "    from sklearn.metrics import accuracy_score\n",
    "    y_pred=gnb.predict(X_test)\n",
    "    print(accuracy_score(y_test, y_pred))\n",
    "    print(accuracy_score(y_test, y_pred,normalize=False))\n",
    "    # -----------------------------------------------------\n",
    "\n",
    "    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]\n",
    "    for k in range(0,len(l1)):\n",
    "        for z in psymptoms:\n",
    "            if(z==l1[k]):\n",
    "                l2[k]=1\n",
    "\n",
    "    inputtest = [l2]\n",
    "    predict = gnb.predict(inputtest)\n",
    "    predicted=predict[0]\n",
    "\n",
    "    h='no'\n",
    "    for a in range(0,len(disease)):\n",
    "        if(predicted == a):\n",
    "            h='yes'\n",
    "            break\n",
    "\n",
    "    if (h=='yes'):\n",
    "        t3.delete(\"1.0\", END)\n",
    "        t3.insert(END, disease[a])\n",
    "    else:\n",
    "        t3.delete(\"1.0\", END)\n",
    "        t3.insert(END, \"Not Found\")\n",
    "\n",
    "# gui_stuff------------------------------------------------------------------------------------\n",
    "\n",
    "root = Tk()\n",
    "root.configure(background='orange')\n",
    "\n",
    "# entry variables\n",
    "Symptom1 = StringVar()\n",
    "Symptom1.set(None)\n",
    "Symptom2 = StringVar()\n",
    "Symptom2.set(None)\n",
    "Symptom3 = StringVar()\n",
    "Symptom3.set(None)\n",
    "Symptom4 = StringVar()\n",
    "Symptom4.set(None)\n",
    "Symptom5 = StringVar()\n",
    "Symptom5.set(None)\n",
    "Name = StringVar()\n",
    "\n",
    "# Heading\n",
    "w2 = Label(root, justify=LEFT, text=\"Disease Predictor using Machine Learning\", fg=\"white\", bg=\"orange\")\n",
    "w2.config(font=(\"Elephant\", 30))\n",
    "w2.grid(row=1, column=0, columnspan=2, padx=100)\n",
    "\n",
    "# labels\n",
    "NameLb = Label(root, text=\"Name of the Patient\", fg=\"yellow\", bg=\"black\")\n",
    "NameLb.grid(row=6, column=0, pady=15, sticky=W)\n",
    "\n",
    "\n",
    "S1Lb = Label(root, text=\"Symptom 1\", fg=\"yellow\", bg=\"black\")\n",
    "S1Lb.grid(row=7, column=0, pady=10, sticky=W)\n",
    "\n",
    "S2Lb = Label(root, text=\"Symptom 2\", fg=\"yellow\", bg=\"black\")\n",
    "S2Lb.grid(row=8, column=0, pady=10, sticky=W)\n",
    "\n",
    "S3Lb = Label(root, text=\"Symptom 3\", fg=\"yellow\", bg=\"black\")\n",
    "S3Lb.grid(row=9, column=0, pady=10, sticky=W)\n",
    "\n",
    "S4Lb = Label(root, text=\"Symptom 4\", fg=\"yellow\", bg=\"black\")\n",
    "S4Lb.grid(row=10, column=0, pady=10, sticky=W)\n",
    "\n",
    "S5Lb = Label(root, text=\"Symptom 5\", fg=\"yellow\", bg=\"black\")\n",
    "S5Lb.grid(row=11, column=0, pady=10, sticky=W)\n",
    "\n",
    "\n",
    "lrLb = Label(root, text=\"DecisionTree\", fg=\"white\", bg=\"red\")\n",
    "lrLb.grid(row=15, column=0, pady=10,sticky=W)\n",
    "\n",
    "destreeLb = Label(root, text=\"RandomForest\", fg=\"white\", bg=\"red\")\n",
    "destreeLb.grid(row=17, column=0, pady=10, sticky=W)\n",
    "\n",
    "ranfLb = Label(root, text=\"NaiveBayes\", fg=\"white\", bg=\"red\")\n",
    "ranfLb.grid(row=19, column=0, pady=10, sticky=W)\n",
    "\n",
    "# entries\n",
    "OPTIONS = sorted(l1)\n",
    "\n",
    "NameEn = Entry(root, textvariable=Name)\n",
    "NameEn.grid(row=6, column=1)\n",
    "\n",
    "S1En = OptionMenu(root, Symptom1,*OPTIONS)\n",
    "S1En.grid(row=7, column=1)\n",
    "\n",
    "S2En = OptionMenu(root, Symptom2,*OPTIONS)\n",
    "S2En.grid(row=8, column=1)\n",
    "\n",
    "S3En = OptionMenu(root, Symptom3,*OPTIONS)\n",
    "S3En.grid(row=9, column=1)\n",
    "\n",
    "S4En = OptionMenu(root, Symptom4,*OPTIONS)\n",
    "S4En.grid(row=10, column=1)\n",
    "\n",
    "S5En = OptionMenu(root, Symptom5,*OPTIONS)\n",
    "S5En.grid(row=11, column=1)\n",
    "\n",
    "\n",
    "dst = Button(root, text=\"DecisionTree\", command=DecisionTree,bg=\"green\",fg=\"red\")\n",
    "dst.grid(row=8, column=3,padx=10)\n",
    "\n",
    "rnf = Button(root, text=\"Randomforest\", command=randomforest,bg=\"green\",fg=\"red\")\n",
    "rnf.grid(row=9, column=3,padx=10)\n",
    "\n",
    "lr = Button(root, text=\"NaiveBayes\", command=NaiveBayes,bg=\"green\",fg=\"red\")\n",
    "lr.grid(row=10, column=3,padx=10)\n",
    "\n",
    "#textfileds\n",
    "t1 = Text(root, height=1, width=40,bg=\"white\",fg=\"black\")\n",
    "t1.grid(row=15, column=1, padx=10)\n",
    "\n",
    "t2 = Text(root, height=1, width=40,bg=\"white\",fg=\"black\")\n",
    "t2.grid(row=17, column=1 , padx=10)\n",
    "\n",
    "t3 = Text(root, height=1, width=40,bg=\"white\",fg=\"black\")\n",
    "t3.grid(row=19, column=1 , padx=10)\n",
    "\n",
    "root.mainloop()\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
