Predicting Border Crossing Entry Data Using Various Machine Learning Techniques

Project period

10/20/2019 - 11/18/2019

Driving is often one of the most stressful tasks of the day. Drivers regularly experience tiredness or frustration when they have to drive through heavy traffic for long periods. Road accidents are now a major problem for public health and development, and road injuries are predicted to increase if road safety is not addressed adequately. According to one report, more than 150,000 people are killed each year in traffic accidents, which is around 400 deaths per day. Studies indicate that road accidents, and the deaths and injuries they cause, will keep increasing. Advanced systems for designing and controlling traffic are available to meet essential traffic needs. Assessing traffic risks and enforcing laws and regulations tends to reduce road accidents.

Why: Problem statement

Traffic has become difficult to design and manage because of the rapidly growing number of vehicles, and this growth has increased road accidents. Road accidents affect public health and the national economy, and many studies have looked for solutions. The need to extract information from such large datasets is the cornerstone of data mining. In this project, we use machine learning classification techniques for road accident prediction through data mining.

Current practices for preventing accidents have a number of problems in all areas. We will use a few databases that are publicly available from official sector and government websites. The collected data will be analyzed, integrated, and grouped according to different constraints using the best-suited algorithm. This analysis will help identify mistakes and possible causes of road accidents. It will also be helpful when constructing roads and bridges. The resulting predictions will help in planning for and managing such problems.

How: Solution description

Data Collection:

I collected the US border crossing entry dataset for this problem. It is a multivariate dataset with the following attributes: port name, state, port code, border, measure, value, and location. No data cleaning was applied because the dataset has no missing values.
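The absence of missing values can be checked with pandas before the cleaning step is skipped. The following is a minimal sketch, assuming the column names shown in the project notebook. It uses only the five sample rows displayed by data.head() in the project code, not the full Book1.csv file, and the names sample, missing_per_column, and has_missing are illustrative.

```python
import pandas as pd

# Five sample rows copied from the notebook's data.head() output
# (the location column is omitted to keep the sketch short).
sample = pd.DataFrame(
    {
        "portname": ["Calexico East", "Van Buren", "Otay Mesa", "Nogales", "Trout River"],
        "state": ["California", "Maine", "California", "Arizona", "New York"],
        "portcode": [2507, 108, 2506, 2604, 715],
        "border": [
            "US-Mexico Border",
            "US-Canada Border",
            "US-Mexico Border",
            "US-Mexico Border",
            "US-Canada Border",
        ],
        "measure": [
            "Trucks",
            "Rail Containers Full",
            "Trucks",
            "Trains",
            "Personal Vehicle Passengers",
        ],
        "value": [34447, 428, 81217, 62, 16377],
    }
).set_index("portname")

# Count missing entries per column; all zeros means no cleaning is needed
# for missing data.
missing_per_column = sample.isnull().sum()
has_missing = bool(missing_per_column.any())

print(missing_per_column)
print("Any missing values:", has_missing)
```

On the full dataset, the same two calls would be applied to the DataFrame returned by pd.read_csv.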

The Block Diagram has been shown below:

 

Machine Learning Models:

First, I used the k-nearest-neighbors algorithm, often abbreviated kNN. It is a classification approach that assigns a data point to a group according to the groups of the data points nearest to it. The model reached a training accuracy of 0.9018 and a testing accuracy of 0.7894.
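The nearest-neighbor idea can be sketched on a very small example. The block below is an illustration under stated assumptions, not the project's training run: it fits scikit-learn's KNeighborsClassifier on the five sample rows shown in the notebook, uses n_neighbors=1 (the notebook uses 2), and classifies two hypothetical query points that were made up for this sketch.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Five labeled rows copied from the notebook's data.head() output.
train = pd.DataFrame(
    {
        "portcode": [2507, 108, 2506, 2604, 715],
        "value": [34447, 428, 81217, 62, 16377],
        "border": [
            "US-Mexico Border",
            "US-Canada Border",
            "US-Mexico Border",
            "US-Mexico Border",
            "US-Canada Border",
        ],
    }
)

# n_neighbors=1 is an assumption for this tiny sample; the notebook uses 2.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(train[["portcode", "value"]], train["border"])

# Hypothetical query points: the first lies close to the Nogales row,
# the second close to the Van Buren row.
queries = pd.DataFrame({"portcode": [2600, 110], "value": [100, 400]})
predictions = knn.predict(queries)

print(list(predictions))
```

Each query receives the label of its single closest training row. Because the two features are not scaled, the much larger value column tends to dominate the Euclidean distance, which may be one reason the kNN test accuracy is lower than the training accuracy.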

Then, I used the decision tree classifier. Decision trees are a predictive modeling tool with applications in many different areas; they are constructed by an algorithm that repeatedly splits the dataset based on different conditions. The model reached an accuracy of 1.0 on both the training and the testing data.
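How a tree splits the data can be seen by printing its learned rules. This is a minimal sketch, assuming the same five sample rows from the notebook output as training data; export_text is a scikit-learn helper that the project notebook itself does not use, and random_state=0 is added here only to make the run repeatable.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Five labeled rows copied from the notebook's data.head() output.
train = pd.DataFrame(
    {
        "portcode": [2507, 108, 2506, 2604, 715],
        "value": [34447, 428, 81217, 62, 16377],
        "border": [
            "US-Mexico Border",
            "US-Canada Border",
            "US-Mexico Border",
            "US-Mexico Border",
            "US-Canada Border",
        ],
    }
)

features = ["portcode", "value"]
tree = DecisionTreeClassifier(random_state=0)
tree.fit(train[features], train["border"])

# Human-readable view of the conditions the tree splits on.
rules = export_text(tree, feature_names=features)
train_accuracy = tree.score(train[features], train["border"])

print(rules)
print("Training accuracy:", train_accuracy)
```

In this sample the two borders occupy separate port code ranges, so a split on portcode is enough to separate them. If the full dataset behaves the same way, that would be consistent with the accuracy of 1.0 reported above.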

How is it different from competition

Deep learning is often regarded as the best method for transportation and traffic-related predictions. However, existing projects fall short in several respects, such as how they use the available information and the limited depth of the machine learning tools they apply.

Analyzing the raw data manually to predict a solution for traffic congestion is a tough process. Machine learning, by contrast, automatically detects patterns in the available data using data mining algorithms. Decision trees are one of the most widely used approaches for representing the data, because the learned conditions can be read directly and the data can be understood in a clear form. Each tree-building algorithm produces its own decision tree from the input data.

After training both models, I concluded that the decision tree classifier is the better classification algorithm for the border crossing entry dataset.
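A comparison like this is easier to repeat when both models are scored on the same fixed split. The sketch below shows one way to do that; it is an illustration only. The data is synthetic and hypothetical (random port codes and values generated with NumPy, not the project dataset), and test_size, random_state, and stratify are choices made for this sketch that the notebook does not use.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical, NOT the real border crossing data):
# the two borders get port codes from separate, made-up ranges.
rng = np.random.default_rng(0)
n_per_border = 60
synthetic = pd.DataFrame(
    {
        "portcode": np.concatenate(
            [
                rng.integers(100, 1000, n_per_border),
                rng.integers(2300, 2700, n_per_border),
            ]
        ),
        "value": rng.integers(0, 100_000, 2 * n_per_border),
        "border": ["US-Canada Border"] * n_per_border
        + ["US-Mexico Border"] * n_per_border,
    }
)

# One fixed, stratified split shared by both models.
X_train, X_test, y_train, y_test = train_test_split(
    synthetic[["portcode", "value"]],
    synthetic["border"],
    test_size=0.25,
    random_state=0,
    stratify=synthetic["border"],
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=2),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, "testing accuracy:", scores[name])
```

Fixing random_state means that rerunning the comparison gives the same split, so differences in the scores come from the models and not from a different random partition.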

Who are your customers

My customers are traffic controllers and traffic police.

Project Phases and Schedule

Phase 1: Data Collection

Phase 2: Data Cleaning

Phase 3: Data Analysis 

Phase 4: Prediction using machine learning techniques.

Resources Required

Software Used:

1. Anaconda tool

2. Python 3.7

3. Jupyter Notebook (code editor)

Project Code: bordercrossing.ipynb (Python)
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "data=pd.read_csv('Book1.csv',index_col='portname')\n",
    "# changing the column names for convenience\n",
    "data.columns = list(map(str.lower, data.columns.values.tolist()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th>portcode</th>\n",
       "      <th>border</th>\n",
       "      <th>measure</th>\n",
       "      <th>value</th>\n",
       "      <th>location</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>portname</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Calexico East</th>\n",
       "      <td>California</td>\n",
       "      <td>2507</td>\n",
       "      <td>US-Mexico Border</td>\n",
       "      <td>Trucks</td>\n",
       "      <td>34447</td>\n",
       "      <td>POINT (-115.48433000000001 32.67524)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Van Buren</th>\n",
       "      <td>Maine</td>\n",
       "      <td>108</td>\n",
       "      <td>US-Canada Border</td>\n",
       "      <td>Rail Containers Full</td>\n",
       "      <td>428</td>\n",
       "      <td>POINT (-67.94271 47.16207)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Otay Mesa</th>\n",
       "      <td>California</td>\n",
       "      <td>2506</td>\n",
       "      <td>US-Mexico Border</td>\n",
       "      <td>Trucks</td>\n",
       "      <td>81217</td>\n",
       "      <td>POINT (-117.05333 32.57333)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Nogales</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>2604</td>\n",
       "      <td>US-Mexico Border</td>\n",
       "      <td>Trains</td>\n",
       "      <td>62</td>\n",
       "      <td>POINT (-110.93361 31.340279999999996)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Trout River</th>\n",
       "      <td>New York</td>\n",
       "      <td>715</td>\n",
       "      <td>US-Canada Border</td>\n",
       "      <td>Personal Vehicle Passengers</td>\n",
       "      <td>16377</td>\n",
       "      <td>POINT (-73.44253 44.990010000000005)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    state  portcode            border  \\\n",
       "portname                                                \n",
       "Calexico East  California      2507  US-Mexico Border   \n",
       "Van Buren           Maine       108  US-Canada Border   \n",
       "Otay Mesa      California      2506  US-Mexico Border   \n",
       "Nogales           Arizona      2604  US-Mexico Border   \n",
       "Trout River      New York       715  US-Canada Border   \n",
       "\n",
       "                                   measure  value  \\\n",
       "portname                                            \n",
       "Calexico East                       Trucks  34447   \n",
       "Van Buren             Rail Containers Full    428   \n",
       "Otay Mesa                           Trucks  81217   \n",
       "Nogales                             Trains     62   \n",
       "Trout River    Personal Vehicle Passengers  16377   \n",
       "\n",
       "                                            location  \n",
       "portname                                              \n",
       "Calexico East   POINT (-115.48433000000001 32.67524)  \n",
       "Van Buren                 POINT (-67.94271 47.16207)  \n",
       "Otay Mesa                POINT (-117.05333 32.57333)  \n",
       "Nogales        POINT (-110.93361 31.340279999999996)  \n",
       "Trout River     POINT (-73.44253 44.990010000000005)  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1c52412a5f8>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.boxplot(x='portcode', y='value', data=data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1c524a22710>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.boxplot(x= 'portcode', y='border', data = data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x1c524a66470>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.boxplot(x= 'portcode', y= 'state', data = data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<seaborn.axisgrid.FacetGrid at 0x1c524b924a8>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 416x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.FacetGrid(data, hue='border', height=4).\\\n",
    "                   map(plt.scatter, 'portcode',\n",
    "                   'portcode').add_legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<seaborn.axisgrid.FacetGrid at 0x1c524c5acc0>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 416x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.FacetGrid(data, hue='border', height=4).\\\n",
    "                   map(plt.scatter, 'state',\n",
    "                   'state').add_legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<seaborn.axisgrid.FacetGrid at 0x1c524c32048>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 416x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sb.FacetGrid(data, hue='border', height=4).\\\n",
    "                   map(plt.scatter, 'measure',\n",
    "                   'measure').add_legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training Accuracy 0.9017857142857143\n",
      "Testing Accuracy 0.7894736842105263\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "# Now we know which features are most likely to contribute to our model\n",
    "# so lets create a model and see the accuracy\n",
    "X_train, X_test, Y_train, Y_test = train_test_split(\n",
    "        data.loc[:, ['portcode', 'value']], \n",
    "        data.loc[:, 'border'])\n",
    "\n",
    "knn = KNeighborsClassifier(n_neighbors=2, p=2, metric='minkowski')\n",
    "knn.fit(X_train, Y_train)\n",
    "print (\"Training Accuracy {}\".format(knn.score(X_train, Y_train)))\n",
    "print (\"Testing Accuracy {}\".format(knn.score(X_test, Y_test)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training Accuracy 1.0\n",
      "Testing Accuracy 1.0\n"
     ]
    }
   ],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "des=DecisionTreeClassifier()\n",
    "des.fit(X_train,Y_train)\n",
    "des.predict(X_test)\n",
    "print(\"Training Accuracy {}\".format(des.score(X_train,Y_train)))\n",
    "print(\"Testing Accuracy {}\".format(des.score(X_test,Y_test)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}