Predicting Incident Duration Based on Machine Learning Methods

Traffic incidents not only cause varying levels of traffic congestion but also contribute to secondary accidents, resulting in injuries and deaths, increased travel times and delays, excessive energy consumption, and air pollution. Accurately estimating incident duration is therefore essential to mitigating these effects. Traffic management center incident logs and traffic sensor data from Eastbound Interstate 70 (I-70) in Missouri, United States, collected from January 2015 to January 2017, with a total of 352 incident records, were used to develop incident duration estimation models. This paper investigated different machine learning (ML) methods for predicting traffic incident duration. The ML techniques examined include Support Vector Machine (SVM), Random Forest (RF), and Neural Network Multi-Layer Perceptron (MLP). Root mean squared error (RMSE) and mean absolute error (MAE) were used to evaluate the performance of these models. The results showed that the models performed comparably: the SVM models slightly outperformed the RF and MLP models in terms of MAE, with an MAE of 14.23 min for the best-performing SVM model, whereas in terms of RMSE the RF models slightly outperformed the other two, with an RMSE of 18.91 min for the best-performing RF model.


INTRODUCTION
Traffic congestion arises when the traffic demand on a highway surpasses its usable capacity. Congestion takes two forms: recurrent and non-recurrent. Recurrent congestion, which occurs during peak hours, is associated with the physical configuration of the highway: it is mainly caused by traffic volumes that approach or exceed roadway capacity. Non-recurrent congestion, by contrast, is caused by unplanned events on the highway such as incidents, stranded vehicles, public demonstrations, weather, and work zones. Highway work zones for patching, paving, lane marking, rubble removal, and weeding involve a temporary reduction of highway capacity, and the congestion they cause can account for an extremely high portion of total traffic congestion [1], [2]. Although non-recurrent congestion is difficult to predict as a result of its random nature, research on the impact and duration of traffic incidents remains a main focus for traffic operators due to the serious social and economic losses generated [3]. Different studies have therefore been initiated to establish mitigation strategies that minimize non-recurrent congestion due to freeway incidents, since traffic incidents are the main cause of non-recurrent congestion, producing travel delay and potentially leading to secondary crashes. Accurate estimation of traffic incident duration thus plays an essential role in mitigating these effects [4], [5]. To support a timely response, traffic management centers build workflows consisting of data collection, analysis, and implementation of the chosen strategy, constantly using updated statistics to monitor traffic, publish information, and direct incident response resources [6]. The purpose of this study is to investigate various machine learning methods to predict incident duration.

DOI: https://doi.org/10.33103/uot.ijccce.21.1.1
The remainder of this paper is structured as follows: Section II reviews previous research on incident duration prediction. Section III provides an overview of incident duration and its phases. Section IV presents the methodology of the proposed models with the three machine learning methods, namely Support Vector Machine (SVM), Random Forest (RF), and Neural Network Multi-Layer Perceptron (MLP), used to develop the incident duration prediction models. Study area details are given in Section V. The results obtained from the three models are discussed in Section VI. Finally, the conclusion is provided in Section VII.

III. INCIDENT DURATION
Duration is one of the characteristics of incidents that determines the magnitude of congestion and has therefore been extensively researched [17]. Incident duration is generally defined as the time between incident occurrence and clearance of the roadway, that is, the time elapsed from the occurrence of the incident until all evidence of the incident has been removed from the scene. This time can be divided into three phases [3]. An additional phase can be added to the total incident duration: the recovery time (the time between clearance of the incident and return to normal conditions). Compared with the other phases, incident clearance is the most time-consuming stage in the overall incident management process; a severe incident that is not effectively cleared can double or even triple the total duration of the incident [13]. Clearing an incident quickly requires appropriate managerial support in order to make sound decisions about the resources needed to deploy the response team and clear the incident scene in a timely manner. Operators can often accomplish this through, among other things, a good understanding of the factors influencing incident duration and the efficient use of predicted incident duration information. Furthermore, any forecast of incident duration is valuable for preventing incident-induced congestion, as that information warns motorists of the need to re-route or re-schedule their trips. Incident duration estimation therefore constitutes one of the most essential steps in the incident management process [4].

IV. METHODOLOGY
Three machine learning (ML) methods were used in this study: Support Vector Machine (SVM), Random Forest (RF), and Neural Network Multi-Layer Perceptron (MLP). Two measurement indexes, Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), were used to assess the performance of the developed models. Three models were developed from each of these methods through hyperparameter tuning and by varying the data split ratio. The dataset was split 70%/30% and 80%/20% for the training and validation processes, where 70% or 80% of the records formed the training dataset and the remaining 30% or 20% formed the testing dataset. The training dataset was used to calibrate the chosen model, while the testing dataset was used to evaluate model performance on data independent of those used for training: the testing dataset is unseen data, available only during the evaluation of the developed model, not during the training process. A brief description of each of the three methods follows.
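As a concrete illustration of the evaluation setup described above, the two error indexes and a 70/30 split can be sketched as follows (a minimal NumPy sketch; the function and variable names are illustrative, not from the original study):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors (minutes)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: like MAE, but penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def split_indices(n, train_ratio=0.7, seed=42):
    """Shuffle n sample indices and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(train_ratio * n)
    return idx[:cut], idx[cut:]
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE on the same predictions, which is why the two indexes can rank models differently, as seen in the results below.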

A. Support Vector Machine
In ML, support vector machines (SVMs, also called support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. SVM is one of the most popular models in machine learning in the context of statistical learning theory, being based on the Structural Risk Minimization (SRM) principle developed by Vapnik [18], [19].
SVM is a very robust and flexible machine learning model that can perform linear or nonlinear classification, regression, and outlier detection. SVMs are particularly suitable for small- or medium-sized datasets. An SVM model is a representation of the examples as points in space, mapped so that the examples of the individual categories are separated by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall. An SVM estimator f for regression can be expressed as:

f(x) = W · ∅(x) + b    (1)

where ∅ denotes a nonlinear transfer function that maps the input vectors to a high-dimensional feature space in which the sample data are linearly separable, W is the weight vector, and b is the offset [18], [20].
The SVR models were trained on the independent variables. The kernel parameter with type rbf and the kernel with type linear were selected for the development of the three SVR models. Fig. 2 illustrates the SVR prediction model diagram. The SVR procedure for predicting incident duration is as follows:
Step 1: Determine the input and output of the SVR prediction model, where the independent variables are the input and the target variable, the incident duration, is the output.
Step 2: Construct the SVR prediction model. First, determine the training sample set T = {(xi, yi), i = 1, 2, …} according to Step 1, where xi denotes the ith sample of the independent variables and yi denotes the ith sample of the dependent variable. Then select the parameters of the SVR algorithm to build and train the model, and finally validate the model using the test sample set.
Step 3: Predict incident duration with the prediction model generated in Step 2.
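The three steps above can be sketched with scikit-learn's SVR using the settings reported later for SVM-Model-1 (RBF kernel, C = 1.0, 200 MB kernel cache, 70/30 split). The feature matrix X and duration vector y below are random placeholders standing in for the incident records, not the paper's actual data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 352 incident records
# (X: predictor variables, y: incident duration in minutes).
rng = np.random.default_rng(0)
X = rng.normal(size=(352, 6))
y = rng.uniform(5, 120, size=352)

# Steps 1-2: identify inputs/output, split 70%/30%, and train the SVR model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = SVR(kernel="rbf", C=1.0, cache_size=200)
model.fit(X_train, y_train)

# Step 3: predict incident durations for the held-out test records.
y_pred = model.predict(X_test)
```

Swapping `kernel="rbf"` for `kernel="linear"` reproduces the hyperparameter change that distinguishes SVM-Model-2 from SVM-Model-1.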

B. Random Forest
Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class (in the case of classification) or the mean prediction (in the case of regression) of the individual trees. The first algorithm for random decision forests was produced by Tin Kam Ho using the random subspace method.
Random forest, as the name implies, is a combination of decision trees, where each tree is trained on a randomly selected subset of the available training data; it combines the randomness used to draw the data subsets with an ensemble of decision trees, hence a forest. The RF regression prediction can be expressed as:

f̂(x) = (1/B) Σ (b = 1 to B) f_b(x)    (2)

where f_b(x) is the value predicted by the b-th tree of the forest, x is the vector of predictor variables, B is the number of trees in the forest, and f̂(x) is the average of the values predicted by the trees; this average is the final value predicted by the forest [21], [22]. Fig. 3 illustrates how the RF algorithm works. Its construction can be presented as follows:
Step 1: Create a bootstrapped subset:
- Randomly select "v" variables from the training dataset "t", where v << t.
Step 2: Create a decision tree:
- Among the "v" variables, create the root node "n" using the best split and the bootstrapped subset created in the previous step.
- Branch the node into daughter nodes using the best split.
- Repeat the previous steps until the desired number of nodes has been reached.
Step 3: Build the forest:
- Repeat Steps 1 and 2 the required number of times to create the desired number of trees "B".
After the RF model has been created, prediction is performed by taking the average of the decision trees' outcomes, where each tree predicts a different incident duration for the same input, and that average is the final prediction from the RF model, as shown in equation (2).
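A minimal sketch of the forest construction and averaging described above, using scikit-learn's RandomForestRegressor with the hyperparameters reported later for RF-Model-1 (100 trees, maximum depth 2, minimum of 1 sample per leaf); the data are placeholders, not the study's incident records:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(352, 6))      # placeholder predictor variables
y = rng.uniform(5, 120, size=352)  # placeholder incident durations (min)

# Steps 1-3: each tree is grown on a bootstrap sample of the training data;
# the forest prediction is the average of the individual tree predictions.
forest = RandomForestRegressor(n_estimators=100, max_depth=2,
                               min_samples_leaf=1, random_state=1)
forest.fit(X, y)

# Verify equation (2): the forest prediction equals the mean of the
# per-tree predictions for the same input.
x_new = X[:1]
per_tree = np.array([t.predict(x_new)[0] for t in forest.estimators_])
assert np.isclose(per_tree.mean(), forest.predict(x_new)[0])
```

The final assertion makes the averaging in equation (2) explicit: each fitted tree votes with its own duration estimate, and the ensemble returns their mean.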

C. Neural Network Multi-Layer Perceptron
A multilayer perceptron (MLP) is a deep artificial neural network consisting of more than one perceptron. It comprises an input layer that receives the signal, an output layer that makes a decision or prediction about the input, and, in between these two, an arbitrary number of hidden layers that form the true computational engine of the MLP. MLPs are able to approximate any continuous function with one hidden layer.
MLPs are applied to supervised learning problems: they train on a set of input-output pairs and learn to model the correlation (or dependencies) between those inputs and outputs. The training process involves adjusting the parameters, or weights and biases, of the model to reduce error. Backpropagation is used to make these weight and bias adjustments relative to the error, and the error itself may be calculated in a number of ways, including by root mean squared error (RMSE) [23].
Two hidden layers were used to construct each MLP model, along with the input and output layers. The number of neurons in each hidden layer was either 100 or 200. The rectified linear unit (ReLU) was used as the activation function. As illustrated in Fig. 4, the learning process of the perceptrons in the MLP model proceeds in the following steps:
Step 1: Take the traffic and incident dataset inputs in the input layer that are fed into the perceptron, multiply these inputs by their weights, and calculate the sum.
Step 2: Add the bias weight to the sum.
Step 3: Feed the sum through the activation function (here, the ReLU function).
Step 4: The result of the ReLU function is the output.
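The MLP configuration described above can be sketched with scikit-learn's MLPRegressor, using the settings reported later for MLP-Model-1 (two hidden layers of 100 neurons, ReLU activation, constant learning rate, at most 200 iterations). The data are placeholders, and with so few random samples the fit is purely illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(352, 6))      # placeholder predictor variables
y = rng.uniform(5, 120, size=352)  # placeholder incident durations (min)

# Two hidden layers of 100 neurons each, ReLU activation, constant learning
# rate, and up to 200 training iterations of backpropagation.
mlp = MLPRegressor(hidden_layer_sizes=(100, 100), activation="relu",
                   learning_rate="constant", max_iter=200, random_state=1)
mlp.fit(X, y)
pred = mlp.predict(X[:5])
```

Changing `hidden_layer_sizes` to `(200, 200)` reproduces the modification that distinguishes MLP-Model-2 from MLP-Model-1.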
This paper proposed a framework to develop models that can predict incident duration to help transportation agencies and first responders trigger the required preventative actions. Datasets were split into two parts: the first part was used to train the model with the best selected hyperparameters, and the second, unseen part was used to evaluate the developed model. This section presents the steps used in the model-building process.
First, data input, which means loading the data. Then, variable identification: this step identifies the (X, Y) variables, where the Xs are the predictor variables (also called the independent variables) and Y is the target (also called the response or dependent variable). Then, data split: the division of the data by a specific ratio, for example 80%/20% or 70%/30%. Next, method selection: choosing one of the machine learning methods used in this work (SVR, RF, or MLP). Then, parameter initialization: determining the method's parameters, where each method has its own parameters that differ from the others. Then, model training: following the way supervised machine learning methods work (training a model on specific input and output data to perform a particular task), this step trains the model using the training dataset produced in the splitting phase. Next, model testing (evaluation), also called model validation: this phase tests the models; the MSE, RMSE, and MAE error indexes were used to evaluate model performance.
Finally, the prediction step, which gives the predicted incident duration results. Model optimization may also follow; this phase involves optimizing the model either by changing the data split ratio or through hyperparameter tuning.
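The workflow above (split, train, evaluate, optimize) can be sketched end to end; the grid of kernels mirrors the rbf/linear variants explored in the SVM models, while the data remain random placeholders rather than the study's records:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Data input and variable identification (placeholder incident data).
rng = np.random.default_rng(0)
X = rng.normal(size=(352, 6))
y = rng.uniform(5, 120, size=352)

# Data split (70% / 30%).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Method selection, parameter initialization, and model optimization:
# tune the kernel hyperparameter by cross-validated grid search.
search = GridSearchCV(SVR(C=1.0, cache_size=200),
                      param_grid={"kernel": ["rbf", "linear"]}, cv=3)
search.fit(X_tr, y_tr)

# Model testing (evaluation) on the unseen 30% split, then prediction.
y_pred = search.predict(X_te)
mae = mean_absolute_error(y_te, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_te, y_pred)))
```

Grid search over the split ratio itself (70/30 versus 80/20) would be done outside this loop, by rerunning the pipeline with a different `test_size`.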

V. STUDY AREA
Traffic management center incident logs and traffic sensor data collected over a 7.3-mile segment of Eastbound Interstate 70 (I-70) in Missouri, United States, from January 2015 to January 2017 were used to develop the incident duration estimation models. In total, 352 incidents were reported during this period. The average incident duration was 27.6 minutes.
The incident logs included the incident type, the location and time of the incident, and the weather conditions when the incident was reported. The I-70 segment was partitioned into 7 sections. A remote traffic microwave sensor was installed upstream of each section to collect the traffic flow rate, speed, and occupancy in each lane. Traffic data were collected over 5-min intervals and were added to the incident logs based on the location of each incident and the time when it was reported, using the traffic sensor upstream of the incident location. Table 1 lists all the data variables used to build the incident duration prediction models, with a description of each variable. The target variable is the incident duration (also called the response variable); the remaining variables are the predictor variables.

Peak Hours

Indicates whether the accident occurred during peak hours (Morning Peak: 6:00 AM to 9:00 AM; Afternoon Peak: 3:00 PM to 6:00 PM) or non-peak hours. Values: AM Peak, PM Peak, and Off-Peak.

Day of Week

The day of the week on which the accident occurred.

Traffic Pattern

Indicates whether the accident occurred on a weekday or a weekend. Values: Weekday, Weekend.

Total Vehicle Count

The number of vehicles involved in the accident.

Weather

The weather condition at the time of the accident. Values: Normal, Winter Storm, and Rain.

Accident Overturned Car

Indicates whether the accident involved an overturned car. Values: Yes, No.
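Since several of the Table 1 variables are categorical (peak period, day of week, traffic pattern, weather, overturned car), they would typically be one-hot encoded before model training. A hedged sketch with pandas follows; the column names mirror the Table 1 variables and the sample values are invented for illustration:

```python
import pandas as pd

# Toy incident records using the Table 1 variables; values are illustrative only.
df = pd.DataFrame({
    "peak_hours": ["AM Peak", "Off-Peak", "PM Peak"],
    "day_of_week": ["Monday", "Saturday", "Friday"],
    "traffic_pattern": ["Weekday", "Weekend", "Weekday"],
    "weather": ["Normal", "Rain", "Winter Storm"],
    "total_vehicle_count": [1, 2, 1],
    "duration_min": [15.0, 42.0, 27.6],
})

# One-hot encode the categorical predictors; incident duration is the target.
X = pd.get_dummies(df.drop(columns="duration_min"))
y = df["duration_min"]
```

After encoding, X contains one indicator column per category level plus the numeric vehicle count, which is the form the SVR, RF, and MLP models all accept.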

VI. RESULTS
Incident duration prediction models were developed using the SVM, RF, and MLP methods. Tables 2, 3, and 4 present the RMSE and MAE indexes used to evaluate each model on the training, testing, and prediction datasets. A brief description of each model follows.

A. Support Vector Machines Models
Three models were built using the SVM method: SVM-Model-1, SVM-Model-2, and SVM-Model-3. In SVM-Model-1, the dataset was split into two parts, 70% for training and 30% for validation, and the kernel parameter with type rbf was selected. The regularization parameter C was 1.0, and the size of the kernel cache was 200 MB.
SVM-Model-2 was also built with a 70%/30% split for the training and validation datasets; this model was modified by tuning one of the hyperparameters, using a kernel of type linear. The regularization parameter C was 1.0, and the kernel cache size was 200 MB.
In SVM-Model-3, the dataset was split 80%/20% for training and testing, respectively, with a kernel of type rbf. The kernel cache size was again 200 MB, and the regularization parameter C was 1.0.
The analysis of these three models showed that the best-performing SVM model was SVM-Model-2, with an RMSE of 23.42 min and an MAE of 14.23 min (for the prediction category). The RMSE and MAE indexes decreased by 6.87 min and 0.52 min in testing compared with training. The worst SVM model, though not far behind, was SVM-Model-1, whose prediction RMSE and MAE indexes were 23.65 min and 15.06 min, respectively.

B. Random Forest Models
Three RF models were constructed using the RF method: RF-Model-1, RF-Model-2, and RF-Model-3.
In RF-Model-1, the forest was constructed with 100 trees, and the dataset was split into 70% for training and 30% for validation. The maximum tree depth was 2, and the minimum number of samples required at a leaf node was 1. The prediction RMSE and MAE indexes were 19.12 min and 14.61 min, respectively.
In RF-Model-2, the model included 50 trees, with the same split ratio as the previous model. The maximum tree depth was 2, the minimum number of samples required at a leaf node was 1, and the minimum number of samples required to split an internal node was 2. The prediction RMSE index was 19.13 min, and the prediction MAE index was 14.67 min. The results showed that all three RF models were comparable, with slight differences; RF-Model-3 outperformed the other two. Its RMSE and MAE indexes increased by 2.17 min and 2.2 min in testing compared with training.

C. Neural Network Multi-Layer Perceptron Models
Three MLP models were investigated: MLP-Model-1, MLP-Model-2, and MLP-Model-3, each composed of two hidden layers. The ReLU activation function was used in all three models. In MLP-Model-1, each hidden layer had 100 neurons, and the dataset was split into 70% for training and 30% for validation. A constant learning rate was used, and the maximum number of iterations was 200. In MLP-Model-2, with the same split ratio as the previous model, the model was modified to use 200 neurons in each hidden layer, again with a constant learning rate and a maximum of 200 iterations.
MLP-Model-3 was built with an 80%/20% split for the training and validation datasets. The hidden layers included 100 neurons each. A constant learning rate was used, and the maximum number of iterations was 200.
The results of these three MLP models indicated that the best-performing MLP model achieved an RMSE of 22.64 min and an MAE of 15.42 min for the prediction dataset.
Overall, the SVM, RF, and MLP models have almost the same error range in terms of RMSE and MAE for the training, testing, and prediction datasets. Comparing the best-performing models of the three techniques shows that the SVM model slightly outperformed the other two in terms of MAE for the prediction dataset: the best-performing SVM model scored an MAE of 14.23 min, compared with 14.58 min and 15.42 min for the best-performing RF and MLP models, respectively. On the other hand, in terms of RMSE for the prediction dataset, the RF model slightly outperformed the SVM and MLP models, with an RMSE of 18.91 min for the best-performing RF model, compared with 23.42 min and 22.64 min for the best-performing SVM and MLP models, respectively.

VII. CONCLUSION
Traffic incidents are one of the primary causes of non-recurrent congestion, which in turn can lead to secondary accidents. Accurately predicting the duration of an incident plays an important role in reducing the impact of non-recurrent congestion, namely road capacity reduction and massive travel time loss. Traffic incident logs and traffic sensor data from Eastbound Interstate 70 (I-70) in Missouri, United States, were collected over two years, with a total of 352 incidents. This study investigated different machine learning algorithms to predict incident duration: Support Vector Machine, Random Forest, and Neural Network Multi-Layer Perceptron models were developed to estimate the duration. Root mean squared error and mean absolute error were used to assess the models' performance. It was found that the SVM model slightly outperformed the other two models in terms of MAE, with the best-performing SVM model scoring an MAE of 14.23 min, while in terms of RMSE for the prediction dataset the RF model slightly outperformed the SVM and MLP models, achieving an RMSE of 18.91 min for the best-performing RF model.