[Audio] Good morning, everybody. I am Hojat Behrooz, and I would like to thank you for giving me the opportunity to present my literature review on machine learning applications in surface transportation systems. This work was done with tremendous support from my advisor, Prof. Hayeri, and with the inspiration and support of Prof. Lipizzi. The study focuses on one central question: "Why have ML methods not been applied to surface transportation systems as much as they should be, and how could this be improved?"
[Audio] To present this study, I first explain the problem in a clear context. I then turn to similar existing publications, books, and literature reviews on applying ML and AI techniques in transportation systems. Next, I explain the methodology that I applied in this study, followed by the results and findings. Then I concentrate on the outcomes and discuss a root-cause analysis that examines them. Finally, I present a summary of conclusions, followed by opportunities for future work and recommendations.
[Audio] There is a question in front of us: "Which period of its evolution is ML in right now?" Is it in another rising period? Is another winter coming soon? Or are we taking the first steps toward a big jump into the future? To investigate the current position, I looked at the research papers published on arXiv.org. As you know, arXiv.org is an open-access, non-peer-reviewed repository of research and studies in fields such as mathematics, physics, astronomy, electrical engineering, and computer science. Launched in 1991 and now hosted by Cornell University, it has rapidly gained popularity in the academic and research world. arXiv.org passed the half-million-article milestone in 2008 and hit a million by the end of 2014, and the current submission rate is about 16,000 articles per month. Most of its popularity comes from open access and quick submission, which make knowledge circulate fast enough to keep up with the rapid pace of the internet and networked society of the 21st century. ML has its own category on arXiv.org, cs.LG. As you can see, ML articles jumped from 200 to 300 articles in 2014 to more than 3,000 articles in 2020 alone, about 10 times more in 6 years. The share of ML articles has also grown considerably over the last 3 years compared with other computer science branches. This shows how seriously the research world takes ML as a data-driven approach to solving many current problems.
[Audio] This study is based on two verticals: first, ML, and second, surface transportation systems. But you may ask, "Why have we chosen surface transportation?" Although transportation has improved throughout its history, and many mobility modes such as air, water, underground, train, walking, and cycling are more and more developed, surface transportation has a crucial role in human life, with a significant share of the entire transportation sector. In the US, 86% of all cargo and passengers move by vehicle. Additionally, the surface transportation system significantly impacts human life through its secondary harms, such as traffic accidents and air pollution. Traffic accidents are one of the four leading causes of mortality in the US. Surface transportation also plays a significant role in air pollution in many megacities in the world and in the US. Therefore, research on surface transportation can do more to improve human life. Over the last three decades, ICT has also impacted transportation systems, and surface transportation systems specifically, through the multidisciplinary knowledge and technology area named intelligent transportation systems (ITS). Many applications and improvements, such as adaptive traffic signal control, variable message signs, advanced traveler information systems, violation detectors such as speed detectors, traffic control and management centers, and automatic toll collection, are the output of this interdisciplinary knowledge area. It has improved the transportation system by improving travel time, reducing congestion, informing travelers, finding the most cost-effective route, reducing traffic accidents, and improving the efficiency of transportation networks. However, the transportation sector has not benefited enough from AI and ML compared with other industries such as pharmaceuticals, health, agriculture, and aerospace. This is happening even though we know that transportation systems have a significant impact on human life.
Therefore, we decided to use this literature review to answer a crucial question: "Why, specifically, have surface transportation systems not exploited ML as much as other parallel industries?" In this regard, we developed some more specific questions. For instance, focusing on the variety of ML algorithms, a question arises: which ML algorithms are used most for which STS application? The next question is: what type of input data did those applied ML methods use? Do they use external factors or spatial attributes of traffic data? We also planned to explore which properties of specific ML methods make them preferable for STS application areas, and why. The answer to this question helps to find the gaps that slow down the use of ML algorithms in STS application areas. And finally, we want to find solutions that pave the way to employing ML algorithms more widely in STS application areas.
[Audio] Now I would like to describe the existing publications on applying AI/ML in transportation systems. In my search for literature reviews, books, and publications, I found that publications on ML applications in transportation systems are limited. I found 11 recent publications. They cover a wide range of ML methods used in transportation systems and compare them through various measures such as accuracy, applied practices, available input data, generalization of the problem context, and available experience in both verticals. They generally compare the outcomes of one ML application with other ML methods or analytical methods (such as multivariate time series analysis) and show how ML algorithms improve the outcomes in both processing cost and accuracy. For instance, the study by Varghese et al. summarized the relationship between accuracy and the influenced factors of DL prediction models through a meta-analysis of prediction accuracy. It reports that DL models, particularly those considering spatiotemporal dependencies, show better prediction accuracy than conventional ML models. "Data-driven solutions to transportation problems" is a book written by Wang and Zeng in 2019 that is all about various ML methods applied to transportation systems. It includes worked examples with actual code for those applications using sample data. It is a good source for learning how to implement ML algorithms for transportation systems problems. There is also a review of one particular family of ML methods in transportation systems, presented by Jiang and Luo, which concentrates on GNNs. It is a very sophisticated recent source on GNN applications in transportation systems problem solving that categorizes and compares the solutions from various angles, including data input. It also tries to collect and organize the available open-access traffic input data for future research.
Although these reviews and books are excellent sources on applying ML/AI methodology to transportation systems, they do not specifically consider surface transportation systems. They are also not specific to ML and generally cover a wider variety of AI and analytical methods. They do not answer the question of why developers prefer some specific ML methods while others are rarely used despite their potential. The publications only occasionally point to the obstacles in the transportation sector that make the industry less appealing for the broad application of ML methods, and therefore they do not give explicit recommendations for removing those obstacles. And last but not least, they do not systematically present the application of ML to STS in a way that identifies the least considered parts of STS and the least considered ML methods. These gaps and obstacles were the inspiration for this study.
[Audio] Now that we understand the gaps and the inspiration for this study, I present the methodology and approach applied in this research. Based on the context of the review problem, a systematic literature review was used, adapting and customizing a version of the systematic review method of Tsafnat et al. This customized approach, presented in the figure, lets us identify the purpose of the review and the need for revision, and finally gives us a development strategy within a procedural framework. Following the steps of the presented procedure, I developed an initial list of questions to be answered, as explained before. Then I looked for previous similar publications, books, and literature reviews in the context of AI/ML and transportation systems, and I found the 11 literature reviews and publications that I have already described. A strategy was planned that applies bibliometric evaluation using the Web of Science and Google Scholar databases to collect literature in the related field. Based on that strategy, search phrases were created and applied. Phrases such as 'surface transportation' AND 'machine learning', or 'road transportation' AND 'ML', were used. Many alternative phrases were also used, including popular ML methods such as LSTM and terms such as 'traffic management'. More than 400 papers and research items were collected by applying this procedure. Then the duplicate results were removed. For the remaining papers, the abstracts were read one by one, and those not in STS or not specific to ML applications were discarded. The full text of the remaining papers was retrieved and saved for further in-depth examination. After examining the full text of about 150 articles, the unrelated ones were discarded. The next step focused on citations. Applying the snowballing concept, the most frequently cited references related to the goal of our study were added to the reviewed papers if they were not already on our list.
This gave us a valuable source of additional legitimate research papers. After all those additions and deletions, about 100 papers remained. For these 100 papers, the attribute data were extracted and entered into a small database. The dataset consisted of fields such as the publication year, the web source link, the paper title, the abstract of the paper, a short description of the paper's methodology, a summary of the results, a list of the applied ML algorithms, and a list of the transportation areas that the paper considered. This database helped us to synthesize and categorize the documents better based on their attributes. It was the primary material for the entire reviewing process.
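The small database described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the class name `PaperRecord`, the field names, and the two placeholder entries are hypothetical and are not the schema or the data actually used in the study.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperRecord:
    """One reviewed paper. Field names are illustrative, not the study's schema."""
    year: int
    source_url: str
    title: str
    methodology: str
    result_summary: str
    ml_algorithms: List[str] = field(default_factory=list)
    sts_areas: List[str] = field(default_factory=list)


def papers_using(records, algorithm):
    """Return the records that list the given ML algorithm."""
    return [r for r in records if algorithm in r.ml_algorithms]


# Placeholder entries only; these are not real papers from the review.
records = [
    PaperRecord(2019, "https://example.org/a", "Paper A", "...", "...",
                ["LSTM"], ["travel time forecasting"]),
    PaperRecord(2020, "https://example.org/b", "Paper B", "...", "...",
                ["XGBoost", "LSTM"], ["demand forecasting"]),
]

# Titles of all placeholder papers that apply LSTM.
lstm_titles = [r.title for r in papers_using(records, "LSTM")]
```

Keeping the ML algorithms and the STS areas as lists reflects the point that a single paper usually covers more than one method and more than one application.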
[Audio] After selecting the papers, we need a framework for categorizing and grouping them based on their STS application areas and the ML methods they employ. I also took into account that papers usually cover more than one STS application and more than one ML method. To organize, present, and analyze this broad relationship, a graph representation was planned. The graph has three verticals. The first one categorizes the STS components into seven main categories, including mode, demand, supply, externality, traffic management, and finally safety/security. Under these main categories, the STS is subcategorized into 4 levels and 39 minor categories. For instance, passenger is a subcategory of demand in STS, and bus is a subcategory of transit, under motorized, under the mode category of STS. The STS application areas are the research or technologies that perform a service or solve a problem in one or many STS components. STS application areas have five main categories, including AV L0-L5 technology advancements, forecasting, optimization, and ITS technology advancements. For example, arrival time is a subcategory of the forecasting category of STS application areas. On the other side, the ML methods were categorized based on their strategies, input, and output data. They are grouped into 4 main categories: reinforcement learning, classical learning, neural networks and deep learning, and finally ensemble learning methods. For example, XGBoost is a subcategory of boosting, under ensemble learning, under ML methods. And finally, we have research papers covering one or more STS application areas and one or more ML methodologies. This makes a many-to-many multilayer graph representation. The whole graph can answer any question about the relationship between specific ML methods and specific STS application areas. It can also give us statistics on the frequency with which ML methods are applied to one particular STS category.
For instance, we can find the papers that cover both reinforcement learning and the mode category in STS. This is a powerful representation for finding the gaps, strengths, weaknesses, and obstacles in applying ML methods to STS. This analysis reveals the opportunities and threats. For instance, it quickly shows untouched STS application areas such as simulation and new ML methods such as VAE. These are only the first inferences from the graph. This type of representation offers many more opportunities for analysis.
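The kind of query described here, such as finding papers that cover both reinforcement learning and the mode category, can be sketched in a few lines of Python. The sketch below is a simplified stand-in for the multilayer graph, not the tool used in the study: the `PARENT` table holds only a handful of category paths, "Q-learning" is a hypothetical leaf added for illustration, and the three papers are placeholders.

```python
# Child -> parent links for a small, illustrative slice of the taxonomy.
# "Q-learning" is a hypothetical leaf; the real taxonomy is much larger.
PARENT = {
    "Bus": "Transit", "Transit": "Motorized", "Motorized": "Mode", "Mode": "STS",
    "Passenger": "Demand", "Demand": "STS",
    "XGBoost": "Boosting", "Boosting": "Ensemble learning",
    "Ensemble learning": "ML methods",
    "Q-learning": "Reinforcement learning",
    "Reinforcement learning": "ML methods",
}

# Paper -> leaf categories it is tagged with (placeholder papers).
PAPER_TAGS = {
    "paper_1": {"Bus", "Q-learning"},
    "paper_2": {"Passenger", "XGBoost"},
    "paper_3": {"Bus", "XGBoost"},
}


def ancestors(node):
    """Return the node together with every category above it."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def papers_covering(*categories):
    """Return the papers whose tags fall under every requested category."""
    hits = []
    for paper, tags in PAPER_TAGS.items():
        covered = set().union(*(ancestors(t) for t in tags))
        if all(c in covered for c in categories):
            hits.append(paper)
    return sorted(hits)


rl_and_mode = papers_covering("Reinforcement learning", "Mode")
```

Because each paper can be tagged with several leaves on both the STS side and the ML side, the same lookup also supports frequency counts per category pair.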
[Audio] Now let us look at why the STS is defined as a complex interconnected system with several internal interactions. Each component of STS has several dimensions that are interconnected and interact. For instance, a passenger's decision when planning a trip has several dimensions, including mode choice and travel time, which is impacted by traffic management, supply, demand, and implemented policies. Moreover, safety and externalities such as fuel consumption also impact this decision. These are only the internal STS parameters that interact to make mobility happen. This interconnected system is already represented in the graph model. On the other hand, the technology, research, and applications developed in STS are categorized as described before, and you can see a symbolic representation here. These applications, as a result of technological and research advancement, bring some improvement in traffic and mobility, but they also have an impact of their own on the entire surface transportation system. As a simple example, the automobile was invented as a technological advancement to improve the travel time of the surface transportation system. However, it did not simply deliver that improvement; it interacted with other dimensions of the surface transportation system and changed some of their properties. The use of automobiles causes traffic accidents, traffic congestion, and air pollution, which change other parameters of the surface transportation system. Therefore, the STS not only interacts internally; it is impacted by technological applications as well. But that is not the end of the story. Although the entire STS and the applied technologies already make a complex system, those components are not the only actors. Any part of the STS is also influenced by environmental factors. As an example, a simple decision influence diagram for a passenger's private vehicle mode choice is presented here.
As it shows, this decision depends not only on internal STS components and applied technologies but also on environmental factors such as employment status, car ownership, workplace, population, and so on. Other factors, such as the time per commute trip, which depends on the network topology, are also actively influencing factors. There are also environmental factors such as weather conditions or the day of the week that impact that decision. In sum, the STS is a multidimensional complex system that interacts internally and with outside factors as well. Many STS problem-solving frameworks ignore those environmental factors and focus only on traffic measures such as speed, flow, and capacity. Although those measures are the main contributors to the traffic system, they are secondary measures that are affected by the primary factors. Often, accessing or measuring those primary factors is impossible, or not accurate or consistent enough.
[Audio] Now we have a good sense of why the STS is complex and why its measurements are generally treated as stochastic variables. Most attempts at transportation system analysis and examination follow Wardrop's first and second principles as the best available approach. Those principles fundamentally focus on speed, capacity, flow, and equilibrium. However, the principles can hardly describe how, and to what degree, environmental factors impact traffic measures. The primary factors that influence the transportation system can be categorized into two main groups. First, the spatial transportation behavior presents the graph-based network context. This group of properties represents the network format, connectivity, and routing options. Second, external factors include a set of features such as weather conditions, calendar times, accidents, and socioeconomic status. The transportation system's spatial behavior and external factors make up a spatial-external matrix, as I present in the figure. To evaluate the ML methods that the papers applied to STS application areas, I used this matrix to categorize them into four groups based on their use of spatial and external factors. The graph representations of the STS decomposition, the STS application areas, and the ML categorization, together with the spatial-external matrix and the reviewed papers, make a multidimensional multilayer framework in the form of a hypergraph for this review and analysis. In the following slides, the result of framing the papers this way is presented.
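As a rough illustration of how the spatial-external matrix sorts papers into four groups, here is a minimal Python sketch. The group labels and the example flags are assumptions made for illustration; they are not the labels or counts from the study.

```python
from collections import Counter


def matrix_group(uses_spatial: bool, uses_external: bool) -> str:
    """Map a paper's input data to one cell of the spatial-external matrix."""
    if uses_spatial and uses_external:
        return "spatial and external"
    if uses_spatial:
        return "spatial only"
    if uses_external:
        return "external only"
    return "neither"


# Placeholder flags per paper: (uses network behavior, uses external factors).
paper_inputs = {
    "paper_1": (True, True),
    "paper_2": (False, False),
    "paper_3": (False, True),
    "paper_4": (False, False),
}

# Number of placeholder papers in each cell of the matrix.
group_counts = Counter(matrix_group(s, e) for s, e in paper_inputs.values())
```

Counting the cells per STS application area is one simple way to see where spatial and external inputs are mostly ignored.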
[Audio] With the methodology just explained as the framework for this review, the 100 papers that came out of the filtering steps were read. Their main attributes were extracted and entered into the database, and the graph representation of the STS components, application areas, and ML methods was created. The results of the process are presented here. First, I present the number of papers applying specific ML algorithms. As you can see, the most applied ML algorithm is MLP, followed by LSTM and random forest. After those methods, supervised and unsupervised classical learners are the most applied algorithms in STS application areas (i.e., random forest, SVM, KNN, fuzzy, and linear regression). It seems that those algorithms receive more attention because their hyperparameters are simple to define and they can be used for both regression and clustering. Furthermore, their modeling setup is minimal, and they do not need a graph representation of the input data. Those algorithms are powerful in examining input data, sketching its internal relationships, and predicting outputs based on the recognized patterns. These capabilities make the algorithms easy to use and powerful for predicting transportation parameters such as speed, travel time, demand, mode choice, collisions, and parking demand, and for operation cost optimization. Those factors are geographically local and quantitative and follow the demand/supply equilibrium concept. That makes the unsupervised ML model ideal for predicting them. XGBoost, as an ensemble learning method, is also popular for forecasting and clustering in general. Researchers generally report that XGBoost has better accuracy and performance than similar practices. Of course, XGBoost has also proven its ability in other application fields, and it shows the same capability in forecasting and clustering for traffic and transportation systems problem solving.
[Audio] From the STS point of view, the reviewed papers primarily focus on forecasting and prediction, with a 74% share, followed by optimization with an 11% share. As presented in the figure, service advancements, ITS technology advancements, and autonomous vehicle level 0 to 5 (AV L0-L5) technology advancements together have a 15% share. Looking one step further into the STS application areas shows that the research world is concentrated on the prediction and forecasting sectors, as presented here. As you can see at the bottom of the table, some areas are barely touched by ML methods, such as AV L0-L5 technology advancements, which focus on applications planned for use in future vehicles. ITS technology advancements have also received less attention, and applications such as simulation, with all their potential and needs, are completely disregarded in surface transportation ML applications. From the spatial-external matrix view, each STS application area is examined specifically to see how many papers used the network behavior or the external factors of the STS problem as input. As you can see, in forecasting, the application area on which ML applications focus most, some papers use those factors, but most ignore them. As I examined the inputs, I found that among the papers that use network behavior, some only use a conceptual representation of the road network, such as the distance between the ODs. Or, as an external factor, only the morning and evening times of day are used as calendar time input data. However, those are also counted as applied factors. The figures show that the spatial-external factors are mainly ignored in forecasting. In optimization, the applied ML algorithm needs external data such as the cost or the distance for optimization purposes, so these papers use one or two types of spatial-external data as input. The papers in this part focus on different aspects of STS compared with the other application areas.
For instance, one paper concentrates on pavement condition index optimization using falling weight deflectometer data. It is significantly different from the widespread applications and shows the power of ML application in other parts of STS. The service advancements also depend on external and spatial data input. They mainly focus on providing services such as toll collection and parking services, which depend on external parameters and possibly geographical attributes. The situation is the same for ITS technology advancements, as they also depend on external factors. AV L0-L5 technology advancements are an exception in this categorization. Although there are few papers in this category, they all use spatial and external data as input. It is clear that autonomous vehicle services and application advancements are all about the future of transportation, and they depend heavily on the environmental and network behavior of transportation systems. Many of them use weather conditions, and even a live picture of their operating environment, by collecting data through sensors such as LIDAR, cameras, and positioning systems. So they depend heavily on those primary factors of the STS dynamics.
[Audio] Now we take a look back at the list of ML algorithms applied to the STS application areas. The first table ranks the most popular ML algorithms against the most popular STS application areas according to the reviewed papers. A limited set of ML algorithms is generally used for a narrow set of STS applications. It shows that LSTM (the most prevalent RNN algorithm) and popular classical supervised algorithms (e.g., SVM, random forest, KNN, K-means, SVR, and fuzzy) are the most preferred ML algorithms for prediction goals in surface transportation. This popularity could be due to the primary nature of ML algorithms, which are extensively applied for forecasting purposes. Additionally, data scientists are very familiar with implementing a digital model of a real-world problem to estimate some system behavior, which is a typical ML problem-solving method. The following table ranks the ML algorithms least applied to the STS application areas. Of course, several ML algorithms and STS application areas have no implementation at all, so they do not come into the picture. On each side, there is a long list of untouched items.
[Audio] To interpret the outcome of our study and find out why some ML methods are applied more to the STS application areas while others are ignored or hardly used, we need a framework. Three main pillars shape the selection decision: the available experience in that specific area, the problem context, and the available data within that particular area. These three pillars are explored to determine why and how ML algorithms are selected for STS application areas. Several internal and external factors impact transportation systems. Those impacts have only been examined by measuring the influenced factors, such as changes in mode choice, demand, traffic volume, speed, and flow. Although a rainy day impacts passengers' mode choice, demand prediction models rarely consider weather conditions as input and concentrate on the influenced factors only. Traffic measures such as speed, flow, and demand are time-series data. Therefore, it is practical that RNN and LSTM are used primarily for forecasting objectives in STS. Moreover, there is extensive experience with RNN algorithms in other time-series prediction areas (first pillar). Traffic-related data with a time-series context are also appropriate for RNN algorithms (second pillar). The third pillar is the accessible data resources in the STS context. There are few publicly available data sources in STS, such as the Performance Measurement System (PeMS). That dataset contains speed and volume data in 5-minute intervals from 18,000 vehicle detectors on California highways between 2001 and 2019. Additionally, other trajectory data, such as taxi and ride-hailing data, are also rarely available.
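To make the time-series point concrete, here is a minimal Python sketch of how a sequence of 5-minute detector readings is typically cut into input windows and targets before being given to a sequence model such as an LSTM. The function name `make_windows`, its parameters, and the speed values are hypothetical; the values are made up and are not PeMS data.

```python
def make_windows(series, n_lags, horizon=1):
    """Split a series into (past n_lags readings, reading `horizon` steps ahead)."""
    samples = []
    for start in range(len(series) - n_lags - horizon + 1):
        past = series[start:start + n_lags]
        target = series[start + n_lags + horizon - 1]
        samples.append((past, target))
    return samples


# Made-up speeds at consecutive 5-minute intervals (not real detector data).
speeds = [62.0, 61.5, 58.0, 40.2, 35.1, 37.8, 50.3]

# Use the last three readings (15 minutes) to predict the next 5-minute reading.
windows = make_windows(speeds, n_lags=3)
```

A window built this way contains only past traffic measures, which illustrates the gap discussed here: weather, calendar, or network attributes have to be added as extra inputs deliberately, and that requires data sources that provide them.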
[Audio] Now I present what we learned. The findings indicate that 80% of the research papers focus on four surface transportation application areas out of 25: demand/mode forecasting, speed/capacity/flow/travel time forecasting, crash/accident/incident/congestion forecasting, and operation cost optimization. The findings also reveal that LSTM and popular classical supervised methods (i.e., SVM, random forest, KNN, K-means, SVR, and fuzzy) are the preferred algorithms for surface transportation problem solving. Moreover, MLP algorithms are used extensively in various surface transportation applications because of their nonlinearity and their long history of applied experience. However, 17 out of 25 well-known surface transportation application areas are barely touched by any ML algorithm in research papers (zero or one article). And finally, we found that the network behavior of STS problems (geographical attributes) and external factors such as weather conditions are used, and only shallowly, in just 27% of the applied ML algorithms. Why does this happen? Examining the root causes of those facts through the selection/decision criteria framework shows that there are a number of obstacles and facts that limit and slow down the employment of ML in STS application areas. They include the following. The research world poorly considers spatial behavior and the external factors impacting traffic and transportation systems as input data when applying ML models. This problem is mainly caused by a lack of reliable, open, integrated, and accessible transportation systems databases. The transportation sector does not clearly state its problems. The employment of ML methods in STS application areas is mostly initiated by ML experts, and they prefer to use the most adaptable available data sources along with popular problem contexts such as forecasting and optimization. Additionally, they apply the most practiced ML algorithms to the problems of STS application areas.
With those limitations and facts, it is an unsurprising result that the most sophisticated, recently developed ML algorithms, such as VAE, GAN, and GNN, are not generally used in STS application areas.
[Audio] There is a set of recommendations, based on the ML strengths and available experience pointed out, the STS opportunities and needs, and the obstacles and limitations, for the STS sector to exploit ML capabilities as much as other sectors do. There should be a collaboration among transportation authorities at the federal, state, and city levels on the one hand, the research and academic world such as universities on the other hand, and finally the private sector, acting as a technology provider, to carry out the following. First, implement an open-access sharing platform that presents accurate, integrated, valid, high-frequency traffic and environmental data. This platform must be updated and maintained correctly so that the ML research world can trust it. Second, they should clearly define their problems and priorities on this platform for the ML research world to consider. Third, there should be some incentives for ML experts to apply more modern and sophisticated models to transportation problems.
[Audio] Thank you very much for your attention. I am ready to answer any questions you may have.