BIG DATA TECHNOLOGY IN SMART CITIES. A Taxi EU Company Case Study.
What is Big Data? Big data refers to massive data sets with large, varied, and complex structures that are difficult to store, analyze, and visualize for subsequent processes or results.
Big Data Analytics. R programming language: an interactive tool for analyzing data that also serves as an interface to software written in compiled languages such as C, C++, and Fortran. SQL: extracts data from databases, both in traditional data warehouses and in big data technologies. Statistical modeling: supervised and unsupervised problems, including classification and regression. Machine learning: draws on pattern recognition, computer vision, speech recognition, text analytics, statistics, and mathematical optimization; applications include search engines, spam filtering, and Optical Character Recognition (OCR), among others.
Big Data Life Cycle. Business Understanding: this initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition. Data Understanding: initial data collection, getting familiar with the data, identifying data quality problems, discovering first insights into the data, and detecting interesting subsets to form hypotheses about hidden information. Modeling: in this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Deployment: this can be as simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment allocation) or data mining process.
TAXI EU. Taxi.eu is a company that enables users to find and book taxis in Europe. Users can track their car’s approach in real time through GPS and calculate the cost of a ride before booking the taxi. The key features are booking a taxi: according to your individual wishes (Eco, XXL, Business and many more); according to your personal schedule (short arrival times, comfortable reservations); and at a fair fare (estimate of the taxi fare before departure).
Objectives. We know the following about our system requirements: Drivers must be able to report their current location and availability to the service at regular intervals. Passengers should have real-time visibility of all nearby drivers. Customers can order a ride by specifying their pickup location and time. When a customer requests a pickup, nearby drivers should be alerted. Once a ride is accepted, both the driver and the customer must be able to see each other's current location for the duration of the journey. Once the ride is completed, the driver marks it as complete and becomes available for another customer.
Current Taxi EU Infrastructure. Components: Kafka logs, schemaless databases, RDBMS tables, bulk uploader, Celery/Python ETL, OLAP warehouse, EMR, Amazon S3, applications, ad hoc SQL.
What is Apache Kafka? Apache Kafka is an open-source streaming platform that was initially built by LinkedIn. It was open sourced in 2011 and handed over to the Apache Software Foundation. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Proposed Hadoop Infrastructure. Apache Spark on top of HDFS is the proposed big data infrastructure that will power many critical functions within Taxi EU. The solution replaces the Celery/Python ETL system with a new Spark-based system. The new design essentially separates the raw data input from the relational data warehouse table model.
What are Apache Hadoop & Spark? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
TAXI EU BACKEND SYSTEM DESIGN. The design needs changes to fit the Taxi EU system and its requirements. For example, our QuadTree must be modified to accommodate frequent updates. With the Dynamic Grid approach, we face a few issues: Every three seconds, we must update the data structures to reflect the positions reported by active drivers. To move a driver to a new position, we must first locate the right grid based on the driver's previous position. If the new position does not belong to the current grid, the driver is removed from the current grid and reinserted into the correct one. If the new grid reaches its maximum capacity, we must repartition it. We need a fast mechanism for propagating the current position of nearby drivers to customers in the area. Throughout the ride, our system must notify both the driver and the customer of the car's current position. A QuadTree on its own is not well suited to these requirements, because we cannot guarantee that the tree will be updated as quickly as the system demands. For the system to use only fresh data about everyone's current position, the QuadTree would have to be modified on every driver update. Instead, we could keep the most recent driver positions in a hash table and update the QuadTree less frequently. We want to ensure that a driver's current position is reflected in the QuadTree within 15 seconds. In the meantime, a hash table (DriverLocationHT) stores the current position reported by each driver.
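The hash-table-plus-index idea above can be sketched as follows. This is a minimal illustration, not Taxi EU's implementation: the class name `DriverLocationStore`, the fixed-size grid standing in for QuadTree leaves, the cell size, and the 12-second refresh threshold (15 seconds minus one three-second ping) are all assumptions made for the example.

```python
import time
from collections import defaultdict

CELL_DEG = 0.01         # hypothetical grid cell size, in degrees
REINDEX_EVERY_S = 12.0  # assumed: 15 s staleness bound minus one 3 s ping


class DriverLocationStore:
    """Hash table of latest positions plus a coarse index refreshed lazily."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.latest = {}               # driver_id -> (lat, lon), always fresh
        self._indexed_cell = {}        # driver_id -> cell holding the driver
        self._indexed_at = {}          # driver_id -> time of last index refresh
        self.cells = defaultdict(set)  # cell -> driver_ids (stand-in for QuadTree leaves)

    @staticmethod
    def cell_of(lat, lon):
        # Map a coordinate to the grid cell that contains it.
        return (int(lat // CELL_DEG), int(lon // CELL_DEG))

    def report(self, driver_id, lat, lon):
        """Handle one driver ping: O(1) hash update, occasional index refresh."""
        self.latest[driver_id] = (lat, lon)
        now = self._clock()
        last = self._indexed_at.get(driver_id)
        if last is None or now - last >= REINDEX_EVERY_S:
            self._reindex(driver_id, now)

    def _reindex(self, driver_id, now):
        # Move the driver between cells only if the cell actually changed.
        new_cell = self.cell_of(*self.latest[driver_id])
        old_cell = self._indexed_cell.get(driver_id)
        if old_cell != new_cell:
            if old_cell is not None:
                self.cells[old_cell].discard(driver_id)
            self.cells[new_cell].add(driver_id)
            self._indexed_cell[driver_id] = new_cell
        self._indexed_at[driver_id] = now
```

The point of the split is that every ping costs only a dictionary write, while the more expensive spatial index is touched a few times per minute per driver. A real QuadTree would add node splitting when a leaf reaches capacity, which this sketch leaves out.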
Driver location broadcasting. Customers need to be informed of driver locations. We can use a push model in which the server pushes positions to the relevant users, implemented as a Notification Service based on the publisher/subscriber pattern. When a passenger launches the Taxi EU app, it queries the server to find nearby drivers. On the server, we subscribe the customer to all updates from those nearby drivers. Every update to a driver's location in DriverLocationHT is broadcast to all of that driver's subscribers. This ensures that the current position of each driver is always displayed. We estimate one million daily active customers and 500 thousand daily active drivers. Assume five customers subscribe to each driver. The subscription data will be stored in a hash table for fast updates.
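A rough sketch of the publisher/subscriber flow, together with the load implied by the estimates above, might look like this. The class `LocationNotifier`, its method names, and the in-memory `outbox` (which stands in for a real push channel) are hypothetical; the three-second interval is the driver reporting interval given in the backend design.

```python
from collections import defaultdict

# Figures taken from the text.
ACTIVE_DRIVERS = 500_000
SUBSCRIBERS_PER_DRIVER = 5
UPDATE_INTERVAL_S = 3  # drivers report their position every three seconds

# Subscription entries the hash table must hold, and pushes generated per second.
TOTAL_SUBSCRIPTIONS = ACTIVE_DRIVERS * SUBSCRIBERS_PER_DRIVER
PUSHES_PER_SECOND = TOTAL_SUBSCRIPTIONS // UPDATE_INTERVAL_S


class LocationNotifier:
    """In-memory publisher/subscriber sketch for driver positions."""

    def __init__(self):
        self.driver_location_ht = {}         # driver_id -> (lat, lon)
        self.subscribers = defaultdict(set)  # driver_id -> customer_ids
        self.outbox = defaultdict(list)      # customer_id -> queued updates

    def subscribe(self, customer_id, driver_ids):
        # Register the customer for updates from each nearby driver.
        for driver_id in driver_ids:
            self.subscribers[driver_id].add(customer_id)

    def update_driver(self, driver_id, lat, lon):
        """Store the new position and fan it out; returns the number of pushes."""
        self.driver_location_ht[driver_id] = (lat, lon)
        targets = self.subscribers.get(driver_id, ())
        for customer_id in targets:
            self.outbox[customer_id].append((driver_id, lat, lon))
        return len(targets)
```

Under these assumptions the subscription table holds 2.5 million entries and the service fans out roughly 833 thousand pushes per second, which is the load any real push channel would have to sustain.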
Notification Service. We can use HTTP long polling or push notifications to implement the Notification Service efficiently. When customers open the Taxi EU app, they are subscribed to nearby drivers. When new drivers enter the area a customer is viewing, we must dynamically add a new customer/driver subscription. To do so, we would have to track the region each customer is watching, which quickly becomes complicated. Rather than pushing this information, we can design the system so that clients pull it from the server. Customers send their current position to the server, which looks up nearby drivers in our QuadTree. The customer's app can then update the screen to reflect the drivers' current positions.
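The pull model described here can be illustrated with a small lookup function that a polling endpoint might call. This is only a sketch: `nearby_drivers`, the 2 km default radius, and the result limit are assumptions, and a linear scan with the haversine formula stands in for the QuadTree query.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_drivers(driver_locations, lat, lon, radius_km=2.0, limit=10):
    """Return up to `limit` (driver_id, distance_km) pairs, closest first.

    `driver_locations` maps driver_id -> (lat, lon). The linear scan here is
    a placeholder for the QuadTree lookup described in the text.
    """
    hits = []
    for driver_id, (d_lat, d_lon) in driver_locations.items():
        dist = haversine_km(lat, lon, d_lat, d_lon)
        if dist <= radius_km:
            hits.append((driver_id, dist))
    hits.sort(key=lambda pair: pair[1])
    return hits[:limit]
```

Each time the client polls with its position, the server answers with the current result of this query, so no per-customer subscription state has to be kept in sync as drivers move in and out of the area.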
Use case: request a ride. This use case works as follows. The customer requests a ride. One of the Aggregator servers accepts the request and asks the QuadTree servers to return nearby drivers. The Aggregator server collects the results and ranks them. The Aggregator server then notifies the top-ranked drivers simultaneously. The ride is assigned to the first driver who accepts, and the requests sent to the other drivers are canceled. If none of the drivers respond, the Aggregator sends the request to the next drivers on the list. When a driver accepts the request, the customer is notified.
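The steps above can be sketched as a small dispatch loop. Everything named here is hypothetical: `dispatch_ride`, the batch size, ranking by distance and then rating, and the `offer_batch` callback, which stands in for notifying a batch of drivers at once and returning the first one to accept (or `None` if nobody does).

```python
def rank_candidates(candidates):
    """Order (driver_id, distance_km, rating) tuples: closest first, then best rated."""
    return sorted(candidates, key=lambda c: (c[1], -c[2]))


def dispatch_ride(candidates, offer_batch, batch_size=3):
    """Offer the ride to ranked drivers in batches until one accepts.

    Returns (assigned_driver_id, cancelled_driver_ids). If no driver accepts
    in any batch, returns (None, []).
    """
    ranked = rank_candidates(candidates)
    for start in range(0, len(ranked), batch_size):
        batch = [driver_id for driver_id, _, _ in ranked[start:start + batch_size]]
        accepted = offer_batch(batch)  # first driver to accept, or None
        if accepted in batch:
            # The other drivers in this batch have their requests cancelled.
            cancelled = [d for d in batch if d != accepted]
            return accepted, cancelled
    return None, []
```

For example, with `batch_size=2` and four candidates, the two closest drivers are asked first; only if both stay silent does the request move on to the next two.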
Advantages of the Hadoop & Spark big data solution for Taxi EU:
Considerations. Fault tolerance: keep server replicas in case the Notification and driver location servers fail. Ranking: rank search results by popularity, relevance, and proximity.
CONCLUSION. With this big data solution in place, Taxi EU will have the fundamentals it needs to invest in a stronger, longer-term ingestion system built on Spark and Spark Streaming, which will also address a number of other existing challenges. By supporting rapid growth and providing a centralized system for managing the company's data, this is expected to increase Taxi EU's revenue while creating employment opportunities for those involved in its operations.