[Audio] BIG DATA TECHNOLOGY IN SMART CITIES. A Taxi EU Company Case Study.
[Audio] What is Big Data? Big data refers to massive data sets with large, varied, and complex structures that are difficult to store, analyze, and visualize for subsequent processes or results.
[Audio] R programming language: serves as an interface to other software written in compiled languages such as C, C++, and Fortran, and gives the user an interactive tool for analyzing data. SQL: extracts data from databases, both in traditional data warehouses and in big data technologies. Statistical modeling: addresses supervised and unsupervised classification or regression problems. Machine learning: draws on pattern recognition, computer vision, speech recognition, text analytics, statistics, and mathematical optimization. Applications include the development of search engines, spam filtering, and Optical Character Recognition (OCR), among others.
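To make the SQL point concrete, here is a minimal sketch of extracting an aggregate from a database. It uses Python's built-in `sqlite3` module with an in-memory database; the `rides` table, its columns, and the sample rows are assumptions invented for illustration and do not come from Taxi EU.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (city TEXT, fare_eur REAL)")
conn.executemany(
    "INSERT INTO rides (city, fare_eur) VALUES (?, ?)",
    [("Berlin", 12.5), ("Berlin", 20.0), ("Vienna", 9.0)],
)

def average_fare_by_city(connection):
    """Return {city: average fare} using a GROUP BY aggregate."""
    rows = connection.execute(
        "SELECT city, AVG(fare_eur) FROM rides GROUP BY city ORDER BY city"
    ).fetchall()
    return dict(rows)

result = average_fare_by_city(conn)
print(result)
```

The same `SELECT ... GROUP BY` statement would typically carry over to a data warehouse or a SQL-on-big-data engine with little change, which is part of why SQL appears in this skill list.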
[Audio] Business Understanding: this initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition. Data Understanding: starts with initial data collection and continues with getting familiar with the data, identifying data quality problems, discovering first insights into the data, and detecting interesting subsets to form hypotheses about hidden information. Modeling: in this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Deployment: this can be as simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment allocation) or data mining process.
[Audio] Taxi.eu is a company that enables users to find and book taxis in Europe. Users can track their car's approach in real time through GPS and calculate the cost of a ride before booking the taxi. The key features are booking according to your individual wishes (Eco, XXL, Business, and many more), according to your personal time schedule (short arrival times, comfortable reservations), and at a fair taxi fare (estimation of the taxi fare before departure).
[Audio] Current Taxi EU Infrastructure: Taxi EU currently uses Kafka data feeds to bulk-load log data into Amazon S3 and processes that data using Amazon EMR. This system limits expansion and thus affects the scalability of the business.
[Audio] What is Apache Kafka? Apache Kafka is an open-source streaming platform that was initially built by LinkedIn. It was open-sourced in 2011 and handed over to the Apache Software Foundation. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
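As a rough mental model of how a real-time feed is handled, a Kafka topic partition behaves like an append-only log that several consumer groups read independently, each tracking its own position (offset). The sketch below is a toy in-memory stand-in written for illustration only. `ToyTopic` and the sample events are hypothetical, and none of this is the Kafka client API.

```python
from collections import defaultdict

class ToyTopic:
    """In-memory stand-in for a single-partition topic (not a Kafka client)."""

    def __init__(self):
        self._log = []                    # append-only list of records
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, record):
        """Append a record and return the offset it was stored at."""
        self._log.append(record)
        return len(self._log) - 1

    def consume(self, group, max_records=10):
        """Return unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

topic = ToyTopic()
topic.produce({"taxi_id": 1, "event": "booked"})
topic.produce({"taxi_id": 1, "event": "arrived"})

first_read = topic.consume("billing")     # both records
second_read = topic.consume("billing")    # nothing new since the last read
other_group = topic.consume("analytics")  # independent offset, both records
```

Because each group keeps its own offset, adding a new downstream consumer does not disturb the existing ones, which is one reason a log-based feed suits a growing number of data users.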
[Audio] Proposed Hadoop Infrastructure (HDFS): Apache Spark is the proposed big data technology that will power many critical aspects of Taxi EU. The solution includes replacing the Celery/Python ETL system with a new Spark-based system. The new technology essentially separates the relational data warehouse table model from the raw data input.
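The idea of separating the raw data input from the warehouse table model can be sketched in two stages: first ingest records as they arrive, without forcing a schema on them, and only afterwards project them onto a table shape. The example below uses plain Python rather than Spark so that it runs anywhere. The log lines, field names, and the `(ride_id, city, fare_eur)` table shape are assumptions made up for illustration.

```python
import json

# Hypothetical raw log lines; the field names are invented for illustration.
RAW_LINES = [
    '{"ride_id": 1, "city": "Berlin", "fare_eur": "12.50", "app_version": "3.1"}',
    '{"ride_id": 2, "city": "Vienna", "fare_eur": "9.00"}',
    'not valid json',
]

def ingest(lines):
    """Stage 1: keep every parseable record as-is, with no table schema applied."""
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real pipeline would likely set such lines aside for inspection
    return records

def to_ride_rows(records):
    """Stage 2: project raw records onto a (ride_id, city, fare_eur) table shape."""
    return [
        (int(r["ride_id"]), str(r["city"]), float(r["fare_eur"]))
        for r in records
    ]

raw_records = ingest(RAW_LINES)
ride_rows = to_ride_rows(raw_records)
print(ride_rows)
```

With this split, a new field such as `app_version` in the raw input does not break the table-building step, and the table model can change later without re-collecting the raw data.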
[Audio] What is Apache Hadoop & Spark? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
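The best-known of Hadoop's simple programming models is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) on a single machine in plain Python, counting rides per city. The `RIDES` sample data is hypothetical, and a real Hadoop or Spark job would run these phases in parallel across a cluster rather than in one process.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (city, fare) records, invented for illustration.
RIDES = [("Berlin", 12.5), ("Vienna", 9.0), ("Berlin", 20.0)]

def map_phase(rides):
    """Emit one (key, 1) pair per ride."""
    return [(city, 1) for city, _fare in rides]

def shuffle_phase(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

ride_counts = reduce_phase(shuffle_phase(map_phase(RIDES)))
print(ride_counts)
```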
[Audio] Technology Solution to Taxi EU: Advantages that come along with the Hadoop & Spark big data technology. 1. An optimized, seamless, and reliable user and customer experience. 2. Expansion of the Taxi EU business, company, and services. 3. A system that is more dependable, extensible, and capable of long-term growth. 4. A data querying system that can keep up with Taxi EU's rapid expansion.
[Audio] CONCLUSION: With this implementation, Taxi EU will reach a stage where the fundamentals are in place and it can invest more in a strong, longer-term ingestion system employing Spark and Spark Streaming. This big data technology solution will also tackle a number of other challenges that are currently present. By enabling quick expansion and a centralized system for managing the company's data, it will increase Taxi EU's revenue while creating employment opportunities for those involved in the operations.