PowerPoint Presentation


Scene 1 (0s)

Specialists in Big Data and Analytics - Morris & Opazo.

Scene 2 (13s)

Module Objectives. In this module, you will be able to: understand Lambda functions and SQS/SNS; understand EMR/EC2; understand AWS Glue; understand ECS/Fargate.

Scene 3 (26s)

Module Outcomes. At the end of the module, you will be able to: understand Lambda functions and SQS/SNS; understand EMR/EC2; understand AWS Glue; understand ECS/Fargate.

Scene 4 (40s)

AWS Lambda Introduction.

Scene 5 (12m 55s)

Lambda Functions. Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring, and logging. With Lambda, you can run code for virtually any type of application or backend service. All you need to do is supply your code in one of the languages that Lambda supports.
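As a concrete illustration of "supply your code", here is a minimal Python handler sketch of the kind Lambda runs; the event shape and the `name` field are made up for the example:

```python
import json

def lambda_handler(event, context):
    """Minimal Lambda handler: echo a greeting built from the incoming event."""
    name = event.get("name", "world")  # "name" is a hypothetical event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Lambda calls this function with the triggering event as a dict; the returned dict is what an API Gateway style integration would pass back to the caller.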

Scene 6 (13m 8s)

You can invoke your Lambda functions using the Lambda API, or Lambda can run your functions in response to events from other AWS services. For example, you can use Lambda to: build data-processing triggers for AWS services such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB; process streaming data stored in Amazon Kinesis; or create your own backend that operates at AWS scale, performance, and security.
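To sketch the first option (invoking through the Lambda API), here is a hedged boto3 example; the function name `my-function` and the payload are hypothetical, and the actual call requires AWS credentials:

```python
import json

def build_invoke_args(function_name, payload):
    # Parameters for lambda.invoke(); "RequestResponse" invokes synchronously
    # and waits for the function's result ("Event" would be fire-and-forget).
    return {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse",
        "Payload": json.dumps(payload),
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; "my-function" is a hypothetical name
    client = boto3.client("lambda")
    response = client.invoke(**build_invoke_args("my-function", {"name": "Lambda"}))
    print(json.load(response["Payload"]))
```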

Scene 7 (13m 22s)

When should I use Lambda? Lambda is an ideal compute service for many application scenarios, as long as you can run your application code using the Lambda standard runtime environment and within the resources that Lambda provides. When using Lambda, you are responsible only for your code. Lambda manages the compute fleet that offers a balance of memory, CPU, network, and other resources to run your code. Because Lambda manages these resources, you cannot log in to compute instances or customize the operating system on provided runtimes. Lambda performs operational and administrative activities on your behalf, including managing capacity, monitoring, and logging your Lambda functions.

Scene 8 (13m 35s)

Lambda Functions. Lambda features include: concurrency and scaling controls, functions defined as container images, code signing, Lambda extensions, function blueprints, database access, and file system access. Accessing Lambda.

Scene 9 (13m 49s)

Lambda Functions. You can create, invoke, and manage your Lambda functions using any of the following interfaces: the AWS Management Console, the AWS Command Line Interface (AWS CLI), the AWS SDKs, AWS CloudFormation, and the AWS Serverless Application Model (AWS SAM).

Scene 10 (14m 2s)

Lambda Functions. Language options: you can create your AWS Lambda function using any of a growing list of supported languages, including C#, Go, Java, Node.js, and Python.

Scene 11 (14m 15s)

AWS SQS. What is SQS? SQS stands for Simple Queue Service, and it was the first service available in AWS. Amazon SQS is a web service that gives you access to a message queue that can be used to store messages while waiting for a computer to process them. Amazon SQS is a distributed queue system that enables web service applications to quickly and reliably queue messages that one component in the application generates to be consumed by another component, where a queue is a temporary repository for messages that are awaiting processing. With the help of SQS, you can send, store, and receive messages between software components at any volume without losing messages.
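A minimal sketch of that send/store/receive cycle with boto3, assuming an existing queue (the queue URL below is hypothetical and the calls need AWS credentials):

```python
import json

def sqs_roundtrip(sqs, queue_url, body):
    """Send one message, receive it with long polling, then delete it."""
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(body))
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=10,  # long polling reduces empty responses
    )
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        # Deleting acknowledges successful processing; otherwise the message
        # reappears after the visibility timeout and may be delivered again.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    import boto3  # requires AWS credentials; the queue URL is hypothetical
    sqs_roundtrip(boto3.client("sqs"),
                  "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
                  {"order_id": 42})
```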

Scene 12 (14m 29s)

[image] Scaling an AWS SQS listener to process millions of messages concurrently (Spring Cloud AWS).

Scene 13 (15m 42s)

There are two types of queue: Standard Queues (the default) and FIFO Queues (First-In-First-Out).

Scene 14 (15m 56s)

[image] SQS. Standard Queue. SQS offers the standard queue as the default queue type. It allows a nearly unlimited number of transactions per second and guarantees that a message is delivered at least once. However, occasionally more than one copy of a message might be delivered, and messages may arrive out of order. It provides best-effort ordering, which ensures that messages are generally delivered in the same order as they are sent, but this is not guaranteed.

Scene 15 (16m 9s)

AWS SQS. FIFO Queue. [image] SQS. The FIFO Queue complements the standard queue. It guarantees ordering: messages are received in the same order in which they are sent. The most important features of a FIFO Queue are ordering and exactly-once processing: a message is delivered once and remains available until a consumer processes and deletes it. A FIFO Queue does not allow duplicates to be introduced into the queue. It also supports message groups, which allow multiple ordered message streams within a single queue. FIFO Queues are limited to 300 transactions per second but otherwise offer the capabilities of standard queues.
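A sketch of sending to a FIFO queue with boto3; the queue and group names are hypothetical. `MessageGroupId` is required for FIFO queues, and `MessageDeduplicationId` can be omitted only when content-based deduplication is enabled on the queue:

```python
def build_fifo_send(queue_url, body, group_id, dedup_id):
    # FIFO queues require MessageGroupId (ordering is per group);
    # MessageDeduplicationId prevents duplicates within the dedup window.
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,
        "MessageDeduplicationId": dedup_id,
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials
    sqs = boto3.client("sqs")
    # FIFO queue names must end in ".fifo"
    queue = sqs.create_queue(
        QueueName="orders.fifo",
        Attributes={"FifoQueue": "true"},
    )
    sqs.send_message(**build_fifo_send(
        queue["QueueUrl"], "order-1", "customer-7", "msg-1"))
```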

Scene 16 (16m 22s)

SQS Visibility Timeout. The visibility timeout is the amount of time that a message is invisible in the SQS queue after a reader picks up that message. If the job is processed before the visibility timeout expires, the message is deleted from the queue. If the job is not processed within that time, the message becomes visible again and another reader may process it. This could result in the same message being delivered twice. The default visibility timeout is 30 seconds. The visibility timeout can be increased if your task takes more than 30 seconds; the maximum is 12 hours.
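A small sketch of extending the visibility timeout mid-processing with boto3's `change_message_visibility`; the clamp to the 12-hour maximum reflects the limit described above:

```python
MAX_VISIBILITY = 12 * 60 * 60  # 12 hours, the SQS maximum, in seconds

def clamp_visibility(seconds):
    # SQS rejects values outside 0..43200, so clamp before calling the API.
    return max(0, min(seconds, MAX_VISIBILITY))

def extend_visibility(sqs, queue_url, receipt_handle, seconds):
    """Give a slow consumer more time before the message reappears in the queue."""
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=clamp_visibility(seconds),
    )
```

`receipt_handle` comes from the `receive_message` response; extending the timeout keeps the in-flight message invisible to other readers while the job finishes.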

Scene 17 (16m 36s)

Important points to remember: SQS is pull-based, not push-based. Messages can be up to 256 KB in size. Messages are kept in a queue from 1 minute to 14 days; the default retention period is 4 days. SQS guarantees that your messages will be processed at least once.

Scene 18 (16m 49s)

What is SNS? SNS stands for Simple Notification Service. It is a web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, cost-effective, and flexible capability to publish messages from an application and send them to other applications. It is a way of sending messages. For example, when you are using Auto Scaling, it can trigger an SNS notification that emails you that "your EC2 instance count is growing". SNS can also send messages to devices by sending push notifications to Apple, Google, Fire OS, and Windows devices, as well as Android devices in China with Baidu Cloud Push. Besides sending push notifications to mobile devices, Amazon SNS can send notifications through SMS or email, to an Amazon Simple Queue Service (SQS) queue, or to an HTTP endpoint.

Scene 19 (17m 3s)

SNS notifications can also trigger Lambda functions. When a message is published to an SNS topic that has a Lambda function associated with it, the Lambda function is invoked with the payload of the message. In other words, the Lambda function receives the message payload as an input parameter, can manipulate the information in the message, and can then send the message on to other SNS topics or other AWS services. Amazon SNS allows you to group multiple recipients using topics, where a topic is a logical access point that sends identical copies of the same message to its subscribed recipients. Amazon SNS supports multiple endpoint types; for example, you can group together iOS, Android, and SMS recipients. Once you publish a message to the topic, SNS delivers formatted copies of your message to the subscribers.
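A sketch of the topic/subscribe/publish flow with boto3; the topic name and email endpoint are hypothetical, and the calls require AWS credentials:

```python
import json

def publish_event(sns, topic_arn, subject, payload):
    """Publish once; SNS fans the message out to every subscriber of the topic."""
    return sns.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message=json.dumps(payload),
    )

if __name__ == "__main__":
    import boto3  # requires AWS credentials; name and endpoint are hypothetical
    sns = boto3.client("sns")
    topic = sns.create_topic(Name="order-events")  # idempotent: returns the ARN
    sns.subscribe(TopicArn=topic["TopicArn"], Protocol="email",
                  Endpoint="ops@example.com")
    publish_event(sns, topic["TopicArn"], "New order", {"order_id": 42})
```

Adding a second subscription (e.g. `Protocol="sqs"` or `Protocol="lambda"`) makes the same single `publish` call fan out to both endpoints.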

Scene 20 (17m 16s)

SNS Publishers and Subscribers. [image] SNS. AWS SNS.

Scene 21 (17m 29s)

[image] SNS. AWS SNS.

Scene 22 (18m 43s)

Publishers. Publishers, also known as producers, produce and send messages to SNS, which acts as a logical access point. Subscribers. Subscribers such as web servers, email addresses, Amazon SQS queues, and AWS Lambda functions receive the message or notification from SNS over one of the supported protocols (Amazon SQS, email, Lambda, HTTP, SMS).

Scene 23 (19m 6s)

Benefits. AWS SNS. Instantaneous delivery: SNS is based on push-based delivery, which is the key difference between SNS and SQS. A message is pushed out as soon as you publish it to a topic, and it is delivered to multiple subscribers. Flexible: SNS supports multiple endpoint types, which can receive the message over multiple transport protocols such as email, SMS, Lambda, Amazon SQS, HTTP, etc.

Scene 24 (19m 20s)

Inexpensive: SNS is quite inexpensive, as it is based on a pay-as-you-go model; you pay only when you are using the resources, with no up-front costs. Ease of use: SNS is very simple to use, as the web-based AWS Management Console offers the simplicity of a point-and-click interface. Simple architecture: SNS simplifies the messaging architecture by offloading message-filtering logic from the subscribers and message-routing logic from the publishers. Instead of receiving all messages from the topic, each subscriber receives only the messages of interest to it.

Scene 25 (19m 33s)

SNS stands for Simple Notification Service, while SQS stands for Simple Queue Service. SQS is pull-based delivery: messages are not pushed to receivers; users have to pull messages from the queue. SNS is push-based delivery: messages are pushed to multiple subscribers. With SNS, messages are pushed to multiple receivers at the same time, while with SQS, messages are not received by multiple receivers at the same time. SQS polling introduces some latency in message delivery, while SNS pushes messages to the subscribers immediately.

Scene 26 (19m 46s)

Amazon Elastic MapReduce. Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data. Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters. It uses Hadoop, an open-source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances. Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
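To make the "resizable cluster of Amazon EC2 instances" concrete, here is a hedged boto3 sketch of launching a small transient EMR cluster; the release label, instance types, log bucket, and default role names are assumptions:

```python
def build_emr_cluster(name, log_uri):
    # Minimal transient-cluster spec: Spark and Hive on a small resizable cluster.
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; check what's current
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
            ],
            # Terminate automatically once submitted steps finish (transient cluster).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; bucket name is hypothetical
    emr = boto3.client("emr")
    cluster = emr.run_job_flow(**build_emr_cluster(
        "demo-cluster", "s3://my-bucket/emr-logs/"))
    print(cluster["JobFlowId"])
```

Resizing later is a matter of changing the CORE group's instance count, which is the elasticity the slide describes.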

Scene 27 (20m 0s)

Benefits of Amazon EMR. Easy to use: Amazon EMR simplifies cluster setup, Hadoop configuration, node provisioning, etc. Reliable: it retries failed tasks and automatically replaces poorly performing instances. Elastic: Amazon EMR lets you provision a large number of instances to process data at any scale, and it easily increases or decreases the number of instances.

Scene 28 (20m 13s)

Difference between EC2 vs. ECS vs. Lambda.

Scene 29 (36m 31s)

Secure: it automatically configures Amazon EC2 firewall settings, controls network access to instances, launches clusters in an Amazon VPC, etc. Flexible: it allows complete control over the clusters and root access to every instance; it also allows installing additional applications and customizing your cluster as required. Cost-efficient: its pricing is easy to estimate, charging hourly for every instance used. (AWS Glue, covered later, is a fully managed service to extract, transform, and load (ETL) your data for analytics, and to discover and search across different AWS data sets without moving your data.)

Scene 30 (36m 44s)

Amazon ECS. What is Amazon ECS? Amazon Elastic Container Service (Amazon ECS) is an AWS cloud service for managing containers. Using Amazon ECS, developers can run their apps in the cloud without configuring an environment to run the code. With an AWS account, scalable apps can be deployed and managed by running them on a group of servers called a cluster, via API calls and task definitions. ECS can be accessed through the AWS Management Console and SDKs. Amazon ECS helps address issues such as full memory or storage, CPU unavailability, and high CPU utilization, preventing your server from going down.
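A hedged boto3 sketch of the cluster and task-definition pieces described above; the names and the `nginx` image are illustrative, and a real setup may also need an execution role for image pulls and log delivery:

```python
def build_task_definition(family, image):
    # A minimal Fargate-compatible task definition for a single container.
    return {
        "family": family,
        "networkMode": "awsvpc",                 # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "256",     # 0.25 vCPU
        "memory": "512",  # MiB
        "containerDefinitions": [{
            "name": family,
            "image": image,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "essential": True,
        }],
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; names are hypothetical
    ecs = boto3.client("ecs")
    ecs.create_cluster(clusterName="demo-cluster")
    ecs.register_task_definition(**build_task_definition("web", "nginx:latest"))
```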

Scene 31 (36m 58s)

Here are a few advantages of Amazon ECS. Amazon ECS.

Scene 32 (37m 11s)

What is Docker? Docker allows developers to build applications based on small and lightweight containers. Containers share the operating system kernel but still run isolated from each other. A container combines the app source code with the OS libraries and dependencies needed to execute the code in different environments. Features of Docker: high scalability and efficiency; short boot-up time; reusable data volumes; isolated applications.

Scene 33 (37m 25s)

Click the Get Started button. Select the container definition you want to deploy and click Next (here we'll go with Nginx). Select Application Load Balancer if you want a load balancer; otherwise, select None and click Next (here we'll go without a load balancer). Enter your cluster name and click Next. Review all the details of the cluster and click Create.

Scene 34 (37m 38s)

Your cluster will be created. Click the View Service button to view your cluster. Go to Tasks and click on the Task ID. Copy the public IP and paste it into your browser to see your Nginx application running.

Scene 35 (37m 51s)

AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers. AWS Fargate is compatible with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS).
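A minimal boto3 sketch of running an ECS task on Fargate rather than on self-managed EC2 instances; the cluster, task-definition, and subnet identifiers are hypothetical:

```python
def build_run_task(cluster, task_def, subnets):
    # Launch the task with the Fargate launch type; awsvpc networking
    # requires subnets, and a public IP lets the task pull public images.
    return {
        "cluster": cluster,
        "launchType": "FARGATE",
        "taskDefinition": task_def,
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "ENABLED",
            }
        },
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; identifiers are hypothetical
    ecs = boto3.client("ecs")
    ecs.run_task(**build_run_task("demo-cluster", "web", ["subnet-0abc1234"]))
```

The same task definition could instead run with `launchType="EC2"` on instances you manage; Fargate removes that server-management step.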

Scene 36 (38m 5s)

Need for AWS Fargate. Before container services existed, users launched their applications on virtual machines; in the AWS cloud, they deployed applications on EC2 instances. They packaged their application together with an OS into what we call an Amazon Machine Image (AMI) and then ran it on an AWS EC2 instance. Then Docker introduced containers, and people started deploying their applications in these containers. Containers resemble VMs; one major difference is that, unlike VMs, containers share the host system's kernel with other containers.

Scene 37 (38m 18s)

[image] Containers vs. VMs. AWS FARGATE.

Scene 38 (38m 42s)

Working of AWS Fargate. Let's look at some general terms that you will encounter frequently when dealing with AWS Fargate. Container: a Docker container is a standardized unit of software, containing everything that your application needs to run: code, runtime, system tools, system libraries, etc. Containers are created from a read-only template called a container image. Container image: images are typically built from a Dockerfile, a plain text file that specifies all of the components that are included in the container. These images are stored in a registry from which they can be downloaded and run as containers.

Scene 39 (38m 55s)

AWS FARGATE.

Scene 40 (40m 8s)

AWS GLUE AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months..

Scene 41 (41m 22s)

Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users who each use different products. AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows with a few clicks in AWS Glue Studio. Data analysts and data scientists can use AWS Glue DataBrew to visually enrich, clean, and normalize data without writing code. With AWS Glue Elastic Views, application developers can use familiar Structured Query Language (SQL) to combine and replicate data across different data stores.

Scene 42 (41m 35s)

Benefits. Faster data integration: different groups across your organization can use AWS Glue to work together on data integration tasks, including extraction, cleaning, normalization, combining, loading, and running scalable ETL workflows. This way, you reduce the time it takes to analyze your data and put it to use from months to minutes. Automate your data integration at scale: AWS Glue automates much of the effort required for data integration. It crawls your data sources, identifies data formats, and suggests schemas to store your data. It automatically generates the code to run your data transformations and loading processes. You can use AWS Glue to easily run and manage thousands of ETL jobs or to combine and replicate data across multiple data stores using SQL. No servers to manage: AWS Glue runs in a serverless environment. There is no infrastructure to manage; AWS Glue provisions, configures, and scales the resources required to run your data integration jobs, and you pay only for the resources your jobs use while running. AWS Glue can run your ETL jobs as new data arrives. For example, you can use an AWS Lambda function to trigger your ETL jobs as soon as new data becomes available in Amazon S3. You can also register the new dataset in the AWS Glue Data Catalog as part of your ETL jobs.

Scene 43 (41m 49s)

[image] Create and run ETL jobs in AWS Glue. Create a unified catalog to find data across multiple data stores.

Scene 44 (43m 2s)

AWS Glue Studio AWS Glue Studio makes it easy to visually create, run, and monitor AWS Glue ETL jobs. You can compose ETL jobs that move and transform data using a drag-and-drop editor, and AWS Glue automatically generates the code.

Scene 45 (44m 15s)

Explore data with self-service visual data preparation. AWS Glue DataBrew enables you to explore and experiment with data directly from your data lake, data warehouses, and databases, including Amazon S3, Amazon Redshift, AWS Lake Formation, Amazon Aurora, and Amazon RDS. You can choose from over 250 prebuilt transformations in AWS Glue DataBrew to automate data preparation tasks, such as filtering anomalies, standardizing formats, and correcting invalid values. After the data is prepared, you can immediately use it for analytics and machine learning.

Scene 46 (44m 29s)

AWS Glue Elastic Views enables you to use familiar SQL to create materialized views.

Scene 47 (44m 42s)

AWS Glue consists of: a central metadata repository, an ETL engine, and a flexible scheduler.

Scene 48 (44m 56s)

A crawler accesses your data store, extracts metadata, and creates table definitions in the AWS Glue Data Catalog. A script extracts data from sources, transforms it, and loads it into targets. A trigger allows you to start one or more crawlers or ETL jobs manually or automatically. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. Glue jobs can run on a flexible schedule, via event-based triggers, or on demand. Several jobs can be started in parallel, and users can specify dependencies between jobs.
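The crawler/script/trigger pieces above can be sketched with boto3; the role ARN, bucket path, and job name are hypothetical, and `start_job_run` assumes an ETL job already defined in Glue:

```python
def build_crawler(name, role_arn, database, s3_path):
    # Crawl an S3 prefix and record table definitions in the Data Catalog.
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; role, bucket, and job are hypothetical
    glue = boto3.client("glue")
    glue.create_crawler(**build_crawler(
        "sales-crawler", "arn:aws:iam::123456789012:role/GlueServiceRole",
        "sales_db", "s3://my-bucket/raw/sales/"))
    glue.start_crawler(Name="sales-crawler")       # populate the Data Catalog
    run = glue.start_job_run(JobName="sales-etl")  # then run the ETL job
    print(run["JobRunId"])
```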

Scene 49 (45m 9s)

AWS Glue use cases Data extraction. Glue extracts data in a variety of formats. Data transformation. Glue reformats data for storage. Data integration. Glue integrates data into enterprise data lakes and warehouses..

Scene 50 (45m 22s)

Glue can integrate with the Snowflake data warehouse to help manage the data integration process. AWS data lake can integrate with Glue. AWS Glue can integrate with Athena to create schemas. ETL code can be used for Glue on GitHub as well..

Scene 51 (45m 36s)

Pricing. ETL jobs: you are charged an hourly rate based on the number of DPUs (Data Processing Units) used to run your ETL job. Crawlers: you are charged an hourly rate based on the number of DPUs used to run your crawler. Data Catalog storage and requests: you are charged per month if you store more than a million objects, and per month if you exceed a million requests in a month.

Scene 52 (45m 49s)

Summary. Lambda functions and SQS/SNS. Concepts of EMR/EC2. AWS Glue. ECS/Fargate.