AWS Analytics Explained

July 4, 2022

Why do we need to analyze our data?

Data analysis allows us to understand how consumers are using our products and services. This can help us to improve our products and services and cater to our customers better.

What is AWS Analytics?

AWS Analytics is a group of services that allows you to analyze data stored in the cloud. For customer data, AWS Analytics forwards the data to AWS services that can process it and extract information and trends. For cloud usage, it provides a dashboard showing the monitoring metrics you have set up for your AWS usage, along with a means to act on those metrics.

What are the services that AWS Analytics provides?

  1. Amazon Athena
  2. Amazon Elasticsearch
  3. Amazon EMR
  4. Amazon Kinesis Data Streams
  5. Amazon Kinesis Data Firehose
  6. Amazon MSK
  7. Amazon Redshift
  8. AWS Glue
  9. AWS Lake Formation

Amazon Athena

What is it?

Amazon Athena is an interactive, serverless query service used to analyze data directly in Amazon Simple Storage Service (S3) with standard SQL queries, run on demand.

What are the functions of Athena?

What is the pricing of Athena?

Use case: Streaming Analytics

In the following use case, we analyze streaming data from Amazon Kinesis Data Firehose. The data starts as the clickstream from a user on a website. Firehose takes the data, converts it to a columnar format, and stores it in an S3 bucket. Athena then queries the data directly in the S3 bucket. Finally, Amazon QuickSight is used to visualize the data and provide crucial insights for business intelligence.
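As a rough sketch of the Athena step in this pipeline, the snippet below builds the request parameters one might pass to the boto3 Athena client's start_query_execution call. The database name, table name, column name, and result bucket are hypothetical placeholders rather than values from this article, and the snippet deliberately stops short of calling AWS so that it runs without credentials.

```python
def build_athena_request(database, table, output_location, limit=10):
    """Build keyword arguments for boto3's Athena start_query_execution.

    All names passed in are hypothetical; nothing here contacts AWS.
    """
    # Standard SQL: count page views per page in the clickstream table.
    query = (
        f"SELECT page, COUNT(*) AS views "
        f"FROM {table} "
        f"GROUP BY page "
        f"ORDER BY views DESC "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_athena_request(
    database="clickstream_db",                       # assumed database name
    table="clicks",                                  # assumed table over the Firehose output
    output_location="s3://example-athena-results/",  # assumed results bucket
)
print(request["QueryString"])
# With credentials configured, one could then run:
#   boto3.client("athena").start_query_execution(**request)
```

Keeping the request construction separate from the AWS call makes the query easy to inspect before anything is billed, since Athena charges per query run.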

Amazon Elasticsearch

What is it?

What are the functions of Elasticsearch?

What is the pricing of Elasticsearch?

Use case:

In this use case, we will be monitoring the actions taken by customer support agents in a company. Amazon Connect is a cloud contact center that allows the use of AI / ML to automate customer support interactions. It is important to monitor decisions made by these AI / ML agents, as they are made autonomously without human intervention. If there was an issue with an action taken by an agent, one would want to know what the action was, when it was taken, and what its outcome was. Amazon Connect forwards the logs through Kinesis Data Streams, Kinesis Data Firehose, and Lambda functions to Amazon Elasticsearch and an S3 bucket. Elasticsearch can then be used to analyze the data and visualize it with Kibana. An example of a Kibana visualization is shown below.
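To make the logging step more concrete, here is a minimal sketch of how a Lambda function might package agent actions for Elasticsearch's bulk API, which expects newline-delimited JSON with an action line before each document. The index name and the field names (action, timestamp, outcome) are assumptions chosen to mirror the three questions above; they are not the actual Amazon Connect log schema.

```python
import json


def to_bulk_payload(index_name, actions):
    """Serialize agent actions into Elasticsearch bulk-API NDJSON.

    Each document is preceded by an action line, and the payload
    must end with a newline.
    """
    lines = []
    for action in actions:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(action))
    return "\n".join(lines) + "\n"


# Hypothetical log entries: the field names are illustrative only.
agent_actions = [
    {"action": "transfer_call", "timestamp": "2022-07-04T10:15:00Z", "outcome": "success"},
    {"action": "issue_refund", "timestamp": "2022-07-04T10:17:30Z", "outcome": "failed"},
]

payload = to_bulk_payload("agent-actions", agent_actions)
print(payload)
```

The resulting string could be sent to the domain's _bulk endpoint; once indexed, fields such as outcome and timestamp become available for filtering and charting in Kibana.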

Amazon EMR

What is it?

Amazon EMR (Elastic MapReduce) is a service for processing and analyzing large amounts of data in the cloud using Apache Hive, Apache Hadoop, Apache Flink, and Apache Spark.
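The "MapReduce" in the name refers to the programming model popularized by Hadoop. The toy example below illustrates that model in a single Python process with a word count; it is only a sketch of the idea and does not use EMR, Hadoop, or Spark, all of which distribute these same phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


lines = ["big data on EMR", "big clusters process big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each line can be mapped independently and each word can be reduced independently, the work splits naturally across many machines, which is what makes the model suit large datasets.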

What are the functions of Amazon EMR?

What is the pricing of Amazon EMR?

Use case:

Amazon EMR is commonly used in machine learning, big data, and bioinformatics. A common example would be a smart watch that sends all of its data to the cloud, with a select set of EC2 instances running every time new data arrives.

Amazon Kinesis Data Streams

What is it?

Amazon Kinesis Data Streams is a real-time data streaming service. It captures gigabytes of data from sources like website clickstreams, event streams (database and location tracking), and social media feeds. The Kinesis family is made up of the following: Data Streams, Data Firehose, Data Analytics, and Video Streams.

What are the functions of Amazon Kinesis Data Streams?

What is the pricing of Amazon Kinesis Data Streams?

Use case:

This can be used for a variety of use cases, for example fraud detection, live leaderboards, and video processing. Here we will cover the last of these. Using a security camera, we can forward the feed with Kinesis Data Streams. This feed can be fed into Amazon SageMaker / Rekognition Video to automatically detect objects in the video. Applications range from detecting a firearm in a video to detecting a person in a video.

Amazon Kinesis Data Firehose

What is it?

Amazon Kinesis Data Firehose is a serverless service that captures, transforms, and loads streaming data into data stores and analytics services.
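The transform step is typically handled by a Lambda function that Firehose invokes with a batch of base64-encoded records. The sketch below follows the record shape Firehose expects back (a recordId, a result of Ok, Dropped, or ProcessingFailed, and the re-encoded data). The transformation itself, lowercasing a page field, is a hypothetical example, as is the sample event.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the expected shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: normalise the page path.
            payload["page"] = payload["page"].lower()
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError, AttributeError):
            # Hand unparseable records back so Firehose can treat them as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


# Made-up event for local testing; real events carry additional metadata.
sample_event = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"page": "/Home"}').decode("utf-8")},
        {"recordId": "2", "data": base64.b64encode(b"not json").decode("utf-8")},
    ]
}
result = lambda_handler(sample_event, None)
print([r["result"] for r in result["records"]])
```

Every incoming recordId must appear in the response, which is why failed records are returned with a status rather than silently dropped from the list.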

What are the functions of Amazon Kinesis Data Firehose?

What is the pricing of Kinesis Data Firehose?

Use case:

Most use cases for this service involve transferring data to S3, Redshift, Elasticsearch, or Splunk.

Amazon MSK
(Managed Streaming for Apache Kafka)

What is it?

Amazon MSK is a managed cluster service used to build and run Apache Kafka applications that process streaming data.

What are the functions of Amazon MSK?

What is the pricing of Amazon MSK?

How is it different from Kinesis?

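Kinesis provides at-least-once delivery, so a record may occasionally be delivered more than once, whereas Kafka (and therefore MSK) can provide exactly-once semantics when idempotent producers and transactions are used.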
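The practical difference shows up when a record is redelivered after a retry. The sketch below is a plain-Python illustration, not Kinesis or Kafka client code: it compares a naive consumer with one that makes processing idempotent by remembering record IDs. The record IDs and amounts are made up for the example.

```python
def process_at_least_once(deliveries):
    """Apply every delivery, including redelivered duplicates."""
    total = 0
    for _, amount in deliveries:
        total += amount
    return total


def process_idempotently(deliveries):
    """Skip records whose ID has already been handled."""
    seen = set()
    total = 0
    for record_id, amount in deliveries:
        if record_id in seen:
            continue  # duplicate redelivery: ignore it
        seen.add(record_id)
        total += amount
    return total


# Record "b" is delivered twice, as can happen after a retry.
deliveries = [("a", 10), ("b", 5), ("b", 5), ("c", 1)]
print(process_at_least_once(deliveries))  # 21
print(process_idempotently(deliveries))   # 16
```

With at-least-once delivery, this kind of deduplication is the consumer's responsibility; exactly-once semantics move that burden into the streaming platform.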

Use case: Integration

Amazon MSK integrates well with AWS Glue, Kinesis Data Analytics, and Lambda. Glue runs Apache Spark jobs against an MSK cluster, whereas Kinesis Data Analytics runs Apache Flink jobs against one.

Amazon Redshift

What is it?

Amazon Redshift is a fast, petabyte-scale, SQL-based data warehouse for analyzing data easily. It is also commonly used to perform large-scale data migrations.

What are the functions of Amazon Redshift?

What is the pricing of Amazon Redshift?

Use case:

The goal of Amazon Redshift is to allow an entire business intelligence setup to be built over a weekend. Any time you have data in a data lake that needs to be analyzed, you can use Amazon Redshift to analyze it.

AWS Glue

What is it?

AWS Glue is a serverless extract, transform, and load (ETL) service used to categorize data and move it between various data stores and streams.

What are the functions of AWS Glue?

What is the pricing of AWS Glue?

Use case:

In a data-centric world, we need ways to centralize data and make it easy to access for analysis and business intelligence. AWS Glue addresses this by providing a service that merges data from multiple sources into a single data store, which makes the combined data easy to access and analyze.
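The merging idea can be sketched in a few lines of plain Python. This is not the Glue API (a Glue job would normally express the same join as a Spark script), and the two sources, their field names, and the customer_id key are all hypothetical.

```python
def merge_sources(*sources, key="customer_id"):
    """Combine rows from several sources into one record per key."""
    merged = {}
    for source in sources:
        for row in source:
            # Rows that share a key are folded into a single record.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Two made-up sources describing the same customers.
crm_rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]
order_rows = [
    {"customer_id": 1, "last_order": "2022-06-30"},
]

central_store = merge_sources(crm_rows, order_rows)
print(central_store)
```

Once the rows share one store and one schema, an analyst can query a single record per customer instead of reconciling the sources by hand.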

AWS Lake Formation

What is it?

AWS Lake Formation is a managed service that allows you to create, manage, and access data lakes. A data lake is a repository that stores all data in its original form and is used for analysis.

What are the functions of AWS Lake Formation?

What is the pricing of AWS Lake Formation?

Use case:

Fanatics uses Amazon Simple Storage Service (Amazon S3) to provide secure, durable, and highly scalable storage for its analytical data. Using the Amazon S3 web service interface, the Fanatics data science team can easily store and quickly retrieve any amount of data. Taking advantage of its new AWS data lake solution, Fanatics is now able to analyze the huge volumes of data from its transactional, e-commerce, and back-office systems, and make this data available to its data scientists immediately for analytics.