Data analysis allows us to understand how consumers are using our products and services. This can help us to improve our products and services and cater to our customers better.
AWS Analytics is a group of services for analyzing data stored in the cloud. For customer data, AWS Analytics services forward the data to AWS services that can process it and extract information and trends. For cloud usage, they provide a dashboard that shows the monitoring metrics you have set up for your AWS usage, along with a means to act on those metrics.
Amazon Athena is an interactive, serverless query service for analyzing data directly in Amazon Simple Storage Service (Amazon S3) with standard SQL queries, run on demand.
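Because Athena queries S3 data with standard SQL, submitting a query mostly means handing over the SQL text, a database, and an S3 location for the results. The minimal sketch below builds the arguments for boto3's Athena `start_query_execution` call; the `clickstream` table, its `page` column, the database name, and the results bucket are hypothetical, and actually submitting the query requires boto3 and AWS credentials.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to an s3:// location")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_athena_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**build_athena_request(sql, database, output_location))
    return response["QueryExecutionId"]


# Hypothetical table and column names, for illustration only.
TOP_PAGES_SQL = """
SELECT page, COUNT(*) AS views
FROM clickstream
GROUP BY page
ORDER BY views DESC
LIMIT 10
"""
```

For example, `build_athena_request(TOP_PAGES_SQL, "web_analytics", "s3://example-athena-results/")` produces the arguments. `run_athena_query` only submits the query, so the rows would be fetched separately (for instance with `get_query_results`) once the execution has succeeded.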
In the following use case, we analyze streaming data delivered by Amazon Kinesis Data Firehose.
The data starts out as the clickstream generated by users on a website.
Firehose ingests the data, converts it to a columnar format such as Apache Parquet or Apache ORC, and stores it in an S3 bucket for analysis.
Athena then analyzes the data directly from the S3 bucket.
Finally, Amazon QuickSight visualizes the data to provide key insights for business intelligence.
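The ingestion step of this pipeline can be sketched from the producer's side. This is a minimal sketch, assuming a hypothetical delivery stream name and hypothetical event fields (`user_id`, `page`, `event_time`); it uses boto3's Firehose `put_record` call. Each event is serialized as one JSON object per line, because Firehose concatenates records without adding delimiters, and its Parquet/ORC conversion expects JSON input (with the schema taken from an AWS Glue table).

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_click_event(user_id: str, page: str, ts: Optional[datetime] = None) -> Dict[str, Any]:
    """Build a clickstream event; the field names are hypothetical."""
    ts = ts or datetime.now(timezone.utc)
    return {"user_id": user_id, "page": page, "event_time": ts.isoformat()}


def encode_record(event: Dict[str, Any]) -> bytes:
    # Firehose does not add delimiters, so end each JSON object with a newline.
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def send_click_event(stream_name: str, event: Dict[str, Any]) -> str:
    """Send one event to a (hypothetical) delivery stream; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the encoding helpers work without boto3

    client = boto3.client("firehose")
    response = client.put_record(DeliveryStreamName=stream_name, Record={"Data": encode_record(event)})
    return response["RecordId"]
```

For higher volumes, the same encoding works with Firehose's batch call `put_record_batch`, which accepts a list of `{"Data": ...}` records.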
In this use case, we monitor the actions taken by customer support agents in a company.
Amazon Connect is a cloud contact center that can use AI/ML to automate customer support interactions.
It is important to monitor the decisions these AI/ML agents make, because they act autonomously, without human intervention.
If an action taken by an agent causes an issue, you want to know what the action was, when it was taken, and what its outcome was.
Amazon Connect forwards its logs through Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, and AWS Lambda functions to Amazon Elasticsearch Service and an S3 bucket.
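The Lambda stage of this pipeline could extract exactly the details worth auditing: the action, when it happened, and its outcome. The sketch below is a hypothetical handler for a Lambda function triggered by Kinesis Data Streams; it decodes each record's base64 `kinesis.data` payload and keeps the `action`, `timestamp`, and `outcome` fields. Those field names are assumptions for illustration, not the actual Amazon Connect log schema, and the step that would index the results into Elasticsearch or write them to S3 is left out.

```python
import base64
import json
from typing import Any, Dict, List

# Hypothetical field names; the real log schema depends on how Amazon Connect logging is set up.
REQUIRED_FIELDS = ("action", "timestamp", "outcome")


def decode_kinesis_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode the base64 JSON payloads in a Lambda event from Kinesis Data Streams."""
    docs = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        docs.append(json.loads(payload))
    return docs


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    audit_entries = []
    incomplete = 0
    for doc in decode_kinesis_records(event):
        if all(field in doc for field in REQUIRED_FIELDS):
            # Keep only what an auditor needs: what happened, when, and with what result.
            audit_entries.append({field: doc[field] for field in REQUIRED_FIELDS})
        else:
            incomplete += 1
    # A real function would index audit_entries into Elasticsearch and/or write them to S3 here.
    return {"audit_entries": audit_entries, "incomplete": incomplete}
```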
Elasticsearch can then be used to analyze the data, and Kibana to visualize it.
An example of a Kibana visualization is shown below.
Amazon EMR (Elastic MapReduce) is a service for processing and analyzing large amounts of data in the cloud using frameworks such as Apache Hadoop, Apache Hive, Apache Flink, and Apache Spark.
Amazon EMR is commonly used for machine learning, big data processing, and bioinformatics. A common example is smartwatches sending in all of their data, with EMR running a select set of EC2 instances to process each new batch of data as it arrives.
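Hadoop's MapReduce model, which gave EMR its original name, splits a job into map, shuffle, and reduce phases that can run in parallel across a cluster's EC2 instances. The single-process sketch below only illustrates that model on hypothetical smartwatch readings (`device_id`, `heart_rate`), computing an average heart rate per device; it does not use any EMR or Hadoop API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def map_phase(readings: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, float]]:
    """Map: emit one (device_id, heart_rate) pair per reading."""
    for reading in readings:
        yield str(reading["device_id"]), float(reading["heart_rate"])


def shuffle_phase(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: group all values that share a key (a cluster does this over the network)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: collapse each key's values into a single average."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def average_heart_rate(readings: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    return reduce_phase(shuffle_phase(map_phase(readings)))
```

Because each device's readings are reduced independently, the reduce work can be spread across machines by key, which is what lets a cluster scale the same computation out.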
Amazon Kinesis is a real-time data streaming service. It can capture gigabytes of data per second from sources such as website clickstreams, event streams (database events and location tracking), and social media feeds. The Kinesis family is made up of Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.
Kinesis can be used for a variety of use cases, such as fraud detection, live leaderboards, and video processing. This use case covers the last of these. A security camera's feed can be forwarded with Kinesis Video Streams and fed into Amazon SageMaker or Amazon Rekognition Video to detect objects in the video automatically. Applications range from detecting a firearm in a video to detecting a person.
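Detection results come back as structured data that an application can act on. The sketch below assumes results shaped like the response of Amazon Rekognition's `GetLabelDetection` operation (a `Labels` list whose items carry a `Timestamp` in milliseconds and a `Label` with `Name` and `Confidence`) and scans them for labels of interest. The `find_labels` helper, the label names, and the 80% threshold are illustrative assumptions; the labels actually returned depend on Rekognition's models, and streaming setups deliver their results differently.

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_labels(
    response: Dict[str, Any], targets: Iterable[str], min_confidence: float = 80.0
) -> List[Tuple[int, str, float]]:
    """Return (timestamp_ms, label_name, confidence) for target labels at or above the threshold."""
    wanted = {t.lower() for t in targets}
    hits = []
    for item in response.get("Labels", []):
        label = item.get("Label", {})
        name = label.get("Name", "")
        confidence = float(label.get("Confidence", 0.0))
        # Case-insensitive match so "person" and "Person" are treated alike.
        if name.lower() in wanted and confidence >= min_confidence:
            hits.append((int(item.get("Timestamp", 0)), name, confidence))
    return hits
```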
Amazon Kinesis Data Firehose is a serverless service that captures, transforms, and loads streaming data into data stores and analytics services.
Most of its use cases involve acting as a delivery service into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk.
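Firehose's transformation step can invoke an AWS Lambda function on each batch of records. Such a function receives `records`, each with a `recordId` and base64-encoded `data`, and must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` plus the (re-encoded) `data`. The minimal sketch below follows that contract; the transformation itself (lower-casing a `page` field and dropping records flagged with a hypothetical `is_bot` field) is only an example.

```python
import base64
import json
from typing import Any, Dict


def transform(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transformation: normalise the page path to lower case."""
    out = dict(payload)
    out["page"] = str(out.get("page", "")).lower()
    return out


def _result(record_id: str, result: str, data: str) -> Dict[str, str]:
    return {"recordId": record_id, "result": result, "data": data}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Firehose data-transformation handler: one output entry per input recordId."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
            payload = None
        if not isinstance(payload, dict):
            output.append(_result(record["recordId"], "ProcessingFailed", record["data"]))
        elif payload.get("is_bot"):  # hypothetical flag used to filter out bot traffic
            output.append(_result(record["recordId"], "Dropped", record["data"]))
        else:
            line = json.dumps(transform(payload)) + "\n"  # newline-delimited for S3
            encoded = base64.b64encode(line.encode("utf-8")).decode("ascii")
            output.append(_result(record["recordId"], "Ok", encoded))
    return {"records": output}
```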
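Amazon MSK (Managed Streaming for Apache Kafka) is a managed cluster service used to build and run Apache Kafka applications that process streaming data.
Kinesis Data Streams provides at-least-once delivery, so a consumer may occasionally receive the same record more than once. Apache Kafka on MSK can additionally provide exactly-once semantics when producers and consumers are configured for it (idempotent producers and transactions).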
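With at-least-once delivery, a consumer has to tolerate duplicates. A common approach is to make processing idempotent by remembering which records have already been handled. The sketch below keys each record on its shard and sequence number, assumed here to identify a record uniquely; the `shard_id` and `sequence_number` field names and the in-memory set are simplifying assumptions, since a real consumer would keep this state in durable storage.

```python
from typing import Callable, Hashable, Iterable, Set


class IdempotentProcessor:
    """Skips records that were already handled, so redelivered duplicates have no extra effect."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        self._seen: Set[Hashable] = set()  # in-memory only; a real consumer would persist this

    def process(self, records: Iterable[dict]) -> int:
        """Handle new records and return how many were actually processed."""
        processed = 0
        for record in records:
            key = (record["shard_id"], record["sequence_number"])
            if key in self._seen:
                continue  # duplicate redelivery: skip it
            self._handle(record)
            # Marked only after handling; a crash in between could still cause one repeat.
            self._seen.add(key)
            processed += 1
        return processed
```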
Amazon MSK integrates with AWS Glue, Kinesis Data Analytics, and AWS Lambda. For example, Glue can run an Apache Spark streaming job that reads from an MSK cluster, whereas Kinesis Data Analytics runs Apache Flink applications against the cluster.
Amazon Redshift is a fast, petabyte-scale, SQL-based data warehouse that makes it easy to analyze data. It is also commonly used in large-scale data migrations.
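Bulk loads, such as those in a data migration, are typically done with Redshift's `COPY` command, which reads files in parallel from Amazon S3. The sketch below only builds such a statement as a string; the table name, S3 prefix, and IAM role ARN are hypothetical, and the identifier check and quote escaping are a minimal guard rather than a complete SQL sanitizer. The statement could then be run from any SQL client connected to the cluster.

```python
import re

# Unquoted identifier, optionally schema-qualified (e.g. "sales.orders").
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
_FORMATS = {"PARQUET", "ORC", "CSV"}


def _quote(literal: str) -> str:
    """Wrap a value in single quotes, doubling any embedded quotes."""
    return "'" + literal.replace("'", "''") + "'"


def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str, file_format: str = "PARQUET") -> str:
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("COPY source must be an s3:// URI")
    fmt = file_format.upper()
    if fmt not in _FORMATS:
        raise ValueError(f"unsupported format: {file_format!r}")
    return f"COPY {table} FROM {_quote(s3_prefix)} IAM_ROLE {_quote(iam_role_arn)} FORMAT AS {fmt};"
```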
The goal of Amazon Redshift is to let a team build an entire business intelligence solution over a weekend. Whenever data in a data lake needs to be analyzed, Amazon Redshift can be used to analyze it.
AWS Glue is a serverless extract, transform, and load service used to categorize data and move data between various data stores and streams.
In a data-centric world, we need ways to centralize data and make it easy to access for analysis and business intelligence. AWS Glue addresses this by merging data from multiple sources into a single data store, which makes that data easy to access and analyze.
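Conceptually, the merge that AWS Glue performs is an extract, transform, and load sequence: read each source, map it onto a common schema, and write the result into one store. The plain-Python sketch below illustrates only that idea with two hypothetical order sources (a web store and a point-of-sale system, with invented field names); it does not use Glue's APIs, where such a job would often be written in PySpark.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def from_web_store(rows: Iterable[Row]) -> List[Row]:
    """Transform hypothetical web-store rows {orderId, email, amountCents} to the common schema."""
    return [
        {"order_id": str(r["orderId"]), "email": str(r["email"]).lower(), "amount": int(r["amountCents"]) / 100}
        for r in rows
    ]


def from_point_of_sale(rows: Iterable[Row]) -> List[Row]:
    """Transform hypothetical POS rows {id, customer_email, total-in-dollars} to the common schema."""
    return [
        {"order_id": f"pos-{r['id']}", "email": str(r["customer_email"]).lower(), "amount": float(r["total"])}
        for r in rows
    ]


def load(store: Dict[str, Row], rows: Iterable[Row]) -> None:
    """Load: upsert rows into the single store, keyed by order_id."""
    for row in rows:
        store[str(row["order_id"])] = row


def run_etl(web_rows: Iterable[Row], pos_rows: Iterable[Row]) -> Dict[str, Row]:
    store: Dict[str, Row] = {}
    load(store, from_web_store(web_rows))
    load(store, from_point_of_sale(pos_rows))
    return store
```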
AWS Lake Formation is a managed service that allows you to create, manage, and access data lakes. A data lake is a repository that stores all data in its original form and is used for analysis.
Fanatics uses Amazon Simple Storage Service (Amazon S3) to provide secure, durable, and highly scalable storage for its analytical data. Using the Amazon S3 web service interface, the Fanatics data science team can easily store and quickly retrieve any amount of data. Taking advantage of its new AWS data lake solution, Fanatics is now able to analyze the huge volumes of data from its transactional, e-commerce, and back-office systems, and make this data available to its data scientists immediately for analytics.