Which AWS service provides the ability to quickly run one-time queries on data in Amazon S3?

Amazon S3

Object storage built to retrieve any amount of data from anywhere

5 GB of S3 standard storage

How it works

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.


Use cases

Build a data lake

Back up and restore critical data

Archive data at the lowest cost

Run cloud-native applications


How to get started

Find out how Amazon S3 works

Learn more about analytics, data management, query in place, storage classes, security, and more.


Sign up for a free account

Pay nothing or try for free while learning the fundamentals and building on AWS.


Connect with an expert

From development to enterprise-level programs, get the right support at the right time.


Q: How do I create tables and schemas for my data on S3?

Athena uses Apache Hive DDL to define tables. You can run DDL statements using the Athena console, with an ODBC or JDBC driver, through the API, or using the Athena create table wizard. If you use the Data Catalog with Athena, you can also use AWS Glue crawlers to automatically infer schemas and partitions. An AWS Glue crawler connects to a data store, progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Data Catalog with this metadata. Crawlers can run periodically to detect the availability of new data and changes to existing data, including table definition changes. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions. You can customize AWS Glue crawlers to classify your own file types.

When you create a new table schema in Athena, the schema is stored in the Data Catalog and used when running queries, but it does not modify your data in S3. Athena uses an approach known as schema-on-read, which projects your schema onto your data at the time you run a query. This reduces the need for data loading or transformation.
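As a rough illustration of defining a table over data that already sits in S3, the sketch below builds a Hive-style CREATE EXTERNAL TABLE statement as a string. The helper name `create_table_ddl`, the database and table names, and the bucket path are all hypothetical, and the comma-delimited row format is an assumption about the data. The statement only describes the files; running it would not change them.

```python
def create_table_ddl(table, columns, location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for comma-delimited text.

    table:    qualified table name, e.g. "logs_db.requests" (hypothetical)
    columns:  list of (column_name, hive_type) pairs
    location: S3 prefix that already holds the data files (hypothetical)
    """
    column_sql = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


ddl = create_table_ddl(
    "logs_db.requests",
    [("request_id", "string"), ("bytes_sent", "int")],
    "s3://example-bucket/requests/",
)
print(ddl)
```

The resulting text could be pasted into the Athena console or submitted through a driver or the API, as described above.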

Q: Which data formats does Athena support?

Athena supports various data formats, such as CSV, TSV, JSON, and plain text files, as well as open-source columnar formats, such as ORC and Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce your costs by compressing, partitioning, and using columnar formats.

Q: Which kinds of data types does Athena support?

Athena supports both simple data types, such as INTEGER, DOUBLE, and VARCHAR, and complex data types, such as MAP, ARRAY, and STRUCT.
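To make the complex types more concrete, here is a small sketch that composes Hive-style type strings for use in a column list. The helper names (`array_of`, `map_of`, `struct_of`) and the example columns are hypothetical; the angle-bracket syntax follows the Hive DDL convention for ARRAY, MAP, and STRUCT.

```python
def array_of(element_type):
    """Hive type string for an array of element_type."""
    return f"array<{element_type}>"


def map_of(key_type, value_type):
    """Hive type string for a map from key_type to value_type."""
    return f"map<{key_type},{value_type}>"


def struct_of(fields):
    """Hive type string for a struct; fields is a list of (name, type) pairs."""
    return "struct<" + ",".join(f"{name}:{dtype}" for name, dtype in fields) + ">"


# Hypothetical columns mixing simple and complex types.
columns = [
    ("user_id", "int"),
    ("tags", array_of("string")),
    ("attributes", map_of("string", "string")),
    ("address", struct_of([("city", "string"), ("zip", "string")])),
]

column_list = ",\n".join(f"`{name}` {dtype}" for name, dtype in columns)
print(column_list)
```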

Q: Can I run any Hive Query on Athena?

Athena uses Hive only for DDL statements, that is, for creating, modifying, and deleting tables and partitions. For a complete list of supported statements, review the Amazon Athena User Guide: DDL statements. Athena uses Presto when you run SQL queries on S3. You can run ANSI-compliant SQL SELECT statements to query your data on S3.

Q: What is a SerDe?

SerDe stands for Serializer/Deserializer. SerDes are libraries that tell Hive how to interpret data formats. Hive DDL statements require you to specify a SerDe so that the system knows how to interpret the data that you’re pointing to. Athena uses SerDes to interpret the data read from S3. The concept of SerDes in Athena is the same as the concept used in Hive. Amazon Athena supports the following SerDes:

  • Apache Web Logs: "org.apache.hadoop.hive.serde2.RegexSerDe"
  • CSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • TSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • Custom Delimiters: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
  • Parquet: "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
  • ORC: "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
  • JSON: "org.apache.hive.hcatalog.data.JsonSerDe" or "org.openx.data.jsonserde.JsonSerDe"
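The list above can be turned into a small lookup that produces the ROW FORMAT SERDE clause of a Hive DDL statement. This is only a sketch: the dictionary keys and the function name `row_format_clause` are hypothetical, and for JSON it arbitrarily picks one of the two listed libraries.

```python
# SerDe class names copied from the list above; the keys are hypothetical labels.
SERDES = {
    "apache_web_logs": "org.apache.hadoop.hive.serde2.RegexSerDe",
    "csv": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "tsv": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "custom_delimiters": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "parquet": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    "orc": "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
    # Two JSON SerDes are listed; this sketch arbitrarily uses the OpenX one.
    "json": "org.openx.data.jsonserde.JsonSerDe",
}


def row_format_clause(data_format):
    """Return the ROW FORMAT SERDE clause for a listed data format."""
    try:
        serde = SERDES[data_format.lower()]
    except KeyError:
        raise ValueError(f"No SerDe listed for format: {data_format}") from None
    return f"ROW FORMAT SERDE '{serde}'"


print(row_format_clause("Parquet"))
```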

Q: Can I add my own SerDe to Athena?

Currently, you cannot add your own SerDe to Athena. We appreciate your feedback, so if there are any SerDes that you would like to see added, contact the Athena team.

Q: If I created Parquet/ORC files using Spark/Hive, will I be able to query them in Athena?

Yes, Parquet and ORC files created with Spark can be read in Athena.

Q: If I have data from Amazon Kinesis Data Firehose, how can I query it using Athena?

If your Kinesis Data Firehose data is stored in S3, you can query it using Athena. Create a schema for your data in Athena and start querying. We recommend that you organize the data into partitions to enhance performance. You can add partitions created by Data Firehose using ALTER TABLE DDL statements.

Q: Does Athena support data partitioning?

Yes. You can partition your data on any column with Athena. Partitions allow you to limit the amount of data that each query scans, leading to cost savings and faster performance. You can specify your partitioning scheme using the PARTITIONED BY clause in the CREATE TABLE statement.
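A minimal sketch of a CREATE TABLE statement with a PARTITIONED BY clause follows. The function name, table name, columns, and S3 path are hypothetical, and Parquet storage is an assumption made for the example. In Hive DDL, partition columns are declared only in PARTITIONED BY and not repeated in the main column list, which the helper checks.

```python
def partitioned_table_ddl(table, columns, partition_columns, location):
    """Build a CREATE EXTERNAL TABLE statement with a PARTITIONED BY clause."""
    overlap = {name for name, _ in columns} & {name for name, _ in partition_columns}
    if overlap:
        # Hive DDL rejects a partition column that also appears as a data column.
        raise ValueError(f"Columns declared twice: {sorted(overlap)}")
    cols = ", ".join(f"`{name}` {dtype}" for name, dtype in columns)
    parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols})\n"
        f"PARTITIONED BY ({parts})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


partitioned_ddl = partitioned_table_ddl(
    "logs_db.requests",
    [("request_id", "string"), ("bytes_sent", "int")],
    [("dt", "string")],
    "s3://example-bucket/requests/",
)
print(partitioned_ddl)
```

A query that filters on `dt` would then only need to scan the matching partitions.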

Q: How do I add new data to an existing table in Athena?

If your data is partitioned, you will need to run a metadata query (ALTER TABLE ADD PARTITION) to add the partition to Athena after new data becomes available on S3. If your data is not partitioned, adding the new data (or files) to the existing prefix automatically adds the data to Athena.
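The metadata query mentioned above can be generated for each new batch of data. The sketch below assumes a hypothetical table partitioned by a `dt` column, with each day's files stored under a `dt=YYYY-MM-DD/` prefix; the function names and bucket path are made up for illustration.

```python
from datetime import date, timedelta


def add_partition_ddl(table, day, base_location):
    """Build an ALTER TABLE ADD PARTITION statement for one day of data."""
    dt = day.isoformat()
    prefix = base_location.rstrip("/")
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{dt}') "
        f"LOCATION '{prefix}/dt={dt}/'"
    )


def daily_partition_statements(table, first_day, days, base_location):
    """One statement per day, starting at first_day."""
    return [
        add_partition_ddl(table, first_day + timedelta(days=offset), base_location)
        for offset in range(days)
    ]


statements = daily_partition_statements(
    "logs_db.requests", date(2024, 1, 30), 3, "s3://example-bucket/requests"
)
for statement in statements:
    print(statement)
```

IF NOT EXISTS makes the statements safe to re-run if a partition was already registered.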

Q: If I already have large quantities of log data on S3, can I use Athena to query it?

Yes, Athena streamlines the running of standard SQL queries on your existing log data. Athena queries data directly from S3, so there’s no data movement or loading required. Define your schema using DDL statements and start querying your data right away.

Which AWS service enables you to swiftly conduct one-time queries on Amazon S3 data?

Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor.

Which AWS service enables conventional SQL queries against stored datasets straight from Amazon S3?

Amazon Athena uses Presto, a low-latency, interactive SQL query engine. You can run ANSI SQL queries against large Amazon S3 datasets, with support for large joins, window functions, and arrays. Athena supports CSV, JSON, ORC, Avro, and Parquet.
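Besides the query editor, a query can be submitted programmatically. The sketch below assumes the boto3 SDK for Python (a third-party package, not mentioned in the text above) and its Athena `start_query_execution` call; the database, table, and results bucket are hypothetical. Building the request is kept separate from sending it, so the request can be inspected without AWS credentials.

```python
def build_query_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location (hypothetical bucket).
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query and return its execution ID.

    Needs boto3 installed and AWS credentials configured; not called here.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


request = build_query_request(
    "SELECT request_id, bytes_sent FROM requests WHERE dt = '2024-01-30' LIMIT 10",
    "logs_db",
    "s3://example-bucket/athena-results/",
)
print(request)
```

The query runs asynchronously, so a caller would typically poll for completion before reading the results from the output location.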

Which AWS service provides a quick and automated way to create and manage AWS accounts?

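AWS Organizations lets you create new AWS accounts programmatically and manage them centrally. By contrast, AWS CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all Regions and accounts.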

Which AWS service acts as an extract, transform, and load (ETL) tool that makes it easy to prepare data for analytics?

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives.
