Amazon S3
Object storage built to retrieve any amount of data from anywhere
5 GB of S3 standard storage
How it works
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.
Use cases
Build a data lake
Back up and restore critical data
Archive data at the lowest cost
Run cloud-native applications
Customers
How to get started
Find out how Amazon S3 works
Learn more about analytics, data management, query in place, storage classes, security, and more.
Explore Amazon S3 features »
Sign up for a free account
Pay nothing or try for free while learning the fundamentals and building on AWS.
Try the AWS Free Tier »
Connect with an expert
From development to enterprise-level programs, get the right support at the right time.
Explore support options »
Q: How do I create tables and schemas for my data on S3?
Athena uses Apache Hive DDL to define tables. You can run DDL statements using the Athena console, with an ODBC or JDBC driver, through the API, or using the Athena create table wizard. If you use the Data Catalog with Athena, you can also use AWS Glue crawlers to automatically infer schemas and partitions. An AWS Glue crawler connects to a data store, progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Data Catalog with this metadata. Crawlers can run periodically to detect the availability of new data and changes to existing data, including table definition changes. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions. You can customize AWS Glue crawlers to classify your own file types.
When you create a new table schema in Athena, the schema is stored in the Data Catalog and used when running queries, but it does not modify your data in S3. Athena uses an approach known as schema-on-read, which allows you to project your schema onto your data when you run a query. This decreases the need for any data loading or transformation. Learn more about creating tables.
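As a rough illustration of what such a table definition looks like, the following Python sketch assembles a Hive-style CREATE EXTERNAL TABLE statement as a string. The helper name create_table_ddl, the table web_logs, its columns, and the bucket s3://example-bucket/logs/ are all hypothetical and chosen only for this example. The sketch only builds the statement text; you would still submit it through the console, a driver, or the API.

```python
def create_table_ddl(table, columns, location, delimiter=","):
    """Build a Hive-style CREATE EXTERNAL TABLE statement (illustrative helper)."""
    # Each column is rendered as `name` TYPE on its own line.
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"  FIELDS TERMINATED BY '{delimiter}'\n"
        # LOCATION points at data already in S3; the files are not modified.
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = create_table_ddl(
    "web_logs",
    [("request_time", "STRING"), ("status", "INT"), ("bytes_sent", "BIGINT")],
    "s3://example-bucket/logs/",
)
print(ddl)
```

Because the schema is applied when a query runs, dropping and recreating a table like this changes only the metadata, not the objects under the S3 prefix.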
Q: Which data formats does Athena support?
Athena supports data formats such as CSV, TSV, JSON, and text files, as well as open-source columnar formats such as ORC and Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce your costs by compressing, partitioning, and using columnar formats.
Q: Which kinds of data types does Athena support?
Athena supports both simple data types, such as INTEGER, DOUBLE, and VARCHAR, and complex data types, such as MAP, ARRAY, and STRUCT.
Q: Can I run any Hive query on Athena?
Athena uses Hive only for DDL, that is, for creating, modifying, and deleting tables and partitions. For a complete list of supported statements, review the Amazon Athena User Guide: DDL statements. Athena uses Presto when you run SQL queries on S3. You can run ANSI-compliant SQL SELECT statements to query your data on S3.
Q: What is a SerDe?
SerDe stands for Serializer/Deserializer. SerDes are libraries that tell Hive how to interpret data formats. Hive DDL statements require you to specify a SerDe so that the system knows how to interpret the data that you’re pointing to. Athena uses SerDes to interpret the data read from S3. The concept of SerDes in Athena is the same as the concept used in Hive. Amazon Athena supports the following SerDes:
- Apache Web Logs: "org.apache.hadoop.hive.serde2.RegexSerDe"
- CSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
- TSV: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
- Custom Delimiters: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
- Parquet: "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
- ORC: "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
- JSON: "org.apache.hive.hcatalog.data.JsonSerDe" or "org.openx.data.jsonserde.JsonSerDe"
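One way to keep these class names straight when generating DDL is a small lookup table. The Python sketch below copies the class names from the list above; the dictionary SERDES, the format keys, and the helper row_format_clause are hypothetical names introduced for this example, and for JSON it arbitrarily picks the first of the two listed classes.

```python
# SerDe class names as listed above; the dictionary keys are illustrative labels.
SERDES = {
    "apache_web_logs": "org.apache.hadoop.hive.serde2.RegexSerDe",
    "csv": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "tsv": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "custom_delimiters": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "parquet": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    "orc": "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
    # Two JSON SerDes are listed; this sketch uses the first one.
    "json": "org.apache.hive.hcatalog.data.JsonSerDe",
}


def row_format_clause(data_format):
    """Return the ROW FORMAT SERDE clause for a listed format (illustrative helper)."""
    try:
        serde = SERDES[data_format.lower()]
    except KeyError:
        raise ValueError(f"no SerDe listed for format {data_format!r}") from None
    return f"ROW FORMAT SERDE '{serde}'"


print(row_format_clause("Parquet"))
```

The returned clause would sit between the column list and the LOCATION clause of a CREATE EXTERNAL TABLE statement.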
Q: Can I add my own SerDe to Athena?
Currently, you cannot add your own SerDe to Athena. We appreciate your feedback, so if there are any SerDes that you would like to see added, contact the Athena team.
Q: If I created Parquet/ORC files using Spark/Hive, will I be able to query them in Athena?
Yes, Parquet and ORC files created with Spark or Hive can be read in Athena.
Q: If I have data from Amazon Kinesis Data Firehose, how can I query it using Athena?
If your Kinesis Data Firehose data is stored in S3, you can query it using Athena. Create a schema for your data in Athena and start querying. We recommend that you organize the data into partitions to enhance performance. You can add partitions created by Data Firehose using ALTER TABLE DDL statements. Learn more about partitioning data.
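The following Python sketch shows one way to generate such an ALTER TABLE statement for a single hour of delivered data. It assumes the delivery stream uses the time-based YYYY/MM/DD/HH (UTC) key layout and that the table is partitioned by columns named year, month, day, and hour. Those column names, the helper firehose_partition_ddl, the table name, and the bucket are hypothetical; adjust them to match your own prefix configuration.

```python
from datetime import datetime, timezone


def firehose_partition_ddl(table, base_location, when):
    """Build ALTER TABLE ... ADD PARTITION for one hour of data (illustrative helper)."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = when.astimezone(timezone.utc)  # assumed layout uses UTC timestamps
    # Assumed object layout: <base>/YYYY/MM/DD/HH/
    location = f"{base_location.rstrip('/')}/{utc:%Y/%m/%d/%H}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS\n"
        f"  PARTITION (year='{utc:%Y}', month='{utc:%m}', "
        f"day='{utc:%d}', hour='{utc:%H}')\n"
        f"  LOCATION '{location}'"
    )


# Hypothetical table and bucket names.
stmt = firehose_partition_ddl(
    "clickstream",
    "s3://example-bucket/firehose/",
    datetime(2024, 3, 5, 14, tzinfo=timezone.utc),
)
print(stmt)
```

Because this layout does not use key=value folder names, the LOCATION clause is what ties each partition to its S3 prefix.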
Q: Does Athena support data partitioning?
Yes. You can partition your data on any column with Athena. Partitions allow you to limit the amount of data that each query scans, leading to cost savings and faster performance. You can specify your partitioning scheme using the PARTITIONED BY clause in the CREATE TABLE statement. Learn more about partitioning data.
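As a minimal sketch of the PARTITIONED BY clause, the Python helper below builds a CREATE EXTERNAL TABLE statement for a Parquet table. The helper name partitioned_table_ddl, the table sales, its columns, the partition column dt, and the bucket are hypothetical. The check reflects the Hive rule that partition columns are declared only in PARTITIONED BY, not repeated in the main column list.

```python
def partitioned_table_ddl(table, columns, partition_columns, location):
    """Build CREATE EXTERNAL TABLE with a PARTITIONED BY clause (illustrative helper)."""
    overlap = {n for n, _ in columns} & {n for n, _ in partition_columns}
    if overlap:
        # Partition columns must not also appear in the regular column list.
        raise ValueError(f"columns declared twice: {sorted(overlap)}")

    def render(cols):
        return ", ".join(f"`{name}` {dtype}" for name, dtype in cols)

    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({render(columns)})\n"
        f"PARTITIONED BY ({render(partition_columns)})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket.
partitioned = partitioned_table_ddl(
    "sales",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    [("dt", "STRING")],
    "s3://example-bucket/sales/",
)
print(partitioned)
```

A query that filters on the partition column, for example WHERE dt = '2024-03-05', then needs to scan only the objects under that partition's prefix.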
Q: How do I add new data to an existing table in Athena?
If your data is partitioned, you will need to run a metadata query (ALTER TABLE ADD PARTITION) to add the partition to Athena after new data becomes available on S3. If your data is not partitioned, adding the new data (or files) to the existing prefix automatically adds the data to Athena. Learn more about partitioning data.
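When several partitions arrive at once, they can be registered in a single statement. The Python sketch below builds one ALTER TABLE ADD PARTITION statement covering multiple values, assuming a single partition column and Hive-style key=value folder names under the table location. The helper add_partitions_ddl, the table sales, the column dt, and the bucket are hypothetical.

```python
def add_partitions_ddl(table, table_location, partition_key, values):
    """Build one ALTER TABLE statement that adds several partitions (illustrative helper)."""
    if not values:
        raise ValueError("at least one partition value is required")
    base = table_location.rstrip("/")
    # One PARTITION ... LOCATION clause per value, assuming key=value folders.
    clauses = [
        f"  PARTITION ({partition_key}='{value}') "
        f"LOCATION '{base}/{partition_key}={value}/'"
        for value in values
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


# Hypothetical table, bucket, and partition values.
batch = add_partitions_ddl(
    "sales", "s3://example-bucket/sales/", "dt", ["2024-03-05", "2024-03-06"]
)
print(batch)
```

IF NOT EXISTS makes the statement safe to rerun, which is convenient when the same job registers partitions on a schedule.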
Q: If I already have large quantities of log data on S3, can I use Athena to query it?
Yes, Athena streamlines the running of standard SQL queries on your existing log data. Athena queries data directly from S3, so there’s no data movement or loading required. Define your schema using DDL statements and start querying your data right away.