This document provides basic guidelines for configuration properties and cluster architecture considerations related to performance tuning of an Apache Druid deployment. Please note that this document provides general guidelines and rules-of-thumb: these are not absolute, universal rules for cluster tuning, and this introductory guide is not an exhaustive description of all Druid tuning properties, which are described in the
configuration reference. If you have questions about tuning Druid for specific use cases, or about configuration properties not covered in this guide, please ask on the Druid user mailing list or other community channels.

## Historical

### Heap sizing

The biggest contributions to heap usage on Historicals are:

- Partial unmerged query results from segments
- The stored maps for lookups

A general rule-of-thumb for sizing the Historical heap is `(0.5GiB * number of CPU cores)`, with an upper limit of ~24GiB.

This rule-of-thumb scales using the number of CPU cores as a convenient proxy for hardware size and level of concurrency (note: this formula is not a hard rule for sizing Historical heaps).

Having a heap that is too large can result in excessively long GC collection pauses; the ~24GiB upper limit is imposed to avoid this.

If caching is enabled on Historicals, the cache is stored on heap, sized by `druid.cache.sizeInBytes`.

Running out of heap on the Historicals can indicate misconfiguration or usage patterns that are overloading the cluster.

#### Lookups

If you are using lookups, calculate the total size of the lookup maps being loaded. Druid performs an atomic swap when updating lookup maps (both the old map and the new map will exist in heap during the swap), so the maximum potential heap usage from lookup maps will be (2 * total size of all loaded lookups). Be sure to add `(2 * total size of all loaded lookups)` to your heap size on top of the base guideline.

### Processing threads and buffers

Please see the General Guidelines for Processing Threads and Buffers section for an overview of processing thread/buffer configuration.

### Direct memory sizing

The processing and merge buffers described above are direct memory buffers. When a Historical processes a query, it must open a set of segments for reading. This also requires some direct memory space, described in segment decompression buffers.

A formula for estimating direct memory usage follows:

(`druid.processing.numThreads` + `druid.processing.numMergeBuffers` + 1) * `druid.processing.buffer.sizeBytes`

The `+ 1` factor is a rough allowance for the segment decompression buffers.

### Connection pool sizing

Please see the General Connection Pool Guidelines section for an overview of connection pool configuration.

For Historicals, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries is a reasonable starting point.

### Segment cache size

For better query performance, do not allocate segment data to a Historical in excess of the system free memory. Druid memory-maps segment files, so any available free system memory effectively acts as a cache for segment data.

### Number of Historicals

The number of Historicals needed in a cluster depends on how much data the cluster has. For good performance, you will want enough Historicals such that each Historical has a good (`free system memory` / total size of assigned segment data) ratio.

Having a smaller number of big servers is generally better than having a large number of small servers, as long as you have enough fault tolerance for your use case.

### SSD storage

We recommend using SSDs for storage on the Historicals, as they handle segment data stored on disk.

### Total memory usage

To estimate total memory usage of the Historical under these guidelines:

- Heap: `(0.5GiB * number of CPU cores) + (2 * total size of lookup maps) + druid.cache.sizeInBytes`
- Direct memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
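The rules of thumb above can be combined into a small calculator. This is an illustrative sketch, not an official tool: the function names are invented here, and the example figures (16 cores, 500MiB buffers, 1GiB of lookups) are assumptions.

```python
# Rough sizing sketch for a single Historical. Property names in the comments
# mirror Druid's runtime.properties keys; all numbers are illustrative.

GiB = 1024 ** 3
MiB = 1024 ** 2

def historical_heap_bytes(num_cores, total_lookup_bytes=0, cache_bytes=0):
    # Rule of thumb: 0.5 GiB per CPU core, capped at ~24 GiB to avoid long GC
    # pauses, plus twice the lookup size (old and new maps coexist during the
    # atomic swap), plus any on-heap cache (druid.cache.sizeInBytes).
    base = min(num_cores * GiB // 2, 24 * GiB)
    return base + 2 * total_lookup_bytes + cache_bytes

def historical_direct_bytes(num_threads, num_merge_buffers, buffer_size_bytes):
    # (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)
    # * druid.processing.buffer.sizeBytes; the +1 is a rough allowance for
    # segment decompression buffers.
    return (num_threads + num_merge_buffers + 1) * buffer_size_bytes

# Example: 16 cores, 15 processing threads, 4 merge buffers, 500 MiB buffers,
# and 1 GiB of loaded lookup maps.
print(historical_heap_bytes(16, total_lookup_bytes=GiB) / GiB)  # 10.0
print(historical_direct_bytes(15, 4, 500 * MiB) / GiB)          # 9.765625
```

The heap and direct-memory figures are additive: the remaining free system memory on the machine is what the OS can use for memory-mapping segments.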
The Historical will use any available free system memory (i.e., memory not used by the Historical JVM heap/direct memory buffers or by other processes on the system) for memory-mapping of segments on disk. For better query performance, you will want to ensure a good (`free system memory` / total size of assigned segment data) ratio.

### Segment sizes matter

Be sure to check out segment size optimization to help tune your Historical processes for maximum performance.

## Broker

### Heap sizing

The biggest contributions to heap usage on Brokers are:

- Partial unmerged query results from Historicals and Tasks
- The segment timeline: location information (which Historical or Task is serving a segment) for all currently available segments
- Cached segment metadata for all currently available segments
The Broker heap requirements scale based on the number of segments in the cluster and the total data size of the segments. The heap size will vary based on data size and usage patterns, but 4GiB to 8GiB is a good starting point for a small or medium cluster (~15 servers or less). For a rough estimate of memory requirements on the high end, very large clusters with a node count on the order of ~100 nodes may need Broker heaps of 30GiB-60GiB.

If caching is enabled on the Broker, the cache is stored on heap, sized by `druid.cache.sizeInBytes`.

### Direct memory sizing

On the Broker, the amount of direct memory needed depends on how many merge buffers (used for merging GroupBys) are configured. The Broker does not generally need processing threads or processing buffers, as query results are merged on-heap in the HTTP connection threads instead.
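Under the description above, the Broker's direct-memory need can be sketched as the Historical formula with the processing-thread term dropped. This is an estimate under that assumption, not an official formula, and the figures are illustrative.

```python
MiB = 1024 ** 2

def broker_direct_bytes(num_merge_buffers, buffer_size_bytes):
    # Only merge buffers (druid.processing.numMergeBuffers) plus one extra
    # buffer are counted; the Broker merges most results on-heap, so no
    # per-processing-thread buffers appear in this estimate.
    return (num_merge_buffers + 1) * buffer_size_bytes

# Example: 6 merge buffers of 500 MiB each.
print(broker_direct_bytes(6, 500 * MiB) // MiB)  # 3500
```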
### Connection pool sizing

Please see the General Connection Pool Guidelines section for an overview of connection pool configuration.

On the Brokers, please ensure that the sum of `druid.broker.http.numConnections` across all the Brokers is slightly lower than the value of `druid.server.http.numThreads` on your Historicals and Tasks.

Tuning the cluster so that each Historical can accept 50 queries and 10 non-queries, adjusting the Brokers accordingly, is a reasonable starting point.

### Broker backpressure

When retrieving query results from Historical processes or Tasks, the Broker can optionally specify a maximum buffer size for queued, unread data, and exert backpressure on the channel to the Historical or Tasks when the limit is reached (causing writes to the channel to block on the Historical/Task side until the Broker is able to drain some data from the channel).

This buffer size is controlled by the `druid.broker.http.maxQueuedBytes` setting. The limit is divided across the number of Historicals/Tasks that a query would hit: for example, with `druid.broker.http.maxQueuedBytes` set to 40MiB and a query that fans out to 40 Historicals/Tasks, each per-channel buffer would be 1MiB. You can generally set this to a value of approximately `2MiB * number of Historicals in cluster`.

### Number of Brokers

A 1:15 ratio of Brokers to Historicals is a reasonable starting point (this is not a hard rule). If you need Broker HA, you can deploy 2 initially and then use the 1:15 ratio guideline for additional Brokers.

### Total memory usage

To estimate total memory usage of the Broker under these guidelines:

- Heap: allocated heap size
- Direct memory: `(druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`
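The backpressure arithmetic above can be sketched as follows. The function names are invented for illustration; the ~2MiB-per-Historical guideline is the one quoted in the text.

```python
MiB = 1024 ** 2

def per_channel_buffer_bytes(max_queued_bytes, num_servers_hit):
    # druid.broker.http.maxQueuedBytes is divided across every
    # Historical/Task that a query fans out to.
    return max_queued_bytes // num_servers_hit

def suggested_max_queued_bytes(num_historicals):
    # Guideline from the text: approximately 2 MiB per Historical in the cluster.
    return 2 * MiB * num_historicals

# A query fanning out to all 30 Historicals of a 30-Historical cluster:
print(per_channel_buffer_bytes(suggested_max_queued_bytes(30), 30) // MiB)  # 2
```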
## MiddleManager

The MiddleManager is a lightweight task controller/manager that launches Task processes, which perform ingestion work.

### MiddleManager heap sizing

The MiddleManager itself does not require many resources; you can generally set the heap to ~128MiB.

### SSD storage

We recommend using SSDs for storage on the MiddleManagers, as the Tasks launched by MiddleManagers handle segment data stored on disk.

### Task count

The number of tasks a MiddleManager can launch is controlled by the `druid.worker.capacity` setting.

The number of workers needed in your cluster depends on how many concurrent ingestion tasks you need to run for your use cases. The number of workers that can be launched on a given machine depends on the size of resources allocated per worker and available system resources.

You can allocate more MiddleManager machines to your cluster to add task capacity.

### Task configurations

The following section describes configuration for Tasks launched by the MiddleManager. Tasks can be queried and perform ingestion workloads, so they require more resources than the MiddleManager itself.

#### Task heap sizing

A 1GiB heap is usually enough for Tasks.

#### Lookups

If you are using lookups, calculate the total size of the lookup maps being loaded. Druid performs an atomic swap when updating lookup maps (both the old map and the new map will exist in heap during the swap), so the maximum potential heap usage from lookup maps will be (2 * total size of all loaded lookups). Be sure to add `(2 * total size of all loaded lookups)` to the Task heap size.

#### Task processing threads and buffers

For Tasks, 1 or 2 processing threads are often enough, as Tasks tend to hold much less queryable data than Historical processes.
#### Direct memory sizing

The processing and merge buffers described above are direct memory buffers. When a Task processes a query, it must open a set of segments for reading. This also requires some direct memory space, described in segment decompression buffers. An ingestion Task also needs to merge partial ingestion results, which requires direct memory space, described in segment merging.

A formula for estimating direct memory usage follows:

(`druid.processing.numThreads` + `druid.processing.numMergeBuffers` + 1) * `druid.processing.buffer.sizeBytes`

The `+ 1` factor is a rough allowance for the segment decompression and merging buffers.

#### Connection pool sizing

Please see the General Connection Pool Guidelines section for an overview of connection pool configuration. For Tasks, `druid.server.http.numThreads` should be set to a value slightly higher than the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

Tuning the cluster so that each Task can accept 50 queries and 10 non-queries is a reasonable starting point.

#### Total memory usage

To estimate total memory usage of a Task under these guidelines:

- Heap: `1GiB + (2 * total size of lookup maps)`
- Direct memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`

The total memory usage of the MiddleManager + Tasks is approximately:

- `MiddleManager heap size + (druid.worker.capacity * total memory usage per Task)`
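As a sketch of the MiddleManager + Tasks arithmetic, under assumed example values (1GiB task heap, 2 threads, 2 merge buffers, 100MiB buffers, 4 worker slots, 128MiB MiddleManager heap):

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def task_total_bytes(heap_bytes, num_threads, num_merge_buffers, buffer_size_bytes):
    # Task heap plus the same direct-memory estimate used elsewhere:
    # (numThreads + numMergeBuffers + 1) * buffer.sizeBytes.
    return heap_bytes + (num_threads + num_merge_buffers + 1) * buffer_size_bytes

def middle_manager_total_bytes(mm_heap_bytes, worker_capacity, per_task_bytes):
    # druid.worker.capacity task slots, each potentially using per_task_bytes.
    return mm_heap_bytes + worker_capacity * per_task_bytes

per_task = task_total_bytes(1 * GiB, 2, 2, 100 * MiB)          # 1524 MiB
print(middle_manager_total_bytes(128 * MiB, 4, per_task) // MiB)  # 6224
```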
## Configuration guidelines for specific ingestion types

### Kafka/Kinesis ingestion

If you use the Kafka Indexing Service or Kinesis Indexing Service, the number of tasks required will depend on the number of partitions and your taskCount/replica settings.

On top of those requirements, allocating additional task slots in your cluster is a good idea, so that you have free task slots available for other tasks, such as compaction tasks.

### Hadoop ingestion

If you are only using Hadoop-based batch ingestion with no other ingestion types, you can lower the amount of resources allocated per Task. Batch ingestion tasks do not need to answer queries, and the bulk of the ingestion workload is executed on the Hadoop cluster, so the Tasks do not require many resources.

### Parallel native ingestion

If you are using parallel native batch ingestion, allocating more available task slots is a good idea, as it allows greater ingestion concurrency.

## Coordinator

The main performance-related setting on the Coordinator is the heap size. The heap requirements of the Coordinator scale with the number of servers, segments, and tasks in the cluster. You can set the Coordinator heap to the same size as your Broker heap, or slightly smaller: both services have to process cluster-wide state and answer API requests about this state.

### Dynamic Configuration
## Overlord

The main performance-related setting on the Overlord is the heap size. The heap requirements of the Overlord scale primarily with the number of running Tasks. The Overlord tends to require fewer resources than the Coordinator or Broker. You can generally set the Overlord heap to a value that's 25-50% of your Coordinator heap.

## Router

The Router has light resource requirements, as it proxies requests to Brokers without performing much computational work itself. You can assign it 256MiB heap as a starting point, growing it if needed.

## Guidelines for processing threads and buffers

### Processing threads

The `druid.processing.numThreads` configuration controls the size of the processing thread pool used for computing query results. The size of this pool limits how many queries can be processed concurrently.

### Processing buffers

`druid.processing.buffer.sizeBytes` controls the size of the off-heap buffers allocated to the processing threads.

One buffer is allocated for each processing thread. A size between 500MiB and 1GiB is a reasonable choice for general use.

The TopN and GroupBy queries use these buffers to store intermediate computed results. As the buffer size increases, more data can be processed in a single pass.

### GroupBy merging buffers

If you plan to issue GroupBy V2 queries, note that they use an additional pool of off-heap buffers for merging query results. These buffers have the same size as the processing buffers described above, set by `druid.processing.buffer.sizeBytes`; the number of merge buffers is controlled by `druid.processing.numMergeBuffers`.

Non-nested GroupBy V2 queries require 1 merge buffer per query, while a nested GroupBy V2 query requires 2 merge buffers (regardless of the depth of nesting). The number of merge buffers determines the number of GroupBy V2 queries that can be processed concurrently.

## Connection pool guidelines

Each Druid process has a configuration property for the number of HTTP connection handling threads, `druid.server.http.numThreads`. The number of HTTP server threads limits how many concurrent HTTP API requests a given process can handle.

### Sizing the connection pool for queries

The Broker has a setting, `druid.broker.http.numConnections`, that controls how many outgoing connections it can make to a given Historical or Task process. These connections are used to send queries to the Historicals or Tasks, with one connection per query; the value of `druid.broker.http.numConnections` is effectively a limit on the number of concurrent queries that a given Broker can process.

Suppose we have a cluster with 3 Brokers and `druid.broker.http.numConnections` set to 10. This means that each Broker in the cluster will open up to 10 connections to each individual Historical or Task (for a total of 30 incoming query connections per Historical/Task). On the Historical/Task side, this means that `druid.server.http.numThreads` must be set to a value at least as high as the sum of `druid.broker.http.numConnections` across all the Brokers in the cluster.

In practice, you will want to allocate additional server threads for non-query API requests such as status checks; adding 10 threads for those is a good general guideline. Using the example with 3 Brokers in the cluster and `druid.broker.http.numConnections` set to 10, this means each Historical/Task should have at least 40 server threads.

As a starting point, allowing for 50 concurrent queries (requests that read segment data from datasources) + 10 non-query requests (other requests like status checks) on Historicals and Tasks is reasonable (i.e., set `druid.server.http.numThreads` to 60).
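The connection-pool arithmetic above can be expressed compactly. The helper name is invented for illustration; the defaults match the guideline of 10 extra threads for non-query requests.

```python
def historical_http_threads(broker_connections, non_query_threads=10):
    # druid.server.http.numThreads on each Historical/Task should be at least
    # the sum of druid.broker.http.numConnections across all Brokers, plus
    # headroom for non-query API requests such as status checks.
    return sum(broker_connections) + non_query_threads

# 3 Brokers, each with druid.broker.http.numConnections = 10:
print(historical_http_threads([10, 10, 10]))  # 40
```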
## Per-segment direct memory buffers

### Segment decompression

When opening a segment for reading during segment merging or query processing, Druid allocates a 64KiB off-heap decompression buffer for each column being read. Thus, there is additional direct memory overhead of (64KiB * number of columns read per segment * number of segments read) when reading segments.

### Segment merging

In addition to the segment decompression overhead described above, when a set of segments are merged during ingestion, a direct buffer is allocated for every String typed column, for every segment in the set to be merged. The size of these buffers is equal to the cardinality of the String column within its segment, times 4 bytes (the buffers store integers).

For example, if two segments are being merged, the first segment having a single String column with cardinality 1000, and the second segment having a String column with cardinality 500, the merge step would allocate (1000 + 500) * 4 = 6000 bytes of direct memory. These buffers are used for merging the value dictionaries of the String column across segments. These "dictionary merging buffers" are independent of the "merge buffers" configured by `druid.processing.numMergeBuffers`.

## General recommendations

### JVM tuning

#### Garbage collection

We recommend using the G1GC garbage collector:
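For HotSpot-based JVMs, G1GC is enabled with the following JVM argument (on recent JVM versions G1 is already the default collector):

```
-XX:+UseG1GC
```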
Enabling process termination on out-of-memory errors is useful as well, since the process generally will not recover from such a state, and it's better to restart the process:
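On HotSpot-based JVMs, this behavior is enabled with:

```
-XX:+ExitOnOutOfMemoryError
```

A process supervisor (e.g., systemd) can then restart the process cleanly after it exits.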
Other generally useful JVM flags
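Commonly recommended flags include the following; the `java.io.tmpdir` path is a placeholder you must set to a non-volatile volume with sufficient free space:

```
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=<path other than /tmp, on a volume with enough free space>
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
```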
Additionally, for large JVM heaps, here are a few Garbage Collection efficiency guidelines that have been known to help in some cases.
### Use UTC timezone

We recommend using the UTC timezone for all your events and across your hosts, not just for Druid, but for all data infrastructure. This can greatly mitigate potential query problems with inconsistent timezones. To query in a non-UTC timezone, see query granularities.

## System configuration

### SSDs

SSDs are highly recommended for Historical, MiddleManager, and Indexer processes if you are not running a cluster that is entirely in memory. SSDs can greatly mitigate the time required to page data in and out of memory.

### JBOD vs RAID

Historical processes store a large number of segments on disk and support specifying multiple paths for storing them. Typically, hosts have multiple disks configured with RAID, which makes them look like a single disk to the OS. RAID can add overhead, especially if it is software-based rather than backed by a hardware controller, so Historicals may get better disk throughput with JBOD.

### Swap space

We recommend not using swap space for Historical, MiddleManager, and Indexer processes, since the large number of memory-mapped segment files can lead to poor and unpredictable performance.

### Linux limits

For Historical, MiddleManager, and Indexer processes (and for really large clusters, Broker processes), you might need to adjust some Linux system limits to account for a large number of open files, a large number of network connections, or a large number of memory-mapped files.

#### ulimit

The limit on the number of open files can be set permanently by editing `/etc/security/limits.conf`.

#### max_map_count

Historical processes, and to a lesser extent MiddleManager and Indexer processes, memory-map segment files, so depending on the number of segments per server, the `vm.max_map_count` kernel parameter might also need to be increased.