Production Checklist

This page provides important recommendations for production deployments of CockroachDB.

Topology

When planning your deployment, it's important to carefully review and choose the topology patterns that best meet your latency and resiliency requirements. This is especially crucial for multi-region deployments.

Also keep in mind some basic topology recommendations:

Do not run multiple node processes on the same VM or machine. This defeats CockroachDB's replication and causes the system to be a single point of failure. Instead, start each node on a separate VM or machine.
To start a node with multiple disks or SSDs, you can use either of these approaches:
- Configure the disks or SSDs as a single RAID volume, then pass the RAID volume to the --store flag when starting the cockroach process on the node.
- Provide a separate --store flag for each disk when starting the cockroach process on the node. For more details about stores, see Start a Node.
Warning:
If you start a node with multiple --store flags, it is not possible to scale back down to only using a single store on the node. Instead, you must decommission the node and start a new node with the updated --store.
When starting each node, use the --locality flag to describe the node's location, for example, --locality=region=west,zone=us-west-1. The key-value pairs should be ordered from most to least inclusive, and the keys and order of key-value pairs must be the same on all nodes.
When deploying in a single availability zone:
- To be able to tolerate the failure of any 1 node, use at least 3 nodes with the default 3-way replication factor. In this case, if 1 node fails, each range retains 2 of its 3 replicas, a majority.
- To be able to tolerate 2 simultaneous node failures, use at least 5 nodes and increase the default replication factor for user data to 5. The replication factor for important internal data is 5 by default, so no adjustments are needed for internal data. In this case, if 2 nodes fail at the same time, each range retains 3 of its 5 replicas, a majority.
When deploying across multiple availability zones:
- To be able to tolerate the failure of 1 entire AZ in a region, use at least 3 AZs per region and set --locality on each node to spread data evenly across regions and AZs. In this case, if 1 AZ goes offline, the 2 remaining AZs retain a majority of replicas.
- To ensure that ranges are split evenly across nodes, use the same number of nodes in each AZ. This is to avoid overloading any nodes with excessive resource consumption.
When deploying across multiple regions:
- To be able to tolerate the failure of 1 entire region, use at least 3 regions.
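
The majority arithmetic behind these recommendations can be checked with a short sketch. The function names below are illustrative and are not part of CockroachDB; the sketch only restates the "retains a majority" reasoning for a given replication factor.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that still forms a majority."""
    return replication_factor // 2 + 1


def failures_tolerated(replication_factor: int) -> int:
    """Replicas that can be lost while a majority of replicas survives."""
    return replication_factor - quorum_size(replication_factor)


if __name__ == "__main__":
    for rf in (3, 5):
        print(f"replication factor {rf}: majority is {quorum_size(rf)}, "
              f"tolerates {failures_tolerated(rf)} failure(s)")
```

With a replication factor of 3 the majority is 2 and 1 failure is tolerated; with a replication factor of 5 the majority is 3 and 2 failures are tolerated, matching the cases described above.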

For optimal cluster performance, Cockroach Labs recommends that all nodes use the same hardware and operating system.

Software

We recommend running a glibc-based Linux distribution and Linux kernel version from the last 5 years, such as Ubuntu, Red Hat Enterprise Linux (RHEL), CentOS, or Container-Optimized OS.

Hardware

Note:

In our sizing and production guidance, 1 vCPU is considered equivalent to 1 core in the underlying hardware platform.

Sizing

The size of your cluster corresponds to its total number of vCPUs. This number depends holistically on your application requirements: total storage capacity, SQL workload response time, SQL workload concurrency, and database service availability.

Working from your total vCPU count, you should then decide how many vCPUs to allocate to each machine. The larger the nodes (i.e., the more vCPUs on the machine), the fewer nodes will be in your cluster.

Carefully consider the following tradeoffs:

A smaller number of larger nodes emphasizes cluster stability.
- Larger nodes tolerate hot spots more effectively than smaller nodes.
- Queries operating on large data sets may strain network transfers if the data is spread widely over many smaller nodes. Having fewer and larger nodes enables more predictable workload performance.
- A cluster with fewer nodes may be easier to operate and maintain.
A larger number of smaller nodes emphasizes resiliency across failure scenarios.
- The loss of a small node during failure or routine maintenance has a lesser impact on workload response time and concurrency.
- Having more and smaller nodes allows backup and restore jobs to complete more quickly, since these jobs run in parallel and less data is hosted on each individual node.
- Recovery from a failed node is faster when data is spread across more nodes. A smaller node will also take a shorter time to rebalance to a steady state.

In general, distribute your total vCPUs into the largest possible nodes and smallest possible cluster that meets your fault tolerance goals.

For cluster stability, Cockroach Labs recommends a minimum of 8 vCPUs, and strongly recommends no fewer than 4 vCPUs per node. In a cluster with too few CPU resources, foreground client workloads will compete with the cluster's background maintenance tasks. For more information, see capacity planning issues.

Note:

Clusters deployed in CockroachDB Cloud can be created with a minimum of 2 vCPUs per node on AWS and GCP or 4 vCPUs per node on Azure.
Avoid "burstable" or "shared-core" virtual machines that limit the load on CPU resources.
Cockroach Labs does not extensively test clusters with more than 32 vCPUs per node. This is the recommended maximum threshold.
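
As a rough illustration of the "largest possible nodes, smallest possible cluster" guidance, the following sketch picks a node size for a given total vCPU count. The list of candidate sizes and the function name are assumptions made for this example; substitute the machine sizes your platform actually offers.

```python
import math

# Assumption: machine sizes available on the platform, largest first.
# The bounds follow the guidance above: no fewer than 4 vCPUs per node,
# and no more than 32 vCPUs per node.
CANDIDATE_SIZES = (32, 16, 8, 4)


def plan_nodes(total_vcpus: int, min_nodes: int) -> tuple[int, int]:
    """Return (vcpus_per_node, node_count).

    Picks the largest candidate size that still yields at least
    `min_nodes` nodes, where `min_nodes` comes from your fault
    tolerance goals (for example, 3 nodes to survive 1 node failure).
    """
    for size in CANDIDATE_SIZES:
        nodes = math.ceil(total_vcpus / size)
        if nodes >= min_nodes:
            return size, nodes
    # Not enough total vCPUs to fill `min_nodes` nodes even at the
    # smallest size: keep the node count and use the smallest size.
    return CANDIDATE_SIZES[-1], min_nodes
```

For example, under these assumptions `plan_nodes(48, 3)` returns `(16, 3)`: two 32-vCPU nodes would not meet a 3-node minimum, so the sketch falls back to three 16-vCPU nodes.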

Basic hardware recommendations

After you size your cluster, you can determine the amount of RAM, storage capacity, and disk I/O from the number of vCPUs.

This hardware guidance is meant to be platform agnostic and can apply to bare-metal, containerized, and orchestrated deployments. Also see our cloud-specific recommendations.

Value	Recommendation	Reference
RAM per vCPU	4 GiB	Memory
Capacity per vCPU	320 GiB	Storage
IOPS per vCPU	500	Disk I/O
MB/s per vCPU	30	Disk I/O
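
The per-vCPU ratios in this table can be turned into per-node targets with simple multiplication. The sketch below is one way to do that; the function and key names are illustrative, and the 10 TiB cap is the per-node storage maximum stated in the Storage section.

```python
GIB_PER_TIB = 1024


def node_hardware(vcpus: int) -> dict:
    """Per-node hardware targets derived from the per-vCPU ratios."""
    return {
        "ram_gib": 4 * vcpus,                  # 4 GiB of RAM per vCPU
        # 320 GiB per vCPU, capped at the 10 TiB per-node maximum.
        "storage_gib": min(320 * vcpus, 10 * GIB_PER_TIB),
        "iops": 500 * vcpus,                   # 500 IOPS per vCPU
        "throughput_mb_s": 30 * vcpus,         # 30 MB/s per vCPU
    }


if __name__ == "__main__":
    print(node_hardware(8))
```

For an 8-vCPU node this gives 32 GiB of RAM, 2560 GiB of storage, 4000 IOPS, and 240 MB/s of disk throughput.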

Before deploying to production, test and tune your hardware setup for your application workload. For example, read-heavy and write-heavy workloads will place different emphases on CPU, RAM, storage, I/O, and network capacity.

Memory

Provision at least 4 GiB of RAM per vCPU for consistency across a variety of workload complexities. The minimum acceptable ratio is 2 GiB of RAM per vCPU, which is only suitable for testing.

Note:

The benefits to having more RAM decrease as the number of vCPUs increases.

To optimize for the support of large numbers of tables, increase the amount of RAM. For more information, see Quantity of tables and other schema objects. Supporting a large number of rows is a matter of Storage.
To ensure consistent SQL performance, make sure all nodes have a uniform configuration.
Disable Linux memory swapping. Over-allocating memory on production machines can lead to unexpected performance issues when pages have to be read back into memory.
To help guard against out-of-memory (OOM) crashes, consider tuning the cache and SQL memory for cluster nodes. Refer to the section Cache and SQL memory size.
Monitor CPU and memory usage. Ensure that they remain within acceptable limits.

Note:

Under-provisioning RAM results in reduced performance (due to reduced caching and increased spilling to disk), and in some cases can cause OOM crashes. For more information, see memory issues.

Storage

We recommend provisioning volumes with 320 GiB per vCPU. It's fine to have less storage per vCPU if your workload does not have significant capacity needs.

The maximum recommended storage capacity per node is 10 TiB, regardless of the number of vCPUs.
Use dedicated volumes for the CockroachDB store. Do not share the store volume with any other I/O activity.
The recommended Linux filesystems are ext4 and XFS.
Always keep some of your disk capacity free on production. Doing so accommodates fluctuations in routine database operations and supports continuous data growth.
Monitor your storage utilization and rate of growth, and take action to add capacity well before you hit the limit.
CockroachDB will automatically provision an emergency ballast file at node startup. In the unlikely case that a node runs out of disk space and shuts down, you can delete the ballast file to free up enough space to be able to restart the node.
Use zone configs to increase the replication factor from 3 (the default) to 5 (across at least 5 nodes).

This is especially recommended if you are using local disks with no RAID protection rather than a cloud provider's network-attached disks that are often replicated under the hood, because local disks have a greater risk of failure. You can do this for the entire cluster or for specific databases, tables, or rows (Enterprise-only).

Note:

Under-provisioning storage leads to node crashes when the disks fill up. Once this has happened, it is difficult to recover from. To prevent your disks from filling up, provision enough storage for your workload, monitor your disk usage, and use a ballast file. For more information, see capacity planning issues and storage issues.

Tip:

For instructions on how to free up disk space as quickly as possible after deleting data, see How can I free up disk space quickly?

Disk I/O

Disks must be able to achieve 500 IOPS and 30 MB/s per vCPU.

Monitor IOPS using the DB Console and iostat. Ensure that they remain within acceptable values.
Use sysbench to benchmark IOPS on your cluster. If IOPS decrease, add more nodes to your cluster to increase IOPS.
Do not use LVM in the I/O path. Dynamically resizing CockroachDB store volumes can result in significant performance degradation. Using LVM snapshots in lieu of CockroachDB backup and restore is also not supported.
The optimal configuration for striping more than one device is RAID 10. RAID 0 and 1 are also acceptable from a performance perspective.

Note:

Disk I/O especially affects performance on write-heavy workloads. For more information, see capacity planning issues and node liveness issues.

Node density testing configuration

In a narrowly-scoped test, we were able to successfully store 4.32 TiB of logical data per node. The results of this test may not be applicable to your specific situation; testing with your workload is strongly recommended before using it in a production environment.

These results were achieved using the "bank" workload running on AWS using 6x c5d.4xlarge nodes, each with 5 TiB of gp2 EBS storage.

Results:

Value	Result
vCPU per Node	16
Logical Data per Node	4.32 TiB
RAM per Node	32 GiB
Data per Core	~270 GiB / vCPU
Data per RAM	~135 GiB / GiB

Cloud-specific recommendations

Cockroach Labs creates a yearly cloud report focused on evaluating hardware performance. For more information, see the 2022 Cloud Report.

Based on our internal testing, we recommend the following cloud-specific configurations. Before using configurations not recommended here, be sure to test them exhaustively. Also consider the following workload-specific observations:

For OLTP applications, small instance types may outperform larger instance types.
Larger, more complex workloads will likely see more consistent performance from instance types with more available memory.
Unless your workload requires extremely high IOPS or very low storage latency, the most cost-effective volumes are general-purpose rather than high-performance volumes.
- Because storage cost influences the cost of running a workload much more than instance cost, larger nodes offer a better price-for-performance ratio at the same workload complexity.

AWS

Use general-purpose m6i or m6a VMs with SSD-backed EBS volumes. For example, Cockroach Labs has used m6i.2xlarge for performance benchmarking. If your workload requires high throughput, use network-optimized m5n instances. To simulate bare-metal deployments, use m5d with SSD Instance Store volumes.
- m5 and m5a instances, and compute-optimized c5, c5a, and c5n instances, are also acceptable.
Warning:

Do not use burstable performance instances, which limit the load on a single core.
General Purpose SSD gp3 volumes are a cost-effective storage option. gp3 volumes provide 3,000 IOPS and 125 MiB/s throughput by default. If your deployment requires more IOPS or throughput, per our hardware recommendations, you must provision these separately. Provisioned IOPS SSD-backed (io2) EBS volumes also need to have IOPS provisioned.
A typical deployment uses EC2 together with key pairs, load balancers, and security groups. For an example, refer to Deploy CockroachDB on AWS EC2.
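
To see when the gp3 defaults fall short of the per-vCPU hardware recommendations (500 IOPS and 30 MB/s per vCPU), a small calculation helps. This is a sketch only: the function name is illustrative, and it assumes "MB/s" means 10^6 bytes per second while the gp3 baseline is expressed in MiB/s.

```python
GP3_BASELINE_IOPS = 3000    # gp3 default IOPS
GP3_BASELINE_MIB_S = 125    # gp3 default throughput in MiB/s


def gp3_extra_provisioning(vcpus: int) -> tuple[int, float]:
    """Return (extra_iops, extra_mib_s) to provision beyond gp3 defaults."""
    required_iops = 500 * vcpus
    # Convert the recommended 30 MB/s per vCPU into MiB/s.
    required_mib_s = 30 * vcpus * 1_000_000 / 2**20
    extra_iops = max(0, required_iops - GP3_BASELINE_IOPS)
    extra_mib_s = max(0.0, required_mib_s - GP3_BASELINE_MIB_S)
    return extra_iops, round(extra_mib_s, 1)
```

Under these assumptions, a 4-vCPU node fits within the gp3 defaults, while an 8-vCPU node needs 1000 additional IOPS and roughly 104 MiB/s of additional throughput.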

Azure

Use general-purpose Dsv5-series and Dasv5-series or memory-optimized Ev5-series and Easv5-series VMs. For example, Cockroach Labs has used Standard_D8s_v5, Standard_D8as_v5, Standard_E8s_v5, and Standard_E8as_v5 for performance benchmarking.
- Compute-optimized F-series VMs are also acceptable.
Warning:

Do not use "burstable" B-series VMs, which limit the load on CPU resources. Also, Cockroach Labs has experienced data corruption issues on A-series VMs, so we recommend avoiding those as well.
Use Premium Storage or local SSD storage with a Linux filesystem such as ext4 (not the Windows ntfs filesystem). Note that the size of a Premium Storage disk affects its IOPS.
If you choose local SSD storage, on reboot, the VM can come back with the ntfs filesystem. Be sure your automation monitors for this and reformats the disk to the Linux filesystem you chose initially.

Digital Ocean

Use any droplets except standard droplets with only 1 GiB of RAM, which is below our minimum requirement. All Digital Ocean droplets use SSD storage.

GCP

Use general-purpose t2d-standard, n2-standard, or n2d-standard VMs, or use custom VMs. For example, Cockroach Labs has used t2d-standard-8, n2-standard-8, and n2d-standard-8 for performance benchmarking.

Warning:

Do not use f1 or g1 shared-core machines, which limit the load on CPU resources.
Use pd-ssd SSD persistent disks or local SSDs. Note that the IOPS of SSD persistent disks depends both on the disk size and number of vCPUs on the machine.
The nobarrier mount option can be used with SSDs, but only if the SSD has a battery-backed write cache. Without one, data can be corrupted in the event of a crash.

Cockroach Labs conducts most of our internal performance tests using nobarrier to demonstrate the best possible performance, but understand that not all use cases can support this option.

Security

An insecure cluster comes with serious risks:

Your cluster is open to any client that can access any node's IP addresses.
Any user, even root, can log in without providing a password.
Any user, connecting as root, can read or write any data in your cluster.
There is no network encryption or authentication, and thus no confidentiality.

Therefore, to deploy CockroachDB in production, it is strongly recommended to use TLS certificates to authenticate the identity of nodes and clients and to encrypt data in transit between nodes and clients. You can use either the built-in cockroach cert commands or openssl commands to generate security certificates for your deployment. Regardless of which option you choose, you'll need the following files:

A certificate authority (CA) certificate and key, used to sign all of the other certificates.
A separate certificate and key for each node in your deployment, with the common name node.
A separate certificate and key for each client and user you want to connect to your nodes, with the common name set to the username. The default user is root.

Alternatively, CockroachDB supports password authentication, although we typically recommend using client certificates instead.

For more information, see the Security Overview.

Networking

Networking flags

When starting a node, two main flags are used to control its network connections:

--listen-addr determines which address(es) to listen on for connections from other nodes and clients.
--advertise-addr determines which address to tell other nodes to use.

The effect depends on how these two flags are used in combination:

	`--listen-addr` not specified	`--listen-addr` specified
`--advertise-addr` not specified	Node listens on all of its IP addresses on port `26257` and advertises its canonical hostname to other nodes.	Node listens on the IP address/hostname and port specified in `--listen-addr` and advertises this value to other nodes.
`--advertise-addr` specified	Node listens on all of its IP addresses on port `26257` and advertises the value specified in `--advertise-addr` to other nodes. Recommended for most cases.	Node listens on the IP address/hostname and port specified in `--listen-addr` and advertises the value specified in `--advertise-addr` to other nodes. If the `--advertise-addr` port number is different than the one used in `--listen-addr`, port forwarding is required.

Tip:

When using hostnames, make sure they resolve properly (e.g., via DNS or /etc/hosts). In particular, be careful about the value advertised to other nodes, either via --advertise-addr or via --listen-addr when --advertise-addr is not specified.

Cluster on a single network

When running a cluster on a single network, the setup depends on whether the network is private. In a private network, machines have addresses restricted to the network, not accessible to the public internet. Using these addresses is more secure and usually provides lower latency than public addresses.

Private?	Recommended setup
Yes	Start each node with `--listen-addr` set to its private IP address and do not specify `--advertise-addr`. This will tell other nodes to use the private IP address advertised. Load balancers/clients in the private network must use it as well.
No	Start each node with `--advertise-addr` set to a stable public IP address that routes to the node and do not specify `--listen-addr`. This will tell other nodes to use the specific IP address advertised, but load balancers/clients will be able to use any address that routes to the node. If load balancers/clients are outside the network, also configure firewalls to allow external traffic to reach the cluster.
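
For automation that generates start commands, the table above reduces to a single choice between the two flags. The helper below is a hypothetical sketch (the function name and the example addresses are not from CockroachDB); it only returns the networking flag to pass to cockroach start.

```python
def network_flags(private_network: bool, address: str) -> list[str]:
    """Networking flag for a node in a single-network cluster.

    private_network=True:  listen on the private address and let the
                           node advertise that same address.
    private_network=False: advertise a stable public address and
                           listen on all addresses (no --listen-addr).
    """
    if private_network:
        return [f"--listen-addr={address}"]
    return [f"--advertise-addr={address}"]


if __name__ == "__main__":
    # Example addresses are placeholders.
    print(network_flags(True, "10.0.0.5"))
    print(network_flags(False, "203.0.113.7"))
```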

Cluster spanning multiple networks

When running a cluster across multiple networks, the setup depends on whether nodes can reach each other across the networks.

Nodes reachable across networks?	Recommended setup
Yes	This is typical when all networks are on the same cloud. In this case, use the relevant single network setup above.
No	This is typical when networks are on different clouds. In this case, set up a VPN, VPC, NAT, or another such solution to provide unified routing across the networks. Then start each node with `--advertise-addr` set to the address that is reachable from other networks and do not specify `--listen-addr`. This will tell other nodes to use the specific IP address advertised, but load balancers/clients will be able to use any address that routes to the node. Also, if a node is reachable from other nodes in its network on a private or local address, set `--locality-advertise-addr` to that address. This will tell nodes within the same network to prefer the private or local address to improve performance. Note that this feature requires that each node is started with the `--locality` flag. For more details, see this example.

Load balancing

Each CockroachDB node is an equally suitable SQL gateway to a cluster, but to ensure client performance and reliability, it's important to use load balancing:

Performance: Load balancers spread client traffic across nodes. This prevents any one node from being overwhelmed by requests and improves overall cluster performance (queries per second).
Reliability: Load balancers decouple client health from the health of a single CockroachDB node. To ensure that traffic is not directed to failed nodes or nodes that are not ready to receive requests, load balancers should use CockroachDB's readiness health check.
Tip:
With a single load balancer, client connections are resilient to node failure, but the load balancer itself is a point of failure. It's therefore best to make load balancing resilient as well by using multiple load balancing instances, with a mechanism like floating IPs or DNS to select load balancers for clients.

For guidance on load balancing, see the tutorial for your deployment environment:

Environment	Featured Approach
On-Premises	Use HAProxy.
AWS	Use Amazon's managed load balancing service.
Azure	Use Azure's managed load balancing service.
Digital Ocean	Use Digital Ocean's managed load balancing service.
GCP	Use GCP's managed TCP proxy load balancing service.

Connection pooling

Creating the appropriate size pool of connections is critical to gaining maximum performance in an application. Too few connections in the pool will result in high latency as each operation waits for a connection to open up. But adding too many connections to the pool can also result in high latency as each connection thread is being run in parallel by the system. The time it takes for many threads to complete in parallel is typically higher than the time it takes a smaller number of threads to run sequentially.

The total number of workload connections across all connection pools should not significantly exceed 4 times the number of vCPUs in the cluster.
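
As a worked example of this rule of thumb, the sketch below computes the cluster-wide connection ceiling and divides it evenly across application connection pools. The function names are illustrative, and the even split across pools is an assumption; real deployments may weight pools differently.

```python
def max_total_connections(cluster_vcpus: int) -> int:
    """Approximate ceiling: 4 connections per vCPU in the cluster."""
    return 4 * cluster_vcpus


def per_pool_size(cluster_vcpus: int, pool_count: int) -> int:
    """Split the cluster-wide ceiling evenly across connection pools."""
    return max(1, max_total_connections(cluster_vcpus) // pool_count)


if __name__ == "__main__":
    vcpus = 3 * 8  # for example, 3 nodes with 8 vCPUs each
    print(max_total_connections(vcpus), per_pool_size(vcpus, 4))
```

For a cluster of 3 nodes with 8 vCPUs each, the ceiling is about 96 connections, or 24 per pool when 4 application instances each run one pool.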

For guidance on sizing, validating, and using connection pools with CockroachDB, see Use Connection Pools.

To control the maximum number of non-superuser connections a gateway node can have open at one time, use the server.max_connections_per_gateway cluster setting. Superusers (the root user or other members of the admin role) are not subject to this limit. If a new non-superuser connection would exceed the limit, the error message "sorry, too many clients already" is returned, along with error code 53300. This may be useful in addition to your connection pool settings.

Monitoring and alerting

Despite CockroachDB's various built-in safeguards against failure, it is critical to actively monitor the overall health and performance of a cluster running in production and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.

For details about available monitoring options and the most important events and metrics to alert on, see Monitoring and Alerting.

Backup and restore

CockroachDB is purpose-built to be fault-tolerant and to recover automatically, but sometimes disasters happen. Having a disaster recovery plan enables you to recover quickly, while limiting the consequences.

Taking regular backups of your data in production is an operational best practice. You can create full or incremental backups of a cluster, database, or table. We recommend taking backups to cloud storage and enabling object locking to protect the validity of your backups. CockroachDB supports Amazon S3, Azure Storage, and Google Cloud Storage for backups.

For details about available backup and restore types in CockroachDB, see Backup and restore types.

Clock synchronization

CockroachDB requires moderate levels of clock synchronization to preserve data consistency. For this reason, when a node detects that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed, it spontaneously shuts down. This offset defaults to 500ms but can be changed via the --max-offset flag when starting each node.
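
The shutdown rule can be expressed as a small calculation. The sketch below is a simplified model of the sentence above, not CockroachDB's implementation: the function names are illustrative, and it assumes an offset counts as out of sync when it is at or beyond 80% of the maximum offset.

```python
def shutdown_threshold_ms(max_offset_ms: int = 500) -> int:
    """80% of the maximum allowed clock offset, in milliseconds."""
    return max_offset_ms * 80 // 100


def would_shut_down(offsets_ms: list[int], max_offset_ms: int = 500) -> bool:
    """Model of the rule for one node.

    offsets_ms holds this node's measured clock offset from each of
    the other nodes. The node shuts down when it is out of sync with
    at least half of them.
    """
    if not offsets_ms:
        return False
    threshold = shutdown_threshold_ms(max_offset_ms)
    out_of_sync = sum(1 for offset in offsets_ms if abs(offset) >= threshold)
    return 2 * out_of_sync >= len(offsets_ms)
```

With the default 500ms maximum offset the threshold is 400ms; with --max-offset set to 250ms it is 200ms.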

While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions. It's therefore important to prevent clocks from drifting too far by running NTP or other clock synchronization software on each node.

In very rare cases, CockroachDB can momentarily run with a stale clock. This can happen when using vMotion, which can suspend a VM running CockroachDB, migrate it to different hardware, and resume it. This will cause CockroachDB to be out of sync for a short period before it jumps to the correct time. During this window, it would be possible for a client to read stale data and write data derived from stale reads. By enabling the server.clock.forward_jump_check_enabled cluster setting, you can be alerted when the CockroachDB clock jumps forward, indicating it had been running with a stale clock. To protect against this on vMotion, however, use the --clock-device flag to specify a PTP hardware clock for CockroachDB to use when querying the current time. When doing so, you should not enable server.clock.forward_jump_check_enabled because forward jumps will be expected and harmless. For more information on how --clock-device interacts with vMotion, see this blog post.

Warning:

In CockroachDB versions prior to v22.2.13, and in v23.1 versions prior to v23.1.9, the --clock-device flag had a bug that could cause it to generate timestamps in the far future. This could cause nodes to crash due to incorrect timestamps, or in the worst case irreversibly advance the cluster's HLC clock into the far future. This bug is fixed in CockroachDB v23.2.

Considerations

When setting up clock synchronization:

All nodes in the cluster must be synced to the same time source, or to different sources that implement leap second smearing in the same way. For example, Google and Amazon have time sources that are compatible with each other (they implement leap second smearing in the same way), but are incompatible with the default NTP pool (which does not implement leap second smearing).
For nodes running in AWS, we recommend Amazon Time Sync Service. For nodes running in GCP, we recommend Google's internal NTP service. For nodes running elsewhere, we recommend Google Public NTP. Note that the Google and Amazon time services can be mixed with each other, but they cannot be mixed with other time services (unless you have verified leap second behavior). Either all of your nodes should use the Google and Amazon services, or none of them should.
If you do not want to use the Google or Amazon time sources, you can use chrony and enable client-side leap smearing, unless the time source you're using already does server-side smearing. In most cases, we recommend the Google Public NTP time source because it handles smearing the leap second. If you use a different NTP time source that doesn't smear the leap second, you must configure client-side smearing manually and do so in the same way on each machine.
Do not run more than one clock sync service on VMs where cockroach is running.
For new clusters using the multi-region SQL abstractions, Cockroach Labs recommends lowering the --max-offset setting to 250ms. This setting is especially helpful for lowering the write latency of global tables. Nodes can run with different values for --max-offset, but only for the purpose of updating the setting across the cluster using a rolling upgrade.

Tutorials

For guidance on synchronizing clocks, see the tutorial for your deployment environment:

Environment	Featured Approach
On-Premises	Use NTP with Google's external NTP service.
AWS	Use the Amazon Time Sync Service.
Azure	Disable Hyper-V time synchronization and use NTP with Google's external NTP service.
Digital Ocean	Use NTP with Google's external NTP service.
GCE	Use NTP with Google's internal NTP service.

Cache and SQL memory size

CockroachDB manages its own memory caches, independently of the operating system. These are configured via the --cache and --max-sql-memory flags.

Each node has a default cache size of 128 MiB that is passively consumed. The default was chosen to facilitate development and testing, where users are likely to run multiple CockroachDB nodes on a single machine. Increasing the cache size will generally improve the node's read performance.

Each node has a default SQL memory size of 25%. This memory is used as-needed by active operations to store temporary data for SQL queries.

Increasing a node's cache size will improve the node's read performance.
Increasing a node's SQL memory size will increase the number of simultaneous client connections it allows, as well as the node's capacity for in-memory processing of rows when using ORDER BY, GROUP BY, DISTINCT, joins, and window functions.

Note:

SQL memory size applies a limit globally to all sessions at any point in time. Certain disk-spilling operations also respect a memory limit that applies locally to a single operation within a single query. This limit is configured via a separate cluster setting. For details, see Disk-spilling operations.

If a node runs out of its allocated SQL memory, a memory budget exceeded error occurs and the cockroach process may be at risk of crashing due to an out-of-memory (OOM) error. To mitigate this issue, refer to memory budget exceeded.

To manually increase a node's cache size and SQL memory size, start the node using the --cache and --max-sql-memory flags. As long as all machines are provisioned with sufficient RAM, you can experiment with increasing each value up to 35%.

cockroach start --cache=.35 --max-sql-memory=.35 {other start flags}

The default value for --cache is 128 MiB. For production deployments, set --cache to 25% or higher. To determine appropriate settings for --cache and --max-sql-memory, use the following formula:

(2 * --max-sql-memory) + --cache <= 80% of system RAM
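
The formula can be checked before starting a node. The sketch below assumes both flags are given as whole-number percentages of system RAM, which keeps the arithmetic exact; the function name is illustrative.

```python
def memory_settings_ok(cache_pct: int, max_sql_memory_pct: int) -> bool:
    """Check (2 * --max-sql-memory) + --cache <= 80% of system RAM.

    Both arguments are percentages of system RAM, e.g. 25 for 25%.
    """
    return 2 * max_sql_memory_pct + cache_pct <= 80


if __name__ == "__main__":
    print(memory_settings_ok(cache_pct=25, max_sql_memory_pct=25))
    print(memory_settings_ok(cache_pct=40, max_sql_memory_pct=30))
```

For example, a cache of 25% with SQL memory of 25% totals 75% and satisfies the formula, while a cache of 40% with SQL memory of 30% totals 100% and does not.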

To help guard against OOM events, CockroachDB sets a soft memory limit using mechanisms in Go. Depending on your hardware and workload, you may not need to manually tune --max-sql-memory.

Test the configuration with a reasonable workload before deploying it to production.

Note:

On startup, if CockroachDB detects that --max-sql-memory or --cache are set too aggressively, a warning is logged.

Because CockroachDB manages its own memory caches, Cockroach Labs recommends that you disable Linux memory swapping or allocate sufficient RAM to each node to prevent the node from running low on memory. Writing to swap is significantly less performant than writing to memory.

Dependencies

The CockroachDB binary for Linux depends on the following libraries:

Library	Description
`glibc`	The standard C library. If you build CockroachDB from source, rather than use the official CockroachDB binary for Linux, you can use any C standard library, including MUSL, the C standard library used on Alpine.
`libncurses`	Required by the built-in SQL shell.
`tzdata`	Required by certain features of CockroachDB that use time zone data, for example, to support using location-based names as time zone identifiers. This library is sometimes called `tz` or `zoneinfo`. The CockroachDB binary now includes Go's copy of the `tzdata` library. If CockroachDB cannot find the `tzdata` library externally, it will use this built-in copy.

These libraries are found by default on nearly all Linux distributions, with Alpine as the notable exception, but it's nevertheless important to confirm that they are installed and kept up-to-date. For the time zone data in particular, it's important for all nodes to have the same version; when updating the library, do so as quickly as possible across all nodes.

Note:

In Docker-based deployments of CockroachDB, these dependencies do not need to be manually addressed. The Docker image for CockroachDB includes them and keeps them up to date with each release of CockroachDB.

File descriptors limit

CockroachDB can use a large number of open file descriptors, often more than is available by default. Therefore, note the following recommendations.

For each CockroachDB node:

At a minimum, the file descriptors limit must be 1956 (1700 per store plus 256 for networking). If the limit is below this threshold, the node will not start.
It is recommended to set the file descriptors limit to unlimited; otherwise, the recommended limit is at least 15000 (10000 per store plus 5000 for networking). This higher limit ensures performance and accommodates cluster growth.
When the file descriptors limit is not high enough to allocate the recommended amounts, CockroachDB allocates 10000 per store and the rest for networking; if this would result in networking getting less than 256, CockroachDB instead allocates 256 for networking and evenly splits the rest across stores.
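
The arithmetic behind these recommendations can be sketched in shell as follows. This is an illustration of the rules as stated above, not CockroachDB's actual implementation: the function names are hypothetical, and the integer division used for the even split is an assumption.

```shell
# recommended_fd_limit STORES
# Prints the recommended limit: 10000 per store plus 5000 for networking.
recommended_fd_limit() {
  echo $(( 10000 * $1 + 5000 ))
}

# fd_allocation LIMIT STORES
# Prints "PER_STORE NETWORKING" for a limit below the recommended amount:
# 10000 per store and the rest for networking, unless networking would get
# less than 256, in which case networking gets 256 and the remainder is
# split evenly across stores (integer division is an assumption here).
fd_allocation() {
  limit=$1
  stores=$2
  per_store=10000
  networking=$(( limit - per_store * stores ))
  if [ "$networking" -lt 256 ]; then
    networking=256
    per_store=$(( (limit - networking) / stores ))
  fi
  echo "$per_store $networking"
}

recommended_fd_limit 3   # prints 35000
fd_allocation 12000 1    # prints "10000 2000"
fd_allocation 20100 2    # prints "9922 256"
```

With a single store and the minimum limit of 1956, the same rule yields 1700 for the store and 256 for networking, which matches the minimum described above.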

Yosemite and later

To adjust the file descriptors limit for a single process in Mac OS X Yosemite and later, you must create a property list configuration file with the hard limit set to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

Check the current limits:
```
$ launchctl limit maxfiles
```
```
maxfiles    10240          10240
```
The last two columns are the soft and hard limits, respectively. If unlimited is listed as the hard limit, note that the hidden default limit for a single process is actually 10240.

Create /Library/LaunchDaemons/limit.maxfiles.plist and add the following contents, with the final strings in the ProgramArguments array set to 35000:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
  <plist version="1.0">
    <dict>
      <key>Label</key>
        <string>limit.maxfiles</string>
      <key>ProgramArguments</key>
        <array>
          <string>launchctl</string>
          <string>limit</string>
          <string>maxfiles</string>
          <string>35000</string>
          <string>35000</string>
        </array>
      <key>RunAtLoad</key>
        <true/>
      <key>ServiceIPC</key>
        <false/>
    </dict>
  </plist>

Make sure the plist file is owned by root:wheel and has permissions -rw-r--r--. These permissions should be in place by default.

Restart the system for the new limits to take effect.
Verify the new limits:
```
$ launchctl limit maxfiles
```
```
maxfiles    35000          35000
```

Older versions

To adjust the file descriptors limit for a single process in OS X versions earlier than Yosemite, edit /etc/launchd.conf and increase the hard limit to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

Check the current limits:
```
$ launchctl limit maxfiles
```
```
maxfiles    10240          10240
```
The last two columns are the soft and hard limits, respectively. If unlimited is listed as the hard limit, note that the hidden default limit for a single process is actually 10240.
Edit (or create) /etc/launchd.conf and add a line that looks like the following, with the last value set to the new hard limit:
```
limit maxfiles 35000 35000
```
Save the file, and restart the system for the new limits to take effect.
Verify the new limits:
```
$ launchctl limit maxfiles
```
```
maxfiles    35000          35000
```

Per-Process Limit

To adjust the file descriptors limit for a single process on Linux, enable PAM user limits and set the hard limit to the recommendation mentioned above. Note that CockroachDB always uses the hard limit, so it's not technically necessary to adjust the soft limit, although we do so in the steps below.

For example, for a node with 3 stores, we would set the hard limit to at least 35000 (10000 per store and 5000 for networking) as follows:

Make sure the following line is present in both /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive:
```
session    required   pam_limits.so
```
Set a limit for the number of open file descriptors. The specific limit you set depends on your workload and the hardware and configuration of your nodes.
- If you use systemd, limits set manually with the ulimit command or in a configuration file like /etc/security/limits.conf are ignored for services started by systemd. To limit the number of open file descriptors, add a line like the following to the service definition for the cockroach process. To allow an unlimited number of files, you can optionally set LimitNOFILE to INFINITY. Cockroach Labs recommends that you carefully test this configuration with a realistic workload before deploying it in production.
```
LimitNOFILE=35000
```
  Reload systemd for the new limit to take effect:
```
systemctl daemon-reload
```
- If you do not use systemd: Edit /etc/security/limits.conf and append the following lines to the file:
```
*              soft     nofile          35000
*              hard     nofile          35000
```
  The * can be replaced with the username that will start CockroachDB.
  
  Save and close the file, then restart the system for the new limits to take effect. After the system restarts, verify the new limits:
```
ulimit -a
```
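
Because `ulimit -a` prints every limit, it can be easier to compare just the hard limit on open files against the target. The following sketch does that for the current shell; the function name `hard_nofile_ok` is hypothetical, and 35000 is the 3-store example value used above.

```shell
# hard_nofile_ok REQUIRED [CURRENT]
# Hypothetical helper: succeeds if the hard open-files limit is "unlimited"
# or at least REQUIRED. CURRENT defaults to this shell's hard limit.
hard_nofile_ok() {
  required=$1
  current=${2:-$(ulimit -Hn)}
  [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}

if hard_nofile_ok 35000; then
  echo "hard limit is sufficient"
else
  echo "hard limit $(ulimit -Hn) is below 35000" >&2
fi
```

Run this as the user that starts CockroachDB. For a process started by systemd, the shell's limit is not what the service sees; on Linux, the limits of a running process can be read from /proc/<pid>/limits instead.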

System-Wide Limit

You should also confirm that the file descriptors limit for the entire Linux system is at least 10 times higher than the per-process limit documented above (e.g., at least 150000).

If you use systemd, add a line like the following to the service definition for the Manager service. To allow an unlimited number of files, set LimitNOFILE to INFINITY.
```
LimitNOFILE=35000
```
Reload systemd for the new limit to take effect:
```
systemctl daemon-reload
```
If you do not use systemd:
1. Check the system-wide limit:
```
cat /proc/sys/fs/file-max
```
2. If necessary, increase the system-wide limit in the proc file system:
```
echo 150000 > /proc/sys/fs/file-max
```
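
The "10 times higher" rule can be checked with a small comparison like the one below. This is a sketch: the function name `system_fd_limit_ok` is hypothetical, and 15000 is the single-store per-process recommendation from this page.

```shell
# system_fd_limit_ok PER_PROCESS_LIMIT SYSTEM_LIMIT
# Hypothetical helper: succeeds if the system-wide limit is at least
# 10 times the per-process limit.
system_fd_limit_ok() {
  [ "$2" -ge $(( 10 * $1 )) ]
}

per_process=15000
# Only check when the proc file is readable (Linux only).
if [ -r /proc/sys/fs/file-max ]; then
  system_limit=$(cat /proc/sys/fs/file-max)
  if system_fd_limit_ok "$per_process" "$system_limit"; then
    echo "system-wide limit $system_limit is sufficient"
  else
    echo "system-wide limit $system_limit is below $(( 10 * per_process ))" >&2
  fi
fi
```

Note that a value written directly to /proc/sys/fs/file-max does not survive a reboot; to keep it, the `fs.file-max` setting is typically also added to the system's sysctl configuration.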

CockroachDB for Windows is experimental and not supported in production. To learn about configuring limits on Windows, refer to the Microsoft community blog post Pushing the Limits of Windows: Handles.

CockroachDB does not yet provide a Windows binary. Once that's available, we will also provide documentation on adjusting the file descriptors limit on Windows.

Attributions

This section, "File Descriptors Limit", is in part derivative of the chapter Open File Limits from the Riak KV 2.1.4 documentation, used under Creative Commons Attribution 3.0 Unported License.

Orchestration / Kubernetes

When running CockroachDB on Kubernetes, making the following minimal customizations will result in better, more reliable performance:

Use SSDs instead of traditional HDDs.
Configure CPU and memory resource requests and limits.

For more information and additional customization suggestions, see our full detailed guide to CockroachDB Performance on Kubernetes.

Transaction retries

When several transactions try to modify the same underlying data concurrently, they may experience contention that leads to transaction retries. To avoid failures in production, your application should be engineered to handle transaction retries using client-side retry handling.
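
As a rough illustration of client-side retry handling, the shell sketch below re-runs a command that executes a whole transaction when its output contains SQLSTATE 40001, the code CockroachDB uses for retryable transaction errors. Everything else here is an assumption for illustration: the function name `retry_txn`, the `RETRY_DELAY_SECONDS` variable, the doubling backoff, and detecting the error by matching the text `40001` in the command's output. Production applications would normally do this in their database driver or client library rather than in shell.

```shell
# retry_txn MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, which should execute one complete
# transaction, and re-runs it when the output mentions SQLSTATE 40001.
# Waits RETRY_DELAY_SECONDS (default 1) before the first retry and doubles
# the wait after each attempt. Any other failure is returned immediately.
retry_txn() {
  max_attempts=$1
  shift
  delay=${RETRY_DELAY_SECONDS:-1}
  attempt=1
  while :; do
    if output=$("$@" 2>&1); then
      printf '%s\n' "$output"
      return 0
    fi
    case $output in
      *40001*) ;;                                  # retryable error: keep going
      *) printf '%s\n' "$output" >&2; return 1 ;;  # any other error: give up
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$output" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Example (hypothetical script that runs one transaction end to end):
#   retry_txn 5 ./transfer_funds.sh
```

Capping the number of attempts keeps a persistently contended transaction from retrying forever, and the growing delay gives competing transactions time to finish.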
