Transactional (ACID) support

Case 1: INSERT into one partition, of one table, of the MergeTree* family

This is transactional (ACID) if the inserted rows are packed and inserted as a single block (see Notes):

  • Atomic: an INSERT succeeds or is rejected as a whole: if a confirmation is sent to the client, then all rows were inserted; if an error is sent to the client, then no rows were inserted.
  • Consistent: if there are no table constraints violated, then all rows in an INSERT are inserted and the INSERT succeeds; if constraints are violated, then no rows are inserted.
  • Isolated: concurrent clients observe a consistent snapshot of the table: its state either as it was before the INSERT attempt or as it is after the successful INSERT; no partial state is seen.
  • Durable: a successful INSERT is written to the filesystem before answering to the client, on a single replica or multiple replicas (controlled by the insert_quorum setting), and ClickHouse can ask the OS to sync the filesystem data on the storage media (controlled by the fsync_after_insert setting).
  • INSERT into multiple tables with one statement is possible if materialized views are involved (the INSERT from the client targets a table that has associated materialized views).
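
As a rough sketch of where the two durability settings named above are usually set, consider the statements below. The table names replicated_events and events are hypothetical (each is assumed to have a single integer column), and the quorum value 2 is an assumption about the number of replicas. insert_quorum only has an effect on Replicated* tables.

```sql
-- Hypothetical Replicated*MergeTree table: the INSERT is acknowledged only
-- after 2 replicas have received the data (2 is an assumed replica count).
INSERT INTO replicated_events SETTINGS insert_quorum = 2 VALUES (1);

-- Hypothetical MergeTree table: ask the OS to sync each inserted part to the
-- storage media. fsync_after_insert is a MergeTree-level setting, so it is
-- set on the table rather than on the query.
ALTER TABLE events MODIFY SETTING fsync_after_insert = 1;

-- Inspect the server-wide default of the MergeTree-level setting.
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'fsync_after_insert';
```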

Case 2: INSERT into multiple partitions, of one table, of the MergeTree* family

Same as Case 1 above, with this detail:

  • If the table has many partitions and the INSERT covers several of them, then insertion into each partition is transactional on its own.
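
A minimal illustration of this case, assuming a hypothetical table events_by_month partitioned by month: the single INSERT below touches two partitions, so it is handled as two independent per-partition insertions.

```sql
-- Hypothetical table, partitioned by month.
CREATE TABLE events_by_month
(
    d Date,
    n Int64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY n;

-- One statement, two partitions (202401 and 202402).
INSERT INTO events_by_month VALUES ('2024-01-15', 1), ('2024-02-15', 2);

-- Each partition receives its own data part; listing the active parts
-- shows which partitions the INSERT reached.
SELECT partition, name, rows
FROM system.parts
WHERE table = 'events_by_month' AND active;
```

If insertion into one partition fails, rows already written to another partition by the same statement are not undone, which is why the statement is not transactional as a whole.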

Case 3: INSERT into one distributed table of the MergeTree* family

Same as Case 1 above, with this detail:

  • INSERT into a Distributed table is not transactional as a whole, while insertion into each shard is transactional.

Case 4: Using a Buffer table

  • INSERT into a Buffer table is not atomic, isolated, consistent, or durable.

Case 5: Using async_insert

Same as Case 1 above, with this detail:

  • Atomicity is ensured even when async_insert is enabled, provided that wait_for_async_insert is set to 1 (the default); if wait_for_async_insert is set to 0, atomicity is not ensured.
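
The two settings can be passed per query, as in the sketch below. The table name mergetree_table is only an example (the same name is used for the test table later in this document), and a single Int64 column is assumed.

```sql
-- Asynchronous insert that still waits for the data to be flushed to the
-- table before the client gets its answer, so the client learns whether
-- the insert succeeded or failed.
INSERT INTO mergetree_table
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (1);

-- Fire-and-forget variant: the server answers once the data is buffered,
-- so a later flush failure is not reported to the client and atomicity
-- is not ensured.
INSERT INTO mergetree_table
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (2);
```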

Notes

  • rows inserted from the client in some data format are packed into a single block when:
    • the insert format is row-based (like CSV, TSV, Values, JSONEachRow, etc.) and the data contains fewer than max_insert_block_size rows (~1 000 000 by default), or fewer than min_chunk_bytes_for_parallel_parsing bytes (10 MB by default) when parallel parsing is used (enabled by default)
    • the insert format is column-based (like Native, Parquet, ORC, etc.) and the data contains only one block of data
  • the size of the inserted block in general may depend on many settings (for example: max_block_size, max_insert_block_size, min_insert_block_size_rows, min_insert_block_size_bytes, preferred_block_size_bytes, etc.)
  • if the client did not receive an answer from the server, the client does not know if the transaction succeeded, and it can repeat the transaction, using exactly-once insertion properties
  • ClickHouse uses MVCC with snapshot isolation internally
  • all ACID properties are valid even in the case of server kill/crash
  • either insert_quorum into different availability zones (AZs) or fsync should be enabled to ensure durable inserts in the typical setup
  • "consistency" in ACID terms does not cover the semantics of distributed systems (see https://jepsen.io/consistency), which are controlled by different settings (select_sequential_consistency)
  • this explanation does not cover the new transactions feature that allows full-featured transactions over multiple tables, materialized views, multiple SELECTs, etc. (see the next section on Transactions, Commit, and Rollback)
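
Because block packing depends on several settings, it can help to look at their current values on your server before relying on single-block behavior. The query below is one way to do that; input_format_parallel_parsing is assumed here to be the setting that toggles the parallel parsing mentioned in the notes.

```sql
-- Current values of the settings that influence how inserted rows are
-- split into blocks; `changed` is 1 when a value differs from its default.
SELECT name, value, changed
FROM system.settings
WHERE name IN (
    'max_insert_block_size',
    'min_chunk_bytes_for_parallel_parsing',
    'input_format_parallel_parsing',
    'max_block_size',
    'min_insert_block_size_rows',
    'min_insert_block_size_bytes',
    'preferred_block_size_bytes'
);
```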

Transactions, Commit, and Rollback

In addition to the functionality described at the top of this document, ClickHouse has experimental support for transactions, commits, and rollback functionality.

Requirements

  • Deploy ClickHouse Keeper or ZooKeeper to track transactions
  • Atomic DB only (Default)
  • Non-Replicated MergeTree table engine only
  • Enable experimental transaction support by adding this setting in config.d/transactions.xml:
    <clickhouse>
        <allow_experimental_transactions>1</allow_experimental_transactions>
    </clickhouse>

Notes

  • This is an experimental feature, and changes should be expected.
  • If an exception occurs during a transaction, you cannot commit the transaction. This includes all exceptions, including UNKNOWN_FUNCTION exceptions caused by typos.
  • Nested transactions are not supported; finish the current transaction and start a new one instead
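
The note about exceptions can be illustrated with a short session sketch. It assumes experimental transactions are already enabled; plu is a deliberate typo for plus, and the exact error text returned by COMMIT is not shown because it may differ between versions.

```sql
BEGIN TRANSACTION;

-- Typo in the function name: this statement raises UNKNOWN_FUNCTION.
SELECT plu(1, 2);

-- After the exception the transaction can no longer be committed,
-- so this COMMIT is expected to fail.
COMMIT;

-- Roll back to end the failed transaction, then start a new one if needed.
ROLLBACK;
```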

Configuration

These examples use a single-node ClickHouse server with ClickHouse Keeper enabled.

Enable experimental transaction support

/etc/clickhouse-server/config.d/transactions.xml
<clickhouse>
    <allow_experimental_transactions>1</allow_experimental_transactions>
</clickhouse>

Basic configuration for a single ClickHouse server node with ClickHouse Keeper enabled

note

See the deployment documentation for details on deploying ClickHouse server and a proper quorum of ClickHouse Keeper nodes. The configuration shown here is for experimental purposes.

/etc/clickhouse-server/config.d/config.xml
<clickhouse replace="true">
    <logger>
        <level>debug</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>3</count>
    </logger>
    <display_name>node 1</display_name>
    <listen_host>0.0.0.0</listen_host>
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>
    <zookeeper>
        <node>
            <host>clickhouse-01</host>
            <port>9181</port>
        </node>
    </zookeeper>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>information</raft_logs_level>
        </coordination_settings>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>clickhouse-keeper-01</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>

Example

Verify that experimental transactions are enabled

Issue a BEGIN TRANSACTION followed by a ROLLBACK to verify that experimental transactions are enabled, and that ClickHouse Keeper is enabled as it is used to track transactions.

BEGIN TRANSACTION
Ok.

tip

If you see the following error, then check your configuration file to make sure that allow_experimental_transactions is set to 1 (or any value other than 0 or false).

Code: 48. DB::Exception: Received from localhost:9000.
DB::Exception: Transactions are not supported.
(NOT_IMPLEMENTED)

You can also check ClickHouse Keeper by issuing

echo ruok | nc localhost 9181

ClickHouse Keeper should respond with imok.

ROLLBACK
Ok.

Create a table for testing

tip

Creation of tables is not transactional. Run this DDL query outside of a transaction.

CREATE TABLE mergetree_table
(
`n` Int64
)
ENGINE = MergeTree
ORDER BY n
Ok.

Begin a transaction and insert a row

BEGIN TRANSACTION
Ok.
INSERT INTO mergetree_table FORMAT Values (10)
Ok.
SELECT *
FROM mergetree_table
┌──n─┐
│ 10 │
└────┘

note

You can query the table from within a transaction and see that the row was inserted even though it has not yet been committed.

Roll back the transaction and query the table again

Verify that the transaction is rolled back:

ROLLBACK
Ok.
SELECT *
FROM mergetree_table
Ok.

0 rows in set. Elapsed: 0.002 sec.

Complete a transaction and query the table again

BEGIN TRANSACTION
Ok.
INSERT INTO mergetree_table FORMAT Values (42)
Ok.
COMMIT
Ok. Elapsed: 0.002 sec.
SELECT *
FROM mergetree_table
┌──n─┐
│ 42 │
└────┘

Transactions introspection

You can inspect transactions by querying the system.transactions table, but note that you cannot query that table from a session that is in a transaction. Open a second clickhouse client session to query that table.

SELECT *
FROM system.transactions
FORMAT Vertical
Row 1:
──────
tid: (33,61,'51e60bce-6b82-4732-9e1d-b40705ae9ab8')
tid_hash: 11240433987908122467
elapsed: 210.017820947
is_readonly: 1
state: RUNNING

More Details

See this meta issue to find much more extensive tests and to keep up to date with the progress.