Skip to main content

· 2 min read

This is a step by step example on how to start using Python with ClickHouse Cloud service.

note

Keep in mind that Python versions and libraries dependencies are constantly evolving. Make also sure to use the latest supported versions of both the driver and Python environment when trying this.

At the time of writing this article, we're using the clickhouse-connect driver version 0.5.23 and python 3.11.2 respectively.

Steps

  1. Check the Python version:
$  python -V
Python 3.11.2
  1. We'll assemble the project in a folder called ch-python:
$ mkdir ch-python
$ cd ch-python
  1. Create a dependencies file named requirements.txt with:
clickhouse-connect==0.5.23
  1. Create a python source file named main.py:
import clickhouse_connect
import sys
import json

CLICKHOUSE_CLOUD_HOSTNAME = 'HOSTNAME.clickhouse.cloud'
CLICKHOUSE_CLOUD_USER = 'default'
CLICKHOUSE_CLOUD_PASSWORD = 'YOUR_SECRET_PASSWORD'

client = clickhouse_connect.get_client(
host=CLICKHOUSE_CLOUD_HOSTNAME, port=8443, username=CLICKHOUSE_CLOUD_USER, password=CLICKHOUSE_CLOUD_PASSWORD)

print("connected to " + CLICKHOUSE_CLOUD_HOSTNAME + "\n")
client.command(
'CREATE TABLE IF NOT EXISTS new_table (key UInt32, value String, metric Float64) ENGINE MergeTree ORDER BY key')

print("table new_table created or exists already!\n")

row1 = [1000, 'String Value 1000', 5.233]
row2 = [2000, 'String Value 2000', -107.04]
data = [row1, row2]
client.insert('new_table', data, column_names=['key', 'value', 'metric'])

print("written 2 rows to table new_table\n")

QUERY = "SELECT max(key), avg(metric) FROM new_table"

result = client.query(QUERY)

sys.stdout.write("query: ["+QUERY + "] returns:\n\n")
print(result.result_rows)
  1. Create the virtual environment:
chpython$ python -m venv venv
  1. Load the virtual environment:
chpython$ source venv/bin/activate

Once loaded, your terminal prompt should be prefixed with (venv), install dependencies:

(venv) ➜  chpython$ pip install -r requirements.txt
Collecting certifi
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting urllib3>=1.26
Using cached urllib3-2.0.2-py3-none-any.whl (123 kB)
Collecting pytz
Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting zstandard
Using cached zstandard-0.21.0-cp311-cp311-macosx_11_0_arm64.whl (364 kB)
Collecting lz4
Using cached lz4-4.3.2-cp311-cp311-macosx_11_0_arm64.whl (212 kB)
Installing collected packages: pytz, zstandard, urllib3, lz4, certifi, clickhouse-connect
Successfully installed certifi-2023.5.7 clickhouse-connect-0.5.23 lz4-4.3.2 pytz-2023.3 urllib3-2.0.2 zstandard-0.21.0
  1. Launch the code!
(venv) chpython$ venv/bin/python main.py

connected to HOSTNAME.clickhouse.cloud

table new_table created or exists already!

written 2 rows to table new_table

query: [SELECT max(key), avg(metric) FROM new_table] returns:

[(2000, -50.9035)]
tip

If using an older Python version (e.g. 3.9.6) you might be getting an ImportError related to urllib3 library. In that case either upgrade your Python environment to a newer version or pin the urllib3 version to 1.26.15 in your requirements.txt file.

· 5 min read

How can I use API to manage clusters on ClickHouse Cloud?

Answer

We will use Terraform to configure our infra and ClickHouse Provider

Steps:

1). Create an API Key on Cloud. Follow the docs here - https://clickhouse.com/docs/en/cloud/manage/openapi

Save the creds locally.

2). Install Terraform using - https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli

You can use Homebrew package manager if you're on Mac.

3). Create a directory anywhere you like:

mkdir test
➜ test pwd
/Users/jaijhala/Desktop/terraform/test

4). Create 2 files: main.tf and secret.tfvars

Copy the following:

main.tf file would be:

terraform {
required_providers {
clickhouse = {
source = "ClickHouse/clickhouse"
version = "0.0.2"
}
}
}

variable "organization_id" {
type = string
}

variable "token_key" {
type = string
}

variable "token_secret" {
type = string
}

provider clickhouse {
environment = "production"
organization_id = var.organization_id
token_key = var.token_key
token_secret = var.token_secret
}


variable "service_password" {
type = string
sensitive = true
}

resource "clickhouse_service" "service123" {
name = "jai-terraform"
cloud_provider = "aws"
region = "us-east-2"
tier = "development"
idle_scaling = true
password = var.service_password
ip_access = [
{
source = "0.0.0.0/0"
description = "Anywhere"
}
]
}

output "CLICKHOUSE_HOST" {
value = clickhouse_service.service123.endpoints.0.host
}

You can replace your own parameters like service name, region etc.. in the resources section above.

secret.tfvars is where you'll put all the API Key related info that you downloaded earlier. The idea behind this file is that all your secret credentials will be hidden from the main config file.

It would be something like (replace these parameters):

organization_id = "e957a5f7-4qe3-4b05-ad5a-d02b2dcd0593"
token_key = "QWhhkMeytqQruTeKg"
token_secret = "4b1dNmjWdLUno9lXxmKvSUcPP62jvn7irkuZPbY"
service_password = "password123!"

5). Run terraform init from this directory

Expected output:

Initializing the backend...

Initializing provider plugins...
- Finding clickhouse/clickhouse versions matching "0.0.2"...
- Installing clickhouse/clickhouse v0.0.2...
- Installed clickhouse/clickhouse v0.0.2 (self-signed, key ID D7089EE5C6A92ED1)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

6). Run terraform apply -var-file=secret.tfvars command.

Something like:

➜  test terraform apply -var-file=secret.tfvars

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
+ create

Terraform will perform the following actions:

# clickhouse_service.service123 will be created
+ resource "clickhouse_service" "service123" {
+ cloud_provider = "aws"
+ endpoints = (known after apply)
+ id = (known after apply)
+ idle_scaling = true
+ ip_access = [
+ {
+ description = "Anywhere"
+ source = "0.0.0.0/0"
},
]
+ last_updated = (known after apply)
+ name = "jai-terraform"
+ password = (sensitive value)
+ region = "us-east-2"
+ tier = "development"
}

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
+ CLICKHOUSE_HOST = (known after apply)

Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: yes

Type yes and hit enter

Side note: Notice it says password = (sensitive value) above. This is because we set sensitive = true for the password in the main.tf file.

7). It will take a couple of mins to create the service but eventually it should come up like:

  Enter a value: yes

clickhouse_service.service123: Creating...
clickhouse_service.service123: Still creating... [10s elapsed]
clickhouse_service.service123: Still creating... [20s elapsed]
clickhouse_service.service123: Still creating... [30s elapsed]
clickhouse_service.service123: Still creating... [40s elapsed]
clickhouse_service.service123: Still creating... [50s elapsed]
clickhouse_service.service123: Still creating... [1m0s elapsed]
clickhouse_service.service123: Still creating... [1m10s elapsed]
clickhouse_service.service123: Still creating... [1m20s elapsed]
clickhouse_service.service123: Still creating... [1m30s elapsed]
clickhouse_service.service123: Still creating... [1m40s elapsed]
clickhouse_service.service123: Creation complete after 1m41s [id=aa8d8d63-1878-4600-8470-630715af38ed]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

CLICKHOUSE_HOST = "h3ljlaqez6.us-east-2.aws.clickhouse.cloud"
➜ test

8). Check Cloud Console, you should be able to see the service created.

9). To clean up/destroy the service again, run terraform destroy -var-file=secret.tfvars

Something like:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with
the following symbols:
- destroy

Terraform will perform the following actions:

# clickhouse_service.service123 will be destroyed
- resource "clickhouse_service" "service123" {
- cloud_provider = "aws" -> null
- ............

Plan: 0 to add, 0 to change, 1 to destroy.

Changes to Outputs:
- CLICKHOUSE_HOST = "h3ljlaqez6.us-east-2.aws.clickhouse.cloud" -> null

Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.

Enter a value:

Type yes and hit enter

10).

clickhouse_service.service123: Destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed]
clickhouse_service.service123: Still destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed, 10s elapsed]
clickhouse_service.service123: Still destroying... [id=aa8d8d63-1878-4600-8470-630715af38ed, 20s elapsed]
clickhouse_service.service123: Destruction complete after 27s

Destroy complete! Resources: 1 destroyed.

And it should be gone from the Cloud Console.

More details about the Cloud API can be found here - https://clickhouse.com/docs/en/cloud/manage/api/api-overview

· One min read

Question

How do I change my Billing Contact in ClickHouse Cloud?

Answer

To change the Billing Contact as an Admin, follow the steps below:

  1. Invite a new user as an Admin to the Cloud Organization.
  2. Once they accept the invite, go into the ClickHouse Cloud Console billing page (Admin->Billing) and find the section on "Billing contacts".
  3. Use the Edit button to select the new Admin user as a Billing Contact.

· 3 min read

Question

How to check users assigned to roles and viceversa?

Answer

-- LOGGED IN AS default (admin privileges)

clickhouse-cloud :) SELECT user()

SELECT user()

Query id: 9bc02d8b-ab05-4a63-b2dd-3e0093f36d31

┌─currentUser()─┐
default
└───────────────┘

1 row in set. Elapsed: 0.001 sec.



-- create user 'foo'

clickhouse-cloud :) CREATE USER foo IDENTIFIED WITH sha256_password BY 'secretPassword123!'

CREATE USER foo IDENTIFIED WITH sha256_hash BY '4338B66A5F04244574CB9C872829F1FD8F696C658EC7A4BD22FEFBBCF331B665' SALT 'C2911CA1E4787227BBD0EBEF43066EF2EC4C54172C1AB3616E88050F2EC13475'

Query id: 9711f5fc-2b5c-43f0-a760-0c67764919a2

Ok.

0 rows in set. Elapsed: 0.102 sec.



-- create user 'bar'

clickhouse-cloud :) CREATE USER bar IDENTIFIED WITH sha256_password BY 'secretPassword123!'

CREATE USER bar IDENTIFIED WITH sha256_hash BY '14A1401822566260191F51BAE85C4740E650E1F9D02DEFFF086CD6A6A8B3164F' SALT '276AE4A32353D579894C83C230775568E501CCD696531EEF0006761D3BEE3F75'

Query id: 11a78bf5-f5e1-4f1d-bfe8-cf2aa0a1b15d

Ok.

0 rows in set. Elapsed: 0.103 sec.



-- create role 'role_a'

clickhouse-cloud :) CREATE ROLE role_a;

CREATE ROLE role_a

Query id: 13ccc007-fa5a-4110-9a05-48e284cea45f

Ok.

0 rows in set. Elapsed: 0.104 sec.



-- create role 'role_b'

clickhouse-cloud :) CREATE ROLE role_b;

CREATE ROLE role_b

Query id: 43f84376-76fa-4cd2-b8e2-2dcfbe41ec1b

Ok.

0 rows in set. Elapsed: 0.103 sec.



-- grant 'role_a' to users 'foo' and 'bar'


clickhouse-cloud :) GRANT role_a to foo,bar

GRANT role_a TO foo, bar

Query id: 4fe91624-efb3-4091-b680-b6905ab445b4

Ok.

0 rows in set. Elapsed: 0.107 sec.



-- grant 'role_b' to user 'bar'

clickhouse-cloud :) GRANT role_b TO bar

GRANT role_b TO bar

Query id: 7ea38b28-2719-4dd6-8abd-0241f7b34d5c

Ok.

0 rows in set. Elapsed: 0.102 sec.



-- What users have assigned 'role_a'?

clickhouse-cloud :) SELECT * FROM system.role_grants WHERE granted_role_name='role_a';

SELECT *
FROM system.role_grants
WHERE granted_role_name = 'role_a'

Query id: bf088776-f450-4150-b2e8-197b400573c1

┌─user_name─┬─role_name─┬─granted_role_name─┬─granted_role_is_default─┬─with_admin_option─┐
│ bar │ ᴺᵁᴸᴸ │ role_a │ 10
│ foo │ ᴺᵁᴸᴸ │ role_a │ 10
└───────────┴───────────┴───────────────────┴─────────────────────────┴───────────────────┘

2 rows in set. Elapsed: 0.001 sec.



-- What roles are assigned to users 'foo' and 'bar'?

clickhouse-cloud :) SELECT * FROM system.role_grants WHERE user_name IN ('foo','bar');

SELECT *
FROM system.role_grants
WHERE user_name IN ('foo', 'bar')

Query id: b81dbe1c-42f0-43bd-b237-1a6b1d81ae3d

┌─user_name─┬─role_name─┬─granted_role_name─┬─granted_role_is_default─┬─with_admin_option─┐
│ bar │ ᴺᵁᴸᴸ │ role_b │ 10
│ bar │ ᴺᵁᴸᴸ │ role_a │ 10
│ foo │ ᴺᵁᴸᴸ │ role_a │ 10
└───────────┴───────────┴───────────────────┴─────────────────────────┴───────────────────┘

3 rows in set. Elapsed: 0.001 sec.



-- logged in as user 'foo'

clickhouse-cloud :) SELECT user()

SELECT user()

Query id: eee6eaaa-11bc-42c1-9258-fa3079ee6f80

┌─currentUser()─┐
│ foo │
└───────────────┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SHOW CURRENT ROLES

SHOW CURRENT ROLES

Query id: aa6a1ac1-3502-4960-bb34-f7d9f0d7986e

┌─role_name─┬─with_admin_option─┬─is_default─┐
│ role_a │ 01
└───────────┴───────────────────┴────────────┘

1 row in set. Elapsed: 0.002 sec.



-- logged in as user 'bar'

clickhouse-cloud :) SELECT user()

SELECT user()

Query id: fa9ba47f-efcf-4491-9b4e-2f1130dfa84b

┌─currentUser()─┐
│ bar │
└───────────────┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SHOW CURRENT ROLES

SHOW CURRENT ROLES

Query id: fb3f2941-a8ce-481d-8fad-b775bfc5b532

┌─role_name─┬─with_admin_option─┬─is_default─┐
│ role_a │ 01
│ role_b │ 01
└───────────┴───────────────────┴────────────┘

2 rows in set. Elapsed: 0.001 sec.

· One min read

When you try to alter a user's settings, you may encounter the above exception. Here are a few options to troubleshoot this error:

Edit users.xml directly

You can edit or add the desired settings for a specific user in users.xml directly in the file /etc/clickhouse-server/users.d.

Read more about users.xml here.

Create another user

You can create another user with the specified settings, then connect to ClickHouse using that new user.

View this page to learn how to create users.

Enable SQL-driven access control

You can enable SQL-drive access control and account management for the default user. The steps to enable this are specified in this page.

· One min read

ClickHouse and ClickHouse Cloud both support row and column-level access restrictions for read-only users necessary to achieve a role-based access control (RBAC) model.

Row Policies can be used to specify which rows will be returned to a read-only user when they query a table. ClickHouse Cloud is configured to enable the SQL-driven workflow by default. To leverage this workflow CREATE a user, GRANT the user privileges on a table, then set the appropriate ROW POLICY. When the user performs a SELECT * FROM table, only rows allowed by the policy will be displayed.

Column-level restrictions may be specified directly using the GRANT statement to enable table-level access for users and roles. Users may only include columns for which they have access in a query. Selecting restricted columns from a table in which the user does not have access to all the specified columns, such as in the query SELECT * FROM table, will return an error stating the user has insufficient permissions.

· 2 min read

Parametrised views can be handy to slice and dice data on the fly based on some parameters that can be fed at query execution time.

See this basic example:

1) create a table

clickhouse-cloud :) CREATE TABLE raw_data (id UInt8, data String) ENGINE = MergeTree ORDER BY id

CREATE TABLE raw_data
(
`id` UInt8,
`data` String
)
ENGINE = MergeTree
ORDER BY id

Query id: aa21e614-1e10-4bba-88ce-4c7183a9148e

Ok.

0 rows in set. Elapsed: 0.332 sec.

2) insert some sample random data

clickhouse-cloud :) INSERT INTO raw_data SELECT * FROM generateRandom('`id` UInt8,
`data` String',1,1) LIMIT 1000000;

INSERT INTO raw_data SELECT *
FROM generateRandom('`id` UInt8,
`data` String', 1, 1)
LIMIT 1000000

Query id: c552a34a-b72f-45e1-bed0-778923e1b5c9

Ok.

0 rows in set. Elapsed: 0.438 sec. Processed 1.05 million rows, 10.99 MB (2.39 million rows/s., 25.11 MB/s.)

3) create the parametrised view:

clickhouse-cloud :) CREATE VIEW raw_data_parametrized AS SELECT * FROM raw_data WHERE id BETWEEN {id_from:UInt8} AND {id_to:UInt8}

CREATE VIEW raw_data_parametrized AS
SELECT *
FROM raw_data
WHERE (id >= {id_from:UInt8}) AND (id <= {id_to:UInt8})

Query id: 45fb83a6-aa55-4197-a7cd-9e1ad2c76d48

Ok.

0 rows in set. Elapsed: 0.102 sec.

4) query the parametrised view by feeding the expected parameters in your FROM clause:

clickhouse-cloud :) SELECT count() FROM raw_data_parametrized(id_from=0, id_to=50000);

SELECT count()
FROM raw_data_parametrized(id_from = 0, id_to = 50000)

Query id: 5731aae1-3e68-4e63-b57f-d50f29055744

┌─count()─┐
317019
└─────────┘

1 row in set. Elapsed: 0.004 sec. Processed 319.49 thousand rows, 319.49 KB (76.29 million rows/s., 76.29 MB/s.)

For more info, please refer to https://clickhouse.com/docs/en/sql-reference/statements/create/view#parameterized-view

· One min read

Question

When executing a INSERT...SELECT statement, I am getting too many parts (TOO_MANY_PARTS) error.

How can I solve this?

Answer

Below are some of the settings to tune to avoid this error, this is expert level tuning of ClickHouse and these values should be set only after understanding the specifications of the ClickHouse cloud service or on-prem cluster where these will be used, so do not take these values as "one size fits all".

max_insert_block_size = 100_000_000 (default 1_048_576)

Increase from ~1M to 100M would allow larger blocks to form

Note: This setting only applies when the server forms the blocks. i.e. INSERT via the HTTP interface, and not for clickhouse-client

min_insert_block_size_rows = 100_000_000 (default 1_048_576)

Increase from ~1M to 100M would allow larger blocks to form.

min_insert_block_size_bytes = 500_000_000 (default 268_435_456)

Increase from 268.44 MB to 500 MB would allow larger blocks to form.

parts_to_delay_insert = 500 (default 150)

Increasing this so that INSERTs are not artificially slowed down when the number of active parts in a single partition is reached.

parts_to_throw_insert = 1500 (default 300)

Increasing this would generally affect query performance to the table, but this would be fine for data migration.

· One min read

Question

How to create a ClickHouse dictionary using string keys and string values from a MergeTree table source

Answer

  • Create the source table for the dictionary
CREATE TABLE db1.table1_dict_source
(
id UInt32,
email String,
name String
)
ENGINE = MergeTree()
ORDER BY id;
  • Insert rows
INSERT INTO db1.table1_dict_source
(id, email, name)
VALUES
(1, 'me@domain.com', 'me'),
(2, 'you@domain.com', 'you');
  • Create dictionary with key/value both as String
CREATE DICTIONARY db1.table1_dict
(
email String,
name String
)
PRIMARY KEY email
SOURCE(
CLICKHOUSE(
TABLE 'table1_dict_source'
USER 'default'
PASSWORD 'ClickHouse123!'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 0 MAX 1000);
  • Test the dictionary
clickhouse-cloud :) SELECT * from db1.table1_dict;

SELECT *
FROM db1.table1_dict

Query id: 098396ce-11dd-4c71-a0e1-40723dd67ddc

┌─email──────────┬─name─┐
│ me@domain.com │ me │
│ you@domain.com │ you │
└────────────────┴──────┘

2 rows in set. Elapsed: 0.001 sec.

You can also use dictGet function to retrieve values from it such as:

SELECT dictGet('db1.table1_dict', 'name', 'me@domain.com');

Response:

┌─dictGet('db1.table1_dict', 'name', 'me@domain.com')─┐
│ me │
└─────────────────────────────────────────────────────┘

More details - https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions

· 2 min read

Question

I see other vendors providing their own builds of ClickHouse. What is the difference between official ClickHouse builds and these 3rd-party builds?

Answer

Here are some of the differences we have observed with other builds:

  • The strings "official" are replaced with the name of the vendor
  • They appear after several months of delay and don't include recent bug fixes, which means these builds can contain vulnerabilities that have been fixed in the official versions
  • The builds are not bit-identical, and the addresses in the code are different. As a result, stack traces from these builds cannot be analyzed, and the ClickHouse team cannot answer questions about these builds
  • The builds are not auditable or reproducible - there is no publicly accessible CI system with the same build logs
  • The ClickHouse test suite is not run on these builds, so they are not verified to work by the test suite
  • They might not be available for all architectures (like ARM, etc.)
  • Sometimes they include patches targeted for one particular customer that can break compatibility and introduce extra risk

We recommend running the latest version of ClickHouse using the official builds following the install instructions in the documentation:

  • We release a stable version every month, and three latest stable releases are supported in terms of diagnostics and backporting of bug fixes.
  • We also release a long-term support (LTS) version twice a year that is supported for a year after its initial release, which is really only meant for companies that do not allow for frequent upgrades or using non-LTS software. (We are big fans of the monthly stable builds!)

We have more details between stable vs. LTS releases in the docs.