Skip to main content

· One min read

When a user-level setting is specified in the wrong place, the server won't start and an exception message is sent to the log. However, you can tell ClickHouse to ignore the incorrect setting using the skip_check_for_incorrect_settings setting:

Add the following to config.xml:

<skip_check_for_incorrect_settings>1</skip_check_for_incorrect_settings>
Note

User-level settings should be specified in users.xml inside a <profile> section for the specific user profile, (or in <default> for default settings.

· One min read

A client can view the server logs - even at a different level than what the server log level is configured to - by setting the send_logs_level client setting.

For example, suppose the client runs:

SET send_logs_level = 'trace';

The client will receive trace logs even if the server has log level set to info.

One useful scenario is to use send_logs_level to monitor the insertion of rows into a Distributed table:

  • Enable logs in clickhouse-client using SET send_logs_level = 'trace';
  • Run your INSERT query
  • Inserts into a distributed table are asynchronous by default. The data is written into a local buffer on disk, then sent to remote servers in background.
  • Logs will be sent from all nodes participating in the query processing (distributed tracing)

To check the status of distributed inserts, check the system.distribution_queue table. This table contains information about local files that are in the queue to be sent to the shards. These local files contain new parts that are created by inserting new data into the Distributed table in asynchronous mode.

· 7 min read

To compare metrics between two queries, you must first obtain the query_id for both queries.

You can then run the following query:

WITH
initial_query_id = '9bc1b438-542e-450a-8284-0e3e2b3739c8' AS second,
initial_query_id = '4ce13192-2849-48f9-b426-e04790c4f4aa' AS first
SELECT
PE.1 AS metric,
sumIf(PE.2, first) AS v1,
sumIf(PE.2, second) AS v2,
10 * log10(v2 / v1) AS dB,
round(((v2 - v1) / if(v2 > v1, v2, v1)) * 100, 2) AS perc,
bar(abs(perc), 0, 100, 33) AS bar
FROM clusterAllReplicas(default, system.query_log)
ARRAY JOIN ProfileEvents AS PE
WHERE (first OR second) AND (event_date >= (today() - 3)) AND (type = 2)
GROUP BY metric
HAVING (v1 != v2) AND (abs(perc) >= 0)
ORDER BY
dB DESC,
v2 DESC,
metric ASC
FORMAT PrettyCompactMonoBlock

You will receive a table with metrics comparing the two queries:

┌─metric──────────────────────────────────────────────┬───────v1─┬───────v2─┬──────────────────dB─┬───perc─┬─bar───────────────────────────────┐
│ OSReadBytes │ 0528384 │ inf │ 100 │ █████████████████████████████████ │
│ OSIOWaitMicroseconds │ 010000 │ inf │ 100 │ █████████████████████████████████ │
│ QueryProfilerRuns │ 01 │ inf │ 100 │ █████████████████████████████████ │
│ RealTimeMicroseconds │ 5101998006112.83521165516512794.79 │ ███████████████████████████████▎ │
│ WaitPrefetchTaskMicroseconds │ 2432211.27644629984224992.55 │ ██████████████████████████████▌ │
│ ConcurrencyControlSlotsAcquired │ 55910.71882007306125691.53 │ ██████████████████████████████▏ │
│ DiskReadElapsedMicroseconds │ 752854910.55697476489603891.2 │ ██████████████████████████████ │
│ CachedReadBufferReadFromCacheMicroseconds │ 767859710.49551562789274791.08 │ ██████████████████████████████ │
│ RemoteFSUnusedPrefetches │ 2158.75061263391700186.67 │ ████████████████████████████▌ │
│ SelectedMarksTotal │ 875456340638.59901064126475386.19 │ ████████████████████████████▍ │
│ ThreadpoolReaderSubmitReadSynchronouslyMicroseconds │ 1614103358.06406950577255484.38 │ ███████████████████████████▊ │
│ ThreadpoolReaderTaskMicroseconds │ 1614103358.06406950577255484.38 │ ███████████████████████████▊ │
│ FilesystemCacheLockKeyMicroseconds │ 17897.1894108526663980.9 │ ██████████████████████████▋ │
│ GlobalThreadPoolLockWaitMicroseconds │ 281356.83175737152786979.26 │ ██████████████████████████▏ │
│ LocalThreadPoolLockWaitMicroseconds │ 17806.726410656136696578.75 │ █████████████████████████▉ │
│ RowsReadByPrewhereReaders │ 1064964833286.56908659335307477.97 │ █████████████████████████▋ │
│ FileSegmentUseMicroseconds │ 9406.47817481888637577.5 │ █████████████████████████▌ │
│ LocalThreadPoolShrinks │ 17746.38782798352702277.03 │ █████████████████████████▍ │
│ LocalThreadPoolExpansions │ 21785.69875307956561273.08 │ ████████████████████████ │
│ GlobalThreadPoolJobs │ 22795.55204410468235272.15 │ ███████████████████████▊ │
│ SoftPageFaults │ 91030785.29227223174366370.44 │ ███████████████████████▏ │
│ LocalThreadPoolThreadCreationMicroseconds │ 1956515.2354637720567470.05 │ ███████████████████████ │
│ SelectedBytes │ 3415687113395795.21118864161558969.88 │ ███████████████████████ │
│ FileSegmentHolderCompleteMicroseconds │ 7235.166297960033360569.57 │ ██████████████████████▉ │
│ SystemTimeMicroseconds │ 10124325255.06865168747407368.87 │ ██████████████████████▋ │
│ CompressedReadBufferBytes │ 284080586419874.83172184993074667.13 │ ██████████████████████▏ │
│ CompressedReadBufferBlocks │ 371014.36119649715647663.37 │ ████████████████████▉ │
│ OSCPUWaitMicroseconds │ 1674344.14773258364927461.52 │ ████████████████████▎ │
│ SelectedRows │ 1064962621443.912066260130692459.38 │ ███████████████████▌ │
│ SelectedMarks │ 13323.912066260130692459.38 │ ███████████████████▌ │
│ RowsReadByMainReader │ 853412096573.903516396636324459.29 │ ███████████████████▌ │
│ LocalThreadPoolJobs │ 721733.807136066975269658.38 │ ███████████████████▎ │
│ OSCPUVirtualTimeMicroseconds │ 30914741493.799500957632875858.31 │ ███████████████████▏ │
│ ThreadpoolReaderSubmitLookupInCacheMicroseconds │ 59413243.481015401224875755.14 │ ██████████████████▏ │
│ FilesystemCacheGetOrSetMicroseconds │ 2445353.4096395568249954.39 │ █████████████████▉ │
│ UserTimeMicroseconds │ 20850420763.04928387128246250.45 │ ████████████████▋ │
│ FileSegmentWaitReadBufferMicroseconds │ 69813362.819510355163657347.75 │ ███████████████▊ │
│ CachedReadBufferCreateBufferMicroseconds │ 67912982.814049181838487647.69 │ ███████████████▋ │
│ CachedReadBufferReadFromCacheHits │ 36682.76206411938949147.06 │ ███████████████▌ │
│ ReadBufferFromFileDescriptorRead │ 36682.76206411938949147.06 │ ███████████████▌ │
│ RemoteFSPrefetches │ 36682.76206411938949147.06 │ ███████████████▌ │
│ ThreadpoolReaderSubmitReadSynchronously │ 36682.76206411938949147.06 │ ███████████████▌ │
│ BackgroundLoadingMarksTasks │ 27502.676062401770314746 │ ███████████████▏ │
│ MarkCacheHits │ 27502.676062401770314746 │ ███████████████▏ │
│ OpenedFileCacheMicroseconds │ 38702.653144433974466345.71 │ ███████████████ │
│ FileOpen │ 18332.63241434774581445.45 │ ██████████████▉ │
│ OpenedFileCacheMisses │ 18332.63241434774581445.45 │ ██████████████▉ │
│ IOBufferAllocBytes │ 41297957748465172.582430274107813344.82 │ ██████████████▊ │
│ IOBufferAllocs │ 57992.397603389250585742.42 │ █████████████▉ │
│ MetadataFromKeeperCacheHit │ 1803002.218487496163563740 │ █████████████▏ │
│ CreatedReadBufferOrdinary │ 36602.218487496163563740 │ █████████████▏ │
│ FilesystemCacheHoldFileSegments │ 36602.218487496163563740 │ █████████████▏ │
│ FilesystemCacheUnusedHoldFileSegments │ 36602.218487496163563740 │ █████████████▏ │
│ RemoteFSBuffers │ 36602.218487496163563740 │ █████████████▏ │
│ Seek │ 36602.218487496163563740 │ █████████████▏ │
│ FunctionExecute │ 1472332.000385862778428336.91 │ ████████████▏ │
│ RemoteFSPrefetchedReads │ 34531.927969525585339335.85 │ ███████████▊ │
│ OpenedFileCacheHits │ 18271.760912590556812433.33 │ ██████████▉ │
│ PartsLockHoldMicroseconds │ 7101.549019599857431730 │ █████████▉ │
│ LoggerElapsedNanoseconds │ 361183251312831.524984260528184729.61 │ █████████▊ │
│ RemoteFSSeeks │ 55771.4612803567823828.57 │ █████████▍ │
│ OSWriteChars │ 40188546661.336209034340795726.48 │ ████████▋ │
│ ContextLock │ 2453291.280298135854417725.53 │ ████████▍ │
│ OSReadChars │ 4839656111741.013509156896205820.81 │ ██████▊ │
│ OSWriteBytes │ 16384204800.969100130080564220 │ ██████▌ │
│ SelectedParts │ 12150.969100130080564220 │ ██████▌ │
│ SelectedPartsTotal │ 12150.969100130080564220 │ ██████▌ │
│ SelectedRanges │ 12150.969100130080564220 │ ██████▌ │
│ LogTrace │ 60740.910804693473325818.92 │ ██████▏ │
│ LogDebug │ 17200.705810742857072715 │ ████▉ │
│ CachedReadBufferReadFromCacheBytes │ 4487705262070.691328042009023814.72 │ ████▊ │
│ ReadBufferFromFileDescriptorReadBytes │ 4487705262070.691328042009023814.72 │ ████▊ │
│ RemoteFSPrefetchedBytes │ 4487705262070.691328042009023814.72 │ ████▊ │
│ ThreadpoolReaderReadBytes │ 4487705262070.691328042009023814.72 │ ████▊ │
│ ThreadpoolReaderSubmitReadSynchronouslyBytes │ 4487705262070.691328042009023814.72 │ ████▊ │
│ WaitMarksLoadMicroseconds │ 348305-0.5727940459979509-12.36 │ ████ │
│ ArenaAllocBytes │ 4915240960-0.7918124604762482-16.67 │ █████▌ │
│ ArenaAllocChunks │ 1210-0.7918124604762482-16.67 │ █████▌ │
│ ReadCompressedBytes │ 441904344814-1.0774304213935042-21.97 │ ███████▎ │
│ FilesystemCacheLockMetadataMicroseconds │ 2317-1.3127891463931898-26.09 │ ████████▌ │
│ ConcurrencyControlQueriesDelayed │ 10-inf │ -100 │ █████████████████████████████████ │
│ ConcurrencyControlSlotsDelayed │ 540-inf │ -100 │ █████████████████████████████████ │
│ ContextLockWaitMicroseconds │ 80-inf │ -100 │ █████████████████████████████████ │
│ FileSegmentFailToIncreasePriority │ 10-inf │ -100 │ █████████████████████████████████ │
│ ThreadpoolReaderPrepareMicroseconds │ 80-inf │ -100 │ █████████████████████████████████ │
└─metric──────────────────────────────────────────────┴───────v1─┴───────v2─┴──────────────────dB─┴───perc─┴─bar───────────────────────────────┘

85 rows in set. Elapsed: 0.074 sec. Processed 2.16 million rows, 102.28 MB (29.19 million rows/s., 1.38 GB/s.)
Peak memory usage: 114.84 MiB.

· One min read

Pricing

For pricing information see the ClickHouse Cloud Pricing page. To understand what can affect your bill, and ways that you can manage your spend, keep reading.

Amazon Web Services(AWS) Examples

Note

Prices reflect AWS us-east-1 pricing.

Development: From $51 per month

Best for: Starter projects & staging

  • Development service
  • 16 GiB RAM, 2 vCPU
  • 1 TB Data

Pricing breakdown for this example:

10% active50% activeAlways on
Compute$16$79$158
Storage$35$35$35
Total$51$114$193
Note

Consumption can be even lower if less than 1TB disk is used

Production (Idling, Auto-scaling): From $172 per month

Best for: Cost-sensitive ad-hoc analytics applications

  • Production Service
  • Active workload ~25% time
  • Idling on with default settings
  • Auto-scaling maximum set to prevent runaway bills

Pricing breakdown for this example:

Example 1Example 2Example 3
Compute24 GiB RAM, 6 vCPU
$125
192 GiB RAM, 48 vCPU
$1000
720 GiB RAM, 180 vCPU
$3750
Storage1 TB Data
$47
5 TB Data
$235
10 TB Data
$470
Total$172$1,235$4,220

Production (Always-on, Reserved capacity): From $550 per month​

Best for: Latency-sensitive applications

  • Production Service
  • Active workload ~100% time
  • Auto-scaling minimum set to reserve capacity

Pricing breakdown for this example:

Example 1Example 2Example 3
Compute24 GiB RAM, 6 vCPU
$503
96 GiB RAM, 24 vCPU
$2,012
360 GiB RAM, 90 vCPU
$7,545
Storage1 TB Data
$47
4 TB Data
$188
8 TB Data
$376
Total$550$2,200$7,921

For help with further estimation, please contact support if you are already a ClickHouse Cloud user, or sales@clickhouse.com otherwise.

Google Cloud Platform(GCP) Examples

Note

Prices reflect GCP us-central-1 pricing.

Development: From $46 per month

Best for: Starter projects & staging

  • Development service
  • 16 GiB RAM, 2 vCPU
  • 1 TB Data

Pricing breakdown for this example:

10% active50% activeAlways on
Compute$15$74$147
Storage$31$31$31
Total$46$105$178
Note

Consumption can be even lower if less than 1TB disk is used

Production (Idling, Auto-scaling): From $146 per month

Best for: Cost-sensitive ad-hoc analytics applications

  • Production Service
  • Active workload ~25% time
  • Idling on with default settings
  • Auto-scaling maximum set to prevent runaway bills

Pricing breakdown for this example:

Example 1Example 2Example 3
Compute24 GiB RAM, 6 vCPU
$105
192 GiB RAM, 48 vCPU
$843
720 GiB RAM, 180 vCPU
$3162
Storage1 TB Data
$41
5 TB Data
$205
10 TB Data
$410
Total$146$1,048$3,572

Production (Always-on, Reserved capacity): From $463 per month​

Best for: Latency-sensitive applications

  • Production Service
  • Active workload ~100% time
  • Auto-scaling minimum set to reserve capacity

Pricing breakdown for this example:

Example 1Example 2Example 3
Compute24 GiB RAM, 6 vCPU
$422
96 GiB RAM, 24 vCPU
$1,686
360 GiB RAM, 90 vCPU
$6,342
Storage1 TB Data
$41
4 TB Data
$164
8 TB Data
$328
Total$463$1,850$6,652

For help with further estimation, please contact support if you are already a ClickHouse Cloud user, or sales@clickhouse.com otherwise.

FAQs

How is compute metered?

ClickHouse Cloud meters compute on a per-minute basis, in 8G RAM increments.

How is storage on disk calculated?

ClickHouse Cloud uses cloud object storage and is metered on the compressed size of data stored in ClickHouse tables.

Do backups count toward total storage?

ClickHouse Cloud offers two free backups for production services, and one free backup for development services. Backups do not count toward storage.

How do I estimate compression?

Compression can vary quite a bit by dataset. It is dependent on how compressible the data is in the first place (number of high vs. low cardinality fields), and how the user sets up the schema (using optional codecs or not, for instance). It can be on the order of 10x for common types of analytical data, but it can be significantly lower or higher as well. See the optimizing documentation for guidance and this Uber blog for a detailed logging use case example. The only practical way to know exactly is to ingest your dataset into ClickHouse and compare the size of the dataset with the size stored in ClickHouse.

You can use the query SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = <your table name>.

What tools does ClickHouse offer to estimate the cost of running a service in the cloud if I have a self-managed deployment?

The ClickHouse query log captures key metrics that can be used to estimate the cost of running a workload in ClickHouse Cloud. For details on migrating from self-managed to ClickHouse Cloud please refer to the migration documentation, and contact ClickHouse Cloud support if you have further questions.

What billing options are available for ClickHouse Cloud?

ClickHouse Cloud supports the following billing options:

  • Self-service monthly (in USD, via credit card)
  • Direct-sales annual / multi-year (through pre-paid "ClickHouse Credits", in USD, with additional payment options)

How long is the billing cycle?

Billing follows a monthly billing cycle and the start date is tracked as the date when the ClickHouse Cloud organization was created.

What controls does ClickHouse Cloud offer to manage costs for Production services?

  • Trial and Annual Commit customers will be notified with automated emails when the consumption hits certain thresholds-50%, 75%, and 90%, so that users can take action.
  • ClickHouse Cloud allows users to set a maximum auto-scaling limit on their compute via Advanced scaling control, a significant cost factor for analytical workloads.
  • The Advanced scaling control lets you set memory limits with an option to control the behavior of pausing/idling during inactivity.

What controls does ClickHouse Cloud offer to manage costs for Developer services?

  • The Advanced scaling control lets you control the behavior of pausing/idling during inactivity. Adjusting memory allocation is not supported for Developer services
  • Note that the default setting pauses the service after a period of inactivity

If I have multiple services, do I get an invoice per service or a consolidated invoice?

A consolidated invoice is generated for all services in a given organization for a billing period.

If I add my credit card and upgrade before my trial period and credits expire, will I be charged?

When a user converts from trial to paid before the 30-day trial period ends, but with credits remaining from the trial credit allowance, we continue to draw down from the trial credits during the initial 30-day trial period, and then charge the credit card.

How can I keep track of my spending?

ClickHouse Cloud console includes a Usage display that gives detailed information about usage per service on compute and storage. This can be used to understand the cost breakdown by metered units.

How do I access my invoice for my AWS marketplace subscription to the ClickHouse Cloud service?

All marketplace subscriptions will be billed and invoiced by AWS. You can download the invoice from the AWS Billing Dashboard.

Why do the dates on the Usage statements not match my AWS Marketplace Invoice?

AWS Marketplace billing follows the calendar month cycle e.g., for usage between dates 01-Dec-2022 and 01-Jan-2023, an invoice will be generated between 3-Jan and 5-Jan-2023

ClickHouse Cloud usage statements follow a different billing cycle where usage is metered and reported over 30 days starting from the day of sign up

The usage and invoice dates will differ if these dates are not the same. Since usage statements track usage by day for a given service, users can rely on statements to see the breakdown of costs.

Are there any restrictions around the usage of prepaid credits?

ClickHouse Cloud prepaid credits (whether direct through ClickHouse, or via a cloud provider's marketplace) can only be leveraged for the terms of the contract. This means they can be applied on the acceptance date, or a future date, and not for any prior periods. Any overages not covered by prepaid credits must be covered by a credit card payment, or marketplace monthly billing.

Is there a difference in ClickHouse Cloud pricing, whether paying through the cloud provider marketplace or directly to ClickHouse?

There is no difference in pricing between marketplace billing and signing up directly with ClickHouse. In either case, your usage of ClickHouse Cloud is tracked in terms of ClickHouse Cloud Credits (CHCs), which are metered in the same way and billed accordingly.

· One min read

You can subscribe to ClickHouse Cloud through the AWS, GCP, and Azure marketplaces. This allows you to pay for ClickHouse Cloud through your existing cloud provider billing.

You can either use pay-as-you-go (PAYG) or commit to a contract with ClickHouse Cloud through the marketplace. The billing will be handled by the cloud provider, and you will receive a single invoice for all your cloud services.

FAQs

How can I verify that my organization is connected to marketplace billing?​

In the ClickHouse Cloud console, navigate to Billing. You should see the name of the marketplace and the link in the Payment details section.

I am an existing ClickHouse Cloud user. What happens when I subscribe to ClickHouse Cloud via AWS / GCP / Azure marketplace?​

Signing up for ClickHouse Cloud from the cloud provider marketplace is a two step process:

  1. You first "subscribe" to ClickHouse Cloud on the cloud providers' marketplace portal. After you have finished subscribing, you click on "Pay Now" or "Manage on Provider" (depending on the marketplace). This redirects you to ClickHouse Cloud.
  2. On Clickhouse Cloud you either register for a new account, or sign in with an existing account. Either way, a new ClickHouse Cloud organization will be created for you which is tied to your marketplace billing.

NOTE: Your existing services and organizations from any prior ClickHouse Cloud signups will remain and they will not be connected to the marketplace billing. ClickHouse Cloud allows you to use the same account to manage multiple organization, each with different billing.

You can switch between organizations from the bottom left menu of the ClickHouse Cloud console.

I am an existing ClickHouse Cloud user. What should I do if I want my existing services to be billed via marketplace?​

Please contact ClickHouse Cloud support in this case. You will need to subscribe to ClickHouse Cloud via the marketplace and we can switch organization linkage to resources so that billing happens via the marketplace.

I subscribed to ClickHouse Cloud as a marketplace user. How can I unsubscribe?​

Note that you can simply stop using ClickHouse Cloud and delete all existing ClickHouse Cloud services. Even though the subscription will still be active, you will not be paying anything as ClickHouse Cloud doesn't have any recurring fees.

If you want to unsubscribe, please navigate to the Cloud Provider console and cancel the subscription renewal there. Once the subscription ends, all existing services will be stopped and you will be prompted to add a credit card. If no card was added, after two weeks all existing services will be deleted.

I subscribed to ClickHouse Cloud as a marketplace user, and then unsubscribed. Now I want to subscribe back, what is the process?​

In that case please subscribe to the ClickHouse Cloud as usual (see sections on subscribing to ClickHouse Cloud via the marketplace).

  • For AWS marketplace a new ClickHouse Cloud organization will be created and connected to the marketplace.
  • For the GCP marketplace your old organization will be reactivated.

If you have any trouble with reactivating your marketplace org, please contact ClickHouse Cloud Support.

How do I access my invoice for my marketplace subscription to the ClickHouse Cloud service?​

Why do the dates on the Usage statements not match my Marketplace Invoice?​

Marketplace billing follows the calendar month cycle. For example, for usage between December 1st and January 1st, an invoice will be generated between January 3rd and January 5th.

ClickHouse Cloud usage statements follow a different billing cycle where usage is metered and reported over 30 days starting from the day of sign up.

The usage and invoice dates will differ if these dates are not the same. Since usage statements track usage by day for a given service, users can rely on statements to see the breakdown of costs.

Where can I find general billing information​?

Please see the Billing overview page.

Is there a difference in ClickHouse Cloud pricing, whether paying through the cloud provider marketplace or directly to ClickHouse?

There is no difference in pricing between marketplace billing and signing up directly with ClickHouse. In either case, your usage of ClickHouse Cloud is tracked in terms of ClickHouse Cloud Credits (CHCs), which are metered in the same way and billed accordingly.

· One min read

The root cause of these permissions errors has been identified and is due to more strict checking of grants in the new release for async_inserts.

To fix this issue, grants must be updated for your service to work. Check the error message indicating the missing grants and add those needed grants manually. The required additional grants for the tables using async_inserts will either be the SELECT or dictGet grant.

Add necessary GRANTs to affected tables and dictionaries

-- Add SELECT grant permissions
GRANT SELECT ON mydb.insertTable TO insert_role WITH GRANT OPTION

-- Add dictGet grant permissions
GRANT dictGet ON mydb.insertDictionary TO insert_role

To learn more about the GRANT command, please see this page.

If you are unable to perform this change, then please contact ClickHouse Support for assistance.

· 3 min read

Problem

Map lookup such as a['key'] works with linear complexity (mentioned here) and can be inefficient. This is because selecting a value with a specific key from a table would require iterating through all keys (~M) across all rows (N) in the Map column, resulting in ~MxN lookups.

A lookup using Map can be 10x slower than a String column. The experiment below also shows ~10x slowdown for cold query, and difference in multiple magnitudes of data processed (7.21 MB vs 5.65 GB).

-- create table with SpanNAme as String and ResourceAttributes as Map
DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl (
`Timestamp` DateTime64(9) CODEC (Delta(8), ZSTD(1)),
`TraceId` String CODEC (ZSTD(1)),
`ServiceName` LowCardinality(String) CODEC (ZSTD(1)),
`Duration` UInt8 CODEC (ZSTD(1)), -- Int64
`SpanName` LowCardinality(String) CODEC (ZSTD(1)),
`ResourceAttributes` Map(LowCardinality(String), String) CODEC (ZSTD(1))
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId);

-- create UDF to generate random Map data for ResourceAttributes
DROP FUNCTION IF EXISTS genmap;
CREATE FUNCTION genmap AS (n) -> arrayMap (x-> (x::String, (x*rand32())::String), range(1, n));

-- check that genmap is working as intended
SELECT genmap(10)::Map(String, String);

-- insert 1M rows
INSERT INTO tbl
SELECT
now() - randUniform(1, 1000000.) as Timestamp,
randomPrintableASCII(2) as TraceId,
randomPrintableASCII(2) as ServiceName,
rand32() as Duration,
randomPrintableASCII(2) as SpanName,
genmap(rand64()%500)::Map(String, String) as ResourceAttributes
FROM numbers(1_000_000);

-- querying for SpanName is faster
-- [cold] 0 rows in set. Elapsed: 0.642 sec. Processed 1.00 million rows, 7.21 MB (1.56 million rows/s., 11.22 MB/s.)
-- [warm] 0 rows in set. Elapsed: 0.164 sec. Processed 1.00 million rows, 7.21 MB (6.10 million rows/s., 43.99 MB/s.)
SELECT
COUNT(*),
avg(Duration/1E6) as average,
quantile(0.95)(Duration/1E6) as p95,
quantile(0.99)(Duration/1E6) as p99,
SpanName
FROM tbl
GROUP BY SpanName ORDER BY 1 DESC LIMIT 50 FORMAT Null;

-- query for ResourceAttributes is slower
-- [cold] 0 rows in set. Elapsed: 6.432 sec. Processed 1.00 million rows, 5.65 GB (155.46 thousand rows/s., 879.07 MB/s.)
-- [warm] 0 rows in set. Elapsed: 5.935 sec. Processed 1.00 million rows, 5.65 GB (168.50 thousand rows/s., 952.81 MB/s.)
SELECT
COUNT(*),
avg(Duration/1E6) as average,
quantile(0.95)(Duration/1E6) as p95,
quantile(0.99)(Duration/1E6) as p99,
ResourceAttributes['1'] as hostname
FROM tbl
GROUP BY hostname ORDER BY 1 DESC LIMIT 50 FORMAT Null;

Solution To improve the query, we can add another column with the value defaulting to a particular key in the Map column, and then materializing it to populate value for existing rows. This way, we extract and store the necessary value at insertion time, thereby speeding up the lookup at query time.

-- solution is to add a column with value defaulting to a particular key in Map
ALTER TABLE tbl ADD COLUMN hostname LowCardinality(String) DEFAULT ResourceAttributes['1'];
ALTER TABLE tbl MATERIALIZE COLUMN hostname;

-- query for hostname (new column) is now faster
-- [cold] 0 rows in set. Elapsed: 2.215 sec. Processed 1.00 million rows, 21.67 MB (451.52 thousand rows/s., 9.78 MB/s.)
-- [warm] 0 rows in set. Elapsed: 0.541 sec. Processed 1.00 million rows, 21.67 MB (1.85 million rows/s., 40.04 MB/s.)
SELECT
COUNT(*),
avg(Duration/1E6) as average,
quantile(0.95)(Duration/1E6) as p95,
quantile(0.99)(Duration/1E6) as p99,
hostname
FROM tbl
GROUP BY hostname ORDER BY 1 DESC LIMIT 50 FORMAT Null;

-- drop cache to run query cold
SYSTEM DROP FILESYSTEM CACHE;

· 2 min read

The short answer is "yes". ClickHouse has multiple mechanisms that allow freeing up disk space by removing old data. Each mechanism is aimed for different scenarios.

TTL

ClickHouse allows to automatically drop values when some condition happens. This condition is configured as an expression based on any columns, usually just static offset for any timestamp column.

The key advantage of this approach is that it does not need any external system to trigger, once TTL is configured, data removal happens automatically in background.

Note

TTL can also be used to move data not only to /dev/null, but also between different storage systems, like from SSD to HDD.

More details on configuring TTL.

DELETE FROM

DELETE FROM allows standard DELETE queries to be run in ClickHouse. The rows targeted in the filter clause are marked as deleted, and removed from future result sets. Cleanup of the rows happens asynchronously.

Note

DELETE FROM is generally available from version 23.3 and newer. On older versions, it is experimental and must be enabled with:

SET allow_experimental_lightweight_delete = true;

ALTER DELETE

ALTER DELETE removes rows using asynchronous batch operations. Unlike DELETE FROM, queries run after the ALTER DELETE and before the batch operations complete will include the rows targeted for deletion. For more details see the ALTER DELETE docs.

ALTER DELETE can be issued to flexibly remove old data. If you need to do it regularly, the main downside will be the need to have an external system to submit the query. There are also some performance considerations since mutations rewrite complete parts even there is only a single row to be deleted.

This is the most common approach to make your system based on ClickHouse GDPR-compliant.

More details on mutations.

DROP PARTITION

ALTER TABLE ... DROP PARTITION provides a cost-efficient way to drop a whole partition. It’s not that flexible and needs proper partitioning scheme configured on table creation, but still covers most common cases. Like mutations need to be executed from an external system for regular use.

More details on manipulating partitions.

TRUNCATE

It’s rather radical to drop all data from a table, but in some cases it might be exactly what you need.

More details on table truncation.

· 2 min read

Question

How can I tell if a projection is used?

Answer

  1. Create a sample database
CREATE database db1;
  1. Create a sample table that will use column1 as the primary key
CREATE table db1.table1_projections
(
column1 Int32,
column2 Int32
)
engine = MergeTree()
order by column1;
  1. Add a projection for_column2 to use column2 as the primary key
ALTER table db1.table1_projections add projection for_column2
(
select *
order by column2
);
  1. Insert test data

*this inserts 100000 rows with random numbers in column1 and column2

INSERT INTO db1.table1_projections
select
floor(randNormal(50, 5)) as column1,
floor(randUniform(1, 100)) as column2
from numbers(100000);
  1. Check sample set of data
clickhouse-cloud :) SELECT * from db1.table1_projections limit 5;

SELECT *
FROM db1.table1_projections
LIMIT 5

Query id: d6940799-b507-4a5e-9843-df55ebe818ab

┌─column1─┬─column2─┐
│ 28 │ 41 │
│ 29 │ 12 │
│ 30 │ 73 │
│ 30 │ 75 │
│ 30 │ 70 │
└─────────┴─────────┘
  1. Test that it is using the original table with column1:
clickhouse-cloud :) explain indexes = 1 
SELECT count() from db1.table1_projections where column1 > 50;

EXPLAIN indexes = 1
SELECT count()
FROM db1.table1_projections
WHERE column1 > 50

Query id: e04d5236-1a05-4f1f-9502-7e41986beb44

┌─explain────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ ReadFromMergeTree (db1.table1_projections) │
│ Indexes: │
│ PrimaryKey │
│ Condition: true │
│ Parts: 1/1 │
│ Granules: 12/12 │
└────────────────────────────────────────────────────┘

*notice that it is reading from db1.table1_projections

  1. Test reading from the projection by using column2 in the where clause
clickhouse-cloud :) explain indexes = 1 
SELECT * from db1.table1_projections where column2 > 50;

EXPLAIN indexes = 1
SELECT *
FROM db1.table1_projections
WHERE column2 > 50

Query id: d2b20e01-93bf-4b60-a370-4aac7b454267

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Filter │
│ ReadFromMergeTree (for_column2) │
│ Indexes: │
│ PrimaryKey │
│ Keys: │
│ column2 │
│ Condition: (column2 in [51, +Inf)) │
│ Parts: 1/1 │
│ Granules: 6/12 │
└─────────────────────────────────────────────┘

*notice that now the for_column2 projection is used.

For more info

Projections: https://clickhouse.com/docs/en/sql-reference/statements/alter/projection

numbers table function: https://clickhouse.com/docs/en/sql-reference/table-functions/numbers

Blog for generating random data: https://clickhouse.com/blog/generating-random-test-distribution-data-for-clickhouse

· 2 min read

Can I just run HTTP requests to ClickHouse server using requests module?

Answer

Yes, here is the sample code.

import requests
import datetime

create_replace_stmt = 'CREATE OR REPLACE TABLE test_table (name String, age UInt8) Engine=MergeTree ORDER BY tuple();'
select_query = 'SELECT count() FROM test_table'
insert_query = 'INSERT INTO test_table SELECT * FROM generateRandom(\'name String, age UInt8\',1,1) LIMIT 300000000'

CH_URL = 'https://your_clickhouse_service_fqdn:8443'
CH_USER = 'default'
CH_PASSWORD = 'secret_pwd'

headers = {}
headers["X-ClickHouse-User"] = CH_USER
headers["X-ClickHouse-Key"] = CH_PASSWORD

now = (datetime.datetime.now())
print("{} - starting...".format(now))


# create/replace table
now = (datetime.datetime.now())
print("{} - creating/replacing table...".format(now))
response = requests.post(url=CH_URL,
params={"database": "default",
"query": create_replace_stmt,
"session_id": "my-session-id-string"
},
headers=headers)

# select count()
response = requests.post(url=CH_URL,
params={"database": "default",
"query": select_query,
"session_id": "my-session-id-string"
},
headers=headers)

now = (datetime.datetime.now())
print("{} - elements in test_table before insert: {}".format(
now, response.content.decode('utf-8')))


# insert
now = (datetime.datetime.now())
print("{} - Inserting data...".format(now))
response = requests.post(url=CH_URL,
params={"database": "default",
"query": insert_query,
"session_id": "my-session-id-string",
"wait_end_of_query": 1
},
headers=headers)

now = (datetime.datetime.now())
print("{} - Done inserting data...".format(now))

response = requests.post(url=CH_URL,
params={"database": "default",
"query": select_query,
"session_id": "my-session-id-string",
},
headers=headers)

now = (datetime.datetime.now())
print("{} - elements in test_table after insert: {}".format(
now, response.content.decode('utf-8')))

Sample expected output:

(venv) ➜  venv/bin/python main.py
2023-07-07 14:54:27.336450 - starting...
2023-07-07 14:54:27.336476 - creating/replacing table...
2023-07-07 14:54:28.125270 - elements in test_table before insert: 0

2023-07-07 14:54:28.125352 - Inserting data...
2023-07-07 14:55:23.788466 - Done inserting data...
2023-07-07 14:55:23.962134 - elements in test_table after insert: 299115357

requirements.txt

requests==2.31.0

See this other KB for steps on how setup your python venv.