Maksim Kita, developer of ClickHouse.
1. ClickHouse query execution.
2. JIT compilation.
3. ClickHouse compilation of expressions.
4. ClickHouse compilation of operator GROUP BY.
5. ClickHouse compilation of ORDER BY comparator.
Column-oriented storage — data is physically stored by columns. Only the necessary columns are read from disk during a query.
Vectorized query execution — data is processed in blocks. A block contains multiple columns with up to max_block_size rows (65536 by default).
Each column is stored as a vector of primitive data types or their combination:
1. Better utilization of CPU caches and the instruction pipeline.
2. Data can be processed using SIMD instructions.
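The benefit of columnar, block-wise processing can be illustrated with a minimal sketch (the function name is hypothetical, not ClickHouse code): a tight loop over a contiguous column has no per-row dispatch, so the compiler can auto-vectorize it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/// Sketch: processing a whole column in one tight loop keeps data hot in
/// CPU caches and lets the compiler emit SIMD instructions.
std::vector<int64_t> multiplyColumnByTwo(const std::vector<int64_t> & column)
{
    std::vector<int64_t> result(column.size());
    for (size_t i = 0; i < column.size(); ++i)
        result[i] = column[i] * 2;  /// no per-row dispatch, trivially vectorizable
    return result;
}
```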
Numeric columns — PODArray. Almost the same as std::vector, but:
1. Uses our own Allocator with realloc support.
2. No additional memset during resize.
3. Padding with 15 bytes at the end.
Nullable columns contain a data column and a UInt8 column that acts as a mask of which elements are null.
Array columns contain a data column and a UInt64 column with offsets.
A Const column contains a single constant value.
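A minimal sketch of these layouts (the struct names are illustrative, not the real IColumn hierarchy):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/// A Nullable column stores the data column plus a UInt8 mask;
/// element i is NULL when null_mask[i] == 1.
struct NullableColumn
{
    std::vector<int64_t> data;
    std::vector<uint8_t> null_mask;

    bool isNullAt(size_t i) const { return null_mask[i] != 0; }
};

/// An Array column stores flattened data plus offsets; row i spans
/// [offsets[i - 1], offsets[i]) in the data column.
struct ArrayColumn
{
    std::vector<int64_t> data;
    std::vector<uint64_t> offsets;

    size_t sizeAt(size_t i) const
    {
        return offsets[i] - (i == 0 ? 0 : offsets[i - 1]);
    }
};
```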
A function takes its arguments as columns and returns its result as a column.
class IFunction
{
public:
    virtual ~IFunction() = default;

    virtual ColumnPtr executeImpl(
        const ColumnsWithTypeAndName & arguments,
        const DataTypePtr & result_type,
        size_t input_rows_count) const = 0;
    ...
};
Specializations are generated with templates for different types, e.g. plus and multiply for each combination of argument types.
Specializations also exist for constant columns, e.g. plus and multiply with a constant argument.
Function plus has specializations for:
{UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64} ✕ {UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64}
In addition, there is a specialization for when one of the columns is a constant column.
With the constant variants included: 20 ✕ 20 = 400 specializations for the single plus function.
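A sketch of why the count multiplies (illustrative only, not the real ClickHouse implementation): one template like this is instantiated once per type combination, and each instantiation produces its own machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/// Each (A, B, Result) combination compiles into its own tight loop:
/// fast, but the generated code multiplies across all supported types.
template <typename A, typename B, typename Result>
std::vector<Result> plusImpl(const std::vector<A> & a, const std::vector<B> & b)
{
    std::vector<Result> result(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        result[i] = static_cast<Result>(a[i]) + static_cast<Result>(b[i]);
    return result;
}
```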
Advantages:
1. Code isolation.
2. High efficiency (Necessary specializations can be generated).
3. Compiler can generate SIMD instructions.
Disadvantages:
1. Heavy template usage (Templates can be complex).
2. Binary code bloat.
3. No way to fuse multiple expressions into one. Example: x * y + 5.
1. Parse query into AST.
2. Make AST optimizations (Most of them need to be moved into logical, physical query plans or expression DAG optimizations).
3. Build logical query plan + make logical query plan optimizations.
4. Build physical query plan + make physical query plan optimizations.
5. Execution.
EXPLAIN AST value * 2 + 1 FROM test_table
WHERE value > 10 ORDER BY value;
┌─explain─────────────────────────────────────┐
│ SelectWithUnionQuery (children 1) │
│ ExpressionList (children 1) │
│ SelectQuery (children 4) │
│ ExpressionList (children 1) │
│ Function plus (children 1) │
│ ExpressionList (children 2) │
│ Function multiply (children 1) │
│ ExpressionList (children 2) │
│ Identifier value │
│ Literal UInt64_2 │
│ Literal UInt64_1 │
│ TablesInSelectQuery (children 1) │
│ TablesInSelectQueryElement (children 1) │
│ TableExpression (children 1) │
│ TableIdentifier test_table │
│ Function greater (children 1) │
│ ExpressionList (children 2) │
│ Identifier value │
│ Literal UInt64_10 │
│ ExpressionList (children 1) │
│ OrderByElement (children 1) │
│ Identifier value │
└─────────────────────────────────────────────┘
EXPLAIN SELECT value * 2 + 1 FROM test_table
WHERE value > 10 ORDER BY value;
┌─explain──────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY [lifted up part])) │
│ Sorting (Sorting for ORDER BY) │
│ Expression (Before ORDER BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits │
│ ReadFromMergeTree (default.test_table) │
└──────────────────────────────────────────────────────────────────┘
EXPLAIN PIPELINE SELECT value * 2 + 1 FROM test_table
WHERE value > 10 ORDER BY value;
┌─explain────────────────────────────────────┐
│ (Expression) │
│ ExpressionTransform │
│ (Sorting) │
│ MergingSortedTransform 16 → 1 │
│ MergeSortingTransform × 16 │
│ LimitsCheckingTransform × 16 │
│ PartialSortingTransform × 16 │
│ (Expression) │
│ ExpressionTransform × 16 │
│ (Filter) │
│ FilterTransform × 16 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree) │
│ MergeTreeThread × 16 0 → 1 │
└────────────────────────────────────────────┘
JIT - just-in-time compilation.
Generate machine code and execute it at runtime.
Examples: JVM HotSpot, V8.
We use LLVM for JIT compilation, with our own wrapper around the LLVM dynamic linker and compiler.
2018 - First implementation of JIT compilation of expressions. Developer: Denis Skorobogatov.
2019 - Improvements to the initial implementation; enabled by default in production. Developers: Alexey Milovidov, Alexander Sapin.
2019 - JIT disabled by default; available as an experimental feature.
2021 - Fixed issues with JIT compilation of expressions; enabled by default again. Developer: Maksim Kita.
2021 - Implemented JIT compilation for the GROUP BY operator.
2022 - Implemented JIT compilation for the ORDER BY comparator.
EXPLAIN PIPELINE SELECT value * 2 + 1 FROM test_table
WHERE value > 10 ORDER BY value;
┌─explain────────────────────────────────────┐
│ (Expression) │
│ ExpressionTransform │
│ (Sorting) │
│ MergingSortedTransform 16 → 1 │
│ MergeSortingTransform × 16 │
│ LimitsCheckingTransform × 16 │
│ PartialSortingTransform × 16 │
│ (Expression) │
│ ExpressionTransform × 16 │
│ (Filter) │
│ FilterTransform × 16 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree) │
│ MergeTreeThread × 16 0 → 1 │
└────────────────────────────────────────────┘
SELECT a + b + c FROM test_table;
Expressions are represented as a DAG with input, function, and constant node types. Example: a + b + c becomes plus(plus(a, b), c).
1. During DAG interpretation, data moves between functions. Operations are not fused. Example: for plus(plus(a, b), c), first plus is applied to columns a and b and the result is stored in a temporary column; then plus is applied to the temporary column and column c.
2. Code bloat from the many function specializations generated from templates increases binary size.
3. Using AVX2 or AVX-512 instructions requires runtime dispatch based on CPUID. This is necessary for distributing a portable binary.
1. For each node in the ActionsDAG, compute compilable_children_size.
2. Sort nodes in descending order of compilable_children_size, so nodes with the most compilable children are compiled first.
3. Check if a node can be compiled using heuristics. Currently we require a node to contain at least one compilable child.
4. Compile the node's functions together into a single function that takes raw column data pointers and returns the expression result.
5. Replace the node in the DAG with a special LLVMFunction node. The LLVMFunction execute method converts columns into raw data and calls the compiled function.
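The selection heuristic in the first steps can be sketched as follows (hypothetical types; the real ActionsDAG nodes carry much more state):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Simplified DAG node: compilable means the function itself can be JITed.
struct Node
{
    bool compilable = false;
    std::vector<const Node *> children;
};

/// Count compilable nodes in the subtree below this node.
size_t compilableChildrenSize(const Node & node)
{
    size_t count = 0;
    for (const auto * child : node.children)
        if (child->compilable)
            count += 1 + compilableChildrenSize(*child);
    return count;
}

/// Compiling pays off only when at least one child can be fused in,
/// because fusing saves materializing a temporary column.
bool shouldCompile(const Node & node)
{
    return node.compilable && compilableChildrenSize(node) >= 1;
}
```

For plus(plus(a, b), c), the inner plus has only input children, so it is not compiled on its own; the outer plus has one compilable child and gets compiled, fusing both additions.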
SELECT a + b + c FROM test_table;
SET compile_expressions = 1;
SELECT a + b + c FROM test_table;
1. Better L1/L2 cache utilization.
2. Less code to execute. It fits on a single page, making better use of the CPU branch predictor.
3. Indirections are eliminated.
4. Multiple operations are fused into one function, so the compiler can perform more optimizations.
5. Target CPU instructions (AVX2, AVX-512) can be used when necessary.
Any function could potentially be compiled, but generating IR by hand is inconvenient. Currently supported:
1. Binary operators. Example: plus, minus, multiply, xor.
2. Unary operators. Example: abs.
3. Logical functions. Example: and, or, not.
4. Branch functions. Example: if, multiIf.
5. Bit shift functions. Example: bitShiftLeft.
SELECT a + b * c + 5 FROM test_jit_merge_tree;
loop: ; preds = %loop, %entry
%19 = phi i64 [ 0, %entry ], [ %34, %loop ] /// Get counter phi node. If started from entry it is 0.
%20 = phi i64* [ %5, %entry ], [ %30, %loop ] /// Argument column phi node
%21 = phi i64* [ %9, %entry ], [ %31, %loop ] /// Argument column phi node
%22 = phi i64* [ %13, %entry ], [ %32, %loop ] /// Argument column phi node
%23 = phi i64* [ %17, %entry ], [ %33, %loop ] /// Result column phi node
%24 = load i64, i64* %20, align 8 /// Load argument current row
%25 = load i64, i64* %21, align 8 /// Load argument current row
%26 = load i64, i64* %22, align 8 /// Load argument current row
%27 = mul i64 %25, %26 /// Multiply b * c
%28 = add i64 %24, %27 /// Add a + result of multiply b * c
%29 = add i64 %28, 5 /// Add with constant 5
store i64 %29, i64* %23, align 8 /// Store value in result column
%30 = getelementptr inbounds i64, i64* %20, i64 1 /// Adjust pointer to next element
%31 = getelementptr inbounds i64, i64* %21, i64 1 /// Adjust pointer to next element
%32 = getelementptr inbounds i64, i64* %22, i64 1 /// Adjust pointer to next element
%33 = getelementptr inbounds i64, i64* %23, i64 1 /// Adjust pointer to next element
%34 = add i64 %19, 1 /// Increase counter
%35 = icmp eq i64 %34, %0 /// Check loop predicate
br i1 %35, label %end, label %loop
void aPlusBMultiplyCPlusConstant(
int64_t * a,
int64_t * b,
int64_t * c,
int64_t constant,
int64_t * result,
size_t size)
{
for (size_t i = 0; i < size; ++i)
{
*result = (*a) + (*b) * (*c) + constant;
++a;
++b;
++c;
++result;
}
}
.LBB0_8: # %vector.body
vmovdqu (%r11,%rax,8), %ymm1
vmovdqu (%r9,%rax,8), %ymm3
vmovdqu 32(%r11,%rax,8), %ymm2
vmovdqu 32(%r9,%rax,8), %ymm4
vpsrlq $32, %ymm3, %ymm5
vpsrlq $32, %ymm1, %ymm6
vpmuludq %ymm1, %ymm5, %ymm5
vpmuludq %ymm6, %ymm3, %ymm6
vpmuludq %ymm1, %ymm3, %ymm1
vpsrlq $32, %ymm4, %ymm3
vpmuludq %ymm2, %ymm3, %ymm3
vpaddq %ymm5, %ymm6, %ymm5
vpsllq $32, %ymm5, %ymm5
vpaddq %ymm5, %ymm1, %ymm1
vpsrlq $32, %ymm2, %ymm5
vpmuludq %ymm2, %ymm4, %ymm2
vpaddq (%r14,%rax,8), %ymm1, %ymm1
vpmuludq %ymm5, %ymm4, %ymm5
vpaddq %ymm3, %ymm5, %ymm3
vpsllq $32, %ymm3, %ymm3
vpaddq %ymm3, %ymm2, %ymm2
vpaddq 32(%r14,%rax,8), %ymm2, %ymm2
vpaddq %ymm0, %ymm1, %ymm1 /// in ymm0 there is constant 5. vpbroadcastq (%rbp), %ymm0
vmovdqu %ymm1, (%r10,%rax,8)
vpaddq %ymm0, %ymm2, %ymm2
vmovdqu %ymm2, 32(%r10,%rax,8)
addq $8, %rax
cmpq %rax, %r8
Compilation time for a typical expression is around 15 ms and grows linearly with code size.
A typical compiled expression uses 1 page for the code section and 1 page for the data section: 4096 * 2 = 8192 bytes on most configurations.
Introspection is available inside ClickHouse via the CompileExpressionsMicroseconds and CompileExpressionsBytes profile events.
SELECT
ProfileEvents['CompileExpressionsMicroseconds'] AS compiled_time,
ProfileEvents['CompileExpressionsBytes'] AS compiled_bytes
FROM system.query_log
WHERE compiled_time > 0;
┌─compiled_time─┬─compiled_bytes─┐
│ 16258 │ 8192 │
│ 26792 │ 8192 │
│ 15280 │ 8192 │
│ 11594 │ 8192 │
│ 14989 │ 8192 │
└───────────────┴────────────────┘
SELECT count() FROM hits
WHERE
((EventDate >= '2018-08-01')
AND (EventDate <= '2018-08-03')
AND (CounterID >= 34))
OR ((EventDate >= '2018-08-04')
AND (EventDate <= '2018-08-05')
AND (CounterID <= 101500))
— 649 533 033 rows per second.
SET compile_expressions = 1;
SELECT count() FROM hits
WHERE
((EventDate >= '2018-08-01')
AND (EventDate <= '2018-08-03')
AND (CounterID >= 34))
OR ((EventDate >= '2018-08-04')
AND (EventDate <= '2018-08-05')
AND (CounterID <= 101500))
— 865 491 052 rows per second.
— +33% performance improvement!
SELECT
number * 2 +
number * 3 +
number * 4 +
number * 5
FROM system.numbers
FORMAT Null
— 0 rows in set. Elapsed: 0.903 sec. Processed 329.23 million rows, 2.63 GB (364.78 million rows/s., 2.92 GB/s.)
SET compile_expressions = 1;
SELECT
number * 2 +
number * 3 +
number * 4 +
number * 5
FROM system.numbers
FORMAT Null
— 0 rows in set. Elapsed: 1.602 sec. Processed 1.89 billion rows, 15.15 GB (1.18 billion rows/s., 9.46 GB/s.)
— +223% performance improvement (3.2x throughput)!
SET compile_expressions = 1;
WITH number AS x, if(x = 1, 1, if(x = 2, 2,
if(x = 3, 3, if(x = 4, 4, if(x = 5, 5,
if(x = 6, 6, if(x = 7, 7, if(x = 8, 8,
if(x = 9, 9, if(x = 10, 10,
if(x = 11, 11, 12))))))))))) AS res
SELECT sum(res)
FROM numbers(10000000)
— 0 rows in set. Elapsed: 0.150 sec. Processed 10.02 million rows, 80.18 MB (66.95 million rows/s., 535.56 MB/s.)
SET compile_expressions = 1;
WITH number AS x, if(x = 1, 1, if(x = 2, 2,
if(x = 3, 3, if(x = 4, 4, if(x = 5, 5,
if(x = 6, 6, if(x = 7, 7, if(x = 8, 8,
if(x = 9, 9, if(x = 10, 10,
if(x = 11, 11, 12))))))))))) AS res
SELECT sum(res)
FROM numbers(10000000)
— 0 rows in set. Elapsed: 0.061 sec. Processed 10.02 million rows, 80.18 MB (163.20 million rows/s., 1.31 GB/s.)
— +144% performance improvement (2.4x throughput)!
EXPLAIN PIPELINE SELECT sum(UserID)
FROM default.hits_100m_single GROUP BY WatchID;
┌─explain────────────────────────────────┐
│ (Expression) │
│ ExpressionTransform │
│ (Aggregating) │
│ Resize 16 → 1 │
│ AggregatingTransform × 15 │
│ StrictResize 16 → 16 │
│ (Expression) │
│ ExpressionTransform × 16 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree) │
│ MergeTreeThread × 16 0 → 1 │
└────────────────────────────────────────┘
class IAggregateFunction
{
...
virtual ~IAggregateFunction() = default;
/// AggregateDataPtr pointer to aggregate data for unique key during GROUP BY
/// Create empty data for aggregation with `placement new` at the specified location.
virtual void create(AggregateDataPtr place) const = 0;
/** Adds a value into aggregation data on which place points to.
* columns points to columns containing arguments of aggregation function.
* row_num is number of row which should be added.
*/
virtual void add(
AggregateDataPtr place,
const IColumn ** columns,
size_t row_num,
Arena * arena) const = 0;
/// Merges state (on which place points to) with other state of current aggregation function.
virtual void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena * arena) const = 0;
/// Inserts results into a column. This method might modify the state (e.g.
/// sort an array), so must be called once, from single thread.
virtual void insertResultInto(AggregateDataPtr place, IColumn & to, Arena * arena) const = 0;
...
};
SELECT
sum(UserID),
sum(ClientIP),
sum(CounterClass),
sum(CounterID),
sum(WatchID)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
1. For each unique key, create an aggregate state for each function.
2. For each row, update the aggregate state of each function for its key.
3. For each unique key, merge the aggregate states of each function if multiple threads are used.
4. For each unique key, insert the result from the aggregate state of each function into the final columns.
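The interpreted loop behind these steps can be sketched roughly like this (hypothetical types; the real code uses IAggregateFunction, arenas, and hash tables). Note the virtual add() call per row per function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct IAggState
{
    virtual ~IAggState() = default;
    virtual void add(int64_t value) = 0;
    virtual int64_t result() const = 0;
};

struct SumState : IAggState
{
    int64_t sum = 0;
    void add(int64_t value) override { sum += value; }
    int64_t result() const override { return sum; }
};

struct MaxState : IAggState
{
    int64_t max = INT64_MIN;
    void add(int64_t value) override { if (value > max) max = value; }
    int64_t result() const override { return max; }
};

/// GROUP BY keys[i], computing sum(values) and max(values) per key.
std::map<int64_t, std::vector<int64_t>> groupBy(
    const std::vector<int64_t> & keys, const std::vector<int64_t> & values)
{
    std::map<int64_t, std::vector<std::unique_ptr<IAggState>>> states;
    for (size_t i = 0; i < keys.size(); ++i)
    {
        auto & key_states = states[keys[i]];
        if (key_states.empty())  /// step 1: create states for a new key
        {
            key_states.push_back(std::make_unique<SumState>());
            key_states.push_back(std::make_unique<MaxState>());
        }
        for (auto & state : key_states)  /// step 2: one virtual add() per function
            state->add(values[i]);
    }
    std::map<int64_t, std::vector<int64_t>> results;
    for (auto & [key, key_states] : states)  /// step 4: insert final results
        for (auto & state : key_states)
            results[key].push_back(state->result());
    return results;
}
```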
1. A lot of virtual function calls.
2. For Nullable columns there is a Nullable wrapper, an additional indirection layer.
3. Aggregation combinators (-If, -Array) add another indirection layer.
Idea: fuse the actions of multiple aggregate functions into one.
Each aggregate function requires 4 actions: create, add, merge, insert. Fuse them across functions and compile them into 4 functions:
1. Create function for multiple aggregate functions.
2. Add function for multiple aggregate functions.
3. Merge function for multiple aggregate functions.
4. Insert final result function for multiple aggregate functions.
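A sketch of what fusing buys for the add action (hypothetical names; the JIT emits LLVM IR rather than C++): one combined state and one non-virtual update per row, instead of N virtual add() calls.

```cpp
#include <cassert>
#include <cstdint>

/// Combined state for fused sum, min, max over one column.
struct FusedSumMinMaxState
{
    int64_t sum = 0;
    int64_t min = INT64_MAX;
    int64_t max = INT64_MIN;
};

/// The JIT emits the equivalent of this single flat function:
/// no virtual dispatch, and the compiler sees all updates at once.
void fusedAdd(FusedSumMinMaxState & state, int64_t value)
{
    state.sum += value;
    if (value < state.min) state.min = value;
    if (value > state.max) state.max = value;
}
```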
1. The most common aggregate functions: sum, count, min, max, avg, avgWeighted.
2. The -If combinator.
3. The Nullable aggregate function adaptor.
SELECT
sum(UserID),
avg(ClientIP),
sum(CounterClass),
min(CounterID),
max(WatchID)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
SELECT
sum_avg_sum_min_max(
UserID,
ClientIP,
CounterClass,
CounterID,
WatchID)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
SELECT
sum(UserID),
sum(ClientIP),
sum(CounterClass),
sum(CounterID),
sum(WatchID)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
— 0 rows in set. Elapsed: 5.504 sec. Processed 100.00 million rows, 2.50 GB (18.17 million rows/s., 454.21 MB/s.)
SET compile_aggregate_expressions = 1;
SELECT
sum(UserID),
sum(ClientIP),
sum(CounterClass),
sum(CounterID),
sum(WatchID)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
— 0 rows in set. Elapsed: 4.146 sec. Processed 100.00 million rows, 2.50 GB (24.12 million rows/s., 603.06 MB/s.)
— +34% performance improvement!
WITH (WatchID % 2 == 0) AS predicate
SELECT
minIf(UserID, predicate),
minIf(ClientIP, predicate),
minIf(CounterClass, predicate),
minIf(CounterID, predicate),
minIf(WatchID, predicate)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
— 0 rows in set. Elapsed: 6.234 sec. Processed 100.00 million rows, 2.50 GB (16.04 million rows/s., 401.05 MB/s.)
SET compile_aggregate_expressions = 1;
WITH (WatchID % 2 == 0) AS predicate
SELECT
minIf(UserID, predicate),
minIf(ClientIP, predicate),
minIf(CounterClass, predicate),
minIf(CounterID, predicate),
minIf(WatchID, predicate)
FROM default.hits_100m_obfuscated
GROUP BY WatchID
— 0 rows in set. Elapsed: 4.146 sec. Processed 100.00 million rows, 2.50 GB (24.12 million rows/s., 603.06 MB/s.)
— +50% performance improvement!
EXPLAIN PIPELINE SELECT WatchID FROM hits_100m_single
ORDER BY WatchID, CounterID;
┌─explain──────────────────────────────────┐
│ (Expression) │
│ ExpressionTransform │
│ (Sorting) │
│ MergingSortedTransform 16 → 1 │
│ MergeSortingTransform × 16 │
│ LimitsCheckingTransform × 16 │
│ PartialSortingTransform × 16 │
│ (Expression) │
│ ExpressionTransform × 16 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree) │
│ MergeTreeThread × 16 0 → 1 │
└──────────────────────────────────────────┘
PartialSortingTransform — sorts each block; applies a special optimization if LIMIT is specified.
MergeSortingTransform — sorts blocks using a k-way merge algorithm, outputting a stream of sorted blocks.
MergingSortedTransform — merges streams of sorted blocks using a k-way merge algorithm.
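The k-way merge used by both merging transforms can be sketched with a heap of cursors (simplified: the real transforms merge blocks of columns, not flat vectors):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

/// Merge k sorted streams: a min-heap of (head value, stream index)
/// cursors repeatedly pops the stream with the smallest head.
std::vector<int64_t> kWayMerge(const std::vector<std::vector<int64_t>> & sorted_streams)
{
    using Cursor = std::pair<int64_t, size_t>;  /// (current value, stream index)
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<>> heap;
    std::vector<size_t> positions(sorted_streams.size(), 0);

    for (size_t i = 0; i < sorted_streams.size(); ++i)
        if (!sorted_streams[i].empty())
            heap.push({sorted_streams[i][0], i});

    std::vector<int64_t> result;
    while (!heap.empty())
    {
        auto [value, stream] = heap.top();
        heap.pop();
        result.push_back(value);
        if (++positions[stream] < sorted_streams[stream].size())
            heap.push({sorted_streams[stream][positions[stream]], stream});
    }
    return result;
}
```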
ORDER BY WatchID, CounterID
The comparator is applied to each row roughly 2 * N * log(N) times across MergeSortingTransform and MergingSortedTransform.
We can compile it into a single function to avoid unnecessary indirections.
For Nullable columns the improvement can be even larger.
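For ORDER BY WatchID, CounterID, the compiled comparator is roughly equivalent to this flat function (illustrative; the real comparator works over columns, not row structs), replacing a per-column loop of virtual compareAt() calls:

```cpp
#include <cassert>
#include <cstdint>

/// Hypothetical row view for the two sort key columns.
struct Row
{
    int64_t watch_id;
    int64_t counter_id;
};

/// One flat function comparing both keys: no virtual dispatch,
/// no loop over sort description entries.
bool compiledLess(const Row & lhs, const Row & rhs)
{
    if (lhs.watch_id != rhs.watch_id)
        return lhs.watch_id < rhs.watch_id;
    return lhs.counter_id < rhs.counter_id;
}
```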
SET compile_sort_description=0;
SELECT WatchID FROM hits_100m_single
ORDER BY WatchID, CounterID
— 0 rows in set. Elapsed: 6.408 sec. Processed 100.00 million rows, 1.20 GB (15.60 million rows/s., 187.26 MB/s.)
SET compile_sort_description=1;
SELECT WatchID FROM hits_100m_single
ORDER BY WatchID, CounterID
— 0 rows in set. Elapsed: 5.300 sec. Processed 100.00 million rows, 1.20 GB (18.87 million rows/s., 226.40 MB/s.)
— +20% performance improvement!
SET compile_sort_description=0;
SELECT * FROM test_nullable_table
ORDER BY value
— 0 rows in set. Elapsed: 5.299 sec. Processed 100.00 million rows, 1.70 GB (18.87 million rows/s., 320.81 MB/s.)
SET compile_sort_description=1;
SELECT * FROM test_nullable_table
ORDER BY value
— 0 rows in set. Elapsed: 4.131 sec. Processed 100.00 million rows, 1.70 GB (24.21 million rows/s., 411.50 MB/s.)
— +30% performance improvement!
JIT compilation improves the performance of expression evaluation, aggregation, and sorting.
For expression evaluation, JIT compilation improves performance by 1.5-3x (in some cases more than 20x).
For aggregation, JIT compilation improves performance by 1.15-2x.
For the ORDER BY comparator, JIT compilation improves performance by 1.15-1.5x.