ClickHouse ALTER TABLE Execution Flow

2021-07-30 02:40:58

StorageReplicatedMergeTree::alter

  1. Check whether the command is `MODIFY SETTING`. If it is, the change is applied locally only and is not replicated.

```cpp
if (params.isSettingsAlter())
{
    lockStructureExclusively(table_lock_holder, query_context.getCurrentQueryId());
    /// We don't replicate storage_settings_ptr ALTER. It's local operation.
    /// Also we don't upgrade alter lock to table structure lock.
    StorageInMemoryMetadata metadata = getInMemoryMetadata();
    params.apply(metadata);

    changeSettings(metadata.settings_ast, table_lock_holder);

    global_context.getDatabase(table_id.database_name)->alterTable(query_context, table_id.table_name, metadata);
    return;
}
```
  2. Check whether the table is `is_readonly`. If it is, ALTER TABLE cannot be executed.

```cpp
if (is_readonly)
    throw Exception("Can't ALTER readonly table", ErrorCodes::TABLE_IS_READ_ONLY);
```

Note what this ordering implies: a table in the READ ONLY state can still run ALTER TABLE MODIFY SETTING, because that branch executes and returns before the READ ONLY check is reached.
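The effect of that ordering can be reproduced in a small stand-alone model. This is only a sketch: `ToyTable`, `AlterPath`, and `toyAlter` are hypothetical names invented for illustration and are not ClickHouse types or functions.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the table state; not a ClickHouse type.
struct ToyTable
{
    bool is_readonly = false;
    int settings_version = 0;  // bumped on every local settings change
};

enum class AlterPath { LocalSettingsOnly, Replicated };

// Mirrors the order of checks described above: the settings branch
// returns before the read-only check is ever evaluated.
AlterPath toyAlter(ToyTable & table, bool is_settings_alter)
{
    if (is_settings_alter)
    {
        ++table.settings_version;  // local-only change
        return AlterPath::LocalSettingsOnly;
    }

    if (table.is_readonly)
        throw std::runtime_error("Can't ALTER readonly table");

    return AlterPath::Replicated;
}
```

Calling `toyAlter` with `is_settings_alter = true` on a read-only `ToyTable` succeeds, while any other ALTER on the same table throws.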

  3. Build the new metadata from the in-memory metadata.

```cpp
ReplicatedMergeTreeTableMetadata future_metadata_in_zk(*this);
if (ast_to_str(future_metadata.order_by_ast) != ast_to_str(current_metadata.order_by_ast))
    future_metadata_in_zk.sorting_key = serializeAST(*extractKeyExpressionList(future_metadata.order_by_ast));

if (ast_to_str(future_metadata.ttl_for_table_ast) != ast_to_str(current_metadata.ttl_for_table_ast))
    future_metadata_in_zk.ttl_table = serializeAST(*future_metadata.ttl_for_table_ast);

String new_indices_str = future_metadata.indices.toString();
if (new_indices_str != current_metadata.indices.toString())
    future_metadata_in_zk.skip_indices = new_indices_str;

String new_constraints_str = future_metadata.constraints.toString();
if (new_constraints_str != current_metadata.constraints.toString())
    future_metadata_in_zk.constraints = new_constraints_str;
```

An interesting detail is how the information stored in /clickhouse/on_time/tables/ontime_local/1/metadata is read:

```cpp
void ReplicatedMergeTreeTableMetadata::read(ReadBuffer & in)
{
    in >> "metadata format version: 1\n";
    in >> "date column: " >> date_column >> "\n";
    in >> "sampling expression: " >> sampling_expression >> "\n";
    in >> "index granularity: " >> index_granularity >> "\n";
    in >> "mode: " >> merging_params_mode >> "\n";
    in >> "sign column: " >> sign_column >> "\n";
    in >> "primary key: " >> primary_key >> "\n";
    /// ... (the excerpt stops here; the remaining fields are omitted)
}
```

In this format the partition key is still referred to as the date column.
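The `in >> "literal" >> field` style reads a fixed label and then a value, and fails if the label does not match. A minimal sketch of the same idea using only the standard library follows. The helpers `expectLiteral`, `readLine`, and `parseToyHeader`, the struct `ToyMetadataHeader`, and the sample values are all made up for illustration; this covers only the first two fields and is not the ClickHouse `ReadBuffer` API.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Consume an exact literal from the stream, or throw if the input differs.
// This imitates the `in >> "literal"` checks in the excerpt above.
void expectLiteral(std::istream & in, const std::string & literal)
{
    for (char expected : literal)
    {
        char got = 0;
        if (!in.get(got) || got != expected)
            throw std::runtime_error("unexpected input while expecting: " + literal);
    }
}

// Read everything up to the next newline and consume the newline itself.
std::string readLine(std::istream & in)
{
    std::string value;
    if (!std::getline(in, value))
        throw std::runtime_error("unexpected end of input");
    return value;
}

// Hypothetical subset of the header fields, for illustration only.
struct ToyMetadataHeader
{
    std::string date_column;
    std::string sampling_expression;
};

ToyMetadataHeader parseToyHeader(const std::string & text)
{
    std::istringstream in(text);
    ToyMetadataHeader header;
    expectLiteral(in, "metadata format version: 1\n");
    expectLiteral(in, "date column: ");
    header.date_column = readLine(in);
    expectLiteral(in, "sampling expression: ");
    header.sampling_expression = readLine(in);
    return header;
}
```

For example, parsing `"metadata format version: 1\ndate column: FlightDate\nsampling expression: \n"` (an invented sample) yields `date_column == "FlightDate"` and an empty sampling expression, while any other format version is rejected.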

  4. Work out which ZooKeeper nodes need to be created or modified:
     1) `/metadata`
     2) `/columns`
     3) `/log/log-`
     4) `/mutations`, if `have_mutation` is set
  5. Send the request to ZooKeeper: `int32_t rc = zookeeper->tryMulti(ops, results);`
  6. Depending on the `replication_alter_partitions_sync` setting, decide whether to wait for replicas to finish processing the log entry. A value of 1 waits for the local replica only; a value of 2 waits for all replicas.

```cpp
if (query_context.getSettingsRef().replication_alter_partitions_sync == 2)
{
    LOG_DEBUG(log, "Updated shared metadata nodes in ZooKeeper. Waiting for replicas to apply changes.");
    unwaited = waitForAllReplicasToProcessLogEntry(*alter_entry, false);
}
else if (query_context.getSettingsRef().replication_alter_partitions_sync == 1)
{
    LOG_DEBUG(log, "Updated shared metadata nodes in ZooKeeper. Waiting for replicas to apply changes.");
    waitForReplicaToProcessLogEntry(replica_name, *alter_entry);
}
```
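Steps 4 to 6 can be pictured with a toy model: a batch of node operations that is applied all-or-nothing, followed by a choice of how widely to wait. This is a sketch under stated assumptions, not the ClickHouse or ZooKeeper client API. `ToyOp`, `ToyTree`, `toyMulti`, `WaitScope`, and `waitScopeForAlter` are hypothetical names, the in-memory map stands in for ZooKeeper, and the sequential suffix that ZooKeeper appends to `/log/log-` is replaced by a hard-coded name in the usage example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy operation: create a node that must not exist yet,
// or overwrite a node that must already exist.
struct ToyOp
{
    enum class Kind { Create, Set };
    Kind kind;
    std::string path;
    std::string data;
};

using ToyTree = std::map<std::string, std::string>;

// All-or-nothing batch: apply every op to a staged copy and
// commit the copy only if every single op succeeded.
bool toyMulti(ToyTree & tree, const std::vector<ToyOp> & ops)
{
    ToyTree staged = tree;
    for (const ToyOp & op : ops)
    {
        const bool exists = staged.count(op.path) > 0;
        if (op.kind == ToyOp::Kind::Create && exists)
            return false;  // node already there: reject the whole batch
        if (op.kind == ToyOp::Kind::Set && !exists)
            return false;  // nothing to overwrite: reject the whole batch
        staged[op.path] = op.data;
    }
    tree = std::move(staged);
    return true;
}

enum class WaitScope { None, OwnReplica, AllReplicas };

// Same branching as the excerpt above: 2 waits for all replicas,
// 1 waits for the local replica, any other value does not wait.
WaitScope waitScopeForAlter(unsigned replication_alter_partitions_sync)
{
    if (replication_alter_partitions_sync == 2)
        return WaitScope::AllReplicas;
    if (replication_alter_partitions_sync == 1)
        return WaitScope::OwnReplica;
    return WaitScope::None;
}
```

A batch that sets `/metadata` and `/columns` and creates `/log/log-0000000001` on a tree holding the first two nodes succeeds as a unit. A batch containing one op on a missing node leaves the tree untouched, which is the property that keeps the metadata, the columns, and the log entry consistent with each other.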
  7. If the ALTER involves a mutation, additionally decide whether to wait for the mutation to finish.

```cpp
if (mutation_znode)
{
    LOG_DEBUG(log, "Metadata changes applied. Will wait for data changes.");
    waitMutation(*mutation_znode, query_context.getSettingsRef().replication_alter_partitions_sync);
    LOG_DEBUG(log, "Data changes applied.");
}
```
  8. Which operations count as mutations and therefore trigger the check in step 7? The answer is in src/Storages/MutationCommands.h:

```cpp
enum Type
{
    EMPTY,     /// Not used.
    DELETE,
    UPDATE,
    MATERIALIZE_INDEX,
    READ_COLUMN,
    DROP_COLUMN,
    DROP_INDEX,
    MATERIALIZE_TTL
};
```

Oddly, ALTER TABLE uses replication_alter_partitions_sync, not mutations_sync, to control whether the ALTER runs synchronously. An issue about this has also been opened upstream: https://github.com/ClickHouse/ClickHouse/issues/21821
