Apache Hudi 0.12.2 Released

2023-01-12 11:27:17

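Long-Term Support Release

We aim to maintain the 0.12 line for a longer period and, through the latest 0.12.x release, to give users a stable version to migrate to. This release (0.12.2) is the latest 0.12 release.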

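Migration Guide

This release (0.12.2) does not introduce a new table version, so no migration is needed if you are already on 0.12.0.

If you are migrating from an older release, review the migration guides in the earlier release notes, in particular the upgrade instructions for 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.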
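
As a rough illustration of the table-version point above, the sketch below reads the table version out of the text of a `hoodie.properties` file. It assumes the version is stored under the key `hoodie.table.version` in `<table base path>/.hoodie/hoodie.properties`. The helper name `read_table_version` and the sample contents are hypothetical, and this is not part of Hudi's own tooling.

```python
from typing import Optional

# Assumption: Hudi stores the table version under this key in
# <table base path>/.hoodie/hoodie.properties.
TABLE_VERSION_KEY = "hoodie.table.version"


def read_table_version(properties_text: str) -> Optional[int]:
    """Return the table version found in hoodie.properties text, or None."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines of the Java properties format.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == TABLE_VERSION_KEY:
            return int(value.strip())
    return None


# Hypothetical sample contents, for illustration only.
SAMPLE = """\
#Properties saved on an example date
hoodie.table.name=example_table
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=5
"""

print(read_table_version(SAMPLE))
```

Comparing the value before and after swapping in the 0.12.2 libraries is one way to confirm that no table-version change took place.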

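Bug Fixes

The 0.12.2 release is primarily a bug-fix and stability release. The fixes span many components, including:

  • DeltaStreamer
  • Data type / schema related bug fixes
  • Table services
  • Metadata table
  • Spark SQL
  • Presto stability/performance fixes
  • Trino stability/performance fixes
  • Meta sync
  • Flink engine
  • Unit, functional, and integration tests and CI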

Release Notes

Sub-task

  • [HUDI-5244] – Fix bugs in schema evolution client with lost operation field and not found schema

Bug

  • [HUDI-3453] – Metadata table throws NPE when scheduling compaction plan
  • [HUDI-3661] – Flink async compaction is not thread safe when using watermark
  • [HUDI-4281] – Using hudi to build a large number of tables in spark on hive causes OOM
  • [HUDI-4588] – Ingestion failing if source column is dropped
  • [HUDI-4855] – Bootstrap table from Deltastreamer cannot be read in Spark
  • [HUDI-4893] – More than 1 splits are created for a single log file for MOR table
  • [HUDI-4898] – For MOR table, Presto/Hive should respect payload class when merging parquet file and log file
  • [HUDI-4901] – Add avro version to Flink profiles
  • [HUDI-4946] – merge into with no preCombineField has dup row in only insert
  • [HUDI-4952] – Reading from metadata table could fail when there are no completed commits
  • [HUDI-4966] – Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
  • [HUDI-4971] – aws bundle causes class loading issue
  • [HUDI-4975] – datahub sync bundle causes class loading issue
  • [HUDI-4998] – Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work
  • [HUDI-5003] – InLineFileSystem will throw NumberFormatException because the type of startOffset is int and the value goes out of bounds
  • [HUDI-5007] – Prevent Hudi from reading the entire timeline when performing a LATEST streaming read
  • [HUDI-5008] – Avoid unset HoodieROTablePathFilter in IncrementalRelation
  • [HUDI-5025] – Rollback failed with log file not found when rollOver in rollback process
  • [HUDI-5041] – Lock metric register conflict error
  • [HUDI-5057] – Fix msck repair hudi table
  • [HUDI-5058] – The primary key cannot be empty when Flink reads an error from the hudi table
  • [HUDI-5061] – Bulk insert operation doesn't throw exceptions other than IOException
  • [HUDI-5063] – totalScantime and other run time stats missing from commit metadata
  • [HUDI-5070] – Fix Flaky TestCleaner test : testInsertAndCleanByCommits
  • [HUDI-5076] – Non serializable path used with engineContext with metadata table initialization
  • [HUDI-5087] – Max value read from metatable incorrect
  • [HUDI-5088] – Failed to synchronize the hive metadata of the Flink table
  • [HUDI-5092] – Querying Hudi table throws NoSuchMethodError in Databricks runtime
  • [HUDI-5096] – boolean param is broken in HiveSyncTool
  • [HUDI-5097] – Read 0 records from partitioned table without partition fields in table configs
  • [HUDI-5151] – Flink data skipping doesn't work with ClassNotFoundException of InLineFileSystem
  • [HUDI-5157] – Duplicate partition path for chained hudi tables.
  • [HUDI-5163] – Failure handling w/ spark ds write failures
  • [HUDI-5176] – Incremental source may miss commits if there are inflight commits before completed commits
  • [HUDI-5185] – Compaction run fails with --hoodieConfigs
  • [HUDI-5203] – Debezium payload does not handle null-field cases
  • [HUDI-5228] – Flink table service job fs view conf overwrites the one of writing job
  • [HUDI-5242] – Do not fail Meta sync in Deltastreamer when inline table service fails
  • [HUDI-5251] – Unexpected avro dependency in flink 1.15 bundle
  • [HUDI-5253] – HoodieMergeOnReadTableInputFormat could have duplicate records issue if it contains delta files while still splittable
  • [HUDI-5260] – Insert into sql with strict insert mode and no preCombineField should not overwrite existing records
  • [HUDI-5277] – RunClusteringProcedure can't exit correctly
  • [HUDI-5286] – UnsupportedOperationException throws when enabling filesystem retry
  • [HUDI-5291] – NPE in column stats for null values
  • [HUDI-5320] – Spark SQL CTAS does not propagate Table properties to actual SparkSqlWriter
  • [HUDI-5325] – Fix Create Table to propagate properly Metadata Table enabling config
  • [HUDI-5336] – Fix log file parsing to consider "." at the beginning
  • [HUDI-5346] – Fixing performance traps in CTAS
  • [HUDI-5347] – Fix Merge Into performance traps
  • [HUDI-5350] – OOM causes compaction event to be lost
  • [HUDI-5351] – Handle meta fields being disabled in Bulk Insert Partitioners
  • [HUDI-5373] – Different fileids are assigned to the same bucket
  • [HUDI-5375] – Fix re-using of file readers w/ metadata table in FileIndex
  • [HUDI-5393] – Remove the reuse of metadata table writer for flink write client
  • [HUDI-5403] – Input Format class has metadata table enabled for file listing unexpectedly by default
  • [HUDI-5409] – Avoid file index and use fs view cache in COW input format
  • [HUDI-5412] – Send the bootstrap event if the JM also rebooted

Improvement

  • [HUDI-4526] – improve spillableMapBasePath disk directory is full
  • [HUDI-4799] – improve analyzer exception tip when can not resolve expression
  • [HUDI-4960] – Upgrade Jetty version for Timeline server
  • [HUDI-4980] – Make avg record size calculated based on commit instant only
  • [HUDI-4995] – Dependency conflicts on apache http with other projects
  • [HUDI-4997] – use jackson-v2 replace jackson-v1 import
  • [HUDI-5002] – Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement
  • [HUDI-5027] – Replace hardcoded hbase config keys with HbaseConstants
  • [HUDI-5045] – Add tests to integ test to test bulk_insert followed by upsert
  • [HUDI-5066] – Support hoodie source metaclient cache for flink planner
  • [HUDI-5102] – source operator(monitor and reader) support user uid
  • [HUDI-5104] – Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter
  • [HUDI-5111] – Add metadata on read support to integ tests
  • [HUDI-5184] – Remove export PYSPARK_SUBMIT_ARGS="--master local[*]" from HoodiePySparkQuickstart.py
  • [HUDI-5247] – Clean up java client tests
  • [HUDI-5296] – Support disabling schema on read if not required
  • [HUDI-5338] – Adjust coalesce behavior within "NONE" sort mode for bulk insert
  • [HUDI-5344] – Upgrade com.google.protobuf:protobuf-java
  • [HUDI-5345] – Avoid fs.exists calls for metadata table in HFileBootstrapIndex
  • [HUDI-5348] – Cache file slices within MDT reader
  • [HUDI-5357] – Optimize release artifacts' deployment
  • [HUDI-5370] – Properly close file handles for Metadata writer

Test

  • [HUDI-5383] – Test 0.12.2 release branch

Task

  • [HUDI-3287] – Remove unnecessary deps in hudi-kafka-connect
  • [HUDI-5081] – Resources clean-up in hudi-utilities tests
  • [HUDI-5221] – Make the decision for flink sql bucket index case-insensitive
  • [HUDI-5223] – Partial failover for flink
  • [HUDI-5227] – Upgrade Jetty to 9.4.48

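This is an original article by "xiaozhch5", author of the blog 从大数据到人工智能, published under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Original link: https://cloud.tencent.com/developer/article/2208628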
