# hudi_table_related_tools # hudi_clean The function `hudi_clean(in_reloid oid, strong_lock boolean)` is used to clean up Hudi tables. ## Function返回值说明 | column | type | references | description | | :----- | :--- | :--------- | :---------- | | — | void | | 无返回值 | ## Function入参值说明 | args | type | description | | :------------- | :------ | :------------------------- | | `in_reloid` | oid | 需要清理的表的 oid | | `strong_lock` | boolean | 是否使用强锁进行清理操作 | ## Function举例说明 ``` db=# select oushu_toolkit.hudi_clean('public.tm'::regclass, true); hudi_clean ------------ (1 row) ``` # hudi_clean The function `hudi_clean(in_reloid oid)` is used to clean up Hudi tables with a strong lock by default. ## Function返回值说明 | column | type | references | description | | :----- | :--- | :--------- | :---------- | | — | void | | 无返回值 | ## Function入参值说明 | args | type | description | | :---------- | :----| :-----------------| | `in_reloid` | oid | 需要清理的表的 oid | ## Function举例说明 ``` db=# select oushu_toolkit.hudi_clean('public.tm'::regclass); hudi_clean ------------ (1 row) ``` # hudi_archive The function `hudi_archive(in_reloid oid, wait_for_lock boolean)` is used to archive Hudi tables. ## Function返回值说明 | column | type | references | description | | :----- | :--- | :--------- | :---------- | | — | void | | 无返回值 | ## Function入参值说明 | args | type | description | | :-------------- | :------ | :------------------------- | | `in_reloid` | oid | 需要归档的表的 oid | | `wait_for_lock` | boolean | 是否等待锁进行归档操作 | ## Function举例说明 ``` db=# select oushu_toolkit.hudi_archive('public.tm'::regclass, true); hudi_archive -------------- (1 row) ``` # hudi_archive The function `hudi_archive(in_reloid oid)` is used to archive Hudi tables with `wait_for_lock` set to `true` by default. ## Function返回值说明 | column | type | references | description | | :----- | :--- | :--------- | :---------- | | — | void | | 无返回值 | ## Function入参值说明 | args | type | description | | :---------- | :----| :-----------------| | `in_reloid` | oid | 需要归档的表的 oid | ## Function举例说明 ``` db=# select oushu_toolkit.hudi_archive('public.tm'::regclass); hudi_archive -------------- (1 row) ``` # get_hudi_table_path The function `get_hudi_table_path(in_reloid oid)` is used to obtain the file system path of a Hudi table. ## Function返回值说明 | column | type | references | description | | :------------- | :------ | :--------- | :------------------ | | (unnamed) | cstring | | Hudi 表的文件路径 | ## Function入参值说明 | args | type | description | | :---------- | :----| :-----------------| | `in_reloid` | oid | 需要查询的表的 oid | ## Function举例说明 ``` db=# select oushu_toolkit.get_hudi_table_path('public.tm'::regclass); get_hudi_table_path ----------------------------------------------------------- hdfs://localhost:9000/oushu/dfs_default/16385/16388/17409 (1 row) ``` # get_hudi_files_snapshot The function `get_hudi_files_snapshot(in_reloid oid, partition_path cstring, is_base boolean)` provides a snapshot of Hudi table files, including file type, file name, and file size. ## Function返回值说明 | column | type | references | description | | :---------- | :------| :--------- | :------------------ | | `file_type` | text | | 文件类型(如 data、log 等) | | `file_name` | text | | 文件名 | | `file_size` | bigint | | 文件大小(字节数) | ## Function入参值说明 | args | type | description | | :-------------- | :------ | :------------------------- | | `in_reloid` | oid | 需要查询的表的 oid | | `partition_path`| cstring | 分区路径(可指定分区,若为空则查询全部) | | `is_base` | boolean | 是否只查询 base 文件(true 只查 base 文件,false 查所有文件) | ## Function举例说明 ``` db=# select * from oushu_toolkit.get_hudi_files_snapshot('public.tm'::regclass, '', false); file_type | file_name | file_size ------------+---------------------------+----------- properties | .hoodie/hoodie.properties | 944 (1 row) ``` # get_hudi_timeline The function `get_hudi_timeline(in_reloid oid, partition_path cstring, is_base boolean)` provides the timeline information of Hudi table files, including file type, file name, and file size. ## Function返回值说明 | column | type | references | description | | :---------- | :------| :--------- | :------------------ | | `file_type` | text | | 文件类型(如 data、log 等) | | `file_name` | text | | 文件名 | | `file_size` | bigint | | 文件大小(字节数) | ## Function入参值说明 | args | type | description | | :-------------- | :------ | :------------------------------------------ | | `in_reloid` | oid | 需要查询的表的 oid | | `partition_path`| cstring | 分区路径(可指定分区,若为空则查询全部) | | `is_base` | boolean | 是否只查询 base 文件(true 只查 base 文件,false 查所有文件) | ## Function举例说明 ``` db=# select * from oushu_toolkit.get_hudi_timeline('public.tm'::regclass, '', false); file_type | file_name | file_size -----------+---------------------------------------+----------- timeline | .hoodie/20250522115923629.deltacommit | 989 timeline | .hoodie/20250522115924426.deltacommit | 988 (2 rows) ``` # show_hudi_files The function `show_hudi_files(in_reloid oid)` provides detailed information about the files of a Hudi table, including partition, file ID, base instant, file name, file size, insert/update/delete counts, latest operation time, and status flags. ## Function返回值说明 | column | type | references | description | | :------------- | :------ | :--------- | :--------------------------- | | `partion` | text | | 分区路径 | | `file_id` | text | | 文件唯一标识 | | `base_instant` | text | | 文件基线时间戳 | | `file_name` | text | | 文件名 | | `file_size` | bigint | | 文件大小(字节数) | | `num_insert` | bigint | | 插入记录数 | | `num_update` | bigint | | 更新记录数 | | `num_delete` | bigint | | 删除记录数 | | `latest_time` | text | | 最近操作时间 | | `is_replaced` | boolean | | 是否已被替换 | | `is_cleaned` | boolean | | 是否已被清理 | ## Function入参值说明 | args | type | description | | :---------- | :----| :-----------------| | `in_reloid` | oid | 需要查询的表的 oid | ## Function举例说明 ``` db=# select * from oushu_toolkit.show_hudi_files('public.tm'::regclass); partion | file_id | base_instant | file_name | file_size | num_insert | num_update | num_delete | latest_time | is_replaced | is_cleaned ---------+--------------------------------------+-------------------+-----------------------------------------------------------------------+-----------+------------+------------+------------+-------------------+-------------+------------ | 49f53621-89b4-4e88-8fc8-4021a196d05b | 20250522115924426 | 49f53621-89b4-4e88-8fc8-4021a196d05b_2-0-1_20250522115924426.parquet | 1561 | 1 | 0 | 0 | 20250522115924426 | f | f | 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594 | 20250522115923629 | 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594_10-0-1_20250522115923629.parquet | 1561 | 1 | 0 | 0 | 20250522115923629 | f | f (2 rows) ``` # hudi_file_contents The function `hudi_file_contents(in_reloid oid, file_path cstring)` returns the contents of a specified Hudi file as a set of text lines. ## Function返回值说明 | column | type | references | description | | :----- | :----| :--------- | :---------------| | (unnamed) | text | | 文件内容的每一行 | ## Function入参值说明 | args | type | description | | :----------- | :------ | :--------------------- | | `in_reloid` | oid | 需要查询的表的 oid | | `file_path` | cstring | 需要读取的文件路径 | ## Function举例说明 ``` db=# select * from oushu_toolkit.hudi_file_contents('public.tm'::regclass, '/data/hudi/warehouse/public/tm/part-1'); hudi_file_contents --------------------------------------------------------------------------------------------------------------------------------------------------------------------- "_hoodie_commit_time":20250522115924426,"_hoodie_commit_seqno":null,"_hoodie_record_key":KcVJ2bZACAAAAAAA,"_hoodie_partition_path":,"_hoodie_file_name":null,"id":2 (1 row) ``` # get_hudi_table_partitions_by_oid The function `get_hudi_table_partitions_by_oid(in_reloid oid)` returns the partition information for a given Hudi table oid. ## Function返回值说明 | column | type | references | description | | :------- | :----| :--------- | :---------------| | `relname`| text | | 分区表名 | | `partkey`| text | | 分区键 | ## Function入参值说明 | args | type | description | | :---------- | :----| :-----------------| | `in_reloid` | oid | 需要查询的表的 oid | ## Function举例说明 ``` db=# select * from oushu_toolkit.get_hudi_table_partitions_by_oid('public.tm'::regclass); SELECT * FROM oushu_toolkit.get_hudi_table_partitions_by_oid('public.tm'::regclass); relname | partkey ---------+--------- tm | (1 row) ``` # hudi_table_partitions The function `hudi_table_partitions(in_reloid oid)` provides the information of hudi partition tables. ## Function返回值说明 | column | type | references | description | | :-------------------- | :------ | :--------- | :----------------------- | | `partitionschamename` | text | | 分区表的 schema 名称 | | `partitiontablename` | text | | 分区表的表名 | | `partitionkey` | text | | 分区键 | | `partitionname` | text | | 分区值 | ## Function入参值说明 | args | type | description | | :---------- | :------ | :-----------------| | `in_reloid` | oid | 需要查询的表的 oid | ## Function举例说明 ``` db=# select * from oushu_toolkit.hudi_table_partitions('public.tm'::regclass); partitionschamename | partitiontablename | partitionkey | partitionname ---------------------+--------------------+--------------+--------------- public | tm | c1/c2 | 1069/1069 public | tm | c1/c2 | 3570/3570 public | tm | c1/c2 | 1966/1966 public | tm | c1/c2 | 2749/2749 public | tm | c1/c2 | 2603/2603 (5 rows) ```