hudi_table_related_tools
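hudi_clean (two-argument form)
The function hudi_clean(in_reloid oid, strong_lock boolean) runs the clean operation on a Hudi table. The strong_lock argument controls whether the operation takes a strong lock.

Return value

| column | type | description |
|---|---|---|
| (none) | void | No return value |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to clean |
| strong_lock | boolean | Whether to take a strong lock for the clean operation |

Example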
db=# select oushu_toolkit.hudi_clean('public.tm'::regclass, true);
hudi_clean
------------
(1 row)
hudi_clean (one-argument form)
The function hudi_clean(in_reloid oid) cleans a Hudi table using a strong lock by default, as if strong_lock were set to true.

Return value

| column | type | description |
|---|---|---|
| (none) | void | No return value |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to clean |

Example
db=# select oushu_toolkit.hudi_clean('public.tm'::regclass);
hudi_clean
------------
(1 row)
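A client application may need to generate these calls instead of typing them into psql. The Python sketch below shows one way to choose between the two hudi_clean overloads. The helper names `quote_literal` and `hudi_clean_sql` are hypothetical and not part of oushu_toolkit. It assumes `standard_conforming_strings` is on, so doubling single quotes is enough to escape a literal; with a real database driver, prefer its parameter binding over string building.

```python
from typing import Optional


def quote_literal(value: str) -> str:
    """Quote text as a SQL string literal.

    Assumes standard_conforming_strings = on, so only single quotes
    need escaping (by doubling them).
    """
    return "'" + value.replace("'", "''") + "'"


def hudi_clean_sql(table: str, strong_lock: Optional[bool] = None) -> str:
    """Build a call to oushu_toolkit.hudi_clean (hypothetical helper).

    strong_lock=None emits the one-argument overload, which is
    documented to clean with a strong lock by default.
    """
    target = quote_literal(table) + "::regclass"
    if strong_lock is None:
        return f"select oushu_toolkit.hudi_clean({target});"
    flag = "true" if strong_lock else "false"
    return f"select oushu_toolkit.hudi_clean({target}, {flag});"


print(hudi_clean_sql("public.tm", True))
print(hudi_clean_sql("public.tm"))
```

Leaving strong_lock unset keeps the generated SQL identical to the one-argument example, so the server-side default stays in charge.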
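hudi_archive (two-argument form)
The function hudi_archive(in_reloid oid, wait_for_lock boolean) runs the archive operation on a Hudi table. The wait_for_lock argument controls whether the function waits for the lock before archiving.

Return value

| column | type | description |
|---|---|---|
| (none) | void | No return value |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to archive |
| wait_for_lock | boolean | Whether to wait for the lock before archiving |

Example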
db=# select oushu_toolkit.hudi_archive('public.tm'::regclass, true);
hudi_archive
--------------
(1 row)
hudi_archive (one-argument form)
The function hudi_archive(in_reloid oid) archives a Hudi table with wait_for_lock set to true by default, that is, it waits for the lock before archiving.

Return value

| column | type | description |
|---|---|---|
| (none) | void | No return value |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to archive |

Example
db=# select oushu_toolkit.hudi_archive('public.tm'::regclass);
hudi_archive
--------------
(1 row)
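To archive several tables in one pass, a client can emit one statement per table and run the result with psql. The following is a minimal Python sketch, assuming the table names are trusted and `standard_conforming_strings` is on. `hudi_archive_script` is a hypothetical helper, and `public.orders` is a made-up table name used only for illustration.

```python
from typing import Iterable, List


def quote_literal(value: str) -> str:
    """Quote text as a SQL string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def hudi_archive_script(tables: Iterable[str], wait_for_lock: bool = True) -> List[str]:
    """Return one hudi_archive call per table (hypothetical helper).

    The two-argument overload is always used so that the
    wait_for_lock choice is visible in the generated script.
    """
    flag = "true" if wait_for_lock else "false"
    return [
        f"select oushu_toolkit.hudi_archive({quote_literal(t)}::regclass, {flag});"
        for t in tables
    ]


# "public.orders" is a made-up table name for illustration.
for statement in hudi_archive_script(["public.tm", "public.orders"], wait_for_lock=False):
    print(statement)
```

Writing wait_for_lock explicitly, rather than relying on the one-argument default, makes the locking behavior reviewable in the generated script.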
get_hudi_table_path
The function get_hudi_table_path(in_reloid oid) returns the file system path of a Hudi table as a URI, for example an hdfs:// location.

Return value

| column | type | description |
|---|---|---|
| (unnamed) | cstring | File system path of the Hudi table |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |

Example
db=# select oushu_toolkit.get_hudi_table_path('public.tm'::regclass);
get_hudi_table_path
-----------------------------------------------------------
hdfs://localhost:9000/oushu/dfs_default/16385/16388/17409
(1 row)
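Because the returned value is a URI, a client can split it with a standard URI parser. The Python sketch below uses `urllib.parse.urlsplit`. It also derives the `.hoodie` subdirectory, where Hudi keeps table metadata; this matches the `.hoodie/...` file names reported by get_hudi_files_snapshot and get_hudi_timeline. The helper name `describe_table_path` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def describe_table_path(uri: str) -> dict:
    """Split a table location such as hdfs://host:port/path into parts."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "base_path": parts.path,
        # Hudi keeps table metadata in the .hoodie directory under the base path.
        "metadata_dir": posixpath.join(parts.path, ".hoodie"),
    }


info = describe_table_path("hdfs://localhost:9000/oushu/dfs_default/16385/16388/17409")
print(info["metadata_dir"])  # /oushu/dfs_default/16385/16388/17409/.hoodie
```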
get_hudi_files_snapshot
The function get_hudi_files_snapshot(in_reloid oid, partition_path cstring, is_base boolean) returns a snapshot of a Hudi table's files, one row per file, with its type, name, and size.

Return value

| column | type | description |
|---|---|---|
| file_type | text | File type (e.g. data, log) |
| file_name | text | File name |
| file_size | bigint | File size in bytes |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |
| partition_path | cstring | Partition path to query; an empty string queries all partitions |
| is_base | boolean | If true, return only base files; if false, return all files |

Example
db=# select * from oushu_toolkit.get_hudi_files_snapshot('public.tm'::regclass, '', false);
file_type | file_name | file_size
------------+---------------------------+-----------
properties | .hoodie/hoodie.properties | 944
(1 row)
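Snapshot rows are easy to aggregate on the client side, for example to see how many files of each type exist and how much space they take. The following is a minimal Python sketch, assuming the rows have already been fetched as (file_type, file_name, file_size) tuples in the column order documented above. `summarize_snapshot` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

# One row of get_hudi_files_snapshot: (file_type, file_name, file_size).
Row = Tuple[str, str, int]


def summarize_snapshot(rows: Iterable[Row]) -> Dict[str, Tuple[int, int]]:
    """Return {file_type: (file_count, total_bytes)}."""
    summary: Dict[str, Tuple[int, int]] = {}
    for file_type, _file_name, file_size in rows:
        count, total = summary.get(file_type, (0, 0))
        summary[file_type] = (count + 1, total + file_size)
    return summary


rows = [("properties", ".hoodie/hoodie.properties", 944)]
print(summarize_snapshot(rows))  # {'properties': (1, 944)}
```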
get_hudi_timeline
The function get_hudi_timeline(in_reloid oid, partition_path cstring, is_base boolean) returns the files that make up a Hudi table's timeline, one row per file, with its type, name, and size. In the example, these are the instant files stored under .hoodie.

Return value

| column | type | description |
|---|---|---|
| file_type | text | File type (e.g. data, log) |
| file_name | text | File name |
| file_size | bigint | File size in bytes |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |
| partition_path | cstring | Partition path to query; an empty string queries all partitions |
| is_base | boolean | If true, return only base files; if false, return all files |

Example
db=# select * from oushu_toolkit.get_hudi_timeline('public.tm'::regclass, '', false);
file_type | file_name | file_size
-----------+---------------------------------------+-----------
timeline | .hoodie/20250522115923629.deltacommit | 989
timeline | .hoodie/20250522115924426.deltacommit | 988
(2 rows)
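Each timeline file name in the example has the form `<instant>.<action>`, where the instant is a timestamp written as yyyyMMddHHmmss followed by milliseconds. The Python sketch below splits such names and sorts them chronologically. It assumes that in-progress instants carry an extra `.requested` or `.inflight` suffix, as in upstream Apache Hudi, and it does not handle other naming variants. `parse_timeline_file` is a hypothetical helper. The timestamp is returned as a naive datetime because the file name does not record a time zone.

```python
import posixpath
from datetime import datetime, timedelta
from typing import NamedTuple


class TimelineEntry(NamedTuple):
    instant: str
    action: str
    state: str
    timestamp: datetime


def parse_timeline_file(file_name: str) -> TimelineEntry:
    """Parse names like .hoodie/20250522115923629.deltacommit.

    Assumes the pattern <instant>.<action>[.<state>]; a missing state
    is reported as "completed".
    """
    base = posixpath.basename(file_name)
    instant, _, rest = base.partition(".")
    action, _, state = rest.partition(".")
    # The first 14 digits are yyyyMMddHHmmss; any remaining digits are milliseconds.
    timestamp = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    if len(instant) > 14:
        timestamp += timedelta(milliseconds=int(instant[14:]))
    return TimelineEntry(instant, action, state or "completed", timestamp)


names = [
    ".hoodie/20250522115924426.deltacommit",
    ".hoodie/20250522115923629.deltacommit",
]
for entry in sorted(map(parse_timeline_file, names), key=lambda e: e.timestamp):
    print(entry.instant, entry.action, entry.state, entry.timestamp.isoformat())
```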
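show_hudi_files
The function show_hudi_files(in_reloid oid) returns detailed information about each file of a Hudi table: partition, file ID, base instant, file name, file size, insert/update/delete counts, latest operation time, and whether the file has been replaced or cleaned.

Return value

| column | type | description |
|---|---|---|
| partion | text | Partition path |
| file_id | text | Unique file ID |
| base_instant | text | Base instant (timestamp) of the file |
| file_name | text | File name |
| file_size | bigint | File size in bytes |
| num_insert | bigint | Number of inserted records |
| num_update | bigint | Number of updated records |
| num_delete | bigint | Number of deleted records |
| latest_time | text | Time of the latest operation |
| is_replaced | boolean | Whether the file has been replaced |
| is_cleaned | boolean | Whether the file has been cleaned |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |

Example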
db=# select * from oushu_toolkit.show_hudi_files('public.tm'::regclass);
partion | file_id | base_instant | file_name | file_size | num_insert | num_update | num_delete | latest_time | is_replaced | is_cleaned
---------+--------------------------------------+-------------------+-----------------------------------------------------------------------+-----------+------------+------------+------------+-------------------+-------------+------------
| 49f53621-89b4-4e88-8fc8-4021a196d05b | 20250522115924426 | 49f53621-89b4-4e88-8fc8-4021a196d05b_2-0-1_20250522115924426.parquet | 1561 | 1 | 0 | 0 | 20250522115924426 | f | f
| 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594 | 20250522115923629 | 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594_10-0-1_20250522115923629.parquet | 1561 | 1 | 0 | 0 | 20250522115923629 | f | f
(2 rows)
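The file_name values in this example follow the pattern `<file_id>_<write_token>_<base_instant>.<extension>`, so each name can be split back into the file_id and base_instant columns. The following is a minimal Python sketch that does this and also keeps only files that are neither replaced nor cleaned. It assumes the write token and instant contain no underscore, which holds for the names shown. The helper names are hypothetical, and rows are assumed to be fetched as dictionaries keyed by the column names above.

```python
import posixpath
from typing import Dict, Iterable, List, NamedTuple


class BaseFileName(NamedTuple):
    file_id: str
    write_token: str
    instant: str
    extension: str


def parse_base_file_name(name: str) -> BaseFileName:
    """Split <file_id>_<write_token>_<instant>.<ext> into its parts."""
    stem, ext = posixpath.splitext(posixpath.basename(name))
    # Split from the right so only the last two underscores are separators.
    file_id, write_token, instant = stem.rsplit("_", 2)
    return BaseFileName(file_id, write_token, instant, ext.lstrip("."))


def live_files(rows: Iterable[Dict]) -> List[str]:
    """Return names of files whose is_replaced and is_cleaned flags are both false."""
    return [r["file_name"] for r in rows if not r["is_replaced"] and not r["is_cleaned"]]


parsed = parse_base_file_name(
    "49f53621-89b4-4e88-8fc8-4021a196d05b_2-0-1_20250522115924426.parquet"
)
print(parsed.file_id, parsed.instant)
```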
hudi_file_contents
The function hudi_file_contents(in_reloid oid, file_path cstring) returns the contents of a specified Hudi file as a set of text rows, one row per line.

Return value

| column | type | description |
|---|---|---|
| (unnamed) | text | One line of the file's content |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |
| file_path | cstring | Path of the file to read |

Example
db=# select * from oushu_toolkit.hudi_file_contents('public.tm'::regclass, '/data/hudi/warehouse/public/tm/part-1');
hudi_file_contents
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
"_hoodie_commit_time":20250522115924426,"_hoodie_commit_seqno":null,"_hoodie_record_key":KcVJ2bZACAAAAAAA,"_hoodie_partition_path":,"_hoodie_file_name":null,"id":2
(1 row)
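The example row is a comma-separated list of `"field":value` pairs whose values are not quoted, so it is not valid JSON. The following is a minimal Python sketch that turns such a line into a dictionary. It assumes field names contain no double quotes, values contain no commas, and `null` marks a missing value. These assumptions are inferred from the single example above and may not hold for every file. `parse_record_line` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# "name":value pairs; a value runs up to the next comma and may be empty.
PAIR = re.compile(r'"([^"]+)":([^,]*)')


def parse_record_line(line: str) -> Dict[str, Optional[str]]:
    """Parse one hudi_file_contents row into {field: value}; null becomes None."""
    return {
        key: (None if value == "null" else value)
        for key, value in PAIR.findall(line)
    }


line = ('"_hoodie_commit_time":20250522115924426,"_hoodie_commit_seqno":null,'
        '"_hoodie_record_key":KcVJ2bZACAAAAAAA,"_hoodie_partition_path":,'
        '"_hoodie_file_name":null,"id":2')
record = parse_record_line(line)
print(record["_hoodie_record_key"], record["id"])
```

All values stay as strings; converting a field such as id to an integer is left to the caller, who knows the table schema.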
get_hudi_table_partitions_by_oid
The function get_hudi_table_partitions_by_oid(in_reloid oid) returns the table name and partition key of the Hudi table identified by the given OID.

Return value

| column | type | description |
|---|---|---|
| relname | text | Table name |
| partkey | text | Partition key |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |

Example
db=# select * from oushu_toolkit.get_hudi_table_partitions_by_oid('public.tm'::regclass);
relname | partkey
---------+---------
tm |
(1 row)
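hudi_table_partitions
The function hudi_table_partitions(in_reloid oid) lists the partitions of a partitioned Hudi table, one row per partition, with the schema name, table name, partition key, and partition value.

Return value

| column | type | description |
|---|---|---|
| partitionschamename | text | Schema name of the partitioned table |
| partitiontablename | text | Name of the partitioned table |
| partitionkey | text | Partition key |
| partitionname | text | Partition value |

Parameters

| args | type | description |
|---|---|---|
| in_reloid | oid | OID of the table to query |

Example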
db=# select * from oushu_toolkit.hudi_table_partitions('public.tm'::regclass);
partitionschamename | partitiontablename | partitionkey | partitionname
---------------------+--------------------+--------------+---------------
public | tm | c1/c2 | 1069/1069
public | tm | c1/c2 | 3570/3570
public | tm | c1/c2 | 1966/1966
public | tm | c1/c2 | 2749/2749
public | tm | c1/c2 | 2603/2603
(5 rows)
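In the example, partitionkey lists the partition columns separated by `/`, and partitionname lists the matching values in the same order. The following is a minimal Python sketch that pairs them up, assuming neither the column names nor the values contain `/`. `partition_values` is a hypothetical helper.

```python
from typing import Dict


def partition_values(partition_key: str, partition_name: str) -> Dict[str, str]:
    """Pair a key list such as 'c1/c2' with a value list such as '1069/1069'."""
    keys = partition_key.split("/")
    values = partition_name.split("/")
    if len(keys) != len(values):
        raise ValueError(f"{partition_key!r} and {partition_name!r} differ in length")
    return dict(zip(keys, values))


print(partition_values("c1/c2", "3570/3570"))  # {'c1': '3570', 'c2': '3570'}
```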