hudi_clean#

The function hudi_clean(in_reloid oid, strong_lock boolean) is used to clean up Hudi tables.

Function返回值说明#

column

type

references

description

void

无返回值

Function入参值说明#

args

type

description

in_reloid

oid

需要清理的表的 oid

strong_lock

boolean

是否使用强锁进行清理操作

Function举例说明#

db=# select oushu_toolkit.hudi_clean('public.tm'::regclass, true);
 hudi_clean 
------------
 
(1 row)

hudi_clean#

The function hudi_clean(in_reloid oid) is used to clean up Hudi tables with a strong lock by default.

Function返回值说明#

column

type

references

description

void

无返回值

Function入参值说明#

args

type

description

in_reloid

oid

需要清理的表的 oid

Function举例说明#

db=# select oushu_toolkit.hudi_clean('public.tm'::regclass);
 hudi_clean 
------------
 
(1 row)

hudi_archive#

The function hudi_archive(in_reloid oid, wait_for_lock boolean) is used to archive Hudi tables.

Function返回值说明#

column

type

references

description

void

无返回值

Function入参值说明#

args

type

description

in_reloid

oid

需要归档的表的 oid

wait_for_lock

boolean

是否等待锁进行归档操作

Function举例说明#

db=# select oushu_toolkit.hudi_archive('public.tm'::regclass, true);
 hudi_archive 
--------------
 
(1 row)

hudi_archive#

The function hudi_archive(in_reloid oid) is used to archive Hudi tables with wait_for_lock set to true by default.

Function返回值说明#

column

type

references

description

void

无返回值

Function入参值说明#

args

type

description

in_reloid

oid

需要归档的表的 oid

Function举例说明#

db=# select oushu_toolkit.hudi_archive('public.tm'::regclass);
 hudi_archive 
--------------
 
(1 row)

get_hudi_table_path#

The function get_hudi_table_path(in_reloid oid) is used to obtain the file system path of a Hudi table.

Function返回值说明#

column

type

references

description

(unnamed)

cstring

Hudi 表的文件路径

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

Function举例说明#

db=# select oushu_toolkit.get_hudi_table_path('public.tm'::regclass);
                    get_hudi_table_path                    
-----------------------------------------------------------
 hdfs://localhost:9000/oushu/dfs_default/16385/16388/17409
(1 row)

get_hudi_files_snapshot#

The function get_hudi_files_snapshot(in_reloid oid, partition_path cstring, is_base boolean) provides a snapshot of Hudi table files, including file type, file name, and file size.

Function返回值说明#

column

type

references

description

file_type

text

文件类型(如 data、log 等)

file_name

text

文件名

file_size

bigint

文件大小(字节数)

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

partition_path

cstring

分区路径(可指定分区,若为空则查询全部)

is_base

boolean

是否只查询 base 文件(true 只查 base 文件,false 查所有文件)

Function举例说明#

db=# select * from oushu_toolkit.get_hudi_files_snapshot('public.tm'::regclass, '', false);
 file_type  |         file_name         | file_size 
------------+---------------------------+-----------
 properties | .hoodie/hoodie.properties |       944
(1 row)

get_hudi_timeline#

The function get_hudi_timeline(in_reloid oid, partition_path cstring, is_base boolean) provides the timeline information of Hudi table files, including file type, file name, and file size.

Function返回值说明#

column

type

references

description

file_type

text

文件类型(如 data、log 等)

file_name

text

文件名

file_size

bigint

文件大小(字节数)

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

partition_path

cstring

分区路径(可指定分区,若为空则查询全部)

is_base

boolean

是否只查询 base 文件(true 只查 base 文件,false 查所有文件)

Function举例说明#

db=# select * from oushu_toolkit.get_hudi_timeline('public.tm'::regclass, '', false);
 file_type |               file_name               | file_size 
-----------+---------------------------------------+-----------
 timeline  | .hoodie/20250522115923629.deltacommit |       989
 timeline  | .hoodie/20250522115924426.deltacommit |       988
(2 rows)

show_hudi_files#

The function show_hudi_files(in_reloid oid) provides detailed information about the files of a Hudi table, including partition, file ID, base instant, file name, file size, insert/update/delete counts, latest operation time, and status flags.

Function返回值说明#

column

type

references

description

partion

text

分区路径

file_id

text

文件唯一标识

base_instant

text

文件基线时间戳

file_name

text

文件名

file_size

bigint

文件大小(字节数)

num_insert

bigint

插入记录数

num_update

bigint

更新记录数

num_delete

bigint

删除记录数

latest_time

text

最近操作时间

is_replaced

boolean

是否已被替换

is_cleaned

boolean

是否已被清理

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

Function举例说明#

db=# select * from oushu_toolkit.show_hudi_files('public.tm'::regclass);
 partion |               file_id                |   base_instant    |                               file_name                               | file_size | num_insert | num_update | num_delete |    latest_time    | is_replaced | is_cleaned 
---------+--------------------------------------+-------------------+-----------------------------------------------------------------------+-----------+------------+------------+------------+-------------------+-------------+------------
         | 49f53621-89b4-4e88-8fc8-4021a196d05b | 20250522115924426 | 49f53621-89b4-4e88-8fc8-4021a196d05b_2-0-1_20250522115924426.parquet  |      1561 |          1 |          0 |          0 | 20250522115924426 | f           | f
         | 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594 | 20250522115923629 | 5c71a4bf-b1c8-45fb-bec6-52ed0e3ab594_10-0-1_20250522115923629.parquet |      1561 |          1 |          0 |          0 | 20250522115923629 | f           | f
(2 rows)

hudi_file_contents#

The function hudi_file_contents(in_reloid oid, file_path cstring) returns the contents of a specified Hudi file as a set of text lines.

Function返回值说明#

column

type

references

description

(unnamed)

text

文件内容的每一行

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

file_path

cstring

需要读取的文件路径

Function举例说明#

db=# select * from oushu_toolkit.hudi_file_contents('public.tm'::regclass, '/data/hudi/warehouse/public/tm/part-1');
                                                                         hudi_file_contents                                                                      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 "_hoodie_commit_time":20250522115924426,"_hoodie_commit_seqno":null,"_hoodie_record_key":KcVJ2bZACAAAAAAA,"_hoodie_partition_path":,"_hoodie_file_name":null,"id":2
(1 row)

get_hudi_table_partitions_by_oid#

The function get_hudi_table_partitions_by_oid(in_reloid oid) returns the partition information for a given Hudi table oid.

Function返回值说明#

column

type

references

description

relname

text

分区表名

partkey

text

分区键

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

Function举例说明#

db=# select * from oushu_toolkit.get_hudi_table_partitions_by_oid('public.tm'::regclass);
SELECT * FROM oushu_toolkit.get_hudi_table_partitions_by_oid('public.tm'::regclass);
 relname | partkey 
---------+---------
 tm      | 
(1 row)

hudi_table_partitions#

The function hudi_table_partitions(in_reloid oid) provides the information of hudi partition tables.

Function返回值说明#

column

type

references

description

partitionschamename

text

分区表的 schema 名称

partitiontablename

text

分区表的表名

partitionkey

text

分区键

partitionname

text

分区值

Function入参值说明#

args

type

description

in_reloid

oid

需要查询的表的 oid

Function举例说明#

db=# select * from  oushu_toolkit.hudi_table_partitions('public.tm'::regclass);
 partitionschamename | partitiontablename | partitionkey | partitionname 
---------------------+--------------------+--------------+---------------
 public              | tm                 | c1/c2        | 1069/1069
 public              | tm                 | c1/c2        | 3570/3570
 public              | tm                 | c1/c2        | 1966/1966
 public              | tm                 | c1/c2        | 2749/2749
 public              | tm                 | c1/c2        | 2603/2603
(5 rows)