这个 postgres 函数是否具有成本效益还是仍然需要清理

发布于 2024-10-09 10:10:04 字数 3175 浏览 3 评论 0原文

postgres 数据库中有两个表。 english_all 和 english_glob

第一个表包含国际、机密、引导、冷却器等单词,

我已经编写了从 english_all 获取单词的函数,然后对每个单词执行 for 循环以获取未插入 anglish_glob 表中的单词列表。单词列表就像

I
In
Int
Inte
Inter
..
b
bo
boo
boot
..
c
co
coo
cool
etc..

由于某种原因在插入到 english_all 表期间添加了 zwnj(零宽度非连接器)。 但在函数中,我使用 regexp_replace 删除该字符。

Postgres 函数 for_loop_test 基于我从 english_all 表中选择单词而采用两个参数 min 和 max 。

函数代码就像

DECLARE
inMinLength ALIAS FOR $1;
inMaxLength ALIAS FOR $2;
mviews RECORD;
outenglishListRow english_word_list;--custom data type eng_id,english_text
BEGIN
FOR mviews IN SELECT id,english_all_text FROM english_all where wlength between inMinLength and inMaxLength 
ORDER BY english_all_text limit 30 LOOP

FOR i IN 1..char_length(regexp_replace(mviews.english_all_text,'(‌)$','')) LOOP
    FOR outenglishListRow IN
        SELECT distinct on (regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)$','')) mviews.id,
        regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)$','') where 
        regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)$','') not 
        in(select english_glob.english_text from english_glob where i=english_glob.wlength)
        order by regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)$','')
    LOOP
    RETURN NEXT outenglishListRow;
    END LOOP;


END LOOP;
END LOOP;
END;

一旦我得到单词列表,我会将其插入到另一个表 english_glob 中。 我的问题是我可以在函数中添加或删除任何东西以使其更加高效。

编辑 假设 english_all 表有像

  • footer、settle、question、overflow、database、kingdom

这样的单词,如果 inMinLength = 5 且 inmaxLength=7 然后在外循环的

  • footer,settle,kingdom

会被选中。 对于以上 3 个单词,内部两个循环将适用于获取

  • f,fo,foo,foot,foote,footer,s,se,set 等单词>,解决,解决 ....等等。

在最后的过程中,粗体单词将被输入到 english_glob 中,并带有另一个参数(如 1)以表示它是一个正确的单词,并存储在 english_glob 表的另一个字段中。剩余的单词将与另一个参数 0 一起存储,因为在下一次调用中不应再次提取保存在数据库中的单词。

编辑2: 完整代码

CREATE TABLE english_all
(
id serial NOT NULL,
english_all_text  text NOT NULL,
wlength integer NOT NULL,
CONSTRAINT english_all PRIMARY KEY (id),
CONSTRAINT english_all_kan_text_uq_id UNIQUE (english_all_text)
)

CREATE TABLE english_glob
(
id serial NOT NULL,
english_text  text NOT NULL,
is_prop integer default 1,
CONSTRAINT english_glob PRIMARY KEY (id),
CONSTRAINT english_glob_kan_text_uq_id UNIQUE (english_text)
)

insert into english_all(english_all_text,wlength) values ('ant',char_length('ant')),('forget',char_length('forget')),('forgive',char_length('forgive'));

这是带有参数 3 和 6 个后续行的函数调用的

a
an
ant
f
fo
for
forg
forge
forget

,接下来应该获取基于上述行的

insert into english_glob(english_text,is_prop) 
values 
('a',1),('an',1),
('ant',1),('f',0),
('fo',0),('for',1),
('forg',0),('forge',1),
('forget',1),

函数调用,下次应该获取参数 3 和 7 个后续行插入到另一个表。(因为 f,fo,for, forg 均已输入 english_glob 表中)

forgi
forgiv
forgive

屏幕截图

There are two tables in postgres db.
english_all and english_glob

First table contains words like international,confidential,booting,cooler ...etc

I have written the function to get the words from english_all then perform for loop for each word to get word list which are not inserted in anglish_glob table. Word list is like

I
In
Int
Inte
Inter
..
b
bo
boo
boot
..
c
co
coo
cool
etc..

for some reason zwnj(zero-width non-joiner) is added during insertion to english_all table.
But in function I am removing that character with regexp_replace.

Postgres function for_loop_test is taking two parameter min and max based on that I am selecting words from english_all table.

function code is like

DECLARE
inMinLength ALIAS FOR $1;
inMaxLength ALIAS FOR $2;
mviews RECORD;
outenglishListRow english_word_list;--custom data type eng_id,english_text
BEGIN
FOR mviews IN SELECT id,english_all_text FROM english_all where wlength between inMinLength and inMaxLength 
ORDER BY english_all_text limit 30 LOOP

FOR i IN 1..char_length(regexp_replace(mviews.english_all_text,'(‌)

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code

CREATE TABLE english_all
(
id serial NOT NULL,
english_all_text  text NOT NULL,
wlength integer NOT NULL,
CONSTRAINT english_all PRIMARY KEY (id),
CONSTRAINT english_all_kan_text_uq_id UNIQUE (english_all_text)
)

CREATE TABLE english_glob
(
id serial NOT NULL,
english_text  text NOT NULL,
is_prop integer default 1,
CONSTRAINT english_glob PRIMARY KEY (id),
CONSTRAINT english_glob_kan_text_uq_id UNIQUE (english_text)
)

insert into english_all(english_all_text,wlength) values ('ant',char_length('ant')),('forget',char_length('forget')),('forgive',char_length('forgive'));

on function call with parameter 3 and 6 fallowing rows should fetched

a
an
ant
f
fo
for
forg
forge
forget

next is insert to another table based on above row

insert into english_glob(english_text,is_prop) 
values 
('a',1),('an',1),
('ant',1),('f',0),
('fo',0),('for',1),
('forg',0),('forge',1),
('forget',1),

on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)

forgi
forgiv
forgive

Screenshot

,'')) LOOP FOR outenglishListRow IN SELECT distinct on (regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code


on function call with parameter 3 and 6 fallowing rows should fetched


next is insert to another table based on above row


on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)


Screenshot

,'')) mviews.id, regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code


on function call with parameter 3 and 6 fallowing rows should fetched


next is insert to another table based on above row


on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)


Screenshot

,'') where regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code


on function call with parameter 3 and 6 fallowing rows should fetched


next is insert to another table based on above row


on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)


Screenshot

,'') not in(select english_glob.english_text from english_glob where i=english_glob.wlength) order by regexp_replace((substring(mviews.english_all_text from 1 for i)),'(‌)

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code


on function call with parameter 3 and 6 fallowing rows should fetched


next is insert to another table based on above row


on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)


Screenshot

,'') LOOP RETURN NEXT outenglishListRow; END LOOP; END LOOP; END LOOP; END;

Once I get the word list I will insert that into another table english_glob.
My question is is there any thing I can add to or remove from function to make it more efficient.

edit
Let assume english_all table have words like

  • footer,settle,question,overflow,database,kingdom

If inMinLength = 5 and inmaxLength=7
then in the outer loop

  • footer,settle,kingdom

will be selected.
For above 3 words inner two loop will apply to get words like

  • f,fo,foo,foot,foote,footer,s,se,set,sett,settl
    .... etc.

In the final process words which are bold will be entered into english_glob with another parameter like 1 to denote it is a proper word and stored in the another filed of english_glob table. Remaining word will be stored with another parameter 0 because in the next call words which are saved in database should not be fetched again.

edit2:
This is a complete code

on function call with parameter 3 and 6 fallowing rows should fetched

next is insert to another table based on above row

on function call next time with parameter 3 and 7 fallowing rows should fetched.(because f,fo,for,forg are all entered in english_glob table)

Screenshot

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

白龙吟 2024-10-16 10:10:04

您可以在单个语句中完成此操作:

SELECT  english_all_text, part,
        part IN
        (
        SELECT  english_text
        FROM    english_glob
        ) AS found
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q

要检查的一些示例数据:

WITH    english_all(english_all_text) AS
        (
        SELECT  unnest('{footer,settle,question,overflow,database,kingdom}'::text[])
        ),
        english_glob(english_text) AS
        (
        SELECT  unnest('{foot,footer,set}'::text[])
        )
SELECT  english_all_text, part,
        part IN
        (
        SELECT  english_text
        FROM    english_glob
        ) AS found
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q

更新:

如果您只需要返回其部分不在列表中的单词,请使用以下命令:

SELECT  part
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q
WHERE   part NOT IN
        (
        SELECT  english_text
        FROM    english_glob
        )

You can do it in a single statement:

SELECT  english_all_text, part,
        part IN
        (
        SELECT  english_text
        FROM    english_glob
        ) AS found
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q

Some sample data to check:

WITH    english_all(english_all_text) AS
        (
        SELECT  unnest('{footer,settle,question,overflow,database,kingdom}'::text[])
        ),
        english_glob(english_text) AS
        (
        SELECT  unnest('{foot,footer,set}'::text[])
        )
SELECT  english_all_text, part,
        part IN
        (
        SELECT  english_text
        FROM    english_glob
        ) AS found
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q

Update:

If you just need to return the words whose parts are not already in the list, use this:

SELECT  part
FROM    (
        SELECT  *, SUBSTRING(english_all_text, 1, generate_series(1, LENGTH(english_all_text))) AS part
        FROM    english_all
        WHERE   LENGTH(english_all_text) BETWEEN 5 AND 7
        ) q
WHERE   part NOT IN
        (
        SELECT  english_text
        FROM    english_glob
        )
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文