请有人能帮我优化这个程序,Oracle 10g
create or replace procedure prcdr_Clustering is
v_sampleCount number;
v_sampleFlag number;
v_matchPercent number;
v_SpendAmount Number(18, 2);
cursor cur_PDCSample is
SELECT *
FROM TBL_BIL
WHERE UDF_CHK = 'N';
rec_Pdcsample TBL_BIL%rowtype;
BEGIN
OPEN cur_PDCSample;
LOOP
FETCH cur_PDCSample
into rec_Pdcsample;
EXIT WHEN cur_PDCSample%NOTFOUND;
SELECT COUNT(*)
INTO v_sampleCount
FROM TBL_BIL
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount <> 0 THEN
UPDATE TBL_BIL
SET UDF_CHK = 'Y'
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount > 1 THEN
v_sampleFlag := 1;
ELSE
IF v_sampleCount = 1 THEN
v_sampleFlag := 2;
ELSE
v_sampleFlag := 0;
END IF;
END IF;
UPDATE TBL_BIL
SET UDF_SAMPLECOUNT = v_sampleCount, UDF_SAMPLEFLAG = v_sampleFlag
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_PID = rec_Pdcsample.uniqueid
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
UPDATE TBL_BIL
SET UDF_PIDSPEND = v_SpendAmount
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_MATCHPERCENT = 1
WHERE uniqueid <> rec_Pdcsample.uniqueid
AND UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
END IF;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
OPEN cur_PDCSample;
END LOOP;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
end PrcdrClustering;
我花了几天时间来执行,我的表有 225,846 行数据。
我的表的结构是:-
UNIQUEID NUMBER Notnull primary key
VENDORNAME VARCHAR2(200)
SHORTTEXT VARCHAR2(500)
SPENDAMT NUMBER(18,2)
UDF_TOKENIZED VARCHAR2(999)
UDF_PID NUMBER(10)
UDF_SAMPLEFLAG NUMBER(4)
UDF_SAMPLECOUNT NUMBER(4)
UDF_MATCHPERCENT NUMBER(4)
UDF_TOKENCNT NUMBER(4)
UDF_PIDSPEND NUMBER(18,2)
UDF_CHK VARCHAR2(1)
create or replace procedure prcdr_Clustering is
v_sampleCount number;
v_sampleFlag number;
v_matchPercent number;
v_SpendAmount Number(18, 2);
cursor cur_PDCSample is
SELECT *
FROM TBL_BIL
WHERE UDF_CHK = 'N';
rec_Pdcsample TBL_BIL%rowtype;
BEGIN
OPEN cur_PDCSample;
LOOP
FETCH cur_PDCSample
into rec_Pdcsample;
EXIT WHEN cur_PDCSample%NOTFOUND;
SELECT COUNT(*)
INTO v_sampleCount
FROM TBL_BIL
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount <> 0 THEN
UPDATE TBL_BIL
SET UDF_CHK = 'Y'
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount > 1 THEN
v_sampleFlag := 1;
ELSE
IF v_sampleCount = 1 THEN
v_sampleFlag := 2;
ELSE
v_sampleFlag := 0;
END IF;
END IF;
UPDATE TBL_BIL
SET UDF_SAMPLECOUNT = v_sampleCount, UDF_SAMPLEFLAG = v_sampleFlag
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_PID = rec_Pdcsample.uniqueid
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
UPDATE TBL_BIL
SET UDF_PIDSPEND = v_SpendAmount
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_MATCHPERCENT = 1
WHERE uniqueid <> rec_Pdcsample.uniqueid
AND UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
END IF;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
OPEN cur_PDCSample;
END LOOP;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
end PrcdrClustering;
It takes me days to execute, my table has 225,846 rows of data.
The structure of my table is :-
UNIQUEID NUMBER Notnull primary key
VENDORNAME VARCHAR2(200)
SHORTTEXT VARCHAR2(500)
SPENDAMT NUMBER(18,2)
UDF_TOKENIZED VARCHAR2(999)
UDF_PID NUMBER(10)
UDF_SAMPLEFLAG NUMBER(4)
UDF_SAMPLECOUNT NUMBER(4)
UDF_MATCHPERCENT NUMBER(4)
UDF_TOKENCNT NUMBER(4)
UDF_PIDSPEND NUMBER(18,2)
UDF_CHK VARCHAR2(1)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
从哪里开始?我有几点要说。
uniqueid
是唯一的,您可以使用rowid
更新表。udf_pidspend
更新为 null,无论这是有意还是无意,都无需为其进行单独更新。tbl_bil
在uniqueid
上建立索引,一般规则:
批量收集
。select *
rowid
以避免所有索引问题。这仅适用于11G,我回答这个问题最近我提供了自己在 11G 之前的版本中处理此实现限制的方法,并链接到Ollie's、Tom Kyte 和 Sathya's
我不完全确定你想在这里做什么,所以如果逻辑有点偏离,请原谅我。
最后调用所有表
tbl_...
有点不必要。Where to start? I've a number points to make.
bulk collect ... forall
would be far more efficient.elsif
to reduce the amount of statements to be evaluated ( very, very minor win )uniqueid
is unique you can userowid
to update the table.udf_pidspend
to null, whether this is intentional or not there's no need to do a separate update for it.tbl_bil
is indexed onuniqueid
As general rules:
bulk collect
.select *
rowid
where possible it avoids all index problems.This will only work in 11G, I answered this question recently where I provided my own way of dealing with this implementation restriction in versions prior to 11G and linked to Ollie's, Tom Kyte's and Sathya's
I'm not entirely certain what you're trying to do here so please forgive me if the logic is a little off.
Lastly calling all tables
tbl_...
is a little bit unnecessary.这是使用单个 SQL 语句的变体。我不能 100% 确定逻辑完全相同,但对于我的测试集来说,确实如此。此外,当您有多个 udf_chk = 'N' 和相同 udf_tokenized 的记录时,当前过程是不确定的...
这是重构的过程
,这是一个测试:
问候,
抢。
Here is a variant using a single SQL statement. I'm not 100% certain that the logic is exactly the same, but for my test set, it is. Also the current procedure is non deterministic when you have more than one record with udf_chk = 'N' and the same udf_tokenized ...
This is the refactored procedure
And here is a test:
Regards,
Rob.
我不知道为什么,但您打开 cur_PDCSample,它选择(我怀疑)数千条记录。然后,在循环中,关闭游标并重新打开它,每次仅处理返回的第一条记录。
如果您打开游标一次,处理每条记录然后关闭它,您的过程可能会快得多。
实际上,由于您并不总是将 TBL_BIL.UDF_CHK 更新为“Y”,因此在我看来,您当前的程序可能会无限运行。
I don't know why, but you open the cur_PDCSample, which select (I suspect) thousands of records. And then, in a loop, you close the cursor and reopen it, each time processing only the first record that is returned.
If you open the cursor once, process each record and then close it, your procedure will probably go a lot faster.
Actually, since you do not always update TBL_BIL.UDF_CHK to 'Y', it seems to me that your current procedure may run infinitely.