PL/SQL比较位串,我需要AND运算?

发布于 2024-12-04 16:34:02 字数 1446 浏览 0 评论 0原文

我目前有一个脚本可以计算化学库指纹的谷本系数。然而,在测试过程中,我发现由于我比较位字符串的方法(它花费的时间太长),我的实现无法实际扩展。见下文。这是我需要改进的循环。我已经简化了这一点,所以它只是查看两个结构,真正的脚本对结构数据集进行排列,但这会使我在这里遇到的问题变得过于复杂。

LOOP
-- Find the NA bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_na FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 1;

-- FIND the NB bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_nb FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 2;

-- Test for both bits the same
IF var_na > 0 AND var_nb > 0 then
var_tally := var_tally + 1;
END IF;

-- Test for bit in A on and B off
IF var_na > var_nb then
var_tna := var_tna + 1;
END IF

-- Test for bit in B on and A off.
IF var_nb > var_na then
var_tnb := var_tnb + 1;
END IF;

var_fragment_id := var_fragment_id + 1;
EXIT WHEN var_fragment_id > var_maxfragment_id;
END LOOP;

举个简单的例子

结构 A = '101010'

结构 B = '011001'

在我的真实数据集中,二进制的长度为 500 位及以上。 我需要知道:

1)两者共同的ON位数

2)A中ON但B中OFF的位数

3)B中ON但B中OFF的位数

所以在这种情况下

1)= 1

2 ) = 2

3) = 2

理想情况下,我想改变我的做法。我不想沉浸在每个字符串中的每个位上,当我用数千个结构扩展整个系统时,每个结构的指纹位字符串的长度顺序为 500-1000 ,这太耗时了

我解决这个问题的逻辑是至:

取两者中全部为ON的位数

A) = 3

B) = 3

然后进行AND运算,找出两者中都有多少位为ON = 1

然后只需从总数中减去该值即可找到其中一个而不是另一个中的位数。

那么如何对两个 0 和 1 的字符串执行类似 AND 的操作来查找公共 1 的数量呢?

I currently have a script that calculates the tanimoto coefficient on the fingerprints of a chemical library. However during testing I found my implementation could not be feasibly scaled up due to my method of comparing the bit strings (It just takes far too long). See below. This is the loop I need to improve. I've simplified this so it is just looking at two structures the real script does permutations about the dataset of structures, but that would over complicate the issue I have here.

LOOP
-- Find the NA bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_na FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 1;

-- FIND the NB bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_nb FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 2;

-- Test for both bits the same
IF var_na > 0 AND var_nb > 0 then
var_tally := var_tally + 1;
END IF;

-- Test for bit in A on and B off
IF var_na > var_nb then
var_tna := var_tna + 1;
END IF

-- Test for bit in B on and A off.
IF var_nb > var_na then
var_tnb := var_tnb + 1;
END IF;

var_fragment_id := var_fragment_id + 1;
EXIT WHEN var_fragment_id > var_maxfragment_id;
END LOOP;

For a simple example

Structure A = '101010'

Structure B = '011001'

In my real data set the length of the binary is 500 bits and up.
I need to know:

1)The number of bits ON common to Both

2)The number of bits ON in A but off in B

3)The number of bits ON in B but off in B

So in this case

1) = 1

2) = 2

3) = 2

Ideally I want to change how I'm doing this. I don't want to be steeping though each bit in each string its just too time expensive when I scale the whole system up with thousands of structures each with fingerprint bit strings in the length order of 500-1000

My logic to fix this would be to:

Take the total number of bits ON in both

A) = 3

B) = 3

Then perform an AND operation and find how many bits are on in both
= 1

Then just subtract this from the totals to find the number of bits on in one but not the other.

So how can I perform an AND like operation on two strings of 0's and 1's to find the number of common 1's?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

疑心病 2024-12-11 16:34:02

查看 BITAND 函数。

BITAND 函数将其输入和输出视为位向量;输出是输入的按位与。

但是,根据文档,这只适用于 2^128

Check out the BITAND function.

The BITAND function treats its inputs and its output as vectors of bits; the output is the bitwise AND of the inputs.

However, according to the documentation, this only works for 2^128

冰魂雪魄 2024-12-11 16:34:02

您应该将 SELECT 移出循环。我很确定您花费 99% 的时间选择 1 位 500 次,您可以一次性选择 500 位,然后循环遍历字符串:

DECLARE
   l_structure_a LONG;
   l_structure_b LONG;
   var_na        VARCHAR2(1);
   var_nb        VARCHAR2(1);
BEGIN
   SELECT MAX(decode(structure_id, 1, fingerprint)), 
          MAX(decode(structure_id, 2, fingerprint))
     INTO l_structure_a, l_structure_b
     FROM qsar_kb.fingerprint
    WHERE project_id = 1
      AND structure_id IN (1,2);
   LOOP
      var_na := substr(l_structure_a, var_fragment_id, 1);
      var_nb := substr(l_structure_b, var_fragment_id, 1);

      -- Test for both bits the same
      IF var_na > 0 AND var_nb > 0 THEN
         var_tally := var_tally + 1;
      END IF;   
      -- Test for bit in A on and B off
      IF var_na > var_nb THEN
         var_tna := var_tna + 1;
      END IF;   
      -- Test for bit in B on and A off.
      IF var_nb > var_na THEN
         var_tnb := var_tnb + 1;
      END IF;

      var_fragment_id := var_fragment_id + 1;
      EXIT WHEN var_fragment_id > var_maxfragment_id;
   END LOOP;
END;

编辑:
您还可以在单​​个 SQL 语句中完成此操作:

SQL> WITH DATA AS (
  2     SELECT '101010' fingerprint,1 project_id, 1 structure_id FROM dual
  3     UNION ALL SELECT '011001', 1, 2 FROM dual),
  4  transpose AS (SELECT ROWNUM fragment_id FROM dual CONNECT BY LEVEL <= 1000)
  5  SELECT COUNT(CASE WHEN var_na = 1 AND var_nb = 1 THEN 1 END) nb_1,
  6         COUNT(CASE WHEN var_na > var_nb THEN 1 END) nb_2,
  7         COUNT(CASE WHEN var_na < var_nb THEN 1 END) nb_3
  8    FROM (SELECT to_number(substr(struct_a, fragment_id, 1)) var_na,
  9                 to_number(substr(struct_b, fragment_id, 1)) var_nb
 10             FROM (SELECT MAX(decode(structure_id, 1, fingerprint)) struct_a,
 11                           MAX(decode(structure_id, 2, fingerprint)) struct_b
 12                      FROM DATA
 13                     WHERE project_id = 1
 14                       AND structure_id IN (1, 2))
 15            CROSS JOIN transpose);

      NB_1       NB_2       NB_3
---------- ---------- ----------
         1          2          2

You should move the SELECT out of the loop. I'm pretty sure you're spending 99% of the time selecting 1 bit 500 times where you could select 500 bits in one go and then loop through the string:

DECLARE
   l_structure_a LONG;
   l_structure_b LONG;
   var_na        VARCHAR2(1);
   var_nb        VARCHAR2(1);
BEGIN
   SELECT MAX(decode(structure_id, 1, fingerprint)), 
          MAX(decode(structure_id, 2, fingerprint))
     INTO l_structure_a, l_structure_b
     FROM qsar_kb.fingerprint
    WHERE project_id = 1
      AND structure_id IN (1,2);
   LOOP
      var_na := substr(l_structure_a, var_fragment_id, 1);
      var_nb := substr(l_structure_b, var_fragment_id, 1);

      -- Test for both bits the same
      IF var_na > 0 AND var_nb > 0 THEN
         var_tally := var_tally + 1;
      END IF;   
      -- Test for bit in A on and B off
      IF var_na > var_nb THEN
         var_tna := var_tna + 1;
      END IF;   
      -- Test for bit in B on and A off.
      IF var_nb > var_na THEN
         var_tnb := var_tnb + 1;
      END IF;

      var_fragment_id := var_fragment_id + 1;
      EXIT WHEN var_fragment_id > var_maxfragment_id;
   END LOOP;
END;

Edit:
You could also do it in a single SQL statement:

SQL> WITH DATA AS (
  2     SELECT '101010' fingerprint,1 project_id, 1 structure_id FROM dual
  3     UNION ALL SELECT '011001', 1, 2 FROM dual),
  4  transpose AS (SELECT ROWNUM fragment_id FROM dual CONNECT BY LEVEL <= 1000)
  5  SELECT COUNT(CASE WHEN var_na = 1 AND var_nb = 1 THEN 1 END) nb_1,
  6         COUNT(CASE WHEN var_na > var_nb THEN 1 END) nb_2,
  7         COUNT(CASE WHEN var_na < var_nb THEN 1 END) nb_3
  8    FROM (SELECT to_number(substr(struct_a, fragment_id, 1)) var_na,
  9                 to_number(substr(struct_b, fragment_id, 1)) var_nb
 10             FROM (SELECT MAX(decode(structure_id, 1, fingerprint)) struct_a,
 11                           MAX(decode(structure_id, 2, fingerprint)) struct_b
 12                      FROM DATA
 13                     WHERE project_id = 1
 14                       AND structure_id IN (1, 2))
 15            CROSS JOIN transpose);

      NB_1       NB_2       NB_3
---------- ---------- ----------
         1          2          2
梦毁影碎の 2024-12-11 16:34:02

我将用更多信息来扩展卢卡斯的答案。

一点互联网搜索揭示了 Tom Kyte 的代码(来自 Jonathan Lewis) 在碱基之间进行转换。有一个函数 to_dec 它将接受一个字符串并将其转换为数字。我复制了下面的代码:

将基数转换为十进制:

create or replace function to_dec( 
  p_str in varchar2, 
  p_from_base in number default 16) return number
is
    l_num   number default 0;
    l_hex   varchar2(16) default '0123456789ABCDEF';
begin
    for i in 1 .. length(p_str) loop
        l_num := l_num * p_from_base + instr(l_hex,upper(substr(p_str,i,1)))-1;
    end loop;
    return l_num;
end to_dec;

将十进制转换为基数:

create or replace function to_base( p_dec in number, p_base in number ) 
return varchar2
is
    l_str   varchar2(255) default NULL;
    l_num   number  default p_dec;
    l_hex   varchar2(16) default '0123456789ABCDEF';
begin
    if ( trunc(p_dec) <> p_dec OR p_dec < 0 ) then
        raise PROGRAM_ERROR;
    end if;
    loop
        l_str := substr( l_hex, mod(l_num,p_base)+1, 1 ) || l_str;
        l_num := trunc( l_num/p_base );
        exit when ( l_num = 0 );
    end loop;
    return l_str;
end to_base;

可以调用此函数将字符串位图转换为数字,然后可以与 bitand 一起使用。一个例子是:

select to_dec('101010', 2) from dual

Oracle 仅真正提供 BITAND (和 BIT_TO_NUM 这与这里并不真正相关)作为执行逻辑操作的方式,但这里所需的操作是(A AND B)、(A AND NOT B) 和 (NOT A AND B) 因此我们需要一个将 A 转换为 NOT A 的 var。 一个简单的方法是使用 translate

So.... 最后的 。结果是:

select 
  length(translate(to_base(bitand(data_A, data_B),2),'10','1')) as nb_1,
  length(translate(to_base(bitand(data_A, data_NOT_B),2),'10','1'))  as nb_2,
  length(translate(to_base(bitand(data_NOT_A, data_B),2),'10','1'))  as nb_3
from (
  select 
    to_dec(data_A,2) as data_A, 
    to_dec(data_b,2) as data_B,
    to_dec(translate(data_A, '01', '10'),2) as data_NOT_A, 
    to_dec(translate(data_B, '01', '10'),2) as data_NOT_B
  from (
    select '101010' as data_A, '011001' as data_B from dual
  )
)

这比我希望的要复杂一些当我开始写这个答案时,但它似乎确实有效。

I'll sort of expand on the answer from Lukas with a little bit more information.

A little bit of an internet search revealed code from Tom Kyte (via Jonathan Lewis) to convert between bases. There is a function to_dec which will take a string and convert it to a number. I have reproduced the code below:

Convert base number to decimal:

create or replace function to_dec( 
  p_str in varchar2, 
  p_from_base in number default 16) return number
is
    l_num   number default 0;
    l_hex   varchar2(16) default '0123456789ABCDEF';
begin
    for i in 1 .. length(p_str) loop
        l_num := l_num * p_from_base + instr(l_hex,upper(substr(p_str,i,1)))-1;
    end loop;
    return l_num;
end to_dec;

Convert decimal to base number:

create or replace function to_base( p_dec in number, p_base in number ) 
return varchar2
is
    l_str   varchar2(255) default NULL;
    l_num   number  default p_dec;
    l_hex   varchar2(16) default '0123456789ABCDEF';
begin
    if ( trunc(p_dec) <> p_dec OR p_dec < 0 ) then
        raise PROGRAM_ERROR;
    end if;
    loop
        l_str := substr( l_hex, mod(l_num,p_base)+1, 1 ) || l_str;
        l_num := trunc( l_num/p_base );
        exit when ( l_num = 0 );
    end loop;
    return l_str;
end to_base;

This function can be called to convert the string bitmap into a number which can then be used with bitand. An example of this would be:

select to_dec('101010', 2) from dual

Oracle only really provides BITAND (and BIT_TO_NUM which isn't really relevant here` as a way of doing logical operations but the operations required here are (A AND B), (A AND NOT B) and (NOT A AND B). So we need a var of converting A to NOT A. A simple way of doing this is to use translate.

So.... the final outcome is:

select 
  length(translate(to_base(bitand(data_A, data_B),2),'10','1')) as nb_1,
  length(translate(to_base(bitand(data_A, data_NOT_B),2),'10','1'))  as nb_2,
  length(translate(to_base(bitand(data_NOT_A, data_B),2),'10','1'))  as nb_3
from (
  select 
    to_dec(data_A,2) as data_A, 
    to_dec(data_b,2) as data_B,
    to_dec(translate(data_A, '01', '10'),2) as data_NOT_A, 
    to_dec(translate(data_B, '01', '10'),2) as data_NOT_B
  from (
    select '101010' as data_A, '011001' as data_B from dual
  )
)

This is somewhat more complicated than I was hoping when I started writing this answer but it does seem to work.

幻想少年梦 2024-12-11 16:34:02

可以通过这样的事情非常简单地完成:

SELECT utl_raw.BIT_AND( t.A, t.B )                                                 SET_IN_A_AND_B,
       length(replace(utl_raw.BIT_AND( t.A, t.B ), '0', ''))                       SET_IN_A_AND_B_COUNT,
       utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) )                         ONLY_SET_IN_A,
       length(replace(utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) ),'0','')) ONLY_SET_IN_A_COUNT,
       utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) )                         ONLY_SET_IN_A,
       length(replace(utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) ),'0','')) ONLY_SET_IN_A_COUNT       
  FROM (SELECT '1111100000111110101010' A, '1101011010101010100101' B FROM dual) t

确保您的位字符串具有偶数长度(如果它具有奇数长度,只需用零填充它)。

Can be done pretty simply with something like this:

SELECT utl_raw.BIT_AND( t.A, t.B )                                                 SET_IN_A_AND_B,
       length(replace(utl_raw.BIT_AND( t.A, t.B ), '0', ''))                       SET_IN_A_AND_B_COUNT,
       utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) )                         ONLY_SET_IN_A,
       length(replace(utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) ),'0','')) ONLY_SET_IN_A_COUNT,
       utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) )                         ONLY_SET_IN_A,
       length(replace(utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) ),'0','')) ONLY_SET_IN_A_COUNT       
  FROM (SELECT '1111100000111110101010' A, '1101011010101010100101' B FROM dual) t

Make sure your bit string has an even length (just pad it with a zero if it has an odd length).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文