AWK fast value search

Posted on 2024-12-06 09:26:32

I need a fast way to match a value in AWK; I have 250k values to search against.

I'm doing something like this:

    #list with 250k numbers instead of four
    number_list="9998532001 9998536052 9998543213 9998544904"

    if ( index(number_list,substr($5,9)) ) 
         {printf "Value: %s\n",$5;}

Any suggestions for a faster search?

Comments (2)

阿楠 2024-12-13 09:26:33

Put all the numbers from number_list into an awk array and you'll have a fast lookup.

    if (substr($5,9) in numbers)
        printf "Value: %s\n", $5
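A minimal sketch of how the array might be filled and queried. It assumes the numbers live one per line in a file; `numbers.txt` and `data.txt` are hypothetical names created here only for illustration, and the sample records are made up. The `NR == FNR` test is a common awk idiom for "still reading the first file".

```sh
#!/bin/sh
# Hypothetical sample files; the real list would hold the 250k numbers.
tmp=$(mktemp -d)
printf '%s\n' 9998532001 9998536052 9998543213 9998544904 > "$tmp/numbers.txt"
cat > "$tmp/data.txt" <<'EOF'
a b c d 123456789998536052
a b c d 123456780000000000
a b c d 876543219998544904
EOF

# First file: store each number as an array key.
# Second file: test membership with "in", a hash lookup rather than a scan.
out=$(awk '
    NR == FNR { numbers[$1]; next }
    substr($5, 9) in numbers { printf "Value: %s\n", $5 }
' "$tmp/numbers.txt" "$tmp/data.txt")

echo "$out"
rm -rf "$tmp"
```

Unlike `index()` on one long string, the `in` test only succeeds on a whole-key match, so a fragment that happens to straddle two neighbouring numbers in the list cannot produce a false hit.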

另类 2024-12-13 09:26:32

If the substring you are searching for is of a consistent length and position in the target string (say the last 6 digits), then you could preprocess the list into an array and you'd be good to go.

Preprocessing step (perhaps in the BEGIN block):

    n = split(number_list, a, " ")    # Split the input string into pieces
    for (num in a) {
        # Get the last six digits: start five characters before the end
        key = substr(a[num], length(a[num]) - 5)

        # Error processing (i.e. collision handling) should go here

        list[key] = a[num]
    }

Then when you need to do the lookup:

    i = list[substr($5,9)]   # i is now the full number associated with the key

This is only a win if you will do many lookups, because you still have to pay the cost of iterating through that whole list (twice, in fact) during the pre-processing.


Note that exact matching to the whole number qualifies as a substring of known length and position; just use key=a[num] (which looks funny and leads to several simplifications of the above code, but I'm sure you can figure it out).
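One way the simplified exact-match version might look, as a sketch rather than the only possible form. The list is passed in with `-v number_list=...` and the two input records are invented for the example; both are assumptions, not part of the original question.

```sh
#!/bin/sh
# Sample list as in the question; the real one would hold 250k numbers.
number_list="9998532001 9998536052 9998543213 9998544904"

out=$(printf '%s\n' \
    'a b c d 123456789998543213' \
    'a b c d 123456781111111111' |
awk -v number_list="$number_list" '
    BEGIN {
        n = split(number_list, a, " ")
        for (i = 1; i <= n; i++)
            list[a[i]] = a[i]       # the key is the whole number
    }
    {
        key = substr($5, 9)
        if (key in list)            # "in" does not create list[key]
            printf "Value: %s\n", $5
    }
')
echo "$out"
```

With the full 250k-number list, a single `-v` argument would probably exceed the operating system's argument-size limit on many systems, so reading the list from a file is likely the safer route there.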


If you are looking for any occurrence of substr($5,9) in any of the numbers, this won't work; you'll have to iterate through all n numbers every time.
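For that last case, the per-record loop could be sketched as follows. The sample records and the "matches" output format are made up for illustration; the cost is one `index()` call per list entry for every input record, which is the slow path the array lookup avoids.

```sh
#!/bin/sh
number_list="9998532001 9998536052 9998543213 9998544904"

out=$(printf '%s\n' 'a b c d 1234567836052' 'a b c d 1234567800000' |
awk -v number_list="$number_list" '
    BEGIN { n = split(number_list, a, " ") }
    {
        needle = substr($5, 9)
        for (i = 1; i <= n; i++)
            if (index(a[i], needle)) {      # needle occurs somewhere in a[i]
                printf "Value: %s matches %s\n", $5, a[i]
                break                       # stop at the first hit
            }
    }
')
echo "$out"
```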
