Fast value lookup in AWK

Posted 2024-12-06 09:26:32

I need a fast way to match a value in AWK; I have 250k values to search.

I'm doing something like this:

    #list with 250k numbers instead of four
    number_list="9998532001 9998536052 9998543213 9998544904"

    if ( index(number_list,substr($5,9)) ) 
         {printf "Value: %s\n",$5;}

Any suggestions for a faster search?

阿楠 2024-12-13 09:26:33

Put all the numbers from number_list into an awk array as its keys (indices), and the `in` operator gives you a fast lookup:

    if (substr($5,9) in numbers)
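A minimal sketch of how that could look end to end, assuming the list is a space-separated string as in the question and that `substr($5,9)` yields a value in the same form as the list entries. The sample input lines, the `tmp` array, and the `result` shell variable are made up for illustration.

```shell
# Hypothetical sample input: field 5 carries the number from character 9 onward.
input='a b c d XXXXXXXX9998536052
a b c d XXXXXXXX1111111111
a b c d XXXXXXXX9998544904'

result=$(printf '%s\n' "$input" | awk '
BEGIN {
    # Four numbers here; the real list would hold 250k.
    number_list = "9998532001 9998536052 9998543213 9998544904"
    n = split(number_list, tmp, " ")
    # Turn each value into an array key so "in" can test membership.
    for (i = 1; i <= n; i++)
        numbers[tmp[i]] = 1
}
# Runs once per input line: exact-match lookup on the array keys.
substr($5, 9) in numbers { printf "Value: %s\n", $5 }
')
printf '%s\n' "$result"
```

Awk arrays are associative, so each `in` test is a keyed lookup rather than a scan of the whole list. It is also an exact match, whereas `index()` on one long string can match a substring that straddles two neighboring numbers.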

另类 2024-12-13 09:26:32

If the substring you are searching for is of a consistent length and position in the target string (say the last 6 digits), then you could preprocess the list into an array and you'd be good to go.

Preprocessing step (perhaps in the BEGIN block):

    n = split(number_list, a, " ")   # Split the input string into pieces
    for (num in a) {
        # Get the last six digits (start at length-5 so that six characters remain)
        key = substr(a[num], length(a[num]) - 5, 6)

        # Error processing (i.e. collision handling) should go here

        list[key] = a[num]
    }

Then when you need to do the lookup:

    i = list[substr($5,9)]   # i is now the full number for that key, or "" if there is none

This is only a win if you will do many lookups, because you still have to pay the cost of iterating through that whole list (twice, in fact) during the pre-processing.
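The two steps above can be put together roughly as follows. This is a sketch under assumptions: the key is the last six digits, `substr($5,9)` on the made-up sample lines yields exactly those six digits, and on a key collision the first entry is simply kept. The loop variable `j` and the `lookup` shell variable are illustrative names.

```shell
lookup=$(printf '%s\n' 'a b c d XXXXXXXX536052' 'a b c d XXXXXXXX000000' | awk '
BEGIN {
    number_list = "9998532001 9998536052 9998543213 9998544904"
    n = split(number_list, a, " ")
    for (j = 1; j <= n; j++) {
        key = substr(a[j], length(a[j]) - 5)    # last six digits
        if (!(key in list))                     # on a collision, keep the first entry
            list[key] = a[j]
    }
}
{
    key = substr($5, 9)
    # Test with "in" first: a bare list[key] reference would create an empty entry.
    if (key in list)
        printf "%s -> %s\n", $5, list[key]
    else
        printf "%s -> no match\n", $5
}
')
printf '%s\n' "$lookup"
```

Checking `key in list` before reading `list[key]` keeps the array from growing with an empty element for every miss, which matters when most input lines do not match.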

Note that exact matching against the whole number also counts as a substring of known length and position; just use key=a[num] (which looks funny and leads to several simplifications of the above code, but I'm sure you can figure it out).

If you are looking for any occurrence of substr($5,9) anywhere in any of the numbers, this won't work; you'll have to iterate through all n numbers every time.
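For that fallback case, a linear scan might look like the sketch below. The sample input, the `needle` variable, and the decision to stop at the first hit are assumptions for illustration; the cost is up to n `index()` calls per input line.

```shell
scan=$(printf '%s\n' 'a b c d XXXXXXXX8536' 'a b c d XXXXXXXX7777' | awk '
BEGIN {
    number_list = "9998532001 9998536052 9998543213 9998544904"
    n = split(number_list, a, " ")
}
{
    needle = substr($5, 9)
    # Check every number in turn; no array key can help with arbitrary positions.
    for (j = 1; j <= n; j++)
        if (index(a[j], needle)) {
            printf "%s found in %s\n", needle, a[j]
            break    # stop at the first number containing the needle
        }
}
')
printf '%s\n' "$scan"
```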
