AWK fast value search
I need a fast way to match a value in AWK; I have 250k values to search.
I'm doing something like this:
# list with 250k numbers instead of four
number_list = "9998532001 9998536052 9998543213 9998544904"
if (index(number_list, substr($5, 9)))
    { printf "Value: %s\n", $5 }
Any suggestions for a faster search?
Comments (2)
Put all the numbers from number_list into an awk array and you'll have a fast lookup.
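A minimal sketch of that array lookup, wrapped in a shell script so it can be run as-is. The sample input, the 8-character prefix in field 5, and the names wanted and tmp are assumptions for illustration, not part of the question. Unlike index(), which matches the substring anywhere in the list string, the array test only matches whole numbers.

```sh
#!/bin/sh
# Hypothetical input: field 5 is an 8-character prefix followed by the number.
input='a b c d PREFIX__9998536052
a b c d PREFIX__1111111111
a b c d PREFIX__9998544904'

result=$(printf '%s\n' "$input" | awk '
BEGIN {
    # The real list would hold 250k numbers instead of four.
    number_list = "9998532001 9998536052 9998543213 9998544904"
    # Split once, then index the numbers by value so that
    # membership becomes a single array lookup.
    count = split(number_list, tmp, " ")
    for (i = 1; i <= count; i++)
        wanted[tmp[i]] = 1
}
# substr($5, 9) is everything from the 9th character of field 5 onward.
substr($5, 9) in wanted { printf "Value: %s\n", $5 }
')

printf '%s\n' "$result"
```

With 250k values it may be more practical to read the numbers from a file than to keep them in one string, but the lookup itself stays the same.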
If the substring you are searching for has a consistent length and position in the target string (say, the last 6 digits), you can preprocess the list into an array and you'll be good to go.
The approach has two parts: a preprocessing step (perhaps in the BEGIN block), and then the lookup whenever you need it.
This is only a win if you will do many lookups, because you still pay the cost of iterating through the whole list (twice, in fact) during preprocessing.
Note that exact matching against the whole number qualifies as a substring of known length and position; just use key=a[num] (which looks funny and leads to several simplifications of the code, but I'm sure you can figure it out).
If you are looking for any occurrence of substr($5,9) in any of the numbers, this won't work; you'll have to iterate through n every time.
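A minimal sketch of the two steps, assuming 10-digit numbers keyed by their last 6 digits. The names a, num, key, and n follow the ones mentioned in this answer, but how they are used here is a guess, and the sample input and its 8-character prefix are made up for illustration.

```sh
#!/bin/sh
# Hypothetical input: field 5 is an 8-character prefix followed by 6 digits.
input='a b c d PREFIX__536052
a b c d PREFIX__000000
a b c d PREFIX__544904'

result=$(printf '%s\n' "$input" | awk '
BEGIN {
    number_list = "9998532001 9998536052 9998543213 9998544904"
    # Preprocessing: one pass to split the list, one pass to build n,
    # keyed by the last 6 digits of each 10-digit number.
    split(number_list, a, " ")
    for (num in a) {
        key = substr(a[num], 5)   # characters 5 through 10
        n[key] = a[num]
    }
}
{
    # Lookup: a single array probe per record instead of a scan of the list.
    k = substr($5, 9)
    if (k in n)
        printf "Value: %s (matches %s)\n", $5, n[k]
}
')

printf '%s\n' "$result"
```

Setting key = a[num] instead of taking a substring turns this into the exact-match case. If two numbers share the same last 6 digits, only one of them is kept in n.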