Wildcard string matching in Ruby

Posted on 2024-11-17 00:56:14

I'd like to write a utility function/module that provides simple wildcard/glob matching for strings. The reason I'm not using regular expressions directly is that the patterns will ultimately be supplied by users through some sort of configuration file. I could not find a stable gem for this; I tried joker, but had problems setting it up.

The functionality I'm looking for is simple. For example, given the following patterns, here are the matches:

pattern | test-string         | match
========|=====================|====================
*hn     | john, johnny, hanna | true , false, false     # wildcard  , similar to /hn$/i
*hn*    | john, johnny, hanna | true , true , false     # like /hn/i
hn      | john, johnny, hanna | false, false, false     # /^hn$/i
*h*n*   | john, johnny, hanna | true , true , true
etc...

I'd like this to be as efficient as possible. I thought about creating regexes from the pattern strings, but that seemed rather inefficient to do at runtime. Any suggestions on this implementation? Thanks.

EDIT: I'm using Ruby 1.8.7

Comments (2)

溺深海 2024-11-24 00:56:14

I don't see why you think it would be inefficient. Predictions about these sorts of things are notoriously unreliable; you should establish that it is actually too slow before you go bending over backwards to find a faster way. Then you should profile it to make sure that this is where the problem lies (by the way, switching to 1.9 gives an average 3-4x speed boost).

Anyway, it should be pretty easy to do this, something like:

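class Globber
  # Convert a glob into an anchored, case-insensitive Regexp; only '*' is special.
  def self.parse_to_regex(str)
    # Regexp.escape turns '*' into '\*'; swap that back for a lazy wildcard.
    escaped = Regexp.escape(str).gsub('\*','.*?')
    # \A and \z anchor to the whole string (^ and $ would also match at line breaks).
    Regexp.new "\\A#{escaped}\\z", Regexp::IGNORECASE
  end

  def initialize(str)
    @regex = self.class.parse_to_regex str
  end

  # True if the whole string matches the glob.
  def =~(str)
    !!(str =~ @regex)
  end
end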


glob_strs = {
  '*hn'    => [['john', true, ], ['johnny', false,], ['hanna', false]],
  '*hn*'   => [['john', true, ], ['johnny', true, ], ['hanna', false]],
  'hn'     => [['john', false,], ['johnny', false,], ['hanna', false]],
  '*h*n*'  => [['john', true, ], ['johnny', true, ], ['hanna', true ]],
}

puts glob_strs.all? { |to_glob, examples|
  examples.all? do |to_match, expectation|
    result = Globber.new(to_glob) =~ to_match
    result == expectation
  end
}
# >> true
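
Whether compiling the regex is a real cost can be measured rather than guessed. The sketch below is one way to do that with the standard `Benchmark` module; it is not part of the original answer. `glob_to_regexp` and `PatternCache` are hypothetical names introduced here, and the iteration count is arbitrary. It compares rebuilding the regex on every match with compiling each pattern once and reusing it.

```ruby
require 'benchmark'

# Hypothetical helper: same conversion idea as Globber.parse_to_regex above.
def glob_to_regexp(glob)
  escaped = Regexp.escape(glob).gsub('\*', '.*?')
  Regexp.new("\\A#{escaped}\\z", Regexp::IGNORECASE)
end

# Hypothetical cache: compiles each glob once and reuses the Regexp afterwards.
class PatternCache
  def initialize
    @regexes = {}
  end

  def match?(glob, str)
    regex = (@regexes[glob] ||= glob_to_regexp(glob))
    !!(str =~ regex)
  end
end

globs = ['*hn', '*hn*', 'hn', '*h*n*']
names = ['john', 'johnny', 'hanna']
cache = PatternCache.new
n = 10_000 # arbitrary iteration count

Benchmark.bm(18) do |bm|
  bm.report('compile each time') do
    n.times { globs.each { |g| names.each { |s| s =~ glob_to_regexp(g) } } }
  end
  bm.report('compile once') do
    n.times { globs.each { |g| names.each { |s| cache.match?(g, s) } } }
  end
end
```

If the patterns come from a configuration file, they can be compiled once when the file is loaded, so the per-match cost is only the regex match itself. The timings will vary by machine and Ruby version, so the numbers are worth collecting on the target setup.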

木森分化 2024-11-24 00:56:14

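def create_regex(pattern)
  # Anchor the start unless the pattern begins with a wildcard.
  prefix = pattern[0,1] == '*' ? '' : '\A'
  # Anchor the end unless the pattern ends with a wildcard.
  suffix = pattern[-1,1] == '*' ? '' : '\z'
  # Escape regex metacharacters, then turn each escaped '*' into a lazy wildcard.
  body = Regexp.escape(pattern).gsub('\*', '.*?')
  return Regexp.new(prefix + body + suffix, Regexp::IGNORECASE)
end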

This method should return your regexp

PS: it is not tested :D
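
Since the method above is untested, one way to gain confidence in any pattern-to-regex conversion is to run it against the table from the question. The sketch below does that for a hypothetical `wildcard_to_regexp` helper (not the answer's code): it splits the pattern on `*`, escapes each literal piece, and joins the pieces with `.*`, anchoring both ends so that only an explicit `*` allows extra characters.

```ruby
# Hypothetical helper, not the answer's code: build an anchored,
# case-insensitive Regexp from a glob in which only '*' is special.
def wildcard_to_regexp(glob)
  # The -1 limit keeps trailing empty pieces, so 'hn*' keeps its final wildcard.
  pieces = glob.split('*', -1).map { |piece| Regexp.escape(piece) }
  Regexp.new('\A' + pieces.join('.*') + '\z', Regexp::IGNORECASE)
end

# Expected results copied from the table in the question.
EXPECTED = {
  '*hn'   => { 'john' => true,  'johnny' => false, 'hanna' => false },
  '*hn*'  => { 'john' => true,  'johnny' => true,  'hanna' => false },
  'hn'    => { 'john' => false, 'johnny' => false, 'hanna' => false },
  '*h*n*' => { 'john' => true,  'johnny' => true,  'hanna' => true  }
}

# Collect every (pattern, string) pair whose result differs from the table.
failures = []
EXPECTED.each do |glob, cases|
  regex = wildcard_to_regexp(glob)
  cases.each do |str, expected|
    actual = !!(str =~ regex)
    failures << [glob, str, actual] unless actual == expected
  end
end

puts failures.empty? ? 'all cases pass' : failures.inspect
```

Any other conversion method can be swapped in for `wildcard_to_regexp` to check it against the same table.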
