Regex to find valid sphinx fields
I'm trying to validate that the fields given to sphinx are valid, but I'm having difficulty.
Imagine that valid fields are cat, mouse, dog, puppy.
Valid searches would then be:
- @cat search terms
- @(cat) search terms
- @(cat, dog) search term
- @cat searchterm1 @dog searchterm2
- @(cat, dog) searchterm1 @mouse searchterm2
So, I want to use a regular expression to find terms such as cat, dog, mouse in the above examples, and check them against a list of valid terms.
Thus, a query such as:
@(goat)
Would produce an error because goat is not a valid term.
I've gotten far enough that I can find simple queries such as @cat with this regex: (?:@)([^( ]*)
But I can't figure out how to find the rest.
I'm using python & django, for what that's worth.
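For reference, the partial regex from the question behaves like this: it captures the bare form but returns an empty string for the parenthesized form, which is exactly the gap being asked about.

```python
import re

# the regex from the question: '@' followed by anything up to a '(' or space
simple = re.compile(r"(?:@)([^( ]*)")

simple.findall("@cat searchterm1 @dog searchterm2")  # ['cat', 'dog']
simple.findall("@(cat, dog) searchterm")             # [''] -- the grouped form slips through
```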
6 Answers
To match all allowed fields, the following rather fearsome-looking regex works:
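The regex itself was lost in formatting; a plausible reconstruction, assuming the field alternation cat|mouse|dog|puppy, is one branch for a bare field and one for a parenthesized group:

```python
import re

fields = "cat|mouse|dog|puppy"
# one alternation for a bare field, one for a parenthesized group of fields
pattern = re.compile(r"@(?:(?:%s)\b|\((?:%s)(?:\s*,\s*(?:%s))*\))" % (fields, fields, fields))

queries = [
    "@cat search terms",
    "@(cat) search terms",
    "@(cat, dog) search term",
    "@cat searchterm1 @dog searchterm2",
    "@(cat, dog) searchterm1 @mouse searchterm2",
]
matches = [m for q in queries for m in pattern.findall(q)]
# ['@cat', '@(cat)', '@(cat, dog)', '@cat', '@dog', '@(cat, dog)', '@mouse']
```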
It returns these matches, in order: @cat, @(cat), @(cat, dog), @cat, @dog, @(cat, dog), @mouse.
The regex breaks down as follows:
Now to identify any invalid search, you would wrap all that in a negative look-ahead:
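The lookahead version was also lost in formatting; a reconstruction under the same assumed field list wraps the two valid branches in (?! … ), so the regex matches an @ only when neither valid form follows it:

```python
import re

fields = "cat|mouse|dog|puppy"
# '@' NOT followed by a valid bare field or a valid parenthesized group
invalid_at = re.compile(
    r"@(?!(?:%s)\b|\((?:%s)(?:\s*,\s*(?:%s))*\))" % (fields, fields, fields)
)

invalid_at.search("@cat terms")     # None -- valid query
invalid_at.search("@(goat) terms")  # matches the offending '@'
invalid_at.search("@caterpillar")   # also flagged: \b rejects the partial match
```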
This would identify any @ character after which an invalid search term (or term combination) was attempted. Modifying it so that it also matches the invalid attempt, instead of just pointing at it, is not hard anymore. You would have to build (?:cat|mouse|dog|puppy) dynamically from your fields and plug it into the static rest of the regex. That should not be too hard to do either.
This pyparsing solution follows a similar logic path as your posted answer. All tags are matched, and then checked against the list of known valid tags, removing them from the reported results. Only those matches that have values left over after removing the valid ones are reported as matches.
With these lovely results:
This last snippet will do all the scanning for you, and just give you the list of found invalid tags:
Prints:
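The snippets referred to above did not survive formatting; a minimal sketch of the described approach, assuming pyparsing and the same valid-tag set, matches every tag expression and keeps only the names outside the valid list:

```python
from pyparsing import Word, alphas, Suppress, Group, delimitedList

VALID_TAGS = {"cat", "mouse", "dog", "puppy"}

tag_name = Word(alphas)
# matches '@name' or '@(name, name, ...)'
tag_expr = Suppress("@") + Group(
    tag_name | Suppress("(") + delimitedList(tag_name) + Suppress(")")
)

def invalid_tags(query):
    """Scan the query and return every tag not in the valid set."""
    bad = []
    for tokens, _, _ in tag_expr.scanString(query):
        bad.extend(t for t in tokens[0] if t not in VALID_TAGS)
    return bad

invalid_tags("@cat a @(goat, dog) b")  # ['goat']
```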
This should work:
It will either match a single @parameter or a parenthesized @(par1, par2) list containing only allowed words (one or more). It also makes sure that no partial matches are accepted (@caterpillar).
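The pattern itself is missing from the scraped answer; a reconstruction matching this description, with \b to reject partial matches like @caterpillar, could look like:

```python
import re

words = "cat|mouse|dog|puppy"
# bare '@word' (with word boundary) or '@(word, word, ...)' of allowed words only
field_re = re.compile(r"@(?:(?:%s)\b|\((?:%s)(?:\s*,\s*(?:%s))*\))" % (words, words, words))

field_re.findall("@cat terms")          # ['@cat']
field_re.findall("@(cat, dog) terms")   # ['@(cat, dog)']
field_re.findall("@caterpillar terms")  # [] -- partial match rejected
```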
Try this:
A single field name (like cat in @cat) will be captured in group #1, while the names in a parenthesized list like @(cat, dog) will be stored in group #2. In the latter case you'll need to break the list down with split() or something; there's no way to capture the names individually with a Python regex.
This will match all fields that are cat, dog, mouse, or puppy and combinations thereof.
I ended up doing this a different way, since none of the above worked. First I found the fields like @cat, with this:
Next, I found the more complicated ones, with this:
Next, I checked whether the attributes I found were valid, and added them (uniquely) to an array:
Thanks all for the help though. I'm very glad to have had it!
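The actual snippets from this answer did not survive formatting; the described steps could be sketched roughly as follows (the two regexes, the VALID_FIELDS set, and the error handling are reconstructions, not the original code):

```python
import re

VALID_FIELDS = {"cat", "mouse", "dog", "puppy"}

def collect_fields(query):
    # step 1: simple fields like @cat
    names = re.findall(r"@(\w+)", query)
    # step 2: more complicated grouped fields like @(cat, dog)
    for grp in re.findall(r"@\(([^)]+)\)", query):
        names.extend(n.strip() for n in grp.split(","))
    # step 3: validate and collect uniquely
    attributes = []
    for name in names:
        if name not in VALID_FIELDS:
            raise ValueError("invalid field: %s" % name)
        if name not in attributes:
            attributes.append(name)
    return attributes

collect_fields("@cat a @(cat, dog) b")  # ['cat', 'dog']
```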