Converting a Google search query to a PostgreSQL "tsquery"
How can I convert a Google search query to something I can feed to PostgreSQL's to_tsquery()?
If there's no existing library out there, how should I go about parsing a Google search query in a language like PHP?
For example, I'd like to take the following Google-ish search query:
("used cars" OR "new cars") -ford -mitsubishi
And turn it into a to_tsquery()-friendly string:
('used cars' | 'new cars') & !ford & !mitsubishi
I can fudge this with regexes, but that's the best I can do. Is there some robust lexical analysis method of going about this? I'd like to be able to support extended search operators too (like Google's site: and intitle:) that will apply to different database fields, and thus would need to be separated from the tsquery string.
UPDATE: I realize that with special operators this becomes a Google to SQL WHERE-clause conversion, rather than a Google to tsquery conversion. But the WHERE clause may contain one or more tsqueries.
For example, the Google-style query:
((color:blue OR "4x4") OR style:coupe) -color:red used
Should produce an SQL WHERE-clause like this:
WHERE to_tsvector(description) @@ to_tsquery('used')
AND color <> 'red'
AND ( (color = 'blue' OR to_tsvector(description) @@ to_tsquery('4x4') )
OR style = 'coupe'
);
I'm not sure whether the above is possible with regexes alone.
Comments (1)
Honestly, I think regular expressions are the way to go with something like this. Just the same, this was a fun exercise. The code below is very much a prototype; in fact, you'll see that I didn't even implement the lexer itself, I just faked the output. I'd like to continue it, but I just don't have more spare time today.
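Since the lexer is only faked in the prototype, here is a rough idea of what a regex-driven one could look like. It is a Python sketch rather than PHP, and the token names (`LPAREN`, `FIELD`, and so on) and the `lex` function are made up for illustration. It assumes OR must be upper-case and that field operators look like `name:value` with no spaces.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # LPAREN, RPAREN, NOT, OR, PHRASE, FIELD or WORD
    value: str


# Alternatives are tried left to right, so OR and FIELD are tested before WORD.
TOKEN_RE = re.compile(r"""
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NOT>-)
  | (?P<OR>OR\b)
  | (?P<PHRASE>"[^"]*")
  | (?P<FIELD>\w+:\w+)
  | (?P<WORD>\w+)
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def lex(query: str) -> Iterator[Token]:
    """Yield tokens from a Google-style query, raising on unknown characters."""
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        text = match.group()
        # Phrases are stored without their surrounding double quotes.
        yield Token(kind, text.strip('"') if kind == "PHRASE" else text)


for token in lex('(color:blue OR "4x4") -color:red used'):
    print(token)
```

One known limitation of this sketch: a hyphen inside a word (for example `well-known`) is lexed as a NOT between two words, so a real lexer would need to look at the preceding character before treating `-` as an operator.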
Also, there's definitely a lot more work to be done here in terms of supporting other types of search operators and the like.
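As one possible direction for the extra operators from the question (`color:`, `style:`), field terms could be rendered as column comparisons and plain terms as full-text matches. The Python sketch below uses a hand-built nested-tuple tree in place of real parser output. `FIELD_COLUMNS`, `TEXT_COLUMN` and `render_where` are hypothetical names, the `%s` placeholders assume a psycopg-style driver, and `plainto_tsquery` is swapped in for `to_tsquery` so that raw user text does not have to be valid tsquery syntax.

```python
from typing import List

# Hypothetical whitelist mapping search operators to real column names.
FIELD_COLUMNS = {"color": "color", "style": "style"}
TEXT_COLUMN = "description"  # column searched by plain words and phrases


def render_where(node: tuple, params: List[str]) -> str:
    """Render a nested-tuple expression tree as a parameterised SQL condition."""
    kind = node[0]
    if kind == "text":                   # ("text", "used")
        params.append(node[1])
        return f"to_tsvector({TEXT_COLUMN}) @@ plainto_tsquery(%s)"
    if kind == "field":                  # ("field", "color", "blue")
        column = FIELD_COLUMNS[node[1]]  # KeyError for operators not in the whitelist
        params.append(node[2])
        return f"{column} = %s"
    if kind == "not":                    # ("not", child)
        return f"NOT ({render_where(node[1], params)})"
    if kind in ("and", "or"):            # ("and", child, child, ...)
        joiner = " AND " if kind == "and" else " OR "
        return "(" + joiner.join(render_where(c, params) for c in node[1:]) + ")"
    raise ValueError(f"unknown node kind {kind!r}")


# Hand-built tree for: ((color:blue OR "4x4") OR style:coupe) -color:red used
tree = ("and",
        ("or",
         ("or", ("field", "color", "blue"), ("text", "4x4")),
         ("field", "style", "coupe")),
        ("not", ("field", "color", "red")),
        ("text", "used"))

params: List[str] = []
sql = "WHERE " + render_where(tree, params)
print(sql)
print(params)
```

Column names come only from the whitelist and values travel as bound parameters, so nothing the user types is spliced into the SQL text itself.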
Basically, the idea is that a certain type of query is lexed, then parsed into a common format (in this case, a QueryExpression instance), which is then rendered back out as another type of query.
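To make the lex, parse, render pipeline concrete, here is a minimal Python sketch (Python only for brevity; the same shape should port to PHP). The `Term`, `Not`, `And` and `Or` classes are hypothetical stand-ins for a common format and are not the QueryExpression class mentioned above. It assumes OR binds more loosely than implicit AND, which may not match Google's actual precedence, and it quotes phrases the way the question's target string does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Term:          # a bare word or a quoted phrase
    text: str

@dataclass
class Not:           # "-x"
    operand: object

@dataclass
class And:           # terms separated by spaces
    operands: list

@dataclass
class Or:            # terms separated by OR
    operands: list


def tokenize(query: str) -> List[str]:
    # Quoted phrases, parentheses, minus signs and bare words; anything else is skipped.
    return re.findall(r'"[^"]*"|[()]|-|\w+', query)


class Parser:
    """Recursive-descent parser over the token list."""

    def __init__(self, tokens: List[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):                      # and_expr ("OR" and_expr)*
        items = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            items.append(self.parse_and())
        return items[0] if len(items) == 1 else Or(items)

    def parse_and(self):                     # unary+ (implicit AND)
        items = [self.parse_unary()]
        while self.peek() not in (None, "OR", ")"):
            items.append(self.parse_unary())
        return items[0] if len(items) == 1 else And(items)

    def parse_unary(self):                   # "-" unary | "(" or_expr ")" | term
        if self.peek() == "-":
            self.take()
            return Not(self.parse_unary())
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return Term(tok.strip('"'))


def render(node, nested: bool = False) -> str:
    """Render the tree as a to_tsquery()-style string."""
    if isinstance(node, Term):
        if re.fullmatch(r"\w+", node.text):
            return node.text
        return "'" + node.text.replace("'", "''") + "'"  # quote phrases
    if isinstance(node, Not):
        return "!" + render(node.operand, nested=True)
    joiner = " & " if isinstance(node, And) else " | "
    body = joiner.join(render(child, nested=True) for child in node.operands)
    return f"({body})" if nested else body


query = '("used cars" OR "new cars") -ford -mitsubishi'
print(render(Parser(tokenize(query)).parse()))
# ('used cars' | 'new cars') & !ford & !mitsubishi
```

Because the tree is independent of the output syntax, a second renderer could walk the same nodes and emit a SQL WHERE clause instead of a tsquery string.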