Using Sphinx search with an ORM

Posted 2024-12-22 06:56:34

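I'm looking into implementing Sphinx search on our website.

To me, integrating it through SphinxQL makes more sense than doing something awkward such as pulling in new libraries, because SphinxQL is fairly close to native SQL. However, I'm worried that we may end up reinventing the wheel just to be able to use Sphinx within our system.

To try to prevent that, I'd like to pull Sphinx into our ORM system.

Has anyone tried this before, or can anyone shed light on the problems we might run into doing this?

We're currently using a mix of Zend Framework and Propel.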

ζ澈沫 2024-12-29 06:56:34

Long-time Propel user here; we recently added Sphinx to our Zend Framework application.

Note on Propel and MVC

One thing I have noticed over the past few months of development is that I wish I had taken advantage of the abstraction Propel provides much earlier. As you probably know, Propel generates base classes for the ORM, plus empty classes that simply extend the base ones.

Currently a lot of our business logic lives in separate models, when exactly the same logic could be implemented as methods in the extended Propel classes.

You should take the same approach when implementing Sphinx search: abstract it behind your extended ORM classes.
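
A minimal sketch of that idea, assuming SphinxQL is reached through PDO's MySQL driver (searchd speaks the MySQL wire protocol on its SphinxQL listener). The class name `ArticleSearch`, the index name `articles` and both method names are hypothetical; in a Propel project the same methods would more likely live in the generated empty query class than in a standalone class like this one.

```php
<?php
/**
 * Hypothetical search helper. In a Propel project these methods would sit in
 * the empty, extended query class so that callers never touch SphinxQL.
 */
final class ArticleSearch
{
    /** @var PDO Connection to searchd's SphinxQL listener, not to MySQL. */
    private $sphinx;

    public function __construct(PDO $sphinx)
    {
        $this->sphinx = $sphinx;
    }

    /**
     * Build the SphinxQL statement and its parameters.
     * Kept separate from execution so it can be checked without a daemon.
     *
     * @return array Two elements: the SQL string and the list of bound values.
     */
    public static function buildQuery(string $index, string $terms, int $limit = 20): array
    {
        // Identifiers cannot be bound as parameters, so whitelist the index name.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $index)) {
            throw new InvalidArgumentException("Invalid index name: $index");
        }
        $limit = max(1, $limit);

        // LIMIT is inlined as an integer; only the search terms are bound.
        return ["SELECT id FROM $index WHERE MATCH(?) LIMIT $limit", [$terms]];
    }

    /**
     * Run the search and return matching document IDs in the order Sphinx
     * returned them.
     *
     * @return int[]
     */
    public function searchIds(string $terms, int $limit = 20): array
    {
        list($sql, $params) = self::buildQuery('articles', $terms, $limit);
        $stmt = $this->sphinx->prepare($sql);
        $stmt->execute($params);

        return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
    }
}
```

Usage would look something like `(new ArticleSearch(new PDO('mysql:host=127.0.0.1;port=9306')))->searchIds('propel orm')`, where the host and port are assumptions about your searchd configuration. The sketch relies on PDO's emulated prepares (the default for the MySQL driver), which send a plain-text query; whether your searchd version accepts server-side prepared statements is something to verify before turning emulation off.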

Notes on Sphinx

Create views to simplify the indexing: Sphinx does not play well with advanced subqueries and is easily confused by MySQL functions. Try to abstract the data you want to index so that the whole indexing query is as simple as SELECT id, field1, field2, field3 FROM MyView. This is particularly useful when you want to associate each Sphinx document with a user account or some other non-trivial foreign key relationship.
Sphinx document IDs must be unsigned integers: Probably a no-brainer in most cases, but you cannot use UUIDs or negative numbers as document IDs to work around odd database structures.
Avoid duplicate document IDs: Within each Sphinx index, every document ID must be unique. Say you want to make objects of type A searchable, and you want to find an object A by searching on tags, comments and geographical positions. The proper way to do this in Sphinx is to create an index for A holding all metadata about the object, plus separate indexes for comments, tags and geographical positions. Give each of those separate indexes a sql_attr_uint attribute that maps back to object A, then work out in your code which objects to retrieve.
Use a recent version of Sphinx: Sphinx is under rapid development, and distributions like Debian tend to ship a pretty outdated version in their repositories. If possible, and if you have time to verify stability, compile from source (Sphinx has few dependencies, so this is rarely a problem). Also note that the PHP library code has a fail-safe that prevents the client code from talking to a Sphinx search daemon that is too recent for it.
The extent of Sphinx: After you have performed your search, you still need to retrieve the relevant information from the database, since Sphinx only gives you the IDs of matching entries (plus any attributes stored in the index). In some situations it might be fine to use something like:
$a = AQuery::create()
    ->findPk(ID_FROM_SPHINX);
in a foreach loop. But in some cases relying on the ORM to build the list is too inefficient, especially if you only want to show a few columns, e.g. for search results. In that case you can use a custom, optimized SQL SELECT to fetch the information instead (this can be done within the Propel classes).
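
The notes on separate indexes and on fetching rows after a search can be combined into one small post-processing step. The sketch below assumes that matches from a secondary index (say, comments) arrive as rows carrying an `article_id` attribute that maps back to object A, and that the database rows were then fetched in a single query, for example with Propel's `findPks()` or a custom SELECT with an IN list. The function names, the array shapes and the `article_id` attribute name are illustrative assumptions, not part of Sphinx or Propel.

```php
<?php
/**
 * Collect the parent object IDs referenced by matches from a secondary
 * index, keeping first-seen order and dropping duplicates.
 *
 * @param array  $matches Rows returned by Sphinx, each containing $attr.
 * @param string $attr    Name of the attribute that maps back to object A.
 * @return int[]
 */
function collectParentIds(array $matches, string $attr = 'article_id'): array
{
    $seen = [];
    foreach ($matches as $match) {
        // Re-assigning an existing key keeps its original position.
        $seen[(int) $match[$attr]] = true;
    }
    return array_keys($seen);
}

/**
 * Re-order database rows to follow the ID order Sphinx returned.
 * IDs with no matching row (e.g. deleted since the last reindex) are skipped.
 *
 * @param int[] $rankedIds IDs in Sphinx's result order.
 * @param array $rows      Database rows; each must contain an 'id' key.
 * @return array
 */
function orderBySphinxRank(array $rankedIds, array $rows): array
{
    $byId = [];
    foreach ($rows as $row) {
        $byId[(int) $row['id']] = $row;
    }

    $ordered = [];
    foreach ($rankedIds as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Example: two comments on article 7 matched, plus one on article 3.
$ids = collectParentIds([
    ['id' => 101, 'article_id' => 7],
    ['id' => 102, 'article_id' => 3],
    ['id' => 103, 'article_id' => 7],
]);

// Rows as the database happened to return them (by primary key).
$rows = [
    ['id' => 3, 'title' => 'Propel tips'],
    ['id' => 7, 'title' => 'Sphinx setup'],
];
$ordered = orderBySphinxRank($ids, $rows);
```

Here `$ids` ends up as `[7, 3]`, and `$ordered` lists the article 7 row before the article 3 row. An IN query gives no ordering guarantee of its own, so re-sorting in PHP keeps the ranking Sphinx produced without issuing one query per ID.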