Implementing a search filter in Java
We need to implement a search filter (like Net-log's) for user profiles on my social networking site. The profile filters include age range, gender and interests.
We have approximately 1M profiles in MySQL. MySQL doesn't seem to be the right option for implementing such filters, so we are also looking at Cassandra.
So what is the best way to implement such a filter? The results need to come back very quickly.
e.g. age = 18 - 24 and gender = male and interest = Football
Age is stored as a Date; Gender and Interests are varchar.
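Because age is stored as a Date (assumed here to be the date of birth), an age range such as 18-24 can be turned into a birth-date range. The database can then compare the raw column, and use an index on it, instead of computing every user's age. A minimal sketch using java.time; the class, record and method names are hypothetical. The bounds are inclusive, so they map directly onto SQL BETWEEN.

```java
import java.time.LocalDate;

public class AgeRange {
    // Inclusive birth-date bounds for an age range (hypothetical helper type).
    record BirthDateRange(LocalDate earliest, LocalDate latest) {}

    static BirthDateRange forAges(int minAge, int maxAge, LocalDate today) {
        if (minAge < 0 || maxAge < minAge) {
            throw new IllegalArgumentException("invalid age range");
        }
        // Youngest allowed: turned minAge on or before today.
        LocalDate latest = today.minusYears(minAge);
        // Oldest allowed: has not yet turned maxAge + 1, i.e. born after today - (maxAge + 1) years.
        LocalDate earliest = today.minusYears(maxAge + 1L).plusDays(1);
        return new BirthDateRange(earliest, latest);
    }

    public static void main(String[] args) {
        BirthDateRange r = forAges(18, 24, LocalDate.of(2024, 6, 15));
        System.out.println(r.earliest() + " .. " + r.latest()); // 1999-06-16 .. 2006-06-15
    }
}
```

In the query this becomes something like birth_date BETWEEN earliest AND latest, where birth_date is an assumed column name.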
EDIT:
Let me rephrase the problem: how can I get the fastest results for any type of search?
It could be based on the profile name or any other profile field, across 1M profile records.
Thanks
Answers (2)
It would serve your project well to make an underlying SQL change. You might want to consider changing the Interest column from a free-input field (varchar) to a tag (for example, a many-to-many relationship through an additional table).
You used the example of "Football" with a LIKE operator on it. If you change it to a tag, you will face an initial structural problem of deciding where each interest belongs, but once you have done so, the tags will help your SELECT statements run much faster.
Without this change, you will be pushing your data management problem from a database (which is equipped to handle it) to Java (which might not be).
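A minimal sketch of the tag approach with plain JDBC, assuming a hypothetical schema (all table, column and index names are illustrative, not from the question): profile(id, name, birth_date, gender), interest(id, name) and a profile_interest join table. The interest filter becomes an exact match on a joined row instead of a LIKE scan over a varchar column, and the age filter is passed in as birth-date bounds.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class ProfileSearch {
    // Hypothetical schema (assumed, not from the question):
    //   profile(id, name, birth_date DATE, gender VARCHAR)   with INDEX (gender, birth_date)
    //   interest(id, name VARCHAR UNIQUE)
    //   profile_interest(profile_id, interest_id)            with PRIMARY KEY (interest_id, profile_id)
    static final String SEARCH_SQL =
          "SELECT p.id FROM profile p "
        + "JOIN profile_interest pi ON pi.profile_id = p.id "
        + "JOIN interest i ON i.id = pi.interest_id "
        + "WHERE i.name = ? AND p.gender = ? AND p.birth_date BETWEEN ? AND ?";

    // Exact tag match instead of LIKE '%Football%' on a free-text varchar column.
    static List<Long> search(Connection conn, String interest, String gender,
                             LocalDate earliestBirth, LocalDate latestBirth) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, interest);
            ps.setString(2, gender);
            ps.setDate(3, Date.valueOf(earliestBirth));
            ps.setDate(4, Date.valueOf(latestBirth));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // No database here; just show the statement that would be prepared.
        System.out.println(SEARCH_SQL);
    }
}
```

The suggested indexes in the comment are one plausible choice; check them against EXPLAIN output on the real data before relying on them.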
It may make sense to try to optimize your query (there are probably at least some things you can do). It sounds like you have a large database, and if you are returning a large result set and filtering the results in Java, you may run into performance problems because all of that data is held in memory.
If this is the case, one thing you could try is caching the results outside the database and reading from that cache. Hibernate does this very well, but you could implement your own version if needed. If you are interested in this, Memcached is a good starting place.
I just noticed this for MySQL. I do not know how efficient it is, but MySQL has some built-in full-text search functions that may help speed things up.
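A minimal in-process stand-in for such a cache, keyed on the search filter. This is not Memcached or Hibernate's cache; it is a hypothetical LRU map with a time-to-live, only meant to show the read-through pattern: check the cache, otherwise run the query and store the result. In a multi-server deployment a shared cache such as Memcached would take the place of the map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SearchCache {
    // Hypothetical filter key; a record supplies equals/hashCode, so equal filters hit the same entry.
    record Filter(int minAge, int maxAge, String gender, String interest) {}

    private record CachedResult(List<Long> ids, long expiresAtMillis) {}

    private final int maxEntries;
    private final long ttlMillis;
    private final Map<Filter, CachedResult> entries;

    SearchCache(int maxEntries, long ttlMillis) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        // accessOrder = true keeps entries in LRU order; removeEldestEntry caps the size.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Filter, CachedResult> eldest) {
                return size() > SearchCache.this.maxEntries;
            }
        };
    }

    // Returns fresh cached IDs if present; otherwise runs the query and caches its result.
    synchronized List<Long> get(Filter filter, Function<Filter, List<Long>> query) {
        long now = System.currentTimeMillis();
        CachedResult cached = entries.get(filter);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.ids();
        }
        List<Long> ids = List.copyOf(query.apply(filter));
        entries.put(filter, new CachedResult(ids, now + ttlMillis));
        return ids;
    }
}
```

The time-to-live bounds how stale a result can get after profiles change; choosing it, or invalidating on writes, is a trade-off the sketch leaves open.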
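For searches on the profile name or other free-text fields, a sketch of MySQL's MATCH ... AGAINST full-text search called from Java. It assumes a FULLTEXT index on hypothetical name and about columns of a profile table; in MySQL the column list in MATCH() must match the column list of a FULLTEXT index.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class FullTextSearch {
    // Assumed one-time DDL (column and index names are hypothetical):
    //   ALTER TABLE profile ADD FULLTEXT INDEX ft_profile_text (name, about);
    static final String FULLTEXT_SQL =
          "SELECT id FROM profile "
        + "WHERE MATCH(name, about) AGAINST (? IN NATURAL LANGUAGE MODE) "
        + "LIMIT ?";

    // Runs a natural-language full-text search and returns at most `limit` profile IDs.
    static List<Long> search(Connection conn, String terms, int limit) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FULLTEXT_SQL)) {
            ps.setString(1, terms);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // No database here; just show the statement that would be prepared.
        System.out.println(FULLTEXT_SQL);
    }
}
```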