Fast dynamic autosuggest using jQuery and JSON flat files
I've been busting my head on this one, so I finally decided to put it on Stack Overflow, hoping someone could help me with it.
Let me sketch the problem:
- The most basic usage of an autosuggest feature would be to query the database on each keystroke.
The advantage of this method is that you always get the most recent results.
The disadvantages are that it can be slow and that hitting the database on every keystroke is basically overkill. Suppose you want to autosuggest against a country table: searching for something like 'america' would generate 7 SELECT queries on the database!
This method is therefore not an option.
- The second option (the one I am currently using) is to collect all the possible results and save them to a flat file in JSON format.
So basically what you have is a single file, e.g. countries.json, which contains all the countries. Instead of querying the database I just point to the countries JSON file, and each keystroke just searches within the loaded JSON data.
This method is fast and takes the load off the database. In the countries example it is a good solution, since the content is not very dynamic and there is not that much data, so the JSON flat file stays quite small and downloads fast.
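To make this concrete, here is a minimal jQuery sketch of the flat-file approach. The file name countries.json, the data shape (a plain array of names) and the element IDs are assumptions for illustration only:

```javascript
// Load the flat JSON file once, then filter it in memory on every keystroke.
// Assumes countries.json is an array of strings, e.g. ["America", "Argentina", ...],
// and that #country is the search input and #suggestions an empty <ul>.
var countries = [];

$.getJSON('countries.json', function (data) {
    countries = data; // cached in memory, no further requests needed
});

$('#country').on('keyup', function () {
    var term = $(this).val().toLowerCase();
    var matches = $.grep(countries, function (name) {
        return name.toLowerCase().indexOf(term) !== -1; // substring match
    }).slice(0, 8); // show at most 8 suggestions

    $('#suggestions').empty();
    $.each(matches, function (i, name) {
        $('#suggestions').append($('<li>').text(name));
    });
});
```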
But the problem arises when the JSON file contains dynamic data and the dataset is large.
What if you want to build an autosuggest search containing all movies and actors?
Putting all the results inside one JSON file would be ridiculous, since the file would probably be too large to download.
Let's take IMDB (www.imdb.com) as an example. They have an autosuggest feature on their website. Searching for a movie title or actor is lightning fast, and it searches within hundreds of thousands of records. Looking deeper into their method, I noticed the following:
If you use the Google Chrome debug window (or Firebug in Firefox):
Look inside the network tab for scripts. As you type, say, "star wars", you notice that each letter you type loads a separate JSON file.
First it's s.json, then st.json, sta.json, etc...
It stops at star_w.json
Apparently it always stops at the 6th letter.
Each JSON file it loads contains 8 results matching the typed letters. There seem to be 36 folders, one for each letter of the alphabet and each digit.
E.g. the link for star_w.json is http://sg.media-imdb.com/suggests/s/star_w.json
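Judging purely from the file names above, the mapping from the typed text to the static file URL could be reproduced with something like this; the normalisation rules (lowercase, underscores, 6-character cut-off) are my guess from the observed URLs, not IMDB's actual code:

```javascript
// Guess at the URL scheme observed above: lowercase the query, replace anything
// that is not a letter or digit with an underscore, keep at most the first 6
// characters, and use the first character as the folder name.
function suggestUrl(query) {
    var key = query.toLowerCase().replace(/[^a-z0-9]/g, '_').slice(0, 6);
    if (!key) return null; // nothing typed yet
    return 'http://sg.media-imdb.com/suggests/' + key.charAt(0) + '/' + key + '.json';
}

// suggestUrl('Star Wars')        -> ".../suggests/s/star_w.json"
// suggestUrl('star wars ep VII') -> same URL, everything past 6 characters is ignored
```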
My question is: how would you build a similar intelligent system? Do they have every possible combination of 6 letters as a JSON file? How would you generate the files?
Also notice that it does not only look at the start of the title to make a match.
So if you search for "wars" you will also find "star wars" in the search results.
Any help in figuring out how to build a similar system would be much appreciated.
I think the system is very neat and fast, and it could be used for many purposes!
EDIT
Ok, worked it out.
My solution is a combination of querying the DB and caching the results in flat json files.
Basically, when the user types e.g. "star wars", on each keystroke I first check whether a JSON file exists for the search string. If it exists, I just display the contents of the JSON file. If the file does not exist, or if it is older than a time limit you set (e.g. older than 12 hours), you query the database and echo the output as JSON.
At the same time you write the output to a JSON file. The next time the same search is made, it just gets its content from the flat file and not from the DB (until, of course, the TTL is reached).
I've limited the prefix to 6 letters, so if the search string is longer than 6 letters no new file is fetched; you just keep filtering within the search results of the last fetched file, which in the "star wars" example would be star_w.
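Here is roughly what that caching logic looks like, sketched as a Node HTTP handler purely for illustration (my actual backend is different); cacheDir, TTL_MS and fetchFromDb() are placeholders you would swap for your own paths and database query:

```javascript
// Serve suggestions from a flat JSON cache file when it is fresh enough,
// otherwise query the database, answer the request and refresh the cache.
// cacheDir, TTL_MS and fetchFromDb() are placeholders for illustration.
var fs = require('fs');
var http = require('http');

var cacheDir = 'cache/suggests';
var TTL_MS = 12 * 60 * 60 * 1000; // 12 hours

function fetchFromDb(prefix, callback) {
    // placeholder: run something like
    //   SELECT title FROM movies WHERE title LIKE 'prefix%' LIMIT 8
    // and call back with an array of titles
    callback([]);
}

http.createServer(function (req, res) {
    var query = decodeURIComponent(req.url.split('?q=')[1] || '');
    var key = query.toLowerCase().replace(/[^a-z0-9]/g, '_').slice(0, 6);

    res.setHeader('Content-Type', 'application/json');
    if (!key) { res.end('[]'); return; }

    var file = cacheDir + '/' + key + '.json';

    // serve the cached flat file if it exists and is younger than the TTL
    if (fs.existsSync(file) && Date.now() - fs.statSync(file).mtimeMs < TTL_MS) {
        res.end(fs.readFileSync(file));
        return;
    }

    // otherwise hit the database, answer the request and rewrite the cache file
    fetchFromDb(key, function (rows) {
        var json = JSON.stringify(rows);
        fs.mkdirSync(cacheDir, { recursive: true });
        fs.writeFileSync(file, json);
        res.end(json);
    });
}).listen(3000);
```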
With this method the collection of cached json results is automatically populated based on the user input.
You could also write a script that loops over your table and generates the JSON files...
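A rough sketch of such a generation script is below (Node, with a hard-coded title list standing in for the table, and a made-up folder layout capped at 8 results per file). It also indexes every word of the title, which is one way to get the "wars" matches "Star Wars" behaviour:

```javascript
// Pre-generate one JSON file per prefix (up to 6 characters), indexing both the
// whole normalised title (star_w) and every single word (wars), so matches are
// not limited to the start of the title. Movie list, folder layout and the
// 8-result cap are assumptions for illustration.
var fs = require('fs');

var movies = ['Star Wars', 'War of the Worlds', 'Star Trek']; // would come from your DB
var index = {}; // prefix -> list of matching titles

movies.forEach(function (title) {
    var normalised = title.toLowerCase().replace(/[^a-z0-9]+/g, '_');
    var tokens = [normalised].concat(normalised.split('_').filter(Boolean));
    tokens.forEach(function (token) {
        for (var len = 1; len <= Math.min(6, token.length); len++) {
            var key = token.slice(0, len);
            index[key] = index[key] || [];
            if (index[key].indexOf(title) === -1) index[key].push(title);
        }
    });
});

Object.keys(index).forEach(function (key) {
    var dir = 'suggests/' + key.charAt(0);
    fs.mkdirSync(dir, { recursive: true });
    // cap each file at 8 results, like the files observed above
    fs.writeFileSync(dir + '/' + key + '.json', JSON.stringify(index[key].slice(0, 8)));
});
```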
I know it's not the perfect solution, but this way at least you can take some load off the DB by caching results in flat files.
I would use some search engine software that creates indexes; it is better suited for these scenarios than what the RDBMS can provide.
Take a look at Boolware from Softbool: http://www.softbool.se/ (click the "In English" link)...
Horrible web page, great product.