Real-time web search (in .NET)

Posted on 2024-08-09 21:54:35

How would you go about creating a "real-time" search engine on the .NET platform? Near-real-time web search is very popular nowadays, and I was hoping you could help me brainstorm some ideas. I might try to build a prototype eventually, but mostly this is just a mental exercise.

The requirements are:

  1. .NET platform, IIS, MS SQL Server or Lucene.Net (file system)
  2. The input data to be indexed are only keywords plus some meta information; no further processing is required
  3. Data are grouped by keyword and ordered by the number of occurrences of each keyword
  4. No historic data are kept (data older than some fixed amount of time are discarded or moved to some other data store)

Not knowing much about the subject matter, this is what I've come up with so far:

Data are fed to the system through a web service. Since the data are already in the form of keywords, no further processing is performed. The web service saves the data to the database. A SELECT query is run at fixed time intervals to return data (for example, we query the incoming data for the past hour and run the query every second). Grouping and sorting are performed in memory to offload SQL Server. Old data in the database are discarded every couple of minutes.
I'm not sure how SQL Server would handle that if many new rows were being added constantly.
The grouped and sorted data are then displayed.
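
To make the "group and sort in memory" step concrete, here is a minimal sketch. It is written in Python purely for brevity (the same shape maps onto a .NET dictionary plus a sort), and it assumes the periodic SELECT returns rows as (keyword, received_at) pairs; the function name rank_keywords and that row shape are illustrative, not part of the original design.

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_keywords(rows, now, window=timedelta(hours=1)):
    """Group rows by keyword and order them by occurrence count.

    rows: iterable of (keyword, received_at) tuples, an assumed shape for
    what the periodic SELECT would hand back.
    """
    cutoff = now - window
    # Count only rows that fall inside the time window.
    counts = Counter(kw for kw, received_at in rows if received_at >= cutoff)
    # most_common() returns (keyword, count) pairs, highest count first.
    return counts.most_common()


now = datetime(2024, 8, 9, 12, 0, 0)
rows = [
    ("dotnet", now - timedelta(minutes=5)),
    ("lucene", now - timedelta(minutes=10)),
    ("dotnet", now - timedelta(minutes=30)),
    ("iis", now - timedelta(hours=2)),  # older than the window, ignored
]
ranked = rank_keywords(rows, now)
print(ranked)  # [('dotnet', 2), ('lucene', 1)]
```

Note that re-reading a full hour of rows every second repeats almost all of the work each time, which is probably where most of the load in this design would come from.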

I'm sure you guys have more experience and better ideas for this kind of thing.

Regards,

Ondrej

Comments (2)

萌梦深 2024-08-16 21:54:35

From your description of the system, a bare-bones database schema might look like the following:

keyword
- id (primary key)
- keyword (unique)

input
- id (primary key)
- data (text)

input_keyword
- id (primary key)
- input_id (foreign key)
- keyword_id (foreign key)
- count (integer; the number of times keyword with id keyword_id appears in input with id input_id)
- expiration_date (timestamp; at regular intervals, all entries that have expired need to be deleted)

Data operations would be as follows:

  1. Writes: Whenever an input operation is performed, your database engine will have to handle a write operation that touches all three tables.
  2. Reads: Whenever a search operation is performed, your database engine will need to handle read operations across all three tables.
  3. Deletes: At regular intervals, you'll need to remove expired entries from input_keyword and, if desired, from the keyword table.
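
The schema and the three operations above could be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MS SQL Server, stores expiration_date as plain integer seconds, and the helper names (add_input, top_keywords, purge_expired) are made up for the example. The read shown here joins only keyword and input_keyword; a search that also returns the input text would join the input table as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE NOT NULL);
CREATE TABLE input (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE input_keyword (
    id INTEGER PRIMARY KEY,
    input_id INTEGER NOT NULL REFERENCES input(id),
    keyword_id INTEGER NOT NULL REFERENCES keyword(id),
    "count" INTEGER NOT NULL,
    expiration_date INTEGER NOT NULL  -- assumed: seconds since some epoch
);
""")


def add_input(conn, data, keyword_counts, expires_at):
    """Write: one input row plus one input_keyword row per keyword."""
    input_id = conn.execute(
        "INSERT INTO input (data) VALUES (?)", (data,)).lastrowid
    for kw, n in keyword_counts.items():
        conn.execute("INSERT OR IGNORE INTO keyword (keyword) VALUES (?)", (kw,))
        keyword_id = conn.execute(
            "SELECT id FROM keyword WHERE keyword = ?", (kw,)).fetchone()[0]
        conn.execute(
            'INSERT INTO input_keyword (input_id, keyword_id, "count", expiration_date) '
            "VALUES (?, ?, ?, ?)", (input_id, keyword_id, n, expires_at))
    conn.commit()


def top_keywords(conn, now):
    """Read: unexpired keywords ordered by total occurrences."""
    return conn.execute(
        'SELECT k.keyword, SUM(ik."count") AS total '
        "FROM input_keyword ik JOIN keyword k ON k.id = ik.keyword_id "
        "WHERE ik.expiration_date > ? "
        "GROUP BY k.keyword ORDER BY total DESC, k.keyword", (now,)).fetchall()


def purge_expired(conn, now):
    """Delete: expired input_keyword rows, then keywords left unreferenced."""
    conn.execute("DELETE FROM input_keyword WHERE expiration_date <= ?", (now,))
    conn.execute(
        "DELETE FROM keyword WHERE id NOT IN (SELECT keyword_id FROM input_keyword)")
    conn.commit()


add_input(conn, "first", {"dotnet": 2, "iis": 1}, expires_at=100)
add_input(conn, "second", {"dotnet": 1}, expires_at=200)
before = top_keywords(conn, now=50)   # [('dotnet', 3), ('iis', 1)]
purge_expired(conn, now=150)
after = top_keywords(conn, now=150)   # [('dotnet', 1)]
```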

On a highly trafficked system, your database will be hit quite often. Since you are really only using the database for the convenience of performing SELECT operations across these tables, and since the data is very short-lived, you might be better off just using an in-memory data structure to replace the "keyword" and "input_keyword" tables to eliminate hits to disk. This may require more complex application code, but it may be worth it on a busy system.
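
One possible shape for such an in-memory structure is a sliding-window counter: a queue of timestamped events plus a running count per keyword. The sketch below is an assumption about how it might look, not something taken from the answer; it is in Python for brevity, the class name SlidingKeywordCounter is hypothetical, timestamps are plain seconds assumed to arrive in non-decreasing order, and there is no locking, which a real multi-threaded web service would need.

```python
from collections import Counter, deque


class SlidingKeywordCounter:
    """Keeps keyword counts for the last `window_seconds` only."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, keyword), oldest first
        self._counts = Counter()  # keyword -> occurrences inside the window

    def add(self, keyword, timestamp):
        # Assumes timestamps are added in non-decreasing order.
        self._events.append((timestamp, keyword))
        self._counts[keyword] += 1

    def _evict(self, now):
        # Drop events that have aged out of the window and undo their counts.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, keyword = self._events.popleft()
            self._counts[keyword] -= 1
            if self._counts[keyword] == 0:
                del self._counts[keyword]

    def top(self, now, n=10):
        """Return up to n (keyword, count) pairs, highest count first."""
        self._evict(now)
        return self._counts.most_common(n)


counter = SlidingKeywordCounter(window_seconds=3600)
counter.add("dotnet", 0)
counter.add("lucene", 10)
counter.add("dotnet", 20)
early = counter.top(now=30)    # [('dotnet', 2), ('lucene', 1)]
late = counter.top(now=3615)   # [('dotnet', 1)]; the first two events expired
```

With this layout each event is counted once on arrival and uncounted once on expiry, so a query does not have to rescan the whole window. The trade-off is that the counts live only in process memory and are lost on restart, which may be acceptable given that no historic data need to be kept.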

も星光 2024-08-16 21:54:35

This site is not really for brainstorming, or to help you design applications.

You may want to post this on http://answers.onstartups.com/ to see what requirements and suggestions people have for this idea, and whether there is any business sense in a real-time web search.

But, you would need to determine how you can go faster than Google.
