What is the right way to implement massive hierarchical, geographical search for news?

Posted on 2024-08-27 16:10:02

The company I work for is in the business of sending press releases. We want to make it possible for interested parties to search for press releases based on a number of criteria, the most important being location. For example, someone might search for all news sent to New York City, Massachusetts, or ZIP code 89134, sent from a governmental institution, under the topic of "traffic". Or whatever.

The problem is, we've sent, literally, hundreds of thousands of press releases. Searching is slow and complex. For example, a press release sent to Queens, NY should show up in the search I mentioned above even though it wasn't specifically sent to New York City, because Queens is part of New York City. We may also want to support "and", "or", negation, and text search in queries so that people can build complex searches. These searches also have to be fast enough to function as dynamic RSS feeds.
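To make the "and" / "or" / negation requirement concrete, here is a minimal sketch of one common way to think about it: an inverted index that maps each tag to the set of releases carrying it, with boolean operators expressed as set operations. The tag names, the sample releases, and the helper functions are all hypothetical, and this is an illustration of the idea rather than what our system does today.

```python
from collections import defaultdict

# Hypothetical sample data: release id -> tags describing it.
RELEASES = {
    1: {"topic:traffic", "sender:government", "loc:queens"},
    2: {"topic:traffic", "sender:corporate", "loc:boston"},
    3: {"topic:weather", "sender:government", "loc:queens"},
}


def build_index(releases):
    """Map each tag to the set of release ids that carry it."""
    index = defaultdict(set)
    for release_id, tags in releases.items():
        for tag in tags:
            index[tag].add(release_id)
    return index


INDEX = build_index(RELEASES)
ALL_IDS = set(RELEASES)


def term(tag):
    """Releases matching a single tag (empty set if the tag is unknown)."""
    return INDEX.get(tag, set())


def and_(*sets):
    """Intersection of all operands."""
    result = set(ALL_IDS)
    for s in sets:
        result &= s
    return result


def or_(*sets):
    """Union of all operands."""
    return set().union(*sets)


def not_(s):
    """Everything that is not in the operand."""
    return ALL_IDS - s


# traffic AND (queens OR boston) AND NOT corporate
result = and_(
    term("topic:traffic"),
    or_(term("loc:queens"), term("loc:boston")),
    not_(term("sender:corporate")),
)
print(sorted(result))  # [1]
```

Full-text search engines are built around the same structure, with words from the release body acting as additional tags.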

I really don't know anything about search theory, or how it's properly done. The way we are getting by right now is using a data mart to store the locations the releases were sent to in a single table. However, because of the subset issue mentioned above, the data mart is gigantic, with millions of rows. And we haven't even implemented cities yet. There are about 50,000 cities in the United States, which will increase the size of the data mart so much that I'm afraid it just won't work anymore.
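For illustration, the alternative to materializing every subset relationship would be to store each release once, against the location it was actually sent to, and resolve containment through a parent-pointer hierarchy at query time. The sketch below assumes a hypothetical `PARENT` map and `SENT_TO` table, and it assumes that a search for a location should match releases sent to that location or to anything inside it (not to its parents). It scans every release for clarity; a real system would index the ancestor chain instead.

```python
# Hypothetical hierarchy: each location points to its parent (None at the top).
PARENT = {
    "US": None,
    "NY": "US",
    "New York City": "NY",
    "Queens": "New York City",
    "MA": "US",
    "Boston": "MA",
}

# Each release is stored once, against the location it was actually sent to.
SENT_TO = {101: "Queens", 102: "Boston", 103: "New York City", 104: "NY"}


def ancestors_and_self(location):
    """Walk up the hierarchy from a location to the root."""
    chain = []
    while location is not None:
        chain.append(location)
        location = PARENT[location]
    return chain


def releases_within(query_location):
    """Releases whose target is the query location or anything inside it."""
    return sorted(
        release_id
        for release_id, target in SENT_TO.items()
        if query_location in ancestors_and_self(target)
    )


print(releases_within("New York City"))  # [101, 103]
```

With this layout, adding cities adds one row per city to the hierarchy rather than multiplying the rows stored per release.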

Anyway, I realize this is not a simple question and there won't be a "do this" answer. However, I'm hoping one of you can point me in the right direction so I can learn how massive searches are done, because I really know nothing about it, and such a search engine is turning out to be incredibly difficult to make. Thanks! I know there must be a way, because if Google can search the entire internet, we must be able to search our own database :-)

Comments (1)

初相遇 2024-09-03 16:10:02

Google can search the entire internet, and your data via a Google Appliance!
