数据库规范化和快速搜索

发布于 2024-10-10 22:02:32 字数 515 浏览 0 评论 0原文

我正在研究内容解决方案集成的技术架构。来自解决方案提供商的数据运行到数百万行并标准化为 3NF。它定期更新(最有可能是每天),并且其数据被分解为非常细粒度的原子性级别。

我需要搜索和查询这些数据,而我当前的倾向是单独保留规范化数据,并根据其数据(OLAP 到 OLTP)创建非规范化数据库。 “传输”可以是定制的程序,除了原始复制功能之外,还可以包含必要的业务逻辑,并根据需要按设定的时间表运行。非规范化数据库将降低原子性并允许关键字搜索和查询高效运行。我正在考虑使用 Lucene .NET 在非规范化数据库上进行关键字工作。

因此,在我在山上大声歌唱这是前进的道路之前,我想听听专家对此的意见以及什么是公认的“最佳实践”。考虑到我将提供的数据,我建议的方法是否是最佳的前进方式?有人建议我也许可以使用“搜索引擎”来搜索标准化数据。这把我吓坏了,但也提出了一个问题:什么搜索引擎以及如何搜索?

意见、火焰、脏话和帮助表示赞赏:)

I'm working on the technical architecture for a content solution integration. The data from the solution provider runs to millions of rows and normalised to 3NF. It is updated on a regular schedule (daily most likely) and its data is split down to a very granular level of atomicity.

I need to search and query this data and my current inclination is to leave the normalised data alone and create a denormalised database from its data (OLAP to OLTP). The 'transfer' can be a custom built program that can contain the necessary business logic in addition to the raw copying power and be run at a set schedule as required. The denormalised database would then reduce the atomicity and allow the keyword searches and queries to run efficiently. I was looking at using Lucene .NET for the keyword work on the denormalised database.

So before I sing loudly from the hills that this is the way forward, I wanted some expert opinion on this and what is the perceived "best practise". Is the method I have suggested the best way forward considering the data I will be provided? It was suggested that perhaps I could use a 'search engine' to search the normalised data. This scared the hell out of me, but raised the question; what search engine and how?

Opinions, flames, bad language and help appreciated :)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

时光与爱终年不遇 2024-10-17 22:02:32

我根据以标准化形式存储的数据构建了报告数据库和数据仓库。传输程序(ETL)涉及相当多的工作。根据您对数据馈送的描述,馈送器可能已经为您完成了其中一些工作。

如今,数百万行并不是很多。您也许可以在现有数据库中使用面向报告的视图。尝试一下看看。

构建面向 OLAP 的数据库的最大好处不是速度。这是灵活性。 “我们喜欢这份报告,但现在我们希望每周和每季度查看一次,而不是每月一次。砰!完成!” “你能按营销类别而不是制造类别来细分吗?砰!完成!”等等。

I have built reporting databases and data warehouses based on data stored in normalized form. There is quite a bit of work involved in the transfer program (ETL). Given your description of the data feed, maybe some of that work has been done for you by the feeder.

Millions of rows isn't a lot, these days. You may be able to get away with report oriented views into the existing database. Try it and see.

The biggest benefit to building an OLAP oriented database is not speed. It's flexibility. "We love this report, but now we want to see it weekly and quarterly instead of monthly. Bam! Done!" "Can you break it down by marketing category instead of manufacturing category? Bam! Done!" And so on.

等待我真够勒 2024-10-17 22:02:32

合理归一化模型 (3NF/BCNF) 可为最大数量的场景提供最佳的平均性能和最少的修改异常量。那很大,所以我将从那里开始。由于您的要求很模糊,这似乎是最明智的选择。

实际上,最明智的做法是仔细检查这些要求,直到它们变得更加“清晰”;)

此外,如果您可以从数据提供商处获得一些早期摘录,您可以对其进行实验并获得感觉数据分布(并非所有人都生活在一个国家,有些国家比其他国家拥有更多人口。并非所有人都有孩子,并且每个国家的每人孩子数量有很大不同)。这是重点,对于优化器能够做出正确的决策至关重要。

除此之外,我同意沃尔特所说的一切,并投了他一票。

A resonably normalized model (3NF/BCNF) provides the best average performance and the least amount of modification anomalies for the largest number of scenarios. That's big, so I would start from there. As your requirements are fuzzy, it's seems like the most sensible option.

Actually, the most sensible thing would be to go over the requirements until they are a bit more "crisp" ;)

Also, if you could get your hands on a few early extracts from your data provider you could experiment with it and get a feeling for the data distributions (not all people live in one country, and some countries holds more people than others. Not all people have children, and the number children per person is vastly different depending on the country). This is a major point and it is crucial that the optimizer can make good decisions.

Other than that, I agree with everything Walter said and also gave him my vote.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文