Case studies or examples of high-throughput services with highly dynamic data

Posted 2024-09-13 23:52:39

I'm looking for architecture ideas for a problem at work that I may need to solve.

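The problem:
1) Our enterprise LDAP has become a "contact master", full of years of stale data and unused, unmaintained attributes.
2) Management has decided that LDAP will no longer serve as the company phone book. It will be used for authorization purposes only.
3) The company has hundreds of different sources of people-contact type data. We need to clean all the junk out of LDAP and give other applications a central repository in which to store all of this data about a person.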

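Ideal goals:
1) Have a single source that stores all of a person's various attributes.
2) The company will probably hold information on about 500k people (read: 500k rows).
3) I estimate there could be 500 to 1000 optional attributes for these people (read: 500+ columns).
4) Data will mostly be set/get via XML over JMS (this infrastructure is already in place).
5) Individual groups within the company can "own" columns. Only they are allowed to write to their columns, and they are responsible for keeping the data clean.
6) A single-record lookup should return in under a second.
7) The system should support 1 million requests per hour at peak.
8) The primary goal is to serve real-time data to the enterprise; reporting is a secondary goal.
9) We are a Java, Oracle, Teradata shop. We're your typical big IT shop.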

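My thoughts:
1) Originally I thought LDAP might work, but it doesn't scale when new columns are added.
2) My next thought was some kind of NoSQL solution, but from what I've read I don't think I could get the performance I need, and it's still relatively new. I'm not sure I could get my manager to sign off on something like that for such a critical project.
3) I think the solution will have a metadata component that tracks who owns each column, what each column represents, and the original source system.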

Thanks for reading, and thanks in advance for any thoughts.


小猫一只 2024-09-20 23:52:39

SQL

With Teradata-grade tools, an SQL-based solution may be feasible. I came across an article on database design a while ago that discussed "anchor modeling".

Basically, the idea is to create a single, dumb, synthetic primary key table, while all real or meta data lives in other tables (subsets) and is attached by way of a foreign key + join.

I see the benefit of this design being two-fold. First, you can more easily compartmentalize data storage either for organizational or performance reasons. Second, you only create additional rows for records that have data in any given subset, so you use less space and indexing and searching are faster.

Subsets might be based on maintainer or some other criteria. XML set/get would be per subset and record (rather than per global record). All subsets for a given record can be composited and cached. Additional subsets can be created for metadata, search indexes, etc., and these can be queried independently.
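As a rough illustration of the layout described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`person`, `hr_contact`, `facilities_badge`, `desk_phone`, `building`) are made up for the example, and SQLite only stands in for whatever RDBMS you would actually use:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one anchor table and two subset tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Anchor table: nothing but a synthetic key.
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);

        -- One table per subset (hypothetical names), each keyed by the anchor.
        CREATE TABLE hr_contact (
            person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
            desk_phone TEXT
        );
        CREATE TABLE facilities_badge (
            person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
            building  TEXT
        );
    """)
    conn.executemany("INSERT INTO person (person_id) VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO hr_contact VALUES (1, '555-0100')")
    conn.execute("INSERT INTO hr_contact VALUES (2, '555-0101')")
    # Person 2 has no badge data, so no row is created for them in this subset.
    conn.execute("INSERT INTO facilities_badge VALUES (1, 'HQ-3')")
    return conn


def composite_record(conn: sqlite3.Connection, person_id: int):
    """Join every subset onto the anchor to build one person's full record."""
    row = conn.execute(
        """
        SELECT p.person_id, h.desk_phone, f.building
        FROM person p
        LEFT JOIN hr_contact h       ON h.person_id = p.person_id
        LEFT JOIN facilities_badge f ON f.person_id = p.person_id
        WHERE p.person_id = ?
        """,
        (person_id,),
    ).fetchone()
    if row is None:
        return None  # no such person in the anchor table
    return {"person_id": row[0], "desk_phone": row[1], "building": row[2]}


conn = build_demo_db()
print(composite_record(conn, 1))
print(composite_record(conn, 2))
```

The `LEFT JOIN`s are what let a person exist without a row in every subset; attributes they lack simply come back as `None`.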

NoSQL

NoSQL seems similar to LDAP (in theory, at least) but the benefit of a good NoSQL tool would include greater abstraction of metadata, versioning, and organization. In fact, from what I've read it seems that NoSQL datastores are designed to address some of the issues you've raised with respect to scaling and loosely structured data. There's a good question on SO regarding datastores.

Production NoSQL

Off-hand, a handful of large companies run NoSQL in massively scaled environments; Google's Bigtable is one example. It seems like the perfect tool for:

6) A single-record lookup should return in under a second.
7) The system should support 1 million requests per hour at peak.

Bigtable is only available (to my knowledge) through AppEngine. Other, similar technologies are listed here.

Other Thoughts

The bigger-picture view looks more or less the same regardless of the technology you decide to use: compartmentalize storage, build composite views, cache those views, and keep metadata somewhere so you can find things.
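To make that shape a bit more concrete, here is a toy, in-memory sketch of those pieces working together. Everything in it (the `PersonStore` class, the group names, the column names) is hypothetical, and a real system would sit on top of a database and a proper cache rather than plain dictionaries:

```python
from typing import Dict


class PersonStore:
    """Toy store: per-owner subsets, column-owner metadata, cached composites."""

    def __init__(self, column_owners: Dict[str, str]) -> None:
        # Metadata: column name -> owning group.
        self.column_owners = dict(column_owners)
        # Compartmentalized storage: owner -> person_id -> {column: value}.
        self._subsets: Dict[str, Dict[int, Dict[str, str]]] = {}
        # Cache of composite views, keyed by person_id.
        self._cache: Dict[int, Dict[str, str]] = {}

    def set_value(self, group: str, person_id: int, column: str, value: str) -> None:
        """Write one attribute; only the owning group may write its column."""
        owner = self.column_owners.get(column)
        if owner is None:
            raise KeyError(f"unknown column: {column}")
        if owner != group:
            raise PermissionError(f"{group} does not own column {column}")
        self._subsets.setdefault(owner, {}).setdefault(person_id, {})[column] = value
        # Any write makes the cached composite for this person stale.
        self._cache.pop(person_id, None)

    def get_record(self, person_id: int) -> Dict[str, str]:
        """Return the composite view of all subsets, building it on a cache miss."""
        cached = self._cache.get(person_id)
        if cached is None:
            cached = {}
            for rows in self._subsets.values():
                cached.update(rows.get(person_id, {}))
            self._cache[person_id] = cached
        return dict(cached)  # copy, so callers cannot mutate the cache


store = PersonStore({"desk_phone": "hr", "building": "facilities"})
store.set_value("hr", 1, "desk_phone", "555-0100")
store.set_value("facilities", 1, "building", "HQ-3")
print(store.get_record(1))
```

Invalidating on write keeps the example simple; whether that, a time-to-live, or something else is appropriate depends on how the data is actually read and written.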

The performance characteristics you're targeting are going to require some kind of caching and/or optimization based on real-world usage patterns. Regardless of the solution you choose, you probably can't resolve that in the design phase.
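For a sense of scale, the peak target from the question works out to a few hundred requests per second. The quick calculation below shows this; the 90% cache hit ratio is purely an assumed number for illustration, not a measurement:

```python
PEAK_REQUESTS_PER_HOUR = 1_000_000  # requirement 7 from the question
SECONDS_PER_HOUR = 3600

# Average request rate if the peak hour's load were spread evenly.
avg_rate = PEAK_REQUESTS_PER_HOUR / SECONDS_PER_HOUR


def backend_rate(total_rate: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the backing store."""
    return total_rate * (1.0 - cache_hit_ratio)


print(f"average rate: {avg_rate:.1f} requests/second")
# 0.9 is a hypothetical hit ratio; measure real traffic before relying on it.
print(f"backing-store rate at 90% hits: {backend_rate(avg_rate, 0.9):.1f} requests/second")
```

Real traffic is bursty, so the instantaneous rate can be well above this hourly average.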
