Case studies or examples of high-throughput services with highly dynamic data
I'm looking for some architecture ideas on a problem at work that I may have to solve.
The problem:
1) Our enterprise LDAP has become a "contact master" filled with years of stale data and unused, unmaintained attributes.
2) Management has decided that LDAP will no longer serve as the company phone book. It is for authorization purposes only.
3) The company has contact-type data about people in hundreds of different sources. We need to scrub all the junk out of LDAP and give the other applications a central repo to store all this data about a person.
The ideal goal:
1) Have a single source to store all the various attributes about a person.
2) The company probably has info on 500k people (read: 500k rows).
3) I estimate there could be 500 to 1000 optional attributes on these people (read: 500+ columns).
4) Data would primarily be set/get via XML over JMS (this infrastructure is already in place).
5) Individual groups within the company could "own" columns. Only they would be allowed to write to their columns, and they would be responsible for keeping the data clean.
6) A single record lookup should return in under a second.
7) The system should support 1 million requests per hour at peak.
8) The primary goal is to serve real-time data to the enterprise; reporting is a secondary goal.
9) We are a Java, Oracle, Teradata shop. We are your typical big IT shop.
My thoughts:
1) Originally I thought LDAP might work, but it doesn't scale when new columns are added.
2) My next thought was some kind of NoSQL solution, but from what I have read, I don't think I can get the performance I need, and it's still relatively new. I'm not sure I can get my manager to sign off on something like that for such a critical project.
3) I think there will be a metadata component to the solution that will track who owns the columns, what each column represents, and the original source system.
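To make the metadata idea in point 3 concrete, here is a minimal sketch of an ownership registry. Everything in it is hypothetical and only for illustration: the class names, the field names, and the example groups and attributes ("telecom", "desk_phone", and so on) are not from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMeta:
    """Metadata for one person attribute (all field names are hypothetical)."""
    name: str
    owner_group: str     # the only group allowed to write this attribute
    description: str     # what the attribute represents
    source_system: str   # system the value originally came from


class AttributeRegistry:
    """Tracks who owns each attribute and answers write-permission checks."""

    def __init__(self):
        self._meta = {}

    def register(self, meta):
        self._meta[meta.name] = meta

    def can_write(self, group, attribute):
        # Unknown attributes are not writable by anyone.
        meta = self._meta.get(attribute)
        return meta is not None and meta.owner_group == group


registry = AttributeRegistry()
registry.register(AttributeMeta("desk_phone", "telecom", "Office desk phone number", "PBX"))
registry.register(AttributeMeta("badge_id", "security", "Building access badge", "BadgeSys"))

print(registry.can_write("telecom", "desk_phone"))  # True
print(registry.can_write("telecom", "badge_id"))    # False
```

In a real deployment the registry would presumably live in the database itself and be consulted by whatever handles the incoming set requests.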
Thanks for reading, and thanks in advance for any thoughts.
SQL
With Teradata-grade tools, an SQL-based solution may be feasible. I came across an article on database design a while ago that discussed "anchor modeling".
Basically, the idea is to create a single, dumb, synthetic primary key table, while all real or meta data lives in other tables (subsets) and is attached by way of a foreign key + join.
I see the benefit of this design being two-fold. First, you can more easily compartmentalize data storage either for organizational or performance reasons. Second, you only create additional rows for records that have data in any given subset, so you use less space and indexing and searching are faster.
Subsets might be based on maintainer or some other criteria. XML set/get would be per subset/record (rather than per global record). All subsets for a given record can be composited and cached. Additional subsets can be created for metadata, search indexes, etc., and these can be queried independently.
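A rough sketch of that layout, using Python's built-in sqlite3 module purely so it runs anywhere (the question's shop would use Oracle or Teradata). The table and column names are made up for illustration, and this follows the "subset per maintainer" description above rather than strict anchor modeling, which as I understand it goes further and splits out individual attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchor: nothing but a synthetic key per person.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);

    -- One subset table per owning group; a row exists only if there is data.
    CREATE TABLE person_telecom (
        person_id  INTEGER PRIMARY KEY REFERENCES person(person_id),
        desk_phone TEXT
    );
    CREATE TABLE person_facilities (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        building  TEXT
    );
""")
conn.executemany("INSERT INTO person (person_id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO person_telecom VALUES (1, '555-0100')")
conn.execute("INSERT INTO person_facilities VALUES (2, 'HQ-3')")


def composite_record(conn, person_id):
    """Join every subset onto the anchor to build one composite view."""
    row = conn.execute("""
        SELECT p.person_id, t.desk_phone, f.building
        FROM person p
        LEFT JOIN person_telecom t ON t.person_id = p.person_id
        LEFT JOIN person_facilities f ON f.person_id = p.person_id
        WHERE p.person_id = ?
    """, (person_id,)).fetchone()
    if row is None:
        return None
    return {"person_id": row[0], "desk_phone": row[1], "building": row[2]}


print(composite_record(conn, 1))
# {'person_id': 1, 'desk_phone': '555-0100', 'building': None}
```

Person 1 has no row in person_facilities, so the LEFT JOIN yields None for building without any storage being spent on the missing value.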
NoSQL
NoSQL seems similar to LDAP (in theory, at least) but the benefit of a good NoSQL tool would include greater abstraction of metadata, versioning, and organization. In fact, from what I've read it seems that NoSQL datastores are designed to address some of the issues you've raised with respect to scaling and loosely structured data. There's a good question on SO regarding datastores.
Production NoSQL
Off-hand, there are a handful of large companies using NoSQL in massively scaled environments, such as Google with Bigtable. It seems like the perfect tool for:
Bigtable is only available (to my knowledge) through AppEngine. Other, similar technologies are listed here.
Other Thoughts
The bigger picture view looks more or less the same regardless of the technology you decide to use. E.g. compartmentalize storage, composite views, cache views, stick metadata somewhere so you can find things.
The performance characteristics you're targeting are going to require some kind of caching and/or optimization based on real-world usage patterns. Regardless of the solution you choose, you probably can't resolve that in the design phase.
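For a sense of scale, the stated peak of 1 million requests per hour works out to roughly 278 requests per second. A minimal read-through cache with a time-to-live, sketched below, is one way the "cache views" idea could look. The class, the `load_person` stand-in, and the 60-second TTL are all assumptions for illustration; real numbers would have to come from measured usage.

```python
import time


class ReadThroughCache:
    """Caches loader results per key and reloads them after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the backing store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this when an owning group writes to the record.
        self._entries.pop(key, None)


def load_person(person_id):
    # Hypothetical stand-in for the real composite-record lookup.
    return {"person_id": person_id}


cache = ReadThroughCache(load_person, ttl_seconds=60)
cache.get(42)
cache.get(42)
print(cache.hits, cache.misses)  # 1 1

peak_rps = 1_000_000 / 3600
print(round(peak_rps, 1))  # 277.8
```

The hit and miss counters are there because the cache policy can only be tuned once real traffic shows which records are actually hot.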
A couple thoughts:
This isn't really a technological problem. You will have this problem with a new system as well, LDAP or not.
There are lots of huge LDAP systems out there. LDAP is surely a dark art, but I'd be willing to bet that it scales better than any SQL equivalent in this situation. Not to mention that LDAP is a standard for this kind of info, and as such it is accessible from zillions of different kinds of systems.
Maybe what you're looking for is a new LDAP system that's easier to manage / has better admin tools?
You may want to look into Len Silverston's Party Model. Here's a link to his book: http://www.amazon.com/Data-Model-Resource-Book-Vol/dp/0471380237.
I have no experience building something on that scale, though I think that thinking of it as 500k rows x 500 - 1000 columns sounds a bit ridiculous.