Reporting system for an organization: need architectural advice

Posted 2024-09-03 06:02:11

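We have several legacy and third-party systems in our organization that use multiple RDBMS vendors (as well as more specialized data stores). There is demand for cross-system data reporting (plus extra reports not implemented in the third-party systems), delivered as a swarm of charts and templates (WinWord, Excel). The reporting system is envisioned as an intranet website with per-user access to reports. We expect about 50 reports per day.

Would you suggest using BizTalk or any other integration software, given that the business is not going to buy anything expensive?

Would you suggest creating a centralized data store for reports that is populated periodically, or relying on on-demand services that always return up-to-date data for each request? A centralized data store would let us use standard tools such as MSSQL Reporting Services, but I suspect the templated reports would still have to be custom-coded with a lightweight solution.

Thanks in advance!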

下雨或天晴 2024-09-10 06:02:11

To pick an ideal architecture, you'll need to examine some of the dynamics of your system. A few relevant questions:

How often does the source data change or update?
How "fresh" and real-time does the data have to be in the reports?
How often do you expect the source systems themselves to change in the future?
How different are the source data structures from each other?
Might there be other consumers besides the reporting system in the future?
In addition to schema differences, are there semantic heterogeneities in the data?
How complex are the schemas?

With that in mind, let's examine the pros and cons of two data aggregation approaches:

Central Data Warehouse

Easy uniform schema for the reporting system and other consumers.
Hub-and-spoke topology means only one connector per source is required. If the source changes, there is only one place you need to fix the connection.
Data may not be fresh, as it relies on periodic synchronization with end systems.
If your data warehouse schema does not cover some future need, the hub-and-spoke topology means that you have to replace all the source system connectors.
The schema is rigidly defined, but an extensive system of validators is needed to enforce semantics.
You have an opportunity to perform data cleansing in one spot, correcting certain classes of dirty data known to you.
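
To make the warehouse option concrete, here is a minimal sketch of one periodic sync run. Everything in it is an assumption for illustration: the `sales` table, the two connector functions and their sample rows are hypothetical, and an in-memory SQLite database stands in for whatever central store you would actually use. It only shows the shape of the idea: one connector per source, one cleansing step in the middle, one uniform schema for reports.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Uniform warehouse row: (source, customer, amount). Hypothetical schema.
Row = Tuple[str, str, float]


def crm_connector() -> Iterable[Row]:
    # Stand-in for a query against one source system (made-up data).
    raw = [{"name": " Acme ", "total": "120.50"}, {"name": "Globex", "total": "80"}]
    for r in raw:
        yield ("crm", r["name"], float(r["total"]))


def billing_connector() -> Iterable[Row]:
    # Second source with a different native shape; one row has no customer.
    raw = [("ACME", 30.0), ("", 15.0)]
    for customer, amount in raw:
        yield ("billing", customer, amount)


def cleanse(row: Row) -> Optional[Row]:
    """Central cleansing: normalise customer names, drop rows without one."""
    source, customer, amount = row
    customer = customer.strip().title()
    if not customer:
        return None
    return (source, customer, amount)


def sync(warehouse: sqlite3.Connection,
         connectors: List[Callable[[], Iterable[Row]]]) -> int:
    """One periodic run: rebuild the table from every source connector."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales (source TEXT, customer TEXT, amount REAL)"
    )
    warehouse.execute("DELETE FROM sales")
    loaded = 0
    for connector in connectors:
        for row in connector():
            clean = cleanse(row)
            if clean is not None:
                warehouse.execute("INSERT INTO sales VALUES (?, ?, ?)", clean)
                loaded += 1
    warehouse.commit()
    return loaded


def report_totals(warehouse: sqlite3.Connection) -> Dict[str, float]:
    # Reports only ever see the uniform schema, never the source systems.
    cur = warehouse.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
    return dict(cur.fetchall())


warehouse = sqlite3.connect(":memory:")
loaded = sync(warehouse, [crm_connector, billing_connector])
totals = report_totals(warehouse)
print(loaded, totals)
```

Note the trade-off from the list above: the report is only as fresh as the last `sync` call, and a change to the `sales` schema would touch every connector.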

Point-to-Point Custom Connectors

As close to real-time data as possible.
All connectors are isolated from each other, and if a source changes then you need to change only one connector.
The uniformity of both schema and semantics may be implied in your connectors, but may not be rigidly enforced to the degree that a common database target would imply.
Changes to your reporting system or the addition of a new target may require you to rework all connectors.
The reporting system has to assume responsibility for any data cleansing necessary.
An ESB (e.g. BizTalk) may be a nice way to manage these connectors if they are message-oriented. It will add some overhead and expense, but you'll get reliability and a central broker to help you out. Depending on the size and expected growth of this aggregated system, an ESB may or may not represent a net reduction in complexity.
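
For comparison, a minimal sketch of the on-demand variant, without an ESB. The `Connector` protocol, the two connector classes and their sample data are all hypothetical; real connectors would query the source systems live at request time. The point is that each connector is isolated behind a small interface, while cleansing and merging end up inside the report itself.

```python
from typing import Any, Dict, List, Protocol


class Connector(Protocol):
    """Assumed common interface; each source gets its own implementation."""

    def fetch_orders(self, customer: str) -> List[Dict[str, Any]]: ...


class LegacyErpConnector:
    """Hypothetical connector; a real one would query the ERP database live."""

    def __init__(self) -> None:
        self._rows = [("ACME", "2024-01-05", "100.00"),
                      ("GLOBEX", "2024-01-06", "40.00")]

    def fetch_orders(self, customer: str) -> List[Dict[str, Any]]:
        return [
            {"customer": c, "date": d, "amount": float(a)}
            for c, d, a in self._rows
            if c.lower() == customer.lower()
        ]


class WebShopConnector:
    """Hypothetical connector for a source that stores amounts in cents."""

    def __init__(self) -> None:
        self._rows = [{"cust": "acme", "day": "2024-01-07", "cents": 2550}]

    def fetch_orders(self, customer: str) -> List[Dict[str, Any]]:
        return [
            {"customer": r["cust"], "date": r["day"], "amount": r["cents"] / 100}
            for r in self._rows
            if r["cust"].lower() == customer.lower()
        ]


def customer_report(customer: str, connectors: List[Connector]) -> List[Dict[str, Any]]:
    """Build the report at request time by calling every source directly."""
    orders: List[Dict[str, Any]] = []
    for connector in connectors:
        for order in connector.fetch_orders(customer):
            # Cleansing lives in the report, not in a shared store.
            order["customer"] = str(order["customer"]).title()
            orders.append(order)
    orders.sort(key=lambda o: o["date"])
    return orders


report = customer_report("Acme", [LegacyErpConnector(), WebShopConnector()])
total = sum(o["amount"] for o in report)
print(report, total)
```

The data is as fresh as the sources allow, but a second consumer would have to repeat the cleansing in `customer_report`, and a change to the shared interface means reworking every connector.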

In both cases, I think the construction of the connectors can be accomplished with commercial products, open source products, or plain old code. There may be some extra bells and whistles (which may improve productivity) as you start to pay for products, but the major expense will be your engineers' time (both coding and analyzing). I would suggest:

If you are already fluent with a given tool for the connectors (e.g. ETL), go ahead and consider it. In particular, it'll cut a lot of the boilerplate.
If you don't have experience with these systems, think twice: you're masking the code in tools that might confuse more than help for a one-off project. But cutting the boilerplate and utilizing the structure forced on you may be a good thing over the long run.
Consider that the requirements WILL change someday. Make sure you've picked a technology that can be easily adapted and maintained.

There is of course no One Answer, but hopefully this helps you examine the right questions. I think the keys are managing complexity, and realizing that the overall aggregation network will change someday. It's just a matter of when.
