Does an n-tier system "make sense" for large-data-set processing?

Published 2024-08-07 02:40:09 · 342 characters · 4 views · 0 comments


I recently became part of the team of developers writing our "flagship" product. It's primarily a read-intensive web app (ASP.NET (C#) and Oracle) implemented as an N-tier system. Most of the writes to the DB are done through external services (not through the web app). Instead of scheduling normal batch jobs in the DB for data aggregation, they're pushing everything up the tiers to the business layer (sometimes creating a hundred million objects). While this does keep all the "business logic" in the same place, it also takes about 200 times longer than running the equivalent query in the database. This seems like a terrible idea to me. Am I wrong here, and this is standard and good stuff? Does anybody have any real case studies I can point my co-workers towards (or myself, if I'm in the wrong)?

I'm not debating whether n-tier is good or bad, but does it fit for data aggregation processing and the like?
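
To make the contrast concrete, here is a miniature sketch of the two approaches, using an in-memory SQLite table as a stand-in for the real Oracle schema. The `orders` table, its columns, and the data are illustrative assumptions, not the product's actual model; the point is the shape of each approach, not the engine.

```python
import sqlite3

# Hypothetical schema standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])

# Approach 1: aggregate in the database -- only one row per group
# ever leaves the DB.
db_side = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"))

# Approach 2: pull every row into the business layer and aggregate
# there -- this is what materializing a hundred million objects
# looks like in miniature.
app_side = {}
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    app_side[region] = app_side.get(region, 0.0) + amount

assert db_side == app_side  # same answer, very different cost profile
```

Both produce identical results; the difference is how many rows cross the tier boundary and how many objects the application must allocate.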


Comments (1)

夜夜流光相皎洁 2024-08-14 02:40:09


You are right about the processing time (and also resources, like memory).

  • Best practice is to aggregate as close to the data as possible, ideally in the database. A hundred million objects seems crazy.
  • However, we all know that the code is less maintainable that way. So it costs more development time, and more money in the end.

So you need to strike the right balance. This cannot come to you from the outside;
you must weigh the advantages carefully in the specific context of your project.

For example, how frequently all this happens matters a lot. A high cost is obviously acceptable if the process runs every minute, but probably not if it runs once a year...


Maybe the correct balance would take a bit of both. For example, for a good ROI:

  • the queries to the database could do a first level of aggregation, getting rid of the tiny details and cutting the number of objects to be created by a factor of a hundred.
  • the business layer could apply the rest of the rules

What makes a requirement a good candidate for being handled in the query:

  • low-level aggregation that reduces the number of objects (or rows) that leave the database
  • rules that rarely change
  • rules that read easily in SQL

To make your code more explicit (and reduce duplication between queries), I suggest giving your code a compile-time structure that makes the rules clear. Create explicit constants or functions that embody each business rule you will put in a query, and use them to build your queries (at run time or compile time).
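
One minimal way to sketch that idea (the table name, rule names, and SQL fragments here are invented for illustration): each business rule becomes a named constant, and queries are composed from those names, so a rule's SQL lives in exactly one place.

```python
# Each business rule gets one named fragment; every query that needs
# the rule reuses the constant instead of duplicating the SQL text.
ACTIVE_ACCOUNTS = "status = 'active'"                 # rule: only active accounts
RECENT = "created_at >= date('now', '-30 day')"       # rule: last 30 days

def build_query(table, *rules):
    """Compose a SELECT statement from named rule fragments."""
    where = " AND ".join(rules) if rules else "1=1"
    return f"SELECT * FROM {table} WHERE {where}"

query = build_query("accounts", ACTIVE_ACCOUNTS, RECENT)
```

If the definition of "active" changes, only `ACTIVE_ACCOUNTS` is edited, and every query built from it picks up the change.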
