数据仓库项目的存储过程与 JDO

发布于 2024-08-19 04:43:19 字数 610 浏览 3 评论 0原文

过去我们通过存储过程来访问数据库。它们被视为管理数据的“更好”方式。我们将数据保存在数据库中,任何语言/平台都可以通过 JDBC/ODBC/等访问它。

然而,近年来,基于运行时反射/元数据的存储检索机制(例如 Hibernate/DataNucleus)已经变得流行。最初,我们担心它们会很慢,因为涉及额外的步骤(反射成本很高),以及当我们只需要一个字段时它们如何检索不必要的数据(整个对象)。

我开始计划一个使用 J2EE 的大型数据仓库项目,但我有点不确定是选择存储过程还是 JDO/JPA 等。最近,我一直在使用 Hibernate,说实话,我很怀念编写 CRUD 存储过程!

它本质上可以归结为:

存储过程
+ 可以在服务器上进行优化(尽管只是查询)
- 每个表可能有超过一千个存储过程:添加、删除、更新、getById 等。

JDO
+ 我不会在接下来的几个月里写parameters.add("@firstNames", customer.getFirstName()); ...
- 会比 SP 慢(但大多数支持分页)

在我的情况下你会选择什么?在这种情况下,我认为这是一个很大的问题。

谢谢,

约翰

In the old days we used to access the database through stored procedures. They were seen as `the better' way of managing the data. We keep the data in the database, and any language/platform can access it through JDBC/ODBC/etc.

However, in recent years run-time reflection/meta-data based storage retrieval mechanisms such as Hibernate/DataNucleus have become popular. Initially we were worried that they'd be slow because of the extra steps involved (reflection is expensive) and how they retrieve unnecessary data (the whole object) when all we need is one field.

I'm starting to plan for a large data warehousing project that uses J2EE, but I'm a bit unsure whether to go for Stored Procedures or JDO/JPA and the like. Recently, I've been working with Hibernate, and to be quite honest, I don't miss writing CRUD stored procedures!

It essentially boils down to:

Stored procedures
+ Can be optimised on the server (although only the queries)
- There's likely to be more than a thousand stored procedures: add, delete, update, getById, etc, for each table.

JDO
+ I won't spend the next few months writing parameters.add("@firstNames", customer.getFirstName()); ...
- Will be slower than SPs (but most support paging)

What would you plump for in my situation. In this case I think it's a much of a muchness.

Thanks,

John

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

南街女流氓 2024-08-26 04:43:19

“JDO - 会比 SP 慢(但大多数支持分页)”

这个假设通常是错误的。 SP 没有理由特别快。我已经做了一些测量,它们并不比数据库外部的代码快。

数据仓库的特点是仅插入加载和长时间运行的SELECT...GROUP BY...查询。

您不是在编写 OLTP 事务处理。您没有使用 3NF 作为防止更新/删除事务中出现更新异常的方法。

由于您正在进行批量插入,因此 SP 肯定会比批量加载实用程序慢。批量加载器通常是多线程的,并且会消耗所有可用的 CPU 资源。 SP是DB的一部分,只能共享有限的DB资源。

由于您主要执行SELECT GROUP BY,因此 SP 在这里也没有多大帮助。 SELECT 语句不会因包含在过程中而受益。

你不需要它们。他们没有帮助。

您可以轻松地对批量加载和查询进行基准测试,以证明 SP 没有帮助。

"JDO - Will be slower than SPs (but most support paging)"

This assumption is often false. There's no reason for SP's to be particularly fast. I've done some measurements and they're no faster than code outside the database.

A data warehouse is characterized by insert-only loads and long-running SELECT...GROUP BY... queries.

You're not writing OLTP transactional processing. You're not using 3NF as a way to prevent update anomalies on update/delete transactions.

Since you're doing bulk inserts, a SP will definitely be slower than a bulk load utility. Bulk loaders are often multi-threaded and will consume all available CPU resources. The SP is part of the DB and can only share limited DB resources.

Since you're mostly doing SELECT GROUP BY, a SP won't help much here, either. The SELECT statement doesn't benefit from being wrapped in a procedure.

You don't need them. They don't help.

You can easily benchmark a bulk-load and a query to demonstrate that SP's aren't helping.

那请放手 2024-08-26 04:43:19

Rod Johnson 在他的《J2EE Design and Development》中写了一篇非常清晰的关于 ORM/StoredProcedures 的分析。他说

存储过程只能在 J2EE 系统中用于执行始终大量使用数据库的操作,无论它们是在数据库中实现还是在与数据库交换大量数据的 Java 代码中实现。

当您计划实现数据仓库时,我认为存储过程方法是正确的选择。

Rod Johnson in his "J2EE Design adn Development" wrote a very clear analysis about ORM/StoredProcedures. He said that

Stored procedures should only be used in a J2EE system to perform operations that will always use the database heavily, whether they're implemented in the database or in Java code that exchanges a lot of data with the database.

As you're planning to implement a datawarehouse, I think that the stored procedures approach is the right choice.

枕花眠 2024-08-26 04:43:19

我建议使用元数据来生成用于加载到数据仓库中的脚本。这使您可以通过使用专门的加载工具以及存储过程(如果您使用的是足够古老的数据库)来获得性能优势。另外,您可能最终会至少手工编写一些 SQL 代码。将通用脚本作为存储过程完成将允许您以相同的方式安排所有脚本,并且不必担心在重写某些生成的代码以使其运行得更好时更改它们的调用方式。

至于获取数据,如果您在 J2EE 中构建的是报告工具,那么使用 JDO 可能会更好。虽然我对报告方面不太熟悉,但我可以看到的一个好处是,允许您的最终用户更容易制作您事先没有预料到的自定义报告(尽管您仍然必须拥有对他们可以做的事情有一些限制,这样他们就不会在此过程中删除数据库)。

I would suggest using the metadata to generate the scripts you use for loading into the data warehouse. This allows you to get performance benefits from using specialised load tools and perhaps from stored procedures (if you're using a sufficiently ancient database). Also, you will probably end up hand coding at least some SQL. Having your generic scripts done as stored procs will allow you to schedule all of them in the same way and not have to worry about changing how they are invoked when you rewrite some generated code to make it run better.

As for getting the data out, if what you're building in J2EE is a reporting tool, then you may be better off using JDO. While I'm not terribly familiar with the reporting side of things, one benefit I can see is that it will be easier to allow your end users to make custom reports that you did not anticipate in advance (although you've still got to have some limits on what they can do so that they don't take down the database in the process).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文