SQL connection lifetime

Posted 2024-07-21 01:37:28


I am working on an API to query a database server (Oracle in my case) to retrieve massive amounts of data. (This is actually a layer on top of JDBC.)

The API I created tries to limit as much as possible how much queried data is loaded into memory. I mean that I prefer to iterate over the result set and process the returned rows one by one instead of loading every row into memory and processing them later.

But I am wondering if this is the best practice since it has some issues:

  • The result set is kept open during the whole processing; if the processing takes as long as retrieving the data, my result set will be open twice as long.
  • Doing another query inside my processing loop means opening another result set while I am already using one, and it may not be a good idea to have too many result sets open simultaneously.

On the other hand, it has some advantages:

  • I never have more than one row of data in memory per result set; since my queries tend to return around 100k rows, it may be worth it.
  • Since my framework is heavily based on functional programming concepts, I never rely on multiple rows being in memory at the same time.
  • Starting the processing on the first rows returned while the database engine is still returning other rows is a great performance boost.

In response to Gandalf, I add some more information:

  • I will always have to process the entire result set
  • I am not doing any aggregation of rows

I am integrating with a master data management application and retrieving data in order to either validate them or export them using many different formats (to the ERP, to the web platform, etc.)
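The row-by-row style described in the question can be sketched generically. This is an illustrative sketch, not the author's actual API: the `Iterator` stands in for a JDBC `ResultSet` (against a real connection the loop would be `while (rs.next()) { handler.accept(rs); }`), and the class and method names are made up.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Minimal sketch of one-row-at-a-time processing; names are illustrative.
public class RowStreamer {
    // Processes each row as it arrives instead of materializing a list,
    // so only one row is held in memory at any time.
    public static <T> int forEachRow(Iterator<T> rows, Consumer<T> handler) {
        int count = 0;
        while (rows.hasNext()) {
            handler.accept(rows.next()); // process, then let the row be collected
            count++;
        }
        return count;
    }
}
```

The point of the wrapper is that the caller never sees a `List` of 100k rows; it only ever sees one row at a time.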


Comments (1)

绅刃 2024-07-28 01:37:28


There is no universal answer. I personally implemented both solutions dozens of times.

This depends on what matters more for you: memory or network traffic.

If you have a fast network connection (LAN) and a poor client machine, then fetch data row by row from the server.

If you work over the Internet, then batch fetching will help you.

You can tune the prefetch count in your database layer properties and find a golden mean.

The rule of thumb is: fetch everything that you can keep without noticing it.
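With JDBC, the knob for this is `Statement.setFetchSize` (Oracle's driver defaults to 10). The sizing heuristic below is purely illustrative; the memory budget, the average-row-size estimate, and the clamp bounds are assumptions, not recommendations.

```java
// Illustrative heuristic: derive a JDBC fetch size from a memory budget.
// The numbers used here are made up for the example.
public class FetchSizing {
    public static int fetchSizeFor(long memoryBudgetBytes, long avgRowBytes) {
        if (avgRowBytes <= 0) throw new IllegalArgumentException("avgRowBytes must be > 0");
        long n = memoryBudgetBytes / avgRowBytes;
        // Clamp to a sane range; 10 is Oracle's JDBC default fetch size.
        return (int) Math.max(10, Math.min(n, 10_000));
    }
}
// Usage against a real connection (not runnable here):
//   PreparedStatement ps = conn.prepareStatement(sql);
//   ps.setFetchSize(FetchSizing.fetchSizeFor(4L << 20, 512)); // ~4 MiB budget
```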

If you need a more detailed analysis, there are six factors involved:

  • Row generation response time / rate (how soon Oracle generates the first row / the last row)
  • Row delivery response time / rate (how soon you can get the first row / the last row)
  • Row processing response time / rate (how soon you can show the first row / the last row)

One of them will be the bottleneck.

As a rule, rate and response time are antagonists.

With prefetching, you can control the row delivery response time and row delivery rate: a higher prefetch count will increase the rate but worsen the response time; a lower prefetch count will do the opposite.

Choose which one is more important to you.

You can also do the following: create separate threads for fetching and processing.

Select just enough rows to keep the user amused in low-prefetch mode (which gives the quicker response time), then switch into high-prefetch mode.

It will fetch the rows in the background and you can process them in the background too, while the user browses over the first rows.
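The two-thread idea from the answer can be sketched with a bounded `BlockingQueue`: one thread feeds rows into the queue while the main thread drains and processes them. The list of strings stands in for rows coming off a JDBC `ResultSet`, and the names and poison-pill convention are illustrative, not a prescribed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of fetch-in-background: a producer thread "fetches" rows into a
// bounded queue while the caller processes them concurrently.
public class PipelinedFetch {
    private static final String POISON = "\u0000END"; // end-of-result-set marker

    public static List<String> run(List<String> rows) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16); // bounded => bounded memory
        List<String> processed = new ArrayList<>();

        Thread fetcher = new Thread(() -> {
            try {
                for (String row : rows) queue.put(row); // would be rs.next() with real JDBC
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        try {
            for (String row; !(row = queue.take()).equals(POISON); ) {
                processed.add(row.toUpperCase()); // stand-in for real per-row processing
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return processed;
    }
}
```

The bounded queue is what keeps this from degenerating into "load everything into memory": if processing falls behind, `put` blocks the fetcher until the consumer catches up.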
