Handling multiple calls to BeginExecuteNonQuery in SQL Server 2008

Posted on 2024-07-24 05:55:05

I have an application that receives a high volume of data that I want to store in a database. My current strategy is to fire off an asynchronous call (BeginExecuteNonQuery) for each record as soon as it's ready. I'm using the asynchronous call so that the rest of the application keeps running smoothly while the writes are in progress.

The problem is that as the volume of data increases, I eventually reach the point where I'm trying to issue a command on the connection while a previous command on it is still executing. I can see two possible options:

  1. Buffer the pending data myself until the existing command is finished.
  2. Open multiple connections as needed.

I'm not sure which of these options is best, or whether there is in fact a better way. Option 1 will probably lead to my buffer getting bigger and bigger, while option 2 may be very bad form; I just don't know.

Any help would be appreciated.
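The worry about option 1 can be made concrete with a little arithmetic: if a single connection executes one command at a time, the buffer grows every second by the difference between the arrival rate and the write rate. The following is a minimal Python model under simplifying assumptions (steady rates; the function name and the numbers used with it are hypothetical illustrations, not measurements from this application).

```python
def backlog_after(seconds: int, arrivals_per_sec: int, writes_per_sec: int) -> int:
    """Records still waiting in the buffer after `seconds` seconds, assuming one
    connection that completes commands one at a time at a steady rate."""
    backlog = 0
    for _ in range(seconds):
        backlog += arrivals_per_sec              # new records buffered this second
        backlog -= min(backlog, writes_per_sec)  # records the single connection wrote
    return backlog
```

For example, `backlog_after(60, 1200, 1000)` returns 12000: a surplus of 200 records/s piles up for a whole minute and keeps growing for as long as the rates stay the same. When the writer keeps up, as in `backlog_after(60, 800, 1000)`, the backlog stays at 0.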

Comments (2)

昇り龍 2024-07-31 05:55:05

Depending on your locking strategy, it may be worth using several connections, but certainly not an unbounded number of them. A good pattern here is a "thread pool": N dedicated threads, each holding its own connection, with each thread picking up the next pending write request as soon as it has finished the previous one. The number of threads that gives the best performance is best determined empirically, by benchmarking various pool sizes in a realistic experimental or prototype setup.
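A minimal Python sketch of that pattern, assuming a hypothetical `connect()` factory that returns an object with an `execute(record)` method (a stand-in for one `SqlConnection` plus its command; this is not a real driver API). `n_threads` is the N to tune by benchmarking, and the buffer between the main thread and the pool is a standard `queue.Queue`.

```python
import queue
import threading
from typing import Any, Callable, Protocol


class Connection(Protocol):
    """Hypothetical connection interface; stands in for one database connection."""
    def execute(self, record: Any) -> None: ...


_STOP = object()  # sentinel that tells a worker thread to exit


class WriterPool:
    """N dedicated threads, each owning one connection for its whole lifetime
    and picking up the next write request as soon as the previous one is done."""

    def __init__(self, n_threads: int, connect: Callable[[], Connection]) -> None:
        self.requests: "queue.Queue[Any]" = queue.Queue()
        # One connection per thread, so no connection ever runs two commands at once.
        self.threads = [
            threading.Thread(target=self._run, args=(connect(),), daemon=True)
            for _ in range(n_threads)
        ]
        for t in self.threads:
            t.start()

    def _run(self, conn: Connection) -> None:
        while True:
            record = self.requests.get()  # blocks until a request is queued
            try:
                if record is _STOP:
                    return
                conn.execute(record)      # error handling and retries omitted
            finally:
                self.requests.task_done()

    def submit(self, record: Any) -> None:
        """Called from the main thread; never waits on the database."""
        self.requests.put(record)

    def close(self) -> None:
        """Queue one stop sentinel per worker behind any pending writes, then
        wait for every worker to drain its share and exit."""
        for _ in self.threads:
            self.requests.put(_STOP)
        for t in self.threads:
            t.join()
```

Because each thread owns exactly one connection and runs one command at a time, the "connection still in use" situation cannot arise, and the number of open connections is capped at `n_threads` no matter how fast data arrives.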

If the "buffer" queue (into which your main thread puts write requests, and from which the dedicated pool threads pick them up) grows beyond a certain threshold, you're receiving data faster than you can possibly write it out. Unless you can get more resources, you'll simply have to drop some of the incoming data, perhaps with a random-sampling strategy so that you don't bias future statistical analyses. Keep a count of how much you write and how much you have to drop because of the resource shortage in each time period (say, every minute or so), so that you can apply "stratified sampling" techniques in future data-mining work.
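One way to implement the dropping-and-counting step is sketched below in Python. It assumes Bernoulli sampling: above a hypothetical backlog `threshold`, each incoming record is kept independently with probability `keep_probability`. Kept and dropped counts are recorded per time period, and the class name, parameters, and weighting method are illustrative assumptions, not an established API.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Optional


class SampledAdmission:
    """Decides whether an incoming record is queued for writing or dropped.

    Below `threshold` queued records everything is kept; at or above it each
    record is kept independently with probability `keep_probability`, so the
    survivors are a random sample rather than, say, only the earliest records.
    Kept/dropped counts are tallied per period for later stratified analysis."""

    def __init__(self, threshold: int, keep_probability: float,
                 seed: Optional[int] = None) -> None:
        if not 0.0 < keep_probability <= 1.0:
            raise ValueError("keep_probability must be in (0, 1]")
        self.threshold = threshold
        self.keep_probability = keep_probability
        self.rng = random.Random(seed)
        # period -> [kept, dropped]
        self.counts: DefaultDict[int, List[int]] = defaultdict(lambda: [0, 0])

    def admit(self, backlog: int, period: int) -> bool:
        """`backlog` is the current queue length; `period` identifies the time
        bucket, e.g. int(time.time() // 60) for one-minute buckets."""
        keep = backlog < self.threshold or self.rng.random() < self.keep_probability
        self.counts[period][0 if keep else 1] += 1
        return keep

    def weight(self, period: int) -> float:
        """Inverse of the observed keep rate in a period: each written record
        stands for this many received records when estimating totals."""
        kept, dropped = self.counts[period]
        if kept == 0:
            raise ValueError("no records were kept in this period")
        return (kept + dropped) / kept
```

The main thread would call `admit(q.qsize(), current_period)` before queueing each record. Treating each period as a stratum, records written in period `t` can later be weighted by `weight(t)`. Sampling only slows the backlog's growth: if `keep_probability` times the arrival rate still exceeds what the pool can write, the queue keeps growing, so the probability (or an additional hard cap) has to be chosen from the measured rates.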

旧伤慢歌 2024-07-31 05:55:05

Thanks Alex, so you'd suggest a hybrid approach then, on the assumption that I'll still need to buffer updates when all the connections are in use?

(I'm the original poster, I've just managed to get two accounts without realizing)
