Connection pooling - what is the overhead?
I am running a webapp inside WebSphere Application Server 6.1. This webapp has a kind of rules engine, where every rule obtains its own connection from the WebSphere data source pool. When a use case is run for 100 records of input, about 400-800 connections are obtained from the pool and released back to it. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it bad practice to obtain connections from the pool that frequently? What are the overhead costs involved in obtaining connections from the pool? My guess is that the costs should be minimal, as a pool is nothing but a resource cache. Please correct me if I am wrong.
Comments (6)
Connection pooling keeps your connections alive in anticipation of the next request. When another user needs the database, a ready connection is handed over and the database does not have to open a connection all over again.
This is a good idea because opening a connection is not a one-step operation. It takes many trips to the server (authentication, retrieval, status, etc.), so if you have a connection pool on your website, you are serving your customers faster.
Unless nobody visits your website, you can't afford not to have a connection pool working for you.
The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, you may need only a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one, though that somewhat negates the benefits of having a connection pool.
It also introduces multithreading and transaction isolation issues. If the connections are read-only, though, it might be an option.
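The connection wrapper idea could look roughly like the sketch below. `SharedConnectionScope` and its methods are hypothetical names, not part of WebSphere or JDBC, and the sketch assumes one scope per unit of work (for example, one input record) confined to a single thread, with rules that never call `close()` on the connection themselves.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/**
 * Hypothetical helper: hands the same pooled connection to every rule
 * that asks for one during a single unit of work.
 * Not thread-safe on purpose; use one scope per thread.
 */
final class SharedConnectionScope {
    private final DataSource dataSource;
    private Connection shared;   // checked out lazily, on the first request
    private int checkouts;       // how many times the pool was actually hit

    SharedConnectionScope(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Rules call this instead of dataSource.getConnection(). */
    Connection get() throws SQLException {
        if (shared == null) {
            shared = dataSource.getConnection();
            checkouts++;
        }
        return shared;
    }

    /** Number of real pool checkouts made through this scope. */
    int poolCheckouts() {
        return checkouts;
    }

    /** Call once, in a finally block, when the unit of work is done. */
    void close() throws SQLException {
        if (shared != null) {
            Connection c = shared;
            shared = null;
            c.close();           // for a pooled connection this returns it to the pool
        }
    }
}
```

Because every rule sees the same connection, they also share its transaction, which is exactly the isolation concern raised above.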
A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need one, then you are preventing that connection from being re-used somewhere else. If you have a lot of threads doing this, you must also run with a larger pool of connections to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain. There will be more reconnecting as the connections grow old, and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.
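To illustrate the point about pool size, here is a toy, single-threaded sketch. `TinyPool` is a hypothetical stand-in, not a real JDBC pool: it hands out plain objects and returns `null` when exhausted, where a real pool would normally block or time out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy pool of placeholder resources (hypothetical; single-threaded demo only). */
final class TinyPool {
    private final Deque<Object> idle = new ArrayDeque<Object>();

    TinyPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new Object());   // stand-in for a real connection
        }
    }

    /** Returns an idle resource, or null when the pool is exhausted. */
    Object borrow() {
        return idle.poll();
    }

    void giveBack(Object resource) {
        idle.push(resource);
    }

    /** Borrows and returns promptly; counts how many requests were served. */
    static int promptCycles(TinyPool pool, int cycles) {
        int served = 0;
        for (int i = 0; i < cycles; i++) {
            Object r = pool.borrow();
            if (r == null) {
                continue;              // pool exhausted, request not served
            }
            try {
                served++;              // "use" the resource
            } finally {
                pool.giveBack(r);      // release as soon as the work is done
            }
        }
        return served;
    }

    /** Holds every resource until the very end; counts requests served. */
    static int hoardingCycles(TinyPool pool, int cycles) {
        List<Object> held = new ArrayList<Object>();
        int served = 0;
        for (int i = 0; i < cycles; i++) {
            Object r = pool.borrow();
            if (r != null) {
                held.add(r);
                served++;
            }
        }
        for (Object r : held) {
            pool.giveBack(r);
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println(promptCycles(new TinyPool(2), 800));    // prints 800
        System.out.println(hoardingCycles(new TinyPool(2), 800));  // prints 2
    }
}
```

With prompt release, a pool of two serves all 800 requests. When resources are held until the end, the same pool serves only two requests before it is exhausted.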
To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K allocations per second or more, which is about 10 microseconds per allocation. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.
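As a back-of-the-envelope check, the figures above can be applied to the 800 checkouts mentioned in the question. This is arithmetic on the quoted estimates, not a measurement, and it assumes one database operation per checkout; `CheckoutCost` is a hypothetical name.

```java
/** Rough cost estimate using the per-operation figures quoted above. */
final class CheckoutCost {
    /** Total microseconds for `count` operations at `microsEach` apiece. */
    static long totalMicros(long count, long microsEach) {
        return count * microsEach;
    }

    public static void main(String[] args) {
        long checkouts = 800;   // upper figure from the question
        System.out.println("pool overhead: " + totalMicros(checkouts, 10) / 1000.0 + " ms");    // 8.0 ms
        System.out.println("db work (low): " + totalMicros(checkouts, 200) / 1000.0 + " ms");   // 160.0 ms
        System.out.println("db work (high): " + totalMicros(checkouts, 2000) / 1000.0 + " ms"); // 1600.0 ms
    }
}
```

On these numbers the pool accounts for roughly 8 ms, against 160 ms to 1.6 s spent actually using the connections.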
I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.
Could you provide more details on what exactly your rules engine does? If each rule "firing" performs data updates, you may want to verify that the connection is being properly released. Put the release in the finally block of your code to ensure that connections really are returned to the pool.
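A minimal sketch of the finally-block pattern, assuming a hypothetical `Rule` callback (the real engine's API is not shown in the question). `DataSource.getConnection()` and `Connection.close()` are standard JDBC; on a pooled data source, `close()` hands the connection back to the pool instead of physically closing it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical wrapper that guarantees release after each rule firing. */
final class RuleRunner {
    /** Stand-in for whatever a rule does with its connection. */
    interface Rule {
        void fire(Connection connection) throws SQLException;
    }

    static void fireRule(DataSource dataSource, Rule rule) throws SQLException {
        Connection connection = dataSource.getConnection();
        try {
            rule.fire(connection);
        } finally {
            // Runs whether fire() returned normally or threw,
            // so the connection always goes back to the pool.
            connection.close();
        }
    }
}
```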
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
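For the read-only case, caching could be as simple as the sketch below. `ReadThroughCache` and `Loader` are hypothetical names, and the sketch assumes the cached data does not change while the rules run, so it has no expiry or invalidation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-through cache: hits the loader only on a miss. */
final class ReadThroughCache<K, V> {
    /** Loads a value on a cache miss, for example by running a SELECT. */
    interface Loader<K, V> {
        V load(K key);
    }

    private final Map<K, V> values = new HashMap<K, V>();
    private final Loader<K, V> loader;
    private int misses;

    ReadThroughCache(Loader<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        V value = values.get(key);
        if (value == null) {
            // Miss: ask the loader once and remember a non-null result.
            value = loader.load(key);
            misses++;
            if (value != null) {
                values.put(key, value);
            }
        }
        return value;
    }

    /** Number of times the loader (and so the database) was consulted. */
    synchronized int misses() {
        return misses;
    }
}
```

Rules that look up the same reference data repeatedly would then need a connection only on the first lookup for each key.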
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.