Running a stored procedure through Java is 30% slower than running it directly on the database
I'm using Java 1.6, JTDS 1.2.2 (also just tried 1.2.4 to no avail) and SQL Server 2005 to create a CallableStatement to run a stored procedure (with no parameters). I am seeing the Java wrapper running the same stored procedure 30% slower than using SQL Server Management Studio. I've run the MS SQL profiler and there is little difference in I/O between the two processes, so I don't think it's related to query plan caching.
The stored proc takes no arguments and returns no data. It uses a server-side cursor to calculate the values that are needed to populate a table.
I can't see how calling a stored proc from Java should add a 30% overhead. Surely it's just a pipe to the database that SQL is sent down, and then the database executes it. Could the database be giving the Java app a different query plan?
I've posted to both the MSDN forums and the sourceforge JTDS forums (topic: "stored proc slower in JTDS than direct in DB"). Does anyone have any suggestions as to why this might be happening?
Thanks in advance,
-James
(N.B. Fear not, I will collate any answers I get in other forums together here once I find the solution)
Java code snippet:
sLogger.info("Preparing call...");
stmt = mCon.prepareCall("SP_WB200_POPULATE_TABLE_limited_rows");
sLogger.info("Call prepared. Executing procedure...");
stmt.executeQuery();
sLogger.info("Procedure complete.");
I have run SQL Profiler, and found the following:
Java app: CPU: 466,514  Reads: 142,478,387  Writes: 284,078  Duration: 983,796
SSMS:     CPU: 466,973  Reads: 142,440,401  Writes: 280,244  Duration: 769,851
(Both with DBCC DROPCLEANBUFFERS run prior to profiling, and both produce the correct number of rows)
So my conclusion is that they both execute the same reads and writes; it's just that the way they are doing it is different. What do you guys think?
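To put numbers on that comparison, here is a minimal sketch that computes the relative difference for each Profiler counter. The figures are copied from the output above; the class and method names are made up for illustration.

```java
import java.util.Locale;

class ProfilerDelta {
    /** Relative difference of a versus b, expressed as a percentage of b. */
    static double percentDiff(long a, long b) {
        return 100.0 * (a - b) / b;
    }

    public static void main(String[] args) {
        // Counters copied from the Profiler output in the question.
        String[] names = {"CPU", "Reads", "Writes", "Duration"};
        long[] javaApp = {466514L, 142478387L, 284078L, 983796L};
        long[] ssms = {466973L, 142440401L, 280244L, 769851L};

        for (int i = 0; i < names.length; i++) {
            // Positive values mean the Java run used more than the SSMS run.
            System.out.println(String.format(Locale.ROOT, "%-8s %+.2f%%",
                    names[i], percentDiff(javaApp[i], ssms[i])));
        }
    }
}
```

Duration comes out roughly 28% higher for the Java run, while CPU and reads differ by a small fraction of a percent and writes by a little over 1%. That fits the reading that the extra time is spent waiting rather than doing more work.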
It turns out that the query plans are significantly different for the two clients: the Java client updates an index during an insert, a step that is absent from the plan for the faster SQL client, and the joins are executed differently (nested loops vs. gather streams, nested loops vs. index scans, argh!). Quite why this is, I don't know yet (I'll re-post when I get to the bottom of it).
Epilogue
I couldn't get this to work properly. I tried homogenising the connection properties (arithabort, ansi_nulls etc.) between the Java and Management Studio clients. The two clients ended up with very similar query/execution plans (but still with different actual plan_ids). I posted a summary of what I found to the MSDN SQL Server forums, as I found differing performance not just between a JDBC client and Management Studio, but also between those and Microsoft's own command-line client, SQLCMD. I also checked some more radical things like network traffic, and tried wrapping the stored proc inside another stored proc, just for grins.
I have a feeling the problem lies somewhere in the way the cursor was being executed, and it was somehow giving rise to the Java process being suspended, but why a different client should give rise to this different locking/waiting behaviour when nothing else is running and the same execution plan is in operation is a little beyond my skills (I'm no DBA!).
As a result, I have decided that 4 days is enough of anyone's time to waste on something like this, so I will grudgingly code around it (if I'm honest, the stored procedure needed re-coding to be more incremental instead of re-calculating all data each week anyway), and chalk this one up to experience. I'll leave the question open. Big thanks to everyone who threw their hat in the ring; it was all useful, and if anyone comes up with anything further, I'd love to hear some more options. If anyone finds this post as a result of seeing this behaviour in their own environment, then hopefully there are some pointers here that you can try yourself, and hopefully you will see further than we did.
I'm ready for my weekend now!
-James
Comments (8)
You can attach the Profiler and monitor for the events SQL:BatchCompleted and SP:Completed, with a filter on duration > 1000. Run the procedure from your Java client and from SSMS. Compare the Reads and the Writes of the two events (Java vs. SSMS). Are they significantly different? This would indicate considerably different execution paths or plans, with significant difference in I/O.
Also try to capture the Showplan XML event of the two and compare the plans (save the event as a .sqlplan file and open it in SSMS for easy analysis). Do they have similar plans? Are there wild differences in Estimate vs. Actual (rows, rewinds, rebinds)? Do they have the same degree of parallelism? The plans can also be retrieved via the sys.dm_exec_requests view (its plan_handle column can be passed to sys.dm_exec_query_plan).
Are there any warning events raised, like Missing Column Statistics, Sort Warnings, Hash Warning, Execution Warnings, Blocked Process?
The point is that you have a whole arsenal of investigation tools at your disposal. Once you find the root cause of the difference, you can trace it down to what is different between your Java environment settings and the SSMS environment (ADO.Net SqlClient): things like the default transaction isolation level, ANSI settings, etc.
Checking: Is your problem that two applications (SSMS, Java) are making the exact same identical call to SQL Server, and SQL Server is acting differently for each? If so, I hit things like this every year or two, and they hurt my brain for days.
Once, I ultimately isolated each process call and logged everything for the entire process in Profiler. I eventually noticed that the Login event (under TextData) showed a host of information, like so:
The "Existing Connection" event will show this information as well--but, sometimes immediately subsequent calls (batches, RPCs, I disremember just now) are sent [ISQL or OSQL did this, I think] to immediately reset some of these -- Arithabort and Quoted_Identifier seem to be favorites, and other SET options also get modified depending on the settings or requirements of whatever connectivity protocols your application's database interface is using.
Another one: some settings are kept as attributes of a procedure at "create" time, and others are factored in at compile time. On the one hand, your connection's SET values may be being overwritten by the configuration saved at the time the procedure was created; on the other hand, your two connections may differ so much that two execution plans are generated for one procedure. (All of this information is, after sufficient research, available in the sys. tables and DMVs.)
In short, it seems to me that SQL obscurities are messing you up. To this day, I loathe all these goombah settings. Things below my notice keep messing around with them [I mean, really, what fool would set implicit_transaction for a connection pool on? But once they did...] and it's hard to build structures when the ground (rules) keep changing out from underneath you. After all, remember what the guy said about building castles in a swamp...
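One way to compare the SET options of two connections without reading Profiler traces is to run SELECT @@OPTIONS on each (from the Java connection and from an SSMS query window) and decode the resulting bitmask. The sketch below does the decoding and the diff, using the bit assignments documented for @@OPTIONS. The two sample values are illustrative only and were not measured on the original poster's system; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class SetOptions {
    // Option names for @@OPTIONS, lowest bit (value 1) first.
    private static final String[] NAMES = {
        "DISABLE_DEF_CNST_CHK", "IMPLICIT_TRANSACTIONS", "CURSOR_CLOSE_ON_COMMIT",
        "ANSI_WARNINGS", "ANSI_PADDING", "ANSI_NULLS", "ARITHABORT", "ARITHIGNORE",
        "QUOTED_IDENTIFIER", "NOCOUNT", "ANSI_NULL_DFLT_ON", "ANSI_NULL_DFLT_OFF",
        "CONCAT_NULL_YIELDS_NULL", "NUMERIC_ROUNDABORT", "XACT_ABORT"
    };

    /** Names of the options whose bits are set in the given @@OPTIONS value. */
    static List<String> decode(int options) {
        List<String> on = new ArrayList<String>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((options & (1 << bit)) != 0) {
                on.add(NAMES[bit]);
            }
        }
        return on;
    }

    /** Names of the options that are set on one connection but not the other. */
    static List<String> diff(int a, int b) {
        return decode(a ^ b);
    }

    public static void main(String[] args) {
        int first = 5496;   // illustrative value, e.g. pasted from one client
        int second = 5432;  // illustrative value, e.g. pasted from the other
        System.out.println("first:  " + decode(first));
        System.out.println("second: " + decode(second));
        System.out.println("differ: " + diff(first, second));
    }
}
```

With these sample values the only differing option is ARITHABORT. Whether that particular difference is what separates your two clients is something only your own @@OPTIONS values can tell you.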
I recall having a similar issue a while ago, because JTDS was silently converting a string parameter to Unicode or something similar. As a result of that conversion, SQL Server was unable to use the index which it was using when we ran the stored proc from SSMS.
HIH
Does the Java case include transmission of the results to the Java server (network overhead) plus some Java processing? A 12 minute query might produce quite a large amount of data.
If you are looking at the profiler and there is no difference between the executions then the difference must be with the client systems.
4 mins does seem like too long just to prepare a statement to send, so the 12 min wait must cause some other effect -- no idea what it is.
I am not sure if this post is still relevant. We faced a similar problem in our application.
One key difference between running a stored procedure in SQL Server Management Studio and running it from JDBC is the transaction context. If you are using an ORM in Java, by default the stored procedure runs in a transaction context. When you run a stored procedure directly in Management Studio, the transaction is off. There is a substantial performance difference.
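To check which transaction context the Java side is actually using, you could log the connection's auto-commit flag and isolation level just before the call. This is a minimal sketch using only the standard java.sql.Connection API; the TxContext class and its method names are made up for illustration, and a driver-specific isolation level (if your driver defines one) would show up as UNKNOWN.

```java
import java.sql.Connection;
import java.sql.SQLException;

class TxContext {
    /** Human-readable name for a java.sql.Connection isolation constant. */
    static String isolationName(int level) {
        switch (level) {
            case Connection.TRANSACTION_NONE:             return "NONE";
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "READ_UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "READ_COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "REPEATABLE_READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "SERIALIZABLE";
            default:                                      return "UNKNOWN(" + level + ")";
        }
    }

    /** One-line summary of the transaction settings on an open connection. */
    static String describe(Connection con) throws SQLException {
        return "autoCommit=" + con.getAutoCommit()
                + ", isolation=" + isolationName(con.getTransactionIsolation());
    }

    public static void main(String[] args) {
        // No database needed for this part: just show the constant mapping.
        System.out.println(isolationName(Connection.TRANSACTION_READ_COMMITTED));
    }
}
```

In the question's code this would be something like sLogger.info(TxContext.describe(mCon)) before prepareCall. If autoCommit turns out to be false, the whole procedure runs inside one open transaction on the Java side.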
Sorry, I've not found a correct answer to this, and I don't want to mark any of the other answers as correct, so I am going to mark this answer as correct, and wish good luck to anyone who comes across anything similar!
Did you know that Microsoft ship JDBC drivers for their databases?
These may be more performant.
Obviously, you may have resolved the problem by now.