SQL 事务的实际成本是多少?
这需要一些背景知识。我正在创建一个 Web 应用程序(带有 SQL Server 后端的 ASP.NET/VB.NET),它将接受申请以接收资金,并且此表单包含三个人的联系信息字段。由于三个人的信息相同(姓名、电子邮件、电话),因此我选择将联系信息存储在与应用程序分开的表中。在应用程序表中,这三个联系人中的每一个都有一个 ID 字段,该字段指向联系人表中的联系人。
然而,这在做我的 CRUD 工作时提出了一个有趣的问题。我能想到的在一个 SQL 事务中创建、更新和检索应用程序和所有三个联系人信息的唯一方法需要非常复杂的存储过程。换句话说,(对我来说)通过多个交易检索此信息会简单得多。但是,由于永远不需要独立地使用此信息,因此我总是会进行多个事务来获取一个应用程序的信息。
所以我的问题是:
- 这种设计是否太过分了?每个申请的联系人永远不会超过三个,而且必须正好是三个。我将此信息删除到单独的表中是否过于复杂?
- 与编写复杂的存储过程并且只需要一个事务相比,执行多个 SQL 事务的实际成本是多少?
- 一般来说,使用 ADO.NET 的 Web 应用程序的 SQL 事务的成本是多少?
感谢您耐心等待冗长的解释。
*编辑*
在阅读了您的一些回复后,我似乎错误地使用了“交易”一词。我实际上好奇的是通过单个连接执行多个查询与执行一个查询的成本。抱歉造成误解。
This requires a bit of background. I'm creating a web app (ASP.NET/VB.NET with SQL Server backend) that will accept an application to receive funding, and this form has contact info fields for three people. Because the information is the same for all three people (name, email, phone), I chose to store contact info in a table separate from the applications. In the application table, there is an ID field for each of these three contacts that points to a contact in the contact table.
However, this presents an interesting question/problem in doing my CRUD stuff. The only ways I can think of to create, update, and retrieve information for both the application and all three contacts in one SQL transaction require very complicated stored procedures. In other words, it would be much simpler (for me) to retrieve this information via multiple transactions. However, since this information is never needed independently, I would ALWAYS be doing multiple transactions to get information for one application.
So my questions:
- Is this design overkill? There will never be more than three contact people per application, and there MUST be exactly three. Am I over-complicating by removing this information to a separate table?
- What is the real cost of doing several SQL transactions vs. writing my complex stored procedures and only needing one transaction?
- Generally speaking, what is the cost of a SQL transaction to a web application using ADO.NET?
Thanks for bearing with that long-winded explanation.
*EDIT*
After reading some of your responses, it appears I am using the term "transaction" wrong. What I am actually curious about is the cost of executing multiple queries across a single connection, versus doing one query. Sorry for the misunderstanding.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
需要事务将数据库从一种一致状态移动到另一种一致状态。这里的“一致”适用于数据库的应用程序视图。典型的例子是两个账户之间的转账:您必须借记一个账户并贷记另一个账户。在这两次操作之间,数据库不一致,有一些钱“消失”了(从一个帐户中扣除的金额无处)。但在事务结束时,数据库再次保持一致。您需要一个事务来跨越这两个操作,以防止读者查看不一致的数据库,并在崩溃时确保数据库的一致性。
您说,为了在单个事务中处理逻辑中的多个项目,您需要复杂的过程。事实并非如此,事务范围与请求范围正交:客户端可以启动事务,通过在 3 个调用中调用 3 个过程来执行 3 个操作,然后提交事务。 不要求所有操作都在一个存储过程中完成。
因此事务不应导致显着的过程开销。事实上,程序会忽略事务。一个编写良好的过程在事务内部调用以及在事务之外调用时都应该表现正确。请参阅异常处理和嵌套事务获取模板对于在存在事务和异常的情况下正常运行的过程。
最后,交易的真实成本是多少?事务写入数据(读取并不真正需要事务),因此它们锁定修改的行。当尝试读取这些被阻塞的行时,读取通常会被阻塞,然后事务越长,它锁定行的时间就越长,并且阻塞的读取就越多。但有一个非常简单的解决方案:快照读取。快照读取确实是神奇的仙尘,它们允许应用程序不受阻塞行的阻碍,因为读取器始终可以读取阻塞更新之前的行版本行。请参阅使用基于行版本控制的隔离级别。
所以结论很简单:交易没有成本。仅仅是因为除了交易之外别无选择。不是“X 比 Y 慢”的问题,而是“X 是唯一正确的选择”的问题。
更新
编辑后:多个请求与一个请求的成本可能相当大。到服务器的往返(即在开放连接上发出请求)具有固定成本。如果您进行多次往返,则您需要为每个请求支付固定费用。如果您通过多个过程调用执行一个请求,则此固定成本只需支付一次。在非常的系统上,这种差异是可以测量的,并且会影响整体性能。但我谈论的是非常的系统,比如每秒数千个请求。解决方案通常不是使过程复杂化,而是在一个请求中发出多个过程调用:
但我必须说,除非您的工作负载非常热,否则实际上并不需要此解决方案。作为一个粗略的规则,对于任何低于 1000 个请求/秒的请求,我会考虑清晰代码的好处超过性能好处。
但如果您必须为每个请求打开一个新连接,则情况会有所不同。登录握手确实非常昂贵,可以以数百毫秒为单位来衡量。但解决方案很简单:使用连接池(在 ADO.Net 中默认启用),并且不要过早丢弃应用程序中的连接,保留连接并重用它,直到整个工作单元完成。
Transactions are needed to move the database from one consistent state into another consistent state. The 'consistent' here applies to the application view of the database. The typical example is the money transfer between two accounts: you have to debit one account and credit another account. In between these two operations the database is inconsistent, there is some money that have 'vanished' (the sum debited from one account is nowhere). But at the end of the transaction, the database is again consistent. You need a transaction to span these two operations in order to protect the readers from viewing an inconsistent database, and in order to ensure a consistent database in case of a crash.
You say that in order to process multiple items in your logic in a single transaction you require complicated procedures. That is not true, the transaction scope is orthogonal to the request scope: a client can start a transaction, do 3 operations by invoking 3 procedures in 3 calls, then commit the transaction. It is not required that all the operations be done in one single stored procedure.
So transaction should not induce significant procedure overhead. In fact, a procedure would be oblivious to transactions. A well written procedure should behave correctly when it's invoked inside of a transaction as well as when it's invoked w/o a transaction. See Exception handling and nested transactions for a template for procedures that behave correctly in presence of transactions and exceptions.
An finally, what is the real cost of a transaction? Transaction write data (read aren't really needing transactions) and as such they lock the rows modified. Reads normally block when attempting to read these blocked rows, and then the longer a transaction is, the longer it locks rows, and the more reads it blocks. But there is a really simple solution: snapshot reads. Snapshot reads are really a magic pixie dust, they allow applications to go ahead unhindered by blocked rows, because a reader can always read the row version prior to the update that is blocking the row. See Using Row Versioning-based Isolation Levels.
So the conclusion is simple: transactions have no cost. Simply because there is no alternative to transactions. Is not an issue of 'X is slower that Y', is an issue of 'X is the only correct alternative'.
Updated
After your edit: the cost oh having multiple requests vs. one requests can be significant. A round-trip to the server (ie. issuing a request on an open connection) has a fixed cost. If you do multiple round-trips then you pay this fixed cost on each request. If you do one single request with multiple procedure invocations then this fixed cost is only payed once. On very hot systems, this difference is measurable and has a cost on overall performance. But I'm talking about really hot systems, as in thousands of requests per second. The solution is usually not to complicate the procedures, but issue multiple procedure calls in one single request:
But I must say that this solution is not really required unless you have a really hot workload. As a back-of-the-envelop rule, for anything under 1000 requests/sec, I would consider the benefits of clear code to outweigh the performance benefits.
One thing different though if you have to open a new connection for each request. The login handshake is really expensive, can be measured in hundreds of ms. But the solution is trivial: use connection pooling (which is enabled by default in ADO.Net) and don't discard connections in the app prematurely, keep the connection and reuse it until the entire unit of work is finished.
当你说“交易”时,我认为你的意思只是“查询”。当我想要获取一个应用程序的数据时,我会使用两个单独的简单查询:
这
是两个单独的查询,但很重要。更重要的部分是以有意义的方式设计数据库。
When you say "transaction" I think you just mean "query." When I wanted to get the data for one application, I would use two separate, simple queries:
and
It's two separate queries but big deal. The more important part is designing the database in a way that makes sense.
您在这里使用“交易”一词有点不正确。我认为你指的是一个简单的数据库查询,ACID事务助手。
话虽如此,这很难说。默认情况下启用的连接池将无需打开三/四个单独的连接。
您甚至可能不需要多个查询来获取数据,并且可以连接两个表来检索它,例如:
或类似的东西,考虑到您的情况(从应用程序向选择列表添加新字段,从每个表返回更多数据)如果表索引正确,则查找速度非常快,并且可能根本不会影响性能。如果
您要插入/更新数据,那么是的,您将需要多个查询来执行此操作。但是数据的选择可以发生在一个表中,
当然,另一种选择是将数据放回到原始应用程序表中,从而首先不需要第二个表,此时不需要额外的连接或。还需要其他任何东西。
You're using the term 'transaction' a bit incorrectly here. I think what you're referring to is a simple database query, ACID transactions aide.
That said, it's difficult to say. Connection pooling, enabled by default, will prevent the need to open three/four separate connections.
You'd likely not even need multiple queries to get the data, and can join the two tables to retrieve it, such as:
or something similar, given your situation (adding new fields to the select list from app, returning more data from each contact table, etc. If the tables are indexed properly, the lookup are very quick, and will likely not impact performance at all.
If you're inserting/updating the data, then yes, you'll need multiple queries to do that, but the selecting of the data can occur in one.
Of course, the other option would be to put the data back into the original application table, removing the need for the second table in the first place. At which point, no extra joins or anything else is needed.
给你几个想法...
如果您当前的设计是两个表,第一个表(应用程序)作为
ID,contact_id1,contact_id2,contact_id3,otherinfo
第二个表作为联系人只是
contact_id , info
这种格式的最大优点是如果您有 1 个联系人这是在多个应用程序上...联系人“Bob”在 8 个不同的应用程序上是 contact_1 意味着 Bob 的信息仅记录一次,并且仅将他的 ID 记录在应用程序表上。最大的限制是每个应用程序有 3 个联系人被永久编码
或者..应用程序可以定义为
application_ID,信息
并联系为
application_ID, id_sequence, info
此设置的优点是可以为每个应用程序存储无限数量的联系人。
虽然一切都可以满足您的需求...如果您总是有 3 个联系人,那么这里的第二个选项有点不必要。如果您不打算拥有超过 1000 条记录,那么将所有信息存储在一个表中并不是最糟糕的主意(选项一主要消除冗余数据)。回答你的问题1...在某些情况下这是多余的,但答案将取决于你当前的要求和未来的要求。从“最佳实践”设计来看,2 个表设置是优先的。
只是对问题#2 的评论 - 这将是一个更简单的存储过程...我看不出它对您的设置太复杂。您可能可以创建一个可更新的视图(但不确定那里可能会出现什么限制或问题)。
Couple thoughts for you...
If your current design is two tables, first table (applciation) as
ID, contact_id1,contact_id2,contact_id3,otherinfo
second table as contact is simply
contact_id , info
Biggest advantage on this format is if you have 1 contact that is on multiple applications...Conatact 'Bob' is contact_1 on 8 different applcations means bob's info is only recorded once and only his ID is recorded on the application table. Biggest restriction is 3 contacts per application is permanently coded
Alternatively..application can be defined as
application_ID, Info
and conact as
application_ID, id_sequence, info
The advantage of this setup is an infinate number of contacts can be stored for each application.
Although everything to meet your need...if you always have 3 contacts, the second option here is sorta unnessacary. If you're not planning to have more than 1000 records around, having all the info stored in one table isn't the worst idea (option one primarily eliminates redundant data). Answer to your question 1...it's overkill in some cases, but the answer will depend on your current requirements and future requirements. From a 'best practice' design, the 2 table setup is preferential.
Just a comment for Question # 2 - It'll be a simpler stored proc...I can't see it being too complicated with your setup. You could probably create an updateable view (Not sure what limitation or problems might arise there though).
除了纯粹的处理成本之外,每个查询还将花费一次客户端-数据库-服务器往返时间。因此,如果您能够批量处理简单的单个查询,则可以节省往返时间。
Every single query will cost you one client-database-server round trip time in addition to the pure processing cost. So, you can save on round trip times if you manage to batch simple single queries.
使用 1 个存储过程来完成所有操作,因此您只需要打开 1 个 sql 连接。
Web 表单中的每个变量都映射到存储过程中的一个参数(包括隐藏的联系人和应用程序键,对于新应用程序默认为 0)。
在该过程中,如果记录键为 0,则该过程会将记录添加到数据库中,并且使用scope_identity()快速检索密钥。如果您向过程发送非零密钥,它将更新记录。然后,该过程返回所有数据,并将所有存储过程变量声明为输出。
然后,您可以将该过程包装在 sql 开始/结束事务中,这样它就是一个全有或全无的 sql 命令。
Use 1 stored procedure to do everything so you only need to open 1 sql connection.
Every variable in your web form maps to a parameter in the stored procedure (including the hidden contact and application keys which you default to 0 for new apps)
In the procedure, if the record keys are 0, the procedure adds records to the database and retrieves the keys quickly using scope_identity(). If you send non-zero keys to the procedure it updates the record. The procedure then returns all data back having all the stored procedure variables declared as output.
You can then wrap the procedure inside a sql begin/end transaction so it's an all or nothing sql command.