What are the problems with joining two tables from two different databases?
I am interested in your thoughts about the pitfalls of joining two or more tables from different databases. I'll try to give an example.
Suppose Table1 is located in DatabaseA and Table2 is located in DatabaseB.
Let's say I have a view in DatabaseA that pulls some data from Table1 and from some other tables in DatabaseA.
This view is used to push data to another database; let's call this one, unimaginatively, DatabaseC.
If I need some data from Table2, my instinct is to join Table2 directly in this view, something like this: table1 inner join DatabaseB..table2 on [some columns]
Doing this is pretty simple and quick, but I have a nagging voice in my head that keeps telling me not to do it. My worry is about not being able to track down all the objects that depend on Table2, so if I change something there, I have to be very careful and remember everywhere I use this table. It feels like breaking SRP for this view (and for the two databases), because the view can change as a result of two different actions performed on two different databases: changing Table1 or changing Table2.
I am interested in your opinions. Is this a good or a bad idea? What would be the problems with this approach (performance-wise, maintenance-wise and so on)? I would also like to hear about any real-world experience where this approach was either a big mistake or a life saver for you.
P.S.: I've searched for this topic on Google and SO, but could not find anything related to it. I will gladly take the downvotes, duplicate-question flags and other 'reprimands' from SO users just to get a different view on this problem.
P.P.S: I am using SQL Server 2005.
Thank you, and I hope I made myself clear :)
Comments (5)
Your nagging voice is probably right.
Not least of the problems will be how to enforce declarative referential integrity: you cannot create foreign keys between databases, so sooner or later you will have to cope with inconsistent, mismatched or incomplete data.
But if you don't care about that, I don't see a problem :-)
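Since no foreign key can guard the relationship, one way to find out whether the data has already drifted is to look for orphaned rows by hand. The following is only a sketch: the Table2Id and Id columns are assumptions made up for illustration, as the question does not name the join columns.

```sql
-- Hypothetical schema: Table1.Table2Id is meant to match Table2.Id,
-- but no foreign key can enforce that across databases.
SELECT t1.Id, t1.Table2Id
FROM DatabaseA.dbo.Table1 AS t1
LEFT JOIN DatabaseB.dbo.Table2 AS t2
    ON t2.Id = t1.Table2Id
WHERE t1.Table2Id IS NOT NULL
  AND t2.Id IS NULL;   -- no matching row in DatabaseB, so this row is an orphan
```

Running a check like this on a schedule will not prevent bad data, but it does tell you when it has appeared.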
Some general themes re cross-database joins:
Foreign keys
As others have pointed out, in the absence of foreign keys, you'll need to roll your own referential integrity. Not a problem in itself, but issues can surface when you're not in control of the data in one or more of the databases.
A related issue is the use of CASE tools. When reverse-engineering a schema, they will overlook links between tables where a FK->PK relationship doesn't exist.
Performance
If the databases are on different servers then you're exposed to the vagaries of whatever else is running on those servers, as well as the cost of running the join operation itself. Again, if the servers are all within your control, this is something you can monitor, but this may not be the case.
Coupling
If your solution relies on other databases you have multiple points of failure. If a database goes down, this could cascade to one or more systems.
Data modification
Your solution may be coupled to what you believe to be static data in tables on another database. However, what if this data were accidentally (or purposefully) amended, duplicated or deleted? Again, if the databases in question are out of your remit, other teams/departments may not be aware of how your system operates.
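One way to contain the coupling, sketched here rather than prescribed, is to hide the other database behind a synonym (available from SQL Server 2005), so that the cross-database name appears in exactly one place. The synonym name, view name and columns below are hypothetical.

```sql
USE DatabaseA;
GO

-- The only object that knows where Table2 really lives.
CREATE SYNONYM dbo.Table2_Remote FOR DatabaseB.dbo.Table2;
GO

-- The view refers to the synonym, never to DatabaseB directly.
-- Column names are assumptions for illustration.
CREATE VIEW dbo.vw_ExportToDatabaseC
AS
SELECT t1.Id, t1.SomeColumn, t2.OtherColumn
FROM dbo.Table1 AS t1
INNER JOIN dbo.Table2_Remote AS t2
    ON t2.Id = t1.Table2Id;
GO
```

If Table2 is later moved or renamed, only the synonym has to be recreated. It does not protect the view from changes to Table2's columns, so the dependency itself still needs to be documented.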
All this being true, there are many cases where cross-database joins are the norm. A few examples I've seen:
Mart-Repository
Performant operations take place on the mart whilst the master data stash is kept on the repository. CRUD operations take place between the two on a frequent or infrequent basis (nightly update, real-time etc).
Legacy DB
You might expose a legacy database for data migration and/or reporting/auditing purposes.
Lookup
One or more of your databases may contain static lookup information which can be re-used.
So to answer your question - it depends on what exactly you're doing and whether the risk is acceptable. Other solutions exist such as replication but again, how feasible this is will depend on the structure of your department/company.
The answer to your question is... it depends.
I have noticed that there is no serious degradation in performance when you keep the queries nice and simple (fewer joins, etc.).
The more complex the queries, the more chance that the optimizer will produce a suboptimal execution plan.
The optimizer ultimately gets to decide how to execute the query. The more complex the query, the more opportunity for the optimizer to get the order of operations "wrong".
I recently experimented with this problem...
I ran a query with roughly 8 joins on a single database. I then put up a copy of that database on the same server under a different name, and modified the query so that it would join to a couple of tables in the second copy of the database.
As a single-database query, it ran in under 3 seconds, which is what I expected given the volume of data.
The cross-database query ran in just under 3 minutes.
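If you want to repeat this kind of comparison on your own data, a rough sketch is to run both versions with I/O and timing statistics switched on and compare the output in the Messages tab. The database copy, tables and columns below are made up for illustration; they are not the ones from my test.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Single-database version (all names are hypothetical).
SELECT h.OrderId, d.LineTotal
FROM DatabaseA.dbo.OrderHeader AS h
INNER JOIN DatabaseA.dbo.OrderDetail AS d
    ON d.OrderId = h.OrderId;

-- Same join, with one table read from the copy of the database.
SELECT h.OrderId, d.LineTotal
FROM DatabaseA.dbo.OrderHeader AS h
INNER JOIN DatabaseA_Copy.dbo.OrderDetail AS d
    ON d.OrderId = h.OrderId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the actual execution plans of the two statements should show whether the optimizer chose a different join order or join type for the cross-database version.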
If they are on the same server, there is no real problem pulling from separate databases. In fact, you may want to separate them for good reasons. For instance, you might have a combination of transactional tables and lookup tables that are imported from files. The transactional data needs full recovery and frequent transaction log backups to be restored properly; the lookup data does not, and can benefit from being in a database in simple recovery mode.
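As a sketch of that split, assuming two databases with the made-up names TransactionsDb and LookupDb, the recovery model is set per database:

```sql
-- Database names are hypothetical.
ALTER DATABASE TransactionsDb SET RECOVERY FULL;    -- relies on regular log backups
ALTER DATABASE LookupDb       SET RECOVERY SIMPLE;  -- no log backups to manage

-- Confirm the current setting for each database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name IN ('TransactionsDb', 'LookupDb');
```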
Our applications use many different databases and we cross databases in queries all the time. As long as the indexing is done properly, there has been no noticeable performance difference. The biggest potential issue is data integrity, as you can't set up foreign keys across databases. This can be handled in triggers if need be, though.
Now, when the databases are on different servers, there can be a performance problem and getting the data is more complicated.
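A minimal sketch of such a trigger is below. It assumes Table1 has a Table2Id column that should match an Id column in DatabaseB's Table2; those column names and the trigger name are invented for the example.

```sql
USE DatabaseA;
GO

CREATE TRIGGER dbo.trg_Table1_CheckTable2
ON dbo.Table1
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the statement if any new or changed row points at a
    -- Table2 row that does not exist in DatabaseB.
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE i.Table2Id IS NOT NULL
          AND NOT EXISTS (
              SELECT 1
              FROM DatabaseB.dbo.Table2 AS t2
              WHERE t2.Id = i.Table2Id
          )
    )
    BEGIN
        RAISERROR('Table2Id has no matching row in DatabaseB.dbo.Table2.', 16, 1);
        ROLLBACK TRANSACTION;
        RETURN;
    END
END;
GO
```

This covers only one direction. A delete or key change in Table2 would need its own trigger in DatabaseB, which is part of why this is more work than a real foreign key.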
Like everything else in SQL, it depends.
At my job, we do this a LOT. We have very large data sets, with separate DBs for header-level and detail-level records, then additional DBs for reports or tables that we build off of other data, and so on.
There's not really a performance issue from joining across DBs, and in some cases depending on your hardware setup it may be FASTER. If DatabaseA and DatabaseB are on separate physical drives with different controllers, it will likely be faster to run a query joining those than if they were in the same DB on the same volume.
Maintenance can be an issue but no more than for any other database/tables. It's not like you have different versions of the same tables, you just have those tables in different DBs.
The only major drawback is that SQL Server does a poor job of showing cross-database dependencies, so you will need to keep track of these yourself. There are some scripts for this and also third-party utilities, and I have heard that SQL Server Denali will add additional support for this, but I'm not sure if that's accurate.
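One of the simpler scripts of this kind just searches the stored definitions of views, procedures, functions and triggers for the other database's name. This is a crude text match, not real dependency tracking, and the pattern below assumes the DatabaseB and Table2 names from the question.

```sql
USE DatabaseA;   -- repeat in every database that might reference Table2
GO

SELECT SCHEMA_NAME(o.schema_id) AS SchemaName,
       o.name                   AS ObjectName,
       o.type_desc              AS ObjectType
FROM sys.sql_modules AS m
INNER JOIN sys.objects AS o
    ON o.object_id = m.object_id
WHERE m.definition LIKE '%DatabaseB%Table2%'   -- matches DatabaseB..table2, [DatabaseB].[dbo].[Table2], etc.
ORDER BY SchemaName, ObjectName;
```

It can return false positives (for example a comment that mentions both names) and it cannot see encrypted modules, whose definition is NULL. From SQL Server 2008 onward, sys.sql_expression_dependencies records cross-database references directly, but that view is not available on 2005.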