在 SQL Server 中创建单独的数据库会给我带来更好的性能吗?

发布于 2024-11-14 10:46:37 字数 501 浏览 1 评论 0原文

总之,我是一名程序员,但对于这个特定的项目,我发现自己也是一名 DBA。这是我面临的场景:

拥有 400-1000 名客户的 Web 应用程序。客户是一个“实体公司”,每个公司都有n个用户。每个客户(公司)平均拥有 1GB 的数据(总计约 2 亿行)。就存储的数据类型而言,每家公司可能都有 80% 的相似数据。另外 20% 是公司可以自行定义的自定义数据(基本上是自定义字段)。

当您考虑到客户需要相当好的反应时间时,我试图找出以便宜的方式扩展此规模的最佳方法。例如,客户 X 可能希望获取姓氏为“smith”、电话为“555”的所有记录,而客户 Y 可能希望获取帐号等于“1526A”的所有记录。

最重要的是,性能是关键,我发现很难决定索引什么,以及考虑到这些人基本上可以通过 UI 创建自己的查询,这是否会对我有帮助。

我的问题是,你会怎么做?您认为将每个客户分为自己的数据库是明智的吗?目前数据库总大小约为 400GB。

这是一个完整的重写,所以我有幸能够在需要时重新开始。任何想法、提示将不胜感激。

All, I'm a programmer by trade but for this particular project I'm finidng myself being the DBA as well. Here is the scenario I'm faced with:

Web app with anywhere from 400-1000 customers. A customer is a "physical company", each of which has n-number of uers. Each customer (company) has on average 1GB worth of data (total of about 200 million rows). Each company has probably 80% similar data in terms of the type of data stored. The other 20% is custom data that the companies can themselves define (basically custom fields).

I am trying to figure out the best way to scale this on the cheap when you conisder that the customers need pretty good reaction time. For example, customer X might want to grab all records where last name like 'smith' and phone like '555' where as customer Y might want to grab all records where account number equals '1526A'.

Bottom line, performance is key and I'm finding it hard to decide what to index and if that is even going to help me given the fact these guys can basically create their own query through the UI.

My question is, what would you do? Do you think it would be wise to break each customer out into it's own DB? Total DB size at the moment is around 400GB.

It is a complete re-write so I have the fortune of being able to start fresh if needed. Any thoughts, hints would be greatly appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

不醒的梦 2024-11-21 10:46:37

底线,性能是关键
我发现很难决定该做什么
索引,如果那是的话
帮助我,因为这些人可以
基本上创建自己的查询
通过用户界面。

底线是,您将数据库性能拱手让给客户的突发奇想。如果他们能够“创建自己的查询”,那么他们就能够“创建自己的非常糟糕的查询”。

因此,如果您在共享环境(即相同的硬件)中运行此程序,那么客户 A 糟糕的表扫描可能会导致其他所有人的 I/O 饱和。

如果它们位于同一数据库服务器上,则客户 A 的扫描将刷新数据缓存中的所有其他客户数据。

基本上,您“分享”的越多,一个客户对其他客户的运营的影响就越大。如果你让客户有能力做昂贵的事情,并分享其中的大部分,那么每个人都会受苦。

因此,选择是:a)不要让顾客做愚蠢的事情,或者 b)尽可能将顾客分开,这样当一个人做了愚蠢的事情时,其他顾客的电话就不会亮起来。

如果您不知道“对什么进行索引”,那么您就无法对客户可以做什么提供太多控制,因此愚蠢的事情因素就会增加。

通过提供几种流行的预制 SQL 视图供客户选择,您可能会取得很大的进展,然后他们只能简单地过滤结果并可能对结果进行排序。然后,您可以围绕这些视图的执行进行优化。

令人惊讶的是,很少有“一般”视图可以涵盖大量用例。

通用的、愚蠢的查询可以委托给在夜间、下班时间运行的批处理过程,或者委托给不影响事务性能的单独机器,例如包含“除了今天的数据之外的所有数据”的夜间快照。让他们针对该问题运行历史查询。

Bottom line, performance is key and
I'm finding it hard to decide what to
index and if that is even going to
help me given the fact these guys can
basically create their own query
through the UI.

Bottom line, you're ceding your DB performance to the whims of your clients. If they're able to "create their own query", then they're able to "create their own REALLY BAD queries".

So, if you run this in a shared environment (i.e. the same hardware), then customer A's awful table scans can saturate the I/O for everyone else.

If they're on the same database server, then Customer A's scans get to flush all of your other customers data from the data cache.

Basically, the more you "share", the more one customer can impact the operations of other customers. If you give customers the capability to do expensive things, and share much of it, then everyone suffers.

So, the options are a) don't let the customers do silly things or b) keep the customers as separated as practical so that when one does do silly things, the phones don't light up from all of the other customers.

If you don't know "what to index" then you are not offering much control over what the customers can do, and thus the silly thing factor goes way up.

You would probably get quite far by offering several popular, pre-made SQL views that the customers can select from, and then they're limited to simply filtering and possibly ordering the results. Then you optimize around execution of those views.

It's likely that surprisingly few "general" views can cover a large amount of the use cases.

Generic, silly queries can be delegated to a batch process that runs overnight, during off hours, or to a separate machine that doesn't impact transactional performance, such as a nightly snapshot with "everything but todays data" on it. Let them run historic queries against that.

甜味拾荒者 2024-11-21 10:46:37

SO问题如何设计多租户数据库有一篇不错的文章的链接,内容涉及从“不共享任何内容”到“共享一切”的权衡。另外,SO 有一个针对此类问题的标签;我给你加了。

The SO question How to design a multi tenant database has a link to a decent article on the tradeoffs along the spectrum from "shared nothing" to "shared everything". Also, SO has a tag for those kinds of questions; I added it for you.

红焚 2024-11-21 10:46:37

在同一服务器上创建单独的数据库不会帮助您获得更好的性能。使用多个数据库可获得的性能优化与使用一个数据库可获得的性能优化相同。

出于管理原因,单独的数据库可能是有意义的 - 例如,如果不同的备份或可用性要求适用于不同的客户。

构建应用程序以支持多个数据库可能仍然是明智的,这样您就可以选择在多个数据库服务器上进行扩展。

Creating separate databases on the same server won't help you get better performance. The performance optimisations available to you with multiple databases are just the same as you can achieve with one database.

Separate databases might make sense for administrative reasons - if different backup or availability requirements apply to different customers for example.

It's still probably sensible to build your application so that it can support multiple databases so that you have the option of scaling out over multiple DB servers.

挽手叙旧 2024-11-21 10:46:37

如果您有单独的数据库,那么随着时间的推移,80% 的数据库几乎不可能保持相同。您最终将花费更多的钱进行维护。

幸运的是,SQL Server 为您提供了一些选项。首先将客户特定信息放在同一数据库中的单独模式中,将通用信息放在不同的模式中(为每个客户创建一个通用模式和一个模式)。

接下来由客户端设置数据分区。这可能需要适当的硬件才能有效地完成此操作。

现在,您拥有一个通用代码库,它将立即向所有客户端传播更改,并且使用分区将客户端分开以提高性能。

If you have seperate databases the 80% that is the same beciomes almost impossible to keep the same over time. YOu will end up spending far more money for maintenance.

Luckly SQL Server has some options for you. First put the customer sspeicifc information in the same database in a separate schema and the common stuff in a differnt schema(create a common schema and a schema for each client).

Next set up data partitioning by client. This can require the proper hardware to do this effectively.

Now you have one code base for common which will promugate changes to all clients at once and clients are separated for performance using the partitions.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文