Getting a large amount of data into a SQL Server (Express) database at deployment time

Posted 2024-08-16 08:06:39

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.

I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.

I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.

I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.

Questions

1) In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?
2) What method do you recommend for populating your database with startup or test data and why?

Clarification

Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?

Which leads me to...

New questions

How would you write a SQL script to take the country data on this page and insert it into the database?
With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.
What if you later realized you needed a new column, CapitalCity?
With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.

I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern: how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time-consuming and error-prone. Somehow, this script needs to be machine generated, but how?
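One possible way to machine-generate such a script, sketched in Python under some loud assumptions: the data has already been saved as CSV text with a header row, the table is a hypothetical `dbo.Country`, and the column names are simple enough to wrap in square brackets. Every value is emitted as a Unicode string literal, which relies on SQL Server converting it to the column's type on insert. None of this comes from the original question; it is only an illustration.

```python
import csv
import io


def sql_literal(value):
    """Render one CSV cell as a T-SQL literal; empty cells become NULL."""
    if value == "":
        return "NULL"
    # Double any single quotes and wrap the text in an N'...' Unicode literal.
    return "N'" + value.replace("'", "''") + "'"


def generate_inserts(table, csv_text):
    """Return one INSERT statement per data row; the first CSV row names the columns."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    columns = ", ".join("[" + name + "]" for name in header)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(cell) for cell in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


# Hypothetical sample data standing in for a table pasted from a web page.
SAMPLE = "CountryCode,CountryName\nFR,France\nCI,Cote d'Ivoire\n"

script = generate_inserts("dbo.Country", SAMPLE)
print("\n".join(script))
```

The generated text can be saved as a `.sql` file and kept under source control next to the schema scripts. Adding a column such as CapitalCity would then mean adding a column to the CSV and regenerating.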

反差帅 2024-08-23 08:06:39

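I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and it works for you. If you have a good database design with adequate constraints, then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts, but why bother?

SSIS has a steep learning curve, doesn't work well with source control (it's impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also a problem with how many rows it reads ahead to determine the data type: if your first x rows contain numbers stored as text, you're in deep trouble.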
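To illustrate the type-guessing problem, here is a minimal Python sketch (not SSIS) of the opposite approach: read every cell as text and convert it against an explicitly declared type per column, failing loudly on the first bad value. The column names, the `COLUMN_TYPES` mapping, and the sample rows are all hypothetical, and the sketch assumes every row has every declared column.

```python
import csv
import io

# Hypothetical column declarations: name -> converter applied to every cell.
COLUMN_TYPES = {"ColorId": int, "ColorName": str, "HexCode": str}


def load_typed_rows(csv_text, column_types):
    """Read every cell as text, then convert it with the declared type for its column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    typed_rows = []
    for line_number, record in enumerate(reader, start=2):  # line 1 is the header
        typed = {}
        for name, convert in column_types.items():
            try:
                typed[name] = convert(record[name].strip())
            except ValueError as exc:
                # Report where the bad value is instead of silently guessing a type.
                raise ValueError(f"line {line_number}, column {name}: {exc}") from exc
        typed_rows.append(typed)
    return typed_rows


SAMPLE = "ColorId,ColorName,HexCode\n1,Red,FF0000\n2,Green,00FF00\n"
colors = load_typed_rows(SAMPLE, COLUMN_TYPES)
print(colors[1])
```

Because `HexCode` is declared as text, a value like `00FF00` keeps its leading zeros no matter what the first few rows look like.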


仄言 2024-08-23 08:06:39

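1) I prefer scripts for several reasons.

• Scripts are easy to modify, and by the time I'm ready to deploy the application to production, the scripts are already written, so everything is ready to go.

• If I need to deploy the database to a different platform (e.g. Oracle or MySQL), it's easy to make minor modifications to the scripts so they run on the target database.

• With scripts, I don't depend on a tool like Visual Studio to build and maintain the database.

2) I like good old-fashioned INSERT statements in a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy an application we have to have scripts ready for the DBAs to run, because that's what they expect.

I just find that scripts are simple, easy to maintain, and the "lowest common denominator" when it comes to creating a database and loading data into it. By lowest common denominator, I mean that most people (i.e. DBAs, or others in your shop who may not have Visual Studio) will be able to use them without trouble.

The other important thing about scripts is that they force you to learn SQL, and more specifically DDL (data definition language). While hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside and out. I've found those skills invaluable in just about any shop.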


无可置疑 2024-08-23 08:06:39

Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.

I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.

I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.

On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.

Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.
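As a rough illustration of auto-generating test data with a script, here is a small Python sketch. The `dbo.Customer` table, its columns, and the country codes are hypothetical; the fixed seed is there so that every run produces the same rows, which keeps the generated script repeatable.

```python
import random


def make_test_customers(count, seed=42):
    """Build `count` fake customer rows; a fixed seed makes every run identical."""
    rng = random.Random(seed)
    countries = ["FR", "DE", "JP", "BR"]  # hypothetical lookup values
    rows = []
    for customer_id in range(1, count + 1):
        rows.append((customer_id, f"Customer {customer_id:05d}", rng.choice(countries)))
    return rows


def to_insert(row):
    """Format one generated row as an INSERT statement for the hypothetical table."""
    customer_id, name, country = row
    return ("INSERT INTO dbo.Customer (CustomerId, Name, CountryCode) "
            f"VALUES ({customer_id}, N'{name}', N'{country}');")


rows = make_test_customers(1000)
script_lines = [to_insert(r) for r in rows]
print(script_lines[0])
```

Changing `count` is all it takes to scale the data set up or down, so nothing has to be maintained by hand.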

If your problem is that the test or startup data comes from an external source - a web page, a CSV file, etc. - then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPs as in Excel; the database can actually enforce them with foreign keys.
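The difference between validating references and enforcing them can be shown with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a SQL Server configuration database (an assumption made so the example is self-contained), and the `Country` and `City` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE Country (CountryCode TEXT PRIMARY KEY, CountryName TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE City ("
    " CityName TEXT NOT NULL,"
    " CountryCode TEXT NOT NULL REFERENCES Country (CountryCode))"
)
conn.execute("INSERT INTO Country VALUES ('FR', 'France')")
conn.execute("INSERT INTO City VALUES ('Paris', 'FR')")  # valid reference

try:
    conn.execute("INSERT INTO City VALUES ('Atlantis', 'ZZ')")  # no such country
    rejected = False
except sqlite3.IntegrityError:
    # The database refuses the row; no lookup formula is needed to catch it.
    rejected = True

city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(rejected, city_count)
```

In SQL Server the same effect comes from a FOREIGN KEY constraint on the configuration table, which is enforced by default.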

• Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database. If you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.

• If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.

• When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know: right-click the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.

It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
