MS-SQL Server 2000 全文索引速度慢

发布于 2024-07-22 21:19:39 字数 635 浏览 4 评论 0原文

我们在一个包含 633,569 条记录的相当大的表上有一个全文索引。 在运行大量删除/插入记录的 DTS 包之后,每天晚上都会从头开始重建索引,作为维护计划的一部分。 大块数据被删除,然后插入(以处理更新和插入),因此增量索引是不可能的。 将包更改为仅在必要时删除也是不可能的,因为它是最终将被替换的遗留应用程序。

FTI 包括两列 - 一列是 varchar(50) not null,另一列是 varchar(255) null。

主键列上有一个聚集索引,它只是一个标识列。 上面提到的整数列和 varchar(50) 列上还有一个组合索引。 出于性能原因添加后一个索引。

问题是重新索引的速度非常慢 - 大约需要 8 个小时。

该服务器相当强大(双处理器,4GB RAM),并且除了重新索引之外,一切都运行得很快。

关于如何加快速度有什么建议吗?

更新

我们的客户端可以访问 sql 框。 事实证明,他们在作为全文索引一部分的表上打开了更改跟踪。 我们将其关闭,整个填充过程只用了不到 3 个小时。 仍然不是很好,但比 8 好。

更新 2

FTI 再次需要大约 8 小时来填充。

We have a full text index on a fairly large table of 633,569 records. The index is rebuilt from scratch as part of a maintenance plan every evening, after a bunch of DTS packages run that delete / insert records. Large chunks of data are deleted, then inserted (to take care of updates and inserts), so incremental indexing is not a possibility. Changing the packages to only delete when necessary is not a possibility either as it is a legacy application that will eventually be replaced.

The FTI includes two columns - one a varchar(50) not null and a varchar(255) null.

There is a clustered index on the primary key column, which is just an identity column. There is also an combined index on an integer column and the varchar(50) column mentioned above. This latter index was added for performance reasons.

The problem is that the re-indexing is painfully slow - about 8 hours.

The server is fairly robust (dual processor, 4gb of ram), and everything runs quickly beyond this re-indexing.

Any tips on how to speed this up?

UPDATE

Our client has access to the sql box. Turns out they turned on change tracking on the table that is part of the full text index. We turned this off, and the full population took less than 3 hours. Still not great, but better than 8.

UPDATE 2

The FTI is again taking ~8 hours to populate.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

流年已逝 2024-07-29 21:19:39

SQL Server 的索引速度慢主要是因为它的异步数据提取方案。

  • 通过“更新”来使用更改跟踪

提高全文索引性能的最简单方法是通过“后台更新索引”选项来使用更改跟踪。当您对表建立索引时(FTI 与“标准”SQL 索引一样,适用 在每个表的基础上),您可以指定完全填充、增量填充或更改跟踪。当您选择完全填充时,全文索引表中的每一行都会被提取并建立索引。这是一个两步过程。首先

,您(或企业管理器)运行此系统存储过程:

sp_fulltext_getdata CatalogID, object_id

将所有时间戳和 PK 值的所有结果集返回到 MSSearch 后,MSSearch 将发出另一个 sp_fulltext_getdata,但这一次,针对表中的每一行发出一次。因此,如果您的数据库中有 5000 万行,则此过程将发出 5000 万次。

另一方面,如果您使用增量填充,MSSearch 将为

sp_fulltext_getdata CatalogID, object_id

表中已满的每一行 发出一个初始信号: -text 索引。因此,如果数据库中有 5000 万行,该语句也将发出 5000 万次。 为什么? 因为即使填充量不断增加,MSSearch 也必须准确找出哪些行已更改、更新和删除。 增量填充的另一个问题是,即使对未建立全文索引的列进行了更改,它们也会对行进行索引或重新索引。

虽然增量填充通常比完全填充更快,但您可以看到,对于大型表,这两种方法都非常耗时。

我建议您通过后台或计划更新启用更改跟踪。 如果这样做,您将看到 MSSearch 将首先发出另一个:

sp_fulltext_getdata CatalogID, object_id

对于表中启用了更改跟踪的每一行。然后,对于具有全文索引的列的每一行,并且该列在初始完整索引后进行了修改人口,行信息将被写入(在您正在索引的数据库中)到 sysfulltextnotify 表中。 然后,MSSearch 将仅针对此表中出现的行发出以下命令,然后将它们从 sysfulltextnotify 表中删除。

  • 考虑使用单独的构建
    服务器

在建立索引时大量更新的表可能会产生锁定问题,因此,如果您可以忍受定期过期的目录和有时不可用的 MSSearch 引擎,请考虑使用单独的构建服务器。 您可以通过确保索引服务器具有要全文索引的表的副本并导出目录来完成此操作。显然,如果您需要实时或近实时更新目录,这不是一个好的解决方案

  • 当人口减少时限制活动
    running

当填充正在运行时,不要运行 Profiler,并尽可能限制其他数据库活动。 Profiler 消耗大量资源。

  • 增加线程数
    索引进程

增加为索引进程运行的线程数。 默认值只有 5 个,在四路或 8 路上,您可以将其提高到更高的值。 然而,如果 MSSearch 从 SQL Server 获取过多数据,它会自我限制,因此请避免在单处理器或双处理器系统上执行此操作。

  • 停止任何防病毒软件或打开
    文件代理备份软件。

如果这不可能,请尝试阻止它们扫描 SQL FTI 使用的临时目录和目录目录

  • 将目录、临时目录和
    将目录放在自己的

控制器上,最好放在 RAID-1 阵列上。将临时目录放在 RAID-1 阵列上。 同样,考虑将页面文件放在具有自己的控制器的自己的 RAID-1 阵列上。

  • 考虑创建辅助数据
    临时 DB 文件 - 每个 CPU 1 个 /
    核。

SQL Server's indexing is slow primarily because of its asynchronous data extraction scheme.

  • Use change tracking with the "update
    index in background" option.

The easiest way to improve the performance of full-text indexing is to use change tracking with the "update index in background" option.When you index a table (FTI, like "standard" SQL indexes, works on a per-table basis), you specify full population, incremental population, or change tracking. When you opt for full population, every row in the table you're full-text indexing is extracted and indexed. This is a two-step process.

First, you (or Enterprise Manager) run this system stored procedure:

sp_fulltext_getdata CatalogID, object_id

After all the results sets of all of the timestamps and PK values are returned to MSSearch, MSSearch will issue another sp_fulltext_getdata, but this time, once for every row in your table.So if you have 50 million rows in your database, this procedure will be issued 50 million times.

On the other hand, if you use an incremental population, MSSearch will issue an initial:

sp_fulltext_getdata CatalogID, object_id

for each row in the table that you're full-text indexing. So if you have 50 million rows in your database, this statement will also be issued 50 million times. Why? Because even with an incremental population, MSSearch must figure out exactly which rows have been changed, updated, and deleted. Another problem with incremental populations is that they'll index or re-index a row even if the change was made to a column that you aren't full-text indexing.

Although an incremental population is generally faster than a full population, you can see that for large tables, either will be time-consuming.

I recommend you enable change tracking with background or scheduled updating. If you do, you'll see that MSSearch will first issue another:

sp_fulltext_getdata CatalogID, object_id

for every row in the table with change tracking enabled.Then, for every row that has a column that you're full-text indexing and that's modified after your initial full population, the row information will be written (in the database you're indexing) to the sysfulltextnotify table. MSSearch will then issue the following only for the rows that apear in this table and will then remove them from the sysfulltextnotify table.

  • Consider using a separate build
    server

Tables that are heavily updated while you're indexing can create locking problems, so if you can live with a catalog that's periodically out of date and an MSSearch engine that's sometimes unavailable consider using a separate build server. You do this by making sure the indexing server has a copy of the table to be full-text indexed and exporting the catalog .Clearly, if you need real-time or near real-time updates to your catalog, this is not a good solution

  • Limit activity when population is
    running

When population is running, don't run Profiler, and limit other database activity as much as possible. Profiler consumes significant resources.

  • Increase the number of threads for
    the indexing process

Increase the number of threads you're running for the indexing process. The default is only five, and on quads or 8-ways, you can bump this up to much higher values. MSSearch will, however, throttle itself if it's slurping too much data from SQL Server, so avoid doing this on single- or dual-processor systems.

  • Stop any anti-virus or open
    file-agent backup software.

If this is not possible, try to prevent them from scanning the temporary directories being used by SQL FTI and the catalog directories

  • Place the catalog,temp directory and
    pagefiles on their own controllers

If you can make that investment.Place the catalog on its own controller, preferably on a RAID-1 array.Place the temp directory on a RAID-1 array. Similarly, consider putting pagefile on its own RAID-1 array with its own controller.

  • Consider creating secondary data
    files for the Temp DB - 1 per CPU /
    Core.
寄离 2024-07-29 21:19:39
  • 你有足够的内存吗?

  • 您的文件驱动器在 RAID 配置方面的布局是什么?

  • 您是否看到高 tempDB 活动?

(顺便说一句,五十万条记录并不大;甚至不算中等......;))

  • Do you have enough RAM?

  • What are your file drive placements in terms of RAID configuration?

  • Are you seeing high tempDB activity?

(BTW, half a million records is not large; it's not even medium... ;) )

傾城如夢未必闌珊 2024-07-29 21:19:39

当您进行重新索引或实时运行时,系统是否处于离线状态?

这些是您的全文目录中唯一的项目吗? 如果不是,您可能需要考虑将它们与 FTS 数据的其余部分分开。 (也可能有助于监控)索引中的身份列是否配置为唯一键?

您能量化大量的变化吗? 重新填充有 3 个基本选项; 您可能想尝试切换到完整或增量,因为其中一种可能比您现在使用的更适合您。 根据我的经验,如果总数据库的更改小于 40%,则增量效果很好(在将大数据导入数据库期间也有类似的问题。)如果更改大于 40%,则完整可能会更好(根据我的经验 - i 索引)文档,因此它对您来说可能会有所不同)您可能要考虑的第三个选项是尝试使用计划更新重新索引选项进行更改跟踪。

如果您可以使服务器对用户离线,那么在重新索引时您运行 FTS 的性能设置是什么? 您可以检查此全文搜索服务属性/性能选项卡 - 系统资源使用情况作为滑块(认为有 4 或 5 个位置)。 可能有一个系统进程来改变这个,我不知道,也没有 2000 机器来检查了。

FTS / Reindexing 喜欢 ram,而且非常多; 一般经验法则是虚拟内存是物理内存的 3 倍; 如果您有多个物理磁盘,则创建多个 Pagefile.sys 文件,以便每个 Pagefile.sys 文件将放置在其自己的物理磁盘上。 您使用的是 NT 还是 Windows 2000 吗? 检查超过 2GB 的扩展内存是否实际配置正确。

Is the system offline whilst you are doing the reindex or live ?

Are these the only items in your full text catalog; if not you might want to consider separating them out from the remainder of your FTS data. (Might help with monitoring too) In the index is the identity column configured as the unique key ?

Can you quantify the large amounts of changes? There are 3 basic options for repopulation; You might want to try switching to full or incremental as one may suit you better than the one you are using now. In my experience incremental works well if changes to the total DB are less than 40% (had a similar issue during large data take ons into the database.) If >40% change then full is likely better (from my experience - i index documents so it might work differently for you) The third option you might want to consider try the Change Tracking with scheduled update reindex option.

If you can take the server off-line to users then what performance settings do you have FTS running under whilst reindexing? You can check this Full-Text Search Service Properties / Performance tab - System Resource Usage as a slider (think there are 4 or 5 positions). There is probably a system proc to change this dont know it and dont have a 2000 machine to check anymore.

FTS / Reindexing loves ram and lots of it; the general rule of thumb is have virtual memory 3x the physical memory; if you have several physical disks then create several Pagefile.sys files, so that each Pagefile.sys file will be placed on its own physical disk. Are you on NT or Windows 2000 ? check that extended memory over 2gb is actually configured properly.

凉月流沐 2024-07-29 21:19:39

尝试将索引放在与数据库不同的物理磁盘上。

编辑:斯科特报告说情况已经如此。

Try putting the index on a separate physical disk than the database.

EDIT: Scott reports this is already the case.

夜访吸血鬼 2024-07-29 21:19:39

禁止当前列中存在空值可能不会加快索引速度,但根据我的经验,这是一个更好的做法,特别是对于索引目的。 我唯一可以证明允许空值的列是日期列。

Disallowing nulls in the column that currently does might not speed up the index, but in my experience is a better practice, especially for indexing purposes. The only columns I can justify allowing nulls in are date columns.

无畏 2024-07-29 21:19:39

以下是SQL Server 上 FT 索引性能参数清单 。 其中大部分已在此处引用和检查。 但我在你的评论中没有找到其中之一:

应手动设置 SQL Server MAX SERVER MEMORY 设置(关闭动态内存分配),以便为全文搜索服务运行留下足够的虚拟内存。 要实现此目的,请选择“最大服务器内存”设置,该设置一旦设置,就会保留足够的虚拟内存,以便全文搜索服务能够访问等于服务器中物理 RAM 量 1.5 倍的虚拟内存量。 这将需要一些尝试和错误才能实现此设置。

Here is a checklist of parameters for FT-indexing performance on SQL Server. Most of them are already quoted and checked here. I don't find one of them on your comments though:

The SQL Server MAX SERVER MEMORY setting should be set manually (dynamic memory allocation is turned off) so that enough virtual memory is left for the Full-Text Search service to run. To achieve this, select a MAX SERVER MEMORY setting that once set, leaves enough virtual memory so that the Full-Text Search service is able to access an amount of virtual memory equal to 1.5 times the amount of physical RAM in the server. This will take some trial and error to achieve this setting.

暖伴 2024-07-29 21:19:39

Improve the Performance of Full-Text Indexes: http://msdn.microsoft.com/en-us/library/ms142560.aspx

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文