Bulk copy - Sybase

Posted on 2024-09-29 23:55:03

I need to selectively (both rows and columns) export around 20 million rows from one table to another. This is what I tried:

-- Run this in batch:
INSERT INTO Table2
SELECT A, B FROM Table1
WHERE A > a AND B < b

-- Table1 has columns A, B ... Z and holds around 50 million records.

It takes around 12 hours to finish. I don't think Sybase allows a bcp out of selected columns and rows from Table1 followed by a bcp in to Table2. Is there a faster alternative? I would be happy if it could be done in under 4 hours.

Thanks for reading it.

站稳脚跟 2024-10-06 23:55:03

I think you mean:

WHERE PK > start_value AND PK < end_value

There is no good reason to duplicate data in two tables on the same server, so hopefully the tables are on separate servers. If you are "archiving", then be advised that this is the wrong thing to do; enhance the table speed instead. Refer to this post.

That INSERT-SELECT runs as a single transaction and will fill the transaction log, so it will run progressively slower for you and prevent other users from using the db. If you break it into batches of 1000 rows, it will be faster and more sociable.
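
A minimal sketch of that batching idea, written in Python purely as a driver that generates the statements. It assumes an integer, unique key column named PK (as in the WHERE clause above), so a key range 1000 wide holds at most 1000 rows. The names pk_batches and batch_sql are made up for illustration, and the original filter on A and B would still need to be ANDed into each statement. Each generated statement is meant to be run and committed on its own, so the log can be dumped between batches.

```python
from typing import Iterator, Tuple


def pk_batches(lo: int, hi: int, size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield half-open key ranges [start, end) that cover lo..hi inclusive."""
    if size <= 0:
        raise ValueError("size must be positive")
    start = lo
    while start <= hi:
        end = min(start + size, hi + 1)
        yield start, end
        start = end


def batch_sql(start: int, end: int) -> str:
    """Build one INSERT-SELECT for a single key range.

    Table and column names follow the question; PK is an assumed key column.
    """
    return (
        "INSERT INTO Table2 "
        "SELECT A, B FROM Table1 "
        f"WHERE PK >= {start} AND PK < {end}"
    )


if __name__ == "__main__":
    # Show the statements for a hypothetical key span of 1..2500.
    for start, end in pk_batches(1, 2500):
        print(batch_sql(start, end))
```

Half-open ranges are used so that consecutive batches never overlap and never skip a key.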

"I would be happy if it can be done < 4 hours"

Should be no problem. It depends on your hardware and disk layout. I can load 16 million rows in 13.2 secs on my little demo box running ASE 15.5.
bcp runs in one of two modes, chosen automatically depending on the following conditions:
- FAST. This requires the "select into/bulkcopy/pllsort" database option to be set with sp_dboption, which allows bcp to NOT log INSERTs, only allocations. It also requires the indices on the table to be dropped (they can be recreated after bcp finishes).
- SLOW. Used when either of the above conditions is not met. All INSERTs are logged. Ensure you have a threshold on the log that dumps it (it WILL fill).
It is no problem at all for either the out_data_file or Table_2 to be a subset of the columns of Table_1. On the Table_1 server, create a view with the columns of Table_2, then bcp-out the view. You can also place a WHERE clause in the view, do transforms, etc.
You can exec bcp in parallel (up to the number of CPUs/cores you have on your host system). Split the extract into that number of parallel streams (e.g. on an 8-core machine, exec 8 extract jobs in parallel). Use the -F (first row) and -L (last row) parameters to give each job one eighth of Table_1. Use "&" if you have an o/s and 8 x BAT files if you don't.
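
As a rough illustration of the -F/-L split, the Python sketch below only builds the eight command lines; it does not run them. The view name, data file names, server name and login are placeholders, the function name bcp_out_commands is made up, and the row count is assumed to be known in advance (for example from a SELECT COUNT(*) on the view).

```python
import shlex
from typing import List


def bcp_out_commands(view: str, rows: int, streams: int = 8) -> List[List[str]]:
    """Build one `bcp ... out` argument list per stream.

    Each stream covers a contiguous, 1-based row range selected with
    -F (first row) and -L (last row).
    """
    if rows <= 0 or streams <= 0:
        raise ValueError("rows and streams must be positive")
    streams = min(streams, rows)
    base, extra = divmod(rows, streams)
    commands = []
    first = 1
    for i in range(streams):
        # The first `extra` streams take one additional row each.
        last = first + base - 1 + (1 if i < extra else 0)
        commands.append([
            "bcp", view, "out", f"part_{i + 1}.dat",
            "-c",                         # character-mode data file
            "-F", str(first),
            "-L", str(last),
            "-S", "SOURCE_SERVER",        # placeholder server name
            "-U", "some_user",            # placeholder login
        ])
        first = last + 1
    return commands


if __name__ == "__main__":
    # Print shell lines that would start all streams in the background.
    for cmd in bcp_out_commands("mydb..Table2_view", rows=20_000_000):
        print(shlex.join(cmd), "&")
```

With 20 million rows and 8 streams, each job gets 2.5 million rows, and the ranges neither overlap nor leave gaps.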
You can also run (e.g.) 8 x INSERT-SELECT jobs in parallel. Split by PK value, not row number.
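
One possible shape for the parallel INSERT-SELECT variant, again as a Python sketch under assumptions: PK is an integer key with known minimum and maximum, and run_range is a hypothetical callback that would open its own connection, execute the INSERT-SELECT for the keys lo..hi, and return the number of rows copied. No database driver is used here. An equal-width key split only gives equal work if the keys are spread fairly evenly.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_keys(lo: int, hi: int, streams: int = 8) -> List[Tuple[int, int]]:
    """Split the inclusive key span lo..hi into contiguous inclusive ranges."""
    if hi < lo or streams <= 0:
        raise ValueError("need lo <= hi and streams > 0")
    total = hi - lo + 1
    streams = min(streams, total)
    base, extra = divmod(total, streams)
    ranges = []
    start = lo
    for i in range(streams):
        width = base + (1 if i < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


def run_parallel(ranges: List[Tuple[int, int]],
                 run_range: Callable[[int, int], int]) -> int:
    """Run one job per key range concurrently and return the summed row counts."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        return sum(pool.map(lambda r: run_range(r[0], r[1]), ranges))
```

Threads are enough for this kind of driver because each worker would spend its time waiting on the server rather than computing locally.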
