How can I efficiently search and replace in a large txt file?

Posted on 2024-09-15 11:01:31

I have a relatively large csv/text data file (33 MB) on which I need to do a global search and replace of the delimiting character. (The reason is that there doesn't seem to be a way to get SQL Server to escape/handle double quotes in the data during a table export, but that's another story...)

I successfully did a TextMate search and replace on a smaller file, but it's choking on this larger file.

It seems like command-line grep may be the answer, but I can't quite grasp the syntax, along the lines of:

grep -rl OLDSTRING . | xargs perl -pi~ -e 's/OLDSTRING/NEWSTRING/'

So in my case I'm searching for the '^' (caret) character and replacing with '"' (double-quote).

grep -rl " grep_test.txt | xargs perl -pi~ -e 's/"/^'

That doesn't work, and I'm assuming it has to do with the escaping of the double quote or something, but I'm pretty lost. Can anyone help?

(I suppose if anyone knows how to get SQL Server 2005 to handle double quotes in a text column during export to csv, that would really solve the core issue.)

半边脸i 2024-09-22 11:01:31

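Your Perl substitution is wrong: s/"/^ is missing its closing slash, and it has the search and replacement parts the wrong way round for what you describe. Try:

grep -rl '\^' . | xargs perl -pi~ -e 's/\^/"/g'

Explanation:

grep : finds the files that contain a match
-r : search recursively
-l : print only the names of the files in which a match is found
'\^' : the caret is escaped because it is a regex metacharacter (the start-of-line anchor); the single quotes keep the shell from touching the backslash
. : search the current working directory
perl : used here to do the in-place replacement
-i~ : edit the files in place and keep a backup with the extension ~
-p : loop over the input line by line and print each line after the substitution
-e : the one-line program to run
\^ : the caret is escaped again inside the substitution, for the same reason
/g : replace every occurrence on a line, not just the first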

清浅ˋ旧时光 2024-09-22 11:01:31

sed -i.bak 's/\^/"/g' mylargefile.csv

Update: you can also use Perl, as Rein suggested:

perl -i.bak -pe 's/\^/"/g' mylargefile.csv

On big files, though, sed may run a bit faster than Perl, as my results on a file of about 6 million lines show:

$ tail -4 file
this is a line with ^
this is a line with ^
this is a line with ^

$ wc -l<file
6136650

$ time sed 's/\^/"/g' file  >/dev/null

real    0m14.210s
user    0m12.986s
sys     0m0.323s
$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.993s
user    0m22.608s
sys     0m0.630s
$ time sed 's/\^/"/g' file  >/dev/null

real    0m13.598s
user    0m12.680s
sys     0m0.362s

$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.690s
user    0m22.502s
sys     0m0.393s
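
Before running the in-place command on the real export, it can be worth trying it on a small sample. The sketch below does that with a made-up two-line file (sample.csv and its contents are assumptions, not data from the question). It also shows `tr` as an alternative for a one-character swap; `tr` is not mentioned in the answers above and was not part of the timings, so no speed claim is made for it here.

```shell
#!/bin/sh
# Scratch file standing in for mylargefile.csv (hypothetical sample data).
work=$(mktemp -d)
csv="$work/sample.csv"
printf '^id^,^name^\n^1^,^Ann^\n' > "$csv"

# In-place edit; the untouched original is kept as sample.csv.bak.
sed -i.bak 's/\^/"/g' "$csv"

# tr maps one character to another. It reads stdin and writes stdout,
# so it needs an explicit output file instead of editing in place.
tr '^' '"' < "$csv.bak" > "$work/via_tr.csv"

# cmp is silent and exits 0 when the two files are byte-identical.
cmp "$csv" "$work/via_tr.csv" && echo "sed and tr agree"
head -n 1 "$csv"   # prints: "id","name"
```

If the result looks right, the same sed command can be run on the full file, and the .bak copy deleted once the output has been checked.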
