Why is my Bash script adding <feff> to the beginning of files?

Posted on 2024-08-16 09:37:33

I've written a script that uses sed to clean up .csv files, removing some bad commas and bad quotes ("bad" meaning they break an in-house program we use to transform these files):

# remove all commas, and re-insert the good commas using clean.sed
sed -f clean.sed $1 > $1.1st

# remove all quotes
sed 's/\"//g' $1.1st > $1.tmp

# add the good quotes around good commas
sed 's/\,/\"\,\"/g' $1.tmp > $1.tmp1

# add leading quotes
sed 's/^/\"/' $1.tmp1 > $1.tmp2

# add trailing quotes
sed 's/$/\"/' $1.tmp2 > $1.tmp3

# remove utf characters
sed 's/<feff>//' $1.tmp3 > $1.tmp4

# replace original file with new stripped version and delete .tmp files
cp -rf $1.tmp4 quotes_$1

Here is clean.sed:

s/\",\"/XXX/g;
:a
s/,//g
ta
s/XXX/\",\"/g;

Then it removes the temp files and voilà, we have a new file whose name starts with "quotes" that we can use for our other processes.
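For comparison, the same steps can be sketched as a single sed invocation with no temp files. This is only an illustration under stated assumptions: `clean_csv` is a hypothetical name, the clean.sed rules are inlined (without the `:a`/`ta` loop, which the `g` flag makes unnecessary), and, like the original, it assumes the literal text XXX never occurs in the data. It does not deal with the feff question at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the cleanup steps to file $1 and print the
# result on stdout. Same placeholder trick as clean.sed.
clean_csv() {
    # 1. protect the good "," separators with a placeholder
    # 2. drop every remaining comma
    # 3. drop every remaining quote
    # 4. turn the placeholders back into ","
    # 5. add the leading and trailing quote on each line
    sed -e 's/","/XXX/g' \
        -e 's/,//g' \
        -e 's/"//g' \
        -e 's/XXX/","/g' \
        -e 's/^/"/' \
        -e 's/$/"/' \
        "$1"
}
```

It could be called as `clean_csv "$1" > "quotes_$1"`, with the file name quoted so that names containing spaces survive.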

My question is:
Why do I need a sed statement to remove the feff tag from that temp file? The original file doesn't have it, but it always appears in the replacement. At first I thought cp was causing this, but if I put the sed statement that removes it before the cp, it isn't there.

Maybe I'm just missing something...

Comments (3)

一个人的旅程 2024-08-23 09:37:33

U+FEFF is the code point for a byte order mark. Your files most likely contain data saved in UTF-16 and the BOM has been corrupted by your 'cleaning process' which is most likely expecting ASCII. It's probably not a good idea to remove the BOM, but instead to fix your scripts to not corrupt it in the first place.
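One way to check whether the original file really starts with a BOM is to look at its first bytes. The following is a rough sketch, not a complete encoding detector: `bom_type` is a hypothetical helper name, and it only looks for the UTF-8 and UTF-16 marks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which byte order mark (if any) starts file $1.
bom_type() {
    local head_hex
    # First three bytes as lowercase hex with no separators, e.g. "efbbbf".
    head_hex=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$head_hex" in
        efbbbf*) echo "utf-8" ;;     # EF BB BF
        fffe*)   echo "utf-16le" ;;  # FF FE (a UTF-32LE mark also starts this way)
        feff*)   echo "utf-16be" ;;  # FE FF
        *)       echo "none" ;;
    esac
}
```

Running `bom_type "$1"` on the input file before any of the sed steps would show whether the mark is already there, even if an editor does not display it.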

又爬满兰若 2024-08-23 09:37:33

To get rid of these in GNU Emacs:

  1. Open Emacs
  2. Do a find-file-literally to open the file
  3. Edit off the leading three bytes
  4. Save the file

There is also a way to convert files with DOS line termination convention to Unix line termination convention.
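The same two operations can be sketched from the shell instead of Emacs. This is a minimal illustration, assuming the leading three bytes are the UTF-8 BOM (EF BB BF); `strip_utf8_bom` and `dos_to_unix` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file $1 without a leading UTF-8 BOM.
strip_utf8_bom() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"   # start output at byte 4, skipping EF BB BF
    else
        cat "$1"          # no BOM found: pass the file through unchanged
    fi
}

# Hypothetical helper: convert DOS (CRLF) line endings on stdin to Unix (LF).
# Note that it deletes every carriage return, not only those before a newline.
dos_to_unix() {
    tr -d '\r'
}
```

For example, `strip_utf8_bom input.csv | dos_to_unix > output.csv` writes a cleaned copy and leaves the input file untouched.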

时光无声 2024-08-23 09:37:33

It happened to me when I wanted to echo lines into a file I had previously cleared with: echo "" > somefile.txt

When I removed the file and ran the echo commands again, the "feff" no longer appeared when the file was created by the first echo.
