Why is my Bash script adding <feff> to the beginning of files?
I've written a script that uses sed to clean up .csv files by removing bad commas and bad quotes ("bad" meaning they break an in-house program we use to transform these files):
# remove all commas, and re-insert the good commas using clean.sed
sed -f clean.sed "$1" > "$1.1st"
# remove all quotes
sed 's/"//g' "$1.1st" > "$1.tmp"
# add the good quotes around good commas
sed 's/,/","/g' "$1.tmp" > "$1.tmp1"
# add leading quotes
sed 's/^/"/' "$1.tmp1" > "$1.tmp2"
# add trailing quotes
sed 's/$/"/' "$1.tmp2" > "$1.tmp3"
# remove utf characters
sed 's/<feff>//' "$1.tmp3" > "$1.tmp4"
# replace original file with new stripped version and delete .tmp files
cp -f "$1.tmp4" "quotes_$1"
Here is clean.sed:
s/\",\"/XXX/g;
:a
s/,//g
ta
s/XXX/\",\"/g;
Then it removes the temp files and voilà, we have a new file whose name starts with the word "quotes" that we can use for our other processes.
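For reference, the comma and quote steps above can be sketched as a single sed invocation with no temp files. This is only a sketch: the function name clean_csv is hypothetical, the commands from clean.sed are inlined as -e expressions, and the <feff> step is left out on purpose.

```shell
#!/bin/sh
# clean_csv FILE: hypothetical helper that prints the cleaned CSV on stdout.
# Each -e expression mirrors one step of the original multi-file script.
clean_csv() {
    sed -e 's/","/XXX/g' \
        -e 's/,//g' \
        -e 's/XXX/","/g' \
        -e 's/"//g' \
        -e 's/,/","/g' \
        -e 's/^/"/' \
        -e 's/$/"/' \
        "$1"
}
```

It would be called as clean_csv "$1" > "quotes_$1". Like the original, it assumes the data never contains the literal placeholder XXX.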
My question is:
Why do I have to add a sed statement to remove the feff tag in that temp file? The original file doesn't seem to have it, but it always appears in the replacement. At first I thought cp was causing this, but if I put in the sed statement to remove it before the cp, it isn't there.
Maybe I'm just missing something...
U+FEFF is the code point for a byte order mark. Your files most likely contain data saved in UTF-16 and the BOM has been corrupted by your 'cleaning process' which is most likely expecting ASCII. It's probably not a good idea to remove the BOM, but instead to fix your scripts to not corrupt it in the first place.
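To check whether the original file really starts with a BOM, one option is to look at its first bytes. A minimal sketch, assuming head -c and od are available; the function name bom_kind is made up for illustration, and it only distinguishes the UTF-8 and UTF-16 marks (a UTF-32 little-endian file would be reported as utf-16le):

```shell
#!/bin/sh
# bom_kind FILE: print which byte order mark, if any, FILE starts with.
bom_kind() {
    # Hex dump of the first three bytes with whitespace removed, e.g. "efbbbf".
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case $first in
        efbbbf) echo utf-8 ;;      # U+FEFF encoded as UTF-8
        fffe*)  echo utf-16le ;;   # U+FEFF in little-endian UTF-16
        feff*)  echo utf-16be ;;   # U+FEFF in big-endian UTF-16
        *)      echo none ;;
    esac
}
```

Running bom_kind on the original .csv before any sed step shows whether the mark was already in the input rather than added by the script.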
To get rid of these in GNU emacs:
There is also a way to convert files with DOS line termination convention to Unix line termination convention.
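Outside Emacs, the line-ending conversion can also be done from the shell. This is not the Emacs method referred to above, just a sketch using sed; the function name crlf_to_lf is hypothetical.

```shell
#!/bin/sh
# crlf_to_lf FILE: print FILE with a trailing carriage return removed
# from every line (DOS line endings become Unix line endings).
crlf_to_lf() {
    # Build a literal CR so the script does not rely on sed understanding \r.
    cr=$(printf '\r')
    sed "s/${cr}\$//" "$1"
}
```

Usage would be crlf_to_lf dos.csv > unix.csv. Only a carriage return at the end of a line is removed; any in the middle of a line are left alone.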
It happened to me when I wanted to echo lines into a file I had previously cleared with: echo "" > somefile.txt
When I removed the file and ran the echo commands again, the "feff" no longer appeared when the file was created by the first echo.