How do I re-add a Unicode byte order mark in Linux?

Posted 2024-07-25 21:53:18

I have a rather large SQL file that starts with the byte order mark FF FE. I split it into 100,000-line chunks with the Unicode-aware Linux split tool. When I pass the chunks back to Windows, it rejects every part except the first, because only the first one still carries the FF FE byte order mark.

How can I add this two-byte code using echo (or any other bash command)?

Comments (7)

心头的小情儿 2024-08-01 21:53:18

Building on Anonymous's sed-based solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, it also turns a plain ASCII file into UTF-8 with a BOM.

深海夜未眠 2024-08-01 21:53:18

For a general-purpose solution, one that sets the correct byte order mark whether the file is UTF-8, UTF-16, or UTF-32, I would use vim's 'bomb' option:

$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a                           hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a                   ...hello.

(-e runs vim in ex mode instead of visual mode; -s suppresses status messages; -c means "execute this command".)

一腔孤↑勇 2024-08-01 21:53:18

To add a BOM to every file whose name starts with "foo-", you can use sed. sed's -i option can also keep a backup if you give it a suffix (for example -i.bak).

sed -i '1s/^\(\xff\xfe\)\?/\xff\xfe/' foo-*

Running this under strace shows that sed creates a temporary file whose name starts with "sed". If you know for sure that no BOM is present yet, you can simplify the command:

sed -i '1s/^/\xff\xfe/' foo-*

Make sure UTF-16 is really what you need, because the BOM for other encodings (UTF-8, for example) is different.
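One way to check which BOM, if any, a file already carries before running sed on it is to look at its first bytes. The following is only a sketch: bom_type is a hypothetical helper name, not a standard tool, and it assumes head, od and tr are available. Note that the bytes FF FE 00 00 are ambiguous (a UTF-32LE BOM, or a UTF-16LE BOM followed by a NUL character); the sketch reports UTF-32LE in that case.

```shell
#!/usr/bin/env bash
# Print the name of the BOM a file starts with, or "none".
# bom_type is a hypothetical helper name chosen for this sketch.
bom_type() {
    local head4
    # First four bytes as lowercase hex without separators, e.g. "fffe0000".
    head4=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    case "$head4" in
        fffe0000) echo "UTF-32LE" ;;   # tested before UTF-16LE on purpose
        0000feff) echo "UTF-32BE" ;;
        efbbbf*)  echo "UTF-8" ;;
        fffe*)    echo "UTF-16LE" ;;
        feff*)    echo "UTF-16BE" ;;
        *)        echo "none" ;;
    esac
}
```

Calling bom_type on each split chunk (for example, bom_type foo-aa) shows which parts still need the FF FE prefix.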

|煩躁 2024-08-01 21:53:18

Try uconv:

uconv --add-signature

平生欢 2024-08-01 21:53:18

Something like this (back up first):

for i in $(ls *.sql)
do
  cp "$i" "$i.temp"
  printf '\xFF\xFE' > "$i"
  cat "$i.temp" >> "$i"
  rm "$i.temp"
done

笑看君怀她人 2024-08-01 21:53:18

Matthew Flaschen's answer is a good one, but it has a few flaws.

  • There's no check that the copy succeeded before the original file is truncated. It would be better to make everything contingent on a successful copy, to test for the existence of the temporary file, or to operate on the copy. If you're a belt-and-suspenders kind of person, you'd combine these, as I've illustrated below.
  • The ls is unnecessary.
  • I'd use a better variable name than "i", perhaps "file".

Of course, you could be very paranoid and check for the existence of the temporary file at the beginning so you don't accidentally overwrite it, and/or use a UUID or a generated file name. One of mktemp, tempfile or uuidgen would do the trick.

td=$TMPDIR                 # remember the caller's TMPDIR so it can be restored
export TMPDIR=

usertemp=~/temp            # set this to use a temp directory on the same filesystem
                           # you could use ./temp to ensure that it's on the same one
                           # you can use mktemp -d to create the dir instead of mkdir

if [[ ! -d $usertemp ]]    # if this user temp directory doesn't exist
then                       # then create it, unless you can't
    mkdir "$usertemp" || export TMPDIR=$td  # if you can't create it and TMPDIR is/was
fi                                          # empty then mktemp automatically falls
                                            # back to /tmp

for file in *.sql
do
    # TMPDIR if set overrides the argument to -p
    temp=$(mktemp -p "$usertemp") || { echo "$0: Unable to create temp file." >&2; exit 1; }

    { printf '\xFF\xFE' > "$temp" &&
    cat "$file" >> "$temp"; } || { echo "$0: Write failed on $file" >&2; exit 1; }

    { rm "$file" &&
    mv "$temp" "$file"; } || { echo "$0: Replacement failed for $file" >&2; exit 1; }
done
export TMPDIR=$td

Traps might be better than all the separate error handlers I've added.

No doubt all this extra caution is overkill for a one-shot script, but these techniques can save you when push comes to shove, especially in a multi-file operation.
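As a rough illustration of the trap idea mentioned above, the sketch below replaces the per-step handlers with a single EXIT trap that removes a leftover temporary file. It is not the answer's own code: add_bom is a hypothetical function name, the temp file is created next to the target (an assumption made so that mv stays on one filesystem), and the BOM is written with octal escapes (\377\376 is FF FE), which printf implementations handle more uniformly than \x escapes.

```shell
#!/usr/bin/env bash
# Prepend the UTF-16LE BOM (FF FE) to each file named on the command line.
# add_bom is a hypothetical name; the body runs in a subshell so the trap
# and the variables do not leak into the calling shell.
add_bom() (
    temp=
    # On any exit from the subshell, remove a temp file that was not moved into place.
    trap 'if [ -n "$temp" ]; then rm -f -- "$temp"; fi' EXIT
    for file in "$@"; do
        # Temp file beside the target, so the final mv is a same-filesystem rename.
        temp=$(mktemp -- "$file.XXXXXX") || exit 1
        printf '\377\376' > "$temp" || exit 1
        cat -- "$file" >> "$temp" || exit 1
        mv -- "$temp" "$file" || exit 1
        temp=    # already moved; nothing left to clean up
    done
)
```

It could be called as add_bom *.sql. Because mktemp creates files readable only by the owner, the replaced files end up with those permissions; restore them with chmod afterwards if that matters.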

人生戏 2024-08-01 21:53:18

$ printf '\xEF\xBB\xBF' > bom.txt

Then check:

$ grep -rl $'\xEF\xBB\xBF' .
./bom.txt