Line-ending agnostic diff?

Posted on 2024-07-14 00:12:11

I'm working on a Mac, with some fairly old files. Different files were created by different programs, so some of them end with \r (Mac) and some with \n (Unix). I want to be able to run commands like diff, grep, etc. on these files, but the ones that have \r are treated as one giant line. Are there versions of diff, grep, etc. that will work correctly with all new-lines?

ETA: I'd also like them to be Unix utilities, so I can use them in scripts, Emacs, etc...
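The underlying problem is that these tools split lines only on \n. Here is a minimal sketch of a grep-like filter that splits on \n, \r\n or a bare \r instead. It assumes Python 3 is available, and the name anygrep and its PATTERN FILE... interface are hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical 'anygrep': grep-like search that splits lines on LF, CRLF or a bare CR."""
import os
import re
import sys


def grep_lines(data: bytes, pattern: bytes):
    """Yield (line_number, line) for every line of data that matches the regex pattern."""
    regex = re.compile(pattern)
    # bytes.splitlines() breaks on \n, \r\n and a lone \r, so a classic Mac file
    # yields real lines instead of one giant line.
    for number, line in enumerate(data.splitlines(), start=1):
        if regex.search(line):
            yield number, line


if __name__ == "__main__" and len(sys.argv) >= 3:  # act only when given PATTERN FILE...
    pattern = os.fsencode(sys.argv[1])
    for path in sys.argv[2:]:
        with open(path, "rb") as f:  # bytes, so old non-UTF-8 text never fails to decode
            data = f.read()
        for number, line in grep_lines(data, pattern):
            # Emit LF-terminated output so it pipes cleanly into other Unix tools.
            sys.stdout.buffer.write(b"%s:%d:%s\n" % (os.fsencode(path), number, line))
```

For example, `python3 anygrep.py 'foo' *.txt` would print file:line:text for each matching line, whichever line-ending convention each file uses.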

Comments (7)

萌吟 2024-07-21 00:12:11

If you use diff -w, it will ignore whitespace in the files, which is probably sufficient for your needs.

EDIT: I just realized I misread the post the first time; you're actually looking for a diff that works with \r line endings. My suggestion would be to convert the files to the standard \n format with a tool such as flip.
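If flip is not at hand, the same conversion is small enough to sketch in Python. This assumes Python 3, and the script name tounix is hypothetical. It rewrites CRLF and bare CR to LF and prints the result, so nothing is changed on disk:

```python
#!/usr/bin/env python3
"""Hypothetical 'tounix': print files with CRLF and bare-CR line endings converted to LF."""
import re
import sys

# Match a CRLF pair first so it becomes one LF, not two.
_CR_OR_CRLF = re.compile(rb"\r\n?")


def to_unix_newlines(data: bytes) -> bytes:
    """Return data with Windows (CRLF) and classic Mac (CR) line endings replaced by LF."""
    return _CR_OR_CRLF.sub(b"\n", data)


if __name__ == "__main__":
    # Like `cat`, but normalizing line endings; does nothing when no files are given.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(to_unix_newlines(f.read()))
```

In bash or zsh, `diff <(python3 tounix.py a.txt) <(python3 tounix.py b.txt)` would then compare two files regardless of their line endings.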

EDIT 2: Just found something that looks like what you want - Diff'nPatch:

Diff'nPatch is a port to the Macintosh of the GNU 'diff', 'patch' and 'cmp' utilities. It lets you compare and find differences between two files or folders, collate two files, generate diffs in various formats (normal, context, unidiff, etc.), apply patches, compare files byte by byte. It can handle any type of line endings (mac, unix or windows)
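For a command-line equivalent that can be called from scripts or Emacs, here is a minimal sketch built on Python's difflib. It assumes Python 3.5 or later for difflib.diff_bytes, and the name anydiff is hypothetical. It splits both files on any line ending before diffing and, like diff, exits with status 1 when the files differ:

```python
#!/usr/bin/env python3
"""Hypothetical 'anydiff': unified diff that treats LF, CRLF and bare CR line endings alike."""
import difflib
import os
import sys


def line_agnostic_diff(a: bytes, b: bytes, a_name: bytes = b"a", b_name: bytes = b"b"):
    """Return unified-diff lines (bytes, no terminators) comparing two files' contents."""
    # bytes.splitlines() splits on LF, CRLF and lone CR alike; working on bytes
    # avoids guessing the (possibly MacRoman) encoding of old files.
    return list(difflib.diff_bytes(difflib.unified_diff,
                                   a.splitlines(), b.splitlines(),
                                   fromfile=a_name, tofile=b_name, lineterm=b""))


if __name__ == "__main__" and len(sys.argv) == 3:  # act only when given two files
    old, new = sys.argv[1], sys.argv[2]
    with open(old, "rb") as f1, open(new, "rb") as f2:
        diff = line_agnostic_diff(f1.read(), f2.read(), os.fsencode(old), os.fsencode(new))
    for line in diff:
        sys.stdout.buffer.write(line + b"\n")
    sys.exit(1 if diff else 0)  # same exit-status convention as diff
```

Because the output is an ordinary unified diff, it could in principle be fed to patch or viewed in Emacs's diff-mode. That last point is untested here.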

_畞蕅 2024-07-21 00:12:11

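As Jay said, Diff'nPatch seems to be what you are looking for. Alternatively, you can convert all your '\r' line endings to '\n' in a single command like this:

sed -i.bak 's/\r/\n/g' filename

or

find . -type f -print0 | xargs -0 sed -i.bak 's/\r/\n/g'

(These commands assume GNU sed, which understands the \r and \n escapes; the BSD sed that ships with macOS may not. -i.bak edits in place and keeps a .bak backup, and the g flag replaces every \r rather than only the first one. In the latter case, you may also want to filter the list of files in some way, or it will be applied to all the files in all subdirectories.)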
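For a version that does not depend on which sed is installed, the same recursive conversion, with a filter on the file list, can be sketched in Python. The extension list is a hypothetical example of such a filter, so adjust it to your files. This sketch keeps no backup copies, so try it on a copy of the tree first:

```python
#!/usr/bin/env python3
"""Sketch of the find | xargs sed pipeline: rewrite matching files under a directory with LF endings."""
import os
import re
import sys

_CR_OR_CRLF = re.compile(rb"\r\n?")
# Hypothetical filter: only touch files with these extensions.
EXTENSIONS = (".txt", ".c", ".h", ".html")


def convert_tree(root, extensions=EXTENSIONS):
    """Convert CR and CRLF endings to LF in place; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            fixed = _CR_OR_CRLF.sub(b"\n", data)
            if fixed != data:  # leave already-clean files (and their timestamps) alone
                with open(path, "wb") as f:
                    f.write(fixed)
                changed.append(path)
    return changed


if __name__ == "__main__":
    for top in sys.argv[1:]:
        for path in convert_tree(top):
            print("converted", path)
```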

梦途 2024-07-21 00:12:11

The diff utility bundled with OS X v10.7 (Lion) has an option, --strip-trailing-cr, that does what you want. You use it like so:

diff -cpt a.c b.c --strip-trailing-cr
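One caveat comes from GNU diff's own description of this option ("strip trailing carriage return on input"). diff still splits lines only at \n, and the option removes a \r only where it ends such a line. That fixes Windows CRLF files, but a classic Mac file whose lines end in a bare \r is still read as a single long line. The Python sketch below is a simplified model of that splitting rule. The function name is hypothetical, and this is not diff's actual code:

```python
def diff_style_lines(data: bytes, strip_trailing_cr: bool = False):
    r"""Split data the way diff does (on \n only), optionally dropping one trailing \r per line."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a final \n terminates the last line rather than starting a new one
    if strip_trailing_cr:
        lines = [line[:-1] if line.endswith(b"\r") else line for line in lines]
    return lines


crlf_file = b"foo\r\nbar\r\n"  # Windows line endings
mac_file = b"foo\rbar\r"       # classic Mac line endings

print(diff_style_lines(crlf_file, strip_trailing_cr=True))  # [b'foo', b'bar']
print(diff_style_lines(mac_file, strip_trailing_cr=True))   # [b'foo\rbar']
```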
当爱已成负担 2024-07-21 00:12:11

PhpStorm's diff view's "ignore whitespace" just works. It automatically ignores differences in the carriage return / EOL / newline / what-have-you. You can waste your time fiddling with arcane Unix commands or whatever, or you could just get something that actually works and move forward with life.

  • Every one of the above-mentioned solutions failed on OS X v10.8 (Mountain Lion), including the one marked as the correct answer. All the download links for Diff'nPatch failed. (I did find http://webperso.easyconnect.fr/bdesgraupes/tools.html, but I really don't like the idea of having to resort to a diff tool that cannot be invoked from the command line and thus integrated with whatever IDE or version control tool I might be using, like BBEdit, Sourcetree, or SmartSVN, all of which, BTW, failed to ignore newlines with their built-in diff tools.)

Yes, my newlines are \r, but so what? Arrr! If the software is too stupid to realize that \r == \n, then I'm just going to use different software that is smart enough.

PhpStorm was the only software that had a diff tool that "just worked", which is what I expect Mac software to do. I expect Mac software to just work. I use a Mac so I can do my job instead of learning arcane terminal commands at every turn. Those commands are almost all poorly documented and expect you to just understand how they are supposed to be formatted without any clear examples, so you never know whether you're doing it wrong or the command simply doesn't work, just like all other bad software.

Take this example from "man diff":

   -I RE  --ignore-matching-lines=RE
          Ignore changes whose lines all match RE.

OK, so having read this, I have no idea what it means. There is no example of its usage. What is "RE"? It doesn't say anywhere.

Then there's this jewel:

   --GTYPE-group-format=GFMT
          Similar, but format GTYPE input groups with GFMT.

   --line-format=LFMT
          Similar, but format all input lines with LFMT.

   --LTYPE-line-format=LFMT
          Similar, but format LTYPE input lines with LFMT.

   LTYPE is `old', `new', or `unchanged'.
          GTYPE is LTYPE or `changed'.

          GFMT may contain:

   %<     lines from FILE1

   %>     lines from FILE2

   %=     lines common to FILE1 and FILE2

   %[-][WIDTH][.[PREC]]{doxX}LETTER
          printf-style spec for LETTER

          LETTERs are as follows for new group, lower case for old group:

   F      first line number

   L      last line number

   N      number of lines = L-F+1

   E      F-1

   M      L+1

          LFMT may contain:

   %L     contents of line

   %l     contents of line, excluding any trailing newline

   %[-][WIDTH][.[PREC]]{doxX}n
          printf-style spec for input line number

          Either GFMT or LFMT may contain:

   %%     %

   %c'C'  the single character C

   %c'\OOO'
          the character with octal code OOO

I could make no sense whatsoever of this passage. What is the "input"? Is it both files, just the "to" file, or just the "from" file? What is "similar" referring to? What does "is" mean in the sentence "GTYPE is LTYPE or `changed'"? Does it mean "may be replaced by"? If so, then why isn't "GTYPE" in quotation marks, brackets, etc.? Since no example is given, there is no way to know; the documentation's wording is totally ambiguous. What does "GFMT may contain..." mean? Does "contain" mean that the text replacing the acronym GFMT may contain that? Without a clear example, it's completely useless.

Why even bother to write a man page if you're going to make it so cryptic and ambiguous it's useless to anyone who doesn't already know how to use the software, basically? At that point, it's not a manual; it's just a quick-reference page for the guys who wrote the software so they can remember how to use it. I guess they assume you'll just read the source-code itself if you want to know what it actually does.

My time is valuable. I'd rather just pay the money to have a piece of software that actually works correctly and has proper documentation.

Because these all failed:

 diff -d --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd --suppress-common-lines --strip-trailing-cr --ignore-all-space --from-file=rest.phtml test.phtml

...failed to ignore \r characters.

 diff -wd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space

...failed to ignore \r characters.

 diff -awd test.phtml rest.phtml --suppress-common-lines --strip-trailing-cr --ignore-all-space

...failed to ignore \r characters.

For that matter, it also failed when I used \n characters as the line breaks instead.

Where test.phtml contains:

foo
bar

and rest.phtml contains:

foobar

The "diff" command always gives you something like:

*** 1,2 ****
! foo
! bar
\ No newline at end of file
--- 1 ----
! foobar
\ No newline at end of file

... fail!
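diff's whitespace options still compare files line by line. Text that sits on two lines in one file and on one line in the other therefore counts as a change, even with -w. If what you want is "same text, whatever the whitespace and line breaks", one minimal sketch is to delete every whitespace byte, line terminators included, and compare what is left. This is a swapped-in technique, not something diff does. It also treats foo bar and foobar as equal, and it only reports same or different, not where. The name samews is hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical 'samews': report whether two files match once all whitespace is ignored."""
import re
import sys

# In a bytes pattern, \s matches space, \t, \n, \r, \f and \v, so line breaks count too.
_WHITESPACE = re.compile(rb"\s+")


def same_ignoring_whitespace(a: bytes, b: bytes) -> bool:
    """True if a and b are identical after deleting every whitespace byte."""
    return _WHITESPACE.sub(b"", a) == _WHITESPACE.sub(b"", b)


if __name__ == "__main__" and len(sys.argv) == 3:  # act only when given two files
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        same = same_ignoring_whitespace(f1.read(), f2.read())
    print("same" if same else "different")
    sys.exit(0 if same else 1)
```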

囍笑 2024-07-21 00:12:11

The dos2unix command could be helpful in converting your files to a consistent format first. I believe it's available for just about every platform you can think of and can run on lots of files at once. I believe there's a package available for Mac.
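Before converting, it can help to know which convention each file actually uses. dos2unix's default conversion is aimed at Windows CRLF files rather than bare-CR Mac files. Here is a minimal Python sketch of such a check; the name eolstyle is hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical 'eolstyle': report which line-ending convention each file uses."""
import sys


def line_ending_style(data: bytes) -> str:
    """Classify data as 'LF', 'CRLF', 'CR', 'mixed', or 'none' (no line breaks)."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # lone CR (classic Mac)
    lf = data.count(b"\n") - crlf  # lone LF (Unix)
    found = [name for name, n in (("LF", lf), ("CRLF", crlf), ("CR", cr)) if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"


if __name__ == "__main__":
    # Prints one "STYLE<TAB>path" line per file; does nothing when no files are given.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{line_ending_style(f.read())}\t{path}")
```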

拥抱没勇气 2024-07-21 00:12:11

I used the following quick fix, which has drawbacks (see below):

1: Do a diff and list only the filenames

diff -r -q dir1/ dir2/

2: Open and save every listed file in your editor; re-saving will change the line endings (provided the editor is set to save with the line endings you want).

3: Do a regular diff

Drawbacks include:

  • less robust, error-prone
  • more work if you have lots of files
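The three steps above can also be automated. Here is a minimal sketch, assuming Python 3. It walks dir1, compares each file with its counterpart in dir2, and separates files that differ only in line endings from files whose content really differs. Files present in only one tree are ignored for brevity:

```python
#!/usr/bin/env python3
"""Sketch of steps 1-3: find files that differ between two trees, ignoring line-ending style."""
import os
import re
import sys

_CR_OR_CRLF = re.compile(rb"\r\n?")


def normalized(path):
    """Read a file and convert CR and CRLF line endings to LF."""
    with open(path, "rb") as f:
        return _CR_OR_CRLF.sub(b"\n", f.read())


def compare_trees(dir1, dir2):
    """Return (eol_only, real): relative paths in both trees whose bytes differ."""
    eol_only, real = [], []
    for dirpath, _dirnames, filenames in os.walk(dir1):
        for name in filenames:
            p1 = os.path.join(dirpath, name)
            rel = os.path.relpath(p1, dir1)
            p2 = os.path.join(dir2, rel)
            if not os.path.isfile(p2):
                continue  # files only in dir1 are out of scope for this sketch
            with open(p1, "rb") as f1, open(p2, "rb") as f2:
                if f1.read() == f2.read():
                    continue
            (eol_only if normalized(p1) == normalized(p2) else real).append(rel)
    return sorted(eol_only), sorted(real)


if __name__ == "__main__" and len(sys.argv) == 3:  # act only when given two directories
    eol_only, real = compare_trees(sys.argv[1], sys.argv[2])
    for rel in eol_only:
        print("line endings only:", rel)
    for rel in real:
        print("content differs:  ", rel)
```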
拥抱没勇气 2024-07-21 00:12:11

This worked for me:

diff -r --ignore-all-space dir1/ dir2/

I am on OS X and have mixed files from OS X and Windows.

Credit: http://www.codealpha.net/514/diff-and-ignoring-spaces-and-end-of-lines-unix-dos-eol/
