Determining the line endings used in a text file

Posted 2024-09-05 07:12:32

What's the best way in C# to determine the line endings used in a text file (Unix, Windows, Mac)?


Comments (7)

送你一个梦 2024-09-12 07:12:32

Note that text files may have inconsistent line endings, and your program should not choke on them. Using ReadLine on a StreamReader (and similar methods) handles any of the possible line endings automatically.

If you read lines from a file manually, make sure to accept any line ending, even if the file mixes them. In practice, this is quite easy with the following algorithm:

  • Scan ahead until you find either CR or LF.
  • If you read CR, peek ahead at the next character.
  • If that next character is LF, consume it (otherwise, leave it in place).
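
The three steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name TolerantLineReader and its methods are made up for this example. It relies on TextReader.Peek to look at the next character without consuming it, so the "put it back" step becomes unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class TolerantLineReader
{
    // Reads one line, accepting CR, LF, or CRLF as the terminator.
    // Returns null once the input is exhausted.
    public static string ReadAnyLine(TextReader reader)
    {
        var line = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\n')
                return line.ToString();

            if (c == '\r')
            {
                // Look ahead: a following LF belongs to the same line ending.
                if (reader.Peek() == '\n')
                    reader.Read();
                return line.ToString();
            }

            line.Append((char)c);
        }

        // End of input: return the unterminated last line, if there is one.
        return line.Length > 0 ? line.ToString() : null;
    }

    // Collects every line of the input, whatever mix of endings it uses.
    public static List<string> ReadAllLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = ReadAnyLine(reader)) != null)
            lines.Add(line);
        return lines;
    }
}
```

Passing a StringReader over "a\r\nb\rc\nd" to ReadAllLines should yield the four lines a, b, c and d, even though the input mixes all three conventions.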

意中人 2024-09-12 07:12:32

Here is some advanced guesswork: read the file and count the CRs and LFs.

if (CR > LF*2) then "Mac"
else if (LF > CR*2) then "Unix"
else "Windows"

Also note that newer Macs (Mac OS X) use Unix line endings.
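
As a rough C# rendering of this counting heuristic (the class and method names are invented for the example, and the whole text is assumed to fit in a string):

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class LineEndingGuesser
{
    // Counts CR and LF characters and applies the thresholds from the pseudocode.
    public static string GuessByCount(string text)
    {
        int cr = 0, lf = 0;
        foreach (char c in text)
        {
            if (c == '\r') cr++;
            else if (c == '\n') lf++;
        }

        if (cr > lf * 2) return "Mac";   // mostly bare CRs
        if (lf > cr * 2) return "Unix";  // mostly bare LFs
        return "Windows";                // counts roughly equal, as with CRLF pairs
    }
}
```

One consequence of the thresholds as written: a text with no line endings at all falls through to "Windows", so a caller may want to treat that case separately.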

甜是你 2024-09-12 07:12:32

I'd just search the file for the first \r or \n. If it was a \r, I'd look at the next character to see whether it's a \n; if so, the line ending is \r\n, otherwise it's whichever character was found.
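
A minimal sketch of this first-match idea, assuming the text is already in memory as a string. The names are hypothetical. Note that it checks the character after a \r: when the search stops at the first \r or \n, a \n found first cannot have a \r in front of it.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class FirstLineEnding
{
    // Returns "\r\n", "\r", or "\n" for the first line ending in the text,
    // or null if the text contains no line ending at all.
    public static string Find(string text)
    {
        int i = text.IndexOfAny(new[] { '\r', '\n' });
        if (i < 0) return null;
        if (text[i] == '\n') return "\n";

        // The match is a CR: an immediately following LF makes it CRLF.
        bool followedByLf = i + 1 < text.Length && text[i + 1] == '\n';
        return followedByLf ? "\r\n" : "\r";
    }
}
```

This only reports the first ending it sees, so it says nothing about a file whose endings are inconsistent.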

南…巷孤猫 2024-09-12 07:12:32

I would imagine you can't know for sure, so this would have to be set in the editor. You could use some AI; the algorithm would be:

  1. Search for each type of line ending, i.e. for those specific characters.
  2. Measure the distances between them.
  3. If one type tends to repeat, assume that's the type. Count the repeats and use some measure of dispersion.

So, for example, if you had repeats of CRLF at 38, 40, 45, and that was within tolerance, you'd default to assuming the line ending was CRLF.

染墨丶若流云 2024-09-12 07:12:32

If it were me, I'd just read the file one char at a time until I came across the first \r or \n. This assumes you have sensible input.
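
Reading one character at a time could look roughly like this with a TextReader, which avoids loading the whole file. The class and method names are assumptions made for this sketch.

```csharp
using System;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class StreamingLineEnding
{
    // Reads characters until the first CR or LF and reports which ending it found.
    // Returns null if the reader ends without any line ending.
    public static string DetectFirst(TextReader reader)
    {
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\n') return "\n";
            if (c == '\r') return reader.Peek() == '\n' ? "\r\n" : "\r";
        }
        return null;
    }
}
```

For a file, the reader would typically be a StreamReader opened on the path in question; reading stops as soon as the first line ending is found.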

兔姬 2024-09-12 07:12:32

When reading most textual formats, I usually look for \n and then Trim() the whole string (whitespace at the beginning and end is often redundant).
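
A small sketch of the split-and-trim approach, with hypothetical names. Because Trim() also strips a trailing \r, the same code copes with both LF and CRLF input.

```csharp
using System;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class TrimmedLines
{
    // Splits on LF and trims each piece, which also removes a trailing CR.
    public static string[] Split(string text)
    {
        return text.Split('\n').Select(line => line.Trim()).ToArray();
    }
}
```

Two caveats follow from this design: files that use only \r are not split at all, and Trim() also removes leading indentation, so it only suits formats where surrounding whitespace carries no meaning. A text that ends with a line ending also produces a final empty entry.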

不顾 2024-09-12 07:12:32

There is Environment.NewLine, though it only tells you what is used on the current system and won't help with reading files from various sources.

When reading, I usually look for \n (Edit: apparently some files use only \r) and assume that the line ends there.
