Handling end-of-line characters cross-platform in C++

Posted on 2024-11-06 11:44:15


I'm busy writing a generic textfile reader class and I'm struggling to write the code to deal correctly with end-of-line (EOL) characters for Mac, Linux and Windows.

I've done some reading on the issue and came up with the following function within my TextFileReader class to strip EOL characters, once I've read the contents of a textfile using getline( ) and stored the strings in a map.

//! Strip End-Of-Line characters.
void TextFileReader::stripEndOfLineCharacters( )
{
    // Search through container of data and remove newline characters.
    string::size_type stringPosition_ = 0;
    string searchString_ = "\r";
    string replaceString_ = "";

    for ( unsigned int i = 0; i < 1; i++ )
    {
        for ( iteratorContainerOfDataFromFile_
              = containerOfDataFromFile_.begin( );
              iteratorContainerOfDataFromFile_
              != containerOfDataFromFile_.end( );
              iteratorContainerOfDataFromFile_++ )
            {
                while ( ( stringPosition_ = iteratorContainerOfDataFromFile_
                          ->second.find( searchString_,
                                         stringPosition_ ) ) != string::npos )
                {
                    // Replace search string with replace string.
                    iteratorContainerOfDataFromFile_->second
                        .replace( stringPosition_, searchString_.size( ),
                                  replaceString_ );

                    // Advance string position.
                    stringPosition_++;
                }
            }

        // Switch search string.
        searchString_ = "\n";
    }
}

I thought that this would eliminate all EOL characters cross-platform, but that doesn't seem to be the case. It works fine on my Mac, running Mac OS 10.5.8, but it doesn't seem to work on Windows systems. Strangely, on Windows systems this function strips the EOL character from the first string in the map only, and the rest of them are still one character too long.
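Reading the function above, one likely cause of this behaviour is that stringPosition_ is declared once, outside both loops, and is never reset to 0. The while loop only exits when find( ) returns string::npos, so after the first string every later search starts at npos and finds nothing. In addition, the outer loop runs only once (i < 1), so the "\n" pass never happens. Below is a minimal sketch of a version that restarts the search for each string. The function names and the int key type of the map are assumptions made for illustration, not part of the original class.

```cpp
#include <cassert>
#include <map>
#include <string>

// Remove every '\r' and '\n' from one string, in place.
void stripEolFromLine(std::string& line)
{
    // The position starts at 0 for every string that is processed.
    std::string::size_type position = 0;
    while ((position = line.find_first_of("\r\n", position)) != std::string::npos)
    {
        // After erasing, the next character has moved into 'position',
        // so the position is deliberately not advanced.
        line.erase(position, 1);
    }
}

// Apply the stripping to every value of the map.
// Assumption: the map is keyed by an int (for example a line number).
void stripEolFromAll(std::map<int, std::string>& lines)
{
    for (auto& entry : lines)
    {
        stripEolFromLine(entry.second);
    }
}
```

Calling stripEolFromAll on the map after reading would take the place of the stripEndOfLineCharacters( ) call. Searching for both characters at once with find_first_of also avoids the need for a second pass with a different search string.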

This leads me to think that maybe I can't just replace the "\r" and "\n" characters, but everything I read suggests that it's the combination of the two ("\r\n") that Windows uses to represent EOL.
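An alternative to stripping afterwards is to read lines with a function that accepts any of the three common terminators: "\n" (Unix), "\r\n" (Windows) and a lone "\r" (classic Mac OS). The following is a sketch of that idea built on the stream buffer. The name getlineAnyEol is hypothetical, and the sketch assumes plain char streams.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read one line, accepting "\n", "\r\n" or a lone "\r" as the terminator.
// The terminator is consumed but not stored in 'line'.
std::istream& getlineAnyEol(std::istream& input, std::string& line)
{
    line.clear();

    // The sentry prepares the stream; 'true' means leading whitespace is kept.
    std::istream::sentry guard(input, true);
    if (!guard)
    {
        return input;
    }

    std::streambuf* buffer = input.rdbuf();
    for (;;)
    {
        const int c = buffer->sbumpc();
        if (c == std::char_traits<char>::eof())
        {
            // End of input: signal failure only if nothing was read at all.
            input.setstate(line.empty() ? (std::ios::eofbit | std::ios::failbit)
                                        : std::ios::eofbit);
            return input;
        }
        if (c == '\n')
        {
            return input;
        }
        if (c == '\r')
        {
            // Swallow the '\n' of a "\r\n" pair if it follows directly.
            if (buffer->sgetc() == '\n')
            {
                buffer->sbumpc();
            }
            return input;
        }
        line += static_cast<char>(c);
    }
}
```

It can be used in the same way as getline( ), for example `while ( getlineAnyEol( file, line ) ) { ... }`. Opening the file with std::ios::binary should make the behaviour the same on every platform, because no text-mode translation of line endings takes place before the function sees the bytes.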

To make it more explicit, here's a step-by-step layout of what I'm attempting to do. I have two textfiles called testFileMadeWithWindows.txt and testFileMadeWithMac.txt.

Open the first file with Notepad on a Windows machine and it contains the following.

This is line 1.
This is line 2.
This is line 3.

Open the second file with TextEdit on a Mac and it contains the following.

This is line 1.
This is line 2.
This is line 3.

In other words, the content of both files is intended to be identical. I want to read both these files using my TextFileReader class and store the strings in maps. To achieve this I use the getline( ) function.

When I read in testFileMadeWithWindows.txt using getline( ), it turns out that the string sizes are as follows:

16
16
15

Similarly, when I read in testFileMadeWithMac.txt using getline( ), it turns out that the string sizes are as follows:

16
16
15
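Since each visible line is 15 characters long, a size of 16 suggests one extra, invisible character at the end of the string. A small diagnostic helper can show which one it is. This is only a sketch, and showEolCharacters is a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Return a copy of 'text' in which carriage returns and line feeds
// are replaced by the visible two-character sequences \r and \n.
std::string showEolCharacters(const std::string& text)
{
    std::string visible;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\r')
        {
            visible += "\\r";
        }
        else if (text[i] == '\n')
        {
            visible += "\\n";
        }
        else
        {
            visible += text[i];
        }
    }
    return visible;
}
```

Printing showEolCharacters( line ) next to line.size( ) for each string read with getline( ) would show whether the sixteenth character is a "\r", a "\n", or something else entirely.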

I now execute the stripEndOfLineCharacters( ) function shown above on the maps containing this data.

For testFileMadeWithWindows.txt this results in the following string sizes:

15
16
15

For testFileMadeWithMac.txt this results in the following string sizes:

15
15
15

I use string::compare to compare the strings I have read in from the textfiles with the expected string data, which should be:

This is line 1.
This is line 2.
This is line 3.

The Windows comparison fails, specifically the comparison with the second line fails. The Mac comparison is successful for all three strings. I would like to know how to solve this such that the Windows comparison is successful too.

Any input would be appreciated. Thanks in advance!

Kartik
