C++ reading a file with accents

Posted on 2025-01-19 16:33:35

Good day. I am working on a small project that needs to read .txt files. Some are in English and others in Spanish, so some of the text contains accented characters, and I must show it on the console with the accents intact.

I have no problem displaying accents on the console with setlocale(LC_CTYPE, "C");

My problem is reading the .txt file: the accented characters are not recognized and strange characters are printed instead.

My practice code is:

#include <iostream>
#include <locale.h>
#include<fstream>
#include<string>

using namespace std;

int main(){
    
    setlocale (LC_CTYPE, "C");

    ifstream file;
    string text;
    
    file.open("entryDisciplineESP.txt",ios::in);
    
    if (file.fail()){
        
        cout<<"The file could not be opened."<<endl;
        
        exit(1); 
        
    }
    
    while(!file.eof()){ 

        getline(file,text);
        
        cout<<text<<endl;
        
    }
    
    cout<<endl;
    
    system("Pause");
    return 0;
}

The .txt file in question contains:

Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff

Obviously I'm having problems with 'ó', but I also have other .txt files containing other accented characters, so I need a solution that covers all of them.

While researching I read about wstring and wifstream and tried to use them, but I could not get them to work.

I'm trying to achieve this on Windows, but I also need the solution to work on Linux. At the moment I'm using Dev-C++ 5.11.

Thank you very much in advance for your time and help.

Comments (1)

美人如玉 2025-01-26 16:33:35

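Your first error is how you control your read loop. See: Why !.eof() inside a loop condition is always wrong. The eof flag is only set after a read has already run into the end of the file, so the loop body executes one extra time with a failed read. Instead, control your read loop with the stream state returned by the read function itself, e.g.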

    while (getline(file,text)) {
        
        std::cout << text << '\n';
        
    }

The character in question does not need wide strings. Assuming the file is saved as UTF-8, 'ó' is the two-byte sequence 0xC3 0xB3, which std::string stores as ordinary bytes and std::cout writes out unchanged. Your full example, also fixing Why is “using namespace std;” considered bad practice?, would be

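#include <clocale>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main() {

    std::setlocale (LC_CTYPE, "C");     /* "C" is already the default locale */

    std::ifstream file;
    std::string text;

    file.open ("entryDisciplineESP.txt");

    if (file.fail()) {
        std::cerr << "The file could not be opened.\n";
        std::exit(1);
    }

    /* the loop ends as soon as getline() fails, so no stale line is printed */
    while (std::getline(file,text)) {
        std::cout << text << '\n';
    }

    std::cout.put('\n');

#ifdef _WIN32
    std::system("Pause");               /* keep the console window open */
#endif
    return 0;
}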

Example Output

$ ./bin/accent_read
Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff

Windows 10 Using UTF-8 Codepage

The problem you experience when running the above code in the Windows 10 console (which I presume is where DevC++ launches the output) is that the default codepage (437 - OEM United States) does not interpret the output bytes as UTF-8, so each byte of a multi-byte character is shown as a separate, unrelated glyph. To have the console decode UTF-8, switch it to codepage 65001 - Unicode (UTF-8). See Code Page Identifiers.

To get the proper output after compiling under VS with the C++17 language standard, all that was needed was to change the codepage using chcp 65001 in the console. (The console font must also contain glyphs for the characters; mine is set to Lucida Console.)

Output In Windows Console (Command Prompt) After Setting Codepage

C:\Users\david\source\repos\accents>chcp 65001
Active code page: 65001

C:\Users\david\source\repos\accents>Debug\accents.exe
Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff

Press any key to continue . . .

Because DevC++ launches the console automatically, you additionally need to set the codepage programmatically. You can do that with SetConsoleOutputCP (65001). For example:

...
#include <windows.h>
...
#ifndef CP_UTF8             /* normally already defined by <windows.h> */
#define CP_UTF8 65001
#endif

int main () {

    // setlocale (LC_CTYPE, "C");           /* not needed */
    
    /* set console output codepage to UTF-8 */
    if (!SetConsoleOutputCP(CP_UTF8)) {
        std::cerr << "error: unable to set UTF-8 codepage.\n";
        return 1;
    }
    ...

See SetConsoleOutputCP function. The analogous function for setting the input codepage is SetConsoleCP(UINT wCodePageID).

Output Using SetConsoleOutputCP()

With the console left at the default 437 codepage, calling SetConsoleOutputCP (65001) inside the program to set the output codepage to UTF-8 gives the same result, e.g.

C:\Users\david\source\repos\accents>chcp 437
Active code page: 437

C:\Users\david\source\repos\accents>Debug\accents.exe
Inicio
D1
Biatlón
S1
255
E1
Esprint 7,5 km (M); 100; 200
E2
Persecucion 10 km (M); 100; 200
ff

Press any key to continue . . .

Also, check the DevC++ project (or program) settings to see whether the output codepage can be set there. (I don't use DevC++, so I don't know if that is possible.)
