fopen for everything - is it possible?
I used to program for Windows, but I want to try my hand at making a cross-platform application. I have some questions, if you don't mind:
Question 1
Is there some way to open a Unicode or ASCII file and automatically detect its encoding using bare ANSI C? MSDN says that fopen() can switch between various Unicode formats (UTF-8, UTF-16, big-endian or little-endian) if I use the "ccs=UNICODE" flag. I found experimentally that switching from Unicode to ASCII does not happen, but while trying to solve this problem I discovered that Unicode text files have prefixes like 0xFFFE, 0xFEFF, or 0xFEBB.
FILE *file;
{
    __int16 isUni;
    file = _tfopen(filename, _T("rb"));
    fread(&(isUni),1,2,file);
    fclose(file);
    if( isUni == (__int16)0xFFFE || isUni == (__int16)0xFEFF || isUni == (__int16)0xFEBB)
        file = _tfopen(filename, _T("r,ccs=UNICODE"));
    else
        file = _tfopen(filename, _T("r"));
}
So, can I make something like this cross-platform and not so ugly?
Question 2
I can do something like this on Windows, but will it work on Linux?
file = fopen(filename, "r");
fwscanf(file,"%lf",buffer);
If not, is there some sort of ANSI C function to convert ASCII strings to Unicode? I want to work with Unicode strings in my program.
Question 3
Besides, I need to output Unicode strings to the console. There is setlocale(*) on Windows, but what should I do on Linux? It seems that the console is already Unicode there.
Question 4
Generally speaking, I want to work with Unicode in my program, but I have run into some strange problems:
f = fopen("inc.txt","rt");
fwprintf(f,L"Текст"); // converted successfully
fclose(f);
f = fopen("inc_u8.txt","rt, ccs = UNICODE");
fprintf(f,"text"); // failed to convert
fclose(f);
P.S. Is there a good book about cross-platform programming, something that compares Windows and Linux program code? And a book about practical ways of using Unicode? I don't want to immerse myself in the history of big-endian and little-endian Unicode; I am interested in specific C/C++ libraries.
Comments (2)
Question 1:
Yes, you can detect the byte order mark (BOM), which is the byte sequence you discovered - IF YOUR FILE HAS ONE.
A search on Google and stackoverflow will do the rest.
As for the 'not so ugly': you can refactor/beautify your code, e.g. write a function for determining the BOM, and do it in the beginning, then call fopen or _tfopen as required.
Then you can refactor that again, and write your own fopen function. But it will still be ugly.
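A minimal sketch of such a BOM-detecting function, in portable C. The names bom_encoding, bom_from_bytes and bom_from_file are made up for this example. It compares bytes one at a time rather than reading a 16-bit integer, so the result does not depend on the byte order of the host. The UTF-8 BOM is the three bytes EF BB BF; UTF-32 BOMs are deliberately not handled here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for this sketch; not part of any standard API. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM: plain ASCII, UTF-8 without BOM, or anything else */
    ENC_UTF8,      /* bytes EF BB BF */
    ENC_UTF16_LE,  /* bytes FF FE */
    ENC_UTF16_BE   /* bytes FE FF */
} bom_encoding;

/* Classify a buffer by its leading byte order mark. */
bom_encoding bom_from_bytes(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}

/* Read up to three bytes from the start of a file and classify them.
   Returns ENC_UNKNOWN if the file cannot be opened. */
bom_encoding bom_from_file(const char *path)
{
    unsigned char buf[3];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return ENC_UNKNOWN;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return bom_from_bytes(buf, n);
}
```

The caller can then decide how to reopen the file, for example with the Microsoft-specific "ccs=" mode on Windows or with its own decoding on other platforms. A file without a BOM still has to be guessed or assumed.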
Question 2:
Yes, but the unicode functions are not always called the same on Linux as they are on Windows.
Use defines.
Maybe write your own TCHAR.H.
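One way the "use defines" idea might look, as a rough sketch only. On Windows it includes the real tchar.h; elsewhere it maps a few of the same names onto the narrow standard functions. The choice of narrow char on non-Windows platforms and the helper names open_text and name_length are assumptions of this example, and a real shim would need many more mappings.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <tchar.h>            /* the real header on Windows */
#else
/* Minimal stand-ins (assumption: narrow char strings on non-Windows). */
typedef char TCHAR;
#define _T(s)    s
#define _tfopen  fopen
#define _tcslen  strlen
#endif

/* Hypothetical helper: open a text file for reading with one spelling
   on every platform. Returns NULL on failure, like fopen. */
FILE *open_text(const TCHAR *path)
{
    return _tfopen(path, _T("r"));
}

/* Hypothetical helper: length of a TCHAR string in characters. */
size_t name_length(const TCHAR *name)
{
    return _tcslen(name);
}
```

Calling code then writes open_text(_T("inc.txt")) everywhere and leaves the platform differences inside the shim.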
Question 3:
man 3 setlocale
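A small sketch of what that man page leads to: setlocale(LC_ALL, "") adopts whatever locale the environment specifies (typically a UTF-8 one on current Linux systems), after which the wide output functions convert to that encoding. The helper names use_environment_locale and write_wide_line are made up for this example.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Adopt the locale named by the environment (LANG, LC_ALL, ...).
   Returns 1 on success, 0 if that locale is unavailable, in which case
   the previous locale (by default "C") stays in effect. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Write a wide string plus newline to a stream; the C library converts
   it to the multibyte encoding of the current locale.
   Returns 0 on success, -1 on failure. */
int write_wide_line(FILE *out, const wchar_t *text)
{
    return fwprintf(out, L"%ls\n", text) < 0 ? -1 : 0;
}
```

A program would call use_environment_locale() once at startup and then write_wide_line(stdout, L"Текст"). Mixing wide and narrow output calls on the same stream is best avoided, since a stream takes on one orientation at its first use.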
Question 4:
Just use fwprintf.
The other one (the "ccs=UNICODE" mode flag) is not standard; it is a Microsoft extension.
You can use the wxWidgets toolkit.
It uses unicode, and it uses classes that have implementations for the same thing on Windows and on Linux and Unix and Mac.
The better question for you is how do you convert ASCII to Unicode and vice-versa.
That goes like this:
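The answer's own code did not survive on this page. As a stand-in, here is a hedged sketch using the standard C functions mbstowcs and wcstombs; it is not the original author's code, and the helper names to_wide and to_multibyte are invented for the example. The conversions follow the current locale, so non-ASCII text needs a suitable setlocale call first; plain ASCII converts in the default "C" locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string (encoding of the current locale) to a newly
   allocated wide string. The caller frees the result. NULL on failure. */
wchar_t *to_wide(const char *src)
{
    /* A string never has more wide characters than bytes. */
    size_t cap = strlen(src) + 1;
    wchar_t *dst = malloc(cap * sizeof *dst);

    if (dst == NULL)
        return NULL;
    if (mbstowcs(dst, src, cap) == (size_t)-1) {
        free(dst);            /* invalid multibyte sequence */
        return NULL;
    }
    return dst;
}

/* Convert a wide string to a newly allocated multibyte string in the
   encoding of the current locale. The caller frees the result. */
char *to_multibyte(const wchar_t *src)
{
    /* Each wide character needs at most MB_CUR_MAX bytes. */
    size_t cap = wcslen(src) * MB_CUR_MAX + 1;
    char *dst = malloc(cap);

    if (dst == NULL)
        return NULL;
    if (wcstombs(dst, src, cap) == (size_t)-1) {
        free(dst);            /* character not representable */
        return NULL;
    }
    return dst;
}
```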
Edit:
Here is some insight into the available Unicode functions on Linux (wchar.h):
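Whatever the answer originally pointed to is missing from this page. As an illustration only, the sketch below uses two standard functions declared in wchar.h, fwscanf and swprintf, which are available on Linux as well as Windows. Note that the wide functions take a wide format string (L"%lf", not "%lf"). The helper names read_wide_double and format_label are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Read one double through the wide-character input API.
   Returns 1 if a value was read, 0 otherwise. */
int read_wide_double(FILE *in, double *out)
{
    return fwscanf(in, L"%lf", out) == 1;
}

/* Format "name=value" into a wide buffer of cap elements.
   Returns the number of wide characters written, or -1 on failure
   (including a buffer that is too small). */
int format_label(wchar_t *buf, size_t cap, const wchar_t *name, int value)
{
    int written = swprintf(buf, cap, L"%ls=%d", name, value);
    return written < 0 ? -1 : written;
}
```

Unlike the Microsoft legacy form of swprintf, the standard one always takes the buffer size as its second argument.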
As I suggested in a comment, you should take a look at ICU, which is a cross-platform C library for Unicode handling, created by IBM. It provides additional support for C++ and Java with a very powerful String class. It's used in a lot of places, like Android and iOS, so it's very stable and mature.