Another C datatype question

Posted on 2024-08-29 06:00:45

I think I understand the most basic C datatypes, such as short, int, long and float, that is, the numerical types. The compiler has to know these types in order to perform the right operations on the right numbers, for example to use the FPU to add two floats.

But when it comes to characters, I am a bit lost. I know that the basic C datatype char is there for encoding ASCII characters. What I don't know is why you need a separate datatype for characters at all. Why couldn't you just use a 1-byte integer value to store an ASCII character? When you call printf, you specify the datatype in the call, so you could tell printf that the integer represents an ASCII character. I don't know how cout resolves the datatype, but I guess you could specify it somehow.

Another thing: when you want to use Unicode, you must use the datatype wchar_t. But what if I want to use some other encoding, for example an ISO or Windows code page, instead of UTF? Because wchar_t encodes characters as UTF-16 or UTF-32 (I read that this is compiler specific). And what if I wanted to use, say, some imaginary new 8-byte text encoding? What datatype should I use for it? I am actually pretty confused by this, because I always expected that if I want to use UTF-32 instead of ASCII, I just tell the compiler "get the UTF-32 value of the character I typed and save it into a 4-char field." I thought that text encoding is dealt with at the very end, by the print function for example, and that I just need to specify the encoding for the compiler to use. Since Windows doesn't use ASCII in Win32 apps, I guess the C compiler must convert the character I typed to ASCII from whatever type Windows sends to the C editor.

And the last thing: what if I want to use, for example, a 25-byte integer for some advanced math operations? C has no specify-it-yourself datatype. Yes, I know this would be difficult, since all the math operations would need to be changed, because the CPU cannot add 25-byte numbers together. But is there a way to do it? Or is there some math library for it? What if I want to compute Pi to 1000000000000000 digits? :)

I know my question is pretty long, but I wanted to explain my thoughts as well as I can in English, which is difficult since it is not my native language. I believe there is a simple answer to my question(s), something I missed that explains everything. I have read a lot about text encoding and C tutorials, but nothing about this. Thank you for your time.


Comments (4)

请帮我爱他 2024-09-05 06:00:45

Your question is very broad. I'll try to address some of the specific issues you raised; hopefully that will get you a bit more sorted out.

  • The char type can be thought of as just another numerical type, like int, short and long. It is totally OK to write char a=3;. The difference is that with chars the compiler gives you some added value. Instead of just numbers, you can also assign character literals to a variable, as in char a='U';, and the variable then gets the ASCII value of that character. You can also initialize arrays of characters using string literals, like so: char s[]="hello";.
    This doesn't change the fact that char is still a numeric type and a string is just an array of numbers. If you look at the memory of the string, you'll see the ASCII codes of its characters.

  • In C, char is by definition exactly 1 byte; that choice is largely kept for historical reasons. More modern languages like C# and Java define char to be 2 bytes.

  • You don't need "another" type for characters. char is just the numeric type that holds a single signed/unsigned byte, the same way short is the numeric type that (typically) holds a signed 16-bit word. The fact that this data type is used for characters and strings is just syntactic sugar provided by the compiler. 1-byte integers == char.

  • printf() works with char strings because that is the way C was designed. If it were designed today, it might well work with 16-bit values. Indeed, there is a version of printf() that takes wide-character strings, which on Windows are made of 16-bit values; it is called wprintf().

  • The type wchar_t, in C on Windows, is just another name for a 16-bit unsigned short. Somewhere in the Windows header files there is a declaration like this: typedef unsigned short wchar_t; which makes this happen. You can use them interchangeably. The advantage of writing wchar_t is that whoever reads your code knows that you mean characters rather than numbers. Another reason is that if, by some remote chance, Microsoft someday decides to use UTF-32, all they need to do is redefine the typedef above to be typedef int wchar_t; and that's it (in reality this would be quite a bit more complicated to achieve, so the change is unlikely in the foreseeable future).

  • If you want to use some 8-bit encoding that is not ASCII, for instance the encoding for Hebrew called "Windows-1255", you just use chars. There are many such encodings, but these days using Unicode is almost always preferable. Indeed, there is a form of Unicode itself that fits in 8-bit strings: UTF-8. If you're dealing with UTF-8 strings, you should work with the char data type. Nothing limits char to ASCII; since it is just a number, it can mean anything.

  • One way of working with such long numbers is a decimal representation, often called binary-coded decimal (BCD). C doesn't have a built-in type for this. The basic idea is that the number is handled much like a string: every digit of the decimal representation is stored in 4 bits, so an 8-bit variable can hold numbers in the range 0-99, a 3-byte array can hold values in the range 0-999999, and so on. This way you can store numbers of any range.
    The downside is that calculations on them are a lot slower than calculations on normal binary numbers.
    I am not sure which libraries do this kind of thing in C. Use Google to find out.
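The digit-per-4-bits scheme in the last bullet can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names `bcd_add` and `bcd_to_ulong` are made up for this example, numbers are stored least significant byte first, and nothing here comes from a particular library. Arbitrary-precision libraries such as GMP do exist for C, and they usually store binary words rather than decimal digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two packed-BCD numbers of n bytes each.
 * Each byte holds two decimal digits (high nibble = tens, low nibble = ones);
 * byte 0 holds the least significant pair of digits.
 * Returns the carry out of the most significant digit (0 or 1). */
unsigned bcd_add(const uint8_t *a, const uint8_t *b, uint8_t *sum, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Add the low digits, then the high digits, one decimal digit at a time. */
        unsigned lo = (a[i] & 0x0Fu) + (b[i] & 0x0Fu) + carry;
        carry = lo > 9;
        if (carry)
            lo -= 10;
        unsigned hi = (a[i] >> 4) + (b[i] >> 4) + carry;
        carry = hi > 9;
        if (carry)
            hi -= 10;
        sum[i] = (uint8_t)((hi << 4) | lo);
    }
    return carry;
}

/* Hypothetical helper: convert a short packed-BCD number to a binary integer. */
unsigned long bcd_to_ulong(const uint8_t *v, size_t n)
{
    unsigned long r = 0;
    for (size_t i = n; i-- > 0; ) {
        /* Every nibble must be a valid decimal digit. */
        assert((v[i] >> 4) <= 9 && (v[i] & 0x0F) <= 9);
        r = r * 100 + (unsigned long)(v[i] >> 4) * 10 + (v[i] & 0x0Fu);
    }
    return r;
}
```

With a 3-byte array, `{0x99, 0x99, 0x09}` represents 99999; adding `{0x01, 0x00, 0x00}` with `bcd_add` yields the digits of 100000 and a carry of 0. The per-digit carry handling is the reason this is slower than ordinary binary addition.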

躲猫猫 2024-09-05 06:00:45

Actually, there are plenty of languages where the types of variables aren't known at compile time. This does tend to add some run-time overhead, though.

To answer your first question, I think you're getting hung up on the name "char". The char type is a one-byte integer in C (more precisely, it is an integer type large enough to hold any character from the basic character set; sizeof(char) is always 1, but the number of bits in that byte is implementation dependent). Note that you can have both signed chars and unsigned chars, something that doesn't make a lot of sense if you're talking about a data type that only holds characters. But the one-byte integer is called "char" in C because that's the most common use for it (again, see the disclaimer above).

The rest of your question covers a lot of ground; it might have been better to break it up into a few questions. Like char, wchar_t has an implementation-dependent size; the only requirement is that it be large enough to hold any wide character. It's important to understand that Unicode, and character encodings in general, are actually independent of the C language. It's also important to understand that character sets are not the same thing as character encodings.

Here's an article (by one of SO's founders, I believe) that gives a brief intro to character sets and encodings: http://www.joelonsoftware.com/articles/Unicode.html. Once you have a better understanding of how they work, you'll be in a better position to formulate some questions for yourself. Do note that a lot of character sets (the Windows code pages, for instance) only require a single byte of storage per character.
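To see what "implementation dependent" means for a given compiler, the limits can simply be printed. A minimal sketch, using only macros from `<limits.h>`; the function name `describe_char` is made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: print the properties of char on this implementation. */
void describe_char(void)
{
    /* sizeof(char) is 1 by definition; only the number of bits per byte varies. */
    assert(sizeof(char) == 1);
    printf("sizeof(char)  = %zu\n", sizeof(char));
    printf("CHAR_BIT      = %d\n", CHAR_BIT);   /* at least 8 */
    printf("signed char   : %d..%d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char : 0..%u\n", (unsigned)UCHAR_MAX);
    /* Whether plain char is signed is up to the implementation. */
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
}
```

On most desktop compilers this reports 8 bits per byte, but the signedness of plain char differs between platforms, which is one reason to write `signed char` or `unsigned char` explicitly when the type is used as a small number.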

花海 2024-09-05 06:00:45

In C, char is a 1-byte integer that is also used to store a character. A character is just a 1-byte integer in C.
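A short sketch of what that means in practice: the same value prints as a letter or as a number depending only on the printf format, and ordinary integer arithmetic works on it. The helper names are made up for this example, and the code assumes an ASCII execution character set (true on practically all current systems, but not required by the C standard).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: map an uppercase ASCII letter to lowercase
 * using nothing but integer arithmetic on the char value. */
char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char)(c + ('a' - 'A'));
    return c;
}

/* Hypothetical demo: a char and an int holding the same number are
 * indistinguishable to printf; the format specifier picks the presentation. */
void show_char_as_number(void)
{
    char c = 'U';
    int  i = 85;                 /* ASCII code of 'U' (assumes ASCII) */
    assert(c == i);
    printf("%c %d\n", c, c);     /* same value, shown as letter and as number */
    printf("%c %d\n", i, i);     /* an int works just as well with %c */
}
```

This is also why no separate "character" type is needed for printf: `%c` and `%d` are two ways of displaying the same small integer.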

And, what if I would want to use for
example some imaginary new 8 byte text
coding?

You would have to build it yourself, based on the types available through your compiler/hardware. One approach could be to define a struct containing an array of 8 chars, and build functions that manipulate that struct with all the operations you'd want on it.
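The struct approach might look roughly like this. Everything here is hypothetical: the type `wide8_char`, its functions, and the choice to put an ASCII code in the last byte are invented for the imaginary 8-byte encoding and are not part of any real library.

```c
#include <assert.h>
#include <string.h>

enum { WIDE8_BYTES = 8 };

/* Hypothetical type: one character of an imaginary 8-byte text encoding. */
typedef struct {
    unsigned char bytes[WIDE8_BYTES];
} wide8_char;

/* Build a character from an ASCII code: seven zero bytes, then the code.
 * (Putting the value in the last byte is an arbitrary choice for this sketch.) */
wide8_char wide8_from_ascii(char c)
{
    wide8_char w;
    memset(w.bytes, 0, sizeof w.bytes);
    w.bytes[WIDE8_BYTES - 1] = (unsigned char)c;
    return w;
}

/* Two characters are equal when all 8 bytes match. */
int wide8_equal(wide8_char a, wide8_char b)
{
    return memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
}

/* Length of a string that is terminated by an all-zero character,
 * mirroring what strlen() does for char strings. */
size_t wide8_strlen(const wide8_char *s)
{
    assert(s != NULL);
    const wide8_char zero = {{0}};
    size_t n = 0;
    while (!wide8_equal(s[n], zero))
        n++;
    return n;
}
```

On a platform with a 64-bit integer type, a `uint64_t` per character would be a simpler container; the struct version just shows that nothing beyond plain chars is required.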

because I always expected that if I
want to use UTF-32 instead of ASCII, I
just tell compiler "get UTF-32 value
of the character I typed and save it
into 4 char field."

You're limited to the types your C compiler provides, which are heavily influenced by the hardware (and by the C standard plus a bit of history). C is a low-level language and does not provide much magic. That said, there are library functions that may let you translate between (some) character sets, e.g. mbtowc() and similar functions, which convert between the current locale's multibyte encoding and wide characters. The idea is that you tell the library "here are 16 bytes of ISO8859-1 characters, please translate them to UTF-16 into this buffer over there".
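For one specific pair of encodings, the translation is simple enough to write by hand, which shows that an encoding conversion is just arithmetic on numbers. The sketch below relies on the fact that ISO-8859-1 byte values coincide with the Unicode code points U+0000 to U+00FF, so each byte widens to exactly one UTF-16 code unit. The function name `latin1_to_utf16` is made up for this example; the standard functions (`mbtowc()`, `mbstowcs()`) instead convert according to the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n ISO-8859-1 bytes into UTF-16 code units.
 * Writes at most out_cap units and returns how many were written.
 * No surrogate pairs are needed because every input value is below 0x100. */
size_t latin1_to_utf16(const unsigned char *in, size_t n,
                       uint16_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    size_t i;
    for (i = 0; i < n && i < out_cap; i++)
        out[i] = in[i];          /* byte value == Unicode code point */
    return i;
}
```

Other encoding pairs need lookup tables or multi-unit sequences, which is where a library becomes worthwhile.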

And the last thing is, what if I want
to use for example 25 Byte integer for
some high math operations? C has no
specify-yourself datatype.

C lets you define your own data types: structs. You can build an abstraction on top of those. People have built libraries like this, see e.g. here. Other languages let you model such types even more naturally, like C++, which also allows you to overload operators such as +, -, * etc. to work on your own data types.
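As a rough sketch of such an abstraction, here is a 25-byte unsigned integer with schoolbook addition, carrying from byte to byte exactly as one carries from digit to digit on paper. The type `big200` and its functions are invented for this example and assume 8-bit bytes; real arbitrary-precision libraries use machine-word-sized limbs and far more sophisticated algorithms for multiplication and division.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BIG_BYTES = 25 };

/* Hypothetical type: unsigned 200-bit integer, least significant byte first. */
typedef struct {
    unsigned char b[BIG_BYTES];
} big200;

/* Build a big200 from an ordinary unsigned long (assumes 8-bit bytes). */
big200 big200_from_ulong(unsigned long v)
{
    big200 r;
    memset(r.b, 0, sizeof r.b);
    for (size_t i = 0; i < sizeof v && i < BIG_BYTES; i++) {
        r.b[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Add x and y byte by byte, propagating the carry.
 * Returns the carry out of the top byte (1 means the result overflowed). */
unsigned big200_add(const big200 *x, const big200 *y, big200 *sum)
{
    assert(x != NULL && y != NULL && sum != NULL);
    unsigned carry = 0;
    for (size_t i = 0; i < BIG_BYTES; i++) {
        unsigned t = (unsigned)x->b[i] + y->b[i] + carry;
        sum->b[i] = (unsigned char)(t & 0xFF);
        carry = t >> 8;
    }
    return carry;
}
```

The CPU still only ever adds small numbers; the loop is what turns 25 of those additions into one big one.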

回忆追雨的时光 2024-09-05 06:00:45

There is (was) no "1-byte integer" type other than char (and signed and unsigned variants thereof). And though Windows NT (i.e. not 9x or ME) does use Unicode internally, your program will only use Unicode if you write it that way -- you have to either use WCHAR and all of the W versions of win32 calls, or use TCHAR and #define UNICODE.
