Buffer size in C
When provided with a buffer size in C, how do I know how much is left and when do I need to stop using the memory?
For example, if the function I am writing is this:
void ascii_morse (lookuptable *table, char* morse, char* ascii, int morse_size) {
}
In this application I will be passed a string (ascii) and I will convert it to morse, using some other function to convert each ascii char to morse. The problem I'm facing is how to make sure I am not exceeding the buffer size. I don't even know when to use the buffer size or how to decrease it every time I use it.
Of course the output will go to morse (so I will be adding strings to morse, and I guess I know how to do that; it is just the buffer size that is hard for me to understand).
If you need any more information to understand the problem please tell me, I tried my best to explain it.
It sounds like there's some confusion about the "buffer". There is no buffer. morse_size is telling you how much memory has been allocated to morse (technically, the chunk of memory that morse points to). If morse_size is 20 then you have 20 bytes. This is 19 bytes of usable space, because strings are terminated by a null byte. You can think of morse_size as "maximum length of the string plus one".
You need to check morse_size to make sure you're not writing more bytes into morse than it can hold. morse is nothing more than a number pointing to a single spot in memory. Not a range, but a single spot. What's been allocated to morse comes after that. If you put more than that into morse you risk overwriting someone else's memory. C will NOT check this for you; this is the price of maximum performance.
It's like going to a theater where the usher tells you, "you can have seat A3 and the next 5" and then leaves. You have to be polite and not take more than those 6 seats, because somebody else was given A9.
Tools such as valgrind are invaluable to spot memory mistakes in C and keep your sanity.
Aren't strings in C a hoot? Welcome to the single largest root cause of bugs in the entire computing world.
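To make the "20 bytes means 19 usable characters" point concrete, here is a minimal sketch. The helper name bounded_copy is made up for illustration; it only relies on the standard snprintf, which never writes more than the size it is given and always leaves room for the terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without ever writing more than
   dst_size bytes. Returns the number of characters stored, not counting
   the terminating '\0'. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size - 1 characters plus the '\0'. */
    snprintf(dst, dst_size, "%s", src);
    return strlen(dst);
}
```

Called with a 20-byte array and a 25-character source, this stores the first 19 characters and puts the null byte in the last slot.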
将始终是缓冲区中剩余的大小供您使用。You have the size of the output buffer already passed in, by the looks of that prototype above.
ascii
will no doubt be a null terminated string andmorse
will be the output buffer:morse_size
(notmorse-size
as you have it, since that's not a valid identifier) will be how many characters you are allowed to write.The pseudocode will be something like:
You'll have to convert that to C and implement the lookup function but that should be a good start.
The pointers are used to extract from, and insert into, the relevant strings. For every character, you basically check whether there is enough room left in the output buffer for adding the morse code segment. And, at the end, you also need to check there's enough room for the string terminator character '\0'.
The way in which you check if there is enough room is by reducing the morse_size variable by the length of the string you're adding to morse each time through the loop. That way, morse_size will always be the size remaining in the buffer for your use.
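A rough C sketch of that "shrink morse_size as you go" loop might look like the following. It is only an illustration under some assumptions: lookup_morse is a made-up stand-in that knows four letters, the lookuptable parameter from the question is left out because its type is not shown, and a single space is written after each letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup: only a few letters are known. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 'S': return "...";
    case 'O': return "---";
    case 'E': return ".";
    case 'T': return "-";
    default:  return NULL;                 /* unknown character */
    }
}

/* Writes the Morse translation of ascii into morse, one space after each
   letter, never using more than morse_size bytes including the '\0'.
   Stops at the first letter that no longer fits. */
void ascii_morse(char *morse, const char *ascii, int morse_size)
{
    assert(morse != NULL && ascii != NULL && morse_size > 0);

    for (; *ascii != '\0'; ascii++) {
        const char *code = lookup_morse(*ascii);
        if (code == NULL)
            continue;                      /* skip what we cannot encode */

        int need = (int)strlen(code) + 1;  /* code plus separator space */
        if (need >= morse_size)            /* must keep 1 byte for '\0' */
            break;

        memcpy(morse, code, (size_t)need - 1);
        morse[need - 1] = ' ';
        morse += need;                     /* advance the write position */
        morse_size -= need;                /* shrink the space remaining */
    }
    *morse = '\0';                         /* always fits: morse_size >= 1 */
}
```

Because the loop refuses any segment that would leave no byte for the terminator, morse_size never drops below 1 and the final '\0' is always safe to write.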
You need to pass the buffer size along with the pointer.
The buffer size is not necessarily the same as the current length of the string (which you can find using strlen).
The function as given above will read the ascii string (it doesn't need to know that string's buffer size, so it is not passed) and write into a buffer pointed to by morse, of size morse_size. It returns the number of bytes written (not counting the null).
Edit: Here's an implementation of this function which, while it fails to use the right values for morse code, shows how to manage the buffer:
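As a small illustration of the difference between a buffer's size and the current string length, and of returning the number of bytes written, here is a sketch. The helper append_bounded is a hypothetical name, not the function from this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to the string already stored in buf,
   whose total capacity is buf_size bytes. Returns the number of bytes
   appended (not counting the '\0'), or 0 if src does not fit completely. */
size_t append_bounded(char *buf, size_t buf_size, const char *src)
{
    assert(buf != NULL && src != NULL);

    size_t used = strlen(buf);                /* current string length */
    assert(used < buf_size);                  /* buf must hold a valid string */

    size_t free_bytes = buf_size - used - 1;  /* keep one byte for '\0' */
    size_t n = strlen(src);
    if (n > free_bytes)
        return 0;

    memcpy(buf + used, src, n + 1);           /* n + 1 copies the '\0' too */
    return n;
}
```

With char buf[8] = "-."; the buffer size is 8 while strlen(buf) is 2, so five more characters fit before the terminator.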
The buffer size cannot be inferred from the pointer alone. It needs to either be passed as an argument, or be known some other way (for example from #define values or other constants), or be implicitly known. This last, implicit approach is dangerous: if the size is changed somewhere, the change may not be reflected in the places where the buffer is used.
Alternatively, and more typically in the case of input buffers (buffers which the function will read from), the end of the buffer may be marked by a special character or a sequence of such characters.
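A quick way to see that a pointer carries no size information is to compare sizeof in the scope that declares the array with sizeof inside a function that only receives the pointer. The constant MORSE_CAPACITY and both function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MORSE_CAPACITY 64   /* hypothetical constant shared by all users */

/* Inside a function the buffer arrives as a plain pointer, so sizeof
   reports the size of the pointer, not of the caller's buffer. */
size_t size_seen_by_callee(char *morse)
{
    assert(morse != NULL);
    return sizeof morse;
}

/* In the scope where the array is declared, sizeof gives its capacity. */
size_t size_seen_by_owner(void)
{
    char morse[MORSE_CAPACITY];
    morse[0] = '\0';
    return sizeof morse;
}
```

This is why the capacity has to travel alongside the pointer, either as an argument or as a shared constant such as the one above.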
One possible (slow) solution is to let the function accept a NULL buffer pointer and return the required buffer size. Then call it a second time with a buffer of the proper size.
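A sketch of that two-call pattern, under simplifying assumptions: the encoder below is a toy that turns every input character into ".- ", and the names encode and encode_alloc are made up. The standard snprintf follows the same convention when called with a NULL buffer and size 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy encoder (hypothetical): every input character becomes ".- ".
   Always returns the buffer size required, including the '\0'.
   Nothing is written if out is NULL or out_size is too small. */
size_t encode(char *out, size_t out_size, const char *in)
{
    assert(in != NULL);
    size_t required = strlen(in) * 3 + 1;

    if (out == NULL || out_size < required)
        return required;            /* size query, or buffer too small */

    for (; *in != '\0'; in++) {
        memcpy(out, ".- ", 3);
        out += 3;
    }
    *out = '\0';
    return required;
}

/* First call measures, second call fills a buffer of exactly that size.
   The caller owns the result and must free() it. */
char *encode_alloc(const char *in)
{
    size_t size = encode(NULL, 0, in);
    char *out = malloc(size);
    if (out != NULL)
        encode(out, size, in);
    return out;
}
```

The cost is that the input is walked twice, which is why this approach is described as slow.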
Another solution is instead of passing in a pre-allocated destination string to be written to, your function does the allocation and returns a pointer to that. This is a whole lot safer as the caller doesn't have to guess how much memory your function will need.
You still have to allocate enough memory for the Morse code. Since Morse code isn't fixed length, there are two strategies. The first is to simply figure out the maximum possible memory needed for the given length string (longest Morse sequence * number of characters in ascii) and allocate that. This might seem like a waste, but it's what the caller will have to do for your original plan anyway.
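The first strategy could be sketched like this. Everything named here is an assumption for illustration: lookup_morse is a tiny stand-in table, MAX_CODE_LEN assumes no code in that table is longer than 5 symbols, and one space is written after each letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CODE_LEN 5   /* assumption: longest code in the table below */

/* Hypothetical stand-in for the real lookup table. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 'S': return "...";
    case 'O': return "---";
    case '5': return ".....";
    default:  return NULL;
    }
}

/* Allocates the worst-case size up front, fills it, and returns it.
   The caller must free() the result. Returns NULL if malloc fails. */
char *ascii_to_morse_alloc(const char *ascii)
{
    assert(ascii != NULL);

    /* Worst case: every character is the longest code plus a space. */
    size_t capacity = strlen(ascii) * (MAX_CODE_LEN + 1) + 1;
    char *morse = malloc(capacity);
    if (morse == NULL)
        return NULL;

    char *p = morse;
    for (; *ascii != '\0'; ascii++) {
        const char *code = lookup_morse(*ascii);
        if (code == NULL)
            continue;               /* skip what we cannot encode */
        size_t len = strlen(code);
        memcpy(p, code, len);
        p[len] = ' ';
        p += len + 1;
    }
    *p = '\0';
    return morse;
}
```

No per-character bounds check is needed inside the loop because the capacity was computed from the worst case.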
The alternative is to use realloc to continually grow the string as you need it. You figure out how many bytes you need to encode the next character, reallocate that much and append it to the string. This might be slower, though memory allocators are pretty sophisticated these days, but it will use exactly as much memory as you need.
BOTH avoid the trap where the user has to preallocate an unknown amount of memory and BOTH eliminate the unnecessary "user didn't allocate enough memory" error condition.
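The realloc strategy might be sketched as follows, again with a made-up lookup_morse table and a space after each letter. The one detail worth copying is keeping the old pointer until realloc succeeds, so a failed call does not leak the string built so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup table. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 'S': return "...";
    case 'O': return "---";
    default:  return NULL;
    }
}

/* Grows the output one letter at a time. The caller must free() the
   result. Returns NULL if any allocation fails. */
char *ascii_to_morse_grow(const char *ascii)
{
    assert(ascii != NULL);

    size_t used = 0;                /* bytes in use, not counting '\0' */
    char *morse = malloc(1);
    if (morse == NULL)
        return NULL;
    morse[0] = '\0';

    for (; *ascii != '\0'; ascii++) {
        const char *code = lookup_morse(*ascii);
        if (code == NULL)
            continue;               /* skip what we cannot encode */
        size_t len = strlen(code);

        /* New size: what is used, plus the code, a space and the '\0'. */
        char *bigger = realloc(morse, used + len + 2);
        if (bigger == NULL) {
            free(morse);            /* old block is still valid here */
            return NULL;
        }
        morse = bigger;

        memcpy(morse + used, code, len);
        morse[used + len] = ' ';
        used += len + 1;
        morse[used] = '\0';
    }
    return morse;
}
```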
If you really wanted to save memory I'd store each dot/dash in the Morse code as 2 bits rather than 8 bits. You have three "words": short, long, and letter break. That's a minimum of 2 bits of space.
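One possible layout for the 2-bit idea, purely as a sketch: the symbol values and function names below are invented, and the fourth 2-bit value (0) is used here as an "end" marker, which is an assumption of this example rather than part of the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-bit symbol values; 0 doubles as "nothing stored". */
enum { SYM_END = 0, SYM_DOT = 1, SYM_DASH = 2, SYM_BREAK = 3 };

/* Packs text made of '.', '-' and ' ' into out, four symbols per byte,
   lowest bits first. Returns the number of symbols packed; stops early
   when out is full or an unexpected character is found. */
size_t pack_morse(uint8_t *out, size_t out_size, const char *text)
{
    assert(out != NULL && text != NULL);

    size_t count = 0;
    memset(out, 0, out_size);

    for (; *text != '\0' && count / 4 < out_size; text++) {
        unsigned sym;
        if (*text == '.')      sym = SYM_DOT;
        else if (*text == '-') sym = SYM_DASH;
        else if (*text == ' ') sym = SYM_BREAK;
        else break;

        out[count / 4] |= (uint8_t)(sym << (2 * (count % 4)));
        count++;
    }
    return count;
}

/* Reads back the symbol at position index (0-based) from a packed array. */
unsigned unpack_symbol(const uint8_t *packed, size_t index)
{
    assert(packed != NULL);
    return (packed[index / 4] >> (2 * (index % 4))) & 3u;
}
```

Stored this way, a 4-symbol sequence takes one byte instead of four, and the same bounds rule applies: the byte capacity of the output still has to be passed in and checked.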