Expanding tabs to spaces in C?
I need to expand tabs in an input line so that they become spaces (with a width of 8 columns). I tried it with code I had written previously, which replaces the last space in every line longer than 10 characters with a '\n' to make a new line. Is there a way in C to turn tabs into 8 spaces in order to expand them? I'm sure it is simple, I just can't seem to get it.
Here's my code:
int v = 0;
int w = 0;
int tab;
extern char line[];
while (v < length) {
    if (line[v] == '\t')
        tab = v;
    if (w == MAXCHARS) {
        // THIS IS WHERE I GET STUCK
        line[tab] = ' ';
        // set w to 0, so loop starts over
        w = 0;
    }
    ++v;
    ++w;
}
Answers (7)
This isn't really a question about the C language; it's a question about finding the right algorithm, and you could use that algorithm in any language.
Anyhow, you can't do this at all without reallocating line[] to point at a larger buffer (unless it's a large fixed length, in which case you need to worry about overflows). As you expand the tabs, you need more memory to store the new, longer lines, so character replacement like you're trying to do simply won't work.

My suggestion: rather than trying to operate in place (or even in memory at all), write this as a filter, reading from stdin and writing to stdout one character at a time. That way you don't need to worry about memory allocation or deallocation or the changing length of line[].
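A minimal sketch of the filter approach, assuming 8-column tab stops and that a newline resets the column; the function name detab_stream is hypothetical. It reads one character at a time and writes as many spaces as needed to reach the next tab stop, so no line buffer is needed at all.

```c
#include <stdio.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Copy in to out, replacing each tab with enough spaces to reach the
   next multiple of TAB_WIDTH. Hypothetical helper for illustration. */
void detab_stream(FILE *in, FILE *out)
{
    int c;
    int col = 0;  /* current output column, 0-based */

    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            /* always emit at least one space, then stop at the tab stop */
            do {
                putc(' ', out);
                ++col;
            } while (col % TAB_WIDTH != 0);
        } else {
            putc(c, out);
            col = (c == '\n') ? 0 : col + 1;
        }
    }
}
```

As a standalone program, main() can simply call detab_stream(stdin, stdout) and return 0.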
If the context this code is used in requires it to operate in memory, consider implementing an API similar to realloc(), where you return a new pointer: if you don't need to change the length of the string being handled, you can simply keep the original region of memory, but if you do need to resize it, the option is available.
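One way to sketch such a realloc()-like interface, assuming the string was obtained from malloc() and 8-column tab stops; expand_tabs_realloc is a hypothetical name. Like realloc(), it returns the pointer the caller must use from then on: the original one if there is nothing to expand, otherwise a new buffer (and the old one is freed). On allocation failure it returns NULL and leaves the original string untouched.

```c
#include <stdlib.h>
#include <string.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Length of s after tab expansion (not counting the terminating NUL). */
static size_t expanded_length(const char *s)
{
    size_t len = 0, col = 0;
    for (; *s != '\0'; ++s) {
        if (*s == '\t') {
            size_t n = TAB_WIDTH - col % TAB_WIDTH;
            len += n;
            col += n;
        } else {
            len += 1;
            col = (*s == '\n') ? 0 : col + 1;
        }
    }
    return len;
}

/* realloc()-style: returns the pointer to use afterwards. If s has no
   tabs it is returned unchanged; otherwise a new buffer is returned and
   s is freed. Returns NULL (s still valid) if malloc fails. */
char *expand_tabs_realloc(char *s)
{
    if (strchr(s, '\t') == NULL)
        return s;  /* nothing to expand: keep the original memory */

    char *out = malloc(expanded_length(s) + 1);
    if (out == NULL)
        return NULL;

    size_t col = 0;
    char *p = out;
    for (const char *q = s; *q != '\0'; ++q) {
        if (*q == '\t') {
            size_t n = TAB_WIDTH - col % TAB_WIDTH;
            memset(p, ' ', n);
            p += n;
            col += n;
        } else {
            *p++ = *q;
            col = (*q == '\n') ? 0 : col + 1;
        }
    }
    *p = '\0';
    free(s);
    return out;
}
```

Usage mirrors the usual realloc() idiom: char *tmp = expand_tabs_realloc(buf); if (tmp != NULL) buf = tmp; where buf must be a malloc'd pointer, not a fixed array like the extern char line[] in the question.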
You need a separate buffer to write the output to, since it will in general be longer than the input:
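A minimal sketch of such a function, assuming 8-column tab stops; the name expand_tabs_into and its exact signature are hypothetical, but it follows the description given here: in and out must be different buffers, max_len is the size of out, and the result is always null-terminated, even when it has to be cut short.

```c
#include <stddef.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Expand tabs from in into out, writing at most max_len bytes including
   the terminating NUL. Returns the number of characters written
   (excluding the NUL). out must not overlap in. */
size_t expand_tabs_into(const char *in, char *out, size_t max_len)
{
    size_t i = 0;    /* index into out */
    size_t col = 0;  /* current column on this line */

    if (max_len == 0)
        return 0;    /* no room even for the terminator */

    /* stop as soon as only the byte for the NUL is left */
    for (; *in != '\0' && i + 1 < max_len; ++in) {
        if (*in == '\t') {
            size_t n = TAB_WIDTH - col % TAB_WIDTH;
            /* pad to the next tab stop, or until out is full */
            while (n > 0 && i + 1 < max_len) {
                out[i++] = ' ';
                ++col;
                --n;
            }
        } else {
            out[i++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[i] = '\0';
    return i;
}
```

Sizing out as 8 * strlen(in) + 1, the worst case mentioned in this answer, guarantees that nothing gets cut off.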
You must preallocate enough space for out (which in the worst case could be 8 * strlen(in) + 1), and out cannot be the same as in.

EDIT: As suggested by Jonathan Leffler, the max_len parameter now makes sure we avoid buffer overflows. The resulting string will always be null-terminated, even if it is cut short to avoid such an overflow. (I also renamed the function, and changed int to size_t for added correctness :).)
I would probably do something like this:
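A sketch in the spirit of this answer, which replaces every tab with exactly 8 spaces rather than padding to the next tab stop; the function name replace_tabs is hypothetical. The caller owns the returned buffer and must free() it.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of s with every tab replaced by 8 spaces
   (fixed width, not aligned to tab stops), or NULL if malloc fails. */
char *replace_tabs(const char *s)
{
    size_t tabs = 0;
    for (const char *p = s; *p != '\0'; ++p)
        if (*p == '\t')
            ++tabs;

    /* original_size counts the NUL; each tab grows by 7 bytes */
    size_t original_size = strlen(s) + 1;
    char *out = malloc(original_size + 7 * tabs);
    if (out == NULL)
        return NULL;

    char *q = out;
    for (const char *p = s; *p != '\0'; ++p) {
        if (*p == '\t') {
            memset(q, ' ', 8);
            q += 8;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```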
This needs original_size + 7 * number_of_tabs bytes of memory (where original_size counts the null byte). If you want to do the replacement in place instead of creating a new string, you'll have to make sure that the passed-in pointer points to a location with enough memory to store the new string, which will be longer than the original because 8 spaces take 7 bytes more than one tab.
Here's a reentrant, recursive version which automatically allocates a buffer of the correct size:
If expand_tabs() is called with dest == NULL, the function will return the size of the expanded string, but no expansion is actually done; if dest != NULL but *dest == NULL, a buffer of the correct size will be allocated and must be deallocated by the programmer; if dest != NULL and *dest != NULL, the expanded string will be put into *dest, so make sure the supplied buffer is large enough.
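A sketch of that interface, assuming the signature size_t expand_tabs(const char *src, char **dest) and 8-column tab stops; both are assumptions, since only the behavior is described here. The recursion handles one character per call and passes the current column along instead of keeping static state, which keeps it reentrant. In this sketch "size" means the expanded length without the terminating NUL, so a caller-supplied buffer needs size + 1 bytes.

```c
#include <stdlib.h>
#include <string.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Expand src starting at column col; write to out unless out is NULL.
   Returns the number of characters produced (excluding the NUL). */
static size_t expand_rec(const char *src, char *out, size_t col)
{
    if (*src == '\0') {
        if (out != NULL)
            *out = '\0';
        return 0;
    }
    if (*src == '\t') {
        size_t n = TAB_WIDTH - col % TAB_WIDTH;
        if (out != NULL)
            memset(out, ' ', n);
        return n + expand_rec(src + 1, out != NULL ? out + n : NULL, col + n);
    }
    if (out != NULL)
        *out = *src;
    return 1 + expand_rec(src + 1, out != NULL ? out + 1 : NULL,
                          *src == '\n' ? 0 : col + 1);
}

/* dest == NULL:  only compute the expanded length.
   *dest == NULL: allocate a buffer (caller must free it).
   otherwise:     write into *dest, which must hold length + 1 bytes.
   Returns the expanded length, or (size_t)-1 if malloc fails. */
size_t expand_tabs(const char *src, char **dest)
{
    size_t len = expand_rec(src, NULL, 0);

    if (dest == NULL)
        return len;
    if (*dest == NULL) {
        *dest = malloc(len + 1);
        if (*dest == NULL)
            return (size_t)-1;
    }
    expand_rec(src, *dest, 0);
    return len;
}
```

The recursion depth grows with the string length, which is fine for ordinary lines but worth keeping in mind for very long inputs.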
Untested, but something like this should work:
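A sketch of one approach with that cost profile (an assumption, since the specific approach is not preserved): expanding in place by shifting the rest of the string right with memmove() at each tab. Each shift is O(n) and happens once per tab, hence O(n*m). The name expand_tabs_in_place and the cap parameter (the total size of buf, including room for the NUL) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Expand tabs in buf in place. cap is the total size of buf in bytes.
   Returns 0 on success, or -1 if the expanded string (plus NUL) would not
   fit; in that case buf may be partially expanded but stays terminated. */
int expand_tabs_in_place(char *buf, size_t cap)
{
    size_t len = strlen(buf);
    size_t col = 0;
    size_t i = 0;

    while (buf[i] != '\0') {
        if (buf[i] == '\t') {
            size_t n = TAB_WIDTH - col % TAB_WIDTH;
            if (len + n > cap)  /* new length len + n - 1, plus the NUL */
                return -1;
            /* shift everything after the tab (including the NUL) right by n - 1 */
            memmove(buf + i + n, buf + i + 1, len - i);
            memset(buf + i, ' ', n);
            len += n - 1;
            i += n;
            col += n;
        } else {
            col = (buf[i] == '\n') ? 0 : col + 1;
            ++i;
        }
    }
    return 0;
}
```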
Note that this is O(n*m) where n is the line size and m is the number of tabs. That probably isn't an issue in practice.
There are myriad ways to convert tabs in a string into 1-8 spaces. There are inefficient ways to do the expansion in situ, but the easiest way to handle it is to have a function that takes the input string and a separate output buffer that is big enough for the expanded string. If the input is 6 tabs plus an X and a newline (8 characters + terminating null), the output would be 48 blanks, X, and a newline (50 characters + terminating null), so you might need a much bigger output buffer than input buffer.
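To size that output buffer, one option is a helper that computes the expanded length before any copying, assuming 8-column tab stops and that a newline resets the column; detab_length is a hypothetical name, not the detab() function from this answer. For the example of 6 tabs plus an X and a newline it returns 50, so the output buffer needs 51 bytes including the terminating null.

```c
#include <stddef.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Number of characters (excluding the NUL) that s occupies once each tab
   is padded out to the next multiple of TAB_WIDTH. */
size_t detab_length(const char *s)
{
    size_t len = 0;  /* total characters in the expanded string */
    size_t col = 0;  /* column within the current line */

    for (; *s != '\0'; ++s) {
        if (*s == '\t') {
            size_t n = TAB_WIDTH - col % TAB_WIDTH;
            len += n;
            col += n;
        } else {
            ++len;
            col = (*s == '\n') ? 0 : col + 1;
        }
    }
    return len;
}
```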
The biggest trouble with this test is that it is hard to demonstrate that it handles overflows in the output buffer properly. That's why there are the two '#define' sequences for the buffer sizes: very large defaults for real work, and independently configurable buffer sizes for stress testing. If the source file is dt.c, use a compilation like this:

If the 'detab()' function is to be used outside this file, you'd create a header to contain its declaration, and you'd include that header in this code, and the function would not be static, of course.
Here is one that will malloc(3) a bigger buffer of exactly the right size and return the expanded string. It does no division or modulus operations. It even comes with a test driver. Safe with -Wall -Wno-parentheses if using gcc.
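A sketch along those lines, assuming 8-column tab stops; expand_tabs_alloc is a hypothetical name. Instead of computing col % 8, it keeps a counter that counts up to the next tab stop and wraps back to 0, so the loops use only comparisons, addition and subtraction. A first pass measures the exact size and a second pass fills the buffer.

```c
#include <stdlib.h>

#define TAB_WIDTH 8  /* assumed tab-stop spacing */

/* Return a malloc'd copy of s with tabs expanded, or NULL on failure.
   pos is the column within the current tab stop (0 .. TAB_WIDTH - 1),
   tracked by counting so no division or modulus is needed. */
char *expand_tabs_alloc(const char *s)
{
    size_t size = 1;  /* room for the terminating NUL */
    int pos = 0;
    const char *p;

    /* Pass 1: measure the exact expanded size. */
    for (p = s; *p != '\0'; ++p) {
        if (*p == '\t') {
            size += TAB_WIDTH - pos;
            pos = 0;
        } else {
            size += 1;
            if (*p == '\n' || ++pos == TAB_WIDTH)
                pos = 0;
        }
    }

    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy, padding each tab to the next tab stop. */
    char *q = out;
    pos = 0;
    for (p = s; *p != '\0'; ++p) {
        if (*p == '\t') {
            do {
                *q++ = ' ';
            } while (++pos < TAB_WIDTH);
            pos = 0;
        } else {
            *q++ = *p;
            if (*p == '\n' || ++pos == TAB_WIDTH)
                pos = 0;
        }
    }
    *q = '\0';
    return out;
}
```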