Remove white chars around commas, but not inside the content between commas
I'm new to C and learning C90.
I'm trying to parse a string into a command, but I'm having a hard time removing white chars.
My goal is to parse a string like this:
NA ME, NAME , 123 456, 124 , 14134, 134. 134 , 1
into this:
NA ME,NAME,123 456,124,14134,134. 134,1
So the white chars inside the arguments stay, but all other white chars are removed.
I thought about using strtok, but I still want to keep the commas, even if there are multiple consecutive commas.
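For context on the strtok point: `strtok()` treats a run of consecutive delimiters as a single separator, so an empty field between two commas is never returned as a token. A small illustration (the helper name `count_strtok_fields` is hypothetical):

```c
#include <string.h>

/* Counts the tokens strtok() returns for a comma-separated line.
   strtok() skips runs of delimiters, so the empty field between two
   consecutive commas is never reported. Note: strtok() modifies line. */
int count_strtok_fields(char *line)
{
    int n = 0;
    char *tok = strtok(line, ",");

    while (tok != NULL) {
        ++n;
        tok = strtok(NULL, ",");
    }
    return n;
}
```

For `"a,,b"` this returns 2 rather than 3, which is why `strtok()` alone cannot preserve consecutive commas.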
So far I have used:

```c
#include <ctype.h>

void removeWhiteChars(char *s)
{
    int i = 0;
    int count = 0;
    int inNum = 0;

    while (s[i])
    {
        if (isdigit((unsigned char)s[i]))
        {
            inNum = 1;
        }
        if (s[i] == ',')
        {
            inNum = 0;
        }
        if (!isspace((unsigned char)s[i]) && !inNum)
            s[count++] = s[i];
        else if (inNum)
        {
            s[count++] = s[i];
        }
        ++i;
    }
    s[count] = '\0'; /* NUL-terminate the string */
}
```
But it only handles numbers. Even then, it does not remove the white chars between a number and the following comma, so it's quite wrong.
I would appreciate any kind of help; I've been stuck on this one for two days now.
Answers (6)
You need to do lookaheads whenever you encounter possibly skippable whitespace. Every time the function sees a space, it checks forward whether the run of whitespace ends with a comma; if it does, the run is skipped. Likewise, for every comma, it checks for and removes all the spaces that follow it.
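A minimal sketch of that lookahead idea, working in place. The function name `remove_ws_around_commas` is hypothetical. Treating the end of the string like a comma, so that trailing whitespace is dropped, is an assumption of this sketch. Whitespace at the very start of the string is kept.

```c
#include <ctype.h>

/* Lookahead sketch (hypothetical name): removes whitespace next to commas,
   in place. A whitespace run is dropped if the lookahead finds a ',' or the
   end of the string after it; whitespace right after a ',' is always
   dropped; any other run lies inside a field and is copied unchanged. */
void remove_ws_around_commas(char *s)
{
    const char *src = s;   /* read position */
    char *dst = s;         /* write position, never ahead of src */

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            const char *ahead = src;
            while (isspace((unsigned char)*ahead))   /* look ahead */
                ++ahead;
            if (*ahead == ',' || *ahead == '\0')
                src = ahead;                /* skippable: drop the run */
            else
                while (src < ahead)         /* inside a field: keep it */
                    *dst++ = *src++;
        } else if (*src == ',') {
            *dst++ = *src++;
            while (isspace((unsigned char)*src))     /* spaces after ',' */
                ++src;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Applied to the question's input, this produces `NA ME,NAME,123 456,124,14134,134. 134,1`. Each whitespace run is scanned at most twice (once by the lookahead, once when copying), so the work stays linear in the length of the string.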
A short and reliable way to approach any parsing problem is to use a state-loop, which is nothing more than a loop over all the characters in your original string where you use one (or more) flag variables to keep track of the state of anything you need to track. In your case, you need to know whether you are reading post (after) the comma.

This controls how you handle the next character. You use a simple counter variable to keep track of the number of spaces you have read, and when you encounter the next character, if you are not post-comma, you append that number of spaces to your new string. If you are post-comma, you discard the buffered spaces. (You can use encountering the `','` itself as a flag that need not be kept in a variable.)

To remove spaces around the `','` delimiter, you can write a `rmdelimws()` function that takes the new string to fill and the old string to copy from as arguments and works as described above. (Note: the function should affirmatively nul-terminate `newstr`, in case `newstr` wasn't initialized to all zeros.)

If you want to save the trailing whitespace in the line (e.g. spaces after the ending `1` in your example), you can append the buffered spaces to `newstr` just before nul-terminating it.
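A minimal sketch of a `rmdelimws()` along those lines. The third parameter `keep_trailing` is a hypothetical addition that selects the trailing-whitespace variant. Starting in the post-comma state, so that leading whitespace is dropped, is also an assumption of this sketch. Buffered whitespace is written back as plain spaces, as the counter-based description implies, so a tab inside a field would come out as a space.

```c
#include <ctype.h>

/* State-loop sketch of rmdelimws(): copies oldstr into newstr, dropping
   whitespace on both sides of every ',' delimiter.
   postcomma : 1 from a ',' up to the next non-space character
   nspaces   : whitespace characters read but not yet written
   keep_trailing (hypothetical extra parameter): nonzero keeps the
   whitespace at the very end of the line.
   newstr needs room for strlen(oldstr) + 1 characters. */
char *rmdelimws(char *newstr, const char *oldstr, int keep_trailing)
{
    char *p = newstr;
    int postcomma = 1;   /* start of line acts like "after a comma" */
    int nspaces = 0;

    for (; *oldstr != '\0'; oldstr++) {
        if (isspace((unsigned char)*oldstr)) {
            nspaces++;               /* buffer it; decide at the next char */
            continue;
        }
        if (*oldstr == ',') {
            postcomma = 1;           /* buffered spaces before ',' dropped below */
        } else {
            if (!postcomma)          /* inside a field: write spaces back */
                for (; nspaces > 0; nspaces--)
                    *p++ = ' ';
            postcomma = 0;
        }
        nspaces = 0;                 /* anything still buffered is dropped */
        *p++ = *oldstr;
    }
    if (keep_trailing)               /* optional: keep trailing whitespace */
        for (; nspaces > 0; nspaces--)
            *p++ = ' ';
    *p = '\0';                       /* affirmatively nul-terminate */
    return newstr;
}
```

With the question's input and `keep_trailing` set to 0, `newstr` becomes `NA ME,NAME,123 456,124,14134,134. 134,1`. Consecutive commas survive: `"a, ,b"` becomes `"a,,b"`.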
The following works, at least for your input string. I make absolutely no claims as to its efficiency or elegance. I did not try to modify `s` in place; instead I wrote to a new string. The algorithm I followed was:

1. Set `startPos` to 0.
2. Iterate over `s` until you find a comma.
3. `memcpy` from `startPos` to that position into a new string.
4. Move `startPos` past the comma and repeat.
5. At the end, append the final token with `strcat`.

I have also only tested it with your input string; it may break for other cases.
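A sketch of the `startPos`/`memcpy` approach. The function name `copy_trimmed_fields` is hypothetical. Two choices are assumptions of this sketch: trimming whitespace from both ends of each field before copying it, and handling the last field inside the same loop rather than with `strcat`.

```c
#include <ctype.h>
#include <string.h>

/* Token-copy sketch: for each comma-separated field of s, trim the
   whitespace at both ends of the field and memcpy the rest into out,
   keeping every ',' (so empty fields survive).
   out needs room for strlen(s) + 1 characters. */
void copy_trimmed_fields(const char *s, char *out)
{
    size_t startPos = 0;   /* start of the current field in s */
    size_t outLen = 0;     /* characters written to out so far */
    size_t pos;

    for (;;) {
        size_t begin = startPos, end;

        /* scan s until a comma or the end of the string */
        for (pos = startPos; s[pos] != '\0' && s[pos] != ','; ++pos)
            ;
        end = pos;
        while (begin < end && isspace((unsigned char)s[begin]))
            ++begin;                       /* trim leading whitespace */
        while (end > begin && isspace((unsigned char)s[end - 1]))
            --end;                         /* trim trailing whitespace */

        memcpy(out + outLen, s + begin, end - begin);
        outLen += end - begin;
        if (s[pos] == '\0')
            break;                         /* that was the last field */
        out[outLen++] = ',';               /* keep the delimiter */
        startPos = pos + 1;                /* continue just past the comma */
    }
    out[outLen] = '\0';
}
```

Because the output is a separate buffer, `s` is left untouched. For `" a ,, b "` the result is `"a,,b"`.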
You can modify this in place in O(n) using a state machine. I used re2c (a lexer generator) to set up and keep the state for me. This works by being lazy; that is, deferring the writing of the string until we can be sure it's complete, either at a comma or at the end of the string. It's quite simple, belonging to a regular language, without lookahead. It preserves whitespace between words that don't have commas between them. It also overwrites the string, so it doesn't use extra space; we can do this because the edits only involve deletion.
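The same lazy idea can be illustrated without re2c using a hand-written state machine. This is not re2c output, and the function and state names are hypothetical. Whitespace that follows a word is only remembered; it is copied once another word character appears in the same field, and dropped at a comma or at the end of the string.

```c
#include <ctype.h>

/* Hand-written lazy state machine (hypothetical names, not re2c output):
   removes whitespace next to commas, in place, in one O(n) pass.
   FIELD_START: no visible character yet in this field (skip whitespace)
   IN_WORD    : last character was visible
   IN_SPACE   : whitespace after a word, deferred until we know its fate */
void trim_fields_inplace(char *s)
{
    enum { FIELD_START, IN_WORD, IN_SPACE } state = FIELD_START;
    const char *src = s;
    const char *pending = s;   /* start of the deferred whitespace run */
    char *dst = s;             /* only deletions, so dst never passes src */

    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;

        if (c == ',') {
            *dst++ = ',';                  /* deferred whitespace is dropped */
            state = FIELD_START;
        } else if (isspace(c)) {
            if (state == IN_WORD) {        /* remember where the run starts */
                pending = src;
                state = IN_SPACE;
            }
        } else {
            if (state == IN_SPACE)         /* run was inside a field: copy it */
                while (pending < src)
                    *dst++ = *pending++;
            *dst++ = *src;
            state = IN_WORD;
        }
    }
    *dst = '\0';                           /* trailing deferred run dropped */
}
```

The rewrite is safe in place because the write pointer never overtakes the read pointer, and a deferred run still sits unmodified in the source when it is copied.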
Please try this:
Here is one possible algorithm. It is not necessarily well-optimized as presented here, but exists to demonstrate one possible implementation of an algorithm. It is intentionally partially abstract.
The following is a very robust O(n) time algorithm you may use to trim whitespace (among other things if you generalize and extend it).
This implementation has not been verified to work as-is, however.
You should track the previous character and the relevant spaces so that if you see `{ ',', ' ' }` or `{ CHAR_IN_ALPHABET, ' ' }`, you begin a chain and keep a value representing the current path of execution. When you see any other character, the chain should break if the first sequence was detected, and vice versa if the second sequence was detected. We'll be defining a function.

We are defining a definite algorithm in which all execution paths are known. So for each unique possible state of execution, assign a numeric value, increasing linearly from zero, using enums defined within the function for readability, and use switch statements (unless `goto` and labels better model the behavior of the algorithm).
The number of possible execution states is equal to `2**ceil(log_2(n))`, where `n` is the number of actual execution states relevant to the operation of the current algorithm. You should explicitly name them and make cases for them in the switch statement.

In the `DEFAULT` case, we're only checking for commas and "legal" characters. If the previous character was a comma or a legal character and the current character is a space, then we want to set the state to `WHITESPACE_CHAIN`.

In the `WHITESPACE_CHAIN` case, we test whether the current chain can be trimmed, based on whether the character we began with was a comma or a legal character. If the current character can be trimmed, it is simply skipped and we go to the next iteration, until we hit another comma or legal character (depending on what we're looking for); then we set the execution state to `DEFAULT`. If we determine that this chain is not trimmable, then we add back all the characters we skipped and set the execution state back to `DEFAULT`.

The loop should look something like this:
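A minimal C sketch of such a loop. Several details are assumptions: the function `trim_ws_chains(in, out)` and its signature are hypothetical, it writes to a separate buffer, and the start of the line is treated like a comma, so leading whitespace is dropped. A chain is kept only when it sits between two legal characters.

```c
#include <ctype.h>

/* Sketch of the DEFAULT / WHITESPACE_CHAIN state machine (hypothetical
   name and signature). out needs room for strlen(in) + 1 characters. */
void trim_ws_chains(const char *in, char *out)
{
    enum { DEFAULT, WHITESPACE_CHAIN } state = DEFAULT;
    char prev = ',';          /* last non-whitespace char; ',' at line start */
    const char *chain = in;   /* where the current whitespace chain began */

    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;

        switch (state) {
        case DEFAULT:
            if (isspace(c)) {
                chain = in;               /* begin a chain, copy nothing yet */
                state = WHITESPACE_CHAIN;
                continue;
            }
            break;
        case WHITESPACE_CHAIN:
            if (isspace(c))
                continue;                 /* chain goes on, keep skipping */
            /* chain ends: not trimmable only between two legal chars */
            if (prev != ',' && c != ',')
                while (chain < in)
                    *out++ = *chain++;    /* add the skipped chars back */
            state = DEFAULT;
            break;
        }
        *out++ = (char)c;                 /* comma or legal character */
        prev = (char)c;
    }
    /* a chain still open here is trailing whitespace and is dropped */
    *out = '\0';
}
```

The `continue` statements apply to the enclosing `for` loop (only `break` is captured by the `switch`), which is how whitespace is skipped without being copied.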
Please kindly also use the word whitespace to describe these characters as that is what they are commonly known as, not "white chars".