Get all integers from an irregular string in C
I am looking for a (relatively) simple way to parse a random string, extract all of the integers from it, and put them into an array. This differs from some similar questions because my strings have no standard format.
Example:
pt112parah salin10n m5:isstupid::42$%&%^*%7first3
I would eventually need an array with these contents:
112 10 5 42 7 3
And I would like a method more efficient than going character by character through the string.
Thanks for your help
Comments (6)
A quick solution. I'm assuming that no numbers exceed the range of long, and that there are no minus signs to worry about. If those are problems, then you need to do a lot more work analyzing the results of strtol(), and you need to detect '-' followed by a digit.
The code does loop over all characters; I don't think you can avoid that. But it does use strtol() to process each sequence of digits (once the first digit is found), and resumes where strtol() left off (strtol() is kind enough to tell us exactly where it stopped its conversion).
More efficient than going through character by character?
Not possible, because you must look at every character to know that it is not part of an integer.
Now, given that you have to go through the string character by character, I would recommend simply casting each character as an int and checking that:
array will contain your solution.
Just because I've been writing Python all day and I want a break. Declaring an array will be tricky: either you have to run through the string twice to work out how many numbers you have (and then allocate the array), or just use the numbers one by one as in this example.
NB: the ASCII characters for '0' to '9' are 48 to 57 (i.e. consecutive).
EDIT: the previous version didn't deal with 0.
Another solution is to use the strtok function.
Gives:
Perhaps not the best solution for this task, since you need to specify all characters that will be treated as delimiters. But it is an alternative to the other solutions.
And if you don't mind using C++ instead of C (usually there isn't a good reason not to), you can reduce your solution to just two lines of code (using the AXE parser generator):
Now test it:
And sure enough, you get your numbers back.
As a bonus, you don't need to change anything when parsing Unicode wide strings:
And sure enough, you get the same numbers back.
Finding the integers is accomplished via repeated calls to strpbrk() on the offset pointer. After each match, the pointer is offset again by the number of digits in the integer, calculated by finding the base-10 logarithm of the integer and adding 1 (with a special case for when the integer is 0). There is no need to use abs() on the integer when calculating the logarithm, as you stated the integers will be non-negative. If you wanted to be more space-efficient, you could use unsigned char integers[] rather than int integers[], as you stated the integers will all be <256, but that isn't a necessity.