sscanf in Python
I'm looking for an equivalent to sscanf() in Python. I want to parse /proc/net/* files; in C I could do something like this:
int matches = sscanf(
buffer,
"%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld %*512s\n",
local_addr, &local_port, rem_addr, &rem_port, &inode);
At first I thought of using str.split, but it doesn't split on each of the given characters; it treats the sep string as a single delimiter:
>>> import string
>>> lines = open("/proc/net/dev").readlines()
>>> for l in lines[2:]:
...     cols = l.split(string.whitespace + ":")
...     print(len(cols))
...
1
I expected it to return 17 (the interface name plus its 16 counters).
Is there a Python equivalent to sscanf (not RE), or a string-splitting function in the standard library that splits on any of a set of characters that I'm not aware of?
There is also the third-party parse module (available on PyPI). parse() is designed to be the opposite of format() (the newer string formatting function in Python 2.6 and higher).
When I'm in a C mood, I usually use zip and list comprehensions for scanf-like behavior. Like this:
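A minimal sketch of the zip-and-list-comprehension idea, assuming the fields are separated by whitespace. `scan_fields` is a hypothetical helper name used only for illustration; it is not from the original answer or any library.

```python
def scan_fields(line, converters):
    """Split a line on whitespace and convert each field with the matching
    function from `converters` (a rough analogue of "%d %f %s")."""
    fields = line.split()
    # zip() pairs converters with fields and stops at the shorter sequence,
    # so extra trailing fields are silently ignored.
    return [convert(field) for convert, field in zip(converters, fields)]


# Usage: mimic sscanf(buf, "%d %f %s", ...)
number, ratio, name = scan_fields("42 3.5 hello", (int, float, str))
print(number, ratio, name)  # 42 3.5 hello
```

Because zip() truncates silently, a short line yields a shorter list rather than an error, which is loosely similar to sscanf returning fewer matches.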
Note that for more complex format strings, you do need to use regular expressions:
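As an illustration (not the answer's original code), a regex can stand in for a more structured format such as a hypothetical `"%d-%d-%d %s"`, and the captured groups can then be converted with the same zip pattern. `DATE_LINE` and `scan_date_line` are made-up names.

```python
import re

# Hypothetical format "%d-%d-%d %s" expressed as a regular expression.
DATE_LINE = re.compile(r"(\d+)-(\d+)-(\d+)\s+(\S+)")


def scan_date_line(line):
    """Return [year, month, day, label], or None if the line doesn't match."""
    match = DATE_LINE.match(line)
    if match is None:
        return None  # roughly like sscanf matching fewer fields than expected
    converters = (int, int, int, str)
    return [convert(group) for convert, group in zip(converters, match.groups())]


print(scan_date_line("2024-01-15 backup"))  # [2024, 1, 15, 'backup']
```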
Note also that you need conversion functions for all types you want to convert. For example, above I used something like:
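As one hypothetical example of such a conversion function, here is a helper for %X-style hexadecimal fields. `hex_int` is an illustrative name, not taken from the original answer; `functools.partial(int, base=16)` would work equally well.

```python
def hex_int(text):
    """Conversion function for %X-style fields: parse a hexadecimal string."""
    return int(text, 16)


# Pair each field with its converter, as in the zip/list comprehension approach.
converters = (hex_int, hex_int, int)
fields = "0CEA 1F 42".split()
values = [convert(field) for convert, field in zip(converters, fields)]
print(values)  # [3306, 31, 42]
```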
Python doesn't have a built-in sscanf equivalent, and most of the time it makes a lot more sense to parse the input by working with the string directly, using regexps, or using a parsing tool.
People have implemented sscanf in Python, which is probably most useful when translating C code; see, for example, this module: http://hkn.eecs.berkeley.edu/~dyoo/python/scanf/
In this particular case, if you just want to split the data on multiple split characters, re.split is really the right tool.
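A short sketch of re.split applied to the question's /proc/net/dev case. The data line below is made up to follow the file's layout (interface name plus 16 counters); it is not real output.

```python
import re

# Hypothetical /proc/net/dev data line: interface name plus 16 counters.
line = "  eth0: 1234 10 0 0 0 0 0 0 5678 20 0 0 0 0 0 0\n"

# Split on runs of whitespace and/or colons; strip() first so leading
# whitespace doesn't produce an empty first column.
cols = re.split(r"[\s:]+", line.strip())
print(len(cols))  # 17
print(cols[:3])   # ['eth0', '1234', '10']
```

Using a character class rather than a fixed separator string also copes with lines where a large counter runs straight into the colon (e.g. `eth0:123456`).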
You can split on a range of characters using the re module.
You can parse with the re module using named groups. It won't convert the substrings to their actual datatypes (e.g. int), but it's very convenient when parsing strings. Given this sample line from /proc/net/tcp:
An example mimicking your sscanf example with the variables could be:
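A sketch of the named-group approach, written against the question's sscanf format. The sample line is hypothetical, assembled to follow the /proc/net/tcp column layout rather than copied from a real system, and the group names simply mirror the C variables.

```python
import re

# Named groups for the fields the sscanf call keeps; the other columns are
# matched but not captured (like %*X / %*d in the C format).
TCP_LINE = re.compile(
    r"\s*\d+:\s+"                                                  # sl
    r"(?P<local_addr>[0-9A-Fa-f]+):(?P<local_port>[0-9A-Fa-f]+)\s+"
    r"(?P<rem_addr>[0-9A-Fa-f]+):(?P<rem_port>[0-9A-Fa-f]+)\s+"
    r"[0-9A-Fa-f]+\s+"                                             # st
    r"[0-9A-Fa-f]+:[0-9A-Fa-f]+\s+"                                # tx_queue:rx_queue
    r"[0-9A-Fa-f]+:[0-9A-Fa-f]+\s+"                                # tr:tm->when
    r"[0-9A-Fa-f]+\s+"                                             # retrnsmt
    r"\d+\s+\d+\s+"                                                # uid, timeout
    r"(?P<inode>\d+)"
)

# Hypothetical sample line in the /proc/net/tcp layout (not real data).
sample = ("   0: 0100007F:0CEA 00000000:0000 0A 00000000:00000000 "
          "00:00000000 00000000   999        0 12345 1 0000000000000000 100 0 0 10 0")

match = TCP_LINE.match(sample)
fields = match.groupdict()                   # all values are still strings
local_port = int(fields["local_port"], 16)   # %X: convert explicitly
inode = int(fields["inode"])                 # %ld
print(fields["local_addr"], local_port, inode)  # 0100007F 3306 12345
```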
There is an example in the official Python docs about how to use sscanf from libc:
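A sketch of that ctypes technique applied to the question's format string (minus the trailing `%*512s\n`). This is an adaptation, not the docs' exact example, and it assumes a Linux system with glibc; calling variadic C functions through ctypes without `argtypes` is not portable to every platform. `sscanf_tcp_line` is a hypothetical name.

```python
import ctypes
import ctypes.util

# Assumption: Linux/glibc. find_library("c") may return None, in which case
# CDLL(None) falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))


def sscanf_tcp_line(line):
    """Call libc's sscanf with (most of) the format string from the question."""
    local_addr = ctypes.create_string_buffer(65)  # %64[...] needs 64 chars + NUL
    rem_addr = ctypes.create_string_buffer(65)
    local_port = ctypes.c_uint()                  # %X writes an unsigned int
    rem_port = ctypes.c_uint()
    inode = ctypes.c_long()                       # %ld writes a long
    matches = libc.sscanf(
        line.encode("ascii"),
        b"%*d: %64[0-9A-Fa-f]:%X %64[0-9A-Fa-f]:%X %*X %*X:%*X %*X:%*X %*X %*d %*d %ld",
        local_addr, ctypes.byref(local_port),
        rem_addr, ctypes.byref(rem_port),
        ctypes.byref(inode),
    )
    return (matches, local_addr.value.decode(), local_port.value,
            rem_addr.value.decode(), rem_port.value, inode.value)


# Hypothetical sample line in the /proc/net/tcp layout (not real data).
sample = ("   0: 0100007F:0CEA 00000000:0000 0A 00000000:00000000 "
          "00:00000000 00000000   999        0 12345 1 0000000000000000 100 0 0 10 0")
print(sscanf_tcp_line(sample))  # (5, '0100007F', 3306, '00000000', 0, 12345)
```

As in C, suppressed conversions (`%*...`) are not counted, so a full match returns 5.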
You can turn the ":" into a space and then do the split, e.g.:
No regex needed (for this case).
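A minimal sketch of that approach on a made-up /proc/net/dev data line (interface name plus 16 counters):

```python
# Hypothetical /proc/net/dev data line (interface name plus 16 counters).
line = "  eth0: 1234 10 0 0 0 0 0 0 5678 20 0 0 0 0 0 0\n"

# Turn every ':' into a space, then let str.split() (no argument) split on
# any run of whitespace and drop leading/trailing whitespace.
cols = line.replace(":", " ").split()
print(len(cols))  # 17
```

Calling split() with no argument is what makes this work: it treats any run of whitespace as one separator, unlike split(sep) with an explicit string.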
You could install pandas and use pandas.read_fwf for fixed-width format files. Example using /proc/net/arp:
By default it tries to figure out the format automagically, but there are options you can give for more explicit instructions (see the documentation). There are also other IO routines in pandas that are powerful for other file formats.
If the separators are ':', you can split on ':', and then use x.strip() on the strings to get rid of any leading or trailing whitespace. int() will ignore the spaces.
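A short illustration of both points, using made-up input strings:

```python
text = " 12 : 34:56\n"

parts = text.split(":")                    # [' 12 ', ' 34', '56\n']
numbers = [int(part) for part in parts]    # int() ignores surrounding whitespace
labels = [part.strip() for part in "eth0 : up".split(":")]  # strip() for strings
print(numbers, labels)  # [12, 34, 56] ['eth0', 'up']
```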