Python regular expression to parse a string and return a tuple
I've been given some strings to work with. Each one represents a data set and consists of the data set's name and the associated statistics. They all have the following form:
s = "| 'TOMATOES_PICKED' | 914 | 1397 |"
I'm trying to implement a function that will parse the string and return the name of the data set, the first number, and the second number. There are lots of these strings and each one has a different name and associated stats so I've figured the best way to do this is with regular expressions. Here's what I have so far:
def extract_data2(s):
    import re
    name = re.search("'(.*?)'", s).group(1)
    n1 = re.search('\|(.*)\|', s)
    return name, n1
So I've done a bit of reading on regular expressions and figured out how to return the name. For each of the strings that I'm working with, the name of the data set is bounded by ' ' so that's how I found the name. That part works fine. My problem is with getting the numbers.
What I'm thinking right now is to try to match a pattern that is preceded by a vertical bar (|), then anything (which is why I used .*), and followed by another vertical bar to try to get the first number. Does anyone know how I can do this in Python?
What I tried in the above code for the first number returns basically the whole string as my output, whereas I want to get just the number.
The idea is that it will be able to:
return name, n1, n2
so that when the user inputs a string, it can just parse the string and return the important information. I've noticed in my attempts to get the numbers so far that it returns the number as a string. Is there any way to return n1 or n2 as just a number? Note that for some of the strings n1 and n2 could be either integers or have a decimal.
I am very new to programming so I apologize if this question seems rudimentary, but I have been reading and searching quite diligently for answers that are close to my case with no luck.
Comments (6)
I would use a single regular expression to match the entire line, with the parts I want in named groups ((?P<name>exampl*e)).
To convert n1 and n2 from strings to numbers, I use the float function. (If they were only integers, I would use the int function.)
I used the re.VERBOSE flag and raw multiline strings (r"""...""") to make the regex easier to read.
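A minimal sketch of what this could look like, assuming every line has exactly the form shown in the question (quoted name, then two numbers, all separated by pipes). The names LINE_RE and extract_data are made up for illustration, and the number pattern is an assumption about what the data contains (optional sign, optional decimal part):

```python
import re

# Hypothetical pattern for lines like "| 'TOMATOES_PICKED' | 914 | 1397 |".
# With re.VERBOSE, whitespace and "#" comments inside the pattern are ignored.
LINE_RE = re.compile(r"""
    ^\s*\|\s*                      # leading pipe
    '(?P<name>[^']*)'              # data set name between single quotes
    \s*\|\s*                       # separator
    (?P<n1>[-+]?\d+(?:\.\d+)?)     # first number, optional decimal part
    \s*\|\s*                       # separator
    (?P<n2>[-+]?\d+(?:\.\d+)?)     # second number, optional decimal part
    \s*\|\s*$                      # trailing pipe
""", re.VERBOSE)


def extract_data(s):
    """Return (name, n1, n2) with both numbers converted to float."""
    m = LINE_RE.match(s)
    if m is None:
        raise ValueError("line does not match the expected format: %r" % s)
    return m.group("name"), float(m.group("n1")), float(m.group("n2"))


result = extract_data("| 'TOMATOES_PICKED' | 914 | 1397 |")
```

Matching the whole line in one go means a malformed line fails loudly instead of quietly returning the wrong piece.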
Using regex:
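One possible approach, sketched under the assumption that the numbers are the only cells that sit between two pipes and look like plain integers or decimals. The attempt in the question returns nearly the whole string because .* is greedy: it runs to the end of the line and backtracks only as far as the last pipe. Matching just the number between pipes avoids that:

```python
import re

s = "| 'TOMATOES_PICKED' | 914 | 1397 |"

# The greedy attempt from the question: .* grabs everything up to the LAST pipe.
greedy = re.search(r"\|(.*)\|", s).group(1)

# Match only a number that follows a pipe and is followed by another pipe.
# The lookahead (?=\|) leaves the closing pipe unconsumed, so it can also
# serve as the opening pipe of the next number.
numbers = re.findall(r"\|\s*([-+]?\d+(?:\.\d+)?)\s*(?=\|)", s)

n1, n2 = (float(x) for x in numbers)
```

Here greedy holds everything between the first and last pipe, while numbers holds only the two numeric strings.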
Try using split.
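A rough sketch of the split idea, assuming the pipe never appears inside a name and each line has exactly three cells (extract_data_split is a made-up name):

```python
def extract_data_split(s):
    # Split on the pipe, trim whitespace, and drop the empty pieces
    # produced by the leading and trailing pipes.
    fields = [f.strip() for f in s.split("|") if f.strip()]
    name, n1, n2 = fields  # raises ValueError if there are not exactly 3 cells
    # Remove the surrounding single quotes from the name.
    return name.strip("'"), float(n1), float(n2)


result = extract_data_split("| 'TOMATOES_PICKED' | 914 | 1397 |")
```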
Not sure that I have correctly understood you, but try this:
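Whichever way the cells are pulled out, the question also asks for n1 and n2 as numbers that may be integers or decimals. A small helper along these lines might cover that part (to_number is a hypothetical name, and the int-first, float-second order is just one reasonable choice):

```python
def to_number(text):
    """Convert a numeric string to int when possible, otherwise to float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        # Not a plain integer (e.g. "3.5"); float() raises ValueError
        # itself if the text is not numeric at all.
        return float(text)


whole = to_number(" 914 ")
decimal = to_number("3.5")
```

This keeps 914 as an int and turns 3.5 into a float, so the caller does not have to know in advance which kind a cell holds.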
I agree with the other posters who said to use the split() method on your strings. If your given string is the one from the question,
you just split the string and voila, you now have a list with the name in the second position and the two values in later entries.
Of course you also get the "|" characters in the list, but they appear consistently in your data set, so they aren't a big problem to deal with. Just ignore them.
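As an illustration of the above, assuming the split is on whitespace and that names never contain spaces, the list and the positions of interest would look roughly like this:

```python
s = "| 'TOMATOES_PICKED' | 914 | 1397 |"

# Splitting on whitespace keeps the pipes as separate list items.
parts = s.split()

# The name is the second item; the numbers sit at every other position after it.
name = parts[1].strip("'")
n1 = float(parts[3])
n2 = float(parts[5])
```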
With pyparsing, you can have the parser create a dict-like structure for you, using the first column values as the keys, and the subsequent values as an array of values for that key:
This already handles multiple entries, so you can just pass it a single multiline string containing all of your data values, and a single keyed data structure will be built for you.
(Processing this kind of pipe-delimited tabular data was one of the earliest applications I had for pyparsing.)
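For comparison only, here is a plain-Python sketch of the same kind of keyed structure. It does not use pyparsing at all; it just shows the shape of the result (first column as key, remaining cells as a list of values). parse_table is a made-up name and the second sample row is invented for the example:

```python
def parse_table(text):
    # Build {name: [values...]} from a multiline pipe-delimited string.
    table = {}
    for line in text.splitlines():
        # Drop the outer pipes, then split the remaining cells on "|".
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or not cells[0]:
            continue  # skip blank or malformed lines
        key = cells[0].strip("'")
        table[key] = [float(c) for c in cells[1:]]
    return table


# The second row is invented sample data, not from the question.
data = """
| 'TOMATOES_PICKED' | 914 | 1397 |
| 'APPLES_PICKED' | 12.5 | 40 |
"""

table = parse_table(data)
```

Looking up table["TOMATOES_PICKED"] then gives the list of numbers for that data set.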