如何仅从给定的字符串中仅选择数字/数字,然后使用Python Regex跳过文本?

发布于 2025-02-02 03:19:13 字数 615 浏览 2 评论 0原文

给定字符串:

57年,67天30,1789

61岁,125天至1797

年57岁57岁,325 Daysmar 4,1801

57岁,353 Daysmar 4,1809

58年58岁,310 Daysmar 4,1817

在Regex101中:

模式= (? (

? > REGEX模式的输出

in python(IDE:jupyter Notebook):注册): python output 在这里,它仅显示dataFrame中的NAN值,如何解决此问题?

Given Strings:

57 years, 67 daysApr 30, 1789

61 years, 125 daysMar 4, 1797

57 years, 325 daysMar 4, 1801

57 years, 353 daysMar 4, 1809

58 years, 310 daysMar 4, 1817

In regex101:

Pattern = (?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})

Output:
Output of Regex Pattern

In Python(IDE : Jupyter Notebook) :
Python Output
Here it is showing only nan values in dataframe, how to solve this ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

末が日狂欢 2025-02-09 03:19:13

仅供参考,您的代码非常适合我,也许您的数据框中有一些空格问题:

import pandas as pd
import numpy as np

from io import StringIO

st = StringIO("""57 years, 67 daysApr 30, 1789

61 years, 125 daysMar 4, 1797

57 years, 325 daysMar 4, 1801

57 years, 353 daysMar 4, 1809

58 years, 310 daysMar 4, 1817""")

df = pd.read_csv(st, sep='\s\s\s+', header=None, engine='python')

Pattern = '(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})'

df[0].str.extract(Pattern)

输出:

  Years Days   Month  Year
0    57   67  Apr 30  1789
1    61  125   Mar 4  1797
2    57  325   Mar 4  1801
3    57  353   Mar 4  1809
4    58  310   Mar 4  1817

FYI, your code ran perfectly for me, maybe you have some whitespace issues in your dataframe:

import pandas as pd
import numpy as np

from io import StringIO

st = StringIO("""57 years, 67 daysApr 30, 1789

61 years, 125 daysMar 4, 1797

57 years, 325 daysMar 4, 1801

57 years, 353 daysMar 4, 1809

58 years, 310 daysMar 4, 1817""")

df = pd.read_csv(st, sep='\s\s\s+', header=None, engine='python')

Pattern = '(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})'

df[0].str.extract(Pattern)

Output:

  Years Days   Month  Year
0    57   67  Apr 30  1789
1    61  125   Mar 4  1797
2    57  325   Mar 4  1801
3    57  353   Mar 4  1809
4    58  310   Mar 4  1817
卷耳 2025-02-09 03:19:13

使用:

#Preparing data
string = """57 years, 67 daysApr 30, 1789
61 years, 125 daysMar 4, 1797
57 years, 325 daysMar 4, 1801
57 years, 353 daysMar 4, 1809
58 years, 310 daysMar 4, 1817"""
df = pd.DataFrame(string.split('\n'))

#Solution
temp = df[0].str.extractall('(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})')

输出:

        Years   Days    Month   Year
match               
0   0   57  67  Apr 30  1789
1   0   61  125 Mar 4   1797
2   0   57  325 Mar 4   1801
3   0   57  353 Mar 4   1809
4   0   58  310 Mar 4   1817

Use:

#Preparing data
string = """57 years, 67 daysApr 30, 1789
61 years, 125 daysMar 4, 1797
57 years, 325 daysMar 4, 1801
57 years, 353 daysMar 4, 1809
58 years, 310 daysMar 4, 1817"""
df = pd.DataFrame(string.split('\n'))

#Solution
temp = df[0].str.extractall('(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})')

Output:

        Years   Days    Month   Year
match               
0   0   57  67  Apr 30  1789
1   0   61  125 Mar 4   1797
2   0   57  325 Mar 4   1801
3   0   57  353 Mar 4   1809
4   0   58  310 Mar 4   1817
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文