Find and replace numeric IDs (which may contain a decimal point)

Posted 2025-01-20 07:04:21 · Word count 1737 · Views 1 · Comments 0


I have a bunch of numeric IDs that I need to renumber with new numeric IDs:

id="12.03"

id="23.343.Fdf--"

id="12-B.fdas7232"

id="12."

id="1."

id="1.-2"

id="2.02-R.-vdfs--erev-j"

id="48-34JJf"

id="5.01-G.f"

Using this regex:

 id="[1-9]\d*(\.\d+)?

at https://regexr.com/, I am able to get the correct matches.

However, when I run the Python script it fails, and I think it has to do with the capturing groups returning too many values.

Here are two examples of the printed output:

(' id="5.01', ' id="', '5.01', '.01')
(' id="48', ' id="', '48', '')

I don't know how to stop it from returning the 4th value '.01' or '' in the above 2 examples.

I get this error: too many values to unpack (expected 3)

I've tried several different Regex variations to try to get it to return a single string, like adding additional parentheses, ^ and $ to mark the beginning and end of the string, etc.

    import re

    PID_REPLACEMENTS = {
        "48": '9',
        "23.343": '8',
        "12.03": '7',
        "12": '6',
        "5.01": '5',
        "2.02": '4',
        "1": '3.08'}

    def substitute_oldid_index(my_text):
        return substitute_newid(r"""((?P<pre> id=")(?P<post>[1-9]\d*(\.\d+)?))""", my_text)


    def substitute_newid(findallnewid_regex, my_text):
        data_oldids = re.findall(findallnewid_regex, my_text, re.I)

        print(data_oldids)

        # The error is raised here: each tuple holds four items, not three.
        for combined, pre, post in data_oldids:
            if post.title() not in PID_REPLACEMENTS:
                continue

            my_text = re.sub(combined, "{}{}".format(pre, PID_REPLACEMENTS[post.title()]), my_text)

        return my_text

    my_text = substitute_oldid_index(my_text)

Is there a better way to find numeric IDs (that may contain decimal points, and additional periods or text after them that should remain unchanged) and replace them with new numeric IDs (that may or may not contain decimal points)? I assume we want to process them in descending order (largest ID first) so that lower numbers aren't replaced more than once?

Is there a way to fix my regex and script to achieve this goal?
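As a minimal sketch of what is happening: `re.findall` returns one tuple item per capturing group, and the optional `(\.\d+)?` is itself a capturing group, which is where the fourth value comes from. Writing it as a non-capturing group `(?:\.\d+)?` leaves three items. The sample `text` and the helper name `renumber` below are made up for illustration; the dictionary is the one from the question. The second half assumes a single `re.sub` pass with a callback is acceptable. In that case each ID is replaced exactly once, so the processing order no longer matters.

```python
import re

PID_REPLACEMENTS = {
    "48": "9", "23.343": "8", "12.03": "7", "12": "6",
    "5.01": "5", "2.02": "4", "1": "3.08",
}

text = 'x id="5.01-G.f" y id="48-34JJf"'  # hypothetical sample input

# Outer group + pre + post + the optional "(\.\d+)?" group: four items per match.
four = re.findall(r'((?P<pre> id=")(?P<post>[1-9]\d*(\.\d+)?))', text)

# "(?:...)" groups without capturing, so only three items come back.
three = re.findall(r'((?P<pre> id=")(?P<post>[1-9]\d*(?:\.\d+)?))', text)

print(four)
print(three)


def renumber(my_text):
    """Replace each old ID that directly follows id=" in a single pass."""
    return re.sub(
        r'(?<=id=")[1-9]\d*(?:\.\d+)?',
        # Look the matched ID up; keep it unchanged when it is not in the table.
        lambda m: PID_REPLACEMENTS.get(m.group(0), m.group(0)),
        my_text,
    )


print(renumber(text))
```

Because replaced text is never rescanned by `re.sub`, a new ID such as `6` cannot be picked up again as an old ID later in the same pass.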


As a follow-up question, I have a bunch of ranges in a spreadsheet that need conversion to new ID numbers.

EXAMPLE 1:
5.01-48; 151.01-168; 224-382; 415-510; 218-249

EXAMPLE 2:
128-211; 257-281; 386-401

Is there a way to search these numbers and replace them with a new number?

For example, find 5.01 and replace it with 5 as above from the dictionary


Comments (1)

ˉ厌 2025-01-27 07:04:21


I think you're making this harder than it needs to be, with the pre- and post-matches. Why not just look for digits, optionally followed by a dot and digits, and if that set is in your list, replace it? This does that:

import re

PID_REPLACEMENTS = {
    "48": '9',
    "23.343": '8',
    "12.03": '7',
    "12": '6',
    "5.01": '5',
    "2.02": '4',
    "1.": '3.08',  # key is "1." because the pattern below also consumes a trailing dot
}

# Raw string, so the backslashes in the regex line of the sample are kept literally.
sample = r"""
id="12.03"         12.03
id="23.343.Fdf--"  23.343
id="12-B.fdas7232"
id="12."           12.
id="1."            1.
id="1.-2"
id="2.02-R.-vdfs--erev-j"
id="48-34JJf"
id="5.01-G.f"      5.01
id="[1-9]\d*(\.\d+)?
EXAMPLE 1: 5.01-48; 151.01-168; 224-382; 415-510; 218-249
EXAMPLE 2: 128-211; 257-281; 386-401
"""

def subst(m):
    # Replace the matched ID if it is in the table; otherwise leave it unchanged.
    old_id = m.group(0)
    return PID_REPLACEMENTS.get(old_id, old_id)

def substitute_newid(my_text):
    # Match digits, optionally followed by a dot and more digits, right after id="
    return re.sub(r'(?<=id=")\d+(\.\d*)?', subst, my_text)

print(substitute_newid(sample))

Output:


id="7"         12.03
id="8.Fdf--"  23.343
id="6-B.fdas7232"
id="12."           12.
id="3.08"            1.
id="3.08-2"
id="4-R.-vdfs--erev-j"
id="9-34JJf"
id="5-G.f"      5.01
id="[1-9]\d*(\.\d+)?
EXAMPLE 1: 5.01-48; 151.01-168; 224-382; 415-510; 218-249
EXAMPLE 2: 128-211; 257-281; 386-401
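
The range lines come out unchanged above because nothing in them is preceded by `id="`. For the follow-up question, a rough sketch along the same lines might look like the following. It assumes each spreadsheet cell is available as a plain string and uses the dictionary from the question (with the key "1"). The names `NUMBER` and `renumber_ranges` are hypothetical. The lookarounds are there so that a short ID such as 48 is not matched inside a longer number such as 148 or 5.48.

```python
import re

PID_REPLACEMENTS = {
    "48": "9", "23.343": "8", "12.03": "7", "12": "6",
    "5.01": "5", "2.02": "4", "1": "3.08",
}

# A whole number with an optional decimal part, not glued to other
# digits or to a decimal point on either side.
NUMBER = re.compile(r'(?<![\d.])\d+(?:\.\d+)?(?!\.?\d)')


def renumber_ranges(cell):
    """Replace every known old ID in a range string such as '5.01-48; 224-382'."""
    return NUMBER.sub(
        # Unknown numbers are returned unchanged.
        lambda m: PID_REPLACEMENTS.get(m.group(0), m.group(0)),
        cell,
    )


print(renumber_ranges("5.01-48; 151.01-168; 224-382; 415-510; 218-249"))
print(renumber_ranges("128-211; 257-281; 386-401"))
```

With this dictionary only 5.01 and 48 in the first example have replacements, so the second example is returned as it was.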