Problem with re.split(): extracting data from a string (splitting a string)

Posted on 2025-02-03 03:52:11

I have been trying to split this string, but it only gives me the last character of the username I want.

In this dataset I want to separate the username from the actual message, but after running this code:

#how can we separate users from messages 
users = []
messages = []
for message in df['user_message']:
    entry = re.split('([a-zA-Z]|[0-9])+#[0-9]+\\n', message)
    if entry[1:]:
        users.append(entry[1])
        messages.append(entry[2])
    else:
        users.append('notif')
        messages.append(entry[0])
        
df['user'] = users
df['message'] = messages
df.drop(columns=['user_message'], inplace = True)

df.head(30)

I only get the last character of each username.

Could someone please tell me why it only gives me the last character of the string I want to split, and how I can fix it? Thanks a lot, this means a lot.

动听の歌 2025-02-10 03:52:11

Splitting is not really the string operation you want here. Instead, just use str.extract directly on the user_message column:

df["username"] = df["user_message"].str.extract(r'^([^#]+)')

The pattern above extracts the leading part of the user message, from the start of the string up to (but not including) the first hash symbol.
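As a minimal sketch of how this could look on data shaped like the question's, the snippet below applies str.extract to a small made-up DataFrame. The sample rows and the stricter pattern (which also requires the `#digits` tag and the newline, so rows without a username yield NaN) are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows in the "username#digits\nmessage" shape from the question
df = pd.DataFrame({
    "user_message": [
        "keikeo#2720\nAdded a recipient.\n\n\n",
        "Server restarted.",  # assumed notification row with no username prefix
    ]
})

# expand=False returns a Series when the pattern has a single capture group.
# Requiring "#digits\n" after the name means rows without that tag do not match.
df["username"] = df["user_message"].str.extract(r'^([^#]+)#[0-9]+\n', expand=False)

print(df["username"].tolist())
```

With the shorter pattern `^([^#]+)` from the answer, a row that contains no hash symbol would be captured in full, so which pattern fits depends on whether such rows exist in the data.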

请远离我 2025-02-10 03:52:11

You could do this a lot more simply by using str.split() and setting maxsplit to 1. See the example below.

Note that regex is very useful, but it is very easy to get incorrect results with it. I advise using an online regex validator if you really need it. As for the actual regex, your + is in the wrong place: a quantifier placed after a capturing group makes the group keep only its last repetition, which is why you get just the last character of the username. You need to move it inside the group. I used regex101.com for testing...

([a-zA-Z0-9]+)#[0-9]+\\n
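To make the difference concrete, here is a small sketch that runs both patterns through re.split on the sample line used in this answer. The variable names `broken` and `fixed` are only illustrative.

```python
import re

line = "keikeo#2720\nAdded a recipient.\n\n\n"

# Quantifier outside the group: the group is overwritten on every repetition,
# so only the last character it matched survives in the result.
broken = re.split('([a-zA-Z]|[0-9])+#[0-9]+\\n', line)

# Quantifier inside the group: the group captures the whole run of characters.
fixed = re.split('([a-zA-Z0-9]+)#[0-9]+\\n', line)

print(broken)  # ['', 'o', 'Added a recipient.\n\n\n']
print(fixed)   # ['', 'keikeo', 'Added a recipient.\n\n\n']
```

In both cases index 0 is the (empty) text before the match, index 1 is the captured group, and index 2 is the rest of the message, which is why the question's loop reads entry[1] and entry[2].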

str.split() example:

line = "keikeo#2720\nAdded a recipient.\n\n\n"

user, message = line.split('\n', maxsplit=1)
print(user)
print(message)
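One thing to watch for: unpacking the result of split into two names raises a ValueError when a line has no newline at all. A possible way around that, sketched below, is str.partition, which always returns three parts. The helper name `split_user_message` and the assumption that rows without a newline are notifications (mirroring the 'notif' fallback in the question's loop) are mine, not the original poster's.

```python
def split_user_message(line):
    """Return (user, message); lines without a newline get the user 'notif'."""
    user, sep, message = line.partition('\n')
    if not sep:
        # No newline found: treat the whole line as a notification (assumption).
        return 'notif', line
    return user, message


print(split_user_message("keikeo#2720\nAdded a recipient.\n\n\n"))
print(split_user_message("Server restarted."))
```

As with the split example, the user part still includes the `#2720` tag, and a notification that happens to contain a newline would be split incorrectly, so this only suits data where that does not occur.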
