Pandas: parsing a text column

Posted on 2025-01-18 18:56:58

I have a CSV table with a column that contains the text of a chat log. Every row follows the same format: the sender's name and the time of the message (padded with extra spaces before and after), followed by the message content. An example of a single row of the text column:

'  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'

I would like to transform this single string column into multiple columns (the number of columns would depend on the number of messages), with one column for each individual message, like below:

  • Siri (3:15pm) Hello how can I help you
  • John Wayne (3:17pm) what day of the week is today
  • Siri (3:18pm) it is Monday

How can I parse this text in a pandas dataframe column to separate the chat logs into individual message columns?

Comments (2)

不弃不离 2025-01-25 18:56:58

If you have this dataframe:

                                                                                                                     Messages
0  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.

then you can do:

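# Strip the outer padding first so the split does not produce an empty leading piece,
# then split on runs of 2+ whitespace characters and put each piece on its own row.
x = df["Messages"].str.strip().str.split(r"\s{2,}").explode()

# Even positions are "Name (time)" headers, odd positions are the message bodies.
out = (x[::2] + " " + x[1::2]).to_frame()
print(out)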

Prints:

                                            Messages
0            Siri (3:15pm) Hello how can I help you?
0  John Wayne (3:17pm) what day of the week is today
0                        Siri (3:18pm) it is Monday.

Note: This only works if there are 2+ spaces between the name and the text.
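
The question asks for one column per message, while the result above has one row per message. A minimal sketch of the wide layout follows, assuming the same 2+ space separator; the helper `split_messages`, the second sample row, and the `message_N` column names are made up for illustration. Rows with fewer messages end up with missing values in the trailing columns.

```python
import re

import pandas as pd

# Sample data: the row from the question plus a hypothetical shorter row.
df = pd.DataFrame({"Messages": [
    "  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  "
    "what day of the week is today  Siri (3:18pm)  it is Monday.",
    "  Siri (9:00am)  Good morning.",
]})


def split_messages(text):
    """Return a list of 'Name (time) message' strings for one chat log."""
    # Drop the outer padding, then split on runs of 2+ whitespace characters.
    parts = re.split(r"\s{2,}", text.strip())
    # Pair each "Name (time)" header with the message body that follows it.
    return [f"{head} {body}" for head, body in zip(parts[::2], parts[1::2])]


# One list per row; pandas pads shorter rows with missing values.
wide = pd.DataFrame(df["Messages"].map(split_messages).tolist(), index=df.index)
wide.columns = [f"message_{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

The resulting frame keeps the original row index, so it can be joined back onto `df` if the other CSV columns are needed.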

眼眸印温柔 2025-01-25 18:56:58

This is how I did it, took me a while but we got to it!

import pandas as pd

s = pd.Series(['  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'])
# Split on the two-space padding; the leading padding yields an empty first column.
s = s.str.split(r"  ", expand=True)
s = s.drop(labels=[0], axis=1)
s = s.transpose()

# Column 0 now holds the pieces in order: name, message, name, message, ...
list_1 = list(s[0])

# Even positions are the names, odd positions are the messages.
names = list_1[::2]
messages = list_1[1::2]

d = {'Name': names, 'Message': messages}
df = pd.DataFrame(data=d)
df

Output:
                   Name                               Message
0         Siri (3:15pm)             Hello how can I help you?
1   John Wayne (3:17pm)         what day of the week is today
2         Siri (3:18pm)                         it is Monday.
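
The same Name/Message table can also be built without manual list handling. The sketch below is one possible alternative, not the method from either answer: it uses `Series.str.extractall` with a hand-written regular expression, and additionally pulls the time into its own column. The pattern and the `Time` column are assumptions for illustration; it still relies on messages being separated by 2+ spaces and on names and messages not containing parentheses.

```python
import pandas as pd

s = pd.Series([
    "  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  "
    "what day of the week is today  Siri (3:18pm)  it is Monday."
])

# Name: text up to the opening parenthesis.
# Time: whatever sits inside the parentheses.
# Message: text up to the next run of 2+ spaces or the end of the string.
pattern = (
    r"\s*(?P<Name>\S.*?)\s+"
    r"\((?P<Time>[^)]*)\)\s+"
    r"(?P<Message>.*?)(?=\s{2,}|\s*$)"
)

# extractall returns one row per match, indexed by (original row, match number).
long = s.str.extractall(pattern)
print(long)
```

Because the result is indexed by the original row and the match number, it extends to a column with many chat logs without extra bookkeeping.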