How to insert a newline 3 spaces or 2 words before a specific character

Posted 2024-09-13 09:56:47

I was trying to format a chat log for a friend that looks like this:

John Smith > hello Jane doe > hey how are you? John Smith > Pretty good thanks

and she wants to format it like this:

John Smith > hello

Jane doe > hey how are you?

John Smith > Pretty good thanks

Simply inserting a newline after > is not good enough: each name belongs to the message that follows it, so the name would be left stranded at the end of the previous line. I need to insert the newline 3 spaces, or 2 words, before the ">" so that the name moves to the new line too.

So far I can only insert a newline after the >:

/usr/bin/perl -p -i -e "s/>/>\n/g" *.txt

Edit: There are about 20+ different chat names involved, and they may vary, so it would be great to do this without typing in all of their names. I'd also like to learn from the exercise for fun.
Thanks for reading

Comments (2)

半世蒼涼 2024-09-20 09:56:47

Try this one:

perl -p -i -e 's/(\w+\s\w+\s*>)/\n$1/g' log.txt

Here is the test I used for the regex:

[21:21:23] ~ $ echo 'John Smith > hello Jane doe > hey how are you? John Smith > Pretty good thanks Susie Someone > hi guys' > log.txt
[21:21:24] ~ $ more log.txt 
John Smith > hello Jane doe > hey how are you? John Smith > Pretty good thanks Susie Someone > hi guys
[21:21:27] ~ $ perl -p -i -e "s/(\w+\s\w+\s>)/\n\1/g" log.txt
[21:21:34] ~ $ more log.txt 

John Smith > hello 
Jane doe > hey how are you? 
John Smith > Pretty good thanks 
Susie Someone > hi guys
[21:21:37] ~ $ 

It does add an extra newline at the beginning of the file (the first name matches too), but if that doesn't bother you, I think it should work.

Edit: It will also fail if someone uses a > character in one of their messages (if it is preceded by two words and a space, anyway, as in "I think cats > dogs").
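One way to avoid both the extra newline at the start of the file and the trailing space left at the end of each line (visible after "hello" in the output above) is to replace the whitespace before each name with a newline, rather than inserting a newline in front of the name. The sketch below does this with a lookahead. Like the one-liner above, it assumes every name is exactly two words of \w characters, so it shares the same weakness with an inline > preceded by two words; the function name split_chat is just for illustration.

```perl
use strict;
use warnings;

# Sketch: put each "First Last > message" on its own line.
# Assumption (same as the one-liner above): every name is exactly two \w+ words.
sub split_chat {
    my ($text) = @_;
    # Replace the whitespace *before* a "word word >" group with a newline.
    # The lookahead (?=...) only checks that a name follows; it does not consume
    # it, so the name stays in place, no trailing space is left behind, and
    # nothing is added before the very first name (no whitespace precedes it).
    $text =~ s/\s+(?=\w+\s+\w+\s*>)/\n/g;
    return $text;
}

my $log = 'John Smith > hello Jane doe > hey how are you? John Smith > Pretty good thanks';
print split_chat($log), "\n";
```

The same substitution also works as an in-place one-liner: perl -p -i -e 's/\s+(?=\w+\s+\w+\s*>)/\n/g' log.txt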

半步萧音过轻尘 2024-09-20 09:56:47

I know you've already got a script that is "good enough". But I thought I'd suggest an alternate strategy anyhow.

Handle this task in two parts.

Part one: Analyze the raw data and extract a list of user names.

  • Look for repeated word groups (of up to X words) that precede a >.
  • Generate a list of possible user names.
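Part one could look roughly like the sketch below. It assumes a name is made of \w words separated by whitespace, uses a hypothetical $max_words argument in place of X, and treats any group seen at least twice as a candidate, which is an arbitrary threshold. It only produces a list for review; it does not decide which groups are real names.

```perl
use strict;
use warnings;

# Part one (sketch): count the word groups of 1..$max_words words that appear
# immediately before a '>'. $max_words stands in for "X"; its value is a guess.
sub candidate_names {
    my ($text, $max_words) = @_;
    my %count;
    # Split on '>' (-1 keeps empty trailing fields); the last piece is only
    # message text, so drop it.
    my @chunks = split />/, $text, -1;
    pop @chunks;
    for my $chunk (@chunks) {
        # The run of whitespace-separated words at the very end of the chunk.
        next unless $chunk =~ /((?:\w+\s+)*\w+)\s*\z/;
        my @words = split ' ', $1;
        # Count the last 1, 2, ... $max_words words as separate candidates.
        for my $n (1 .. $max_words) {
            last if $n > @words;
            my $group = join ' ', @words[-$n .. -1];
            $count{$group}++;
        }
    }
    return \%count;
}

my $sample = 'John Smith > hello Jane doe > hey how are you? John Smith > Pretty good thanks';
my $counts = candidate_names($sample, 3);
# Show groups seen at least twice, most frequent first, for a human to review.
for my $group (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$counts->{$group}\t$group\n" if $counts->{$group} >= 2;
}
```

On this short sample it lists "John Smith" and "Smith" (each seen twice), while "Jane doe" appears only once and is not listed. Fragments like "Smith" and missed names like "Jane doe" are exactly what the human review step has to catch.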

Here a human steps in and approves the list of user names.

Part two: Process the data based on a list of user names.

  • Process the file, matching the approved user names and using them as delimiters.
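Part two, again as a sketch: the approved names (the list in the example is hypothetical, standing in for the output of the review step) are escaped with quotemeta, joined into one alternation, and a newline replaces the whitespace before any approved name that is followed by >.

```perl
use strict;
use warnings;

# Part two (sketch): split the log using only reviewed user names as delimiters.
sub split_by_names {
    my ($text, @approved) = @_;
    return $text unless @approved;    # nothing to split on
    # quotemeta escapes regex metacharacters so each name matches literally.
    my $names = join '|', map { quotemeta($_) } @approved;
    # Replace the whitespace before "<approved name> >" with a newline. A '>'
    # that is not preceded by an approved name stays inside its message.
    $text =~ s/\s+(?=(?:$names)\s*>)/\n/g;
    return $text;
}

# Hypothetical output of the human review step.
my @approved = ('John Smith', 'Jane doe');
my $raw = 'John Smith > I think cats > dogs Jane doe > no way John Smith > Pretty good thanks';
print split_by_names($raw, @approved), "\n";
```

With this input, the > in "cats > dogs" stays inside John's message, whereas a pure two-word regex would start a new line at "think cats >". An approved entry that is only part of a longer name (such as "Smith") would still split inside "John Smith", so the reviewed list should contain full names only.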

The advantage of this process is that you can handle inline > characters correctly in your final output. At least as long as no one types in a valid user name followed by a >.

Of course, the code will be more complex. Whether the added complexity is worth the improved accuracy depends on your needs.
