/>\n/g" *.txt
Edit: There are about 20+ different chat names involved so it would be great to do this without entering all their names since they may vary, and I'd like to learn from the exercise for fun.
Thanks for reading
Try this one:
Test I used for the regex:
It does add an extra newline to the beginning of the file, but if that doesn't bother you then I think it should work.
Edit: It will also fail if someone used a > character in one of their messages for some reason (if it was preceded by a space and two words, anyway).
I know you've already got a script that is "good enough". But I thought I'd suggest an alternate strategy anyhow.
Handle this task in two parts.
Part one: Analyze the raw data and extract a list of user names. Look for repeated groups of words (up to X words long) that precede a >. Here a human steps in and approves the list of user names.
Part two: Process the data based on a list of user names.
The advantage of this process is that you can handle inline > characters correctly in your final output. At least as long as no one types in a valid user name followed by a >.
Of course the code will be more complex. Whether the added complexity is worth the improved accuracy is dependent on your needs.
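For what it's worth, here is a rough sketch of the two-part idea, written in Python rather than sed because the second pass needs a name list. It is only an illustration under some assumptions: every message starts with a name immediately followed by >, words are separated by whitespace, and a name is at most `max_words` words long. The function names `candidate_names` and `split_messages`, and the parameters `max_words` and `min_count`, are made up for this example.

```python
import re
from collections import Counter


def candidate_names(text, max_words=2, min_count=2):
    """Part one: count word groups (1..max_words words) that sit directly before a '>'.

    Returns {candidate: count} for groups seen at least min_count times.
    The result is only a suggestion list; a human is expected to prune it.
    """
    counts = Counter()
    for i, ch in enumerate(text):
        # Only consider a '>' that is glued to the preceding word (assumption).
        if ch != ">" or i == 0 or text[i - 1].isspace():
            continue
        words = text[:i].split()
        for n in range(1, min(max_words, len(words)) + 1):
            counts[" ".join(words[-n:])] += 1
    return {name: c for name, c in counts.items() if c >= min_count}


def split_messages(text, approved_names):
    """Part two: start a new message wherever an approved name is followed by '>'."""
    # Longest names first so "Ann Lee" is tried before a shorter name like "Lee".
    names = sorted(approved_names, key=len, reverse=True)
    if not names:
        stripped = text.strip()
        return [stripped] if stripped else []
    pattern = re.compile(
        r"(?<!\S)(?:" + "|".join(re.escape(n) for n in names) + r")>"
    )
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that comes before the first name
    bounds = starts + [len(text)]
    pieces = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Invented sample data, not taken from the real chat logs.
    raw = "Ann Lee> is a>b? Bob Ray> yes a>b Ann Lee> ok Bob Ray> bye"
    print(candidate_names(raw))  # review this by hand, then approve
    print("\n".join(split_messages(raw, ["Ann Lee", "Bob Ray"])))
```

Note that part one will happily suggest junk such as a single word from an inline comparison if it repeats often enough, which is exactly why the manual approval step sits between the two passes. Once only real names are approved, an inline > inside a message no longer triggers a line break.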