Using Perl, how do I read records from a file that has two possible record separators?
Here is what I am trying to do:
I want to read a text file into an array of strings. I want each string to terminate when the file reads in a certain character (mainly ; or |).
For example, the following text
Would you; please hand me| my coat?
would be stored like this:
$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
Could I get some help on something like this?
This will do it. The trick to using split while preserving the token you're splitting on is to use a zero-width lookbehind match:
split(/(?<=[;|])/, ...)
Note: mctylr's answer (currently the top rated) isn't actually correct: it will split fields on newlines, because it only works on a single line of the file at a time.
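The code from this answer was not preserved on this page. The following is a minimal sketch of the lookbehind split, assuming the whole file is read into one string first so that a record may span a newline (the point of the note above). The helper names split_records and read_records are hypothetical.

```perl
use strict;
use warnings;

# Split after every ';' or '|'. The lookbehind (?<=[;|]) matches the
# empty position just after a separator, so nothing is consumed and
# each separator stays at the end of its piece.
sub split_records {
    my ($text) = @_;
    return split /(?<=[;|])/, $text;
}

# Slurp the whole file (local $/ is undef inside the do block) so a
# newline inside a record does not start a new piece.
sub read_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    chomp $text;    # $/ is back to "\n" here: drop one trailing newline
    return split_records($text);
}

my @string = split_records('Would you; please hand me| my coat?');
print "[$_]\n" for @string;
```

Run as shown, it prints the three pieces from the question, each in brackets on its own line; a newline inside a record is simply kept as part of that piece.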
gbacon's answer using the input record separator ($/) is quite clever (it's both space and time efficient), but I don't think I'd want to see it in production code. Putting one split token in the record separator and the other in the split strikes me as a little too unobvious (something you always have to fight against in Perl...), which will make it hard to maintain. I'm also not sure why he's deleting multiple newlines (which I don't think you asked for?) and why he's doing that only at the end of '|'-terminated records.
One way is to inject another character, such as \n, whenever one of your special characters is found, and then split on the \n.
UPDATE: The original question posed by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, others edited it, breaking the one line into two. Only James knows whether one or two lines were intended.
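The code from this answer was not preserved here either. Below is a minimal sketch of the injection idea, assuming each call receives a single line of input (as in the single-line __DATA__ the update describes); split_by_injection is a hypothetical name. Because it works line by line, a record that spans two input lines comes back as two pieces.

```perl
use strict;
use warnings;

# Mark each separator by appending "\n" after it, then split on "\n".
# Assumes $line is one line: any newline already inside it would
# also act as a split point.
sub split_by_injection {
    my ($line) = @_;
    chomp $line;
    (my $marked = $line) =~ s/([;|])/$1\n/g;
    return split /\n/, $marked;    # trailing empty field is discarded
}

my @string = split_by_injection("Would you; please hand me| my coat?\n");
print "[$_]\n" for @string;
```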
I prefer @toolic's answer because it deals with multiple separators very easily.
However, if you wanted to overly complicate things, you could always try:
Something along the lines of
should do the trick more or less.
Edit: I've changed "/;!/" to "/[;!]/".
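That edit is about the difference between a sequence and a character class. The answer's code was not preserved, so this sketch only illustrates the distinction: /;!/ matches the two-character sequence ';!', while /[;!]/ matches either character on its own. Inside a character class | is also literal rather than alternation, so the question's two separators can be written [;|].

```perl
use strict;
use warnings;

my $text = 'a;b!c';

# /;!/ needs ';' immediately followed by '!', which never occurs here,
# so the whole string comes back as a single field.
my @as_sequence = split /;!/, $text;

# /[;!]/ is a character class: either character is a split point.
my @as_class = split /[;!]/, $text;

# Inside [...] the '|' is literal, so this splits on ';' or '|',
# the separators from the question.
my @pipe_or_semi = split /[;|]/, 'a;b|c';

printf "%d %d %d\n", scalar @as_sequence, scalar @as_class, scalar @pipe_or_semi;    # prints "1 3 3"
```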
Let Perl do half the work for you by setting $/ (the input record separator) to a vertical bar, and then extract the semicolon-separated fields.
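The code from this answer was not preserved. A minimal sketch of the approach, assuming input arrives through a filehandle: $/ is set to '|' so each readline returns one '|'-terminated record, and a regex then cuts each record into ';'-terminated pieces. read_pipe_records is a hypothetical helper, and the in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Read '|'-terminated records via $/, then split each record into
# pieces that end in ';' (the last piece of a record may not).
sub read_pipe_records {
    my ($fh) = @_;
    my @strings;
    local $/ = '|';    # readline now returns text up to and including '|'
    while (my $record = <$fh>) {
        # Every record except the last ends in '|', so only the last
        # one can carry the file's trailing newline.
        $record =~ s/\n\z//;
        # Each match is either "text;" or a final run of non-';' chars.
        push @strings, $record =~ /[^;]*;|[^;]+/g;
    }
    return @strings;
}

my $text = "Would you; please hand me| my coat?\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "[$_]\n" for read_pipe_records($fh);
close $fh;
```

Unlike a line-by-line loop, this keeps a newline that falls inside a record as part of that record.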