Using awk (or sed) to remove newlines based on the first character of the next line
Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line; I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:
92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755
Ideally, I would want this output to look like:
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755
This may be harder to do, so I would settle for the output of that last "record" to only appear once with the additional "PK..." to be the 4th "field" of that line.
In the end, the simplest way I could think of doing this is: if a line starts with a comma ( ^, ), the newline before it should be removed... I'm not too familiar with awk, though, so if you could give me a start on this it would really be appreciated! Thanks!
Comments (5)
Translation: read the input in bulk, without splitting it into lines, and replace each newline followed by a comma with just a comma.
Shortest code here!
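The command this comment refers to is not shown here. As a rough sketch of the same idea in awk (not necessarily what the commenter wrote), the whole input can be collected into one string and every newline that sits directly before a comma dropped. The function name join_bulk is hypothetical, and the sketch assumes the file is small enough to hold in memory.

```shell
# Hypothetical helper: slurp the whole input, then turn every "\n," into ",".
join_bulk() {
  awk '
    { buf = buf $0 "\n" }       # collect every line, keeping its newline
    END {
      gsub(/\n,/, ",", buf)     # drop each newline that directly precedes a comma
      printf "%s", buf          # buf already ends with a newline
    }
  ' "$@"
}

# Sample data from the question
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
              ',5417178' ',PK91622' ',PK90755' | join_bulk
# prints:
# 92831,499,000,0644321
# 79217,999,000,5417178,PK91622,PK90755
```

This gives the "settle for" output from the question, with both PK values on one line.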
Well, guess I should have taken a closer look at using Records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested, here's how I did this:
In my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:
and it works like a charm, outputting exactly the way I wanted!
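The awk command itself is not shown in this comment. A minimal sketch of the blank-line-record approach it describes might look like the following; it is an illustration under assumptions, not the poster's actual command. Setting RS to the empty string puts awk in paragraph mode, where blank lines separate records, and FS="\n" makes each line of a record one field. The name join_paragraphs is hypothetical.

```shell
# Hypothetical helper: treat blank-line-separated blocks as records
# and glue the lines of each block into a single output line.
join_paragraphs() {
  awk '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block
    {
      line = $1
      for (i = 2; i <= NF; i++)    # each remaining field is one ",..." line
        line = line $i
      print line
    }
  ' "$@"
}

# Sample data with a blank line in front of each record
printf '%s\n' '' '92831,499,000' ',0644321' '' '79217,999,000' \
              ',5417178' ',PK91622' ',PK90755' | join_paragraphs
# prints:
# 92831,499,000,0644321
# 79217,999,000,5417178,PK91622,PK90755
```

Leading blank lines are ignored in paragraph mode, so the extra newline in front of the first record does no harm.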
Without special-casing field 3, easy.
With, more complex but still not too hard.
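Neither command is shown in this comment. The two variants could be sketched in awk roughly as below, as a guess at the intent rather than the commenter's code. join_simple glues every comma-led line onto the record before it. join_expand additionally repeats the record once per extra value, as in the question's ideal output. Which lines count as "extra" is an assumption: here, any continuation line starting with ",PK". Both function names are hypothetical.

```shell
# Variant 1 (hypothetical): no special-casing, just join continuation lines.
join_simple() {
  awk '
    /^,/ { rec = rec $0; next }            # continuation: glue onto current record
    { if (NR > 1) print rec; rec = $0 }    # new record: flush the previous one
    END { if (NR > 0) print rec }          # flush the last record
  ' "$@"
}

# Variant 2 (hypothetical): emit one output line per ",PK..." continuation.
join_expand() {
  awk '
    function emit(    i) {                 # i is a local variable
      if (!have) return
      if (n == 0) print base               # no ",PK" lines: print the record once
      for (i = 1; i <= n; i++) print base alt[i]
    }
    /^,PK/ { alt[++n] = $0; next }         # assumed marker of a repeatable value
    /^,/   { base = base $0; next }        # ordinary continuation line
    { emit(); base = $0; n = 0; have = 1 } # a new record starts here
    END { emit() }
  ' "$@"
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
              ',5417178' ',PK91622' ',PK90755' | join_expand
# prints:
# 92831,499,000,0644321
# 79217,999,000,5417178,PK91622
# 79217,999,000,5417178,PK90755
```

Both variants read the input one line at a time, so they do not need the whole file in memory.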
This might work for you:
Explanation:
This comes in two parts:
First, append the next line. If the appended line begins with a comma (,), delete the embedded newline (\n) and start again. If not, print up to the newline and then delete up to the newline. Repeat.
Second, replace the 5th comma (,) with a newline. Then insert the first four fields between the embedded newline and the sixth field.
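The first part of this explanation maps onto the standard sed commands N, s, t, P and D. The sketch below is a reconstruction of that first part only, under the assumption that it is what the explanation describes; it is not the answer's original command, and the second part (splitting out the extra PK value) is not attempted. The name join_with_sed is hypothetical.

```shell
# Hypothetical helper: join each comma-led line onto the line before it.
join_with_sed() {
  # :a          label to loop back to
  # $!N         unless on the last line, append the next line
  # s/\n,/,/    if the appended line starts with a comma, drop the newline
  # ta          and start again if that substitution succeeded
  # P           otherwise print up to the first newline
  # D           delete up to that newline and repeat with what is left
  sed -e ':a' -e '$!N' -e 's/\n,/,/' -e 'ta' -e 'P' -e 'D' "$@"
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
              ',5417178' ',PK91622' ',PK90755' | join_with_sed
# prints:
# 92831,499,000,0644321
# 79217,999,000,5417178,PK91622,PK90755
```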