Swap keys and array value pairs
I have a text file laid out like this:
1 a, b, c
2 c, b, c
2.5 a, c
I would like to reverse the keys (the number) and values (CSV) (they are separated by a tab character) to produce this:
a 1, 2.5
b 1, 2
c 1, 2, 2.5
(Notice how 2 isn't duplicated for c.)
I do not need this exact output. The numbers in the input are ordered, while the values are not. The output's keys must be ordered, as well as the values.
How can I do this? I have access to standard shell utilities (awk, sed, grep...) and GCC. I can probably grab a compiler/interpreter for other languages if needed.
If you have Python (on Linux you probably already do), I'd use a short Python script to do this. Note that we use sets to filter out duplicate items.
Edited to be closer to the requester's requirements:
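A minimal sketch of the set-based approach described here, not the answerer's original script. It assumes the key and its values are separated by a tab, that values are separated by commas, and that the keys are numeric; the function name `invert` is made up for illustration.

```python
from collections import defaultdict


def invert(lines):
    """Map each value to the sorted list of keys it appears under."""
    keys_by_value = defaultdict(set)  # a set drops repeats such as "2 -> c, b, c"
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, _, values = line.partition("\t")
        for value in values.split(","):
            value = value.strip()
            if value:
                keys_by_value[value].add(key.strip())
    # Letters are sorted alphabetically, keys numerically (assumes numeric keys).
    return {
        value: sorted(keys, key=float)
        for value, keys in sorted(keys_by_value.items())
    }


sample = ["1\ta, b, c", "2\tc, b, c", "2.5\ta, c"]
for value, keys in invert(sample).items():
    print(f"{value}\t{', '.join(keys)}")
```

To run it on a real file, pass an open file object instead of the sample list, for example `invert(open("input.txt"))`, where the file name is only a placeholder.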
I would try Perl if that's available to you. Loop through the input a row at a time. Split the line on the tab, then split the right-hand part on the commas. Shove the values into an associative array with the letters as the keys and another associative array as each value. The second associative array plays the part of a set, so as to eliminate duplicates.
Once you have read the input file, sort on the keys of the outer associative array, loop through it, and spit out the results.
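The same steps can be sketched in Python instead of Perl (a swap made purely for illustration). Nested dicts stand in for the nested associative arrays, with the inner dict used only as a set. Python dicts keep insertion order, and the question says the input numbers are already ordered, so the inner keys come out in order without a second sort; that relies on the input being ordered and is not checked by the code. The name `invert_nested` is hypothetical.

```python
def invert_nested(lines):
    """Build {letter: {number: True}}; the inner dict acts as a set."""
    table = {}
    for line in lines:
        number, sep, rest = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab on this line, nothing to split
        for letter in rest.split(","):
            letter = letter.strip()
            if letter:
                # Assigning the same inner key twice is harmless, so duplicates vanish.
                table.setdefault(letter, {})[number.strip()] = True
    return table


rows = ["1\ta, b, c", "2\tc, b, c", "2.5\ta, c"]
table = invert_nested(rows)
for letter in sorted(table):  # sort on the outer keys, then print each row
    print(letter + "\t" + ", ".join(table[letter]))
```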
Here's a small utility in PHP:
Not really optimized or good-looking, but it works...
Here is an example using CPAN's Text::CSV module rather than manual parsing of CSV fields:
Note that it will print to standard output. I recommend just redirecting standard output, and if you expand this program at all, make sure to use warn() to print any errors, rather than just print()ing them. Also, it won't check for duplicate entries, but I don't want to make my code look like Brad Gilbert's, which looks a bit wack even to a Perlite.
Here's an awk(1) and sort(1) answer:
Your data is basically a many-to-many data set so the first step is to normalise the data with one key and value per line. We'll also swap the keys and values to indicate the new primary field, but this isn't strictly necessary as the parts lower down do not depend on order. We use a tab or [spaces],[spaces] as the field separator so we split on the tab between the key and values, and between the values. This will leave spaces embedded in the values, but trim them from before and after:
Then we want to apply your sort order and eliminate duplicates. We use a bash feature to specify a tab char as the separator (-t $'\t'). If you are using Bourne/POSIX shell, you will need to use '[tab]', where [tab] is a literal tab:
Then, put it back in the form you want:
Pipe them all together and you should get your desired output. I tested with the GNU tools.
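As a rough illustration of the same three stages (normalise, sort and deduplicate, regroup), here is a sketch in Python rather than awk(1) and sort(1). The function names are made up for this example, and numeric keys are assumed so that 2.5 sorts after 2.

```python
import re
from itertools import groupby


def normalise(lines):
    """Stage 1: yield one (value, key) pair per value, already swapped."""
    for line in lines:
        # Split on a tab, or on a comma with optional surrounding spaces.
        fields = re.split(r"\t|\s*,\s*", line.strip())
        key, values = fields[0], fields[1:]
        for value in values:
            if value:
                yield value, key


def sort_unique(pairs):
    """Stage 2: drop duplicate pairs, order by value, then numerically by key."""
    return sorted(set(pairs), key=lambda pair: (pair[0], float(pair[1])))


def regroup(pairs):
    """Stage 3: collapse consecutive pairs that share a value into one row."""
    for value, group in groupby(pairs, key=lambda pair: pair[0]):
        yield value + "\t" + ", ".join(key for _, key in group)


rows = ["1\ta, b, c", "2\tc, b, c", "2.5\ta, c"]
for row in regroup(sort_unique(normalise(rows))):
    print(row)
```

Each stage only consumes the output of the previous one, which mirrors piping the separate commands together.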