Command-line file parsing tools in Cygwin

Posted on 2024-07-05 14:30:33

I have to deal with text files in a motley selection of formats. Here's an example (columns A and B are tab-delimited):

A   B
a   Name1=Val1, Name2=Val2, Name3=Val3
b   Name1=Val4, Name3=Val5
c   Name1=Val6, Name2=Val7, Name3=Val8

The files may or may not have headers, may mix delimiting schemes, may have columns of name/value pairs as above, and so on.
I often have an ad hoc need to extract data from such files in various ways. For example, from the above data I might want the value associated with Name2 where it is present, i.e.

A   B
a   Val2
c   Val7

What tools or techniques are there for performing such manipulations as one-line commands, using the above as an example but extensible to other cases?


甜宝宝 2024-07-12 14:30:33

You can use all the basic command-line tools available from a bash shell, such as grep, cut, sed, and awk. You can also use Perl or Ruby for more complex things.
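As a rough illustration of how far the basic tools go on the question's sample, here is a minimal sketch. The `sample` function is a hypothetical stand-in for a real file, and `grep -o` is a GNU/BSD option rather than a POSIX one (Cygwin's grep supports it).

```shell
# Hypothetical stand-in for the question's file: two tab-delimited columns,
# the second holding comma-separated Name=Value pairs.
sample() {
    printf 'A\tB\n'
    printf 'a\tName1=Val1, Name2=Val2, Name3=Val3\n'
    printf 'b\tName1=Val4, Name3=Val5\n'
    printf 'c\tName1=Val6, Name2=Val7, Name3=Val8\n'
}

# Column A of every row that mentions Name2 (cut splits on tabs by default).
sample | grep 'Name2=' | cut -f1

# Only the Name2 values: -o prints just the matched text, cut drops the name.
sample | grep -o 'Name2=[^, ]*' | cut -d= -f2
```

Each pipeline recovers one half of the desired output (the row keys, or the values). Pairing them on the same line is where awk or sed becomes more convenient.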


旧时模样 2024-07-12 14:30:33

From what I can tell, I would start with Awk for this sort of thing, then move to Python if you need something more complex.
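A minimal awk sketch for the question's example might look like the following. The function name `extract_value`, its optional header flag, and the `sample` data function are assumptions made for illustration. It also assumes the pairs in column B are separated by a comma and optional spaces.

```shell
# extract_value KEY [HEADER]: read tab-delimited rows on stdin and print
# column 1 plus the value paired with KEY in column 2, for rows that have it.
# HEADER defaults to 1 (first row is a header); pass 0 for headerless files.
extract_value() {
    awk -F'\t' -v key="$1" -v header="${2:-1}" '
        NR == 1 && header { print; next }    # pass the header through unchanged
        {
            n = split($2, pairs, /, */)      # column 2 -> "Name=Value" items
            for (i = 1; i <= n; i++) {
                eq = index(pairs[i], "=")    # position of the first "="
                if (eq && substr(pairs[i], 1, eq - 1) == key)
                    print $1 "\t" substr(pairs[i], eq + 1)
            }
        }
    '
}

# Hypothetical stand-in for the question's file.
sample() {
    printf 'A\tB\n'
    printf 'a\tName1=Val1, Name2=Val2, Name3=Val3\n'
    printf 'b\tName1=Val4, Name3=Val5\n'
    printf 'c\tName1=Val6, Name2=Val7, Name3=Val8\n'
}

sample | extract_value Name2
```

With a real file the call would be `extract_value Name2 < file`. Comparing the name exactly, rather than pattern-matching on `Name2=`, avoids accidental hits on longer names that merely end in `Name2`.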


浅语花开 2024-07-12 14:30:33

I would use sed:

   # print section of file between two regular expressions (inclusive)
   sed -n '/Iowa/,/Montana/p'             # case sensitive


难以启齿的温柔 2024-07-12 14:30:33

Since you have Cygwin, I'd go with Perl. It's the easiest to learn (check out the O'Reilly book Learning Perl) and is broadly applicable.


很酷又爱笑 2024-07-12 14:30:33

I would use Perl. Write a small module (or more than one) for dealing with the different formats. You could then run Perl one-liners using that library. An example of what it might look like (Parser here stands for the module you would write):

perl -MParser -e 'parser("in.input")->get("Name2");'

Don't quote me on the syntax, but that's the general idea. Abstract the task at hand so that you can think in terms of what you need to do, not how to do it. Ruby would be another option; it tends to have a cleaner syntax, but either language would work.
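The same split between parsing and querying can be sketched with plain shell functions instead of a Perl module. This is only one possible shape, not the approach described above. The helper names `kv_to_long` and `kv_get` and the `sample` data function are hypothetical. The idea is to write one small normalizer per input format, each emitting the same tab-separated `id`, `name`, `value` records, so that every query afterwards looks the same.

```shell
# kv_to_long: normalize "id<TAB>Name=Val, Name=Val" rows on stdin into one
# "id<TAB>name<TAB>value" record per pair. Rows whose second column has no
# "=" (such as a header row) produce no output.
kv_to_long() {
    awk -F'\t' '{
        n = split($2, pairs, /, */)
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")
            if (eq)
                print $1 "\t" substr(pairs[i], 1, eq - 1) "\t" substr(pairs[i], eq + 1)
        }
    }'
}

# kv_get NAME: query normalized records for one name, printing id and value.
kv_get() {
    awk -F'\t' -v name="$1" '$2 == name { print $1 "\t" $3 }'
}

# Hypothetical stand-in for the question's file.
sample() {
    printf 'A\tB\n'
    printf 'a\tName1=Val1, Name2=Val2, Name3=Val3\n'
    printf 'b\tName1=Val4, Name3=Val5\n'
    printf 'c\tName1=Val6, Name2=Val7, Name3=Val8\n'
}

sample | kv_to_long | kv_get Name2
```

Kept in a file and sourced from the shell, these turn the ad hoc query into a one-liner such as `kv_to_long < file | kv_get Name2`. A file in a different layout only needs its own normalizer.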


小梨窩很甜 2024-07-12 14:30:33

I don't like sed too much, but it works for such things:

var="Name2";sed -n "1p;s/\([^ ]*\) .*$var=\([^ ,]*\).*/\1 \2/p" < filename

Gives you:

 A B
 a Val2
 c Val7

As written, the pattern expects a space after the first column; for tab-delimited input, use \([^[:space:]]*\)[[:space:]] in place of \([^ ]*\) followed by a space.


About the author: 我做我的改变
