Flat file data analysis

Posted 2024-11-05 11:57:50

I have a flat file that consists of the following structure:

A1 B1 C1 D1 E1 F1 G1  
A2 B2 C2 D2 E2 F2 G2  
A3 B3 C3 D3 E3 F3 G3

This file has around 1 million rows.

I would like to generate the following statistics:

Number of rows in the file.
Number of unique values in a particular column (e.g. B).
Sort by column F and create a file containing the top n records.

What would be the best way to do this analysis? I'm currently using Mac OS X, so a Linux/Mac solution would be preferred.

轻许诺言 2024-11-12 11:57:50

Pretty easy to do from the command line on your Mac (bash or zsh).

Something like the following, where filename and N are placeholders for your file and the number of rows you want:

# 1. row count
wc -l filename

# 2. count of unique values in column 1 (use -f 2 for column B)
cut -d " " -f 1 filename | sort -u | wc -l

# 3. top N distinct values in column 6 (F), with their counts
cut -d " " -f 6 filename | sort | uniq -c | sort -nr | head -n N
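
The third command reports distinct values of column 6 and how often each occurs, not whole records. If the goal is what the question literally asks for (sort by column F and write the top n records to a file), a minimal sketch could look like the one below. It assumes that column F holds numeric values, that fields are separated by single spaces, and that "top" means largest first. The function name top_n_by_column and its argument order are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: top_n_by_column FILE COLUMN N OUTFILE
# Sorts FILE on one space-separated column (numeric, descending)
# and writes the first N complete records to OUTFILE.
# Assumption: the chosen column is numeric; drop the "n" in the
# sort key below for a plain lexical sort instead.
top_n_by_column() {
    file=$1
    col=$2
    n=$3
    out=$4
    # -t ' ' sets the field separator; -k "C,Cnr" restricts the key
    # to column C and sorts it numerically in reverse order.
    sort -t ' ' -k "${col},${col}nr" "$file" | head -n "$n" > "$out"
}
```

For example, top_n_by_column filename 6 100 top100.txt would write the 100 records with the largest values in column F to top100.txt. Sorting a file of about a million rows this way is well within what sort handles comfortably.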
