Recommendations for text-processing software

Posted on 2024-09-27 11:45:27

I need to process text files to extract relevant information for later input into R for statistical analysis. The text file content typically looks like the example extract shown below. Can the board recommend software or a programming language that I should use for this purpose? The critical requirements for the software are:

  • ease/clarity of programming syntax to extract the relevant information from each line (note: not all lines will contain relevant information)
  • free/open source
  • can run on both Linux and Windows systems
  • ability to loop through many, many separate text files contained in a folder/directory but output to just one single (csv/text) file

EXAMPLE

Full Tilt Poker Game #19911608402: Table Buggy - $0.01/$0.02 - No Limit Hold'em - 4:05:58 ET - 2010/04/08
Seat 2: BAD BeAts02 ($1.74)
Seat 3: VIVIVIVIV ($1.20)
Seat 4: pipelis ($2.87), is sitting out
Seat 5: trichinosis ($2.54)
Seat 6: Syrenski ($2)
Seat 9: evil-bunny1 ($1.20)
BAD BeAts02 posts the small blind of $0.01
VIVIVIVIV posts the big blind of $0.02
handrici sits down
pipelis stands up
Syrenski posts $0.02
The button is in seat #9
*** HOLE CARDS ***
Dealt to Syrenski [6d 3s]
handrici adds $2
trichinosis calls $0.02
Syrenski checks
pkmyers sits down
evil-bunny1 folds
BAD BeAts02 raises to $0.08
VIVIVIVIV folds
VIVIVIVIV adds $0.02
pkmyers adds $1.34
trichinosis calls $0.06
Syrenski folds
*** FLOP *** [Js 5s 8s]
pipelis sits down
BAD BeAts02 has 15 seconds left to act
BAD BeAts02 bets $0.18
AntHraX85 sits down
pipelis stands up
trichinosis folds
Uncalled bet of $0.18 returned to BAD BeAts02
BAD BeAts02 mucks
AntHraX85 adds $2
BAD BeAts02 wins the pot ($0.19)
*** SUMMARY ***
Total pot $0.20 | Rake $0.01
Board: [Js 5s 8s]
Seat 2: BAD BeAts02 (small blind) collected ($0.19), mucked
Seat 3: VIVIVIVIV (big blind) folded before the Flop
Seat 4: pipelis is sitting out
Seat 5: trichinosis folded on the Flop
Seat 6: Syrenski folded before the Flop
Seat 9: evil-bunny1 (button) didn't bet (folded)


Comments (4)

飘过的浮云 2024-10-04 11:45:27

Coincidentally, I have tinkered with parsing hand history files as well :)
I think the best candidates are Python and Perl. They are both cross-platform and open source.
Conceptually, the program design is straightforward: it simply involves iterating over the input line by line and applying various regular expressions to extract information.
You could do that in almost any programming language. (You might even be able to do that in pure R, who knows?)
However, I would cast my vote for Perl, since it is renowned as a superb language, especially for processing plain text files.
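
To make the "iterate over lines and apply regular expressions" idea concrete, here is a minimal Python sketch. It is not a complete hand-history parser: the patterns are assumptions derived only from the sample hand in the question, and the function name parse_hand and the choice of fields to extract are illustrative.

```python
import re

# Patterns are guesses based only on the sample hand shown in the question.
HEADER_RE = re.compile(r"^Full Tilt Poker Game #(\d+):")
SEAT_RE = re.compile(r"^Seat (\d+): (.+?) \(\$([\d.]+)\)(?:, is sitting out)?$")
ACTION_RE = re.compile(r"^(.+?) (calls|bets|raises to) \$([\d.]+)$")


def parse_hand(lines):
    """Return (game_id, seats, actions) extracted from the lines of one hand."""
    game_id = None
    seats = []    # (seat number, player, starting stack)
    actions = []  # (player, verb, amount)
    for raw in lines:
        line = raw.strip()
        if line == "*** SUMMARY ***":
            break  # the summary repeats seat lines in a different layout
        m = HEADER_RE.match(line)
        if m:
            game_id = m.group(1)
            continue
        m = SEAT_RE.match(line)
        if m:
            seats.append((int(m.group(1)), m.group(2), float(m.group(3))))
            continue
        m = ACTION_RE.match(line)
        if m:
            actions.append((m.group(1), m.group(2), float(m.group(3))))
        # Any other line (e.g. "pipelis stands up") is simply ignored.
    return game_id, seats, actions
```

Each line is tested against the patterns in turn, and lines that match none of them are skipped, which covers the requirement that not all lines carry relevant information. The extracted tuples could then be written out as rows of a CSV file for R.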

同展鸳鸯锦 2024-10-04 11:45:27

Have a look at 'grep' (Try Wikipedia).

It can be used in PHP:
http://www.php.net/manual/en/function.preg-grep.php

There are desktop text editors that will do grep too. Some of them are free - e.g. TextWrangler (Mac)

白馒头 2024-10-04 11:45:27

I made a language specifically for this sort of thing, at least initially: http://www.nongnu.org/txr

朮生 2024-10-04 11:45:27

This question has been open for a while, but I will post a code snippet here anyway. grep is available on Linux but is not part of a standard Windows installation. Perl runs on both platforms. Most Linux distributions come with Perl pre-installed; on Windows you will need to install Perl yourself.

Assuming that each line you want to extract will contain the name of the player (let's use Syrenski), you can do the following:

perl -n -e'print if m{Syrenski}' directory/* >output.txt

-n loops over all lines in the input but does not print them

print if m{Syrenski} says print the line if it contains the string 'Syrenski'

directory/* says process all files under directory

>output.txt says to print the output to the file output.txt
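
For comparison, the same filter can be sketched in Python, the other language suggested in this thread, without relying on the shell to expand directory/*. This is only an illustration: the function name collect_matching_lines, the CSV column names, and the UTF-8 encoding are assumptions, not anything from the original post.

```python
import csv
from pathlib import Path


def collect_matching_lines(directory, needle, output_path):
    """Write every line containing `needle` from the files in `directory` to one CSV.

    Returns the number of matching lines written.
    """
    output_path = Path(output_path)
    count = 0
    with output_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "line_number", "text"])  # assumed column names
        for path in sorted(Path(directory).iterdir()):
            # Skip sub-directories and the output file itself.
            if not path.is_file() or path.resolve() == output_path.resolve():
                continue
            with path.open(encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        writer.writerow([path.name, number, line.rstrip("\n")])
                        count += 1
    return count
```

Calling collect_matching_lines("directory", "Syrenski", "output.csv") would gather the matching lines from every file into a single CSV file, and recording the source file name and line number keeps each row traceable once the data is loaded into R.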
