How to count the number of occurrences of a string in a file using the Windows command line?
I have a huge file with e-mail addresses, and I would like to count how many of them are in this file. How can I do that using the Windows command line?
I have tried this, but it just prints the matching lines. (BTW: all e-mails are contained in one line.)
findstr /c:"@" mail.txt
Using what you have, you could pipe the results through a find. I've seen something like this used from time to time. So you are counting the lines resulting from your findstr command that do not contain the garbage string. Kind of a hack, but it could work for you. Alternatively, just use find /c on the string you do care about. Lastly, you mentioned one address per line; in that case the above works, but with multiple addresses per line it breaks.
Why not simply use this (it determines the number of lines containing at least one @ character):
Example output:
To avoid the file name in the output, change it to this:
Example output:
To capture the resulting number and store it in a variable, use this (change %N to %%N in a batch file):
Using grep for Windows
Very simple solution:
Remember the dot at the end of the line!
Here is a slightly more understandable way:
The first grep selects only the "@" strings and puts each one on a new line.
The second grep counts the lines (i.e. the lines with @).
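The two grep steps described above can be sketched as follows. This assumes a GNU-compatible grep that supports -o, and mail.txt with its made-up addresses is a hypothetical sample file:

```shell
#!/bin/sh
# Hypothetical sample file: all addresses on ONE line, as in the question.
printf 'alice@example.com bob@example.org carol@example.net\n' > mail.txt

# -o prints each match of "@" on its own line;
# "grep -c ." then counts the non-empty lines, i.e. the number of matches.
grep -o "@" mail.txt | grep -c .    # prints 3

# For contrast: -c on its own counts matching LINES, not matches.
grep -c "@" mail.txt                # prints 1
```

The trailing . is presumably the dot the answer warns about: it is a regular expression matching any character, so grep -c . counts the non-empty lines produced by the first grep.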
The grep utility can easily be installed from the Grep for Windows page. It is a very small and safe text filter. grep is one of the most useful Unix/Linux commands, and I use it daily on both Linux and Windows.
The Windows findstr is good, but it lacks grep's features, such as printing only the matched part of a line.
Installing grep on Windows is one of the best decisions you can make if you like the CLI or batch scripts.
Download and Installation
Download the grep.exe file to the C:\Windows directory, or to another directory from the system path list (shown by the command echo %PATH%). That is all.
Test if grep is working:
grep --help
Uninstallation
Delete the grep.exe file from the folder where you placed it.
Maybe it's a little late, but the following script worked for me (the source file contained quote characters, which is why I used the usebackq parameter).
The caret (^) acts as the escape character in the Windows batch scripting language.
I found this on the net. See if it works:
I would install the Unix tools on your system (handy in any case :-); then it's really simple. Look, e.g., here:
Count the number of occurrences of a string using sed?
(Using awk:
).
You can get the Windows unix tools here:
http://unxutils.sourceforge.net/
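A sketch of the awk route, not necessarily the exact one-liner the answer had in mind, assuming any POSIX awk and a hypothetical mail.txt. Splitting a line on "@" turns k occurrences into k+1 fields, so summing NF-1 gives the total number of occurrences, even when all addresses sit on one line:

```shell
#!/bin/sh
# Hypothetical sample data: four "@" in total, plus an empty line.
printf 'a@x.com b@y.com\n\nc@z.com d@w.com\n' > mail.txt

# With "@" as the field separator, a line holding k "@" has k+1 fields,
# so add NF-1 per line. Empty lines have NF == 0 and are skipped,
# otherwise they would subtract 1 from the total.
awk -F'@' 'NF > 0 { n += NF - 1 } END { print n + 0 }' mail.txt   # prints 4
```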
OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.
Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:
Explanation: (assuming the file with the data is named "Datafile.txt")
1) The 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) FIND to strip off the unwanted header info.
2) Pipe the above results to SED, which will search for each "@" character and replace it with itself + "\n" (a new line), which puts each "@" on its own line in the output stream...
3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that sets the variable equal to that line's number (so the value grows with each line).
4) Isolate each line's numeric part and preface the isolated number as described above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"
In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/", to do these steps:
a) look for a "[" (which must be "escaped" by prefacing it with "\")
b) begin saving (or "tokenizing") what follows, up to the closing "]"
c) the stuff between the \( and the \) is "tokenized", which means it can be referred to later, in the "WhatToReplaceItWith" section. The first tokenized item is referred to via "\1", the second via "\2", etc. So... we're ignoring the [ and the ], saving the number that lies between the brackets, and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string:
Set /a NumFound=
+ the saved, or "tokenized" number, i.e....the first line will read:
Set /a NumFound=1
...& the next line reads:
Set /a NumFound=2
etc. Thus, if you have 1,283 email addresses, your results will have 1,283 lines.
The last one executed = the one that matters.
If you use the ">" character to redirect all of the above output to a batch file, i.e.:
> CountChars.bat
...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
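The same idea (split at every "@", number the pieces, keep the last number) can be tried with the Unix counterparts of these tools. A sketch under stated assumptions: GNU sed (which understands \n in the replacement), grep standing in for FIND, grep -n '' numbering the lines much as FIND /n does, and a hypothetical Datafile.txt. The output is written as plain NumFound=N lines rather than a batch file:

```shell
#!/bin/sh
# Hypothetical sample data: all addresses on one line, as in the question.
printf 'alice@example.com bob@example.org carol@example.net\n' > Datafile.txt

# - put a newline after every "@" (GNU sed syntax);
# - keep only the pieces that contain an "@";
# - number them 1..N ("N:" prefix instead of FIND /n's "[N]");
# - rewrite each numbered line as "NumFound=N".
sed 's/@/@\n/g' Datafile.txt \
  | grep '@' \
  | grep -n '' \
  | sed 's/^\([0-9]*\):.*/NumFound=\1/' > counts.txt

# As in the batch version, the last line is the one that matters.
tail -n 1 counts.txt    # prints NumFound=3
```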
This is how I do it, using an AND condition with FINDSTR (to count the number of errors in a log file):
NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".
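For comparison, the same kind of AND condition can be expressed with grep (one of the Unix tools mentioned in other answers) by chaining two filters. The file name app.log and the words ERROR and disk are hypothetical. As the note says, this counts matching lines, not occurrences:

```shell
#!/bin/sh
# Hypothetical log file.
printf '%s\n' \
  'ERROR disk full' \
  'INFO disk ok' \
  'ERROR network down' \
  'ERROR disk full, ERROR disk full' > app.log

# Keep the lines containing ERROR, then count those that also contain disk.
# The last line counts once even though it holds two matches.
grep 'ERROR' app.log | grep -c 'disk'    # prints 2
```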
Use this: