Reading a .txt file and analysing it
I'm working on Huffman coding of any .txt file, so first I need to analyse the text file. I need to read it, then analyse it.
I need output like this table:
letter | frequency (how many times the same letter is repeated) | Huffman code (this will come later)
I started with:
    f = open('test.txt', 'r')  # open test.txt
    for lines in f:
        print lines  # to check that everything works
How can I read the characters from the file in alphabetical order?
    with open("test.txt") as f_in:
        for line in f_in:
            for char in line:
                frequencies[char] += 1
Many thanks.
Well, I tried it like this:
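    import collections
    import operator

    frequencies = collections.defaultdict(int)
    with open("test.txt") as f_in:
        for line in f_in:
            for char in line:
                frequencies[char] += 1
    # iteritems() is Python 2 only; use items() on Python 3
    frequencies = [(count, char) for char, count in frequencies.iteritems()]
    # sort the (count, char) pairs by character
    frequencies.sort(key=operator.itemgetter(1))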
But the interpreter gives me an error.
I need the alphabetical order inside the for loop, not applied to frequencies at the end...
To get your table of frequencies, I would use a `defaultdict`. This will only iterate over the data once.
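As a minimal sketch of the `defaultdict` approach in Python 3: the counting is a single pass over the data, and the alphabetical order is applied afterwards by sorting the keys, since the order in which characters are read from the file is fixed by the file itself. The helper names `char_frequencies` and `frequency_table` and the sample lines are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def char_frequencies(lines):
    """Count every character in an iterable of lines in a single pass."""
    frequencies = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for char in line:
            frequencies[char] += 1
    return frequencies


def frequency_table(frequencies):
    """Return (char, count) pairs sorted alphabetically by character."""
    return sorted(frequencies.items())


# Stand-in for the file contents (hypothetical sample data).
sample_lines = ["banana\n", "abc\n"]
freqs = char_frequencies(sample_lines)
for char, count in frequency_table(freqs):
    # repr() makes whitespace such as the newline visible in the table
    print(repr(char), "|", count)
```

Iterating over an open file yields its lines, so the same function should work on the real file with `with open("test.txt") as f_in: freqs = char_frequencies(f_in)`.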
I made this solution using a `collections.Counter()`:
The regular expression `is_letter` is used to filter for only the characters we are interested in.
It gives output that looks like this.
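The answer's code and output are not reproduced here. A rough sketch of the approach it describes might look like the following, in Python 3. The pattern assigned to `is_letter` (a single ASCII letter), the helper name `letter_counts`, and the sample line are all assumptions; the original regular expression may well have been different.

```python
import re
from collections import Counter

# Assumed pattern: one ASCII letter. The original regex is not shown.
is_letter = re.compile(r"[A-Za-z]")


def letter_counts(lines):
    """Count only the characters that match is_letter."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if is_letter.match(ch))
    return counts


# Stand-in for the file contents (hypothetical sample data).
sample_lines = ["Hello, World!\n"]
counts = letter_counts(sample_lines)
for letter in sorted(counts):  # uppercase letters sort before lowercase
    print(letter, "|", counts[letter])
```

Punctuation, spaces and newlines are skipped by the filter, so only letters end up in the table. For ordering by frequency rather than alphabetically, `counts.most_common()` returns the pairs from most to least frequent.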