Reading a .txt file and analysing it

Posted 2024-10-06 05:43:18

I'm working on Huffman coding for arbitrary .txt files, so first I need to analyse the text file. I need to read it, then analyse it.
I need output like this table:


letter | frequency (how many times the same letter occurs) | Huffman code (this will come later)


I started with:

f = open('test.txt', 'r')    # open test.txt
for lines in f:
    print(lines)             # to check that everything works...

How can I read the characters from the file and get them in alphabetical order?

with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1

Many thanks.


Well, I tried it like this:
frequencies = collections.defaultdict(int)
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1


 frequencies = [(count, char) for char, count in frequencies.iteritems()]
 frequencies.sort(key=operator.itemgetter(1))

But Python returns an error.

I need the alphabetical order inside the for loop, not applied to frequencies at the end.

Comments (3)

红墙和绿瓦 2024-10-13 05:43:18

To get your table of frequencies, I would use a defaultdict. This will only iterate over the data once.

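import collections
import operator

# Count every character in a single pass over the file
frequencies = collections.defaultdict(int)
with open(filename) as f_in:    # filename: path to the text file
    for line in f_in:
        for char in line:
            frequencies[char] += 1

# Turn the dict into (count, char) pairs and sort them by character
frequencies = [(count, char) for char, count in frequencies.items()]
frequencies.sort(key=operator.itemgetter(1))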
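
For readers on Python 3, the same counting idea can be sketched as below. The helper names `char_frequencies` and `frequency_table` are illustrative and do not come from the answer above, and the sample lines stand in for a real file. The alphabetical ordering is applied when the table is built, because the counts are only final once the whole input has been read.

```python
import collections


def char_frequencies(lines):
    """Count how often each character occurs in an iterable of text lines."""
    frequencies = collections.defaultdict(int)
    for line in lines:
        for char in line:
            frequencies[char] += 1
    return frequencies


def frequency_table(frequencies):
    """Return (char, count) pairs sorted alphabetically by character."""
    return sorted(frequencies.items())


# In-memory sample (an assumption for illustration); a file object returned
# by open() is also an iterable of lines and can be passed the same way.
sample = ["abba\n", "cab\n"]
for char, count in frequency_table(char_frequencies(sample)):
    print(repr(char), "|", count)
```

Note that the newline character is counted too, which is usually what a Huffman coder for whole files needs.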

南城追梦 2024-10-13 05:43:18

with open('test.txt') as f: data = f.read()
table = dict((c, data.count(c)) for c in set(data))

念三年u 2024-10-13 05:43:18

I made this solution using a collections.Counter():

import re
import collections


if __name__ == '__main__':
    is_letter = re.compile('[A-Za-z]')

    frequencies = collections.Counter()
    with open(r'text.txt') as f_in:
        for line in f_in:
            for char in line:
                if is_letter.match(char):
                    frequencies[char.lower()] += 1

    # Print the characters in alphabetical order with their counts
    characters = sorted(frequencies)
    for c in characters:
        print('%s | %d' % (c, frequencies[c]))

The regular expression is_letter is used to filter for only the characters we are interested in.
It gives output that looks like this.

a | 177
b | 29
c | 7
d | 167
e | 374
f | 58
g | 100
h | 44
i | 135
j | 21
k | 64
l | 125
m | 85
n | 191
o | 105
p | 34
r | 185
s | 130
t | 146
u | 34
v | 68
x | 1
y | 14
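
The question's table also has a Huffman code column that is left for later. One possible way to fill it from a frequency table like the one above is sketched here in Python 3 with the standard `heapq` module. The function name `huffman_codes` and the sample counts are assumptions made for illustration, not part of any answer in this thread.

```python
import heapq
import itertools


def huffman_codes(frequencies):
    """Map each symbol to a bit string, given a dict of symbol -> count."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single symbol still needs a non-empty code.
        return {symbol: "0" for symbol in frequencies}

    # The counter breaks ties so that heap entries with equal counts
    # never fall through to comparing the dicts.
    tie_breaker = itertools.count()
    heap = [(count, next(tie_breaker), {symbol: ""})
            for symbol, count in sorted(frequencies.items())]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next(tie_breaker), merged))

    return heap[0][2]


# Hypothetical counts, only to show the table layout from the question.
sample = {"a": 5, "b": 2, "c": 1}
codes = huffman_codes(sample)
for symbol in sorted(codes):
    print(symbol, "|", sample[symbol], "|", codes[symbol])
```

Each subtree is represented directly as a dict of partial codes, which keeps the sketch short at the cost of some copying; a node-based tree would scale better for large alphabets.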