How to count the number of items per user group

Posted on 2024-11-02 07:41:47

How can I output a result like this:

user    I   R   H
=================
atl001  2   1   0
cms017  1   2   1
lhc003  0   1   2

from a list like this:

atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R

That is, I want to count the number of I, H and R entries per user. Just a note that I can't use groupby from itertools in this particular case. Thanks in advance for your help. Cheers!!

烟织青萝梦 2024-11-09 07:41:47

data='''atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R'''

stats={}
for i in data.split('\n'):
    user, irh = i.split()
    u = stats.setdefault(user, {})
    u[irh] = u.setdefault(irh, 0) + 1

print 'user   I R H'  # lines up with 6-character user names and single-digit counts
for user in sorted(stats):
    stat = stats[user]
    print user, stat.get('I', 0), stat.get('R', 0), stat.get('H', 0)
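
On Python 3 the `print` statements above are syntax errors. Here is a minimal sketch of the same `setdefault` counting for Python 3. The `format_table` helper and its fixed column widths are assumptions, chosen so the output reproduces the layout in the question (six-character user names, small counts).

```python
# Python 3 sketch of the setdefault-based counting above.
# format_table is a hypothetical helper; the column widths are fixed to match the question.
DATA = """atl001 I
atl001 I
cms017 H
atl001 R
lhc003 H
cms017 R
cms017 I
lhc003 H
lhc003 R
cms017 R"""


def format_table(data):
    stats = {}
    for line in data.splitlines():
        user, irh = line.split()
        u = stats.setdefault(user, {})   # per-user dict: flag -> count
        u[irh] = u.get(irh, 0) + 1       # a flag seen for the first time starts at 0

    fmt = "%-7s%2s%4s%4s"                # user column, then I, R, H
    header = fmt % ("user", "I", "R", "H")
    rows = [header, "=" * len(header)]
    for user in sorted(stats):
        stat = stats[user]
        rows.append(fmt % (user, stat.get("I", 0), stat.get("R", 0), stat.get("H", 0)))
    return "\n".join(rows)


print(format_table(DATA))
```

If user names or counts get wider than the question's sample, the fixed widths in `fmt` will no longer line up. The next answer computes the widths from the data instead.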

北斗星光 2024-11-09 07:41:47

data = 112*'cms017 R\n'

data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'

stats = {}
d = {'I':0,'R':1,'H':2}
L = 0
for line in data.splitlines():
    user,irh = line.split()
    stats.setdefault(user,[0,0,0])
    stats[user][d[irh]] += 1
    L = max(L, len(user))

LL = len(str(max(max(stats[user])
                 for user in stats )))

cale = ' %%%ds %%%ds %%%ds' % (LL,LL,LL)
ch = 'user'.ljust(L) + cale % ('I','R','H')

print '%s\n%s' % (ch, len(ch)*'=')
print '\n'.join(user.ljust(L) + cale % tuple(stats[user])
                for user in sorted(stats.keys()))

result

user        I   R   H
=====================
atl001      2   1   0
cms017      0 117   1
lhcabc003   0   1   2

Also:

data = 14*'cms017 R\n'

data = data + '''atl001 I
cms017 R
atl001 I
cms017 H
atl001 R
lhcabc003 H
cms017 R
lhcabc003 H
lhcabc003 R
cms017 R
cms017 R
cms017 R'''
print data,'\n'

Y = {}
L = 0
for line in data.splitlines():
    user,irh = line.split()
    L = max(L, len(user))
    if (user,irh) not in Y:
        Y.update({(user,'I'):0,(user,'R'):0,(user,'H'):0})
    Y[(user,irh)] += 1

LL = len(str(max(x for x in Y.itervalues())))

cale = '%%-%ds %%%ds %%%ds %%%ds' % (L,LL,LL,LL)
ch = cale % ('user','I','R','H')

print '%s\n%s' % (ch, len(ch)*'=')
li = sorted(Y.keys())
print '\n'.join(cale % (a[0],Y[b],Y[c],Y[a])
                for a,b,c in (li[x:x+3] for x in xrange(0,len(li),3)))

result

user       I  R  H
==================
atl001     2  1  0
cms017     0 19  1
lhcabc003  0  1  2
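
The last line of the second variant depends on a sorting detail that is easy to miss. Tuple keys sort by user first and then by flag, and 'H' < 'I' < 'R'. As a result, every consecutive slice of three sorted keys belongs to one user and comes in H, I, R order, which is why `cale % (a[0],Y[b],Y[c],Y[a])` prints the columns as I, R, H. Here is a small Python 3 check of that reasoning, using a made-up three-line input:

```python
# Python 3 check of the sorted-triplet trick used in the second variant above.
Y = {}
for user, irh in [("cms017", "R"), ("atl001", "I"), ("atl001", "R")]:
    if (user, irh) not in Y:
        # first line for this user: create all three counters at once
        Y.update({(user, "I"): 0, (user, "R"): 0, (user, "H"): 0})
    Y[(user, irh)] += 1

li = sorted(Y)   # tuples sort by user, then by flag ("H" < "I" < "R")
rows = []
for a, b, c in (li[x:x + 3] for x in range(0, len(li), 3)):
    # a, b, c are this user's H, I and R keys, in that order
    rows.append((a[0], Y[b], Y[c], Y[a]))   # (user, I, R, H)
print(rows)
```

The trick only holds while I, R and H are the only flags. With any other flag, the `Y[(user,irh)] += 1` line raises KeyError, because only those three keys are ever created.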

PS:

The user names are all left-justified to a width of L characters.

In my code, to avoid the complexity of Sebastian's code, the I, R and H columns are all right-justified to the same width LL, which is the number of digits in the largest count in these columns.
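
The shared-width approach from the PS can be sketched in Python 3 with `str.ljust` and `str.rjust`. The per-user counts below are copied from the first result table above. They use the same [I, R, H] list layout as `stats` in the first variant.

```python
# Python 3 sketch: one user-column width L, one shared count width LL.
stats = {"atl001": [2, 1, 0], "cms017": [0, 117, 1], "lhcabc003": [0, 1, 2]}  # [I, R, H]

L = max(len(user) for user in stats)                          # widest user name
LL = len(str(max(max(counts) for counts in stats.values())))  # digits of the largest count

header = "user".ljust(L) + "".join(" " + name.rjust(LL) for name in "IRH")
lines = [header, "=" * len(header)]
for user in sorted(stats):
    lines.append(user.ljust(L) + "".join(" " + str(n).rjust(LL) for n in stats[user]))
print("\n".join(lines))
```

With a single LL, one large count (117 here) widens every count column. That is the simplicity-for-width trade-off the PS describes.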

淡莣 2024-11-09 07:41:47

Well, using groupby for this problem makes no sense anyway. For starters, your data isn't sorted (groupby only groups consecutive lines with the same key; it doesn't sort them for you), and the lines are very simple.
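
To illustrate the sorting point: `itertools.groupby` only merges consecutive items with equal keys, so an unsorted input splits a user into several groups. Here is a Python 3 illustration on the first five lines of the question's data. The `user_of` key function exists only for this example.

```python
from itertools import groupby

lines = ["atl001 I", "atl001 I", "cms017 H", "atl001 R", "lhc003 H"]


def user_of(line):
    return line.split()[0]


# groupby only merges *consecutive* lines that share a key...
unsorted_groups = [k for k, _ in groupby(lines, key=user_of)]
# ...so the input has to be sorted by that same key first
sorted_groups = [k for k, _ in groupby(sorted(lines, key=user_of), key=user_of)]
print(unsorted_groups)
print(sorted_groups)
```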

Just keep count as you process each line. I am assuming you don't know what flags you'll get:

from sets import Set as set # python2.3 compatibility
counts = {} # counts stored in user -> dict(flag=counter) nested dicts
flags = set()
# inputfile is any iterable of "user flag" lines, e.g. an open file or sys.stdin
for line in inputfile:
    user, flag = line.strip().split()
    usercounts = counts.setdefault(user, {})
    usercounts[flag] = usercounts.setdefault(flag, 0) + 1
    flags.add(flag)

Printing the info after that is a question of iterating over your counts structure. I am assuming usernames are always 6 characters long:

flags = list(flags)
flags.sort()
users = counts.keys()
users.sort()
# "user" plus 4 spaces = 6-character name + 2-space separator
print "user    %s" % ('  '.join(flags))
print "=" * (6 + 3 * len(flags))
for user in users:
    line = [user]
    for flag in flags:
        # join() only accepts strings, so convert each count
        line.append(str(counts[user].get(flag, 0)))
    print '  '.join(line)

All code above is untested, but should roughly work.
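
`collections.defaultdict` (Python 2.5+) and `collections.Counter` (Python 2.7+) remove the `setdefault` bookkeeping. Here is a Python 3 sketch of the same counting pass, with the flags discovered from the data. The `count_flags` name and the skipping of blank lines are assumptions, not part of the answer above.

```python
from collections import Counter, defaultdict


def count_flags(lines):
    """Return (counts, flags): counts[user][flag] -> int, flags = every flag seen."""
    counts = defaultdict(Counter)   # an unseen user gets an empty Counter automatically
    flags = set()
    for line in lines:
        if not line.strip():
            continue                # tolerate blank lines, e.g. a trailing newline
        user, flag = line.split()
        counts[user][flag] += 1
        flags.add(flag)
    return counts, flags


counts, flags = count_flags(["atl001 I", "atl001 I", "cms017 H", "atl001 R"])
print(dict(counts["atl001"]), sorted(flags))
```

Because a `Counter` returns 0 for a missing flag, the printing loop can use `counts[user][flag]` directly instead of `.get(flag, 0)`.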

你是年少的欢喜 2024-11-09 07:41:47

Here's a variant that uses nested dicts to count job statuses and computes max field widths before printing:

#!/usr/bin/env python
import fileinput
from sets import Set as set # python2.3

# parse job statuses
counter = {}
for line in fileinput.input():
    user, jobstatus = line.split()
    d = counter.setdefault(user, {})
    d[jobstatus] = d.setdefault(jobstatus, 0) + 1

# print job statuses
# . find field widths
status_names = set([name for st in counter.itervalues() for name in st])
maxstatuslens = [max([len(str(i)) for st in counter.itervalues()
                      for n, i in st.iteritems()
                      if name == n])
                 for name in status_names]
maxuserlen = max(map(len, counter))
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens]))
# . print header
header = row_format % (("user",) + tuple(status_names))
print header
print '='*len(header)
# . print rows
for user, statuses in counter.iteritems():
    print row_format % (
        (user,) + tuple([statuses.get(name, 0) for name in status_names]))

Example

$ python print-statuses.py <input.txt
user   I H R
============
lhc003 0 2 1
cms017 1 1 2
atl001 2 0 1
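
In the example above, rows come out in dict order and columns in set order. Where a fixed order is wanted, a Python 3 sketch of the same per-column width calculation could sort the users and take the column order as a parameter. The `render` name and the `columns` argument are assumptions. Unlike the script above, this version also widens a column when its header is longer than its counts. The counts are taken from the example output.

```python
def render(counter, columns):
    """Format {user: {status: count}} with one width per column."""
    user_w = max([len("user")] + [len(u) for u in counter])
    # a column is as wide as its header or its widest count, whichever is larger
    widths = [max([len(name)] + [len(str(st.get(name, 0))) for st in counter.values()])
              for name in columns]

    def row(first, cells):
        cols = [str(c).rjust(w) for c, w in zip(cells, widths)]
        return " ".join([first.ljust(user_w)] + cols)

    header = row("user", columns)
    out = [header, "=" * len(header)]
    for user in sorted(counter):   # sorted rows give a stable, predictable order
        out.append(row(user, [counter[user].get(name, 0) for name in columns]))
    return "\n".join(out)


counter = {"lhc003": {"H": 2, "R": 1},
           "cms017": {"H": 1, "R": 2, "I": 1},
           "atl001": {"I": 2, "R": 1}}
print(render(counter, ["I", "R", "H"]))
```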

Here's a variant that uses a flat dictionary with a (user, status_name) tuple as the key:

#!/usr/bin/env python
import fileinput
from sets import Set as set # python 2.3

# parse job statuses
counter = {}
maxstatuslens = {}
maxuserlen = 0
for line in fileinput.input():
    key = user, status_name = tuple(line.split())
    i = counter[key] = counter.setdefault(key, 0) + 1
    maxstatuslens[status_name] = max(maxstatuslens.setdefault(status_name, 0),
                                     len(str(i)))
    maxuserlen = max(maxuserlen, len(user))

# print job statuses
row_format = (("%%-%ds " % maxuserlen) +
              " ".join(["%%%ds" % n for n in maxstatuslens.itervalues()]))
# . print header
header = row_format % (("user",) + tuple(maxstatuslens))
print header
print '='*len(header)
# . print rows
for user in set([k[0] for k in counter]):
    print row_format % ((user,) +
        tuple([counter.get((user, status), 0) for status in maxstatuslens]))

The usage and output are the same.
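
The flat `(user, status_name)` keying maps directly onto `collections.Counter` in Python 3, which returns 0 for a missing key instead of raising KeyError. Here is a sketch; the `count_pairs` name and the skipping of malformed lines are assumptions.

```python
from collections import Counter


def count_pairs(lines):
    """Flat counter keyed by (user, status) tuples, like the variant above."""
    counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) == 2:          # skip blank or malformed lines
            counter[tuple(fields)] += 1
    return counter


counter = count_pairs(["atl001 I", "atl001 I", "cms017 H", "lhc003 H", "lhc003 H"])
users = sorted({user for user, _ in counter})
statuses = sorted({status for _, status in counter})
for user in users:
    # Counter returns 0 for (user, status) pairs that never occurred
    print(user, [counter[user, status] for status in statuses])
```

To read standard input or files named on the command line, as the script above does, pass `fileinput.input()` as `lines`.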
