Household mail merge (code golf)

Posted 2024-10-01 11:57:31

I wrote some mail merge code the other day, and although it works, I'm a bit turned off by the code. I'd like to see what it would look like in other languages.

For input, the routine takes a list of contacts:

Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Erica,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Marge,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
Ted,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Raoul,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6

It then merges lines with the same address and surname into one record. (Assume the rows are unsorted.) The code should also be flexible enough that fields can be supplied in any order, so it needs to take field indexes as parameters. For a family of two, it joins both first names with " and ". For a family of three or more, the first name is set to "The" and the last name is set to "<surname> Family".

Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
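For readers comparing languages, the rules above can be sketched in plain Python roughly as follows. This is only an illustration of the stated requirements, not the asker's code: the function name household_merge, its parameter names, and the choice to sort groups by key (so that the output order matches the sample above) are assumptions made for this sketch.

```python
SAMPLE = """\
Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Erica,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Marge,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
Ted,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Raoul,Simpson,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6"""


def household_merge(rows, fn_idx, ln_idx, group_idxs):
    """Merge rows whose fields at group_idxs are all equal.

    Hypothetical helper for illustration; fields stay in their
    original positions, so any column order is accepted.
    """
    groups = {}
    for row in rows:
        # A tuple key avoids accidental collisions from string concatenation.
        key = tuple(row[i] for i in group_idxs)
        groups.setdefault(key, []).append(row)

    merged = []
    for key in sorted(groups):  # assumption: order output by group key
        members = groups[key]
        result = list(members[0])  # copy the first row of the household
        if len(members) == 2:
            result[fn_idx] += " and " + members[1][fn_idx]
        elif len(members) > 2:
            result[fn_idx] = "The"
            result[ln_idx] += " Family"
        merged.append(result)
    return merged


if __name__ == "__main__":
    rows = [line.split(",") for line in SAMPLE.splitlines()]
    for record in household_merge(rows, 0, 1, (1, 2, 3, 4, 5)):
        print(",".join(record))
```

Splitting on "," assumes that no field contains a quoted comma; the csv module would be the safer choice for real data.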

My C# implementation of this is:

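// Read the CSV and split each line into fields (assumes no quoted commas).
var source = File.ReadAllLines(@"sample.csv").Select(l => l.Split(','));
// First name is field 0, surname is field 1; group on fields 1 through 5.
var merged = HouseholdMerge(source, 0, 1, new[] {1, 2, 3, 4, 5});

public static IEnumerable<string[]> HouseholdMerge(IEnumerable<string[]> data, int fnIndex, int lnIndex, int[] groupIndexes)
{
    // Build the grouping key by concatenating the fields selected by groupIndexes.
    Func<string[], string> groupby = fields => String.Join("", fields.Where((f, i) => groupIndexes.Contains(i)));

    // Sort by key first so the merged records come out in key order.
    var groups = data.OrderBy(groupby).GroupBy(groupby);

    foreach (var group in groups)
    {
        // Copy the first row of the household as the base of the merged record.
        string[] result = group.First().ToArray();

        if (group.Count() == 2)
        {
            // Two people: join both first names.
            result[fnIndex] += " and " + group.ElementAt(1)[fnIndex];
        }
        else if (group.Count() > 2)
        {
            // Three or more: "The <surname> Family".
            result[fnIndex] = "The";
            result[lnIndex] += " Family";
        }

        yield return result;
    }
}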

I don't like how I had to write the groupby delegate. I'd like it if C# had some way to convert a string expression to a delegate, e.g. Func groupby = f => "f[2] + f[3] + f[4] + f[5] + f[1];" I have a feeling something like this can probably be done in Lisp or Python. I look forward to seeing nicer implementations in other languages.
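On the Python side of that hunch, here is a minimal sketch of two ways a grouping key could be built at run time. The helper names key_from_indexes and key_from_expression are made up for this example. The second one uses the built-in eval, which executes arbitrary code and is therefore only reasonable for trusted input.

```python
from operator import itemgetter


def key_from_indexes(indexes):
    """Return a function that maps a row to a tuple of the chosen fields."""
    getter = itemgetter(*indexes)
    if len(indexes) == 1:
        # itemgetter with one index returns a bare value, so wrap it.
        return lambda row: (getter(row),)
    return getter


def key_from_expression(expression):
    """Turn a string such as "f[2] + f[1]" into a function of f.

    Assumption for this sketch: the expression is trusted, because
    eval will run whatever Python code the string contains.
    """
    return eval("lambda f: (" + expression + ")")


row = "Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004".split(",")

by_index = key_from_indexes((2, 3, 4, 5, 1))
by_expr = key_from_expression("f[2] + f[3] + f[4] + f[5] + f[1]")

print(by_index(row))
print(by_expr(row))
```

The index-based version is usually the safer choice; the string version is shown only because it is the closest match to the expression syntax wished for above.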

Edit: Where did the community wiki checkbox go? Some mod please fix that.

过期以后 2024-10-08 11:57:31

Ruby — 181 155

The first-name and surname indexes are set in the code as a and b. Input data is read from ARGF.

a,b=0,1
[*$<].map{|i|i.strip.split ?,}.group_by{|i|i.rotate(a).drop 1}.map{|i,j|k,l,m=j
k[a]+=' and '+l[a]if l
(k[a]='The';k[b]+=' Family')if m
puts k*','}

高冷爸爸 2024-10-08 11:57:31

Python - not golfed

I'm not sure what the order of the rows should be if the indices are not 0 and 1 for the input file.

import csv
from collections import defaultdict

class HouseHold(list):
    def __init__(self, fn_idx, ln_idx):
        self.fn_idx = fn_idx
        self.ln_idx = ln_idx

    def append(self, item):
        self.item = item
        list.append(self, item[self.fn_idx])

    def get_value(self):
        fn_idx = self.fn_idx
        ln_idx = self.ln_idx
        item = self.item
        addr = [j for i,j in enumerate(item) if i not in (fn_idx, ln_idx)]
        if len(self) < 3:
            fn, ln = " and ".join(self), item[ln_idx]
        else:
            fn, ln = "The", item[ln_idx]+" Family"
        return [fn, ln] + addr

def source(fname):
    with open(fname) as in_file:
        for item in csv.reader(in_file):
            yield item

def household_merge(src, fn_idx, ln_idx, groupby):
    res = defaultdict(lambda:HouseHold(fn_idx, ln_idx))
    for item in src:
        key = tuple(item[x] for x in groupby)
        res[key].append(item)
    return res.values()

data =  household_merge(source("sample.csv"), 0, 1, [1,2,3,4,5,6,7])
with open("result.csv", "w") as out_file:
    csv.writer(out_file).writerows(item.get_value() for item in data)

镜花水月 2024-10-08 11:57:31

Python - 178 chars

import sys
d={}
for x in sys.stdin:F,c,A=x.partition(',');d[A]=d.get(A,[])+[F]
print"".join([" and ".join(v)+c+A,"The"+c+A.replace(c,' Family,',1)][2<len(v)]for A,v in d.items())

Output

Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
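One possible reading of the golfed answer above, expanded into Python 3 for clarity. This is a sketch rather than the answerer's own code: the function name merge_lines is invented here, and unlike the golfed version it strips the trailing newline from each line. Like the original, it assumes the first name is field 0 and groups on everything after the first comma.

```python
def merge_lines(lines):
    """Group lines by everything after the first name, then merge names."""
    households = {}
    for line in lines:
        # partition splits at the first comma only:
        # "Jim,Smith,..." -> ("Jim", ",", "Smith,...")
        first, _comma, rest = line.rstrip("\n").partition(",")
        households.setdefault(rest, []).append(first)

    merged = []
    for rest, firsts in households.items():
        if len(firsts) > 2:
            # Insert " Family" after the surname, i.e. before the first comma.
            merged.append("The," + rest.replace(",", " Family,", 1))
        else:
            # One name stays as-is; two names are joined with " and ".
            merged.append(" and ".join(firsts) + "," + rest)
    return merged


if __name__ == "__main__":
    import sys
    print("\n".join(merge_lines(sys.stdin)))
```

Because a Python 3.7+ dict keeps insertion order, households come out in order of first appearance in the input, which may differ from the order shown in the Python 2 output above.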

命硬 2024-10-08 11:57:31

Python 2.6.6 - 287 Characters

This assumes you can hard-code a filename (named i). If you want to take input from the command line, this goes up by about 16 characters.

from itertools import*
for z,g in groupby(sorted([l.split(',')for l in open('i').readlines()],key=lambda x:x[1:]), lambda x:x[2:]):
 l=list(g);r=len(l);k=','.join(z);o=l[0]
 if r>2:print'The,'+o[1],"Family,"+k,
 elif r>1:print o[0],"and",l[1][0]+","+o[1]+","+k,
 else:print','.join(o),

Output

Erica and Abraham,Johnson,2681 Eagle Peak,,Bellevue,Washington,United States,98004
Larry,Lyon,52560 Free Street,,Toronto,Ontario,Canada,M4B 1V7
The,Simpson Family,6388 Lake City Way,,Burnaby,British Columbia,Canada,V5A 3A6
Jim,Smith,2681 Eagle Peak,,Bellevue,Washington,United States,98004

I'm sure this could be improved upon, but it is getting late.

︶￣淡然 2024-10-08 11:57:31

Haskell - 341 321

(Changes as per comments).

Unfortunately, Haskell has no standard split function, which makes this rather long.

Input to stdin, output on stdout.

import List
import Data.Ord
main=interact$unlines.e.lines
s[]=[]
s(',':x)=s x
s l@(x:y)=let(h,i)=break(==k)l in h:(s i)
t[]=[]
t x=tail x
h=head
m=map
k=','
e l=m(t.(>>=(k:)))$(m c$groupBy g$sortBy(comparing t)$m s l)
c(x:[])=x
c(x:y:[])=(h x++" and "++h y):t x
c x="The":((h$t$h x)++" Family"):(t$t$h x)
g a b=t a==t b

江南月 2024-10-08 11:57:31

Lua, 434 bytes

x,y=1,2 s,p,r,a=string.gsub,pairs,io.read,{}for j,b,c,d,e,f,g,h,i in r('*a'):gmatch('('..('([^,]*),'):rep(7)..'([^,]*))\n')
do k=s(s(s(j,b,''),c,''),'[,%s]','')for l,m in p(a)do if not m.f and (m[y]:match(c) and m[9]==k) then z=1
if m.d then m[x]="The"m[y]=m[y]..' family'm.f=1 else m[x]=m[x].." and "..b m.d=1 end end end if not z then
a[#a+1]={b,c,d,e,f,g,h,i,k} end z=nil end for k,v in p(a)do v[9]=nil print(table.concat(v,','))end
