针对 Codejam 问题的 Python 更快或更高效的解决方案

发布于 2024-08-26 18:46:51 字数 2227 浏览 4 评论 0原文

我尝试了这个 Google Codejam Africa< /a> 问题（比赛已经结束，我只是为了提高我的编程技能）。

问题：
您正在与 G 位客人举办派对并注意有一个奇数的客人！当您计划聚会时特意只邀请情侣和给每对夫妇一个独特的数字C 他们的邀请。你愿意挑出单独过来的人询问所有客人的情况邀请号码。
输入：
输入的第一行给出案例数 N。接下来是 N 个测试用例。对于每个测试用例，将有：
一行包含值 G 客人人数。
一行包含以空格分隔的 G 整数列表。每个整数C表示客人的邀请码。输出
对于每个测试用例，输出一行包含“Case #x:”，后跟独自一人的客人的号码C。
限制：
1≤N≤50
0 < C≤2147483647
小数据集
3≤G＜ 100
大数据集
3≤G＜ 1000

示例输入：

3
3
1 2147483647 2147483647
5
3 4 7 4 3
5
2 10 2 10 5

示例输出：

Case #1: 1
Case #2: 7
Case #3: 5

这是我想出的解决方案：

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        G = int(lines[i])
        for guest in range(G):
            codes = map(int, lines[i+1].split(' '))
            alone = (c for c in codes if codes.count(c)==1)
        output.write("Case #%d: %d\n" % (testcase+1, alone.next()))

它在我的计算机上运行 12 秒，使用大输入。

现在，我的问题是，这个解决方案可以在 Python 中改进，以在更短的时间内运行或使用更少的内存吗？对问题的分析给出了关于如何在 Java 和 C++ 中执行此操作的一些指示，但我无法将这些解决方案转换回 Python。

编辑：

结合下面 Alex Martelli 和 IVlad 的提示，我现在有了这个运行时间为 0.079 秒的解决方案：

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        codes = map(int, lines[i+1].split(' '))
        alone = 0
        for c in codes: alone ^= c
        output.write("Case #%d: %d" % (testcase+1, alone))

原文

I tried my hand at this Google Codejam Africa problem (the contest is already finished, I just did it to improve my programming skills).

The Problem:
You are hosting a party with G guests
and notice that there is an odd number
of guests! When planning the party you
deliberately invited only couples and
gave each couple a unique number C on
their invitation. You would like to
single out whoever came alone by
asking all of the guests for their
invitation numbers.
The Input:
The first line of input gives the number of cases, N.
N test cases follow. For each test case there will be:
One line containing the value G the
number of guests.
One line containing
a space-separated list of G integers.
Each integer C indicates the
invitation code of a guest. Output
For each test case, output one line
containing "Case #x: " followed by the
number C of the guest who is alone.
The Limits:
1 ≤ N ≤ 50
0 < C ≤ 2147483647
Small dataset
3 ≤ G < 100
Large dataset
3 ≤ G < 1000

Sample Input:

3
3
1 2147483647 2147483647
5
3 4 7 4 3
5
2 10 2 10 5

Sample Output:

Case #1: 1
Case #2: 7
Case #3: 5

This is the solution that I came up with:

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        G = int(lines[i])
        for guest in range(G):
            codes = map(int, lines[i+1].split(' '))
            alone = (c for c in codes if codes.count(c)==1)
        output.write("Case #%d: %d\n" % (testcase+1, alone.next()))

It runs in 12 seconds on my machine with the large input.

Now, my question is, can this solution be improved in Python to run in a shorter time or use less memory? The analysis of the problem gives some pointers on how to do this in Java and C++ but I can't translate those solutions back to Python.

Edit:

Incorporating the tips from Alex Martelli and IVlad below I now have this solution which runs in 0.079s:

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        codes = map(int, lines[i+1].split(' '))
        alone = 0
        for c in codes: alone ^= c
        output.write("Case #%d: %d" % (testcase+1, alone))

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

甜心小果奶 2024-09-02 18:46:51

我不懂python，但是这个问题本身就是经典。给定 2K - 1 个数字，除了出现偶数次的数字外，找出出现奇数次的数字。

所需公式：

x xor x == 0 对于所有 x
x xor y == y xor x 对于所有 x 和 y
x xor (y xor z) = = (x xor y) xor z（关联性）
x xor 0 == x 对于所有 x

所以异或所有数字。对所有数字进行异或运算的结果就是您的答案。我不知道如何在 python 中执行此操作，在 C 语言中，异或运算符是 ^。

这确实是最有效的方法，因为您只需执行简单的按位运算，甚至不必存储给定的数字。

您可以检查：
3 ^ 4 ^ 7 ^ 4 ^ 3 == 7，2 ^ 10 ^ 2 ^ 10 ^ 5 == 5等。

回复收藏 0 原文

夏日浅笑〃 2024-09-02 18:46:51

我的机器比你的慢——你的代码（放入 main 函数）在我的机器上花了 19 秒。删除无用的内部 for 的明显优化将其缩短为 0.46 秒；更改代码的核心以将

      alone = 0
      for c in codes: alone ^= c

时间缩短至 0.08 秒。我确信进一步的优化是可能的，但由于代码已经快了 235 倍，我认为这可能就足够了；-)。

My machine's slower than yours -- your code (put in a main function) took 19 seconds on my machine. The obvious optimization of removing the useless inner for cuts this to 0.46 seconds; changing the heart of the code to

      alone = 0
      for c in codes: alone ^= c

cuts the time to 0.08 seconds. I'm sure further optimizations are possible, but with the code already 235 times faster I think this may be enough;-).

回复收藏 0 原文

尽揽少女心 2024-09-02 18:46:51

考虑一下：

codes = map(int, lines[i+1].split(' '))

为什么要为 guest 的每个值重新分割行？线路不会改变。

另外，请考虑这一点：

for guest in range(G):
    codes = map(int, lines[i+1].split(' '))
    alone = (c for c in codes if codes.count(c)==1)

此循环中的所有代码都不依赖于 guest。为什么是循环呢？

Consider this:

codes = map(int, lines[i+1].split(' '))

Why resplit the line for each value of guest? The line won't change.

Also, consider this:

for guest in range(G):
    codes = map(int, lines[i+1].split(' '))
    alone = (c for c in codes if codes.count(c)==1)

None of the code in this loop depends on guest. Why is it in a loop?

回复收藏 0 原文

静水深流 2024-09-02 18:46:51

这对我来说立即完成并得到相同的结果：

src = open('A-large-practice.in')
dst = open('A-large-practice.out', 'w')

n = int(src.readline().rstrip())
for i in xrange(n):
    src.readline() # skip g
    nums = src.readline().split()
    seen = set()
    for n in nums:
        if n in seen:
            seen.remove(n)
        else:
            seen.add(n)
    dst.write("Case #%d: %s\n" % (i+1, tuple(seen)[0]))

src.close()
dst.close()

该死，我喜欢哈希映射！

This finishes instantly for me with the same results:

src = open('A-large-practice.in')
dst = open('A-large-practice.out', 'w')

n = int(src.readline().rstrip())
for i in xrange(n):
    src.readline() # skip g
    nums = src.readline().split()
    seen = set()
    for n in nums:
        if n in seen:
            seen.remove(n)
        else:
            seen.add(n)
    dst.write("Case #%d: %s\n" % (i+1, tuple(seen)[0]))

src.close()
dst.close()

Damn, I love hash maps!

回复收藏 0 原文

合久必婚 2024-09-02 18:46:51

是的，正如分析页面所解释的，您可以使用常量内存，尽管任何方法都将在 O(n) 时间内运行。以下是常量内存解决方案在 Python 中的工作原理：

numbers = [1, 1, 2, 2, 3, 3, 4, 4, 12345]
missingOne = 0
for number in numbers:
    missingOne = missingOne ^ number
assert missingOne == 12345

关键是理解异或运算符的作用。任何与零进行异或的数字都是其本身。任何与自身异或的数字都是零。如果您查看异或的真值表，您应该能够使自己相信这两个事实，然后证明从零开始对列表中的数字进行异或将产生不重复的数字。

Yes, as the analysis page explains, you can use constant memory, although any approach will run in O(n) time. Here's how the constant memory solution would work in Python:

numbers = [1, 1, 2, 2, 3, 3, 4, 4, 12345]
missingOne = 0
for number in numbers:
    missingOne = missingOne ^ number
assert missingOne == 12345

The key is understanding what the xor operator does. Any number xor'ed with zero is itself. And any number xor'ed with itself is zero. If you look at the truth table for xor, you should be able to convince yourself of both of those facts and then prove that xoring the numbers in a list, starting from zero will produce the non-duplicated number.

回复收藏 0 原文

~没有更多了~