获取每个给定范围之间的整数数量

发布于 2025-01-31 11:52:13 字数 1914 浏览 2 评论 0原文

任务:

编写一个接收3个数组并返回数组的函数。这 第一个数组包含n个整数,它们的值范围在0和 10^9。 “数字”。第二个数组是一个低距离阵列,其中包含 范围的下端包含Q整数。 “低的”。第三 数组是一个高范围的数组,包含一个范围的高端, 它包含Q整数。 “高”。

该功能应返回包含数量的数组 第一个数组中的整数落在其范围内,由 低距离和高距离阵列。

在返回的数组中,在索引i处,应该有 “数字”中的整数更大或等于低[i],较小或等于高[i]。

示例:
count_range([[12,13,14,15,17],[14],[14])应返回[1]
count_range([[12,13,14,15,17],[14,15],[14,18])应返回[1,2]
([[12,13,14,15,17],[12],[17])应返回[5]

count_range 看到阵列可能很长,而且数字可能真的很大。

我很高兴获得一些挑战的见解或测试,以帮助我朝着更好的方向思考。

def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])


def count_range(numbers, low, high):
    numbers.sort()
    result = []
    range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in range_dict:
            range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in range_dict:
            range_dict[low_r] = binarySearch(numbers, low_r)[0]
        low_index = range_dict.get(low_r)
        if high_r not in range_dict:
            range_dict[high_r] = binarySearch(numbers, high_r)[0]
        high_index = range_dict.get(high_r)
        while high_index+1 < len(numbers) and numbers[high_index + 1] == numbers[high_index]:
            high_index += 1
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    print(result)
    return result

The task:

Write a function that receives 3 arrays and returns an array. The
first array contains n integers, their values range between 0 and
10^9. "numbers". The second array is a low-range array, which contains
the lower end of a range, it contains q integers. "low". The third
array is a high-range array, which contains the higher end of a range,
it contains q integers. "high".

The function should return an array that contains the number of
integers in the first array, that fall in its range, given by the
low-range and high-range arrays.

In the returned array, at index i, there should be the number of
integers in "numbers" which are bigger or equal to low[i] and smaller or equal to high[i].

Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]

This is how I solved the question but I feel like there might be a more efficient way to solve seeing as the arrays could be long and the numbers could be really big.

I'd be glad to get some insights or tests that challenge this could to help me think in a better direction.

def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])


def count_range(numbers, low, high):
    numbers.sort()
    result = []
    range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in range_dict:
            range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in range_dict:
            range_dict[low_r] = binarySearch(numbers, low_r)[0]
        low_index = range_dict.get(low_r)
        if high_r not in range_dict:
            range_dict[high_r] = binarySearch(numbers, high_r)[0]
        high_index = range_dict.get(high_r)
        while high_index+1 < len(numbers) and numbers[high_index + 1] == numbers[high_index]:
            high_index += 1
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    print(result)
    return result

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

苄①跕圉湢 2025-02-07 11:52:13

使用Python的Bisect

from bisect import bisect_left, bisect_right

def count_range(numbers, low, high):
    numbers.sort()
    return [bisect_right(numbers, hi) - bisect_left(numbers, lo)
            for hi, lo in zip(high, low)]

在我的解决方案中,分类实际上花费了我90%的时间,然后处理查询仅花费了约10%的时间。

Benchmark with 100,000 numbers and 1000 ranges (Try it online! ):(

  71.6 ms  count_range_Sara
  24.1 ms  count_range_Kelly
6303.6 ms  count_range_Nin17

  70.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6225.0 ms  count_range_Nin17

  64.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6306.7 ms  count_range_Nin17

注意:我测量的萨拉的解决方案是原来的略带越野车,尽管这个问题是关于效率的问题,但我包括了它。我测量的NIN17解决方案是他们现在已删除的答案,而不是Numpy的解决方案。)

Using Python's bisect:

from bisect import bisect_left, bisect_right

def count_range(numbers, low, high):
    numbers.sort()
    return [bisect_right(numbers, hi) - bisect_left(numbers, lo)
            for hi, lo in zip(high, low)]

In my solution, the sorting actually took about 90% of my time, then handling the queries only took about 10% of my time.

Benchmark with 100,000 numbers and 1000 ranges (Try it online!):

  71.6 ms  count_range_Sara
  24.1 ms  count_range_Kelly
6303.6 ms  count_range_Nin17

  70.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6225.0 ms  count_range_Nin17

  64.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6306.7 ms  count_range_Nin17

(Note: Sara's solution I measured is the original slightly buggy one, which I included despite the buggyness as the question is about efficiency. And Nin17's solution that I measured is their original one from their now deleted answer, not the NumPy one.)

多情出卖 2025-02-07 11:52:13

使用numpy:

import numpy as np
def count_range(numbers, low, high):
    numbers.sort()
    return np.searchsorted(numbers, high, 'right') - np.searchsorted(numbers, low, 'left')

基准:

from timeit import timeit
def test_data():
    RANGE = range(10**9)
    numbers = random.choices(RANGE, k=10**5)
    ranges = (sorted(random.sample(RANGE, 2))
              for _ in range(1000))
    [*low], [*high] = zip(*ranges)
    return [numbers, low, high]
test = test_data()
%timeit kelly(*[t.copy() for t in test])
test2 = [np.array(i) for i in test]
%timeit Nin17(*[t.copy() for t in test2])

输出:

13.4 ms ± 77.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.63 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With numpy:

import numpy as np
def count_range(numbers, low, high):
    numbers.sort()
    return np.searchsorted(numbers, high, 'right') - np.searchsorted(numbers, low, 'left')

Benchmark:

from timeit import timeit
def test_data():
    RANGE = range(10**9)
    numbers = random.choices(RANGE, k=10**5)
    ranges = (sorted(random.sample(RANGE, 2))
              for _ in range(1000))
    [*low], [*high] = zip(*ranges)
    return [numbers, low, high]
test = test_data()
%timeit kelly(*[t.copy() for t in test])
test2 = [np.array(i) for i in test]
%timeit Nin17(*[t.copy() for t in test2])

Output:

13.4 ms ± 77.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.63 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文