对一系列值进行哈希处理

发布于 2024-12-29 08:24:02 字数 998 浏览 2 评论 0原文

我知道我可以将奇异值哈希为 dict 中的键。例如,我可以将 5 哈希为 dict 中的键之一。

我目前面临一个问题,需要我对一系列值进行哈希处理。

基本上,我需要一种更快的方法来执行此操作:

if 0 <= x <= 0.1:
    # f(A)
elif 0.1 <= x <= 0.2:
    # f(B)
elif 0.2 <= x <= 0.3:
    # f(C)
elif 0.3 <= x <= 0.4:
    # f(D)
elif 0.4 <= x <= 0.5:
    # f(E)
elif 0.5 <= x <= 0.6:
    # f(F)

其中 x 是任意精度的浮点参数。

我能想到的最快的方法是散列,但问题是:我可以使用 (0.1, 0.2) 作为密钥,但这仍然会花费我 O(n) 运行时间,并且最终并不比大量 elif 更好(我必须迭代这些键并检查是否 key[0] <= x <= key[1])。

有没有办法对一系列值进行哈希处理,以便我可以检查哈希表中的 0.15 并仍然得到 #execute B

如果这样的散列不可能,我还能如何改进它的运行时间?我正在处理足够大的数据集,线性运行时间不够快。

编辑:为了回应奇琴的回答,我必须注意,不能假设间隔是规则的。事实上,我几乎可以保证他们不会

回应评论中的请求,我应该提到我这样做是为了尝试实现 遗传算法中基于适应度的选择。算法本身是为了做作业,但具体实现只是为了提高生成实验数据的运行时间。

I know that I can hash singular values as keys in a dict. For example, I can hash 5 as one of the keys in a dict.

I am currently facing a problem that requires me to hash a range of values.

Basically, I need a faster way to to do this:

if 0 <= x <= 0.1:
    # f(A)
elif 0.1 <= x <= 0.2:
    # f(B)
elif 0.2 <= x <= 0.3:
    # f(C)
elif 0.3 <= x <= 0.4:
    # f(D)
elif 0.4 <= x <= 0.5:
    # f(E)
elif 0.5 <= x <= 0.6:
    # f(F)

where x is some float parameter of arbitrary precision.

The fastest way I can think of is hashing, but here's the problem: I can use (0.1, 0.2) as a key, but that still is going to cost me O(n) runtime and is ultimately no better than the slew of elifs (I would have to iterate over the keys and check to see if key[0] <= x <= key[1]).

Is there a way to hash a range of values so that I can check the hash table for0.15 and still get #execute B?

If such a hashing isn't possible, how else might I be able to improve the runtime of this? I am working with large enough data sets that linear runtime is not fast enough.

EDIT: In response to cheeken's answer, I must note that the intervals cannot be assumed to be regular. As a matter of fact, I can almost guarantee that they are not

In response to requests in comments, I should mention that I am doing this in an attempt to implement fitness-based selection in a genetic algorithm. The algorithm itself is for homework, but the specific implementation is only to improve the runtime for generating experimental data.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

埋葬我深情 2025-01-05 08:24:02

正如其他人所指出的,您将获得的最佳算法是 O(log N),而不是 O(1),类似于通过排序列表进行二分搜索的算法。

在 Python 中执行此操作的最简单方法是使用 bisect 标准模块 http://docs。 python.org/library/bisect.html。请特别注意第 8.5.2 节中关于进行数字表查找的示例 - 这正是您正在做的事情:

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

grades 字符串替换为函数列表,断点 列出您的较低阈值列表,然后就可以了。

As others have noted, the best algorithm you're going to get for this is something that's O(log N), not O(1), with something along the lines of a bisection search through a sorted list.

The easiest way to do this in Python is with the bisect standard module, http://docs.python.org/library/bisect.html. Note, in particular, the example in section 8.5.2 there, on doing numeric table lookups -- it's exactly what you are doing:

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Replace the grades string with a list of functions, the breakpoints list with your list of lower thresholds, and there you go.

最美的太阳 2025-01-05 08:24:02

您不一定需要对整个值范围进行哈希处理。例如,在上面给出的比例中,如果给定 0.15,则可以将其四舍五入到 0.2(小数点后的第一位数字),然后哈希 0.2。

这必须有多高效?您可以尝试的另一种选择是二分搜索。将间隔值按排序顺序存储在列表中,并对其进行二分搜索。例如:

sorted_list = [ (0.1, function1), (0.2, function2), ....(0.6, function6) ] 

然后您只需进行二分查找即可找到大于 x 的最小元素。这将产生 O(log(n))。

You don't necessarily need to hash an entire range of values. For example, in the scale given above, if you were given 0.15, you can round it up to the 0.2 (the first digit after decimal point), and hash 0.2 instead.

How efficient does this have to be? An alternative you can try is a binary search. Have the interval values stores in sorted order in a list, and do binary search on it. For example:

sorted_list = [ (0.1, function1), (0.2, function2), ....(0.6, function6) ] 

Then you simply do binary search to find the smallest element that is bigger than x. This will yield O(log(n)).

萧瑟寒风 2025-01-05 08:24:02

如果您的间隔是规则的,您可以缩放操作数,然后将操作数下限到每个范围的最小值,然后将该结果直接传递到dict中,将这些下限映射到适当的值处理程序。

使用您提供的范围的示例实现。

# Integerize our 0.1 width intervals; scale by x10
handlerDict = {}
handlerDict[0] = lambda x: ... # 0.1
handlerDict[1] = lambda x: ... # 0.2
handlerDict[2] = lambda x: ... # 0.3
...

# Get the right handler, scaling x by x10; handle
handlerDict[int(10*x)](x, ...)

If your intervals are regular, you can scale and then floor your operands to the minimum of each range, then pass that result directly into a dict mapping those lower bounds to the appropriate handler.

An example implementation, using the ranges you provided.

# Integerize our 0.1 width intervals; scale by x10
handlerDict = {}
handlerDict[0] = lambda x: ... # 0.1
handlerDict[1] = lambda x: ... # 0.2
handlerDict[2] = lambda x: ... # 0.3
...

# Get the right handler, scaling x by x10; handle
handlerDict[int(10*x)](x, ...)
叶落知秋 2025-01-05 08:24:02

为了提高运行时间,您可以实现二分搜索。

否则,您可以将间隔阈值放在特里树上。

编辑:
让我提出一个实施方案:

class IntervalHash():
    def __init__(self,SortedList):
        #check it's sorted 
        self.MyList = []
        self.MyList.extend(SortedList) 
        self.lenlist = len(self.MyList)
    def get_interval(self,a):
        mylen = self.lenlist 
        mypos = 0
        while mylen > 1:
            mylen = (mylen/2 + mylen % 2)
            if mypos + mylen > self.lenlist - 1:
                if self.MyList[self.lenlist - 1] < a:
                    mypos = self.lenlist - 1
                break
            if self.MyList[mypos + mylen] < a:
                mypos += mylen
        if mypos == 0:
            if self.MyList[0] > a: 
                return ("-infty",self.MyList[0],0)
        if mypos == self.lenlist - 1:
            return (self.MyList[mypos],"infty",0)
        return (self.MyList[mypos],self.MyList[mypos+1],0)

A = [0.32,0.70,1.13]
MyHasher = IntervalHash(A)
print "Intervals are:",A
print 0.9 ," is in ",MyHasher.get_interval(0.9)
print 0.1 ," is in ",MyHasher.get_interval(0.1)
print 1.8 ," is in ",MyHasher.get_interval(1.8)

欢迎进一步编辑和改进!
trie 方法涉及更多,在我看来它更适合低级语言。

In order to improve the runtime, you could implement a bisection search.

Otherwise you can put the interval-thresholds on a trie.

EDIT:
let me propose an implementation:

class IntervalHash():
    def __init__(self,SortedList):
        #check it's sorted 
        self.MyList = []
        self.MyList.extend(SortedList) 
        self.lenlist = len(self.MyList)
    def get_interval(self,a):
        mylen = self.lenlist 
        mypos = 0
        while mylen > 1:
            mylen = (mylen/2 + mylen % 2)
            if mypos + mylen > self.lenlist - 1:
                if self.MyList[self.lenlist - 1] < a:
                    mypos = self.lenlist - 1
                break
            if self.MyList[mypos + mylen] < a:
                mypos += mylen
        if mypos == 0:
            if self.MyList[0] > a: 
                return ("-infty",self.MyList[0],0)
        if mypos == self.lenlist - 1:
            return (self.MyList[mypos],"infty",0)
        return (self.MyList[mypos],self.MyList[mypos+1],0)

A = [0.32,0.70,1.13]
MyHasher = IntervalHash(A)
print "Intervals are:",A
print 0.9 ," is in ",MyHasher.get_interval(0.9)
print 0.1 ," is in ",MyHasher.get_interval(0.1)
print 1.8 ," is in ",MyHasher.get_interval(1.8)

further edits and improvements are welcome!
The trie approach is much more involved, in my opinion it would be more appropriate for low level languages.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文