以数学方式查找数字子字符串,无需字符串比较
这原本是我在工作中遇到的问题,但现在我只是出于自己的好奇心而试图解决这个问题。
我想以最有效的方式找出 int 'a' 是否包含 int 'b' 。 我写了一些代码,但似乎无论我写什么,将其解析为字符串然后使用indexOf比用数学方法快两倍。
内存不是问题(在合理范围内),只是纯粹的处理速度。
这是我编写的数学方法代码:
private static int[] exponents = {10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };
private static boolean findMatch(int a, int b) {
if (b > a) return false;
if (a == b) return true;
int needleLength = getLength(b);
int exponent = exponents[needleLength];
int subNum;
while (a >= 1) {
subNum = a % exponent;
if (subNum == b)
return true;
a /= 10;
}
return false;
}
private static int getLength(int b) {
int len = 0;
while (b >= 1) {
len++;
b /= 10;
}
return len;
}
这是我正在使用的字符串方法,它似乎胜过了上面的数学方法:
private static boolean findStringMatch(int a, int b) {
return String.valueOf(a).indexOf(String.valueOf(b)) != -1;
}
因此,虽然这并不是我完成工作所必需的,但我只是想知道是否任何人都可以想出任何方法来进一步优化我的数学方法,或者完全是一种全新的方法。 再说一遍,内存不是问题,我只是追求纯粹的速度。
我真的很想看到或听到任何人对此提供的任何信息。
编辑:当我说包含时,我的意思是可以在任何地方,例如, findMatch(1234, 23) == true
编辑: 对于每个说这个废话不可读并且不必要:你没有抓住重点。 重点是要研究一个有趣的问题,而不是想出一个在生产代码中使用的答案。
This originally was a problem I ran into at work, but is now something I'm just trying to solve for my own curiosity.
I want to find out if int 'a' contains the int 'b' in the most efficient way possible. I wrote some code, but it seems no matter what I write, parsing it into a string and then using indexOf is twice as fast as doing it mathematically.
Memory is not an issue (within reason), just sheer processing speed.
This is the code I have written to do it mathematically:
private static int[] exponents = {10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };
private static boolean findMatch(int a, int b) {
if (b > a) return false;
if (a == b) return true;
int needleLength = getLength(b);
int exponent = exponents[needleLength];
int subNum;
while (a >= 1) {
subNum = a % exponent;
if (subNum == b)
return true;
a /= 10;
}
return false;
}
private static int getLength(int b) {
int len = 0;
while (b >= 1) {
len++;
b /= 10;
}
return len;
}
Here's the string method I'm using, which seems to trump the mathematical method above:
private static boolean findStringMatch(int a, int b) {
return String.valueOf(a).indexOf(String.valueOf(b)) != -1;
}
So although this isn't really required for me to complete my work, I was just wondering if anyone could think of any way to further optimize my way of doing it mathematically, or an entirely new approach altogether. Again memory is no problem, I am just shooting for sheer speed.
I'm really interested to see or hear anything anyone has to offer on this.
EDIT: When I say contains I mean can be anywhere, so for example, findMatch(1234, 23) == true
EDIT: For everyone saying that this crap is unreadable and unnecessary: you're missing the point. The point was to get to geek out on an interesting problem, not come up with an answer to be used in production code.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
它应该更快的字符串方式,因为你的问题是文本的,而不是数学的。 请注意,您的“包含”关系没有说明数字,它只说明了它们的十进制表示形式。
还要注意,您想要编写的函数将是不可读的 - 其他开发人员永远不会理解您在做什么。 (看看你在这里遇到了什么麻烦。)另一方面,字符串版本非常清晰。
It should be faster string way, because your problem is textual, not mathematical. Notice that the your "contains" relationship says nothing about the numbers, it only says something about their decimal representations.
Notice also that the function you want to write will be unreadable - another developer will never understand what you are doing. (See what trouble you had with that here.) The string version, on the other hand, is perfectly clear.
这是沿着 Kibbee 的路线,但在他发帖之前我对此有点感兴趣并解决了这个问题:
由于 300 个字符太少而无法进行争论,我正在编辑这篇主要帖子来回应 Pyrolistical。
与 OP 不同,我对本机编译的 indexOf 比带有原语的 Java 代码更快并不感到惊讶。 所以我的目标不是找到比 Java 代码中调用无数次的本机方法更快的东西。
OP 明确表示这不是生产问题,更多的是出于闲置的好奇心,所以我的回答解决了这种好奇心。 我的猜测是,当他试图在生产中解决这个问题时,速度是一个问题,但出于一种闲置的好奇心,“这种方法将被调用数百万次”不再适用。 正如他必须向一位发帖人解释的那样,它不再被视为生产代码,因此复杂性不再重要。
另外,它提供了页面上唯一能够在“551241238”中找到“123”的实现,因此除非正确性是一个无关紧要的问题,否则它会提供这一点。 此外,“使用 Java 原语以数学方式解决问题但击败优化的本机代码的算法”的解决方案空间可能是空。
另外,从您的评论中不清楚您是否将苹果与苹果进行比较。 功能规范为 f( int, int )-> 布尔值,而不是 f( String, String )-> boolean (这是
indexOf
的域)。 因此,除非您测试了类似的东西(它仍然可以击败我的,我不会感到非常惊讶。)额外的开销可能会消耗掉多余的 40% 的一部分。它执行相同的基本步骤。 log10( a ) 编码 + log10( b ) 编码 + 实际找到匹配项,这也是 O(n) 其中 < em>n 是最大对数。
This is along Kibbee's line, but I got a little intrigued by this before he posted and worked this out:
Since 300 characters is far too little to make an argument in, I'm editing this main post to respond to Pyrolistical.
Unlike the OP, I wasn't that surprised that a native compiled indexOf was faster than Java code with primitives. So my goal was not to find something I thought was faster than a native method called zillions of times all over Java code.
The OP made it clear that this was not a production problem and more along the lines of an idle curiosity, so my answer solves that curiosity. My guess was that speed was an issue, when he was trying to solve it in production, but as an idle curiosity, "This method will be called millions and millions of times" no longer applies. As he had to explain to one poster, it's no longer pursued as production code, so the complexity no longer matters.
Plus it provides the only implementation on the page that manages to find the "123" in "551241238", so unless correctness is an extraneous concern, it provides that. Also the solution space of "an algorithm that solves the problem mathematically using Java primitives but beats optimized native code" might be EMPTY.
Plus, it's not clear from your comment whether or not you compared apples to apples. The functional spec is f( int, int )-> boolean, not f( String, String )-> boolean (which is kind of the domain of
indexOf
) . So unless you tested something like this (which could still beat mine, and I wouldn't be awfully surprised.) the additional overhead might eat up some of that excess 40%.It does the same basic steps. log10( a ) encoding + log10( b ) encoding + actually finding the match, which is as well O(n) where n is the largest logarithm.
我能想到的唯一优化是自己进行字符串转换,并在转换时比较数字(从右到左)。 首先转换b的所有数字,然后从a的右侧开始转换,直到找到b的第一个数字(从右侧)匹配。 进行比较,直到所有 b 都匹配或者出现不匹配。 如果遇到不匹配,请回溯到开始匹配 b 的第一个数字的点,前进到 a 并重新开始。
IndexOf 必须执行基本相同的回溯算法,除了从左侧开始。 根据实际数字,这可能会更快。 我认为如果数字是随机的,那应该是随机的,因为应该有很多时候不需要转换所有的 a。
The only optimization that I can think of is to do the conversion to string on your own and compare digits (right to left) as you do the conversion. First convert all the digits of b, then convert from the right on a until you find a match on the first digit of b (from right). Compare until all of b matches or you hit a mismatch. If you hit a mismatch, backtrack to the point where you starting matching the first digit of b, advance in a and start over.
IndexOf will have to do basically the same back tracking algorithm, except from the left. Depending on the actual numbers this may be faster. I think if the numbers are random, it should be since there should be many times when it doesn't have to convert all of a.
看起来你的函数实际上做得很好,但有一个小小的改进:
仅仅因为一旦 a 小于 b,就不值得继续寻找,不是吗?
祝你好运,如果你找到解决方案,请发布!
Looks like your function is actually doing pretty well, but an small improvement:
Just because once that a is smaller than b, is not worthy keeps looking, isnt it?
Good luck and post if you find the solution!
这是一个有趣的问题。 String.class 的许多函数实际上是本机的,这使得击败 String 成为一个困难的命题。 但这里有一些帮助:
提示 1:不同的简单整数运算具有不同的速度。
通过示例程序中的快速计算表明:
因此,您希望使用尽可能少的除法,以支持乘法或取模。 未显示减法、加法和比较运算符,因为它们将所有这些都从水中剔除。 另外,尽可能使用“final”可以让 JVM 进行某些优化。 加快“getLength”函数的速度:
该函数大约提高了 7 倍。 如果 b > ,您会得到一个 indexOutOfBounds 异常。 你的最大指数。 为了解决这个问题,你可以这样做:
这会稍微慢一些,并且如果 b 太大的话,会给你一个不正确的长度,但它不会抛出异常。
提示 2: 不必要的对象/基元创建和方法调用会增加运行时间。
我猜测“getLength”在其他任何地方都没有被调用,所以虽然拥有一个单独的函数可能会很好,但从优化的角度来看,它是不必要的方法调用和对象“len”的创建。 我们可以将该代码放在我们使用它的地方。
另外,请注意我更改了底部 while 循环以包含“a <= b”。 我还没有对此进行测试,并且不确定每次迭代的惩罚是否胜过您不浪费任何迭代的事实。 我确信有一种方法可以使用巧妙的数学来消除除法,但我现在想不到。
This is an interesting problem. Many of String.class's functions are actually native making beating String a difficult proposition. But here's some helpers:
TIP 1: Different simple integer operations have different speeds.
By quick calculations in sample programs showed:
So you want to use as little division as possible in favor of multiplication or modulo. Not shown are subtraction, addition, and comparison operators cause they blow all of these out of the water. Also, using "final" as much as possible allows the JVM to do certain optimizations. Speeding up you "getLength" function:
That gives about a 7x improvement in the function. You get an indexOutOfBounds exception if b > your max in exponents. To solve that, you can have:
That's slightly slower and gives you an incorrect length if b is too big, but it doesn't throw an exception.
TIP 2: Unnecessary object/primitive creation and method calls add to run time.
I'm guessing that "getLength" isn't called anywhere else, so while it might be nice to have a separate function, from a optimization standpoint its an unnecessary method call and creation of the object "len". We can put that code right where we use it.
Also, note I changed the bottom while loop to also include "a <= b". I haven't tested that and not sure if the per-iteration penalty beats the fact that you don't waste any iterations. I'm sure there's a way to get rid of the division using clever math, but I can't think of it right now.
嗯,我可能完全误解了这个问题,但是......
除非你想知道一个特定的数字序列是否在另一个数字序列中。
在这种情况下,将其转换为字符串会比进行数学计算更快。
Umm, I'm probably totally misunderstanding the question, but.....
Unless you want to know if a particular sequence of numbers is within another sequence of numbers.
In that case, converting it to a string WILL be faster than doing the math to figure it out.
无论如何,这无论如何都不能回答您的问题,但无论如何,这都是建议:-)
方法名称
findMatch
的描述性不是很强。 在本例中,我有一个静态方法ContainerBuilder.number(int)
,它返回一个ContainerBuilder
,其中包含方法contains
它。 这样你的代码就变成了:Juts some suggest for the long run!
哦,是的,我还想说,你应该定义“包含”的含义
This in no way answers your question, whatsoever, but it's advice anyway :-)
The method name
findMatch
is not very descriptive. In this case, I'd have a static methodContainerBuilder.number(int)
, which returned aContainerBuilder
, which has the methodcontains
on it. In this way your code becomes:Juts some advice for the long run!
Oh yes, I meant to say also, you should define what you mean by "contains"
有什么办法可以用二进制计算这个吗? 显然,包含另一个字符的二进制整数的整数的二进制值并不意味着十进制也有同样的作用。 然而,是否有某种可以使用的二进制技巧? 也许将像 12345 这样的数字转换为 0001 0010 0011 0100 0101,然后进行一些位移以确定其中是否包含 23 (0010 0011)。 由于您的字符集只有 10 个字符,因此您可以通过在单个字节中存储 2 个字符值来减少计算时间。
编辑
稍微扩展一下这个想法。 如果你有 2 个整数 A 和 B,并且想知道 A 是否包含 B,你首先检查 2 件事。 如果A小于B,则A不能包含B。如果A = B,则A包含B。此时您可以将它们转换为字符串*。 如果A包含与B相同数量的字符数,那么A不包含B,除非它们相等,但如果它们相等,我们就不会在这里,所以如果两个字符串长度相同,a不包含b 。 此时,A 的长度将比 B 长。因此,现在您可以将字符串转换为其打包的二进制值,正如我在本文第一部分中指出的那样。 将这些值存储在整数数组中。 现在,您对数组中的整数值进行按位与操作,如果结果是 A,则 A 包含 B。现在,您将 B 的整数数组向左移动 4 位,然后再次进行比较。 执行此操作,直到您开始从 B 左侧弹出位。
*上一段中的 * 意味着您可以跳过此步骤。 可能有一种方法可以完全不使用字符串来做到这一点。 您可能可以使用一些奇特的二进制技巧来获取我在第一段中讨论的打包二进制表示。 应该有一些您可以使用的二进制技巧,或者一些快速数学,可以将整数转换为我之前讨论过的十进制值。
Is there any way to calculate this in binary? Obviously the binary value of an integer containing the binary integer of another character doesn't mean that the decical does the same. However, is there some kind of binary trickary that could be used? Maybe convert a numer like 12345 to 0001 0010 0011 0100 0101, and then do some bit shifting to figure out if 23 (0010 0011) is contained in there. Because your character set is only 10 characters, you could cut down the computation time by store 2 characters values in a single byte.
EDIT
Expanding on this idea a bit. if you have 2 integers, A and B, and want to know if A contains B, you check 2 things first. if A is less than B, then A cannot contain B. If A = B then A contains B. At this point you can convert them to strings*. If A contains the same number of character numbers as B, then A does not contain B, unless they are equal, but we wouldn't be here if they are equal, so if both strings are the same length, a does not contain b. At this point, the length of A will be longer than B. So, now you can convert the strings to their packed binary values as I noted in the first part of this post. Store these values in an array of integers. Now you do a bitwise AND Of the integer values in your array, and if the result is A, then A contains B. Now you shift the array of integers for B, to the left 4 bits, and do the conparison again. Do this until you start popping bits off the left of B.
*That * in the previous paragraph means you may be able to skip this step. There may be a way to do this without using strings at all. There might be some fancy binary trick you can do to get the packed binary representation I discussed in the first paragraph. There should be some binary trick you can use, or some quick math which will convert an integer to the decimal value I discussed before.
请问您在代码中的什么地方使用了这个函数? 也许有另一种方法可以解决当前正在解决的问题,而且速度会快得多。 这可能就像我的朋友要求我完全重新调音他的吉他一样,我在意识到我可以将底弦降低一个音阶并获得相同的结果之前就这样做了。
Can I ask where you're using this function in your code? Maybe there's another way to solve the problem it is currently solving which would be much faster. This could be like when my friend asked me to completely re-tune his guitar, and I did it before realizing I could have just lowered the bottom string by a whole step and gotten an equivalent result.
仅供参考
http://refactormycode.com/
可以为你工作。
FYI
http://refactormycode.com/
Could work for you.