Python: a smarter way of string to integer conversion
I wrote this code to convert a string in a format such as "0(532) 222 22 22" to an integer such as 05322222222.

    class Phone():
        def __init__(self, input):
            self.phone = input

        def __str__(self):
            return self.phone

        # Convert to integer.
        def to_int(self):
            return int((self.phone).replace(" ", "").replace("(", "").replace(")", ""))

    test = Phone("0(532) 222 22 22")
    print(test.to_int())

Using three replace calls to solve this feels very clumsy. Is there a better solution?
Comments (4)
Note that you will "lose" the leading zero if you convert the result to an int (as the title suggests). If you want to do that anyway, just wrap the above in an int() call. In my opinion, though, a telephone number makes more sense as a string.
In Python 2.6 or 2.7,

    (self.phone).translate(None, ' ()')

will remove any spaces, ( or ) from the phone string. See the Python 2.6 documentation on str.translate for details.

In Python 3.x, str.translate() takes a mapping (rather than two strings as shown above). The corresponding snippet is therefore something like the following, using str.maketrans() to produce the mapping:

    (self.phone).translate(str.maketrans('', '', '()-/ '))

See the Python 3.1 documentation on str.translate for details.
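For Python 3, the same idea can be wrapped in a small helper. This is a minimal sketch rather than the answerer's code: the names `STRIP_TABLE` and `strip_formatting` are hypothetical, and the set of characters to delete is taken from the snippet above.

```python
# Translation table that maps each listed character to None (i.e. deletes it).
# Built once and reused; the character set mirrors the snippet above.
STRIP_TABLE = str.maketrans('', '', '()-/ ')


def strip_formatting(phone):
    """Return `phone` with spaces, parentheses, dashes and slashes removed."""
    return phone.translate(STRIP_TABLE)


print(strip_formatting("0(532) 222 22 22"))  # 05322222222
```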
How about just using regular expressions?
Example:
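A minimal sketch of a regular-expression approach, assuming the goal is to drop every non-digit character. The pattern and the helper name `digits_only` are illustrative assumptions, not taken from the answer.

```python
import re

# \D matches any character that is not a decimal digit.
NON_DIGITS = re.compile(r'\D')


def digits_only(phone):
    """Remove everything except digits from `phone`."""
    return NON_DIGITS.sub('', phone)


print(digits_only("0(532) 222 22 22"))  # 05322222222
```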
The suggestion made by ChristopheD will work just fine, but it is not as efficient.
The following is a test program that demonstrates this using the dis module (see Doug Hellmann's PyMOTW article on the module for more details).
Output:
The translate method will be the most efficient, though it relies on Python 2.6+. The regex is slightly less efficient but more compatible (which I take to be a requirement for you). The original replace method adds 6 additional instructions per replacement, while all of the others stay constant.
On a side note, store your phone numbers as strings to deal with leading zeros, and use a phone formatter where needed. Trust me, it's bitten me before.
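The instruction-count comparison described above can be reproduced with a sketch like the following. It assumes Python 3.4+ (for `dis.get_instructions`), and the function names are hypothetical. Exact counts depend on the interpreter version, so none are shown here, and bytecode length is only a rough proxy for run time.

```python
import dis

# Deletion table for the translate-based variant (Python 3 form).
TABLE = str.maketrans('', '', ' ()')


def with_replace(phone):
    """The original approach: one replace() call per unwanted character."""
    return phone.replace(' ', '').replace('(', '').replace(')', '')


def with_translate(phone):
    """A single translate() call using a prebuilt deletion table."""
    return phone.translate(TABLE)


def instruction_count(func):
    """Count the bytecode instructions that dis reports for `func`."""
    return sum(1 for _ in dis.get_instructions(func))


for func in (with_replace, with_translate):
    print(func.__name__, instruction_count(func))
```

Each extra `.replace(...)` in the chain adds a fixed group of instructions, whereas the translate variant stays the same size however many characters are in the table.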
SilentGhost: dis.dis does demonstrate underlying conceptual / executional complexity. After all, the OP complained about the original replacement chain being too 'clumsy', not too 'slow'.

I recommend against using regular expressions where they are not inevitable; otherwise they just add conceptual overhead and a speed penalty. Using translate() here is IMHO just the wrong tool, and nowhere near as conceptually simple and generic as the original replacement chain.

So you say tamaytoes, and I say tomahtoes: the original solution is quite good in terms of clarity and genericity. It is not clumsy at all. In order to make it a little denser and more parametrized, consider changing it to
In this special application, of course, what you really want to do is just cancel out any unwanted characters, so you can simplify this:
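A minimal sketch of such a parametrized replacement chain, assuming the unwanted characters are collected in one string. The function name `strip_chars` and its parameters are hypothetical; the short names `chr`, `nr` and `R` follow the naming style this answer describes.

```python
def strip_chars(phone_nr, unwanted=' ()'):
    """Remove every character listed in `unwanted` from `phone_nr`."""
    R = phone_nr
    for chr in unwanted:  # note: shadows the built-in chr() inside this function
        R = R.replace(chr, '')
    return R


print(strip_chars("0(532) 222 22 22"))  # 05322222222
```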
Come to think of it, it is not quite clear to me why you want to turn a phone nr into an integer: that is simply the wrong data type. This is demonstrated by the fact that, at least in mobile networks, + and # and maybe more are valid characters in a dial string (dial, string; see?).

But apart from that, sanitizing a user-input phone nr to get a normalized and safe representation is a very, very valid concern; I just feel that your methodology is too specific. Why not rewrite the sanitizing method into something very generic without making it more complex? After all, how can you be sure your users never input other deviant characters in that web form field?

So what you really want is not to disallow specific characters (there are about a hundred thousand defined codepoints in Unicode 5.1, so how do you catch up with those?), but to allow exactly those characters that are deemed legal in dial strings. You can do that with a regular expression...
...or with a set:
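Both whitelist variants can be sketched as follows. This is a minimal sketch under stated assumptions: the set of legal characters (digits plus + and #) is taken from the characters this answer mentions, and the names `LEGAL_CHRS`, `ILLEGAL_RE`, `sanitize_with_regex` and `sanitize_with_set` are hypothetical.

```python
import re

# Assumed whitelist: digits plus '+' and '#', which the answer names as valid
# dial-string characters. Extend it as your use case requires.
LEGAL_CHRS = set('0123456789+#')
ILLEGAL_RE = re.compile(r'[^0-9+#]')


def sanitize_with_regex(phone_nr):
    """Drop every character that is not in the whitelist, using a regex."""
    return ILLEGAL_RE.sub('', phone_nr)


def sanitize_with_set(phone_nr):
    """Drop every character that is not in the whitelist, using a set."""
    R = []
    for chr in phone_nr:  # note: shadows the built-in chr() inside this function
        if chr in LEGAL_CHRS:
            R.append(chr)
    return ''.join(R)


print(sanitize_with_regex("+90 (532) 222 22 22"))  # +905322222222
print(sanitize_with_set("0(532) 222 22 22"))       # 05322222222
```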
That last stanza could well be written on a single line. The disadvantage of this solution is that you iterate over the input characters from within Python, not making use of the potentially speedier C traversal offered by str.replace() or even a regular expression. However, performance would in any case depend on the expected usage pattern (I am sure you truncate your phone nrs first thing, right? So those would be many small strings to process, not a few big ones).

Notice a few points here: I strive for clarity, which is why I try to avoid over-using abbreviations. chr for character, nr for number and R for the return value (more likely to be, ugh, retval where used in the standard library) are in my style book. Programming is about getting things understood and done, not about programmers writing code that approaches the spatial efficiency of gzip. Now look, the last solution does pretty much what the OP managed to get done (and more), in...
...two lines of code if need be, whereas the OP's code...
...can hardly be compressed below four lines. See what additional baggage that strictly-OOP solution gives you? I believe it can be left out of the picture most of the time.