Python, a smarter way of string to integer conversion

Posted on 2024-08-26 18:25:39

I have written this code to convert a string in a format such as "0(532) 222 22 22" to an integer such as 05322222222.

class Phone():
    def __init__(self,input):
        self.phone = input
    def __str__(self):
        return self.phone
    #convert to integer.
    def to_int(self):
        return int((self.phone).replace(" ","").replace("(","").replace(")",""))

test = Phone("0(532) 222 22 22")
print test.to_int()

Chaining three replace calls to solve this feels very clumsy. Is there a better solution?

不语却知心 2024-09-02 18:25:39

p = "0(532) 222 22 22"
print ''.join([x for x in p if x.isdigit()])

Note that you'll "lose" the leading zero if you convert it to int (as you suggested in the title). If you want to do that, just wrap the above in an int() call. A telephone number does make more sense as a string though (in my opinion).
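
To make the leading-zero point concrete, here is a minimal sketch written in Python 3 syntax (the snippet above uses the Python 2 print statement). The helper name `digits_only` is made up for illustration and is not part of the original answer.

```python
def digits_only(text):
    """Keep only the characters of text for which str.isdigit() is true."""
    return ''.join(ch for ch in text if ch.isdigit())

p = "0(532) 222 22 22"
as_string = digits_only(p)   # the leading zero survives in the string
as_int = int(as_string)      # the leading zero is dropped by int()

print(as_string)  # 05322222222
print(as_int)     # 5322222222
```

If the number has to be displayed or dialled later, the string form is the one that still carries all of its digits.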


亢潮 2024-09-02 18:25:39

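In Python 2.6 or 2.7,
(self.phone).translate(None,' ()')
will remove any space, ( or ) from the phone string. See the Python 2.6 documentation on str.translate for details.

In Python 3.x, str.translate() takes a mapping (rather than two strings as shown above). The corresponding snippet therefore looks like the one below, which uses str.maketrans() to produce the mapping.
(self.phone).translate(str.maketrans('','', '()-/ '))
See the Python 3.1 documentation on str.translate for details.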
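
A self-contained Python 3 sketch of the same idea follows. The names `STRIP_TABLE` and `strip_punctuation` are hypothetical. The three-argument form of `str.maketrans()` maps every character of its third argument to None, which `str.translate()` treats as "delete".

```python
# Build the table once; every character in the third argument is deleted.
STRIP_TABLE = str.maketrans('', '', '()-/ ')

def strip_punctuation(phone):
    """Remove parentheses, dashes, slashes and spaces from phone."""
    return phone.translate(STRIP_TABLE)

print(strip_punctuation("0(532) 222 22 22"))  # 05322222222
```

Building the table at module level avoids recreating it on every call, which may matter if many numbers are processed.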


两人的回忆 2024-09-02 18:25:39

How about just using regular expressions?

Example:

>>> import re
>>> num = '0(532) 222 22 22'
>>> re.sub('[\D]', '', num) # Match all non-digits ([\D]), replace them with empty string, where found in the `num` variable.
'05322222222'

The suggestion made by ChristopheD will work just fine, but is not as efficient.

The following is a test program to demonstrate this using the dis module (see Doug Hellmann's PyMOTW article on the module for more detailed info).

TEST_PHONE_NUM = '0(532) 222 22 22'

def replace_method():
    print (TEST_PHONE_NUM).replace(" ","").replace("(","").replace(")","")

def list_comp_is_digit_method():
    print ''.join([x for x in TEST_PHONE_NUM if x.isdigit()])

def translate_method():
    print (TEST_PHONE_NUM).translate(None,' ()')

import re
def regex_method():
    print re.sub('[\D]', '', TEST_PHONE_NUM)

if __name__ == '__main__':
    from dis import dis

    print 'replace_method:'
    dis(replace_method)
    print
    print

    print 'list_comp_is_digit_method:'
    dis(list_comp_is_digit_method)

    print
    print

    print 'translate_method:'
    dis(translate_method)

    print
    print
    print "regex_method:"
    dis(regex_method)
    print

Output:

replace_method:
  5       0 LOAD_GLOBAL              0 (TEST_PHONE_NUM)
          3 LOAD_ATTR                1 (replace)
          6 LOAD_CONST               1 (' ')
          9 LOAD_CONST               2 ('')
         12 CALL_FUNCTION            2
         15 LOAD_ATTR                1 (replace)
         18 LOAD_CONST               3 ('(')
         21 LOAD_CONST               2 ('')
         24 CALL_FUNCTION            2
         27 LOAD_ATTR                1 (replace)
         30 LOAD_CONST               4 (')')
         33 LOAD_CONST               2 ('')
         36 CALL_FUNCTION            2
         39 PRINT_ITEM          
         40 PRINT_NEWLINE       
         41 LOAD_CONST               0 (None)
         44 RETURN_VALUE   

phone_digit_strip_list_comp:
  3           0 LOAD_CONST               1 ('0(532) 222 22 22')
              3 STORE_FAST               0 (phone)

  4           6 LOAD_CONST               2 ('')
              9 LOAD_ATTR                0 (join)
             12 BUILD_LIST               0
             15 DUP_TOP             
             16 STORE_FAST               1 (_[1])
             19 LOAD_GLOBAL              1 (test_phone_num)
             22 GET_ITER            
             23 FOR_ITER                30 (to 56)
             26 STORE_FAST               2 (x)
             29 LOAD_FAST                2 (x)
             32 LOAD_ATTR                2 (isdigit)
             35 CALL_FUNCTION            0
             38 JUMP_IF_FALSE           11 (to 52)
             41 POP_TOP             
             42 LOAD_FAST                1 (_[1])
             45 LOAD_FAST                2 (x)
             48 LIST_APPEND         
             49 JUMP_ABSOLUTE           23
             52 POP_TOP             
             53 JUMP_ABSOLUTE           23
             56 DELETE_FAST              1 (_[1])
             59 CALL_FUNCTION            1
             62 PRINT_ITEM          
             63 PRINT_NEWLINE       
             64 LOAD_CONST               0 (None)
             67 RETURN_VALUE   

translate_method:
  11           0 LOAD_GLOBAL              0 (TEST_PHONE_NUM)
               3 LOAD_ATTR                1 (translate)
               6 LOAD_CONST               0 (None)
               9 LOAD_CONST               1 (' ()')
              12 CALL_FUNCTION            2
              15 PRINT_ITEM          
              16 PRINT_NEWLINE       
              17 LOAD_CONST               0 (None)
              20 RETURN_VALUE      

phone_digit_strip_regex:
  8       0 LOAD_CONST               1 ('0(532) 222 22 22')
          3 STORE_FAST               0 (phone)

  9       6 LOAD_GLOBAL              0 (re)
          9 LOAD_ATTR                1 (sub)
         12 LOAD_CONST               2 ('[\\D]')
         15 LOAD_CONST               3 ('')
         18 LOAD_GLOBAL              2 (test_phone_num)
         21 CALL_FUNCTION            3
         24 PRINT_ITEM          
         25 PRINT_NEWLINE       
         26 LOAD_CONST               0 (None)
         29 RETURN_VALUE

The translate method will be the most efficient, though it relies on Python 2.6+. The regex is slightly less efficient but more compatible (which I see as a requirement for you). The original replace method adds four additional instructions per replacement (LOAD_ATTR, two LOAD_CONST and CALL_FUNCTION in the listing above), while all of the others stay constant.

On a side note, store your phone numbers as strings to deal with leading zeros, and use a phone formatter where needed. Trust me, it's bitten me before.
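
Bytecode length is only a rough proxy for running time, since a single CALL_FUNCTION can do an arbitrary amount of work in C. One way to check the comparison empirically is the standard `timeit` module. The sketch below is in Python 3 syntax; the precompiled pattern `NON_DIGITS`, the table `STRIP_TABLE` and the list `METHODS` are additions that do not appear in the test program above, and no timings are claimed here since they vary by machine and interpreter version.

```python
import re
import timeit

TEST_PHONE_NUM = '0(532) 222 22 22'
NON_DIGITS = re.compile(r'\D')               # compiled once, reused per call
STRIP_TABLE = str.maketrans('', '', ' ()')   # Python 3 form of translate(None, ' ()')

def replace_method():
    return TEST_PHONE_NUM.replace(" ", "").replace("(", "").replace(")", "")

def list_comp_is_digit_method():
    return ''.join([x for x in TEST_PHONE_NUM if x.isdigit()])

def translate_method():
    return TEST_PHONE_NUM.translate(STRIP_TABLE)

def regex_method():
    return NON_DIGITS.sub('', TEST_PHONE_NUM)

METHODS = [replace_method, list_comp_is_digit_method,
           translate_method, regex_method]

if __name__ == '__main__':
    for method in METHODS:
        # Total wall-clock seconds for 100000 calls of each method.
        seconds = timeit.timeit(method, number=100000)
        print('%-28s %.4f s' % (method.__name__, seconds))
```

All four functions return the same string for this input, so the timings compare like with like.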


花开柳相依 2024-09-02 18:25:39

SilentGhost: dis.dis does demonstrate underlying conceptual / executional complexity. after all, the OP complained about the original replacement chain being too ‘clumsy’, not too ‘slow’.

i recommend against using regular expressions where they are not inevitable; otherwise they just add conceptual overhead and a speed penalty. to use translate() here is IMHO just the wrong tool, and nowhere near as conceptually simple and generic as the original replacement chain.

so you say tamaytoes, and i say tomahtoes: the original solution is quite good in terms of clarity and genericity. it is not clumsy at all. in order to make it a little denser and more parametrized, consider changing it to

phone_nr_translations = [ 
    ( ' ', '', ), 
    ( '(', '', ), 
    ( ')', '', ), ]

def sanitize_phone_nr( phone_nr ):
  R = phone_nr
  for probe, replacement in phone_nr_translations:
    R = R.replace( probe, replacement )
  return R

in this special application, of course, what you really want to do is just cancel out any unwanted characters, so you can simplify this:

probes = ' ()'

def sanitize_phone_nr( phone_nr ):
  R = phone_nr
  for probe in probes:
    R = R.replace( probe, '' )
  return R

come to think of it, it is not quite clear to me why you want to turn a phone nr into an integer; that is simply the wrong data type. this can be demonstrated by the fact that, at least in mobile nets, + and # and maybe more are valid characters in a dial string (dial, string: see?).

but apart from that, sanitizing a user input phone nr to get out a normalized and safe representation is a very, very valid concern—only i feel that your methodology is too specific. why not re-write the sanitizing method to something very generic without becoming more complex? after all, how can you be sure your users never input other deviant characters in that web form field?

so what you want is really not to dis-allow specific characters (there are about a hundred thousand defined codepoints in unicode 5.1, so how do you catch up with those?), but to allow only those characters that are deemed legal in dial strings. and you can do that with a regular expression...

from re import compile as _new_regex
illegal_phone_nr_chrs_re = _new_regex( r"[^0-9#+]" )

def sanitize_phone_nr( phone_nr ):
  return illegal_phone_nr_chrs_re.sub( '', phone_nr )

...or with a set:

legal_phone_nr_chrs = set( '0123456789#+' )

def sanitize_phone_nr( phone_nr ):
    return ''.join( 
        chr for chr in phone_nr 
            if chr in legal_phone_nr_chrs )

that last stanza could well be written on a single line. the disadvantage of this solution would be that you iterate over the input characters from within Python, not making use of the potentially speedier C traversal as offered by str.replace() or even a regular expression. however, performance would in any case be dependent on the expected usage pattern (i am sure you truncate your phone nrs first thing, right? so those would be many small strings to be processed, not a few big ones).
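
to see how the two strategies differ on input nobody planned for, here is a small sketch in Python 3 syntax. the names `UNWANTED`, `LEGAL`, `strip_unwanted` and `keep_legal`, as well as the sample input, are made up for illustration.

```python
UNWANTED = ' ()'                      # disallow-list, as in the simplified loop
LEGAL = frozenset('0123456789#+')     # allow-list of dial-string characters

def strip_unwanted(phone_nr):
    """Remove only the characters that were anticipated."""
    for probe in UNWANTED:
        phone_nr = phone_nr.replace(probe, '')
    return phone_nr

def keep_legal(phone_nr):
    """Keep only the characters that are explicitly allowed."""
    return ''.join(ch for ch in phone_nr if ch in LEGAL)

raw = '+90 (532) 222-22.22 ext'       # hypothetical messy user input
print(strip_unwanted(raw))            # +90532222-22.22ext
print(keep_legal(raw))                # +905322222222
```

the disallow-list lets the dash, the dot and the letters through, while the allow-list drops everything it was not told about.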

notice a few points here: i strive for clarity, which is why i try to avoid over-using abbreviations. chr for character, nr for number and R for the return value (more likely to be, ugh, retval where used in the standard library) are in my style book. programming is about getting things understood and done, not about programmers writing code that approaches the spatial efficiency of gzip. now look, the last solution does fairly much what the OP managed to get done (and more), in...

legal_phone_nr_chrs = set( '0123456789#+' )
def sanitize_phone_nr( phone_nr ): return ''.join( chr for chr in phone_nr if chr in legal_phone_nr_chrs )

...two lines of code if need be, whereas the OP’s code...

class Phone():
    def __init__  ( self, input ): self.phone = self._sanitize( input )
    def __str__   ( self        ): return self.phone
    def _sanitize ( self, input ): return input.replace( ' ', '' ).replace( '(', '' ).replace( ')', '' )

...can hardly be compressed below four lines. see what additional baggage that strictly-OOP solution gives you? i believe it can be left out of the picture most of the time.
