花下猫语: 相对来说,Python 是一门编写难度较低的语言,然而,这并不意味着在基础的语法层面上,它就没有需要特别注意的用法。事实上,所有的语言系统都存在着一些相似却另有差别的内容,毕竟,它们需要描述的是非常精确的东西,而精确性就带来了差异性。
今天,英文分享栏目的内容是:如何区分字符串对象的 isdigit()、isnumeric() 和 isdecimal() 方法?
这篇文章的作者是一个培训师,写作风格是简明易懂,因此,整篇文章短小精悍,可以达到学习 Python 知识,以及锻炼英语阅读能力的目的,一举两得,希望你也能读有所获。
原题:Python’s str.isdigit vs. str.isnumeric
作者:Reuven Lerner
https://blog.lerner.co.il/pythons-str-isdigit-vs-str-isnumeric
Let’s say that I want to write some Python code that invites the user to enter a number, and then prints that number, tripled. We could say:
代码语言:javascript复制>>> n = input("Enter a number: ")
>>> print(f"{n} * 3 = {n*3}")
The good news is that this code works just fine. The bad news is that it probably doesn’t do what you might expect. If I run this program, I’ll see:
代码语言:javascript复制Enter a number: 5
5 * 3 = 555
The reason for this output is that the “input” function always returns a string. So sure, we asked the user for a number, but we got the string ‘5’, rather than the integer 5. The ‘555’ output is thanks to the fact that you can multiply strings in Python by integers, getting a longer string back. So ‘a’ * 5 will give us ‘aaaaa’.
Of course, we can always create an integer from a string by applying the “int” class to the user’s input:
代码语言:javascript复制>>> n = input("Enter a number: ")
>>> n = int(n)
>>> print(f"{n} * 3 = {n*3}")
Sure enough, we get the following output:
代码语言:javascript复制Enter a number: 5
5 * 3 = 15
Great, right? But what happens if the user gives us something that’s no longer numeric? The program will blow up:
代码语言:javascript复制Enter a number: abcd
ValueError: invalid literal for int() with base 10: 'abcd'
Clearly, we want to avoid this problem. You could make a good argument that in this case, it’s probably best to run the conversion inside of a “try” block, and trap any exception that we might get.
But there’s another way to test this, one which I use with my into Python classes, before we’ve covered exceptions: Strings have a great method called “isdigit” that we can run, to tell us whether a string contains only digits (0-9), or if it contains something else. For example:
代码语言:javascript复制>>> '1234'.isdigit()
True
>>> '1234 '.isdigit() # space at the end
False
>>> '1234a'.isdigit() # letter at the end
False
>>> 'a1234'.isdigit() # letter at the start
False
>>> '12.34'.isdigit() # decimal point
False
>>> ''.isdigit() # empty string
False
If you know regular expressions, then you can see that str.isdigit returns True for ‘^d $’. Which can be very useful, as we can see here:
代码语言:javascript复制>>> n = input("Enter a number: ")
>>> if n.isdigit():
n = int(n)
print(f"{n} * 3 = {n*3}")
But wait: Python also includes another method, str.isnumeric. And it’s not at all obvious, at least at first, what the difference is between them, because they would seem to give the same results:
代码语言:javascript复制>>> n = input("Enter a number: ")
>>> if n.isnumeric():
n = int(n)
print(f"{n} * 3 = {n*3}")
So, what’s the difference? It’s actually pretty straightforward, but took some time for me to find out: Bascially, str.isdigit only returns True for what I said before, strings containing solely the digits 0-9.
By contrast, str.isnumeric returns True if it contains any numeric characters. When I first read this, I figured that it would mean decimal points and minus signs — but no! It’s just the digits 0-9, plus any character from another language that’s used in place of digits.
For example, we’re used to writing numbers with Arabic numerals. But there are other languages that traditionally use other characters. For example, in Chinese, we count 1, 2, 3, 4, 5 as 一,二,三,四, 五. It turns out that the Chinese characters for numbers will return False for str.isdigit, but True for str.isnumeric, behaving differently from their 0-9 counterparts:
代码语言:javascript复制>>> '12345'.isdigit()
True
>>> '12345'.isnumeric()
True
>>> '一二三四五'.isdigit()
False
>>> '一二三四五'.isnumeric()
True
So, which should you use? For most people, “isdigit” is probably a better choice, simply because it’s more clearly what you likely want. Of course, if you want to accept other types of numerals and numeric characters, then “isnumeric” is better. But if you’re interested in turning strings into integers, then you’re probably safer using “isdigit”, just in case someone tries to enter something else:
代码语言:javascript复制>>> int('二')
ValueError: invalid literal for int() with base 10: '二'
Just when I thought I was done with this, David Beazleyreminded me that there’s a third method I should be dealing with: str.isdecimal. This asks a slightly different question, namely whether we have something which is a number but not a decimal number. What does this mean? Well, in Python, we can describe “2 to the 2nd power” as “2 ** 2”. But if you want to print it, you’ll probably use something a bit fancier, such as 2². This is a two-character string, in which the first character is ‘2’ and the second character is ‘²’. The second one contains digits, but it’s not described as a decimal number. Thanks to Unicode, you can create such strings either via the keyboard or using Unicode-entry syntax. Thus:
代码语言:javascript复制>>> s = '2²' # or if you prefer, s = '2' 'u00B2'
>>> s.isdigit()
True
>>> s.isnumeric()
True
>>> s.isdecimal()
False
Most of us, most of the time, can thus use these three methods interchangeably, with little chance of being mixed up. Once you start using all sorts of interesting Unicode-based numbers, things can get a bit weird and interesting.
I tried to determine whether there was any difference in speed between these methods, just in case, but after numerous tests with “%timeit” in Jupyter, I found that I was getting roughly the same speeds from all methods.
If you’re like me, and ever wondered how a language that claims to have “one obvious way to do it” can have multiple seemingly identical methods… well, now you know!
PS:英文分享栏目致力于分享优质的英文阅读材料,内容聚焦于 Python 技术的介绍与辨析,自推出以来,收到了不少积极的反馈。为了使本栏目更好地运作下去,欢迎亲爱的读者们来投稿。可以直接将内容链接发到 Python猫
公众号后台,也可以在菜单栏添加我为微信好友,通过微信发给我。先行致谢了!