How to convert unusual unicode string with number to integer in python
I have some fairly hairy unicode strings with numbers in them that I'd like to test the value of. Normally, I'd just use str.isnumeric to test for whether it could be converted via int(), but I'm encountering cases where isnumeric returns True but int() raises an exception.
Here's an example program:
>>> s = '⒍'
>>> s.isnumeric()
True
>>> int(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '⒍'
Unicode is always full of surprises, so I'm happy to just be robust to this case and use a try/except block to catch unusual numbers. However, I'd be happier if I could still convert them to integers. Is there a consistent way to do this?
3 Answers
If you want to test if a string can be passed to int, use str.isdecimal. Both str.isnumeric and str.isdigit include decimal-like characters that aren't compatible with int.
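As a minimal sketch of the difference between the three predicates, the helper below (classify is a hypothetical name, not something from the answer) reports all three results for a single character. The sample characters other than '⒍' are chosen here for illustration.

```python
def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a one-character string."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())

# Only characters for which isdecimal() is True are accepted by int().
print(classify('6'))   # (True, True, True)   -> int('6') works
print(classify('⒍'))   # (False, True, True)  -> int('⒍') raises ValueError
print(classify('½'))   # (False, False, True) -> int('½') raises ValueError
```

Each predicate is a superset of the one before it: every decimal is a digit, and every digit is numeric.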
And as @abarnert has mentioned in the comments, the most reliable way to test if a string can be passed to int is to simply do it in a try block.
On the other hand, '⒍' can be converted to an actual digit with the help of the unicodedata module, e.g. print(unicodedata.digit('⒍')) would output 6.
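One possible way to combine the two ideas is sketched below: try int() first, and only fall back to unicodedata.digit character by character. The function name to_int and the choice to treat multi-character strings as positional base-10 digits are assumptions made for illustration, not part of the answer.

```python
import unicodedata

def to_int(s):
    """Hypothetical helper: int(s) if possible, else a per-character digit fallback."""
    try:
        return int(s)
    except ValueError:
        pass
    if not s:
        raise ValueError("empty string")
    value = 0
    for ch in s:
        # unicodedata.digit raises ValueError for characters with no digit value
        value = value * 10 + unicodedata.digit(ch)
    return value

print(to_int('42'))  # 42
print(to_int('⒍'))   # 6
print(to_int('²³'))  # 23 (superscript two, superscript three)
```

Characters that are numeric but have no digit value, such as '½', still raise ValueError here, so the caller keeps the try/except.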
Use unicodedata. print(unicodedata.digit('⒍')) outputs 6. – blhsing Jul 3 at 4:40
@blhsing You should add that comment to the answer. But also, the best way to test if a string can be passed to int is to just pass it to int in a try: block. – abarnert Jul 3 at 4:42
@abarnert Indeed. I've edited the answer as suggested. Thanks. – blhsing Jul 3 at 4:46
The best way to find out if a string can be converted to int is to just try it:
s = '⒍'
try:
    num = int(s)
except ValueError:
    pass  # handle it
Sure, you can try to figure out the right way to test the string in advance, but why? If the rule you want is "whatever int accepts", just use int.
If you want to convert something that is a digit, but isn't a decimal, use the unicodedata module:
import unicodedata

s = '⒍'
num = unicodedata.digit(s)    # 6
num = unicodedata.numeric(s)  # 6.0
num = unicodedata.decimal(s)  # raises ValueError: not a decimal
The DIGIT SIX FULL STOP character's entry in the database has Digit and Numeric values, despite being a Number, Other rather than a Number, Decimal Digit (and therefore not being compatible with int).
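To inspect a character's database entry without catching exceptions, each of the three lookup functions accepts an optional default that is returned instead of raising. The sketch below uses a hypothetical helper named describe to collect them alongside the general category.

```python
import unicodedata

def describe(ch):
    """Hypothetical helper: (category, decimal, digit, numeric), with None for missing values."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

print(unicodedata.name('⒍'))  # DIGIT SIX FULL STOP
print(describe('6'))   # ('Nd', 6, 6, 6.0)
print(describe('⒍'))   # ('No', None, 6, 6.0)
print(describe('½'))   # ('No', None, None, 0.5)
```

A None in the decimal slot is what makes int() reject the character.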
I don't know how much luck you'll have, but unicodedata may handle some cases (Python 3 code):
>>> import unicodedata
>>> unicodedata.normalize('NFKC', '⒍')
'6.'
Slightly better. As to testing, if you want an int you could just int() it and catch the exception.
This works, because ⒍ (DIGIT SIX FULL STOP) decomposes into 6 (DIGIT SIX) and . (FULL STOP), which can somewhat coincidentally be interpreted as a float, but it's not a general solution for all numeric/digit characters that aren't decimals. – abarnert Jul 3 at 4:49
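A rough sketch of that normalize-then-parse approach, including a case where it fails, might look like the following. The function name normalize_then_parse and the int-then-float fallback order are assumptions for illustration.

```python
import unicodedata

def normalize_then_parse(s):
    """Hypothetical helper: NFKC-normalize, then try int() and fall back to float()."""
    folded = unicodedata.normalize('NFKC', s)
    try:
        return int(folded)
    except ValueError:
        return float(folded)  # may still raise ValueError

print(normalize_then_parse('⒍'))  # 6.0, because '⒍' folds to '6.'
print(normalize_then_parse('①'))  # 1, because '①' folds to '1'

try:
    normalize_then_parse('½')     # folds to '1⁄2', which neither int nor float accepts
except ValueError as exc:
    print('not parseable:', exc)
```

The '½' case shows why this is not a general solution: the compatibility decomposition is designed for text folding, not for producing parseable numbers.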
This helps a lot, but any idea how to convert ⒍ to an integer or even a float? – David Jurgens Jul 3 at 4:35