How to convert unusual unicode string with number to integer in python





I have some fairly hairy unicode strings with numbers in them whose values I'd like to test. Normally I'd just use str.isnumeric to check whether a string can be converted via int(), but I'm encountering cases where isnumeric returns True and int() still raises an exception.





Here's an example program:


>>> s = '⒍'
>>> s.isnumeric()
True
>>> int(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '⒍'



Unicode is always full of surprises, so I'm happy to just be robust to this case and use a try/except block to catch unusual numbers. However, I'd be happier if I could still convert them to integers. Is there a consistent way to do this?




3 Answers



If you want to test if a string can be passed to int, use str.isdecimal. Both str.isnumeric and str.isdigit include decimal-like characters that aren't compatible with int.
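To make the difference concrete, here is a small sketch that compares the three predicates against what int() actually accepts. The sample characters and the int_accepts helper are my own choices for illustration, not something from the standard library.

```python
# Sample characters (chosen for illustration):
#   '6'       ASCII digit
#   '\u0666'  ARABIC-INDIC DIGIT SIX (a decimal digit)
#   '²'       SUPERSCRIPT TWO (a digit, but not a decimal)
#   '⒍'       DIGIT SIX FULL STOP (a digit, but not a decimal)
#   '½'       VULGAR FRACTION ONE HALF (numeric only)
samples = ['6', '\u0666', '²', '⒍', '½']


def int_accepts(s):
    """Hypothetical helper: True if int(s) succeeds, False on ValueError."""
    try:
        int(s)
        return True
    except ValueError:
        return False


# Map each sample to (isdecimal, isdigit, isnumeric, int_accepts).
results = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric(), int_accepts(s))
    for s in samples
}

for s, flags in results.items():
    print(repr(s), flags)
```

For these samples, only the isdecimal column lines up with what int() accepts.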





And as @abarnert has mentioned in the comments, the most reliable way to test if a string can be passed to int is to simply do it in a try block.





On the other hand, '⒍' can be converted to an actual digit with the help of the unicodedata module, e.g.




import unicodedata
print(unicodedata.digit('⒍'))



would output 6.
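unicodedata.digit only takes a single character, so for a longer string you would have to combine the per-character values yourself. A minimal sketch, assuming you want to read the digits positionally in base 10 (digits_to_int is a hypothetical helper name, and treating something like '⒍' as a plain 6 is an assumption about what you want):

```python
import unicodedata


def digits_to_int(text):
    """Hypothetical helper: build an int from each character's Unicode digit value.

    Raises ValueError if the string is empty or any character has no digit value.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # unicodedata.digit raises ValueError for characters without a digit value
        value = value * 10 + unicodedata.digit(ch)
    return value


print(digits_to_int('⒍'))
print(digits_to_int('\u0664\u0662'))  # ARABIC-INDIC DIGIT FOUR, then TWO
```

Characters that are numeric but not digits, such as '½', still raise ValueError here, so you would still want a try/except around the call.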







This helps a lot, but any idea how to convert to an integer or even a float?
– David Jurgens
Jul 3 at 4:35






Use unicodedata. print(unicodedata.digit('⒍')) outputs 6.
– blhsing
Jul 3 at 4:40







@blhsing You should add that comment to the answer. But also, the best way to test if a string can be passed to int is to just pass it to int in a try: block.
– abarnert
Jul 3 at 4:42







@abarnert Indeed. I've edited the answer as suggested. Thanks.
– blhsing
Jul 3 at 4:46



The best way to find out if a string can be converted to int is to just try it:




s = '⒍'
try:
    num = int(s)
except ValueError:
    # handle it (here we just fall back to None)
    num = None



Sure, you can try to figure out the right way to test the string in advance, but why? If the rule you want is "whatever int accepts", just use int.





If you want to convert something that is a digit, but isn't a decimal, use the unicodedata module:




import unicodedata

s = '⒍'
num = unicodedata.digit(s) # 6
num = unicodedata.numeric(s) # 6.0
num = unicodedata.decimal(s) # ValueError: not a decimal



The DIGIT SIX FULL STOP character's entry in the Unicode database has Digit and Numeric values even though its category is Number, Other rather than Number, Decimal Digit, which is why it isn't compatible with int.
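Putting the two ideas together, one possible approach is to try int first and only fall back to unicodedata for single characters. This is a sketch, not a general parser: to_number is a hypothetical name, and returning None for anything unrecognized is just one way to handle the failure case.

```python
import unicodedata


def to_number(s):
    """Hypothetical helper: int() first, then Unicode digit/numeric values.

    Returns an int, a float, or None if nothing applies.
    """
    try:
        return int(s)
    except ValueError:
        pass
    # unicodedata.digit and unicodedata.numeric only accept a single character
    if len(s) == 1:
        try:
            return unicodedata.digit(s)      # int, e.g. for '⒍'
        except ValueError:
            pass
        try:
            return unicodedata.numeric(s)    # float, e.g. for '½'
        except ValueError:
            pass
    return None


for sample in ['42', '⒍', '½', 'abc']:
    print(repr(sample), to_number(sample))
```

Trying digit before numeric keeps the result an int whenever the character has a digit value, and only produces a float for things like fractions.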





I don't know how much luck you'll have, but unicodedata may handle some cases (Python 3 code):


>>> import unicodedata
>>> unicodedata.normalize('NFKC', '⒍')
'6.'



Slightly better. As for testing, if you want an int you can just call int() on it and catch the exception.





This works because ⒍ (DIGIT SIX FULL STOP) decomposes into 6 (DIGIT SIX) and . (FULL STOP), which can somewhat coincidentally be interpreted as a float, but it's not a general solution for all numeric/digit characters that aren't decimals.
– abarnert
Jul 3 at 4:49
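A quick way to see the limitation is to run the same normalization on a character whose decomposition is not a valid float literal. The sketch below uses '½' as the counterexample; nfkc_float is a hypothetical helper written only for this comparison.

```python
import unicodedata


def nfkc_float(ch):
    """Hypothetical helper: NFKC-normalize, then try float() on the result.

    Returns (normalized_text, parsed_float_or_None).
    """
    folded = unicodedata.normalize('NFKC', ch)
    try:
        return folded, float(folded)
    except ValueError:
        return folded, None


for ch in ['⒍', '½']:
    folded, parsed = nfkc_float(ch)
    # unicodedata.numeric reads the value straight from the Unicode database
    print(repr(ch), repr(folded), parsed, unicodedata.numeric(ch))
```

'⒍' becomes '6.', which float() happens to accept, while '½' becomes '1', FRACTION SLASH (U+2044), '2', which float() rejects even though unicodedata.numeric('½') gives 0.5.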








