Chapter 7, Automate the boring stuff with Python, practice project: regex version of strip()
I am reading the book "Automate the Boring Stuff with Python". In Chapter 7, for the practice project "regex version of strip()", here is my code (I use Python 3.x):
def stripRegex(x, string):
    import re
    if x == '':
        spaceLeft = re.compile(r'^\s+')
        stringLeft = spaceLeft.sub('', string)
        spaceRight = re.compile(r'\s+$')
        stringRight = spaceRight.sub('', string)
        stringBoth = spaceRight.sub('', stringLeft)
        print(stringLeft)
        print(stringRight)
    else:
        charLeft = re.compile(r'^(%s)+' % x)
        stringLeft = charLeft.sub('', string)
        charRight = re.compile(r'(%s)+$' % x)
        stringBoth = charRight.sub('', stringLeft)
    print(stringBoth)
x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = ' Hello world!!! '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'
stripRegex(x1,string1)
stripRegex(x2,string2)
stripRegex(x3,string2)
And here is the output:
Hello world!!!
Hello world!!!
Hello world!!!
BaconSpamEggs
SpamSpamBaconSpamEggsSpamSpam
So my regex version of strip() nearly works like the original. In the original version, the output is always "BaconSpamEggs" no matter whether you pass in 'Spam', 'pSam', 'mapS', 'Smpa', ... So how do I fix this in the regex version?
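The built-in str.strip() treats its argument as a set of characters rather than as a substring, so the order of the letters does not matter. The question's pattern (%s)+ is a group, which only matches that exact sequence of letters, so 'pSam' never matches. A short comparison using only str.strip() and re.sub(); spam is the example string from the question:

```python
import re

spam = 'SpamSpamBaconSpamEggsSpamSpam'

# str.strip() removes any leading/trailing characters found in its argument,
# so every permutation of 'Spam' gives the same result.
for chars in ('Spam', 'pSam', 'mapS', 'Smpa'):
    print(chars, spam.strip(chars))  # ... BaconSpamEggs

# A group like (pSam)+ only matches that literal sequence, so nothing is removed.
print(re.sub(r'^(pSam)+', '', spam))  # SpamSpamBaconSpamEggsSpamSpam
# A character class [pSam]+ matches any run of those letters, in any order.
print(re.sub(r'^[pSam]+', '', spam))  # BaconSpamEggsSpamSpam
```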
What is "the original version"?
– cricket_007
Jan 22 '16 at 19:16
"The original version" is the built-in strip() string method explained in Chapter 6 of this book. Example: with spam = 'SpamSpamBaconSpamEggsSpamSpam', whether you type spam.strip('Spam'), spam.strip('Smap') or spam.strip('pSam'), the output is always: BaconSpamEggs
– Dales Vu
Jan 22 '16 at 19:40
8 Answers
You could check for multiple characters in the regex like this:
import re
charLeft = re.compile(r'^([%s]+)' % 'abc')
print(charLeft.sub('', "aaabcfdsfsabca"))
# fdsfsabca
Or even better, do it in a single regex:
import re
def strip_custom(text, x=" "):
    return re.search(' *[{s}]*(.*?)[{s}]* *$'.format(s=x), text).group(1)
print(strip_custom(' aaabtestbcaa ', 'abc'))
# test
Is that supposed to remove all a, b, and c's? Why are a,b,c still in the output?
– cricket_007
Jan 22 '16 at 19:28
Only the ones on the left are removed; a similar approach can be done on the right.
– rtemperv
Jan 22 '16 at 19:30
Gotcha - forgot strip only does the ends of the string
– cricket_007
Jan 22 '16 at 19:33
Thank you @rtemperv, it turns out to be very simple!
– Dales Vu
Jan 22 '16 at 19:52
import re
def regexStrip(x, y=''):
    if y != '':
        yJoin = r'[' + y + ']*([^' + y + '].*[^' + y + '])[' + y + ']*'
        cRegex = re.compile(yJoin, re.DOTALL)
        return cRegex.sub(r'\1', x)
    else:
        sRegex = re.compile(r'\s*([^\s].*[^\s])\s*', re.DOTALL)
        return sRegex.sub(r'\1', x)
text = ' spmaHellow worldspam'
print(regexStrip(text, 'spma'))
I switched the arguments, but from my quick testing this seems to work. I gave it an optional argument which defaults to None.
def stripRegex(s, toStrip=None):
    import re
    if toStrip is None:
        toStrip = r'\s'
    return re.sub(r'^[{0}]+|[{0}]+$'.format(toStrip), '', s)
x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = ' Hello world!!! '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'
print(stripRegex(string1)) # 'Hello world!!!'
print(stripRegex(string1, x1)) # ' Hello world!!! '
print(stripRegex(string2, x2)) # 'BaconSpamEggs'
print(stripRegex(string2, x3)) # 'BaconSpamEggs'
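One caveat with building the character class by string formatting: characters that are special inside [...], such as ], ^, - and \, change the meaning of the pattern (for example, toStrip = 'a-z' would strip every run of lowercase letters). A minimal sketch of the same idea that passes the characters through re.escape() first. The function name strip_chars is hypothetical; treating an empty string as "strip nothing" mirrors what str.strip('') does, and \Z is used instead of $ because $ can also match just before a trailing newline:

```python
import re

def strip_chars(s, chars=None):
    """Regex sketch of str.strip(); chars=None means strip whitespace."""
    if chars is None:
        cls = r'\s'
    elif chars == '':
        return s  # str.strip('') removes nothing
    else:
        cls = re.escape(chars)  # escape ], ^, - and \ so they stay literal in the class
    # \Z (not $) so a trailing newline is never silently skipped over
    return re.sub(r'^[{0}]+|[{0}]+\Z'.format(cls), '', s)

print(strip_chars('  Hello world!!!  '))                    # Hello world!!!
print(strip_chars('--]]text]]--', '-]'))                    # text
print(strip_chars('SpamSpamBaconSpamEggsSpamSpam', 'pSam'))  # BaconSpamEggs
```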
@DalesVu - Wooh, what are you doing?
– cricket_007
Feb 10 '16 at 3:17
I have written two different versions of the same function:
1st way:
import re
def stripfn(string, c):
    if c != '':
        Regex = re.compile(r'^[' + c + ']*|[' + c + ']*$')
        strippedString = Regex.sub('', string)
        print(strippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)
2nd way:
import re
def stripfn(string, c):
    if c != '':
        startRegex = re.compile(r'^[' + c + ']*')
        endRegex = re.compile(r'[' + c + ']*$')
        startstrippedString = startRegex.sub('', string)
        endstrippedString = endRegex.sub('', startstrippedString)
        print(endstrippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)
This seems to work:
def stripp(text, leftright=None):
    import re
    if leftright is None:
        stripRegex = re.compile(r'^\s*|\s*$')
        text = stripRegex.sub('', text)
        print(text)
    else:
        stripRegex = re.compile(r'^.|.$')  # first and last character
        margins = stripRegex.findall(text)
        # "margins and" guards against an empty string, where findall() returns []
        while margins and margins[0] in leftright:
            text = text[1:]
            margins = stripRegex.findall(text)
        while margins and margins[-1] in leftright:
            text = text[:-1]  # drop exactly one trailing character
            margins = stripRegex.findall(text)
        print(text)
mo = ' @@@@@@ '
mow = '@&&@#$texttexttext&&^&&&&%%'
bla = '@&#$^%+'
stripp(mo)
stripp(mow, bla)
Here is my version:
#!/usr/bin/env python3
import re
def strippp(txt, arg=''):  # assigning a default value to arg prevents the error if no argument is passed when calling strippp()
    if arg == '':
        regex1 = re.compile(r'^(\s+)')
        mo = regex1.sub('', txt)
        regex2 = re.compile(r'(\s+)$')
        mo = regex2.sub('', mo)
        print(mo)
    else:
        # Strip only leading/trailing characters from arg, not every occurrence of arg
        chars = re.escape(arg)
        regex1 = re.compile(r'^[{0}]+|[{0}]+$'.format(chars))
        mo = regex1.sub('', txt)
        print(mo)
text = ' So, you can create the illusion of smooth motion '
strippp(text, 'e')
strippp(text)
The solution by @rtemperv misses the case where a string starts or ends with whitespace characters that are not among the characters to remove.
For example:
>>> var=" foobar"
>>> var.strip('raf')
' foob'
Hence the regex should be a bit different:
import re
def strip_custom(text, x=" "):
    return re.search('^[{s}]*(.*?)[{s}]*$'.format(s=x), text).group(1)
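Whichever pattern you settle on, comparing it against the built-in method on a few inputs, including the leading-whitespace case above, is a cheap sanity check. A sketch under these assumptions: regex_strip is a hypothetical helper built on the anchored pattern above, re.escape() is added so special characters are safe, re.DOTALL lets the middle group span newlines, \Z replaces $ so a trailing newline is not skipped, and chars must be non-empty (with an empty string the brackets no longer form the intended class):

```python
import re

def regex_strip(text, chars):
    """Hypothetical helper: anchored lazy-group version of str.strip(chars)."""
    s = re.escape(chars)
    pattern = r'^[{s}]*(.*?)[{s}]*\Z'.format(s=s)
    # The pattern can always match at position 0, so search() never returns None.
    return re.search(pattern, text, re.DOTALL).group(1)

cases = [
    (' foobar', 'raf'),
    ('SpamSpamBaconSpamEggsSpamSpam', 'pSam'),
    ('xxx', 'x'),
    ('line1\nline2--', '-'),
    ('ab\n', 'x'),
]
for text, chars in cases:
    ok = regex_strip(text, chars) == text.strip(chars)
    print(repr(text), repr(chars), 'OK' if ok else 'MISMATCH')
```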
See the code below
import re
check = '1'
while check == '1':
    string = input('Enter the string: ')
    strToStrip = input('Enter the string to strip: ')
    if strToStrip == '':  # If the string to strip is empty
        exp = re.compile(r'^[\s]*')  # Looks for all kinds of spaces at the beginning until anything other than that is found
        string = exp.sub('', string)  # Replaces that with an empty string
        exp = re.compile(r'[\s]*$')  # Looks for all kinds of spaces at the end until anything other than that is found
        string = exp.sub('', string)  # Replaces that with an empty string
        print('Your stripped string is \'', end='')
        print(string, end='')
        print('\'')
    else:
        exp = re.compile(r'^[%s]*' % strToStrip)  # Finds all instances of the characters in strToStrip at the beginning until anything other than that is found
        string = exp.sub('', string)  # Replaces it with an empty string
        exp = re.compile(r'[%s]*$' % strToStrip)  # Finds all instances of the characters in strToStrip at the end until anything other than that is found
        string = exp.sub('', string)  # Replaces it with an empty string
        print('Your stripped string is \'', end='')
        print(string, end='')
        print('\'')
    print('Do you want to continue (1): ', end='')
    check = input()
Explanation:
The character class [] is used to check for individual instances of the characters in the string.
The ^ and $ anchors check whether the characters to strip are at the beginning or at the end of the string.
If found, they are replaced by an empty string with sub().
The * quantifier matches as many of the characters to strip as possible, until anything other than them is found: it matches zero times if there is no instance, or as many instances as are found.
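The role of each piece of the explanation can be checked with a few one-line experiments using re.sub(); the sample string is arbitrary:

```python
import re

s = 'abcXYZcba'
# ^ pins the match to the start: only the leading run of a/b/c is removed.
print(re.sub(r'^[abc]*', '', s))      # XYZcba
# $ pins the match to the end: only the trailing run is removed.
print(re.sub(r'[abc]*$', '', s))      # abcXYZ
# * can also match zero characters, so a string with nothing to strip is unchanged.
print(re.sub(r'^[abc]*', '', 'XYZ'))  # XYZ
```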
Well there is no mystery about regex. So the problem you're having is you've lost control of what the code flow does.
– sln
Jan 22 '16 at 19:16