Solving String Challenges in Python

Wed 25 March 2020

Filed under Python

Tags python

Python String Minatures

I was reading a blog post at http://beyondloom.com/blog/strings.html, where a number of string manipulation challenges were posted. They were supposed to be solved by APL style one-liners in K, but I deciced to redo them in Python


Define the Environment

In [2]:
from collections import Counter, defaultdict
from typing import List


from itertools import permutations
from itertools import islice
from itertools import compress
from itertools import cycle

Given a string x and a character y, how many times does y occur in x?

In [3]:
def str_count(string: str)->Counter:

    return Counter(string)
#end str_count


c = str_count('Australia')
c['A'], c['a'], c['b']
Out[3]:
(1, 2, 0)

Solved by using the Counter object from the collections package


Given a string x, is it identical when read forwards and backwards?

In [4]:
def is_palindrome(string:str)->bool:
    return string == ''.join(reversed(string))
#end is_palindrome

is_palindrome('racecar'), is_palindrome('palindrame')
Out[4]:
(True, False)

In fact the concept of reversing a string gets very complicated in dealing with general Unicode strings (some languages have different shapes for the same letter at the end, middle, or front of words, for example)


Given a string x, produce a list of characters which appear more than once in x.

In [5]:
def non_uniques2(string:str):
    cnt = Counter(string)
    res = [k for k in cnt if cnt[k]>1]
    return res

#end non_unique

non_uniques2('applause')
Out[5]:
['a', 'p']

Create a Counter dictionary, and iterate over all keys


Given strings x and y, do both strings contain the same letters, possibly in a different order?

In [6]:
def same_chars(str1:str, str2:str)->bool:
    return sorted(str1)==sorted(str2)
#end same_chars

same_chars('tab', 'bat'), same_chars('cat', 'dog')
    
Out[6]:
(True, False)

Given a string x, find all the characters which occur exactly once, in the order they appear.

In [7]:
def uniques(string:str)->List[str]:
    cnt = Counter(string)
    single =  [s for s in string if cnt[s] == 1]
    return single
#end non_unique

uniques('applause')
Out[7]:
['l', 'u', 's', 'e']

Given strings x and y, is x a rotation of the characters in y?

In [8]:
def is_rotation(str1:str, str2:str)->bool:
    
    if( len(str1) != len(str2)):
        return False
    elif( str1 == str2):
        return True
    elif( sorted(str1) != sorted(str2)):
        return False
    else:
        for i in range(1, len(str1)):
            if str1 ==str2[-i:] + str2[:-i] :
                return True
            #end if
        #end for
        return False
    #end if
#end is_rotation

str1 = 'foobar'
str2 = 'barfoo'
is_rotation('foobar', 'barfoo'),is_rotation('cat', 'bat')
Out[8]:
(True, False)

I use some cheap heuristic checks to short-circuit the test loop for strings that are certain to fail


Given a list of strings x, sort the strings by length, ascending.

In [9]:
def sort_by_len(str_list:List[str])->List[str]:
    
    return sorted(str_list, key=len)
#end sort_by_len

sort_by_len(['a', 'bb', 'ccccccc', 'xxx', ''])
Out[9]:
['', 'a', 'bb', 'xxx', 'ccccccc']

Given a string x, identify the character which occurs most frequently

In [10]:
def most_pop_char(string:str)->str:

    return Counter(string).most_common(1)[0][0]
#end most_pop_char

most_pop_char('a bb ccc dddd xxxx, yyyyyyy')
Out[10]:
'y'

Given a string x consisting of words (one or more non-space characters) which are separated by spaces, reverse the order of the characters in each word.

In [11]:
def reverse_words(string:str)->str:
    words = string.split()

    rev_words = [''.join(reversed(w)) for w in words]
 
    rev_string =' '.join(rev_words)
    return rev_string                     
    
#end reverse_words

reverse_words('this is a very short sentance Hurray'), reverse_words('zoop')
Out[11]:
('siht si a yrev trohs ecnatnes yarruH', 'pooz')

Given a string x and a boolean vector y of the same length, extract the characters of x corresponding to a 1 in y.

In [12]:
def extract(string:str, mask:List[bool])->str:
    
    res =  [c for c, m  in zip(string, mask) if m]
    return ''.join(res)
#end extract

print(extract('abcdef', [1,1,0,0,1,0]))

print(''.join(list(compress('abcdef', [1,1,0,0,1,0]))))
abe
abe

Python Collections has method to do just this, as shown above


Given a string x and a boolean vector y, spread the characters of x to the positions of 1s in y, filling intervening characters with underscores

In [13]:
def spread2(string:str, mask:List[bool])->str:
    
    chars = islice(string, None)
    res = [ next(chars) if m else '_' for m in mask]
    return ''.join(res)
#end spread2

spread2('abc', [1,0,0,1,0,0,1]), spread2('abcdef', [1,0,1,0,0,1,1, 0,0,0,0,1,0,1]), 
Out[13]:
('a__b__c', 'a_b__cd____e_f')
In [14]:
def spread3(string:str, mask:List[bool])->str:
    
    chars = cycle(string)
    res = [ next(chars) if m else '_' for m in mask]
    return ''.join(res)
#end spread3

spread3('abc', [1,0,1,0,0, 1,1,1,1,1,0]),
Out[14]:
('a_b__cabca_',)

We have two solutions, the second one catering for the case where the input string is shorter than the mask. We create iterators from the input string, to allow easier access from inside the list comprehension


Given a string x, remove all the vowels entirely.

In [15]:
def de_vowel(string:str)->str:
    res = [c for c in string if not c in 'aeiou']
    return ''.join(res)
#end de_vowel

de_vowel('cat'), de_vowel('A very short sentence')
Out[15]:
('ct', 'A vry shrt sntnc')

Given a string x, replace all the vowels (a, e, i, o, u, or y) with underscores.

In [16]:
def de_vowel2(string:str)->str:
    res = [c if not c in 'aeiouy' else '_' for c in string]
    return ''.join(res)
#end de_vowel2

de_vowel2('cat'), de_vowel2('This is a short sentence')
Out[16]:
('c_t', 'Th_s _s _ sh_rt s_nt_nc_')

Given a string x consisting of words separated by spaces (as above), and a string y, replace all words in x which are the same as y with a series of xes.

In [17]:
def hide_words(string:str, secret:str)->str:
    words = string.split()

    new_words = [w if not w==secret else '*'*len(w) for w in words]
 
    new_string =' '.join(new_words)
    return new_string                     
    
#end hide_words

hide_words('this is a very short sentance, this is  Hurray', 'this'), hide_words('zoop', 'cat')
Out[17]:
('**** is a very short sentance, **** is Hurray', 'zoop')

Given a string x, generate a list of all possible reorderings of the characters in x

In [18]:
def all_perms(word:str)->List[str]:
    ps = permutations(word)
    return [''.join(p) for p in ps]
#end all_perms

all_perms('xyz')
Out[18]:
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

Conclusions

Most of the above could be compressed even more, with the loss of some readibility. I would claim that most are very close to one-liners


Comments


net-analysis.com Data Analysis Blog © Don Cameron Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More