Python Find & Count Certain Word within (Strings) List Items

Hello dear Programmers,
I want to find certain words within List Items. My Input looks like this:
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen', ...]
I want to find and count \tNN\t or \tADJ\t or \tVFIN\t.
The position of the words that I want to count is always the same, as you can see in the example.
I tried the following code, but I get this error: ValueError: too many values to unpack (expected 3)
from collections import Counter
myInputList = Counter([b for a,b,c in myInputList])
print(myInputList)
Actually, I can see why this code is not working, but I don't have another approach.
So my goal is to count the part-of-speech tags that sit between the tabs (\t).
At the end I want to be able to say: There are 5 NN, 4 ADJA...
4 Answers
Including the case when '\t' is not present:
from collections import Counter
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen', 'xyz']
Counter([x.split('\t')[1] for x in myInputList if '\t' in x])
Convert to a dictionary:
from collections import Counter
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen', 'xyz']
d = dict(Counter([x.split('\t')[1] for x in myInputList if '\t' in x]))
print(d['NN'])
Output: 1
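A slightly more defensive variant of the same idea, as a sketch only. It assumes the separators are real tab characters (`\t`) and that the tag is always the second field. The helper name `count_tags` is made up for illustration.

```python
from collections import Counter

def count_tags(items, sep="\t"):
    """Count the second sep-separated field of each item (hypothetical helper)."""
    tags = []
    for item in items:
        fields = item.split(sep)
        # Skip items that have no second field, e.g. 'xyz'.
        if len(fields) > 1:
            tags.append(fields[1])
    return Counter(tags)

my_input_list = ["Hauses\tNN\tHaus", "guten\tADJ\tgut", "geht\tVFIN\tgehen", "xyz"]
tag_counts = count_tags(my_input_list)

print(tag_counts["NN"])    # 1
print(tag_counts["ADJA"])  # 0: a Counter returns 0 for a missing key
```

Indexing the Counter directly gives 0 for a tag that never occurs, whereas the plain dict version would raise a KeyError for it.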
Thank you! This works! :-) One additional question: now I would like to count only the NN tags and get just the number as output, nothing else, e.g. 1
– AnnaLise
Jul 2 at 16:24
Counter has a structure similar to that of a dictionary. Updated my code. Hope it helps.
– mad_
Jul 2 at 16:34
Thanks! This worked also! Great :-)
– AnnaLise
Jul 2 at 16:59
from collections import Counter
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen']
newList = []
for i in myInputList:
    # extend adds every tab-separated field to one flat list
    newList.extend(i.split("\t"))
Counter(newList)
gives
{'ADJ': 1,
'Haus': 1,
'Hauses': 1,
'NN': 1,
'VFIN': 1,
'gehen': 1,
'geht': 1,
'gut': 1,
'guten': 1}
If you're sure that you only want the second element of each split (index 1), then you can simply do
from collections import Counter
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen']
newList = []
for i in myInputList:
    # append keeps one list of fields per input string
    newList.append(i.split("\t"))
onlySecond = [i[1] for i in newList]
dict(Counter(onlySecond))
will give you
{'ADJ': 1, 'NN': 1, 'VFIN': 1}
You can use collections.defaultdict. If there is a possibility of more than one value occurring in a list item, then you can remove break, which otherwise stops at the first match for a particular string.
from collections import defaultdict
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen']
values = ['\tNN\t', '\tADJ\t', '\tVFIN\t']
d = defaultdict(int)
for item in myInputList:
    for v in values:
        if v in item:
            d[v] += 1
            break  # stop at the first matching value for this item
print(d)
defaultdict(int, {'\tADJ\t': 1, '\tNN\t': 1, '\tVFIN\t': 1})
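To make the effect of the break concrete, here is a small sketch. The function `count_matches`, its `first_only` flag, and the second sample item (which contains two tags) are invented for illustration, and the separators are assumed to be tab characters.

```python
from collections import defaultdict

def count_matches(items, values, first_only=True):
    """Count how many items contain each value (hypothetical helper)."""
    counts = defaultdict(int)
    for item in items:
        for v in values:
            if v in item:
                counts[v] += 1
                if first_only:
                    break  # ignore any further values in this item
    return dict(counts)

values = ["\tNN\t", "\tADJ\t", "\tVFIN\t"]
items = ["Hauses\tNN\tHaus", "x\tNN\ty\tADJ\tz"]  # second item is made up

print(count_matches(items, values))                    # {'\tNN\t': 2}
print(count_matches(items, values, first_only=False))  # {'\tNN\t': 2, '\tADJ\t': 1}
```

With the break in place, the ADJ tag in the second item is never counted because NN matched first.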
This should do it:
a, b, c = ('\tNN\t', '\tADJ\t', '\tVFIN\t')
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen']
# count the items that contain at least one of the three tags
print(len([i for i in myInputList if any(j in i for j in [a, b, c])]))
# 3
Thank you. Your code is working. But I tried to modify it a little bit, for example to count only NN. Can you see where my mistake is? I did it like this:
a = ('\tNN\t')
myInputList = ['Hauses\tNN\tHaus', 'guten\tADJ\tgut', 'geht\tVFIN\tgehen']
print(len([i for i in myInputList if (j in i for j in a)]))
# I get 3 as result
# but I should get 1 as result
– AnnaLise
Jul 2 at 16:01
@AnnaLise As I suggested in the edit, use print(len([i for i in myInputList if any(j in i for j in [a])])) because ... for j in a splits a into ['\t', 'N', 'N', '\t']
– zipa
Jul 2 at 16:08
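The two pitfalls behind the unexpected 3 can be checked directly. This is only a sketch, assuming tab-separated input: dropping `any` leaves a bare generator in the condition, and a generator object is always truthy; separately, iterating over a string yields its characters.

```python
my_input_list = ["Hauses\tNN\tHaus", "guten\tADJ\tgut", "geht\tVFIN\tgehen"]
a = "\tNN\t"  # parentheses alone do not make a tuple: ('\tNN\t') is still a str

# Pitfall 1: a generator object is always truthy, so nothing is filtered out.
unfiltered = [i for i in my_input_list if (j in i for j in a)]
print(len(unfiltered))  # 3

# Pitfall 2: iterating over a string yields its individual characters.
print(list(a))  # ['\t', 'N', 'N', '\t']

# For a single tag, a plain substring test is enough.
nn_count = sum(a in item for item in my_input_list)
print(nn_count)  # 1
```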
split the strings on \t. Counter(s.split('\t')[1] for s in myInputList)
– Patrick Haugh
Jul 2 at 15:46
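For context, a short sketch of why the code in the question raised the ValueError and why splitting first avoids it. It assumes tab-separated items; the names `word`, `tag` and `lemma` are guesses at what the three fields mean.

```python
s = "Hauses\tNN\tHaus"

# Unpacking a str iterates over its characters, so a 14-character
# string cannot be unpacked into three names.
try:
    a, b, c = s
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Splitting on the tab first yields exactly three fields.
word, tag, lemma = s.split("\t")
print(tag)  # NN
```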