How to check a Python Set for a string




Hopefully this doesn't get downvoted for the title, but I couldn't think of a better way of explaining the issue.



Based on suggestions I've seen on Stack, I am using sets to ignore duplicate lines, which works a treat until I hit a use case where the line changes slightly but I still want to filter it out as a duplicate. I can't seem to search the set for a certain keyword. In the example below, I want to exclude any new line whose first column, which is the ID, already exists; in this case the ID is London.



For example.


London,Sold,2021-12-07,1000000,301909
London,Sold,2021-12-07,1000000,999999



So I wanted to know whether it is possible to just check if the ID London exists in my set before adding or ignoring the line, but I can't find any way to do this. I tried tuples, but I'm not sure that is my solution, and I cannot create a set from a list. My simple test case is below; the result I want is a set containing only testline, without testline2 being added as happens now.


testline = 'London,Sold,2021-12-07,1000000,301909'
id = 'London'

j = "testline2"
seen = set()
seen.add(testline)

if id not in seen:
    seen.add(j)

print(seen)





If you find such a near-duplicate, would you want to keep the first or the second encountered item?
– tobias_k
Jul 3 at 8:10





keep the first occurrence
– emmon simbo
Jul 3 at 8:15




4 Answers



It seems like you want a dict, where the key is the first value, rather than a set.


seen = {}
id = testline.partition(',')[0]
seen[id] = testline
...
if id not in seen:
    ...
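To make the dict idea concrete, here is a minimal sketch that keeps the first line seen for each ID. The `lines` list, the extra Paris row, and the names `key` and `unique_lines` are made up for illustration; `key` is used instead of `id` so the built-in is not shadowed. It relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
lines = [
    'London,Sold,2021-12-07,1000000,301909',
    'London,Sold,2021-12-07,1000000,999999',
    'Paris,Sold,2021-12-08,750000,123456',  # hypothetical extra row
]

seen = {}  # maps ID (first column) -> first line seen with that ID
for line in lines:
    key = line.partition(',')[0]  # text before the first comma
    if key not in seen:
        seen[key] = line  # later lines with the same ID are ignored

unique_lines = list(seen.values())
print(unique_lines)
```

Only the first London line and the Paris line end up in `unique_lines`. If you wanted to keep the last occurrence instead, assigning `seen[key] = line` unconditionally would do that.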





Can you please change the name id? It's not good practice to shadow built-ins.
– jpp
Jul 3 at 8:14







Sure, will do for future posts, was just using that as an example but will remember for best practices. thanks
– emmon simbo
Jul 3 at 8:21


testline = 'London,Sold,2021-12-07,1000000,301909'
id = 'London'

j = "testline2"
seen = set()
# Add each comma-separated field to the set so the ID can be matched on its own
for element in testline.split(','):
    seen.add(element)
if id not in seen:
    seen.add(j)

print(seen)



You're close to the solution:


testline = ['London', 'Sold', '2021-12-07', '1000000', '301909']  # updated
id = 'London'

j = "testline2"
seen = set(testline)  # updated
if id not in seen:
    seen.add(j)

print(seen)





What's different than in OP's code? I don't think the indentation was the problem.
– tobias_k
Jul 3 at 8:13





Ah, you split testline into a list. You should mention that in your answer. However, unless you tell OP how to do this programmatically, I don't think this in itself will help much.
– tobias_k
Jul 3 at 8:15







Thanks @tobias_k, I shall consider your suggestion in my future answer.
– Taohidul Islam
Jul 3 at 8:17



You need to extract the first field of your comma-separated string and add that to your set. For this, you can use either x.split(',', 1)[0] or x.partition(',')[0].
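As a rough sketch of that idea applied to the two lines from the question: the helper `first_field` and the names `newline`, `seen_ids` and `kept` are my own, not from the question.

```python
testline = 'London,Sold,2021-12-07,1000000,301909'
newline = 'London,Sold,2021-12-07,1000000,999999'


def first_field(line):
    """Return the text before the first comma (the whole line if there is none)."""
    return line.partition(',')[0]


seen_ids = set()  # holds only the IDs, not the full lines
kept = []         # lines whose ID had not been seen before

for line in (testline, newline):
    line_id = first_field(line)
    if line_id not in seen_ids:
        seen_ids.add(line_id)
        kept.append(line)

print(kept)
```

The membership test works here because the set stores the IDs themselves. A set lookup is an exact match on whole elements, so 'London' is never found in a set that holds full lines.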





The logic is identical to the commonly used itertools recipe unique_everseen, available in the official docs.
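For reference, a simplified sketch of what that recipe does is below. This is a paraphrase for illustration, not the verbatim code from the docs (the documented version has a separate fast path for the case where no key is given).

```python
def unique_everseen(iterable, key=None):
    """Yield items in order, skipping any whose key has already been seen."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


rows = [
    'London,Sold,2021-12-07,1000000,301909',
    'London,Sold,2021-12-07,1000000,999999',
]

# Use the first comma-separated field as the uniqueness key
result = list(unique_everseen(rows, key=lambda x: x.partition(',')[0]))
print(result)
```

Because it is a generator, it can be fed a file object directly and will filter lines lazily without loading the whole file first.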





The recipe has been re-implemented as toolz.unique in the third-party toolz library, which can save you reinventing the wheel:




from toolz import unique

L = ['London,Sold,2021-12-07,1000000,301909',
     'London,Sold,2021-12-07,1000000,999999']

res = list(unique(L, key=lambda x: x.partition(',')[0]))

print(res)
# ['London,Sold,2021-12-07,1000000,301909']





