Pandas: how to select a susbset of a dataframe with multiple conditions


Pandas: how to select a susbset of a dataframe with multiple conditions



I have a dataframe like the following:


df = pd.DataFrame({'COND1' : [0,4,4,4,0],
'NAME' : ['one', 'one', 'two', 'three', 'three'],
'COND2' : ['a', 'b', 'a', 'a','b'],
'value': [30, 45, 18, 23, 77]})



Where we have two conditions: [0,4] and ['a','b']


[0,4]


['a','b']


df
COND1 COND2 NAME value
0 0 a one 30
1 4 a one 45
2 4 b one 25
3 4 a two 18
4 4 a three 23
5 4 b three 77



For each name I want to select the a subset with the condition COND1=0 & COND2=a if I have the information, COND1=4 & COND2=b otherwise.


COND1=0 & COND2=a


COND1=4 & COND2=b



the resulting dataframe will be:


df
COND1 COND2 NAME value
0 0 a one 30
1 NaN Nan two NaN
2 4 b three 77



I tried to do the following:


df[ ((df['COND1'] == 0 ) & (df['COND2'] == 'a') |
(df['COND1'] == 4 ) & (df['COND2'] == 'b'))]





Have you tried any approach? What error(s) did you get??
– Harv Ipan
Jul 2 at 18:31





Kindly fix your result , in three shoud it be 77 ?
– Wen
Jul 2 at 18:36




2 Answers
2



Try to modify your result by using drop_duplicates(drop the NAME satisfied both condition only keep one ) with reindex(Add back the NAME does not satisfied any condition )


drop_duplicates


reindex


Newdf=df[ ((df['COND1'] == 0 ) & (df['COND2'] == 'a') | (df['COND1'] == 4 ) & (df['COND2'] == 'b'))]
Newdf.sort_values('COND1').drop_duplicates(['NAME']).set_index('NAME').reindex(df.NAME.unique()).reset_index()
Out[378]:
NAME COND1 COND2 value
0 one 0.0 a 30.0
1 two NaN NaN NaN
2 three 4.0 b 77.0



Here's one extendable solution using a helper column. The idea is to create a dictionary mapping order, and apply this to a combination of two series. The sort and drop duplicates.


import numpy as np

df = pd.DataFrame({'COND1' : [0,4,4,4,4,4],
'NAME' : ['one', 'one', 'one', 'two', 'three', 'three'],
'COND2' : ['a', 'a', 'b', 'a', 'a','b'],
'value': [30, 45, 25, 18, 23, 77]})

# define order dictionary and apply to dataframe
order = {(0, 'a'): 0, (4, 'b'): 1}
df['order'] = df.set_index(['COND1', 'COND2']).index.map(order.get)

# if not found in dictionary, convert columns to NaN
df.loc[df['order'].isnull(), ['COND1', 'COND2', 'value']] = np.nan

# sort values, drop duplicates, drop helper column
res = df.sort_values('order').drop_duplicates(subset=['NAME']).drop('order', 1)

print(res)

COND1 NAME COND2 value
0 0.0 one a 30.0
5 4.0 three b 77.0
3 NaN two NaN NaN






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

api-platform.com Unable to generate an IRI for the item of type

How to set up datasource with Spring for HikariCP?

Display dokan vendor name on Woocommerce single product pages