Pandas: how to select a susbset of a dataframe with multiple conditions

I have a dataframe like the following:

df = pd.DataFrame({'COND1' : [0,4,4,4,0], 'NAME' : ['one', 'one', 'two', 'three', 'three'], 'COND2' : ['a', 'b', 'a', 'a','b'], 'value': [30, 45, 18, 23, 77]})

Where we have two conditions: [0,4] and ['a','b']

[0,4]

['a','b']

df COND1 COND2 NAME value 0 0 a one 30 1 4 a one 45 2 4 b one 25 3 4 a two 18 4 4 a three 23 5 4 b three 77

For each name I want to select the a subset with the condition COND1=0 & COND2=a if I have the information, COND1=4 & COND2=b otherwise.

COND1=0 & COND2=a

COND1=4 & COND2=b

the resulting dataframe will be:

df COND1 COND2 NAME value 0 0 a one 30 1 NaN Nan two NaN 2 4 b three 77

I tried to do the following:

df[ ((df['COND1'] == 0 ) & (df['COND2'] == 'a') | (df['COND1'] == 4 ) & (df['COND2'] == 'b'))]

Have you tried any approach? What error(s) did you get??
– Harv Ipan
Jul 2 at 18:31

Kindly fix your result , in three shoud it be 77 ?
– Wen
Jul 2 at 18:36

2 Answers
2

Try to modify your result by using drop_duplicates(drop the NAME satisfied both condition only keep one ) with reindex(Add back the NAME does not satisfied any condition )

drop_duplicates

reindex

Newdf=df[ ((df['COND1'] == 0 ) & (df['COND2'] == 'a') | (df['COND1'] == 4 ) & (df['COND2'] == 'b'))] Newdf.sort_values('COND1').drop_duplicates(['NAME']).set_index('NAME').reindex(df.NAME.unique()).reset_index() Out[378]: NAME COND1 COND2 value 0 one 0.0 a 30.0 1 two NaN NaN NaN 2 three 4.0 b 77.0

Here's one extendable solution using a helper column. The idea is to create a dictionary mapping order, and apply this to a combination of two series. The sort and drop duplicates.

import numpy as np df = pd.DataFrame({'COND1' : [0,4,4,4,4,4], 'NAME' : ['one', 'one', 'one', 'two', 'three', 'three'], 'COND2' : ['a', 'a', 'b', 'a', 'a','b'], 'value': [30, 45, 25, 18, 23, 77]}) # define order dictionary and apply to dataframe order = {(0, 'a'): 0, (4, 'b'): 1} df['order'] = df.set_index(['COND1', 'COND2']).index.map(order.get) # if not found in dictionary, convert columns to NaN df.loc[df['order'].isnull(), ['COND1', 'COND2', 'value']] = np.nan # sort values, drop duplicates, drop helper column res = df.sort_values('order').drop_duplicates(subset=['NAME']).drop('order', 1) print(res) COND1 NAME COND2 value 0 0.0 one a 30.0 5 4.0 three b 77.0 3 NaN two NaN NaN

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Fjhtyj