How to collapse multiple columns into one in pandas

I have a pandas DataFrame of users and categories, but the values for those categories are spread across multiple columns.

| user   | category | val1 | val2 | val3 |
| ------ | -------- | ---- | ---- | ---- |
| user 1 | c1       | 3    | NA   | None |
| user 1 | c2       | NA   | 4    | None |
| user 1 | c3       | NA   | NA   | 7    |
| user 2 | c1       | 5    | NA   | None |
| user 2 | c2       | NA   | 7    | None |
| user 2 | c3       | NA   | NA   | 2    |

I want to collapse the values into a single column:

| user   | category | value |
| ------ | -------- | ----- |
| user 1 | c1       | 3     |
| user 1 | c2       | 4     |
| user 1 | c3       | 7     |
| user 2 | c1       | 5     |
| user 2 | c2       | 7     |
| user 2 | c3       | 2     |

Ultimately, I want to get a matrix like the following:

    np.array([[3, 4, 7], [5, 7, 2]])

How do you get 2 for the final row, since all values are null there?
– jpp
Jun 29 at 14:40

Pretty sure that was a typo and the 2 in the val3 column should be dropped down.
– piRSquared
Jun 29 at 14:41

Edited it. yes it was a typo
– hedebyhedge
Jul 2 at 10:29

3 Answers

You can use pd.DataFrame.bfill to backfill values over selected columns. However, I'm not sure how you derive 2 for the final value, since no values are non-null in the final row.

    val_cols = ['val1', 'val2', 'val3']
    df['value'] = pd.to_numeric(df[val_cols].bfill(axis=1).iloc[:, 0], errors='coerce')

    print(df)

        user0 category  val1  val2  val3  value
    0  user 1       c1   3.0   NaN  None    3.0
    1  user 1       c2   NaN   4.0  None    4.0
    2  user 1       c3   NaN   NaN     7    7.0
    3  user 2       c1   5.0   NaN  None    5.0
    4  user 2       c2   NaN   7.0     2    7.0
    5  user 2       c3   NaN   NaN  None    NaN
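For reference, here is a minimal self-contained sketch of the backfill idea, followed by a pivot to the matrix the question asks for. The sample frame is an assumption: it is rebuilt from the edited question (with the 2 in the last row of val3) and uses NaN for every missing value, so the real data may differ in dtype.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the edited question (assumption: all nulls are NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Back-fill across columns: the first column then holds the first
# non-null value of each row.
df["value"] = df[val_cols].bfill(axis=1).iloc[:, 0]

# Reshape to one row per user and one column per category.
matrix = df.pivot(index="user", columns="category", values="value").to_numpy()
print(matrix)
```

A row with no non-null value at all would still come out as NaN, which is what the printed output above shows for its last row.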

bfill is a nice way to do it.
– piRSquared
Jun 29 at 14:39

Set the index to ['user', 'category'], then look up the first non-null value in each row:

    d = df.set_index(['user', 'category'])
    pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).reset_index(name='value')

         user category  value
    0  user 1       c1      3
    1  user 1       c2      4
    2  user 1       c3      7
    3  user 2       c1      5
    4  user 2       c2      7
    5  user 2       c3      2

You can skip resetting the index and unstack instead to get your final result:

    d = df.set_index(['user', 'category'])
    pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).unstack()

    category  c1  c2  c3
    user
    user 1     3   4   7
    user 2     5   7   2
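Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so the snippets above need an older pandas. A rough equivalent using NumPy integer indexing might look like the sketch below. The sample frame and the names first_valid and wide are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the edited question (assumption: all nulls are NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

d = df.set_index(["user", "category"])

# Column position of the first non-null entry in each row
# (argmax returns the position of the first True).
first_valid = d.notna().to_numpy().argmax(axis=1)

# Pick one value per row with integer indexing, standing in for d.lookup(...).
values = d.to_numpy()[np.arange(len(d)), first_valid]

# Same final step as the answer: unstack the category level into columns.
wide = pd.Series(values, index=d.index, name="value").unstack()
print(wide)
```

For a row with no non-null value, argmax falls back to position 0, so the picked value is simply that row's NaN.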

You can simply fill the nulls with 0 using fillna(0) and then combine the columns with the | operator.

    df2 = df.fillna(0)

Convert to int first:

    # Assign the columns directly so the integer dtype is kept
    df2[['val1','val2','val3']] = df2[['val1','val2','val3']].astype(int)

Then:

    df2['val4'] = df2.val1.values | df2.val2.values | df2.val3.values
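Put together, the bitwise-OR idea could be sketched as follows. The sample frame is an assumption rebuilt from the edited question, and the trick relies on each row having at most one non-zero value: x | 0 | 0 gives back x, whereas two real values in one row would have their bits mixed (for example, 3 | 4 is 7).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the edited question (assumption: all nulls are NaN).
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Replace nulls with 0 and cast to int, because | is a bitwise operator
# and is not defined for floats.
vals = df[val_cols].fillna(0).astype(int)

# OR the columns together; the zeros leave the single real value unchanged.
df["value"] = vals["val1"] | vals["val2"] | vals["val3"]
print(df[["user", "category", "value"]])
```

This only makes sense for integer-valued data; a genuine 0 in the data would also be indistinguishable from a filled null.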

Interesting approach (-:
– piRSquared
Jun 29 at 14:39
