Create new column depending on values from other column
I have a DataFrame that looks something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [['vt 40462', 5, 6], [5, 6, 6], [5, 5, 8], [4, 3, 1],
     ['vl 6450', 5, 6], [5, 6, 7], [1, 2, 3],
     ['vt 40462', 5, 6], [5, 5, 8],
     ['vl 658', 6, 7], [5, 5, 8], [4, 3, 1],
     ['vt 40461', 5, 6], [5, 5, 8], [7, 8, 5]],
    columns=['A', 'B', 'C'])
df
           A  B  C
0   vt 40462  5  6
1          5  6  6
2          5  5  8
3          4  3  1
4    vl 6450  5  6
5          5  6  7
6          1  2  3
7   vt 40462  5  6
8          5  5  8
9     vl 658  6  7
10         5  5  8
11         4  3  1
12  vt 40461  5  6
13         5  5  8
14         7  8  5
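Column A contains label rows that start with vt or vl. I want to create a new column D in which every row carries the most recent vt or vl label from column A, so that all rows between one label and the next get that label: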
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
2 Answers
Another way would be to assign column D to be all values of A that start with a letter, and then use df.ffill() to get rid of the NaNs:
df.assign(D=df.loc[df.A.str.contains('^[A-Za-z]', na=False), 'A']).ffill()
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
Or, more or less equivalently, but in 2 steps:
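# Step 1: copy the label rows of A into a new column D (all other rows get NaN)
df.loc[df.A.astype(str).str.contains('^[A-Za-z]'), 'D'] = df.A
# Step 2: forward-fill the NaNs; ffill() returns a new DataFrame, so reassign to keep it
df = df.ffill()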
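If you would rather not forward-fill the whole frame (df.ffill() also fills any NaNs that happen to be in B or C), a possible variant is to mask A with Series.where and forward-fill only the new column. This is just a sketch on a shortened copy of the data; the name is_label is made up for illustration.

```python
import pandas as pd

# Shortened copy of the question's data, for illustration only.
df = pd.DataFrame(
    [['vt 40462', 5, 6], [5, 6, 6], [5, 5, 8],
     ['vl 6450', 5, 6], [1, 2, 3]],
    columns=['A', 'B', 'C'])

# True only for rows whose A value starts with a letter.
# astype(str) turns the integers into strings like '5', which do not match.
is_label = df['A'].astype(str).str.contains('^[A-Za-z]')

# Keep A where it is a label, NaN elsewhere, then forward-fill D only.
df['D'] = df['A'].where(is_label).ffill()
print(df)
```

Columns A, B and C are left untouched here; only D is filled.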
Use str.split: it returns NaN for the entries of A that are not strings (the plain numbers). Then use ffill to fill those NaNs, join the split fields back together, and assign the result to 'D':
#Thanks @user3483203 for the upgrade in syntax
df['D'] = df['A'].str.split().ffill().apply(' '.join)
print(df)
Output:
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
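To see what each step of the chain does, here is a rough sketch that splits it into intermediate results on a shortened Series. The names parts, filled and d are only for illustration, and the join is written as a lambda to make it explicit.

```python
import pandas as pd

# Shortened version of column A, for illustration only.
s = pd.Series(['vt 40462', 5, 5, 'vl 6450', 1], name='A')

# .str.split() turns each string into a list of tokens;
# the integer entries are not strings, so they become NaN.
parts = s.str.split()

# Forward-fill copies the last seen token list into the NaN slots.
filled = parts.ffill()

# Join each token list back into a single string.
d = filled.apply(lambda tokens: ' '.join(tokens))
print(d.tolist())
```

Note that this relies on the rows to be filled being non-strings. A string in A without a space would still be split into a one-element list and treated as a label.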
I think you can simplify to df.A.str.split().ffill().apply(' '.join) (although using a lambda for join may be clearer as to what it's doing) – user3483203 Jul 2 at 20:37
thanks it's working – sym Jul 3 at 6:49