Creating multiple pandas dataframes from a single dataframe based on iteration
From a single dataframe (tr), I'm trying to create multiple dataframes, one for each column in a list of columns (cat_col). Each new dataframe should be named tr_ followed by the column name (for example, tr_col1).
Could someone help me with the below code?
for col in cat_col:
    tr_ = tr[[col,'TARGET']].groupby([col,'TARGET']).size().reset_index(name='Counts')
    tr_ = pd.pivot_table(tr_,values='Counts',index=[col],columns=['TARGET'])
    print(tr_.shape)
Output:
(3, 2)
(7, 2)
(8, 2)
(5, 2)
(6, 2)
(6, 2)
(18, 2)
(7, 2)
(58, 2)
(4, 2)
(3, 2)
(7, 2)
tr[['col1','TARGET']].head(10)
              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0
tr_col1.head(3)
TARGET                0      1
col1
Family            37140   3009
Spouse, partner   10475    895
Unaccompanied    228189  20337
Please let me know if it makes sense now. I can see that dataframes are being created when I print inside the 'for' loop. I just don't know how to create a new, separately named dataframe inside the loop.
– Harish
Jul 3 at 6:15
Why do you think tr_ is not a DataFrame? It is a DataFrame; test it with print (type(tr_))
– jezrael
Jul 3 at 6:30
tr_ is a DataFrame, but it gets replaced on every iteration, so it only holds the values for the last column.
– Harish
Jul 3 at 6:50
I just checked your sample and code. I was thinking of a way to store d['A'] as tr_A and d['B'] as tr_B, but using a dictionary makes it easier to store and retrieve the values. Thanks!
– Harish
Jul 3 at 7:06
1 Answer
I think you need:
tr = pd.DataFrame({'A':list('abcdefabcd'),
                   'B':list('abcdeabffe'),
                   'TARGET':[1,1,0,0,1,0,1,1,0,1]})
print (tr)
   A  B  TARGET
0  a  a       1
1  b  b       1
2  c  c       0
3  d  d       0
4  e  e       1
5  f  a       0
6  a  b       1
7  b  f       1
8  c  f       0
9  d  e       1
cat_col = ['A','B']
d = {}
for col in cat_col:
    tr_ = (tr[[col,'TARGET']].groupby([col,'TARGET'])
                             .size()
                             .unstack()
                             .reset_index()
                             .rename_axis(None, axis=1))
    # other processing, if necessary
    # check that the output is a DataFrame
    print (type(tr_))
    print (tr_)
    # store it in the dict, if necessary
    d[col] = tr_
# select a DataFrame from the dict
print (d['A'])
   A    0    1
0  a  NaN  2.0
1  b  NaN  2.0
2  c  2.0  NaN
3  d  1.0  1.0
4  e  NaN  1.0
5  f  1.0  NaN
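If the NaN values for missing combinations get in the way, one possible variation is to pass fill_value=0 to unstack, which should keep the counts as integers. The sketch below reuses the sample data from this answer; the dictionary name frames and the 'tr_' + col keys are only my assumptions, chosen to mirror the tr_'colname' naming asked for in the question.

```python
import pandas as pd

# Same sample data as in the answer above.
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ['A', 'B']

# "frames" is a hypothetical name; keys follow the tr_<column> pattern.
frames = {}
for col in cat_col:
    counts = (tr.groupby([col, 'TARGET'])
                .size()                   # count rows per (col, TARGET) pair
                .unstack(fill_value=0))   # TARGET values become columns, gaps become 0
    frames['tr_' + col] = counts

# Look up one table by its name-like key.
print(frames['tr_A'])
```

Here the category column stays as the index (no reset_index), so a single count can be read with something like frames['tr_A'].loc['a', 1]. A dictionary is usually preferable to creating variables dynamically, because the tables can then be looped over and looked up by key.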
Can you add data sample?
– jezrael
Jul 3 at 6:02