Creating multiple pandas dataframes from a single dataframe based on iteration





From a single DataFrame (tr), I'm trying to create multiple DataFrames, one for each column in a list of columns (cat_col). Each new DataFrame must be named tr_'colname'.
Could someone help me with the code below?


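for col in cat_col:
    # count rows for each (category, TARGET) pair
    tr_ = tr[[col, 'TARGET']].groupby([col, 'TARGET']).size().reset_index(name='Counts')
    # one row per category, one column per TARGET value
    tr_ = pd.pivot_table(tr_, values='Counts', index=[col], columns=['TARGET'])
    print(tr_.shape)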



Output:
(3, 2)
(7, 2)
(8, 2)
(5, 2)
(6, 2)
(6, 2)
(18, 2)
(7, 2)
(58, 2)
(4, 2)
(3, 2)
(7, 2)


tr[['col1','TARGET']].head(10)



              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0


tr_col1.head(3)



TARGET                0      1
col1
Family            37140   3009
Spouse, partner   10475    895
Unaccompanied    228189  20337





Can you add a data sample?
– jezrael
Jul 3 at 6:02





Please let me know if it makes sense now. I can see that the DataFrames are being created when I print inside the for loop. I just don't know how to create a new, separately named DataFrame inside the loop.
– Harish
Jul 3 at 6:15





Why do you think tr_ is not a DataFrame? It is a DataFrame; you can check with print(type(tr_)).
– jezrael
Jul 3 at 6:30







tr_ is a DataFrame, but it is replaced on every iteration, so after the loop it holds the values for the last column only.
– Harish
Jul 3 at 6:50
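
To illustrate the point in this comment, here is a minimal sketch on a small made-up frame (the toy data is an assumption, not the asker's real tr): a plain name assigned inside a loop is rebound on each pass, so only the final result is still reachable afterwards.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's `tr`
tr = pd.DataFrame({"A": list("aab"), "B": list("xyz"), "TARGET": [0, 1, 0]})
cat_col = ["A", "B"]

for col in cat_col:
    # `tr_` is rebound on every pass; the previous table is discarded
    tr_ = tr.groupby([col, "TARGET"]).size().unstack()

# After the loop, `tr_` is the table for the last column only
print(tr_.index.name)
```

Storing each result in a container inside the loop, rather than in a single name, avoids losing the earlier tables.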






I just checked your sample and code. I was looking for a way to store d['A'] as tr_A and d['B'] as tr_B, but a dictionary makes it easier to store the results and retrieve them. Thanks!
– Harish
Jul 3 at 7:06





1 Answer



I think you need to store each result in a dictionary keyed by the column name:


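import pandas as pd

# small sample frame with two categorical columns and a binary target
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})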

print(tr)
   A  B  TARGET
0  a  a       1
1  b  b       1
2  c  c       0
3  d  d       0
4  e  e       1
5  f  a       0
6  a  b       1
7  b  f       1
8  c  f       0
9  d  e       1

cat_col = ['A','B']

d = {}
for col in cat_col:
    # count rows per (category, TARGET) pair, then spread TARGET into columns
    tr_ = (tr[[col, 'TARGET']].groupby([col, 'TARGET'])
                              .size()
                              .unstack()
                              .reset_index()
                              .rename_axis(None, axis=1))
    # further processing here, if necessary

    # check that the output is a DataFrame
    print(type(tr_))

    print(tr_)
    # if necessary, store it in the dict under the column name
    d[col] = tr_


# select a DataFrame from the dict
print(d['A'])
   A    0    1
0  a  NaN  2.0
1  b  NaN  2.0
2  c  2.0  NaN
3  d  1.0  1.0
4  e  NaN  1.0
5  f  1.0  NaN
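
As a variation on the loop above, here is a minimal sketch that wraps the counting step in a helper and builds the dictionary with keys of the form tr_<col>, which is the naming the question asks for. The helper name count_by_target and the dictionary name frames are illustrative assumptions, not part of the original answer, and fill_value=0 is an optional choice that turns the NaN entries into zero counts.

```python
import pandas as pd

def count_by_target(df, col, target="TARGET"):
    """Count rows per category of `col`, with one column per target value."""
    return (df.groupby([col, target])
              .size()
              .unstack(fill_value=0)      # missing combinations become 0, not NaN
              .rename_axis(None, axis=1)  # drop the "TARGET" label on the columns
              .reset_index())

# same sample data as in the answer
tr = pd.DataFrame({"A": list("abcdefabcd"),
                   "B": list("abcdeabffe"),
                   "TARGET": [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ["A", "B"]

# one DataFrame per column, stored under the key "tr_<col>"
frames = {f"tr_{col}": count_by_target(tr, col) for col in cat_col}

print(frames["tr_A"])
```

A dictionary keeps the set of tables easy to loop over later (for example with frames.items()), which is harder to do with dynamically created variable names such as tr_A and tr_B.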





