Creating multiple pandas dataframes from a single dataframe based on iteration

From a single DataFrame (tr), I'm trying to create multiple DataFrames, one for each column in a list of columns (cat_col). Each new DataFrame must be named tr_'colname'.
Could someone help me with the code below?

    for col in cat_col:
        tr_ = tr[[col,'TARGET']].groupby([col,'TARGET']).size().reset_index(name='Counts')
        tr_ = pd.pivot_table(tr_, values='Counts', index=[col], columns=['TARGET'])
        print(tr_.shape)

Output:
(3, 2)
(7, 2)
(8, 2)
(5, 2)
(6, 2)
(6, 2)
(18, 2)
(7, 2)
(58, 2)
(4, 2)
(3, 2)
(7, 2)

tr[['col1','TARGET']].head(10)

              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0

tr_col1.head(3)

TARGET                0      1
col1
Family            37140   3009
Spouse, partner   10475    895
Unaccompanied    228189  20337

Can you add data sample?
– jezrael
Jul 3 at 6:02

Please let me know if it makes sense now. I can see dataframes are being created when I try to print within 'for' loop. I just don't know how to create a new dataframe inside the loop
– Harish
Jul 3 at 6:15

Why do you think tr_ is not a DataFrame? It is a DataFrame; you can check it with print(type(tr_))
– jezrael
Jul 3 at 6:30

tr_ is a DataFrame, but it gets overwritten on every iteration, so it ends up holding only the values for the last column.
– Harish
Jul 3 at 6:50

I just checked your sample and code. I was thinking of a way to store d['A'] as tr_A, d['B'] as tr_B but using a dictionary helps to store the values and retrieve it more easily! Thanks
– Harish
Jul 3 at 7:06

1 Answer

I think you need something like this, storing each result in a dictionary keyed by column name:

    import pandas as pd

    tr = pd.DataFrame({'A':list('abcdefabcd'),
                       'B':list('abcdeabffe'),
                       'TARGET':[1,1,0,0,1,0,1,1,0,1]})

    print (tr)
       A  B  TARGET
    0  a  a       1
    1  b  b       1
    2  c  c       0
    3  d  d       0
    4  e  e       1
    5  f  a       0
    6  a  b       1
    7  b  f       1
    8  c  f       0
    9  d  e       1

    cat_col = ['A','B']
    d = {}
    for col in cat_col:
        tr_ = (tr[[col,'TARGET']].groupby([col,'TARGET'])
                                 .size()
                                 .unstack()
                                 .reset_index()
                                 .rename_axis(None, axis=1))
        #other processing if necessary

        #check that the output is a DataFrame
        print (type(tr_))
        print (tr_)

        #if necessary, store it in the dict
        d[col] = tr_

    #select a DataFrame from the dict
    print (d['A'])
       A    0    1
    0  a  NaN  2.0
    1  b  NaN  2.0
    2  c  2.0  NaN
    3  d  1.0  1.0
    4  e  NaN  1.0
    5  f  1.0  NaN
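If the tr_'colname' naming from the question matters, one possible variation is to use those names as the dictionary keys instead of creating separate variables. The sketch below is not part of the original answer: it swaps the groupby/size/unstack chain for pd.crosstab, which also counts each (category, TARGET) pair but fills missing pairs with 0 rather than NaN. The dictionary name `tables` is hypothetical, and the sample data is the same toy frame as above.

```python
import pandas as pd

# Same toy data as in the answer above
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ['A', 'B']

# One count table per categorical column, keyed as 'tr_<colname>'.
# pd.crosstab counts how often each category occurs with each TARGET value.
tables = {f'tr_{col}': pd.crosstab(tr[col], tr['TARGET']) for col in cat_col}

# Look a table up by the name the question asked for
print(tables['tr_A'])
print(tables['tr_A'].shape)
```

Each table is then retrieved with tables['tr_A'], tables['tr_B'], and so on, which avoids generating variable names dynamically.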
