Pandas — .add() resulting in TypeError: 'int' object is not iterable


I've run into a problem adding Pandas dataframes with the .add() method. I have a data generator that I use to produce synthetic, normally distributed data:

```python
import pandas as pd
import numpy as np

def DataSynthNormal(data, sel, column, fracFull, TotalRows, SelRows, mean, std, abst=False):
    fraction = data.loc[data['A'] == sel, column].sample(frac = fracFull).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean, std, round(SelRows*fracFull)).astype('int64')), index=fraction).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean, std, round(SelRows*fracFull)).astype('int64'), index=fraction).reindex(range(TotalRows))
    data[column] = data[column].add(data1, fill_value=0)
```

Using this dataframe as an example:

```python
empty = pd.DataFrame(columns=['A','B'], index=range(0,10))
empty.A[0:4] = "C"
empty.A[4:7] = "D"
empty.A[7:10] = "E"
print(empty)
```

```
   A    B
0  C  NaN
1  C  NaN
2  C  NaN
3  C  NaN
4  D  NaN
5  D  NaN
6  D  NaN
7  E  NaN
8  E  NaN
9  E  NaN
```
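As an aside, here is a sketch of building the same example frame without chained assignment (empty.A[0:4] = "C"). Recent pandas versions may warn about that pattern, and it stops modifying the frame under copy-on-write. This is an editorial alternative rather than part of the original question. Using np.repeat with per-label counts is my choice. One difference from the original is that column B comes out as float64 instead of object dtype.

```python
import numpy as np
import pandas as pd

# Same 10-row frame built in one step: 4 x "C", 3 x "D", 3 x "E",
# and a column B that is entirely NaN (float64).
empty = pd.DataFrame({
    "A": np.repeat(["C", "D", "E"], [4, 3, 3]),
    "B": np.nan,
})
print(empty)
```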

And running the data generator:

```python
DataSynthNormal(empty, 'C', 'B', 0.8, 10, 4, 0, 1)
```

I get the following error:

```
Traceback (most recent call last):

  File "", line 1, in <module>
    DataSynthNormal2(empty, 'C', 'B', 0.8, 10, 4, 0, 1)

  File "", line 7, in DataSynthNormal2
    data[column] = data[column].add(data1, fill_value=0)

  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1358, in flex_wrapper
    self.index).__finalize__(self)

  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\series.py", line 274, in __init__
    raise_cast_failure=True)

  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\series.py", line 4163, in _sanitize_array
    subarr = com._asarray_tuplesafe(data, dtype=dtype)

  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\common.py", line 317, in _asarray_tuplesafe
    values = [tuple(x) for x in values]

  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\common.py", line 317, in <listcomp>
    values = [tuple(x) for x in values]

TypeError: 'int' object is not iterable
```
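A plausible reading of this traceback (an editorial interpretation, not something stated in the thread): data1 is built with pd.DataFrame, so data[column].add(data1, ...) passes a one-column DataFrame to Series.add. In the pandas version shown, flex_wrapper appears to wrap the combined result in a new Series. Iterating a DataFrame yields its column labels, here the integer 0, and tuple(0) is exactly the call that raises the error. The snippet reproduces those pieces in isolation; the sampled row labels in fraction are hypothetical.

```python
import numpy as np
import pandas as pd

# Build data1 the way DataSynthNormal does: a one-column DataFrame
# (column label 0) indexed by the sampled rows, then reindexed to 10 rows.
fraction = pd.Index([0, 2, 3])  # hypothetical sampled row labels
values = np.random.normal(0, 1, len(fraction)).astype("int64")
data1 = pd.DataFrame(values, index=fraction).reindex(range(10))

print(type(data1).__name__)  # DataFrame, not Series
print(list(data1))           # iterating a DataFrame yields column labels: [0]

# The last traceback frame runs tuple(x) for each item; here x is the int 0.
try:
    tuple(list(data1)[0])
except TypeError as exc:
    print(exc)               # 'int' object is not iterable

# Selecting the single column gives a Series, which Series.add() aligns on the index.
print(type(data1[0]).__name__)  # Series
```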

I'm using .add() here because, with fill_value=0, it keeps NaN wherever both inputs are missing, as opposed to .fillna(0) (which has been outputting n x n matrices, for some reason). I need that because the real data I'm trying to emulate contains both blanks and 0's throughout. I also can't simply use `data[column] = data1`, because I need to call the function at different times with other conditions (=='D', =='E'), each with its own mean and std, without wiping out the values written by earlier calls.
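If the cause is the DataFrame argument, one possible fix is to keep everything as Series so that Series.add(..., fill_value=0) aligns on the row index. The sketch below is a rewrite, not the asker's function. The name data_synth_normal, its parameter names, and the rng argument are hypothetical, and it samples rows with NumPy's Generator.choice instead of DataFrame.sample. Because it only adds values to the sampled rows, it can be called repeatedly for 'C', 'D' and 'E' with different mean and std, and earlier values survive.

```python
import numpy as np
import pandas as pd


def data_synth_normal(data, sel, column, frac_full, mean, std, absolute=False, rng=None):
    """Add truncated-to-int normal draws to a random fraction of the rows
    where data['A'] == sel, leaving every other row of `column` unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    # Row labels to fill: a random fraction of the rows matching `sel`.
    matching = data.loc[data["A"] == sel].index.to_numpy()
    idx = rng.choice(matching, size=round(len(matching) * frac_full), replace=False)
    draws = rng.normal(mean, std, len(idx))
    if absolute:
        draws = np.absolute(draws)
    # A Series, not a one-column DataFrame, so Series.add() aligns on the index.
    new = pd.Series(draws.astype("int64"), index=idx)
    # fill_value=0: a value missing on only one side counts as 0;
    # rows missing on both sides stay NaN, so untouched blanks survive.
    data[column] = data[column].add(new, fill_value=0)


# Usage on the example frame (column B starts as all NaN).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": np.repeat(["C", "D", "E"], [4, 3, 3]), "B": np.nan})
data_synth_normal(df, "C", "B", 0.8, mean=0, std=1, rng=rng)
data_synth_normal(df, "D", "B", 0.5, mean=10, std=2, rng=rng)
print(df)
```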

Does anyone know how to solve this problem?

1 Answer

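I came up with a solution that involves writing a second function. In the first one, the generated values are now assigned to the column directly instead of being added: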

```python
import numpy as np
import pandas as pd

def DataSynthNormal(data, sel, column, fracFull, TotalRows, selRows, mean, std, abst=False):
    # Row labels to fill: a random fraction of the rows where column A == sel
    fraction = data.loc[data['A'] == sel, column].sample(frac = fracFull).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean, std, round(selRows*fracFull)).astype('int64')), index=fraction).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean, std, round(selRows*fracFull)).astype('int64'), index=fraction).reindex(range(TotalRows))
    # Overwrite the whole column; rows that were not sampled become NaN
    data[column] = data1
```

This first one works as you'd expect.

```python
import numpy as np
import pandas as pd

def DataSynthNormal2x(data, sel1, sel2, column, fracFull1, fracFull2, TotalRows, selRows1, selRows2, mean1, std1, mean2, std2, abst=False):
    # One random sample of row labels per selector
    fraction1 = data.loc[data['A'] == sel1, column].sample(frac = fracFull1).index
    fraction2 = data.loc[data['A'] == sel2, column].sample(frac = fracFull2).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean1, std1, round(selRows1*fracFull1)).astype('int64')), index=fraction1).reindex(range(TotalRows))
        data2 = pd.DataFrame(np.absolute(np.random.normal(mean2, std2, round(selRows2*fracFull2)).astype('int64')), index=fraction2).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean1, std1, round(selRows1*fracFull1)).astype('int64'), index=fraction1).reindex(range(TotalRows))
        data2 = pd.DataFrame(np.random.normal(mean2, std2, round(selRows2*fracFull2)).astype('int64'), index=fraction2).reindex(range(TotalRows))
    # DataFrame + DataFrame: a row missing in one frame counts as 0, missing in both stays NaN
    data12 = data1.add(data2, fill_value=0)
    data[column] = data12
```

And here is the second, which takes two sets of inputs (one per selector), generates both samples and combines them with .add() inside the function. Both seem to work.
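The two-selector function generalizes to any number of selectors by looping and accumulating with .add(). This is a sketch under assumptions. The name data_synth_normal_multi and its specs list of (sel, frac_full, mean, std) tuples are hypothetical, and rows are sampled with NumPy's Generator.choice rather than DataFrame.sample. Like DataSynthNormal2x, it overwrites the column, so rows that no selector filled end up NaN.

```python
import numpy as np
import pandas as pd


def data_synth_normal_multi(data, column, specs, absolute=False, rng=None):
    """Fill `column` from several selectors in one call.

    `specs` is a list of (sel, frac_full, mean, std) tuples (hypothetical format).
    The column is overwritten: rows that no selector filled are NaN.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = pd.Series(np.nan, index=data.index)  # start with an all-blank column
    for sel, frac_full, mean, std in specs:
        matching = data.loc[data["A"] == sel].index.to_numpy()
        idx = rng.choice(matching, size=round(len(matching) * frac_full), replace=False)
        draws = rng.normal(mean, std, len(idx))
        if absolute:
            draws = np.absolute(draws)
        # Same combining rule as data1.add(data2, fill_value=0), one selector at a time.
        total = total.add(pd.Series(draws.astype("int64"), index=idx), fill_value=0)
    data[column] = total


rng = np.random.default_rng(1)
df = pd.DataFrame({"A": np.repeat(["C", "D", "E"], [4, 3, 3]), "B": np.nan})
data_synth_normal_multi(df, "B", [("C", 0.8, 0, 1), ("D", 0.5, 10, 2)], rng=rng)
print(df)
```

Adding a third tuple such as ("E", 0.5, 5, 1) to specs covers the remaining condition in the same call.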
