Pandas — .add() resulting in TypeError: 'int' object is not iterable
I've run into a bit of a problem adding Pandas DataFrames with the .add() method. I have a data generator that I'm using to generate synthetic data from a normal distribution:
import pandas as pd
import numpy as np

def DataSynthNormal(data, sel, column, fracFull, TotalRows, SelRows, mean, std, abst=False):
    fraction = data.loc[data['A'] == sel, column].sample(frac = fracFull).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean, std, round(SelRows*fracFull)).astype('int64')), index=fraction).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean, std, round(SelRows*fracFull)).astype('int64'), index=fraction).reindex(range(TotalRows))
    data[column] = data[column].add(data1, fill_value=0)
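The generator builds values only for the sampled index labels and then reindexes to the full row range, so every row outside the sample becomes NaN. A small sketch of that step on its own, with made-up labels and values (`values` and `padded` are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Values for two hypothetical sampled rows (labels 1 and 3).
values = pd.Series(np.array([5, 6], dtype="int64"), index=[1, 3])

# Reindexing to the full row range inserts NaN for every missing label,
# which also turns the int64 values into float64 (NaN is a float).
padded = values.reindex(range(5))
print(padded.tolist())  # [nan, 5.0, nan, 6.0, nan]
print(padded.dtype)     # float64
```

One side effect worth knowing: even though the draws are cast to int64, the reindexed result is float64 because of the inserted NaNs.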
Using this DataFrame as an example:
empty = pd.DataFrame(columns=['A','B'], index=range(0,10))
empty.loc[0:3, 'A'] = "C"  # .loc slices are label-based and inclusive
empty.loc[4:6, 'A'] = "D"
empty.loc[7:9, 'A'] = "E"
print(empty)
   A    B
0  C  NaN
1  C  NaN
2  C  NaN
3  C  NaN
4  D  NaN
5  D  NaN
6  D  NaN
7  E  NaN
8  E  NaN
9  E  NaN
And running the data generator:
DataSynthNormal(empty, 'C', 'B', 0.8, 10, 4, 0, 1)
I get the following error:
Traceback (most recent call last):
  File "", line 1, in <module>
    DataSynthNormal2(empty, 'C', 'B', 0.8, 10, 4, 0, 1)
  File "", line 7, in DataSynthNormal2
    data[column] = data[column].add(data1, fill_value=0)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1358, in flex_wrapper
    self.index).__finalize__(self)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\series.py", line 274, in __init__
    raise_cast_failure=True)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\series.py", line 4163, in _sanitize_array
    subarr = com._asarray_tuplesafe(data, dtype=dtype)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\common.py", line 317, in _asarray_tuplesafe
    values = [tuple(x) for x in values]
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\core\common.py", line 317, in <listcomp>
    values = [tuple(x) for x in values]
TypeError: 'int' object is not iterable
I'm trying to use .add() here because it preserves NaN where both operands are missing, unlike .fillna(0) (which has been outputting n x n matrices, for some reason). I need this because the real data I'm trying to emulate contains both blanks and 0s throughout. I also can't simply use "data[column] = data1", because I need to apply other conditions (=='D', =='E') at different times, each with its own mean and std.
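The NaN-preserving behaviour of `Series.add` with `fill_value` can be checked on a toy pair of Series (the values below are made up): a position missing on only one side is treated as `fill_value`, while a position missing on both sides stays NaN. Plain `+` and a `fillna(0)` approach are shown alongside for contrast.

```python
import numpy as np
import pandas as pd

existing = pd.Series([1.0, np.nan, np.nan, 0.0])      # a value, two blanks, a real zero
generated = pd.Series([np.nan, 2.0, np.nan, np.nan])  # synthetic value for row 1 only

# Plain '+' propagates NaN from either side, so every row here becomes NaN.
naive = existing + generated

# fillna(0) keeps the values but turns the genuine blank (row 2) into 0.
filled = existing.fillna(0) + generated.fillna(0)

# add(fill_value=0) treats a one-sided NaN as 0 but keeps rows where
# both sides are missing as NaN.
combined = existing.add(generated, fill_value=0)
print(combined.tolist())  # [1.0, 2.0, nan, 0.0]
```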
Does anyone know how to solve this problem?
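Going by the traceback, `flex_wrapper` falls through to the branch that wraps the result in a new Series, which suggests `data1` was not treated as a Series: `data[column]` is a Series, while `data1` is a one-column DataFrame. A minimal sketch that builds a `pd.Series` instead, so `Series.add` can align it label by label. The name `data_synth_normal` and its shortened signature are hypothetical: the sample size is taken from the sampled index (so `TotalRows` and `SelRows` are dropped), and reindexing uses `data.index`.

```python
import numpy as np
import pandas as pd


def data_synth_normal(data, sel, column, frac_full, mean, std, abst=False):
    """Add normally distributed integers to a random fraction of the rows
    where data['A'] == sel; all other rows of `column` are left as they were."""
    # Index labels of a random fraction of the matching rows.
    fraction = data.loc[data["A"] == sel, column].sample(frac=frac_full).index
    values = np.random.normal(mean, std, len(fraction)).astype("int64")
    if abst:
        values = np.absolute(values)
    # A Series (not a one-column DataFrame), reindexed to the full index so
    # that rows outside the sample are NaN.
    new = pd.Series(values, index=fraction).reindex(data.index)
    # One-sided NaN counts as 0; rows missing on both sides stay NaN.
    data[column] = data[column].add(new, fill_value=0)


# Usage: two calls with different groups and parameters accumulate.
df = pd.DataFrame({"A": list("CCCCDDDEEE"), "B": np.nan})
data_synth_normal(df, "C", "B", 0.8, 0, 1)
data_synth_normal(df, "D", "B", 1.0, 10, 2, abst=True)
print(df)
```

Because each call only adds into rows that were sampled, later calls for 'D' or 'E' leave the earlier 'C' values in place.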
1 Answer
Came up with a solution, which involved creating a second function:
def DataSynthNormal(data, sel, column, fracFull, TotalRows, selRows, mean, std, abst=False):
    fraction = data.loc[data['A'] == sel, column].sample(frac = fracFull).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean, std, round(selRows*fracFull)).astype('int64')), index=fraction).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean, std, round(selRows*fracFull)).astype('int64'), index=fraction).reindex(range(TotalRows))
    data[column] = data1
Here's the first one, which works as you'd expect.
def DataSynthNormal2x(data, sel1, sel2, column, fracFull1, fracFull2, TotalRows, selRows1, selRows2, mean1, std1, mean2, std2, abst=False):
    fraction1 = data.loc[data['A'] == sel1, column].sample(frac = fracFull1).index
    fraction2 = data.loc[data['A'] == sel2, column].sample(frac = fracFull2).index
    if abst:
        data1 = pd.DataFrame(np.absolute(np.random.normal(mean1, std1, round(selRows1*fracFull1)).astype('int64')), index=fraction1).reindex(range(TotalRows))
        data2 = pd.DataFrame(np.absolute(np.random.normal(mean2, std2, round(selRows2*fracFull2)).astype('int64')), index=fraction2).reindex(range(TotalRows))
    else:
        data1 = pd.DataFrame(np.random.normal(mean1, std1, round(selRows1*fracFull1)).astype('int64'), index=fraction1).reindex(range(TotalRows))
        data2 = pd.DataFrame(np.random.normal(mean2, std2, round(selRows2*fracFull2)).astype('int64'), index=fraction2).reindex(range(TotalRows))
    data12 = data1.add(data2, fill_value=0)
    data[column] = data12
And here's the second, which takes two sets of inputs and combines both selections inside the function. Both seem to work.
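The two-function approach could be generalised to any number of groups by accumulating one Series per group with `Series.add(..., fill_value=0)`. A sketch assuming the groups in `specs` don't overlap; `synth_normal_many` and the `(sel, frac_full, mean, std)` tuple layout are hypothetical, and the sample size is taken from the sampled index rather than passed in:

```python
import numpy as np
import pandas as pd


def synth_normal_many(data, column, specs, abst=False):
    """Fill `column` with synthetic integers for several groups at once.

    specs: iterable of (sel, frac_full, mean, std) tuples, one per value of
    data['A'] to target. Rows not sampled for any group end up as NaN.
    """
    # Start from an all-NaN float Series so untouched rows stay blank.
    combined = pd.Series(np.nan, index=data.index)
    for sel, frac_full, mean, std in specs:
        fraction = data.loc[data["A"] == sel].sample(frac=frac_full).index
        values = np.random.normal(mean, std, len(fraction)).astype("int64")
        if abst:
            values = np.absolute(values)
        # Groups do not overlap, so add(fill_value=0) simply merges this
        # part in; labels absent from every part remain NaN.
        combined = combined.add(pd.Series(values, index=fraction), fill_value=0)
    # Like the answer's functions, this replaces the whole column.
    data[column] = combined


df = pd.DataFrame({"A": list("CCCCDDDEEE"), "B": np.nan})
synth_normal_many(df, "B", [("C", 0.8, 0, 1), ("D", 1.0, 10, 2)])
print(df)
```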