How to manage Numpy arrays in Pandas DataFrames

Let's assume one has a DataFrame with some integer values and some arrays derived from them:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])
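A minimal sketch of the same setup, assuming a fixed seed of 42 (the question does not set one). It highlights one detail of the slicing: NumPy slicing past the end of an array stops at the end instead of raising, so with rand_int drawn from 0 to 99 most rows simply hold all of array_a or array_b.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the run is repeatable; the question uses no seed.
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])

# A slice shorter than the array truncates it ...
print(array_a[:3])                  # [0 1 2]
# ... while a slice longer than the array just returns the whole array.
print(array_a[:51])                 # [0 1 2 3 4]

# With this seed every rand_int is >= 7, so every row holds the full arrays.
print(df['array_a'].map(len).tolist())  # [5, 5, 5, 5, 5]
print(df['array_b'].map(len).tolist())  # [7, 7, 7, 7, 7]
```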

Some questions that would help me understand how to manage NumPy arrays in Pandas DataFrames:

So you want to multiply both array a and array b by the corresponding value in rand_int?
– user3483203
Jul 2 at 7:20

Not only that, but also to define another column in df that holds, for each row, the np.setdiff1d of array_a and array_b. Thank you
– espogian
Jul 2 at 7:24
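Going by this exchange, a row-wise sketch that scales each stored array by the rand_int of its row and then adds a per-row np.setdiff1d column. It assumes np.random.seed(42), as used in the answer, and the column name array_diff is a guess. Looping with zip keeps every cell a plain 1-D ndarray.

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # assumption: same seed as the answer
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])

# Multiply each stored array by the scalar in the same row.
df['array_a'] = [arr * k for arr, k in zip(df['array_a'], df['rand_int'])]
df['array_b'] = [arr * k for arr, k in zip(df['array_b'], df['rand_int'])]

# Hypothetical column 'array_diff': values in array_b that are not in array_a.
df['array_diff'] = [np.setdiff1d(b, a) for a, b in zip(df['array_a'], df['array_b'])]
print(df['array_diff'])
```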

1 Answer

I'd say it's better to work with NumPy and import the data into the DataFrame as a last step.
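A sketch of that NumPy-first approach, assuming the same seed and outer products as the solution that follows: all the arithmetic runs on plain arrays, and the results are wrapped in a DataFrame only at the end. Passing list(a) hands pandas one 1-D row per cell.

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # assumption: same seed as the solution
rand_int = np.random.randint(0, 100, size=5)

# All numeric work happens on NumPy arrays.
a = np.outer(rand_int, np.arange(5))  # shape (5, 5): row i is rand_int[i] * [0, 1, 2, 3, 4]
b = np.outer(rand_int, np.arange(7))  # shape (5, 7)
d = [np.setdiff1d(b_row, a_row) for a_row, b_row in zip(a, b)]

# Build the DataFrame once, as the final step.
df = pd.DataFrame({'rand_int': rand_int, 'a': list(a), 'b': list(b), 'd': d})
print(df)
```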

Anyway, here's a solution that stores the arrays in the DataFrame step by step. I'm not really sure you actually want the outer product; it would be great if you could post the expected result.

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])

>>> df
   rand_int
0        51
1        92
2        14
3        71
4        60

df['a'] = np.split(np.outer(df['rand_int'], np.arange(5)), 5)
df['b'] = np.split(np.outer(df['rand_int'], np.arange(7)), 5)

>>> df
   rand_int                         a                                   b
0        51  [[0, 51, 102, 153, 204]]  [[0, 51, 102, 153, 204, 255, 306]]
1        92  [[0, 92, 184, 276, 368]]  [[0, 92, 184, 276, 368, 460, 552]]
2        14     [[0, 14, 28, 42, 56]]       [[0, 14, 28, 42, 56, 70, 84]]
3        71  [[0, 71, 142, 213, 284]]  [[0, 71, 142, 213, 284, 355, 426]]
4        60  [[0, 60, 120, 180, 240]]  [[0, 60, 120, 180, 240, 300, 360]]

df['d'] = df.b.combine(df.a, func=np.setdiff1d)

>>> df['d']
0    [255, 306]
1    [460, 552]
2      [70, 84]
3    [355, 426]
4    [300, 360]
Name: d, dtype: object
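Series.combine calls func(self_value, other_value) for each index label, so df.b.combine(df.a, ...) computes "elements of b not in a". A small check of the argument order, reusing the code above:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
df['a'] = np.split(np.outer(df['rand_int'], np.arange(5)), 5)
df['b'] = np.split(np.outer(df['rand_int'], np.arange(7)), 5)

# func(b_row, a_row): values of b that do not appear in a.
b_minus_a = df.b.combine(df.a, func=np.setdiff1d)
# func(a_row, b_row): values of a that do not appear in b. Every row of a is a
# prefix of the matching row of b, so each result here is an empty array.
a_minus_b = df.a.combine(df.b, func=np.setdiff1d)

print(b_minus_a.map(len).tolist())  # [2, 2, 2, 2, 2]
print(a_minus_b.map(len).tolist())  # [0, 0, 0, 0, 0]
```

np.setdiff1d also flattens its inputs, which is why the extra dimension left by np.split does not show up in column d.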

Note that np.split leaves an extra dimension; I'm not sure whether this can be avoided. You might want to remove it with np.squeeze:

>>> df['a'].apply(np.squeeze)
0    [0, 51, 102, 153, 204]
1    [0, 92, 184, 276, 368]
2       [0, 14, 28, 42, 56]
3    [0, 71, 142, 213, 284]
4    [0, 60, 120, 180, 240]
Name: a, dtype: object
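The extra dimension comes from np.split itself: it cuts along axis 0 and keeps that axis, so each piece has shape (1, 5). A sketch comparing that with iterating over the rows of the outer-product array, which yields 1-D rows directly (an alternative to squeezing afterwards; the hard-coded rand_int values are copied from the seeded output above):

```python
import numpy as np

rand_int = np.array([51, 92, 14, 71, 60])  # values from the seeded example
outer = np.outer(rand_int, np.arange(5))   # shape (5, 5)

# np.split keeps the split axis, so each piece is 2-D.
pieces = np.split(outer, 5)
print(pieces[0].shape)              # (1, 5)

# np.squeeze drops the length-1 axis after the fact.
print(np.squeeze(pieces[0]).shape)  # (5,)

# Iterating over a 2-D array yields its rows as 1-D arrays,
# so list(outer) never introduces the extra axis.
rows = list(outer)
print(rows[0].shape)                # (5,)
```

With this, df['a'] = list(np.outer(df['rand_int'], np.arange(5))) should store 1-D arrays without a separate squeeze step.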

Really helpful, thanks! This addresses my need. Unfortunately I did not get the notification; apologies for the late reply.
– espogian
Jul 5 at 9:37
