How to manage NumPy arrays in pandas DataFrames





Let's assume one has a DataFrame with some integer values and some arrays defined as follows:


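import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
# Each row stores the first rand_int elements of the array
# (the whole array when rand_int is at least its length)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])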



Some questions which can help me understand how to manage Numpy arrays with Pandas DataFrames:







So you want to multiply both array a and array b by the corresponding value in rand_int?
– user3483203
Jul 2 at 7:20





Not only that, but also define another column in df which is the np.setdiff1d between the rows in array_a and array_b. Thank you
– espogian
Jul 2 at 7:24






1 Answer



I'd say it's better to work with NumPy and import data into the dataframe as a last step.
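As a rough sketch of that NumPy-first approach (my own illustration, assuming the outer product and the row-wise np.setdiff1d are what is wanted; the names a, b and d are just placeholders), everything is computed on plain arrays and the DataFrame is only built at the end:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
rand_int = np.random.randint(0, 100, size=5)

# Outer products: row i holds rand_int[i] * arange(n).
a = np.outer(rand_int, np.arange(5))   # shape (5, 5)
b = np.outer(rand_int, np.arange(7))   # shape (5, 7)

# Row-wise set difference, kept as a plain list of 1-D arrays.
d = [np.setdiff1d(row_b, row_a) for row_a, row_b in zip(a, b)]

# Build the DataFrame as the last step; list(a) yields one 1-D array per row.
df = pd.DataFrame({'rand_int': rand_int, 'a': list(a), 'b': list(b), 'd': d})
```

Each cell of a, b and d then holds a 1-D array, and the columns have dtype object.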



Anyway, here's a solution that stores the arrays in the DataFrame step by step. I'm not really sure you actually want the outer product; it would be great if you could post the expected result.


import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
>>> df
   rand_int
0        51
1        92
2        14
3        71
4        60

# Multiply each rand_int by the whole array (outer product), then split into one piece per row
df['a'] = np.split(np.outer(df['rand_int'], np.arange(5)), 5)
df['b'] = np.split(np.outer(df['rand_int'], np.arange(7)), 5)

>>> df
   rand_int                         a                                   b
0        51  [[0, 51, 102, 153, 204]]  [[0, 51, 102, 153, 204, 255, 306]]
1        92  [[0, 92, 184, 276, 368]]  [[0, 92, 184, 276, 368, 460, 552]]
2        14     [[0, 14, 28, 42, 56]]       [[0, 14, 28, 42, 56, 70, 84]]
3        71  [[0, 71, 142, 213, 284]]  [[0, 71, 142, 213, 284, 355, 426]]
4        60  [[0, 60, 120, 180, 240]]  [[0, 60, 120, 180, 240, 300, 360]]

# Row-wise set difference: values of b that do not appear in a
df['d'] = df.b.combine(df.a, func=np.setdiff1d)
>>> df['d']
0    [255, 306]
1    [460, 552]
2      [70, 84]
3    [355, 426]
4    [300, 360]
Name: d, dtype: object
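To make the last step concrete, here is a small standalone sketch of what np.setdiff1d does for the first row (row_a and row_b are just illustrative names). It returns the sorted unique values of its first argument that are absent from the second, and the result is always 1-D, even when the inputs are 2-D:

```python
import numpy as np

# The first row's arrays, with the extra leading dimension.
row_a = np.array([[0, 51, 102, 153, 204]])            # shape (1, 5)
row_b = np.array([[0, 51, 102, 153, 204, 255, 306]])  # shape (1, 7)

# Values present in row_b but not in row_a.
diff = np.setdiff1d(row_b, row_a)
print(diff)        # [255 306]
print(diff.shape)  # (2,)
```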



Note that np.split leaves an extra dimension on each piece (shape (1, 5) instead of (5,)); I'm not sure whether this can be avoided. You might want to remove it with np.squeeze:




>>> df['a'].apply(np.squeeze)
0    [0, 51, 102, 153, 204]
1    [0, 92, 184, 276, 368]
2       [0, 14, 28, 42, 56]
3    [0, 71, 142, 213, 284]
4    [0, 60, 120, 180, 240]
Name: a, dtype: object
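On whether the extra dimension can be avoided: one option, offered as a sketch rather than as part of the solution above, is to skip np.split and wrap the 2-D outer product in list(...). Iterating over a 2-D array yields its rows as 1-D arrays, so no squeeze should be needed:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])

# list(...) over a 2-D array gives one 1-D row per element.
df['a'] = list(np.outer(df['rand_int'], np.arange(5)))
print(df['a'].iloc[0].shape)  # (5,)
```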





Really helpful, thanks! This addresses my need. Unfortunately I did not get the notification; apologies for the late reply.
– espogian
Jul 5 at 9:37





