How to flat map a dataframe with unequal lists?

Multi tool use
Multi tool use


How to flat map a dataframe with unequal lists?



I have a data frame as follows-


a | b | c
[1] | [3,4,5,6] | [7,8,9,10]



i need the output as


a | b | c
1 3 7
1 4 8
1 5 9
1 6 10



Currently i am getting output as below upon using the following statement-


cols=['a','b','c']
df.rdd.flatMap(lambda x: itertools.izip_longest(*[x[c] for c in cols])).toDF(cols)

a | b | c
1 3 7
null 4 8
null 5 9
null 6 10



The number of columns are not fixed. If the solution is a generalized it would be helpful.




1 Answer
1




One option is to use itertools.repeat to repeat each array whose length is less than length of the longest array.


itertools.repeat


from itertools import izip_longest, repeat, chain

cols = df.columns
def zip_with_repeat(row, cols):
M = max(len(row[c]) for c in cols)
return izip_longest(
*[list(chain(*repeat(row[c], (M - len(row[c])) + 1)))[:M] for c in cols]
)

df.rdd.flatMap(lambda row: zip_with_repeat(row, cols)).toDF(cols).show()
#+---+---+---+
#| a| b| c|
#+---+---+---+
#| 1| 3| 7|
#| 1| 4| 8|
#| 1| 5| 9|
#| 1| 6| 10|
#+---+---+---+



For illustrative purposes, suppose you instead had the following DataFrame:


#+--------+------------+-------------+
#| a| b| c|
#+--------+------------+-------------+
#| [1]|[3, 4, 5, 6]|[7, 8, 9, 10]|
#|[10, 20]|[30, 40, 50]| [70, 80, 90]|
#+--------+------------+-------------+



The code would produce:


#+---+---+---+
#| a| b| c|
#+---+---+---+
#| 1| 3| 7|
#| 1| 4| 8|
#| 1| 5| 9|
#| 1| 6| 10|
#| 10| 30| 70|
#| 20| 40| 80|
#| 10| 50| 90|
#+---+---+---+



Notice that the 10 repeated once to fill in the array in column a to the appropriate length.


10


a





Thanks so much @pault you are a life saver
– Visualisation App
Jul 2 at 16:38






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

h,yQWGl I,Oye u5ou3VedW,JWVHz0hlFaeNmCbYdH1
2KD9FITKEBb VuU4 a2TRWsxPK2bJb WY

Popular posts from this blog

PHP contact form sending but not receiving emails

Do graphics cards have individual ID by which single devices can be distinguished?

Create weekly swift ios local notifications