R - Nested list to (wide) dataframe

I extracted some data via the Crunchbase API, which returned a large nested list with the following structure (it contains many more nested lists at several levels; I show only the part that is currently relevant to me):

    > str(x[[1]])
     $ uuid         : chr "5f9957b0841251e6e439d757XXXXXX"
     $ relationships: List of 27
      ..$ websites: List of 3
      .. ..$ cardinality: chr "OneToMany"
      .. ..$ items      :'data.frame': 4 obs. of 7 variables:
      .. .. ..$ properties.website_type: chr [1:4] "homepage" "facebook" "twitter" "linkedin"
      .. .. ..$ properties.url         : chr [1:4] "http://www.example.com" "https://www.facebook.com/example" "http://twitter.com/example" "http://www.linkedin.com/company/example"

Consider the following minimal example:

    x <- list()
    x[[1]] <- list(uuid = "123",
                   relationships = list(websites = list(items = list(
                     properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
                     properties.url = c("www.example1.com", "www.fbex1.com",
                                        "www.twitterex1.com", "www.linkedinex1.com")
                   ))))
    x[[2]] <- list(uuid = "987",
                   relationships = list(websites = list(items = list(
                     properties.website_type = c("homepage", "facebook", "twitter"),
                     properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com")
                   ))))

Now, I would like to create a dataframe with the following column structure:

    > x.df
      uuid          web.url  web.facebook        web.twitter        web.linkedin
    1  123 www.example1.com www.fbex1.com www.twitterex1.com www.linkedinex1.com
    2  987 www.example2.com www.fbex2.com www.twitterex2.com                <NA>

That is, I would like one row per uuid (a unique firm identifier), with the uuid in the first column followed by one column per platform URL (homepage, Facebook, Twitter, ...). I tried many combinations of lapply(), spread(), and bind_rows(), but could not get anything to work. Any help would be appreciated.

Please provide a sample of your data using dput
– docendo discimus
Jun 25 at 10:11

Done. I added a downloadable link for a few datapoints.
– Daniel S. Hain
Jun 25 at 15:27

please make a minimal example instead of a 1000-line file to a link that may break at any time. See how to make a reproducible example
– Calum You
Jun 25 at 21:40

Done. Hope now it is clear.
– Daniel S. Hain
Jun 26 at 7:02

1 Answer

A dplyr/tidyr approach could be:

    library(dplyr)
    library(tidyr)

    # convert list to dataframe in long format
    df <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))

    # final result: one column per website type, filled with the matching url
    df1 <- df %>%
      spread(relationships.websites.items.properties.website_type,
             relationships.websites.items.properties.url)

which gives

      uuid      facebook         homepage            linkedin            twitter
    1  123 www.fbex1.com www.example1.com www.linkedinex1.com www.twitterex1.com
    2  987 www.fbex2.com www.example2.com                <NA> www.twitterex2.com
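As a side note, `spread()` has been superseded by `pivot_wider()` since tidyr 1.0.0. The sketch below is one possible variant of the same idea, not the answer's original code: it builds the long table explicitly and uses `names_prefix` to get `web.`-prefixed columns similar to those asked for in the question. The names `long`, `wide`, `type` and `url` are illustrative choices, and the homepage column comes out as `web.homepage` rather than `web.url`.

```r
library(tidyr)

# Minimal example data from the question
x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
    properties.url = c("www.example1.com", "www.fbex1.com",
                       "www.twitterex1.com", "www.linkedinex1.com")
  )))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter"),
    properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com")
  ))))
)

# Long format: one row per (uuid, website type) pair
long <- do.call(rbind, lapply(x, function(k) {
  items <- k$relationships$websites$items
  data.frame(uuid = k$uuid,
             type = items$properties.website_type,
             url  = items$properties.url,
             stringsAsFactors = FALSE)
}))

# Wide format: one row per uuid, one "web."-prefixed column per website type;
# types that a uuid does not have are filled with NA
wide <- pivot_wider(long, names_from = type, values_from = url,
                    names_prefix = "web.")
```

If the exact column name `web.url` is needed for the homepage, the column can be renamed afterwards.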

Sample data:

    x <- list(
      structure(list(
        uuid = "123",
        relationships = structure(list(
          websites = structure(list(
            items = structure(list(
              properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
              properties.url = c("www.example1.com", "www.fbex1.com",
                                 "www.twitterex1.com", "www.linkedinex1.com")),
              .Names = c("properties.website_type", "properties.url"))),
            .Names = "items")),
          .Names = "websites")),
        .Names = c("uuid", "relationships")),
      structure(list(
        uuid = "987",
        relationships = structure(list(
          websites = structure(list(
            items = structure(list(
              properties.website_type = c("homepage", "facebook", "twitter"),
              properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com")),
              .Names = c("properties.website_type", "properties.url"))),
            .Names = "items")),
          .Names = "websites")),
        .Names = c("uuid", "relationships"))
    )

Update: To fix the error below

    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
      arguments imply differing number of rows: 1, 0

you need to remove the malformed elements from the input data, i.e. those where website_type has a value but properties.url is NULL. Run the following as a pre-processing step before executing the main solution:

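    # flag elements whose url is NULL, then keep only the others
    # (logical indexing is also safe when nothing is flagged, whereas
    # x[-which(...)] would return an empty list in that case)
    has_null_url <- vapply(x, function(k) is.null(k$relationships$websites$items$properties.url), logical(1))
    x <- x[!has_null_url]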

Sample data to test this pre-processing step:

    x <- list(
      structure(list(
        uuid = "123",
        relationships = structure(list(
          websites = structure(list(
            items = structure(list(
              properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
              properties.url = c("www.example1.com", "www.fbex1.com",
                                 "www.twitterex1.com", "www.linkedinex1.com")),
              .Names = c("properties.website_type", "properties.url"))),
            .Names = "items")),
          .Names = "websites")),
        .Names = c("uuid", "relationships")),
      structure(list(
        uuid = "987",
        relationships = structure(list(
          websites = structure(list(
            items = structure(list(
              properties.website_type = "homepage",
              properties.url = NULL),
              .Names = c("properties.website_type", "properties.url"))),
            .Names = "items")),
          .Names = "websites")),
        .Names = c("uuid", "relationships")),
      structure(list(
        uuid = "345",
        relationships = structure(list(
          websites = structure(list(
            items = structure(list(
              properties.website_type = "homepage",
              properties.url = NULL),
              .Names = c("properties.website_type", "properties.url"))),
            .Names = "items")),
          .Names = "websites")),
        .Names = c("uuid", "relationships"))
    )
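If the affected uuids should be kept with an NA url rather than dropped, a possible alternative pre-processing step is sketched below. It is not part of the original answer: `fill_missing_url` is a hypothetical helper name, and the sketch assumes that website_type is always present and that only properties.url can be NULL.

```r
# Hypothetical helper: replace a NULL url vector with NA values of matching
# length, so the element survives the conversion to a data frame
fill_missing_url <- function(k) {
  items <- k$relationships$websites$items
  if (is.null(items$properties.url)) {
    k$relationships$websites$items$properties.url <-
      rep(NA_character_, length(items$properties.website_type))
  }
  k
}

# Small test input: the second element has a website_type but a NULL url
x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook"),
    properties.url = c("www.example1.com", "www.fbex1.com")
  )))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = "homepage",
    properties.url = NULL
  ))))
)

x <- lapply(x, fill_missing_url)

# Same long-format conversion as in the main solution
df <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))
```

After this step, uuid "987" appears in `df` with an NA url, and the `spread()` call from the main solution can be applied as before.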

Great, that seems to be what I need. It runs perfectly with the example. However, when I try it with my full dataset, I always get this error: "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0". Any idea what the problem could be?
– Daniel S. Hain
Jun 26 at 15:06

You probably have an element in your data where the number of values in $relationships$websites$items$properties.website_type does not match the number of values in $relationships$websites$items$properties.url. That mismatch is why data.frame throws this error. So first you need to decide how you want to handle such cases, i.e. website_type is present but url is missing.
– Prem
Jun 26 at 17:34

Indeed, you are probably right about that! I didn't consider that case. If the url is missing, it should ideally be NA.
– Daniel S. Hain
Jun 26 at 20:54

I think you are missing a point here. Consider this example and let me know the desired output -

    x <- structure(list(
      uuid = "123",
      relationships = structure(list(
        websites = structure(list(
          items = structure(list(
            properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
            properties.url = c("www.example1.com", "www.fbex1.com", "www.linkedinex1.com")),
            .Names = c("properties.website_type", "properties.url"))),
          .Names = "items")),
        .Names = "websites")),
      .Names = c("uuid", "relationships"))

Here twitter has no url in this example and gives the same error.
– Prem
Jun 27 at 7:06

Alright, it works! Thank you so much!
– Daniel S. Hain
Jul 9 at 16:48
