aggregating columns based on commas




I have the following data frame. I'm trying to split the Details column on commas and turn each distinct name into its own column, indicating whether that name appears for each ID (1 = Yes, 0 = No). Any help would be appreciated! Thanks!


ID<- c(1,2,3,4,5,6)
Details<- c("V1,V2", "V1,V3", "V1", "V2", "V3,V4", "V2,V3" )

data.frame <- data.frame(ID, Details, stringsAsFactors=FALSE)



DESIRED OUTPUT:


ID<-c(1,2,3,4,5,6)
V1<-c(1,1,1,0,0,0)
V2<-c(1,0,0,1,0,1)
V3<-c(0,1,0,0,1,1)
V4<-c(0,0,0,0,1,0)

data.frame1<-data.frame(ID, V1, V2, V3, V4, stringsAsFactors=FALSE)




5 Answers



A solution using the tidyverse package. dat is your example data frame. dat2 is the final data frame.




library(tidyverse)

dat2 <- dat %>%
  separate_rows(Details) %>%
  mutate(Value = 1L) %>%
  spread(Details, Value, fill = 0L)
dat2
# ID V1 V2 V3 V4
# 1 1 1 1 0 0
# 2 2 1 0 1 0
# 3 3 1 0 0 0
# 4 4 0 1 0 0
# 5 5 0 0 1 1
# 6 6 0 1 1 0
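
In recent tidyr releases spread() is superseded by pivot_wider(). A minimal sketch of the same pipeline using it, assuming a tidyr version recent enough that values_fill accepts a single value, and rebuilding dat from the question's vectors so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Rebuild the question's example data (named dat, as in the answer above).
dat <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  Details = c("V1,V2", "V1,V3", "V1", "V2", "V3,V4", "V2,V3"),
  stringsAsFactors = FALSE
)

dat2 <- dat %>%
  separate_rows(Details, sep = ",") %>%  # one row per ID/name pair
  mutate(Value = 1L) %>%                 # mark each pair as present
  pivot_wider(names_from = Details, values_from = Value, values_fill = 0L)
```

The result is a tibble rather than a plain data.frame; wrap it in as.data.frame() if that matters downstream.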



One option with mtabulate from qdapTools




library(qdapTools)
cbind.data.frame(ID, # or data.frame$ID
                 mtabulate(strsplit(as.character(data.frame$Details), ",")))
# output
# ID V1 V2 V3 V4
# 1 1 1 1 0 0
# 2 2 1 0 1 0
# 3 3 1 0 0 0
# 4 4 0 1 0 0
# 5 5 0 0 1 1
# 6 6 0 1 1 0



Here is a base R solution. I have renamed your data.frames data1 and data2.




data1 <- data.frame(ID, Details, stringsAsFactors=FALSE)
data2 <- data.frame(ID, V1, V2, V3, V4, stringsAsFactors=FALSE)

nms <- unique(unlist(strsplit(data1$Details, ",")))
data3 <- cbind.data.frame(ID, sapply(nms, grepl, data1$Details))
data3[-1] <- lapply(data3[-1], as.integer)



Now compare data3 with your expected result data2.




all.equal(data2, data3)
#[1] TRUE



Note, however, that


identical(data2, data3)
#[1] FALSE



This is because I have used as.integer and the values in data2 are of class "numeric". If this makes a difference, you can change the lapply instruction above to use as.numeric.
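
The as.numeric variant mentioned above could look like this. It is only a sketch: it rebuilds the example vectors so that it runs on its own, and the conversion step is the only line that differs from the code in the answer.

```r
ID <- c(1, 2, 3, 4, 5, 6)
Details <- c("V1,V2", "V1,V3", "V1", "V2", "V3,V4", "V2,V3")
data1 <- data.frame(ID, Details, stringsAsFactors = FALSE)

# Expected result from the question.
data2 <- data.frame(
  ID,
  V1 = c(1, 1, 1, 0, 0, 0),
  V2 = c(1, 0, 0, 1, 0, 1),
  V3 = c(0, 1, 0, 0, 1, 1),
  V4 = c(0, 0, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

nms <- unique(unlist(strsplit(data1$Details, ",")))
data3 <- cbind.data.frame(ID, sapply(nms, grepl, data1$Details))
data3[-1] <- lapply(data3[-1], as.numeric)  # numeric instead of integer

sapply(data3, class)  # inspect the column classes after conversion
```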





Using base R:


s <- strsplit(Details, ",")
xtabs(val ~ ., cbind.data.frame(ID = rep(ID, lengths(s)), Details = unlist(s), val = 1))
# Details
# ID V1 V2 V3 V4
# 1 1 1 0 0
# 2 1 0 1 0
# 3 1 0 0 0
# 4 0 1 0 0
# 5 0 0 1 1
# 6 0 1 1 0



The most straightforward way I see would be to build a one-row data.frame for each of the vectors hidden in the strings and bind them together. purrr helps make this quite compact. Note that the ID column isn't needed; I'll work on Details directly.




library(purrr)
df <- map_dfr(strsplit(Details, ","),
              ~data.frame(t(setNames(rep(1, length(.x)), .x))))
df[is.na(df)] <- 0

# V1 V2 V3 V4
# 1 1 1 0 0
# 2 1 0 1 0
# 3 1 0 0 0
# 4 0 1 0 0
# 5 0 0 1 1
# 6 0 1 1 0



You could also split and unlist to get distinct values, and then look them up in the original vector:


unique_v <- unique(unlist(strsplit(Details, ",")))
# set_names() names each element after itself, so the columns come out as V1..V4
map_dfc(set_names(unique_v), ~as.numeric(grepl(.x, Details)))
# # A tibble: 6 x 4
# V1 V2 V3 V4
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 0
# 2 1 0 1 0
# 3 1 0 0 0
# 4 0 1 0 0
# 5 0 0 1 1
# 6 0 1 1 0
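
One caveat for the grepl-based lookups: grepl matches substrings, so a name such as V1 would also be found inside V10. The example data has no such overlap, but a sketch of an exact-match alternative on the split tokens could look like this (the Details vector here is hypothetical, chosen only to show the difference):

```r
# Hypothetical data: "V10" contains "V1" as a substring.
Details <- c("V1,V2", "V10", "V2")
parts <- strsplit(Details, ",", fixed = TRUE)
nms <- unique(unlist(parts))

# Substring matching: row 2 is flagged for V1 even though it only has V10.
by_grepl <- sapply(nms, grepl, Details, fixed = TRUE)

# Exact matching: test each name against the tokens of each row.
by_token <- do.call(rbind, lapply(parts, function(p) as.integer(nms %in% p)))
colnames(by_token) <- nms
```

Here by_grepl[2, "V1"] is TRUE while by_token[2, "V1"] is 0.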



If you know the number of columns, you could also do some dirty string evaluation:


m <- as.data.frame(matrix(0, ncol = 4, nrow = 6))
# Builds and runs one assignment per ID, e.g. m[1, c(1,2)] <- 1
eval(parse(text = paste0("m[", ID, ", c(", gsub("V", "", Details), ")] <- 1")))
m
# V1 V2 V3 V4
# 1 1 1 0 0
# 2 1 0 1 0
# 3 1 0 0 0
# 4 0 1 0 0
# 5 0 0 1 1
# 6 0 1 1 0
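
The same fill can be done without eval(parse()) by indexing a matrix with row/column pairs. This is a sketch under the same assumptions as the answer above: the IDs are the row numbers 1 to 6, and every name is "V" followed by its column number.

```r
ID <- c(1, 2, 3, 4, 5, 6)
Details <- c("V1,V2", "V1,V3", "V1", "V2", "V3,V4", "V2,V3")

parts <- strsplit(Details, ",")
rows <- rep(ID, lengths(parts))                  # row index for each name
cols <- as.integer(sub("V", "", unlist(parts)))  # "V3" -> 3

m <- matrix(0, nrow = 6, ncol = 4, dimnames = list(NULL, paste0("V", 1:4)))
m[cbind(rows, cols)] <- 1                        # set every (row, col) pair at once
m <- as.data.frame(m)
```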





