generate column with capped values in data.table

I have a data.table similar to the following one:
| ID | C1 | C2 | C3 |
| 1 | 1 | 20 | 400 |
| 2 | 0 | 15 | 500 |
| 3 | 0 | 2 | 350 |
| 4 | 1 | 4 | 402 |
| 5 | 1 | 8 | 333 |
And I want to generate a new column "CALC" applying a function like this one:
func <- function(a, b, c){
pol <- a*b-0.01*c
value <- min(max(pol, 0), 5)
}
What I expected it to give me was the value of pol for each row, clamped between 0 and 5. So, instead of
| ID | C1 | C2 | C3 | CALC |
| 1 | 1 | 20 | 400 | 16 |
| 2 | 0 | 15 | 500 | -5 |
| 3 | 0 | 2 | 350 | -3.5 |
| 4 | 1 | 4 | 402 | -0.02 |
| 5 | 1 | 8 | 333 | 4.67 |
it should return
| ID | C1 | C2 | C3 | CALC |
| 1 | 1 | 20 | 400 | 5 |
| 2 | 0 | 15 | 500 | 0 |
| 3 | 0 | 2 | 350 | 0 |
| 4 | 1 | 4 | 402 | 0 |
| 5 | 1 | 8 | 333 | 4.67 |
But instead, what it gives is the following dt:
| ID | C1 | C2 | C3 | CALC |
| 1 | 1 | 20 | 400 | 5 |
| 2 | 0 | 15 | 500 | 5 |
| 3 | 0 | 2 | 350 | 5 |
| 4 | 1 | 4 | 402 | 5 |
| 5 | 1 | 8 | 333 | 5 |
So, if I'm not wrong, it's taking the max of all the values in the column (capped at 5) and putting it in every row of the column. That's not what I intended, which was to get the capped value of each row on its own.
Is there any "easy" way to fix it? The original dt is more complex, so it wouldn't be possible to simply do dt[, CALC := func(C1, C2, C3), by="ID"], as it should be something like by=.(GROUP1, GROUP2, ID), and I'm pretty sure it has to be simpler. Maybe there's an easy way through the function itself?
Thanks a lot for your time.
Grouping with by=seq_len(dt[,.N]) works for the reduced model I'm working with, but I can see that it takes a long time for so few rows (100, versus something like 10 million in the real one). Is there any other possibility?
– sneaky_lobster
Jul 3 at 9:10
check out Onyambu's vectorized func
– chinsoon12
Jul 3 at 9:12
1 Answer
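You need to use pmin and pmax, the element-wise versions of min and max. min and max reduce all of their arguments to a single value, which is then recycled down the whole column: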
func <- function(a, b, c){
  pol <- a*b - 0.01*c
  # pmax/pmin compare element by element, so each row is clamped to [0, 5] on its own
  pmin(pmax(pol, 0), 5)
}
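To see why the original min/max version filled the column with 5, here is a minimal base-R sketch using the sample values from the question. The names collapsed and clamped are made up for illustration and do not appear in the original post.

```r
# Sample columns from the question
C1 <- c(1, 0, 0, 1, 1)
C2 <- c(20, 15, 2, 4, 8)
C3 <- c(400, 500, 350, 402, 333)

pol <- C1 * C2 - 0.01 * C3

# max() and min() reduce all of their arguments to ONE number,
# so this yields a single value for the whole column
collapsed <- min(max(pol, 0), 5)

# pmax() and pmin() compare element by element, recycling the
# scalar bounds 0 and 5, so this yields one value per row
clamped <- pmin(pmax(pol, 0), 5)

print(length(collapsed))  # a single value
print(length(clamped))    # one value per row
```

Assigning the single value to a column recycles it into every row, which matches the all-5 result shown in the question.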
transform(dat, s = func(C1, C2, C3))
  ID C1 C2  C3    s
1  1  1 20 400 5.00
2  2  0 15 500 0.00
3  3  0  2 350 0.00
4  4  1  4 402 0.00
5  5  1  8 333 4.67
setDT(dat)[, CALC := func(C1, C2, C3)]
dat
   ID C1 C2  C3 CALC
1:  1  1 20 400 5.00
2:  2  0 15 500 0.00
3:  3  0  2 350 0.00
4:  4  1  4 402 0.00
5:  5  1  8 333 4.67
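As a self-contained sketch (assuming the data.table package is installed), the two approaches discussed on this page can be compared side by side: the vectorized function applied to whole columns, and the per-row grouping suggested in the comments. The column name CALC_BY_ROW is hypothetical and only there for the comparison.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID = 1:5,
  C1 = c(1, 0, 0, 1, 1),
  C2 = c(20, 15, 2, 4, 8),
  C3 = c(400, 500, 350, 402, 333)
)

# Vectorized clamp: pmax/pmin work element by element
func <- function(a, b, c) {
  pol <- a * b - 0.01 * c
  pmin(pmax(pol, 0), 5)
}

# One call over whole columns, no by= needed
dt[, CALC := func(C1, C2, C3)]

# Per-row grouping with the original min/max logic: same values,
# but the expression is evaluated once per row
dt[, CALC_BY_ROW := min(max(C1 * C2 - 0.01 * C3, 0), 5), by = seq_len(nrow(dt))]

print(dt)
```

Both columns should hold the same values. The per-row version evaluates its expression once for every row, which is likely why it felt slow even on 100 rows, while the vectorized call handles the whole column at once.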
Thanks so much! That's the answer I was looking for.
– sneaky_lobster
Jul 3 at 9:14
maybe by=seq_len(dt[,.N])?
– chinsoon12
Jul 3 at 8:59