Find the second highest value in data frame in R


Find the second highest value in data frame in R



Hello for below data frame in R, may I know the simplest command (without using any additional library like deplyr) how to find the second highest salary and store the name of the employee in a variable named 2nd_high_employee?


EmployeeID EmployeeName Department Salary
----------- --------------- --------------- ---------
1 T Cook Finance 40000.00
2 D Michael Finance 25000.00
3 A Smith Finance 25000.00
4 D Adams Finance 15000.00
5 M Williams IT 80000.00
6 D Jones IT 40000.00
7 J Miller IT 50000.00
8 L Lewis IT 50000.00
9 A Anderson Back-Office 25000.00
10 S Martin Back-Office 15000.00
11 J Garcia Back-Office 15000.00
12 T Clerk Back-Office 10000.00





Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you.
– Sotos
Jul 3 at 6:58





do you need for each department?
– Onyambu
Jul 3 at 7:07




2 Answers
2



Next time you could consider to post a sample of your data using head(dput(x)), to ease SO members to read in your data.


df <- read.table(text = "
EmployeeID EmployeeName Department Salary

1 T Cook Finance 40000.00
2 D Michael Finance 25000.00
3 A Smith Finance 25000.00
4 D Adams Finance 15000.00
5 M Williams IT 80000.00
6 D Jones IT 40000.00
7 J Miller IT 50000.00
8 L Lewis IT 50000.00
9 A Anderson Back-Office 25000.00
10 S Martin Back-Office 15000.00
11 J Garcia Back-Office 15000.00
12 T Clerk Back-Office 10000.00", header = T)


second_high_employee <- tail(sort(df$Salary),2)[1]
second_high_employee
[1] 50000



BTW, it is not possible to start an object name with a number. You could check: ?make.names


?make.names



Also, for for each department you could do:


aggregate(Salary ~ Department, df, function(x) {tail(sort(x), 2)[1]})
Department Salary
1 Back-Office 15000
2 Finance 25000
3 IT 50000



In case there had been 2 top salaries of 80000 and you had wanted to find the second highest of 50000 again, you could have wrapped x or df$Salaray inside tail(sort(unique()), 2)[1]


x


df$Salaray


tail(sort(unique()), 2)[1]





Thanks a lot for the response, it helps, however considering there is only 1 employee with 50K as a salary, how would I print his name considering I use your command to fetch the second highest salary please?
– dipincoolwater
Jul 3 at 8:13





df[df$Salary %in% tail(sort(df$Salary),2)[1],]
– Len
Jul 3 at 8:14


df[df$Salary %in% tail(sort(df$Salary),2)[1],]





You could add $EmployeeName to this if you only want to print the name
– Len
Jul 3 at 8:15


$EmployeeName





Gives me the warning for NA values: Warning message: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' [1] emp_id emp_name salary start_date <0 rows> (or 0-length row.names)
– dipincoolwater
Jul 3 at 9:05






What does df[df$Salary %in% tail(sort(df$Salary),2)[1],] give?
– Len
Jul 3 at 9:14


df[df$Salary %in% tail(sort(df$Salary),2)[1],]



Using Base R: Finding the 2nd highest salary:



if you need the subset without taking into consideration the department:


subset(dat,sort(z<-rank(Salary),T)[2]==z)
EmployeeID EmployeeName Department Salary
7 J Miller IT 50000
8 L Lewis IT 50000



if taking into consideration the department:


unsplit(by(dat,dat$Department,function(x)subset(x,(y<-rank(Salary))==sort(y,T)[2])),rep(1:3,each=2))

EmployeeID EmployeeName Department Salary
10 S Martin Back-Office 15000
11 J Garcia Back-Office 15000
2 D Michael Finance 25000
3 A Smith Finance 25000
7 J Miller IT 50000
8 L Lewis IT 50000



Just for the employee name:


as.character(subset(dat,sort(z<-rank(Salary),T)[2]==z)[,2])
[1] "Miller" "Lewis"





subset(dat,sort(z<-rank(Salary),T)[2]==z) gives the complete row, I just want employee name as a final output..is there a way?
– dipincoolwater
Jul 3 at 9:11





@dipincoolwater check the edit
– Onyambu
Jul 3 at 9:16





Thanks a lot..Cheers!!
– dipincoolwater
Jul 3 at 9:27





lets say I wanna add the salaries of employees in IT department only..what woudl be the query?
– dipincoolwater
Jul 3 at 16:14






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

api-platform.com Unable to generate an IRI for the item of type

PHP contact form sending but not receiving emails

Do graphics cards have individual ID by which single devices can be distinguished?