How to find duplicated Ruby methods with the same name but different code?





The very large Ruby codebase I am working with has many instances of duplicated methods that are defined with the same name but differ in some of their code (causing a large race condition problem). The eventual goal is to reconcile the duplicates and keep just one version of each same-named method. First I need to find all versions of a method that deviate from the "control" version of that method. Is there an optimal way to search for and find all instances of duplicated same-named methods that deviate from one defined version?



The duplicated methods are spread out across hundreds of different files and contained in one class. These are essentially helper methods that should have been centralized in one file but instead have been duplicated and often altered, but keeping the same method name. Right now I just need a good way to locate all the instances where these methods have been duplicated and are different from what the method should be.



I think RuboCop only detects duplicated method names, which is only moderately helpful: it could find 237 methods with the same name, but I don't know how many of those are deviations from my "control" method without manually looking at and comparing them.



Some examples of a method redefined in files across multiple subdirectories:


def get_field(field_name)
  return nil unless field = @global_vars.business.fields.find_by_identifier(field_name)
  field.value.present? ? field.value : nil
end

def get_field(field_name)
  @global_vars.business.fields.find_by_identifier(field_name).try(:value)
end

def get_field(field_name)
  return nil unless field = @company.fields.find_by_identifier(field_name)
  field.value.present? ? field.value : nil
end

def get_field(field_name)
  @property.fields.find_by_identifier(field_name).try(:value)
end
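Once each definition's source text has been extracted (by whatever means), checking it against the control version can start as a plain text comparison with formatting ignored. The following is a minimal sketch; `normalize` and `deviations` are hypothetical helper names, and the only differences it ignores are indentation and blank lines, so variants that differ in wording but not in behaviour would still be reported as deviations.

```ruby
# Hypothetical helper: strip each line and drop blank lines so that
# indentation and vertical spacing do not count as differences.
def normalize(src)
  src.lines.map(&:strip).reject(&:empty?).join("\n")
end

# Hypothetical helper: variants is { location => source_text }.
# Returns only the variants whose normalized text differs from the control.
def deviations(control, variants)
  target = normalize(control)
  variants.reject { |_location, src| normalize(src) == target }
end
```

The keys of `variants` can be anything that identifies a definition, such as a "file:line" string.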



Thanks for your help!





This is quite a vague problem, especially since you mention "race conditions". Are the methods defined in different files? Different classes/modules? Why does the order in which files are loaded affect which method "takes precedence"? Can you provide a Minimal, Complete, and Verifiable example of the problem? There are all sorts of things you could try, but I'm not sure what to suggest, with such little information to go on.
– Tom Lord
Jul 2 at 18:37





Hey @TomLord, the duplicated methods are spread out across hundreds of different files and contained in one class. These are essentially helper methods that should have been centralized in one file but instead have been duplicated over and over and often altered, but keeping the same method name. Right now with this question I just need a good way to locate all the instances where these methods have been duplicated and are different from what the method should be. So it's more of a research and planning phase of dealing with a race condition.
– Toma Nistor
Jul 2 at 18:56





237 methods with the same name in the same Class? Many methods have the same name but different implementations for numerous reasons e.g. Integer#+ and String#+ should obviously differ, but I am not sure why you would have them redefined over and over in the same class.
– engineersmnky
Jul 2 at 18:56






Based on your above comment, what you should do is write a module that contains the "control" method, include it in the class, and remove all the other references regardless of their implementation.
– engineersmnky
Jul 2 at 18:57
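A short sketch of why the other copies have to be removed rather than just supplemented by the module: a method defined in the class itself comes before included modules in the ancestor chain, so any leftover copy in a reopened class silently wins. `FieldHelpers`, `Report` and the simplified `get_field(fields, field_name)` signature are hypothetical stand-ins for the real helpers.

```ruby
# Hypothetical module holding the single "control" version of the helper
# (simplified signature; the real get_field reads instance variables).
module FieldHelpers
  def get_field(fields, field_name)
    fields[field_name]
  end
end

class Report
  include FieldHelpers
end

# Elsewhere (e.g. another file), the same class is reopened with a leftover copy.
# Methods defined in the class take precedence over included module methods.
class Report
  def get_field(fields, field_name)
    fields.fetch(field_name, "n/a")
  end
end

Report.new.get_field({}, :missing)        # => "n/a", not nil
Report.instance_method(:get_field).owner  # => Report
```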





You should clarify by editing your question, not trying to explain in comments.
– Cary Swoveland
Jul 2 at 19:06




1 Answer



My first thought was to execute each file of interest, with additional code added on the fly, to build a directory of methods and their locations. That clearly would not work, however, as exceptions could be expected to be raised almost immediately. Even if exceptions were avoided, there would be no guarantee that the added code would be executed. In addition, there could be unintended adverse consequences of blindly running code.



I think the only reasonable approach would be to parse the files of interest. There may even be gems that do just that. It's certainly worth a search.
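For comparison, Ruby's standard library includes Ripper, a parser whose output does not depend on indentation. The sketch below is one possible alternative, not the approach taken in this answer; `each_def` and `definitions_by_name` are hypothetical names, and it handles only ordinary `def name` and `def receiver.name` definitions, reporting every singleton definition with a `self.` prefix whatever its receiver.

```ruby
require 'ripper'

# Walk a Ripper s-expression and yield [method_name, line] for each
# `def name` (:def) and `def receiver.name` (:defs) node.
def each_def(node, &block)
  return unless node.is_a?(Array)

  case node.first
  when :def  # [:def, name_token, params, body]
    token = node[1]
    yield token[1], token[2][0]
  when :defs # [:defs, receiver, period, name_token, params, body]
    token = node[3]
    yield "self.#{token[1]}", token[2][0] # receiver assumed to be self
  end
  node.each { |child| each_def(child, &block) }
end

# sources is { filename => source_code }.
# Returns { method_name => [[filename, line], ...] }.
def definitions_by_name(sources)
  result = Hash.new { |hash, key| hash[key] = [] }
  sources.each do |fname, src|
    sexp = Ripper.sexp(src) # nil when the source has a syntax error
    raise ArgumentError, "#{fname} has a syntax error" if sexp.nil?
    each_def(sexp) { |name, line| result[name] << [fname, line] }
  end
  result
end
```

Because Ripper reads the code the way Ruby does, one-line definitions such as `def im(n) 2*n end` are found as well.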



I have constructed a method that parses the files to build a hash containing the desired information. The main requirement for its use is that the files are formatted properly; specifically, the keywords class, module and def must be indented by the same number of spaces as their corresponding end keywords. It will therefore miss modules, classes and methods that are defined on a single line, such as the following.




module M; end
class C; end
def im(n) 2*n end
def self.cm(n) 2*n end
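Since such definitions are skipped, it may help to know in advance which lines of a file would be affected. A rough pre-check could flag lines where def, class or module and a matching end appear together; `INLINE_DEF` and `inline_definition_lines` are hypothetical names, and the pattern can give false positives (for example when end appears in a trailing comment).

```ruby
# A def/class/module keyword followed, on the same line, by the word `end`.
# \b keeps identifiers such as `endpoint` or `render_end` from matching.
INLINE_DEF = /^\s*(?:def|class|module)\b.*\bend\b/

# Returns the 1-based line numbers the line-based parser would not track.
def inline_definition_lines(src)
  src.each_line.with_index(1).select { |line, _| line.match?(INLINE_DEF) }.map(&:last)
end
```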



If vertical alignment is a problem, there are certainly gems that format code properly.



I chose a particular hash structure, but once that hash has been constructed it could be modified as desired. For example, I've adopted the hierarchy "instance methods->files->containers" ("containers" being modules, classes and top-level). One could easily modify that hash to change the hierarchy to, say, "container->module methods->files". Alternatively, one could enter the information into a database to maintain flexibility in how it is used.
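As an illustration of such a reshaping, here is a sketch that takes one part of the hash (for example the instance-method part), shaped { method => { file => [containers] } }, and regroups it as { container => { method => [files] } }. The helper name `by_container` is hypothetical.

```ruby
# Hypothetical helper: regroup { method => { file => [containers] } }
# as { container => { method => [files] } }.
def by_container(methods)
  methods.each_with_object({}) do |(meth, files), out|
    files.each do |fname, containers|
      containers.each do |container|
        ((out[container] ||= {})[meth] ||= []) << fname
      end
    end
  end
end

by_container(
  "a2" => { "file1.rb" => ["module M::N", "class A"], "file2.rb" => ["module M::N"] }
)
#=> {"module M::N"=>{"a2"=>["file1.rb", "file2.rb"]}, "class A"=>{"a2"=>["file1.rb"]}}
```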



Code



The following regular expression is used to parse each line of each file of interest.


R = /
    \A                          # match beginning of string
    (?<indent>[ ]*)             # capture zero or more spaces, name 'indent'
    (?:                         # begin non-capture group
      (?<type>class|module)     # capture keyword 'class' or 'module', name 'type'
      [ ]+                      # match one or more spaces
      (?<name>\p{Upper}\w*)     # capture an uppercase letter followed by
                                # >= 0 word chars, name 'name'
    |                           # or
      (?<type>def)              # capture keyword 'def', name 'type'
      [ ]+                      # match one or more spaces
      (?<name>                  # begin capture group named 'name'
        (?:self\.)?             # optionally match 'self.'
        [\p{Lower}_]\w*[?!=]?   # match a lowercase letter or underscore followed by
                                # >= 0 word chars and an optional '?', '!' or '='
      )                         # close capture group 'name'
    |                           # or
      (?<type>end)              # capture keyword 'end', name 'type'
      \b                        # match a word break
    )                           # end non-capture group
    /x                          # free-spacing regex definition mode
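The regular expression reuses the group names 'type' and 'name' in several alternatives. Ruby permits this, and MatchData#[] returns the group of that name that actually took part in the match. A cut-down pattern in the same style shows the behaviour; `KEYWORD` is a hypothetical name used only for this illustration.

```ruby
# Group 'kind' appears in two alternatives; m[:kind] returns whichever matched.
KEYWORD = /\A(?<indent>[ ]*)(?:(?<kind>class|module)[ ]+(?<name>\p{Upper}\w*)|(?<kind>end)\b)/

m = "  module Helpers".match(KEYWORD)
m[:kind]        #=> "module"
m[:name]        #=> "Helpers"
m[:indent].size #=> 2

m = "  end".match(KEYWORD)
m[:kind]        #=> "end"
m[:name]        #=> nil (that group did not take part in the match)

"  endpoint = 1".match(KEYWORD) #=> nil, since \b rejects "endpoint"
```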



The method used for parsing follows.


def find_methods_by_name(files_of_interest)
  files_of_interest.each_with_object({ imethod: {}, cmethod: {} }) do |fname, h|
    stack = []
    File.readlines(fname).each do |line|
      m = line.match R
      next if m.nil?
      indent, type, name = m[:indent].size, m[:type], m[:name]
      case type
      when "module", "class"
        # nested modules/classes get fully qualified names such as "M::N"
        name = stack.any? ? [stack.last[:name], name].join('::') : name
        stack << { indent: indent, type: type, name: name }
      when "def"
        if name =~ /\Aself\./
          stack << { indent: indent, type: :cmethod, name: name[5..-1] }
        else
          stack << { indent: indent, type: :imethod, name: name }
        end
      when "end"
        # only an 'end' aligned with the innermost open keyword closes it
        next if stack.empty? || stack.last[:indent] != indent
        type, name = stack.pop.values_at(:type, :name)
        next if type == "module" || type == "class"
        ((h[type][name] ||= {})[fname] ||= []) << (stack.any? ?
          [stack.last[:type], stack.last[:name]].join(' ') : :main)
      end
    end
    raise StandardError, "stack = #{stack} after processing file '#{fname}'" if stack.any?
  end
end
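find_methods_by_name only needs an array of file paths. In a large codebase, one way to build that array is with Dir.glob; the helper name `ruby_files_under` and the directory names in the usage comment are hypothetical placeholders.

```ruby
# Hypothetical helper: every .rb file beneath the given directories,
# searched recursively, returned in sorted order.
def ruby_files_under(*dirs)
  dirs.flat_map { |dir| Dir.glob(File.join(dir, '**', '*.rb')) }.sort
end

# Example usage (directory names are placeholders):
# files_of_interest = ruby_files_under('app', 'lib')
```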



Example



The files of interest might be, for example, all files in certain directories. In this example we have just two files.


files_of_interest = ['file1.rb', 'file2.rb']



Those files are as follows.


File.write('file1.rb',
<<_)
    def mm
    end
    module M
      def m
      end
      module N
        def self.nm
        end
        def n
        end
        def a2
        end
      end
    end

    class A
      def self.a1c
      end
      def a1
      end
      def a2
      end
    end

    class B
      include M
      def b
      end
    end
_
#=> 327




File.write('file2.rb',
<<_)
    def mm
    end
    module M
      def m
      end
      module N
        def n
        end
        def a2
        end
      end
    end

    module P
      def p
      end
    end

    class A
      include M::N
      def self.a1c
      end
      def a1
      end
    end

    class B
      include P
      def b
      end
    end
_
#=> 335




h = find_methods_by_name(files_of_interest)
#=> {
# :imethod=>{
# "mm"=>{
# "file1.rb"=>[:main],
# "file2.rb"=>[:main]
# },
# "m"=>{
# "file1.rb"=>["module M"],
# "file2.rb"=>["module M"]
# },
# "n"=>{
# "file1.rb"=>["module M::N"],
# "file2.rb"=>["module M::N"]
# },
# "a2"=>{
# "file1.rb"=>["module M::N", "class A"],
# "file2.rb"=>["module M::N"]
# },
# "a1"=>{
# "file1.rb"=>["class A"],
# "file2.rb"=>["class A"]
# },
# "b"=>{
# "file1.rb"=>["class B"],
# "file2.rb"=>["class B"]
# },
# "p"=>{
# "file2.rb"=>["module P"]
# }
# },
# :cmethod=>{
# "nm"=>{
# "file1.rb"=>["module M::N"]
# },
# "a1c"=>{
# "file1.rb"=>["class A"],
# "file2.rb"=>["class A"]
# }
# }
# }



To eliminate methods that are defined only once (in a single container of a single file), we can perform an additional step.


h.transform_values! { |g| g.reject { |k,v| v.size == 1 && v.values.first.size == 1 } }



This removes the instance method p and the class method nm.
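With the single definitions gone, one way to decide where to start reconciling is to rank the remaining methods by how many times they are defined. A sketch, with `duplication_counts` as a hypothetical helper that takes either part of the hash (h[:imethod] or h[:cmethod]):

```ruby
# Hypothetical helper: total definitions per method (summed over files and
# containers), most-duplicated first, ties broken by name.
def duplication_counts(methods)
  methods.map { |name, files| [name, files.values.sum(&:size)] }
         .sort_by { |name, count| [-count, name] }
end

duplication_counts(
  "a2" => { "file1.rb" => ["module M::N", "class A"], "file2.rb" => ["module M::N"] },
  "mm" => { "file1.rb" => [:main], "file2.rb" => [:main] }
)
#=> [["a2", 3], ["mm", 2]]
```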

