Compare filenames to get the latest file using ID and timestamp information, and merge into one file afterwards, in Spark Scala

I have a scenario where data files arrive in Hadoop HDFS at any time of the day, often several times a day, and the corresponding tables are present in Hive. The incoming files follow a special naming convention that includes timestamp information and an ID (separated with “_”).
How can I merge the existing data for each object in its table with the incoming data, making sure that the record from the latest file is the one that gets merged?
I got the filenames in the folder and I can split them. How can I compare the filenames to get the latest file using the ID and timestamp information, and merge everything into one file afterwards?
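import java.io.File
// Split each file name in a local directory into its "_"-separated parts, one array per file.
// (java.io.File sees only the local filesystem; HDFS needs Hadoop's FileSystem.listStatus.)
def getFilenames(fullpath: String): Array[Array[String]] = {
  val files = Option(new File(fullpath).listFiles).getOrElse(Array.empty[File]) // null if not a directory
  files.filter(_.isFile).map(_.getName.split("_"))
}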
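To compare files per ID, the parts of each name have to stay together so the timestamp and ID of the same file can be read side by side. A minimal sketch of "latest file per ID" in plain Scala, assuming (hypothetically) that names start with `<timestamp>_<id>_`, where the timestamp is a fixed-width `yyyyMMddHHmmss` string so its numeric value orders chronologically. `LatestPerId` and `latestFilePerId` are illustrative names, not part of any library.

```scala
// Sketch: for each ID, keep the file name with the newest timestamp.
// ASSUMPTION: names look like <timestamp>_<id>_..., with a fixed-width
// yyyyMMddHHmmss timestamp, so comparing it as a Long is chronological.
object LatestPerId {
  def latestFilePerId(names: Seq[String]): Map[String, String] =
    names
      .map(n => (n, n.split("_")))                     // keep each name with its own parts
      .filter { case (_, parts) =>                     // skip names that do not fit the pattern
        parts.length >= 2 && parts(0).nonEmpty && parts(0).forall(_.isDigit)
      }
      .groupBy { case (_, parts) => parts(1) }         // group by <id>
      .map { case (id, files) =>
        // newest file = largest numeric timestamp within this ID
        val (newest, _) = files.maxBy { case (_, parts) => parts(0).toLong }
        id -> newest
      }
}
```

For example, with the hypothetical names `"20180516064905_012_UTG_TEST.txt"` and `"20180517080000_012_UTG_TEST.txt"`, `LatestPerId.latestFilePerId(...)` returns `Map("012" -> "20180517080000_012_UTG_TEST.txt")`.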
Thanks
2 Answers
Please provide sample filenames and I will be able to give you a precise solution. In the meantime, you can try creating a list of all the filenames and sorting it with sortBy; the last element of the sorted list will be the latest file. Read that file and create an RDD/DataFrame from it.
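A minimal sketch of the "sort, then take the last element" idea, assuming (hypothetically) that each name begins with a fixed-width numeric timestamp followed by `_`. `LatestFile` and `latest` are illustrative names. Names without a numeric prefix are ignored, and `lastOption` returns `None` instead of throwing when the folder is empty.

```scala
// Sketch: newest file overall by sorting on the timestamp prefix.
// ASSUMPTION: names start with a fixed-width yyyyMMddHHmmss timestamp and "_".
object LatestFile {
  // Numeric timestamp prefix of a file name, if it has one.
  private def timestampOf(name: String): Option[Long] = {
    val prefix = name.takeWhile(_ != '_')
    if (prefix.nonEmpty && prefix.forall(_.isDigit)) Some(prefix.toLong) else None
  }

  def latest(names: Seq[String]): Option[String] =
    names
      .flatMap(n => timestampOf(n).map(ts => (ts, n))) // drop names without a timestamp
      .sortBy(_._1)                                     // oldest first
      .lastOption                                       // last element = newest file
      .map(_._2)
}
```

Sorting is the approach described above; `maxBy` would find the same file without sorting the whole list, which only matters for very large directories.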
For combining the DataFrames (previous and current), look into this link.
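One common way to combine "previous" and "current" data is an upsert: keep the previous rows whose key does not appear in the incoming data, then add all incoming rows. The sketch below shows that logic on plain Scala collections, assuming (hypothetically) that rows are keyed by a record ID and that `current` has already been reduced to one row per ID. In Spark DataFrames the same shape is often written as a `"left_anti"` join of previous against current on the key, followed by a `union` with current (the columns must be in the same order). `Upsert`, `Row` and `merge` are illustrative names.

```scala
// Sketch: upsert incoming rows into existing rows, keyed by record ID.
// ASSUMPTION: `current` holds at most one row per ID.
object Upsert {
  final case class Row(id: String, value: String)

  def merge(previous: Seq[Row], current: Seq[Row]): Seq[Row] = {
    val currentIds = current.map(_.id).toSet
    // Previous rows that are not being replaced (the "left anti" part) ...
    val untouched = previous.filterNot(r => currentIds.contains(r.id))
    // ... plus every incoming row (updates and new IDs alike).
    untouched ++ current
  }
}
```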
Thanks Gaurav. The file naming convention is like this:
....<Timestamp>_<ID>_<Filetype>_<Filename>
..../20180516064905_012_UTG_TEST.txt
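Given this convention, a regular expression can split a name into its four fields and filter by file type. A minimal sketch, assuming the timestamp is exactly 14 digits, the ID and file type contain no `_`, and everything after the third `_` is the file name. `FileNameParser`, `FileMeta`, `parse` and `ofType` are illustrative names, and the `/data/in/` path in the test is hypothetical.

```scala
import scala.util.matching.Regex

// Sketch: parse <Timestamp>_<ID>_<Filetype>_<Filename> and keep one file type.
// ASSUMPTIONS: 14-digit timestamp; ID and file type contain no "_".
object FileNameParser {
  final case class FileMeta(timestamp: Long, id: String, fileType: String, name: String, path: String)

  // Pattern matching with a Regex requires the whole string to match.
  private val Pattern: Regex = """(\d{14})_([^_]+)_([^_]+)_(.+)""".r

  // Parses the last path segment; None if the name does not follow the convention.
  def parse(path: String): Option[FileMeta] = {
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    fileName match {
      case Pattern(ts, id, fileType, name) => Some(FileMeta(ts.toLong, id, fileType, name, path))
      case _                               => None
    }
  }

  // Keeps only files of the requested type, e.g. "UTG".
  def ofType(paths: Seq[String], fileType: String): Seq[FileMeta] =
    paths.flatMap(parse).filter(_.fileType == fileType)
}
```

The ID is kept as a `String` so that leading zeros such as `012` survive.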
To be more precise, I have to identify whether a file is of type UTG or some other type, and use the timestamp to update the record with the same ID from the latest file when that record is present in multiple files. For example, all records from UTG files should be compared with the current data to find out which records are applicable for update. If a particular record (for example, record ID 012) is updated multiple times in the source systems and occurs in different upsert files, then during processing the latest version of that record should be used to update the target record.
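The rule "if a record ID appears in several upsert files, only the version from the newest file is applied" can be sketched in plain Scala as a group-by-ID followed by a max-by-timestamp, then an upsert into the target keyed by ID. This assumes (hypothetically) that each incoming record has already been tagged with the timestamp taken from its source file name; if two files share a timestamp, `maxBy` keeps the first one it encounters. In Spark, the "latest per key" step is commonly written with `row_number()` over `Window.partitionBy(...).orderBy(... .desc)`, keeping row 1, and `input_file_name()` can expose each row's source file. `LatestRecordUpsert`, `Rec` and `Incoming` are illustrative names.

```scala
// Sketch: apply only the newest version of each record ID to the target.
// ASSUMPTION: every incoming record carries its source file's timestamp.
object LatestRecordUpsert {
  final case class Rec(id: String, payload: String)
  final case class Incoming(fileTimestamp: Long, rec: Rec)

  // Latest version of each record ID across all incoming files.
  def latestPerRecord(incoming: Seq[Incoming]): Map[String, Rec] =
    incoming
      .groupBy(_.rec.id)
      .map { case (id, versions) => id -> versions.maxBy(_.fileTimestamp).rec }

  // Matching IDs in the target are replaced; unseen IDs are inserted.
  def applyTo(target: Map[String, Rec], incoming: Seq[Incoming]): Map[String, Rec] =
    target ++ latestPerRecord(incoming)
}
```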
I hope it's clear. Thanks once again.