Compare filenames to get the latest file using ID and timestamp information, then merge into one file in Spark Scala




I have a scenario where data files arrive in Hadoop HDFS at any time of day, several times a day, and the tables are present in Hive. The arriving files follow a naming convention that includes timestamp information and an ID (separated by "_").



How can I merge the existing data for each object in its table with the incoming data, making sure the record from the latest file is the one merged?



I have the filenames in the folder and I can split them into their parts. How can I compare the filenames to get the latest file using the ID and timestamp information, and merge into one file afterwards?


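import java.io.File

// List the file names in a directory and split each name on "_".
// Note: flatMap flattens the parts of all file names into one array.
def getFilenames(fullpath: String): Array[String] = {
  val dir = new File(fullpath)
  dir.listFiles.map(_.getName).flatMap(_.split("_"))
}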



Thanks




2 Answers



Please provide sample filenames so I can give you a precise solution. You can try creating a list of all the filenames and sorting it with sortBy. The last element of the list will be the latest file; read that file and create an RDD or DataFrame from it.
For combining the DataFrames (previous and current), see the linked post.



Thanks Gaurav. The file naming convention is:


....<Timestamp>_<ID>_<Filetype>_<Filename>
..../20180516064905_012_UTG_TEST.txt
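Given that naming convention, picking the latest file per ID could be sketched roughly as below. This is a minimal illustration in plain Scala, not a definitive implementation: the names FileNames, FileInfo, parse and latestPerId are hypothetical, and it assumes the timestamp is always a 14-digit yyyyMMddHHmmss value, so that comparing timestamps as strings matches chronological order.

```scala
object FileNames {
  // Hypothetical holder for the four underscore-separated parts of a file name.
  final case class FileInfo(timestamp: String, id: String, fileType: String, name: String)

  // Parse "<Timestamp>_<ID>_<Filetype>_<Filename>".
  // Returns None when the name does not fit the assumed convention.
  def parse(fileName: String): Option[FileInfo] =
    fileName.split("_", 4) match {
      case Array(ts, id, tpe, rest) if ts.length == 14 && ts.forall(_.isDigit) =>
        Some(FileInfo(ts, id, tpe, rest))
      case _ => None
    }

  // For each (ID, file type) pair keep the file with the greatest timestamp.
  // Fixed-width digit strings sort chronologically as plain strings.
  def latestPerId(fileNames: Seq[String]): Map[(String, String), FileInfo] =
    fileNames
      .flatMap(n => parse(n).toList)
      .groupBy(f => (f.id, f.fileType))
      .map { case (key, files) => key -> files.maxBy(_.timestamp) }
}
```

The input would be bare file names (for example from File.getName), not full paths. Keeping the parts of each name together in one value, rather than flattening them, is what makes the per-ID comparison possible.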



To be more precise, I have to identify whether a file is UTG or some other type, and use the timestamp to update a record with the same ID from the latest file if that record is present in multiple files. For example, all records from UTG files should be compared with the current data to find out which records need updating. If the same record (for example, record ID 012) is updated multiple times in the source systems and occurs in different upsert files, then during processing the latest version of that record should be used to update the target record.
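The "latest record wins" rule described here can be illustrated with plain Scala collections. This is only a sketch under assumptions: Upsert, Record and merge are hypothetical names, each record is assumed to carry the timestamp of the file it came from, and the current data is assumed to fit in memory. On Spark DataFrames the same idea is commonly expressed by unioning current and incoming rows and keeping the newest row per ID, for instance with a window partitioned by ID and ordered by timestamp descending.

```scala
object Upsert {
  // Hypothetical record shape: a record ID, a payload, and the
  // timestamp (yyyyMMddHHmmss) of the file the record came from.
  final case class Record(id: String, value: String, fileTimestamp: String)

  // Merge incoming records into the current data.
  // For each record ID, the version with the greatest file timestamp is kept;
  // IDs that appear only on one side are carried over unchanged.
  def merge(current: Seq[Record], incoming: Seq[Record]): Map[String, Record] =
    (current ++ incoming)
      .groupBy(_.id)
      .map { case (id, versions) => id -> versions.maxBy(_.fileTimestamp) }
}
```

With this shape, a record that appears in several upsert files is resolved once, by timestamp, regardless of the order in which the files are read.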



I hope it's clear. Thanks once again.





