Creating star schema from csv files using Python





I have 6 dimension tables, all in the form of csv files. I have to form a star schema using Python. I'm not sure how to create the fact table using Python. The fact table (theoretically) has at least one column that is common with a dimension table.



How can I create the fact table, keeping in mind that quantities from multiple dimension tables should correspond correctly in the fact table?



I am not allowed to reveal the code or the exact data, but here is a small example. File 1 contains the columns student_id, student_name. File 2 contains student_id, department_id, department_name, sem_id. File 3 contains student_id, subject_code, subject_score. All three dimension tables are csv files. I need the fact table to contain student_id, student_name, department_id, subject_code. How can I build the fact table in that form? Thank you for your help.




2 Answers



From some blog posts I have read, handling this in memory in Python does not look like a good approach, but if the post below makes sense to you, you can use it.



Fact Loading



The first step in DW loading is dimensional conformance. With a little cleverness the above processing can all be done in parallel, hogging a lot of CPU time. To do this in parallel, each conformance algorithm forms part of a large OS-level pipeline. The source file must be reformatted to leave empty columns for each dimension's FK reference. Each conformance process reads in the source file and writes out the same format file with one dimension FK filled in. If all of these conformance algorithms form a simple OS pipe, they all run in parallel. It looks something like this.



src2cvs source | conform1 | conform2 | conform3 | load

At the end, you use the RDBMS's bulk loader (or write your own in Python, it's easy) to pick the actual fact values and the dimension FKs out of the source records that are fully populated with all dimension FKs, and load these into the fact table.
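As a rough illustration of the conform-then-load idea applied to the example columns from the question, here is a minimal sketch that uses only the standard library. The sample rows, the function names (read_rows, build_lookup, conform, load) and the choice to drop rows that have no matching dimension record are all assumptions made for the example, not something taken from the quoted post. It also assumes each student_id appears once in File 1 and once in File 2. In real use the in-memory strings would be replaced by the actual csv files.

```python
import csv
import io

# Hypothetical sample data standing in for the three CSV files in the question.
STUDENTS_CSV = """student_id,student_name
1,Asha
2,Ben
"""
DEPARTMENTS_CSV = """student_id,department_id,department_name,sem_id
1,D10,Physics,S1
2,D20,Maths,S2
"""
SCORES_CSV = """student_id,subject_code,subject_score
1,PHY101,78
1,MAT101,85
2,MAT101,91
3,CHE101,60
"""


def read_rows(text):
    """Yield each CSV record as a dict keyed by the header row."""
    yield from csv.DictReader(io.StringIO(text))


def build_lookup(rows, key):
    """Index dimension rows by their key column (assumes the key is unique)."""
    return {row[key]: row for row in rows}


def conform(rows, lookup, key, columns):
    """One pipeline stage: copy `columns` from the matching dimension row.

    Rows without a match get None so they can be rejected at load time.
    """
    for row in rows:
        match = lookup.get(row[key])
        out = dict(row)
        for column in columns:
            out[column] = match[column] if match else None
        yield out


def load(rows, columns):
    """Final stage: keep only fully conformed rows, projected to `columns`."""
    return [
        {column: row[column] for column in columns}
        for row in rows
        if all(row[column] is not None for column in columns)
    ]


students = build_lookup(read_rows(STUDENTS_CSV), "student_id")
departments = build_lookup(read_rows(DEPARTMENTS_CSV), "student_id")

# Chain the stages in the same order the shell pipeline chains processes.
source = read_rows(SCORES_CSV)
stage1 = conform(source, students, "student_id", ["student_name"])
stage2 = conform(stage1, departments, "student_id", ["department_id"])
fact_rows = load(
    stage2, ["student_id", "student_name", "department_id", "subject_code"]
)

for row in fact_rows:
    print(row)
```

The generator stages are lazy, so each source row flows through every stage before the next one is read, but everything runs in a single process. To get the parallelism the post describes, each stage would have to be a separate script reading stdin and writing stdout, connected with OS pipes.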



Would you like to add any code you're currently stuck on? Please add a Minimal, Complete, and Verifiable example including the file content and expected output





I am not allowed to reveal the code or exact data, but I'll add a small example. File 1 contains the following columns: student_id, student_name. File 2 contains : student_id, department_id, department_name, sem_id. Lastly File 3 contains student_id, subject_code, subject_score. The 3 dimension tables are in the form of csv files. I now need the fact table to contain: student_id, student_name, department_id, subject_code. How can I form the fact table in that form? Thank you for your help.
– pack24
Jul 3 at 9:58






@AkshayVenkatesh Please edit your question to include that information. A comment to an answer which is not an answer is not the appropriate place for this.
– Adrian W
Jul 3 at 11:11





@AdrianW I have edited my question. However, downvoting because I have not made an attempt or shown code is irrational, since I have mentioned that I am strictly not allowed to disclose any code or data. I have made an attempt, but I am not allowed to disclose any information.
– pack24
Jul 3 at 11:42





@pack24 Please read how to ask. Don't assume everybody can guess what your problem is without background information. That's why an MCVE is usually required. If you can't provide that information because it is confidential, then it is probably not a good idea to ask in a public place for help. If you can provide something similar enough to describe your problem without violating your confidentiality constraints, that's OK. Therefore I have revoked my downvote now.
– Adrian W
Jul 3 at 15:40





