Top 10 things you must do as an ETL Developer
Every resource who do programming to process data in an Integration environment should follow this top 10 thing to become more productivity.
1. Understand the overall objective of the task.
- Are you loading data from source to staging OR staging to DW or DW to some other system. In what part of the pipeline / ETL you are working.
2. Understanding the source and source type (database, API, XML, JSON, flat file etc)
- Where does the source exists? Is it in the file system or its an RDBMS. Is it coming from enterprise system or coming from external source.
3. Visualize / Collect the sample data from source
- Visualize the sample data, if there is no sample data, create the test data which is equivalent to the realistic data by taking the help of subject matter expert if available.
4. Frequency of the source data generation and acceptance
- What is the frequency of data the source generate and how you identify the type of extract we should apply on the source data.
5. Understand the target database / files
-  What is the target file or database structure looks like? What is the PK of the target tables or how we uniquely identify the records in the file.
6. Visualize the data based on the sample source data
- Based on the sample input data, you have to apply business rules so that transformed data can be visualized in target structure or files.
7. Create mapping document (contains the source system, source file / table, target system, target file / table, what business rule you apply on the columns ).
- Use an excel sheet or use the standard mapping document according to the company you are working for. Make sure you have columns you identified as source and the target file / database columns and make sure you have the business rules documented as much as possible in the mapping document. By seeing the mapping document, you should know the work we have to do as ETL developer with the tool / program.
8. Identify the gaps between the source and required data in target
- After completing the mapping document, you should be able to figure out how much of columns we are yet to identify. By using this mapping document, we can sit with any one in the team to get it clarified so that one person can take the responsibility to close the same.
9. Develop the ETL job
- Use all the artifacts you collected till now and start creating the ETL job based on the mapping document.
10. Unit test the ETL
- Create the unit test document and make sure all the possible combinations of business rules. Identify the different ways of testing, row count testing or value testing according to the transformations we apply.
By applying the above-mentioned points, we should be able to make sure we develop good ETL programs with less issues. Happy Coding.