In this post we will walk through an end-to-end roadmap for Data Engineering / Big Data.
End-to-End Data Engineering Roadmap:
Prerequisites:
—————-
1. Basic Linux commands.
2. Programming fundamentals.
3. SQL (very important – a quick refresher follows below).
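To make the SQL point concrete, here is a minimal sketch of the query pattern you should be fluent in (filter, group, aggregate, sort), run through Python's built-in sqlite3 module so it is fully self-contained. The orders table and its columns are made up purely for illustration.

    import sqlite3

    # In-memory database so the example is fully self-contained.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
        INSERT INTO orders VALUES
            (1, 'alice', 120.0), (2, 'bob', 80.0), (3, 'alice', 50.0);
    """)

    # The bread-and-butter pattern: filter, group, aggregate, sort.
    rows = conn.execute("""
        SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
        FROM orders
        WHERE amount > 40
        GROUP BY customer
        ORDER BY total DESC
    """).fetchall()

    for customer, n_orders, total in rows:
        print(customer, n_orders, total)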
You should then learn the following:
————————————–
1. Distributed Computing Fundamentals (a toy map/reduce sketch follows this list)
2. Data Lake – HDFS/Amazon S3 (see the S3 sketch below)
3. One data ingestion tool (e.g., Sqoop)
4. Data Warehousing (DWH) concepts
5. One NoSQL database (see the MongoDB sketch below)
6. Functional programming – Scala/Python (see the map/filter/reduce sketch below)
7. In-memory computation using Apache Spark (see the PySpark sketch below)
8. Structured Streaming with Kafka for dealing with real-time data (see the streaming sketch below)
9. One cloud platform – AWS/Azure/GCP
10. Integration of the various components
11. One scheduling/monitoring tool – Airflow (see the DAG sketch below)
12. Do a few projects (at least 2-3) to get a good feel for the whole stack.
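First, a toy illustration of the core distributed-computing idea: partition the data, process the partitions in parallel, then combine the results. This uses Python's multiprocessing on a single machine purely as an analogy for what HDFS and Spark do across a cluster; the chunks are made-up data.

    from multiprocessing import Pool

    def count_words(chunk):
        # "Map" step: each worker processes its own partition independently.
        return len(chunk.split())

    if __name__ == "__main__":
        # Partitions of a dataset, processed in parallel, then combined:
        # the same map/reduce shape that big data frameworks scale out.
        chunks = ["the quick brown fox", "jumps over", "the lazy dog"]
        with Pool(processes=3) as pool:
            counts = pool.map(count_words, chunks)   # parallel map
        print(sum(counts))                           # reduce step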
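Next, a minimal data-lake sketch using boto3, the AWS SDK for Python. The bucket and key names are hypothetical, and it assumes AWS credentials are already configured in your environment.

    import boto3

    # Hypothetical bucket/key names; assumes AWS credentials are set up
    # (e.g., via environment variables or ~/.aws/credentials).
    s3 = boto3.client("s3")

    # Upload a local file into the data lake's raw zone.
    s3.upload_file("events.json", "my-datalake-bucket", "raw/2024/01/events.json")

    # List what landed under the raw/ prefix.
    resp = s3.list_objects_v2(Bucket="my-datalake-bucket", Prefix="raw/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])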
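For NoSQL, here is a small sketch using MongoDB via pymongo. It assumes a MongoDB server on localhost, and the database/collection names are illustrative; the point is the flexible, schemaless document model.

    from pymongo import MongoClient

    # Assumes a local MongoDB server; db/collection names are illustrative.
    client = MongoClient("mongodb://localhost:27017/")
    events = client["demo_db"]["events"]

    # Schemaless inserts: documents in one collection can differ in shape.
    events.insert_one({"user": "alice", "action": "click", "ts": 1700000000})
    events.insert_one({"user": "bob", "action": "purchase", "amount": 49.9})

    # Query by field, with no JOINs or fixed schema required.
    for doc in events.find({"user": "alice"}):
        print(doc)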
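A quick look at the functional style in Python. The mental model of pure transformations over collections (map, filter, reduce) is exactly what carries over to Spark; the numbers here are arbitrary sample data.

    from functools import reduce

    amounts = [120.0, 80.0, 50.0, 200.0]

    # Pure transformations instead of mutating loops: the same model
    # Spark uses for its RDD/DataFrame operations.
    taxed = list(map(lambda x: x * 1.18, amounts))       # transform
    large = list(filter(lambda x: x > 100, taxed))       # select
    total = reduce(lambda acc, x: acc + x, large, 0.0)   # aggregate

    print(taxed, large, total)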
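A minimal PySpark sketch of in-memory computation. It builds a tiny DataFrame inline so it runs anywhere Spark is installed; in a real pipeline you would read from the data lake instead. The aggregation is also expressed in SQL, which ties into the bonus tip below.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("demo").getOrCreate()

    # Tiny inline DataFrame; in practice you would read from the lake,
    # e.g. spark.read.parquet("s3a://my-datalake-bucket/raw/").
    df = spark.createDataFrame(
        [("alice", 120.0), ("bob", 80.0), ("alice", 50.0)],
        ["customer", "amount"],
    )

    # cache() keeps the data in executor memory across repeated actions.
    df.cache()

    # The same logic expressed as SQL, illustrating the "SQL way" trend.
    df.createOrReplaceTempView("orders")
    spark.sql("""
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    """).show()

    spark.stop()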
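A Structured Streaming sketch that reads from Kafka. It assumes a broker at localhost:9092 and a topic named events, and running it also requires the spark-sql-kafka connector package on the Spark classpath.

    from pyspark.sql import SparkSession

    # Assumes a Kafka broker at localhost:9092 and an "events" topic.
    spark = SparkSession.builder.appName("stream-demo").getOrCreate()

    stream = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "events")
             .load()
    )

    # Kafka records arrive as binary key/value; cast to strings to inspect.
    parsed = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    # Micro-batch output to the console; in production the sink would be
    # a data-lake path or another Kafka topic instead.
    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()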
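Finally, a minimal Airflow DAG sketch (Airflow 2.4+ syntax). The dag_id, schedule, and task bodies are placeholders; the point is how tasks and their dependencies are declared.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from the source system")   # placeholder task body

    def load():
        print("write data into the warehouse")      # placeholder task body

    # Hypothetical daily pipeline with two dependent tasks.
    with DAG(
        dag_id="daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_load  # load runs only after extract succeeds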
If you have 8+ years of experience, focus more on performance tuning and design/architecture aspects.
If you are targeting top product-based companies, Data Structures and Algorithms are also required to some extent; arrays, linked lists, and trees should be enough for the preparation.
Remember, don’t learn just to prepare for interviews; your objective should also be to work effectively on projects. Focusing on internals and fundamentals serves both goals, and it is the best way to become interview-ready.
Bonus Tip – The Big Data world is moving towards the SQL way of doing things (Spark SQL, Hive, Presto/Trino). Things are getting a lot easier, and this trend will continue.
Reference: Data Engineering RoadMap