In this post, we will look at common theoretical interview questions that companies ask data engineers.
Let’s see the questions:
Basic Level Interview Questions:
- Explain the Hadoop architecture.
- What are the 5 V’s of big data?
- What is the default number of replicas in Hadoop? Can you increase or decrease it?
- What is the difference between Hadoop (Gen1) and Hadoop (Gen2)?
- What are heartbeats in Hadoop? Why are they important?
- Write down a few Linux commands.
- What are partitioning, shuffling, and sorting in MapReduce?
- What is a Record Reader?
- Explain the Sqoop eval command.
- Explain the different optimizations used in Sqoop.
- Explain the Combiner in MapReduce.
- What is YARN? Why is it used?
- What are the features of Sqoop? Explain their significance.
- Explain the boundary values query in Sqoop. Explain the formula.
- Explain the different modes available in Sqoop that are used in job execution.
- What is the difference between the target directory and the warehouse directory in Hive?
- What is the split-by option in Sqoop? When is it used?
- Explain the Hive architecture.
- Explain transactional processing vs. analytical processing.
- What is the difference between Hive and an RDBMS?
- What is seek time in Hive?
- What is the difference between SQL and HQL?
- Explain the UDF concept in Hive. How many types are there?
- What are views in Hive?
- Explain managed tables and external tables.
- Explain the Spark architecture.
- What are transformations and actions? Give some examples.
Intermediate Level Interview Questions:
- Explain the different optimizations in Hive.
- Explain the types of joins.
- What is a map-side join? What are the bucket map join and the sort merge bucket (SMB) join?
- Explain SCD types in Hive.
- Explain file formats in Hive.
- Explain the CAP theorem.
- Explain RDDs. What is the difference between an RDD, a DataFrame, and a Dataset?
- Define broadcast in Spark.
- Explain the Catalyst optimizer.
- What is the difference between client mode and cluster mode?
- Explain cache and persist in Spark.
- Explain Spark performance optimizations.
- Explain Spark accumulators.
This article is based on the following LinkedIn post: Post
Thank you for reading this post.