As data professionals, one of the most important aspects of our job is ensuring that data is accurate, timely, and accessible for analysis. Two common approaches to data integration are ETL (Extract, Transform, Load), where data is transformed before it is loaded into the target system, and ELT (Extract, Load, Transform), where raw data is loaded first and transformed inside the target system.
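The difference between the two is purely the ordering of the steps. A minimal sketch in Python makes this concrete; the `extract`, `transform`, and `load` functions here are hypothetical placeholders, not a real pipeline framework:

```python
# Toy illustration of ETL vs. ELT ordering. The extract/transform/load
# helpers are made-up placeholders standing in for real pipeline stages.

def extract():
    # Pull raw records from a source system (here, hard-coded sample rows).
    return [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "20"}]

def transform(rows):
    # Clean whitespace and cast types.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def load(rows, warehouse):
    # Append rows into the target store (here, a plain Python list).
    warehouse.extend(rows)
    return warehouse

# ETL: transform happens BEFORE the data lands in the warehouse.
etl_warehouse = load(transform(extract()), [])

# ELT: raw data is loaded first, then transformed INSIDE the warehouse.
elt_warehouse = transform(load(extract(), []))
```

Both orderings end with the same clean rows; the practical difference is where the transformation compute runs (an external ETL server versus the warehouse itself).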
Category: Big Data
IBM says that if data can be characterized by these 3 V's (Volume, Variety, and Velocity), then it is called Big Data.
Volume: Scale of data.
Variety: Different forms of data.
Velocity: Speed at which new data is generated.
Why Do We Need Big Data Technology?
- To process huge amounts of data that traditional systems (like your PC/laptop) are not capable of processing.
- Before we can process huge amounts of data, we first need to store it.
Example: Suppose we need to store 150 TB of data. Can a traditional system or laptop with 1 TB of capacity store that much? No, it cannot.
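The storage arithmetic above can be sketched in a few lines of Python. Note the replication factor is an assumption for illustration (3 is the HDFS default, so distributed storage needs even more raw capacity than the dataset size suggests):

```python
import math

dataset_tb = 150   # dataset we need to store, from the example above
disk_tb = 1        # capacity of a single laptop/PC disk
replication = 3    # assumed replication factor (HDFS defaults to 3 copies)

# Each machine contributes disk_tb of raw capacity, but every block is
# stored `replication` times, so the required machine count scales up.
machines_needed = math.ceil(dataset_tb * replication / disk_tb)
print(machines_needed)  # 450 one-terabyte machines
```

A single 1 TB laptop clearly cannot hold the dataset; a cluster of commodity machines, coordinated by big data technology, can.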
Introduction To Big Data
FORMAL DEFINITION GIVEN BY IBM IS:
Any data which is characterized by the 3 V's is termed "Big Data".
These are:
1) Volume
2) Variety
3) Velocity
End to End Data Engineering Roadmap
Prerequisites:
—————-
1. Basic Linux commands.
2. Programming fundamentals.
3. SQL is very important.
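Since SQL is the most important prerequisite, a quick self-check is to write a basic aggregation query. A minimal sketch using Python's built-in sqlite3 module (the `orders` table and its rows are made up for illustration):

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Alice", 100), ("Bob", 50), ("Alice", 75)],
)

# A typical fundamentals exercise: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('Alice', 175), ('Bob', 50)]
conn.close()
```

If GROUP BY, JOIN, and window functions feel comfortable, the SQL prerequisite is in good shape.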
How to prepare for the Databricks Certified Associate Developer for Apache Spark exam?
In this post we will see the preparation strategy for Databricks Certified Associate Developer for Apache Spark Exam.
This certification exam assesses your understanding of the Spark DataFrame/SQL API and your ability to apply it to complete basic data manipulation tasks within a Spark session.
Common Data Engineer Theoretical Interview Questions
In this post, we will see common Data Engineer theoretical interview questions asked at companies. Let's see the questions: