This blog is based on actual questions asked in a TCS Data Engineer Interview (Round 1) and, more importantly, explains what interviewers really expect behind each question. If you are preparing for TCS or similar data engineering roles, this guide will help you align your answers with practical thinking rather than theoretical explanations.
25 Most Commonly Asked AWS Redshift Interview Questions (Big Data)
Amazon Redshift is one of the most popular cloud data warehouses used in modern big-data architectures. Whether you are building ELT pipelines, performing analytics at scale, or optimizing workloads, Redshift is a crucial skill for Data Engineers.
In this article, I’ve compiled the 25 most frequently asked AWS Redshift interview questions, along with answers.
Databricks | Building an ETL Pipeline on Road Accident Data Using PySpark
When I started learning data engineering, I always wanted to try a real-world dataset instead of just “toy” examples. So I picked up the India Road Accident Dataset from Kaggle and built a complete ETL pipeline using PySpark and Delta Lake.
Note: This project is a sample ETL pipeline I built for learning and practice. It’s not production-ready, but it’s a great way to understand how raw data becomes analytics-ready data step by step.
In this blog, I’ll walk you through how I designed the pipeline using the Medallion Architecture (Bronze → Silver → Gold). Don’t worry if the terms sound heavy; I’ll explain everything in plain English.
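To make the flow concrete, here is a minimal sketch of what the three layers can look like in PySpark with Delta Lake. The file paths and column names below are placeholders, not the exact ones from the project, and the code assumes a Databricks-style environment where Delta Lake is available:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("road-accidents-etl").getOrCreate()

# Bronze: ingest the raw CSV as-is (path and schema are hypothetical)
raw = spark.read.option("header", True).csv("/mnt/raw/india_road_accidents.csv")
raw.write.format("delta").mode("overwrite").save("/mnt/bronze/accidents")

# Silver: clean and standardize (the column names here are assumptions)
bronze = spark.read.format("delta").load("/mnt/bronze/accidents")
silver = (bronze
          .dropDuplicates()
          .withColumn("accident_date", F.to_date("accident_date", "yyyy-MM-dd"))
          .filter(F.col("state").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/silver/accidents")

# Gold: aggregate into an analytics-ready table
gold = silver.groupBy("state").agg(F.count("*").alias("total_accidents"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/accidents_by_state")
```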
How to Become a Data Engineer from a Non-Technical Background: A Step-by-Step Guide
Are you interested in transitioning into data engineering, even though your background is not in technology? You’re not alone. Many people from fields like business, healthcare, or the arts dream of harnessing the power of data but worry that their lack of technical experience will hold them back. The good news: breaking into data engineering is absolutely possible—with a roadmap and determination.
PySpark | How to Sort a DataFrame?
Sorting data is a fundamental task in data processing, whether for analysis, reporting, or data transformation. In PySpark, sorting a DataFrame is a common operation that allows you to organize your data based on one or more columns. PySpark provides multiple ways to sort data efficiently, even when dealing with large datasets distributed across clusters.
In this blog post, we’ll explore various methods to sort a DataFrame in PySpark, covering ascending and descending order, sorting by multiple columns, and handling null values.
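As a quick preview, here is a minimal sketch of the main sorting options on a tiny hypothetical DataFrame:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sort-example").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34), ("Bob", None), ("Cara", 29)],
    ["name", "age"],
)

# Ascending sort (the default)
df.orderBy("age").show()

# Descending on one column, ascending on another
df.orderBy(F.col("age").desc(), F.col("name").asc()).show()

# Control where nulls land during sorting
df.orderBy(F.col("age").desc_nulls_last()).show()
```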
PySpark | How to Filter Data in DataFrame?
Filtering data is one of the most common operations you’ll perform when working with PySpark DataFrames. Whether you’re analyzing large datasets, preparing data for machine learning models, or performing transformations, you often need to isolate specific subsets of data based on certain conditions. PySpark provides several methods for filtering DataFrames, and this article will explore the most widely used approaches.
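For a taste of what’s ahead, here is a minimal sketch of filter() and where() on a small hypothetical DataFrame:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("filter-example").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34, "IN"), ("Bob", 45, "US"), ("Cara", 29, "IN")],
    ["name", "age", "country"],
)

# filter() and where() are interchangeable
df.filter(F.col("age") > 30).show()
df.where(df.country == "IN").show()

# Combine conditions with & and |, or use a SQL-style string
df.filter((F.col("age") > 30) & (F.col("country") == "IN")).show()
df.filter("age > 30 AND country = 'IN'").show()
```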
PySpark | How to Add a New Column in a DataFrame?
In PySpark, adding a new column to a DataFrame is a common and essential operation, often used for transforming data, performing calculations, or enriching the dataset. PySpark offers three main methods for this: withColumn(), select(), and selectExpr(). These methods all allow you to create new columns, but they serve different purposes and are used in different contexts.
This article will guide you through adding new columns using each of these methods, explaining their use cases and providing examples.
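Here is a minimal sketch of the three methods side by side, using a tiny hypothetical DataFrame:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("add-column-example").getOrCreate()

df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])

# withColumn(): add or replace a single column
df1 = df.withColumn("age_plus_one", F.col("age") + 1)

# select(): project existing columns plus a new expression
df2 = df.select("*", (F.col("age") * 12).alias("age_in_months"))

# selectExpr(): the same idea, written as SQL expression strings
df3 = df.selectExpr("*", "upper(name) AS name_upper")

df1.show()
df2.show()
df3.show()
```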
PySpark | How to Create a DataFrame?
In PySpark, a DataFrame is a distributed collection of data organized into named columns, similar to a table in a relational database or an Excel spreadsheet. DataFrames provide a powerful abstraction for working with structured data, offering ease of use, high-level transformations, and optimization features like Catalyst and Tungsten. This article will cover how to […]
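As a quick preview, here is a minimal sketch of two common ways to create a DataFrame; the CSV path below is just a placeholder:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("create-dataframe-example").getOrCreate()

# From an in-memory list with an explicit schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], schema)

# From a file (the path is hypothetical)
csv_df = spark.read.option("header", True).csv("/tmp/people.csv")

df.printSchema()
df.show()
```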
PySpark | How to Create an RDD?
Resilient Distributed Datasets (RDDs) are the core abstraction in PySpark, offering fault-tolerant, distributed data structures that can be operated on in parallel. Although the DataFrame API is more popular due to its higher-level abstractions, RDDs are still fundamental for certain low-level operations and are the building blocks of PySpark.
In this article, you’ll learn how to create RDDs in PySpark, the different ways to create them, and when you should use RDDs over DataFrames.
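Here is a minimal sketch of the most common ways to create an RDD; the text file path is just a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-example").getOrCreate()
sc = spark.sparkContext

# From a Python collection
rdd = sc.parallelize([1, 2, 3, 4, 5], numSlices=2)

# From a text file (the path is hypothetical)
lines = sc.textFile("/tmp/sample.txt")

# A simple low-level transformation and action
squares = rdd.map(lambda x: x * x)
print(squares.collect())
```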
PySpark | How to Create a Spark Session?
Creating a Spark session is the first step when working with PySpark, as it allows you to interact with Spark’s core functionality. This article will walk you through the process of creating a Spark session in PySpark.
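Here is a minimal sketch of a typical local Spark session; the app name and config values are only examples you would adapt to your own environment:

```python
from pyspark.sql import SparkSession

# getOrCreate() returns the existing session if one is already running
spark = (SparkSession.builder
         .appName("my-first-app")
         .master("local[*]")  # run locally on all cores; point to a cluster instead if you have one
         .config("spark.sql.shuffle.partitions", "8")  # example setting, tune for your workload
         .getOrCreate())

print(spark.version)
spark.stop()
```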