Spark | How to Set Up Apache Spark on a Windows Machine?

Setting up Apache Spark on a Windows machine can be a straightforward process if you follow the right steps. This guide will walk you through installing Java, configuring environment variables, downloading and setting up Spark, and finally running Spark on your Windows system. Let’s get started!

Step 1: Install Java -> Before installing Spark, you need Java on your system, because Spark runs on the Java Virtual Machine (JVM). Follow these steps to set up Java:

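a) Download Java: Go to the Java download page (https://www.oracle.com/java/technologies/downloads/) and download a Java Development Kit (JDK) installer for Windows. Pick a Java version that your Spark release supports rather than simply the newest one: Spark 3.5.x, the version used in this guide, runs on Java 8, 11, or 17.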

b) Install Java: Run the downloaded installer and follow the installation instructions. By default, the JDK is installed at C:\Program Files\Java\jdk-<version>.

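c) Set Up Environment Variables for Java (search for "Edit the system environment variables" in the Start menu, then click Environment Variables...):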

  • i) JAVA_HOME: Set this to the installation directory, i.e., C:\Program Files\Java\jdk-<version>.

  • ii) Path: Add C:\Program Files\Java\jdk-<version>\bin to your system's PATH environment variable.
  • iii) Verify Java Installation: Open a new Command Prompt (windows that were already open do not see the updated variables) and type java -version. You should see the Java version information if the installation is correct.
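
If you would rather double-check the Java variables from a script than by eye, here is one optional way to do it. This is a minimal sketch, not part of the original setup. It assumes Python 3 is installed, and check_java, path_entries and expand_windows_vars are hypothetical helper names. It reports whether JAVA_HOME is set and whether JAVA_HOME\bin appears on PATH, whether that entry is written out literally or as %JAVA_HOME%\bin.

```python
import ntpath  # Windows path rules; also importable on other systems
import os
import re

# Hypothetical helpers written for this guide; not part of Java or Spark.


def expand_windows_vars(value, env):
    """Expand %NAME% references using `env`; unknown names stay as written."""
    return re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1), m.group(0)), value)


def path_entries(env):
    """Split a ';'-separated Windows PATH into expanded, normalized entries."""
    entries = []
    for part in env.get("PATH", "").split(";"):
        part = part.strip().strip('"')
        if part:
            expanded = expand_windows_vars(part, env)
            # normcase lowercases on Windows rules, so comparisons ignore case
            entries.append(ntpath.normcase(ntpath.normpath(expanded)))
    return entries


def check_java(env):
    """Return a list of problems with the Java variables; empty means OK."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    java_bin = ntpath.join(java_home, "bin")
    if ntpath.normcase(ntpath.normpath(java_bin)) not in path_entries(env):
        return [f"{java_bin} is not on PATH"]
    return []


if __name__ == "__main__":
    problems = check_java(os.environ)
    print("\n".join(problems) if problems else "Java variables look OK")
```

Run it from a newly opened Command Prompt so it sees the updated variables. The path comparison ignores case and trailing backslashes, which matches how Windows resolves these paths.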

Step 2: Download and Set Up Apache Spark -> Now that Java is installed, let’s proceed with Spark installation:

a) Download Apache Spark: Visit the Apache Spark downloads page (https://spark.apache.org/downloads.html) and download the latest version of Spark (Spark 3.5.2 at the time of writing). The download is a .tgz archive. Extract it, for example with 7-Zip, to a directory of your choice, such as D:\spark_setup\spark-3.5.2.
b) Set Up Environment Variables for Spark:

  • i) SPARK_HOME: Set this to your Spark directory, i.e., D:\spark_setup\spark-3.5.2.
  • ii) Path: Add %SPARK_HOME%\bin to your system’s PATH environment variable.
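
One easy mistake is pointing SPARK_HOME at the wrong folder level, for example at its bin subfolder. The minimal sketch below again assumes Python 3 and uses a hypothetical check_spark_home helper. It checks that SPARK_HOME contains bin\spark-shell.cmd, the Windows launcher in Spark's bin folder. The file-existence test is passed in as a parameter so the logic can be tried without touching the disk.

```python
import ntpath  # Windows path rules; also importable on other systems
import os


def check_spark_home(spark_home, exists=os.path.isfile):
    """Return a list of problems with a SPARK_HOME value; empty means OK.

    Hypothetical helper for this guide, not part of Spark. `exists` is
    injectable so the check can run against a fake file list.
    """
    if not spark_home:
        return ["SPARK_HOME is not set"]
    launcher = ntpath.join(spark_home, "bin", "spark-shell.cmd")
    if exists(launcher):
        return []
    # SPARK_HOME set to the bin folder itself: the launcher is one level up.
    if ntpath.basename(ntpath.normpath(spark_home)).lower() == "bin":
        return [f"SPARK_HOME should be the parent of {spark_home}, not bin itself"]
    return [f"{launcher} not found; is SPARK_HOME the extracted Spark folder?"]


if __name__ == "__main__":
    issues = check_spark_home(os.environ.get("SPARK_HOME"))
    print("\n".join(issues) if issues else "SPARK_HOME looks OK")
```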

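Step 3: Set Up WinUtils -> Spark uses Hadoop's file-system libraries, and on Windows those libraries need a few native helper binaries, even though we won't be running Hadoop itself. winutils.exe is one such binary: it lets Spark, through those libraries, work with the Windows file system (for example, when setting file permissions):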

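a) Download WinUtils: Download winutils.exe from this GitHub repository (https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin/winutils.exe). Place winutils.exe in a directory, for example, D:\spark_setup\hadoop\bin. Note that this build targets Hadoop 3.0.0. If you run into errors, a winutils.exe built for the Hadoop version your Spark package was pre-built for may work better.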
b) Set Up Environment Variables for Hadoop:

  • i) HADOOP_HOME: Set this to the folder that contains the bin directory holding winutils.exe, i.e., D:\spark_setup\hadoop (not D:\spark_setup\hadoop\bin).
  • ii) Path: Add %HADOOP_HOME%\bin to your system’s PATH environment variable.
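
Because winutils.exe lives in the bin subfolder, HADOOP_HOME must name the folder above it. The following minimal sketch assumes Python 3 and uses a hypothetical check_hadoop_home helper. It looks for %HADOOP_HOME%\bin\winutils.exe. If it finds winutils.exe directly inside HADOOP_HOME instead, it suggests the parent folder.

```python
import ntpath  # Windows path rules; also importable on other systems
import os


def check_hadoop_home(hadoop_home, exists=os.path.isfile):
    """Return a list of problems with HADOOP_HOME; empty means OK.

    Hypothetical helper for this guide, not part of Hadoop or Spark.
    """
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    expected = ntpath.join(hadoop_home, "bin", "winutils.exe")
    if exists(expected):
        return []
    # winutils.exe sits directly in HADOOP_HOME: it was set to the bin folder.
    if exists(ntpath.join(hadoop_home, "winutils.exe")):
        parent = ntpath.dirname(ntpath.normpath(hadoop_home))
        return [f"HADOOP_HOME points at the bin folder; set it to {parent}"]
    return [f"winutils.exe not found at {expected}"]


if __name__ == "__main__":
    issues = check_hadoop_home(os.environ.get("HADOOP_HOME"))
    print("\n".join(issues) if issues else "HADOOP_HOME looks OK")
```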

Step 4: Verify the Spark Installation -> To verify that Spark has been set up correctly:

a) Open a new Command Prompt: Type spark-shell and press Enter. If everything is set up correctly, you should see the Spark shell start up with a welcome message indicating the Spark version, followed by a scala> prompt. Type :quit to exit the shell.

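
To check the installed version without opening the interactive shell, for example from a script, the sketch below runs spark-shell --version and pulls the version number out of the banner. It assumes Python 3.7 or later. parse_spark_version and installed_spark_version are hypothetical names. The banner layout may differ between Spark releases, so the pattern only looks for the word "version" followed by a number, and it skips the "Scala version" line.

```python
import re
import shutil
import subprocess

# Hypothetical helpers for this guide; not part of Spark.


def parse_spark_version(banner):
    """Extract Spark's x.y.z version from banner text, ignoring 'Scala version'."""
    match = re.search(r"(?<!Scala )version\s+(\d+\.\d+\.\d+)", banner)
    return match.group(1) if match else None


def installed_spark_version():
    """Run `spark-shell --version` and return the reported version, or None.

    shutil.which honours PATHEXT on Windows, so it resolves spark-shell.cmd.
    """
    launcher = shutil.which("spark-shell")
    if launcher is None:
        return None
    result = subprocess.run([launcher, "--version"], capture_output=True, text=True)
    # The banner may be written to stderr rather than stdout, so search both.
    return parse_spark_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_spark_version() or "spark-shell not found or no version reported")
```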

Additional Resources:
For a detailed visual guide on setting up Spark, you can refer to this YouTube video (https://www.youtube.com/watch?v=FIXanNPvBXM&t=312s).
For a detailed visual guide on setting up Java, you can refer to this YouTube video (https://www.youtube.com/watch?v=SQykK40fFds).

By following these steps, you should have a fully functional Spark setup on your Windows machine. Now, you can start working on your data processing tasks using Spark!
