Spark Installation Guide (Windows)

This is a short guide on how to install the Hadoop add-on for Apache Spark manually.

Running PoolParty on Windows together with the Semantic Classifier requires you to install the Hadoop add-on manually, because winutils.exe, an application Spark relies on, is not included in the official Spark installation package.

The PoolParty installer includes the Apache Spark framework, so you only need to install the Apache Hadoop add-on manually by following the steps below.

Steps to Install the Apache Hadoop Add-on

  1. Download the Hadoop bin folder from the GitHub repo.

  2. Put everything from this folder into one destination folder: <drive>:\<path-to-hadoop>\bin

  3. Set the HADOOP_HOME environment variable to: <drive>:\<path-to-hadoop>\

    • Make sure the folder structure of step 2 exists.

  4. Add the following entries to your Path environment variable: %HADOOP_HOME% and %HADOOP_HOME%\bin

  5. We strongly recommend that you restart your computer to make sure the environment variable changes take effect.
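
Once the steps above are done, a quick check can confirm that the settings took effect. The following is a minimal Python sketch, not part of PoolParty or Spark: the function name check_hadoop_setup and its parameters are hypothetical, and it assumes Python is available on the machine. It only verifies what the steps describe: that HADOOP_HOME is set, that winutils.exe sits in its bin folder, and that the bin folder is listed in the Path.

```python
import ntpath  # Windows path rules, importable on any platform
import os


def check_hadoop_setup(env=None, is_file=os.path.isfile):
    """Return a list of problems with the Hadoop add-on setup (empty if none).

    Hypothetical helper: `env` defaults to the current process environment,
    and `is_file` can be swapped out to test without touching the disk.
    """
    env = os.environ if env is None else env

    # Step 3: HADOOP_HOME must be set.
    hadoop_home = env.get("HADOOP_HOME", "").strip()
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]

    problems = []

    # Step 2: winutils.exe is expected in <HADOOP_HOME>\bin.
    bin_dir = ntpath.join(hadoop_home, "bin")
    winutils = ntpath.join(bin_dir, "winutils.exe")
    if not is_file(winutils):
        problems.append("winutils.exe not found at " + winutils)

    # Step 4: the bin folder must be one of the Path entries.
    def norm(path):
        # Ignore case, surrounding quotes, and trailing backslashes.
        return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))

    path_value = env.get("PATH", env.get("Path", ""))
    entries = {norm(p) for p in path_value.split(";") if p.strip()}
    if norm(bin_dir) not in entries:
        problems.append(bin_dir + " is not listed in the Path")

    return problems
```

Calling print(check_hadoop_setup()) from a newly opened terminal prints an empty list when all three checks pass. A terminal that was already open before the variables were changed may still see the old values.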