Big data has been a buzzword for a long time now, and newcomers often struggle to get a development environment ready, especially when using Spark with Scala.
In this post I will show how to set up a Big Data development environment with Spark and Scala. I am assuming that your workstation already has Java 8+ installed (Spark 3.1.x runs on Java 8 or 11).
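Before downloading anything, it is worth confirming that the Java requirement is actually met. A quick guarded check (the wording of the "not found" message is mine):

```shell
# Check whether a JDK is on the PATH before going any further
if command -v java >/dev/null 2>&1; then
  java -version          # prints the installed JDK version to stderr
  java_ok=yes
else
  echo "Java not found - install a JDK (8 or 11) first"
  java_ok=no
fi
```

If this reports no Java, install a JDK before continuing; everything below depends on it.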
Open your favourite browser and navigate to https://www.scala-lang.org/download/2.12.2.html and https://spark.apache.org/downloads.html
Scroll down to the respective download links:
For Scala : scala-2.12.2.tgz
For Spark : Spark 3.1.2 with package type: Pre-built for Apache Hadoop 3.2 and later
As soon as you click a download link, the Scala/Spark tgz file should start downloading. Once the download completes, open your terminal and extract each archive with the tar command, using the x (extract) and f (file) flags; add the v (verbose) flag to see each file listed as it is extracted.
# Go to the directory where the Spark/Scala archives were downloaded
tar -xvf scala-2.12.2.tgz
tar -xvf spark-3.1.2-bin-hadoop3.2.tgz
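If you want to see what the x, v, and f flags do without touching the real archives, here is a throwaway round trip (all the /tmp/tar-demo paths are made up purely for illustration):

```shell
# Build a tiny stand-in archive, then extract it the same way as above
mkdir -p /tmp/tar-demo/scala-demo
echo "hello" > /tmp/tar-demo/scala-demo/README
tar -czf /tmp/tar-demo/scala-demo.tgz -C /tmp/tar-demo scala-demo

# Extract into a separate directory; the v flag lists each file as it comes out
mkdir -p /tmp/tar-demo/out
tar -xvf /tmp/tar-demo/scala-demo.tgz -C /tmp/tar-demo/out
ls /tmp/tar-demo/out/scala-demo
```

Note that modern tar auto-detects gzip compression on extraction, so -xvf works on the .tgz files without an explicit z flag.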
Next, copy the extracted folders to /usr/local:
# Go to the directory where Spark/Scala was extracted
sudo cp -R scala-2.12.2 /usr/local/scala
sudo cp -R spark-3.1.2-bin-hadoop3.2 /usr/local/spark
The last step is to add both Spark and Scala to your PATH.
# Open a terminal and run the command below to edit ~/.zshrc
vi ~/.zshrc
# Add below lines
# Scala
export PATH=/usr/local/scala/bin:$PATH
# Spark
export PATH=/usr/local/spark/bin:$PATH
# Then reload the configuration with the source command
source ~/.zshrc
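If the PATH mechanics are unfamiliar: prepending a directory makes the shell look there first when resolving a command name, which is exactly why scala and spark-shell become available after the export lines above. A self-contained sketch, using a throwaway directory and tool name instead of /usr/local/spark/bin:

```shell
# Create a stand-in bin directory with one executable in it
mkdir -p /tmp/path-demo/bin
printf '#!/bin/sh\necho "it works"\n' > /tmp/path-demo/bin/my-tool
chmod +x /tmp/path-demo/bin/my-tool

# Prepend it to PATH, exactly as ~/.zshrc does for Scala and Spark
export PATH=/tmp/path-demo/bin:$PATH
command -v my-tool   # the shell now resolves the command from the new directory
my-tool
```

The same principle applies to the real entries: once ~/.zshrc is sourced, any new terminal session picks them up automatically.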
To verify the installation, run the commands below in your terminal:
scala -version
scalac -version
spark-shell
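Beyond simply opening spark-shell, a one-liner is enough to confirm the install end to end; spark-shell accepts piped input in the standard distribution, so you can run it non-interactively. The check below is guarded so it degrades gracefully if the PATH step was missed:

```shell
# Non-interactive smoke test: ask spark-shell for its own version
if command -v spark-shell >/dev/null 2>&1; then
  echo 'println("Spark version: " + spark.version)' | spark-shell 2>/dev/null
  smoke=ran
else
  echo "spark-shell not on PATH - re-check the steps above"
  smoke=skipped
fi
```

If everything is wired up, the output includes a line such as "Spark version: 3.1.2".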
That’s all folks.