Assignment Task:

Option 1

Your task is to perform the following steps:

  • Go to the following tutorial link: https://www.infoq.com/articles/apache-spark-introduction
  • Follow all the steps necessary to install Spark locally
  • Answer question 1: How many times is the word “Hadoop” counted when the tutorial has printed out all the word counts?
  • Answer question 2: How many seconds did the count job take to complete, according to the web console at http://[yourComputerIPaddress]:4040/jobs/, where [yourComputerIPaddress] is the IP address of the computer on which you installed Spark?
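
For orientation, the word count behind question 1 can be sketched in plain Python. This is only an illustration of the computation, not the tutorial's Spark code: the function name `count_words` and the sample text are hypothetical, and the real job reads a file from your Spark installation, so its counts will differ.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Hypothetical sample text; NOT the file used in the tutorial.
sample = [
    "Spark runs on Hadoop YARN",
    "Hadoop is optional for a local Spark install",
]
counts = count_words(sample)
print(counts["Hadoop"])
```

Splitting on whitespace leaves punctuation attached to a token, so `Hadoop` and `Hadoop,` would be counted separately. Check how the tutorial splits each line before comparing numbers.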


Option 2

Your task is to perform the following steps:

  • Register with Databricks Community Edition by following this link: https://databricks.com/try-databricks.
  • Follow all the steps of the Databricks Quickstart Tutorial.
  • Answer question 1: How many times does the word “Good” appear in the diamonds dataset?
  • Answer question 2: How many diamonds with color “J” are in the diamonds dataset?
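
The two counts above can be sketched in plain Python as a cross-check on whatever you compute in the notebook. This is a minimal sketch under stated assumptions: the helper `count_by_value` is hypothetical, the column names `cut` and `color` are assumed to match the dataset, and the rows are made up for illustration rather than taken from the real diamonds data.

```python
import csv
import io


def count_by_value(csv_text, column, value):
    """Return how many rows have exactly `value` in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row[column] == value)


# Made-up rows for illustration only; NOT the real diamonds dataset.
sample_csv = """carat,cut,color
0.23,Ideal,E
0.31,Good,J
0.24,Very Good,J
0.30,Good,I
"""

j_count = count_by_value(sample_csv, "color", "J")
good_count = count_by_value(sample_csv, "cut", "Good")
print(j_count, good_count)
```

Note that an exact match on `Good` does not count rows whose value is `Very Good`, whereas a substring search would. State in your answer which interpretation you used.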

Your goals are:

  • Follow and document each step of the tutorial for the option you chose.
  • Include all necessary screenshots, with an explanation for each step.
  • Explain each screenshot in your own words.
  • Answer all questions in your own words.

Submit a Word document containing the following:

  • Summary: A short statement of the most relevant steps of the assignment.
  • Content: All materials, findings, and your comments from the execution of each step (screenshots, explanations).
  • Answers: Answers to both questions in your own words, with supporting screenshots.
  • Comments: Describe the pros and cons of the tasks performed, any difficulties or issues you encountered, and how easy (or not) it was to carry out the tasks and resolve those issues.
  • Conclusions (1/2 – 1 page): Summarize your experience, findings, thoughts, comments, ideas for the future, etc.

