Data Science – Using RStudio to complete the assignment

I’m stuck on a Statistics question and need an explanation.

1 Sentiment Analysis of Les Miserables and A Tale of Two Cities

  • Look at how the sentiment changes across the length of a book by looking at 800 lines at a time.
  • Compare how sentiment changes in Victor Hugo’s Les Miserables and Charles Dickens’ A Tale of Two Cities.
  • Look at negative vs positive sentiment.
  • Then pick one sentiment, such as joy, anger, fear, or …, and see how it compares between the two books.

1.1 Download Victor Hugo’s Les Miserables (Project Gutenberg ID 135) and Dickens’ A Tale of Two Cities (ID 98)
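One way to do the download is sketched below, assuming the gutenbergr package is installed and a Project Gutenberg mirror is reachable. The IDs come from the heading above; the names `book_ids` and `download_books` are only illustrative.

```r
# IDs from the heading: 135 = Les Miserables, 98 = A Tale of Two Cities.
book_ids <- c(135, 98)

# Wrapped in a function so nothing is downloaded until it is called.
download_books <- function(ids) {
  # meta_fields = "title" adds a title column next to gutenberg_id and text.
  gutenbergr::gutenberg_download(ids, meta_fields = "title")
}

# Needs network access, so it is left commented out here:
# books <- download_books(book_ids)
```

The result should have one row per line of text, which is the shape the later steps expect.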

1.2 Tidy the two books with added line numbers and chapter numbers.
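A minimal sketch of the tidying step on toy data, assuming dplyr, stringr and tidytext are installed. The chapter-heading pattern is an assumption; inspect the real texts and adjust it, since the two books may not format their headings the same way.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Toy stand-in for the downloaded data: one row per line of text.
books <- tibble(
  title = c(rep("Book A", 4), rep("Book B", 3)),
  text  = c("CHAPTER I", "It was a good day", "CHAPTER II", "A sad night",
            "Chapter 1", "The happy child", "ran home")
)

# Assumed heading pattern: "chapter" followed by a digit or roman numeral.
chapter_pattern <- regex("^chapter [\\divxlc]+", ignore_case = TRUE)

tidy_books <- books %>%
  group_by(title) %>%
  mutate(
    linenumber = row_number(),                          # restarts for each book
    chapter    = cumsum(str_detect(text, chapter_pattern))
  ) %>%
  ungroup() %>%
  unnest_tokens(word, text)                             # one lowercase word per row
```

Line and chapter numbers are added before tokenizing so that every word keeps the position of the line it came from.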

1.3 Measure the net sentiment using bing for each 100 lines and plot it for each book.
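The net sentiment calculation can be sketched as follows. The toy lexicon stands in for `get_sentiments("bing")` from tidytext, and the function name `net_sentiment` is hypothetical. The default `chunk = 100` matches this step; the overview mentions 800 lines, so the size is left as an argument.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-ins: real input would be the tidied books and get_sentiments("bing").
tidy_words <- tibble(
  title      = c("Book A", "Book A", "Book A", "Book B", "Book B"),
  linenumber = c(5, 40, 150, 10, 120),
  word       = c("good", "sad", "happy", "sad", "bad")
)
toy_lexicon <- tibble(
  word      = c("good", "happy", "sad", "bad"),
  sentiment = c("positive", "positive", "negative", "negative")
)

net_sentiment <- function(words, lexicon, chunk = 100) {
  words %>%
    inner_join(lexicon, by = "word") %>%
    # Integer division groups lines 0-99, 100-199, ... into one index each.
    count(title, index = linenumber %/% chunk, sentiment) %>%
    pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
    mutate(net = positive - negative)
}

net <- net_sentiment(tidy_words, toy_lexicon)

# One panel per book, net sentiment along the length of the text.
p_net <- ggplot(net, aes(index, net, fill = title)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~title, ncol = 1, scales = "free_x")
```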

1.4 Graph all the sentiments for each book using a faceted graph.
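One reading of this step is a count of every sentiment category per book, sketched below on toy data. The tiny lexicon only imitates the NRC categories; the real one would come from `get_sentiments("nrc")`, which needs the textdata package and a one-time download.

```r
library(dplyr)
library(ggplot2)

tidy_words <- tibble(
  title = c("Book A", "Book A", "Book B"),
  word  = c("happy", "afraid", "fight")
)

# Stand-in lexicon: a word can carry more than one sentiment.
toy_nrc <- tibble(
  word      = c("happy", "happy", "afraid", "afraid", "fight", "fight"),
  sentiment = c("joy", "positive", "fear", "negative", "anger", "negative")
)

# With real data, dplyr 1.1+ may warn about a many-to-many match here;
# that is expected because one word can map to several sentiments.
sentiment_counts <- tidy_words %>%
  inner_join(toy_nrc, by = "word") %>%
  count(title, sentiment)

p_all <- ggplot(sentiment_counts, aes(n, sentiment, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~title)

# Narrowing to a single sentiment, e.g. joy, for the book-to-book comparison.
joy_only <- filter(sentiment_counts, sentiment == "joy")
```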

1.5 Look at which words contribute to the positive and negative sentiment

  • Identify any words that should be excluded from the sentiment counts.
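A sketch of the contribution count and an exclusion list, on toy data. The word "miss" is only a candidate to check: bing scores it as negative, but in a novel with characters such as Miss Pross it is often a title. The name `custom_exclude` is illustrative.

```r
library(dplyr)

tidy_words <- tibble(
  title = c("Book A", "Book A", "Book A", "Book B"),
  word  = c("miss", "miss", "good", "sad")
)
toy_lexicon <- tibble(
  word      = c("miss", "good", "sad"),
  sentiment = c("negative", "positive", "negative")
)

# How often each word contributes to each sentiment, most frequent first.
word_counts <- tidy_words %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(title, word, sentiment, sort = TRUE)

# Hypothetical exclusion list; build the real one after inspecting word_counts.
custom_exclude <- tibble(word = c("miss"))

word_counts_clean <- anti_join(word_counts, custom_exclude, by = "word")
```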

1.6 Plot the top ten words for each of the positive and negative sentiments

  • by book
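A possible shape for the top-ten plot, using made-up counts. The `panel` column is an illustrative helper that combines book and sentiment so that words can be ordered inside each facet.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Made-up counts: 12 negative words and 2 positive words for one book.
word_counts <- tibble(
  title     = "Book A",
  sentiment = rep(c("negative", "positive"), times = c(12, 2)),
  word      = c(paste0("neg", 1:12), "good", "happy"),
  n         = c(12:1, 5, 3)
)

top_words <- word_counts %>%
  group_by(title, sentiment) %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%   # at most ten rows per group
  ungroup() %>%
  mutate(panel = paste(title, sentiment, sep = ": "))

p_top <- ggplot(top_words, aes(n, reorder_within(word, n, panel), fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  facet_wrap(~panel, scales = "free_y") +
  labs(x = "Contribution to sentiment", y = NULL)
```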

1.7 Briefly summarize your analysis and any recommended next steps

2 tf-idf for Mark Twain’s books from Q1

2.1 Create a tf-idf ready dataframe for Mark Twain’s books from Q1

  • Remember to leave the stop words in the text
  • Re-download with the book title from the metadata fields
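A tf-idf-ready data frame has one row per book and word, with a count. The sketch below uses two invented "books" in place of the Twain texts from Q1 and leaves stop words in, as the bullet above asks.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the re-downloaded texts (with a title column).
twain <- tibble(
  title = c("Book A", "Book A", "Book B"),
  text  = c("the river and the raft", "the river", "the court and the king")
)

book_words <- twain %>%
  unnest_tokens(word, text) %>%          # no anti_join(stop_words): keep them
  count(title, word, sort = TRUE) %>%
  group_by(title) %>%
  mutate(total = sum(n)) %>%             # total words per book, handy for tf
  ungroup()
```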

2.2 Calculate the tf-idf
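The calculation itself can be done with `bind_tf_idf()` from tidytext, sketched here on toy counts. In tidytext, tf is a word's count divided by the book's total, and idf is the natural log of the number of books divided by the number of books containing the word.

```r
library(dplyr)
library(tidytext)

# Toy counts: "the" appears in both books, the other words in only one.
book_words <- tibble(
  title = c("Book A", "Book A", "Book A", "Book A", "Book B", "Book B"),
  word  = c("the", "river", "and", "raft", "the", "king"),
  n     = c(3, 2, 1, 1, 2, 1)
)

# Arguments are: data, term column, document column, count column.
book_tf_idf <- bind_tf_idf(book_words, word, title, n)

# A word found in every book gets idf = log(2 / 2) = 0, so its tf-idf is 0.
# "river" appears only in Book A: tf = 2 / 7, idf = log(2 / 1).
```

This is why stop words can stay in the text: words shared by every book are weighted down to zero automatically.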

2.3 Plot the tf for each book

  • Facet by book
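One common way to plot term frequency is a histogram of tf values with one panel per book. The sketch below uses invented counts, and the bin count is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

book_words <- tibble(
  title = c("Book A", "Book A", "Book A", "Book B", "Book B"),
  word  = c("the", "river", "raft", "the", "king"),
  n     = c(6, 3, 1, 4, 1)
) %>%
  group_by(title) %>%
  mutate(tf = n / sum(n)) %>%            # term frequency within each book
  ungroup()

p_tf <- ggplot(book_words, aes(tf, fill = title)) +
  geom_histogram(bins = 30, show.legend = FALSE) +
  facet_wrap(~title, scales = "free_y") +
  labs(x = "Term frequency", y = "Number of words")
```

With real books, most words are expected to pile up near zero, with a long tail of a few very frequent words.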

2.4 Look at terms with high tf-idf

  • Across all books

2.5 Plot the top 7 words from each book

  • Sort by most frequent
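A sketch of the top-7 plot with made-up tf-idf values. It ranks by `tf_idf`; if "most frequent" is meant as raw count, rank by the `n` column instead.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Made-up scores: nine words in Book A, three in Book B.
book_tf_idf <- tibble(
  title  = c(rep("Book A", 9), rep("Book B", 3)),
  word   = c(paste0("a", 1:9), paste0("b", 1:3)),
  tf_idf = c((9:1) / 100, (3:1) / 100)
)

top7 <- book_tf_idf %>%
  group_by(title) %>%
  slice_max(tf_idf, n = 7, with_ties = FALSE) %>%   # at most seven per book
  ungroup()

# reorder_within() sorts the bars separately inside each facet.
p_top7 <- ggplot(top7, aes(tf_idf, reorder_within(word, tf_idf, title), fill = title)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  facet_wrap(~title, scales = "free_y") +
  labs(x = "tf-idf", y = NULL)
```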

2.6 Summarize your analysis and any recommended next steps

3 Extra Credit Podcasts

  • Choose one of the following podcasts and answer the questions below:
  1. Sentiment Preserving Fake Reviews
    The Original paper
  2. Data in Life: Authorship Attribution in Lennon-McCartney Songs
  3. Newsha Ajami | Improving Urban Water Systems Through Data Science, Public Policy and Engineering

3.1 What are some key ideas from this podcast relevant to text sentiment analysis/authorship attribution (1, or 2) or working with large diverse data sets (3)?

3.2 How do you think the ideas discussed may be relevant in your future work?
