Select a large text document (at least 50,000 words) and conduct the analysis we did in class in week 10 (review the class code file). For example, Project Gutenberg offers complete books in plain-text format at https://www.gutenberg.org/browse/scores/top, but you do not have to limit yourself to a book; any large text document will do.
Submit the code you used to complete the following tasks:
- Text cleanup and processing: regular expressions, stemming, and removal of numbers, stop words, punctuation, and white space
- Create a Term Document Matrix
- Report descriptive statistics on the words and generate a word cloud
- Run a k-means clustering of the words, identify the words that fall in the first two clusters, and interpret what these clusters mean
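The tasks above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the class code: the week 10 file is not reproduced here, so the sketch uses plain Python with the standard library. The stop-word list, the `naive_stem` suffix stripper (a stand-in for a real stemmer), the three sample sentences, and all function names are hypothetical. The word cloud is omitted; the `totals` frequencies are the kind of input a word-cloud tool would typically take.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "was", "on", "for", "with", "as", "at", "by"}


def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, drop numbers and punctuation, remove stop words, then stem."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # removes digits and punctuation
    tokens = text.split()                           # split() also collapses white space
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(clean(d)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy "documents" (made up for illustration); a real run would split the
# 50,000-word text into chapters or fixed-size chunks instead.
docs = [
    "The whale swam. The whale was hunted by 3 ships!",
    "Ships sailed on the sea; the sea was calm.",
    "A whale and a ship met at sea in 1851.",
]
terms, tdm = term_document_matrix(docs)

# Descriptive statistics: total frequency per term, then the most frequent terms.
totals = {t: sum(row) for t, row in zip(terms, tdm)}
top_terms = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:3]

# Cluster the terms (rows of the matrix) into two groups.
labels = kmeans(tdm, k=2)
for cluster in range(2):
    print(cluster, [t for t, lab in zip(terms, labels) if lab == cluster])
```

Clustering the rows of the matrix groups words that have similar counts across the chunks, so the interpretation of each cluster depends on how the text was split into documents.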
Submit a one-page executive summary covering the results of this analysis, the processing steps, and what you learned from the analysis. Remember to submit both your code file and your one-page executive summary.
ALL ATTACHED FILES ARE JUST EXAMPLES THAT THE TEACHER PROVIDED IN CLASS. The actual assignment is above.