Problem I – (10 points)
Why is outlier mining important? Briefly describe the different approaches behind statistical-based outlier detection, distance-based outlier detection, density-based local outlier detection, and deviation-based outlier detection.
Problem II – (10 points)
A group of students are linked to each other in a social network via advisors, courses, research groups, and friendship relationships. Present a clustering method that may partition the students into different groups according to their research interests.
Problem III – (10 points)
What are the differences between visual data mining and data visualization? Data visualization may suffer from the data abundance problem. For example, it is not easy to visually discover interesting properties of network connections if a social network is huge with complex and dense connections. Propose a visualization method that may help people see through the network topology to the interesting features of a social network.
Problem IV – (10 points)
An e-mail database is a database that stores a large number of electronic mail (e-mail) messages. It can be viewed as a semi-structured database consisting mainly of text data. Discuss the following:
a. What can be mined from such an e-mail database?
b. Suppose you have roughly classified a set of your previous e-mail messages as junk, unimportant, normal, or important. What type of data mining problem is this? Describe how a data mining system may take this set as training data to automatically classify new or unclassified e-mail messages.
Problem V – (10 points)
Suppose that your local bank has a data mining system. The bank has been studying your credit and debit card usage patterns. Noticing that you make many transactions at home renovation stores, the bank decides to contact you, offering information regarding their special loans for home improvements. Discuss how this may conflict with your right to privacy.
Problem VI – (50 points)
The President of the University has approached you, a professor who teaches a data mining class. He has heard about this incredible tool called data mining. He does not know much about the technology, but he has decided to mine all of the databases in the university to gain “actionable knowledge” and wants you to be the project chief.
Describe your response to him. Be sure to address the benefits of data mining in the context of a university, including the actionable knowledge that can be gained through this exercise. Outline a plan of action for implementing data mining at the university. Discuss all relevant issues and challenges and suggest how to address them. (Note: Any resemblance to real persons, living or dead, is purely coincidental.)