Data mining discussion

Answer any one question:

2. Consider the following definition of an anomaly: An anomaly is an object that is unusually influential in the creation of a data model.

(a) Compare this definition to the standard model-based definition of an anomaly, then expound on the differences.

(b) For what sizes of data sets (small, medium, or large) is this definition appropriate?
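
As a rough illustration of what "unusually influential" could mean, the sketch below uses the arithmetic mean as a stand-in data model and scores each object by how far the mean moves when that object is left out. The mean as the model, the leave-one-out scoring, and the name `influence_scores` are all assumptions made for illustration; the question itself does not prescribe any of them.

```python
from statistics import mean

def influence_scores(values):
    """Score each value by how much the mean shifts when it is removed.

    Assumption: the "data model" is just the mean of the values.
    Requires at least two values.
    """
    full_model = mean(values)
    scores = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]   # leave object i out
        scores.append(abs(full_model - mean(rest)))
    return scores

values = [1.0, 2.0, 3.0, 100.0]
scores = influence_scores(values)
most_influential = values[scores.index(max(scores))]
```

Rerunning the sketch with the same extreme value embedded in a much longer list is one way to explore how the score behaves as the data set grows.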


3. In one approach to anomaly detection, objects are represented as points in a multidimensional space, and the points are grouped into successive shells, where each shell represents a layer around a grouping of points, such as a convex hull. An object is an anomaly if it lies in one of the outer shells.

(a) To which of the definitions of an anomaly in Section 10.1.2 is this definition most closely related?

(b) Name two problems with this definition of an anomaly.
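
The shell idea can be sketched for the two-dimensional case as repeated convex hull peeling: compute the hull, assign its vertices to shell 1, remove them, and repeat on what remains. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes 2-D points given as tuples, uses Andrew's monotone chain for the hull, and pushes points lying exactly on a hull edge to the next shell. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of 2-D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shell_depths(points):
    """Map each point to its shell number (1 = outermost shell)."""
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)   # peel off the current shell
        layer += 1
    return depth

points = [(0, 0), (4, 0), (4, 4), (0, 4),   # outer square
          (1, 1), (3, 1), (3, 3), (1, 3),   # inner square
          (2, 2)]                           # center
depths = shell_depths(points)
```

In this toy layout the four outer corners land in shell 1 even though they are arranged as regularly as the inner points, which is worth keeping in mind when answering part (b).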


4. Association analysis can be used to find anomalies as follows. Find strong association patterns, which involve some minimum number of objects. Anomalies are those objects that do not belong to any such patterns. To make this more concrete, we note that the hyperclique association pattern discussed in Section 6.8 is particularly suitable for such an approach. Specifically, given a user-selected h-confidence level, maximal hyperclique patterns of objects are found. All objects that do not appear in a maximal hyperclique pattern of at least size three are classified as outliers.

(a) Does this technique fall into any of the categories discussed in this chapter? If so, which one?

(b) Name one potential strength and one potential weakness of this approach.


5. Discuss techniques for combining multiple anomaly detection techniques to improve the identification of anomalous objects. Consider both supervised and unsupervised cases.

In the supervised case:

In the unsupervised case:


6. Describe the potential time complexity of anomaly detection approaches based on the following: model-based using clustering, proximity-based, and density-based. No knowledge of specific techniques is required. Rather, focus on the basic computational requirements of each approach, such as the time required to compute the density of each object.
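
To make the "basic computational requirements" concrete for the proximity-based case, here is a minimal brute-force sketch that scores each object by the distance to its k-th nearest neighbor. The choice of this particular score, the Euclidean metric, and the function name are assumptions for illustration. With no spatial index, the loop evaluates n(n-1) distances for n objects, so the cost grows quadratically in n (plus a logarithmic factor from the sort as written here).

```python
import math

def kth_nn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Brute force: every point is compared with every other point.
    Requires 1 <= k <= len(points) - 1.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = kth_nn_distance_scores(points, k=2)
farthest = points[scores.index(max(scores))]
```

The same counting exercise (how many pairwise distances, how many passes over the data) can be repeated for the clustering-based and density-based approaches.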
