University of Cumberland Week

Categorical attributes hold qualitative data whose discrete values belong to a predetermined set of classes. These attributes lack most of the properties of numbers. Several techniques can be applied when handling categorical attributes. The first is ordinal number encoding, which replaces ordinal categorical values with integers according to their rank. Another technique is frequency encoding, which replaces every category with the number of times it occurs in a given column (Tan et al., 2016). A further technique is target, or mean, encoding, in which the miner replaces each category with the mean value of the target column for that category.

Continuous attributes differ from categorical attributes in several ways. First, categorical attributes are qualitative, while continuous attributes are quantitative. In addition, continuous attributes are obtained by measuring, while categorical attributes are obtained by grouping data. Continuous attributes therefore have most of the properties of numbers, while categorical attributes lack them. Continuous attribute values are real numbers, typically represented as floating-point variables, and so can only be represented with limited precision (Tan et al., 2016). Categorical attributes, on the other hand, have finite or countably infinite sets of values, often represented using integers, and some take only the values 0 or 1.
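
The representation points above can be illustrated with a minimal Python sketch. The color labels and variable names are hypothetical and do not come from Tan et al.; the floating-point behavior shown is that of standard double-precision numbers.

```python
import math

# Continuous attribute: real values stored as floating-point numbers,
# so arithmetic is only exact up to limited precision.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False with double-precision floats
close_enough = math.isclose(total, 0.3)   # True: compare with a tolerance instead

# Categorical attribute: a finite set of labels, often stored as integers.
colors = ["red", "green", "blue"]         # hypothetical category set
color_codes = {label: code for code, label in enumerate(colors)}

# A binary attribute takes only the values 0 or 1.
is_red = [1 if c == "red" else 0 for c in ["red", "blue", "red"]]
```

The integer codes here are only labels; arithmetic on them (for example, averaging "red" and "blue") has no meaning, which is what lacking number properties amounts to in practice.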

A concept hierarchy in data mining describes a multilevel arrangement of the various concepts defined in a given domain. It is defined according to a specific organization's standard classification schemes or to domain knowledge (Tan et al., 2016). Additionally, miners can represent concept hierarchies using directed acyclic graphs (Tan et al., 2016).
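
As a rough sketch of the graph representation, a concept hierarchy can be stored as a mapping from each concept to its more general parent concepts. The location hierarchy and the `ancestors` helper below are hypothetical examples, not taken from the cited text.

```python
# Hypothetical location hierarchy: each concept maps to its parent concepts.
# A dict of lists lets a concept have more than one parent, so the structure
# can be a directed acyclic graph rather than only a tree.
parents = {
    "Louisville": ["Kentucky"],
    "Lexington": ["Kentucky"],
    "Nashville": ["Tennessee"],
    "Kentucky": ["USA"],
    "Tennessee": ["USA"],
    "USA": [],
}

def ancestors(concept):
    """Return every more general concept reachable from `concept`."""
    found = []
    stack = list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.append(node)
            stack.extend(parents.get(node, []))
    return found

levels = ancestors("Louisville")   # city -> state -> country
```

Walking up the hierarchy in this way is what allows data recorded at the city level to be summarized at the state or country level.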

The primary data patterns include subgraph, inferential, and sequential patterns. Subgraph patterns use frequent subgraph mining to identify substructures commonly associated with specific properties of known compounds. Sequential patterns involve finding statistically relevant patterns between data examples (Delen et al., 2017). Here the values of the data examples are presented in sequence form. Inferential patterns, on the other hand, involve using a sample from a given population to identify patterns in the data. The data is taken from samples, and generalizations are then made about that specific population.

Techniques in handling categorical attributes

Creating dummies

This method is most useful when the data has few categorical columns, each with few categories. It is easy to use and handles categorical columns quickly. Its disadvantage is that it scales poorly when there are many categorical columns, because every category becomes a new column.
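
A minimal sketch of dummy (one-hot) encoding in plain Python follows. The `one_hot` helper and the `sizes` column are hypothetical and only illustrate the idea; a data-frame library would normally do this step.

```python
def one_hot(values):
    """Return (categories, rows); each row has a 1 in its own category's column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

sizes = ["small", "large", "small", "medium"]   # hypothetical column
categories, rows = one_hot(sizes)
# categories -> ["large", "medium", "small"]
# rows       -> [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

One column with three categories becomes three columns, which shows why the approach becomes unwieldy as the number of columns and categories grows.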

Ordinal number encoding

This method replaces ordinal categorical attributes with ordinal numbers based on their ranks. Its advantage is that it is the easiest way to handle an ordinal feature in a dataset. Its disadvantage is that it is not suited to nominal features, which have no natural order.
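
A short sketch of rank-based encoding, assuming a hypothetical three-level rating whose order is supplied by hand:

```python
# Hypothetical rank order, lowest to highest; it has to come from domain knowledge.
rank_order = ["low", "medium", "high"]
rank = {label: i + 1 for i, label in enumerate(rank_order)}

ratings = ["medium", "low", "high", "medium"]   # hypothetical ordinal column
encoded = [rank[r] for r in ratings]
# encoded -> [2, 1, 3, 2]
```

Applying the same mapping to a nominal feature such as color would impose an order that does not exist in the data.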

Frequency encoding

In this technique, each category is replaced with its frequency in the column. Its advantage is that it is easy to implement and does not add any new features. Its disadvantage is that it does not produce a monotonic relationship between the categories and the target.
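
A minimal sketch of frequency encoding using only the standard library; the `cities` column is hypothetical, and relative frequency is used here, although raw counts would work the same way.

```python
from collections import Counter

cities = ["A", "B", "A", "C", "A", "B"]   # hypothetical categorical column
counts = Counter(cities)                  # A: 3, B: 2, C: 1

# Replace each value with the share of rows that carry the same category.
encoded = [counts[c] / len(cities) for c in cities]
```

Two different categories that happen to occur equally often receive the same code, so the encoding cannot tell them apart.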

Guided encoding

In this method, each category in the column is replaced with its rank, where categories are ordered by their probability with respect to the target column. Its advantage is that it does not increase the data volume; its disadvantage is that encoding from the joint probability with the target can cause overfitting.
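
One possible reading of this method, sketched below, orders the categories by the share of rows with target 1 and uses the position in that order as the code. The function name, the binary target, and the alphabetical tie-break are assumptions made for illustration.

```python
def probability_rank_encode(categories, targets):
    """Rank categories by P(target = 1 | category); lowest probability gets rank 0."""
    totals, counts = {}, {}
    for c, t in zip(categories, targets):
        totals[c] = totals.get(c, 0) + t
        counts[c] = counts.get(c, 0) + 1
    prob = {c: totals[c] / counts[c] for c in counts}
    # Assumption: ties are broken alphabetically so the result is deterministic.
    ordered = sorted(prob, key=lambda c: (prob[c], c))
    rank = {c: i for i, c in enumerate(ordered)}
    return [rank[c] for c in categories], rank

cats = ["A", "B", "A", "B", "C", "C"]   # hypothetical column
y = [1, 0, 1, 0, 1, 0]                  # hypothetical binary target
encoded, rank = probability_rank_encode(cats, y)
# rank -> {"B": 0, "C": 1, "A": 2}
```

Because the ranks are computed from the target itself, they should be fitted on training data only; otherwise the encoding leaks the target and overfits.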

Mean encoding

This technique replaces a category with the mean value of the target column for that category. Its advantage is that it creates a monotonic relationship between the encoded attribute and the target. Its disadvantage is that it can lead to overfitting.
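
A minimal sketch of mean encoding with a hypothetical column and binary target; the `mean_encode` helper is illustrative and applies no smoothing.

```python
from collections import defaultdict

def mean_encode(categories, targets):
    """Replace each category with the mean of the target over its rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories], means

cats = ["A", "B", "A", "B", "C"]   # hypothetical column
y = [1, 0, 0, 0, 1]                # hypothetical target
encoded, means = mean_encode(cats, y)
# means -> {"A": 0.5, "B": 0.0, "C": 1.0}
```

Category "C" appears in a single row, so its code simply copies that row's target value, which is a small-scale picture of the overfitting risk.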

Probability ratio encoding

In this technique, each category in the column is replaced by a probability ratio. The advantage of this technique is that it captures information across all categories, hence creating more predictive features.
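
The text does not define the ratio. The sketch below assumes a binary target and uses P(target = 1) / P(target = 0) within each category. The function name and the `eps` guard against division by zero are also assumptions.

```python
def probability_ratio_encode(categories, targets, eps=1e-6):
    """Replace each category with P(target = 1) / P(target = 0) for that category."""
    ones, counts = {}, {}
    for c, t in zip(categories, targets):
        ones[c] = ones.get(c, 0) + t
        counts[c] = counts.get(c, 0) + 1
    ratio = {}
    for c in counts:
        p1 = ones[c] / counts[c]
        p0 = 1 - p1
        # Assumption: clamp the denominator so categories with no 0s stay finite.
        ratio[c] = p1 / max(p0, eps)
    return [ratio[c] for c in categories], ratio

cats = ["A", "A", "A", "A", "B", "B"]   # hypothetical column
y = [1, 1, 1, 0, 1, 0]                  # hypothetical binary target
encoded, ratio = probability_ratio_encode(cats, y)
# ratio -> {"A": 3.0, "B": 1.0}
```

As with the other target-based encodings, the ratios would need to be computed on training data only to avoid leaking the target.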

Differences between continuous and categorical attributes

| Continuous attributes | Categorical attributes |
| --- | --- |
| They are numerical and have an infinite number of values. | They contain a finite number of distinct categories. |
| They are obtained by measuring. | They are obtained by counting. |
| Economical when gathering samples because fewer samples are usually needed. | Expensive when collecting samples because large samples are usually needed. |

Concept hierarchy
