Categorical attributes hold qualitative data whose values are drawn from a predetermined, discrete set of classes. These attributes lack most of the properties of numbers. Several techniques can be applied when handling categorical attributes. The first is ordinal encoding, which replaces the values of an ordinal categorical attribute with integers that reflect their rank. A second technique is frequency encoding, which replaces every category with the number of times that category occurs in a specific column (Tan et al., 2016). A third technique is target encoding, also called mean encoding, in which the miner replaces each category with the mean value of the target column over the records that belong to that category.
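The three encodings described above can be illustrated with a minimal Python sketch. The toy column, the target values, and the rank order assigned to the categories are all hypothetical and chosen only for illustration; they do not come from the cited source.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a categorical 'size' column and a numeric target.
sizes = ["small", "large", "medium", "small", "large", "small"]
target = [10.0, 30.0, 20.0, 14.0, 34.0, 12.0]

# Ordinal encoding: map each category to its rank (the order is an assumption).
rank = {"small": 0, "medium": 1, "large": 2}
ordinal = [rank[s] for s in sizes]

# Frequency encoding: replace each category with how often it occurs.
counts = Counter(sizes)
frequency = [counts[s] for s in sizes]

# Target (mean) encoding: replace each category with the mean target value
# of the records that fall in that category.
sums = defaultdict(float)
for s, t in zip(sizes, target):
    sums[s] += t
means = {s: sums[s] / counts[s] for s in counts}
mean_encoded = [means[s] for s in sizes]

print(ordinal)
print(frequency)
print(mean_encoded)
```

In practice, the category means used for target encoding are normally computed on training data only, so that information from the target does not leak into the evaluation.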
Continuous attributes differ from categorical attributes in several ways. First, categorical attributes are qualitative, while continuous attributes are quantitative. In addition, continuous attributes result from measuring data, while categorical attributes result from grouping data into classes. Therefore, continuous attributes have most of the properties of numbers, while categorical attributes lack them. The values of a continuous attribute are real numbers, typically represented as floating-point variables, and so they can only be represented with limited precision (Tan et al., 2016). Categorical attributes, on the other hand, are typically discrete: they have a finite or countably infinite set of values, often represented using integers, and binary attributes take only two values, such as 0 or 1.
A concept hierarchy in data mining is a multilevel arrangement of the concepts defined in a given domain. A concept hierarchy can be defined from domain knowledge or from a standard classification scheme used by a specific organization (Tan et al., 2016). Additionally, miners can represent concept hierarchies using directed acyclic graphs (Tan et al., 2016).
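As a rough sketch of the graph representation, the Python snippet below stores a concept hierarchy as a directed acyclic graph in which each concept points to its more general parents. The time-based hierarchy and the helper name `generalizations` are hypothetical, used here only to show why a graph rather than a tree may be needed: a day rolls up to both a week and a month.

```python
# Hypothetical concept hierarchy for time, stored as child -> parents.
# "day" has two parents, so the structure is a DAG rather than a tree.
parents = {
    "day": ["week", "month"],
    "week": ["year"],
    "month": ["quarter"],
    "quarter": ["year"],
    "year": [],
}

def generalizations(concept, parents):
    """Return every more general concept reachable from `concept`."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(generalizations("day", parents)))
print(sorted(generalizations("month", parents)))
```

Walking the graph upward in this way is what allows values recorded at a low level, such as individual days, to be summarized at any higher level of the hierarchy.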
The primary data patterns include subgraph, sequential, and inferential patterns. Subgraph patterns are found through frequent subgraph mining, which identifies substructures commonly associated with specific properties of known compounds. Sequential patterns involve finding statistically relevant patterns between data examples (Delen et al., 2017). Here, the values of the data examples are presented in the form of a sequence. Inferential patterns, on the other hand, involve using a sample of a given population to identify patterns in the data. The data is therefore taken from samples, and generalizations are made about the population from which the samples were drawn.
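To make the idea of a sequential pattern more concrete, the following Python sketch counts how often a candidate pattern occurs, in order, across a set of sequences. It is a simplified illustration under stated assumptions: each element of a sequence is a single event rather than a set of items, gaps between matched events are allowed, and the browsing sessions and the function names `contains` and `support` are hypothetical.

```python
def contains(sequence, pattern):
    """True if the events of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched event,
    # so later pattern events must appear after earlier ones.
    return all(event in remaining for event in pattern)

def support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Hypothetical browsing sessions, one sequence of page visits per user.
sequences = [
    ["home", "search", "cart", "checkout"],
    ["home", "cart", "search"],
    ["search", "home", "cart", "checkout"],
    ["home", "search", "checkout"],
]

print(support(sequences, ["search", "cart"]))
```

A pattern would be reported as frequent when its support meets a user-chosen minimum threshold.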