Use of Data Mining Algorithms Chaid and Cart in Predicting Egg Weight from Egg Quality Traits of Indigenous Free-Range Chickens in Zambia

| This study aimed to evaluate the performance of data mining algorithms Chi-square Automatic Interac-tion Detector (CHAID) and Classification And Regression Tree (CART) in predicting egg weight from different egg quality traits such as egg length (EL), egg width (EW), shell weight (SW), shell thickness (ST), albumen weight (AW), yolk height (YH), yolk width (YD) and yolk weight (YW). To attain this, 364 indigenous free-range chicken eggs were employed. The goodness of fit test was done to compare the predictive performance of these algorithms. Pearson correlation coefficient between the egg weight and the predicted egg weights was found to be 0.907 ( P < 0.01) for the CHAID algorithm. Coefficient of determination (R 2 ), Adjusted R 2 (Adj-R 2 ), Root Mean Square Error (RMSE), Relative approximation error (RAE), standard deviation ratio (SD ratio ) for the CHAID algorithm were estimated to be 0.823, 0.823, 2.23, 0.04, and 0.23 correspondingly. Pearson correlation coefficient between the egg weight and the predicted egg weights was found to be 0.771 ( P < 0.01) for the CART algorithm. The R 2 , Adj-R 2 , RMSE, RAE, SD ratio for the CART algorithm were estimated to be 0.593, 0.593, 2.32, 0.07, and 0.24 correspondingly. Given its stronger prediction accuracy (R 2 ), lower values of RMSE, and RAE, the CHAID algorithm can be recommended for analysis of egg quality traits of free-range indigenous chickens.


Advances in Animal and Veterinary Sciences
February 2021 | Volume 9 | Issue 2 | Page 216 spring that are genetically superior to their parent generations. This is done to maintain or improve production and productivity under the expected conditions of production. This improvement of the genetic potential of a particular population is attained by the selection of the best-performing animals for a particular trait to be parents of the next generation (Olfaz et al., 2019).
In making decisions for selection to improve desired traits, breeders deal with a lot of data consisting of a lot of factors which in most cases renders the data multidimensional. With more parameters being studied comes more complications in the data for breeders to interpret and make sense of (Olfaz et al., 2019;Liswaniso et al., 2020b). At this stage making mistakes leading to the design of ineffective breeding programs becomes unavoidable. Several studies have made use of linear regression to study the relationships between traits of economic importance such as egg traits. Regression mainly relies on the linearity of the relationship between traits under consideration. And a lot of studies have been done to establish the interrelations between traits of economic importance and egg weight is correlated highly with various egg trait characteristics (Yakubu et al., 2008;Abanikannda and Leigh, 2007;Ojedapo, 2013). An alternative to linear regression is the use of data mining techniques a method that has gained so much momentum in animal science recently. Data mining is the use of computer-based methods to discover information from data (Kantardzic, 2011). Decision tree methods have been preferred due to the advantages they possess in multicollinearity, missing data, and with no supposition on the spreading of independent variables (Mendeş and Akkartal, 2009). This is not the first time that data mining algorithms are being employed to analyze genetic potential existent in animal populations. Data mining algorithms were exploited to guesstimate cold carcass weight and body weight from diverse biometric linear dimensions (Karabacak et al., 2017), to estimate egg weights of quail eggs from some internal and external egg characteristics (Çelik et al., 2017), to assess egg quality characteristics influencing fertility in the eggs of Japanese quails (Çelik et al., 2016). In sheep, CHAID and CART algorithms were used in Karayaka sheep breeding (Olfaz et al., 2019). Data mining has also been used in dairy to analyze factors affecting the first lactation milk yield (Mikail and Bakır, 2019). The CART algorithm was used by Tyasi et al. (2020) to proximate chicken body weight from their linear body biometric traits. However, in the evaluation of four dissimilar data mining algorithms, Ali et al. (2015) presented goodness of fit benchmarks for valuation of body weight using some morphological traits.
With so much available literature on the use of data mining algorithms in the analysis of data in animal science, there is yet to be available information with regards to the utilization of data mining to scrutinize indigenous free-range chicken eggs for selection and breeding. This is why this study was conducted to employ the CHAID and CART algorithms to generate decision trees that would establish principal influencers of egg weight. We also aimed at establishing which algorithm would be recommended in selecting for egg weight improvements between CART and CHAID algorithm.

mATERIAlS AND mETHODS
This study was conducted in Lusaka, Zambia. 364 eggs of free-range indigenous chicken were collected from Lusaka, Zambia for this study. The egg traits were measured as described previously (Liswaniso et al., 2020a).
In both the CART and CHAID model, egg length (EL), egg width (EW), shell weight (SW), shell thickness (ST), albumen weight (AW), yolk height (YH), yolk width (YD) and yolk weight (YW) were exploited as independent variables to foretell egg weight (EW). Nonetheless, only expressively meaningful variables were taken along in the tree regression model.
There are three data mining algorithms in SPSS statistical software and in this study we used CART and CHAID. These have been established to be suitable to substantiate the relationship among scale-dependent variables and numerous independent variables that may comprise both scale and categorical structure.
In the CHAID Algorithm, a prediction model is established to define how variables best combine to describe the outcome in a given dependent variable. In CHAID, continuous prognosticators are divided into categories that have approximately alike number of observations. In CHAID analysis, ordinal data Continuous and nominal data can be used. CHAID analyses non-binary data by fragmenting independent variables into clusters centered on the chisquare statistic. CHAID Clusters populations into subgroups such that variation in a dependent variable in the group is minimized while maximizing variations between groups (Ratner, 2003;Dogan, 2003). In the Classification and Regression tree (CART), there is DATA splitting into subgroups that are as uniform as there can be with regards to the reliant variable (Mikail and Bakır, 2019).
To establish which algorithm was the best between the CART and CHAID algorithm, we used SPSS 23 to determine the goodness of fit. The following formulae were used as criteria for quality as prescribed by (Grzesiak and Zaborski, 2012) NE US

Advances in Animal and Veterinary Sciences
February 2021 | Volume 9 | Issue 2 | Page 217 Where, Yi, the actual Egg weight (g) of the ith egg; Ŷi, the predicted egg weight value of ith egg; Ȳ, average of the actual egg weight the ith egg; Ɛi, the residual value of ith egg; έ average of the residual values; k, number of significant independent variables in the model; and n, the total number of eggs. The residual value of each egg is expressed as Ɛi= Yi -Ŷi.

RESUlTS AND DISCUSSION
Despite much literature being available on the use of data mining algorithms, there is still yet to be enough literature on the use of these methods in indigenous free-range chicken eggs. And so, this study aimed at using CHAID and CART algorithms to estimate the egg weight of indigenous free-range chickens. Also to assess which of the two data mining algorithms would be recommended for use in breeding for egg production in free-range chickens. Figure 1 shows a regression tree generated by the CHAID algorithm for the estimation of egg weight from a variety of variables, egg length, egg width, shell thickness, shell weight, albumen weight, yolk height, yolk weight, and yolk width factors. A look into the decision tree generated by the CHAID algorithm depicts the shell weight as a principal factor influencing the egg weight trait (Adj. P-value =0.00, F=417.327, df1=4, df2=359). In the second level of influencers of egg weight was found to be egg width and albumen weight. The third line of influencers had shell thickness and yolk weight.

Figure 1: The decision tree generated by CHAID
The overall egg weight was estimated to be 49.723 g (S=5.886) in node 0 where 364 eggs were present in the top section of the CHAID generated decision tree. Node 0 which was the root node subdivided all the eggs into 5 subgroups based on their shell weight. The egg weight by order was node 1 < node 2 < node 3< node 4< node 5.
Node 1 was the first subgroup (n=108) with eggs of shell weights less or equal to 5.820 g. These had a mean weight of 44.592 g and S=2.328. Node 2 comprised eggs with 5.820 g <shell weight ≤ 6.270 g. These had an estimated weight of 47.208 g (S= 1.911). By number, 72 eggs were in this category. Node 3 with a mean egg weight of 50.093 g (S= 3.267) comprised the third subgroup of eggs with eggshell weight in the range of 6.270 g <shell weight ≤ 6.540 g. Eggs with eggshell weight in the range 6.540 g <shell weight ≤7.250 g comprised node 4. These had 112 eggs with a predicted egg weight of 51.833 g (S= 2.464). Eggs with an eggshell weight greater than 7.250 g comprised node 5. 36 eggs belonged to node 5 and had a mean weight of 63.212 g (S=3.115).
Node 1 was subdivided into 3 subgroups based on their egg width (Adj. p-value=0.000, F=112.838, df1=2 and df2=105). The three (3) division comprised of nodes 6, 7, and 8. Eggs with an egg width of less than 38.280 mm were grouped in node 6. Eggs in node 6 were 24 by number with a predicted weight of 41.452 g (S=2.061). Eggs with an egg width of 38.280 mm < egg width ≤ 39.130

Advances in Animal and Veterinary Sciences
February 2021 | Volume 9 | Issue 2 | Page 218 comprised node 7 with a mean/predicted weight of 44.088 g and S value of 1.195. Node 7 had 30 eggs. Node 8 was the last of the subdivision in node 1. It comprised eggs that had an egg width greater than 39.130 mm. The 54 eggs in node 8 had a mean egg weight of 46.268 g (S=0.922). Node 7 was further subdivided into two nodes based on shell thickness (Adj. p-value=0.000, F=24.07, df1=1, df2=28). Eggs with less or equal to 0.333 mm in shell thickness belonged to node 19 with a predicted Egg weight of 44.740 g (S=1.144). This node had 18 eggs. The 30 eggs with a shell thickness thicker than 0.333 mm from among those in node 7 composed node 20 which had a mean weight of 43.110 g. Node 8 saw a subdivision into two subgroups based on shell thickness (Adj. p-value=0.000, F=87.862, df1=1, and df2=52). Eggs with shell thickness ≤ 0.347 mm formed the cluster in node 21. Node 21 had a group of 30 eggs with a mean weight of 45.620 g. Eggs with an shell thickness greater than 0.347 mm comprised node 22 which had 24 eggs with a mean weight of 47.078 g.

Figure 2:
The decision tree generated by CART Node 1 was split into node 3 and node 4 based on egg width. Eggs with egg width less or equal to 40.370 mm (n=192) and those with egg width greater than 40.370 mm (n=124). These had predicted weights of 45.742g (S=2.455) and 51.386 g (S=2.390) respectively.
Based on egg width, node 3 was split into 2 nodes. Those with egg width less or equal to 38.980 mm (node 7, n= 60) and those with egg width larger than 38.980 mm (node 8, n=132). The predicted weights for nodes 7 and 8 were 43.164 g (S=2.112) and 46.914 g (S=1.534) respectively. Eggs in node 7 were further subdivided into eggs with egg length less or equal 51.480 mm and those with egg length more than 51.480mm to form node 13 and node 14 which had predicted weights of 39.755g (S=0.548) and 44.016 g (S=1.360) respectively. From node 14 was born nodes 19 and node 20 based on egg length with 24 eggs in each node having a predicted weight of 42.865g (EL ≤54.060 mm) and 45.168 (EL >54.060 mm) correspondingly.
In this study, our results showed that the separation of the first node (node 0) was based on egg length in the CART algorithm-generated regression tree and was based on the eggshell weight in the CHAID generated regression tree. This is an indication that the predictor of egg weight was dependent on the stated algorithm. The CHAID algorithm engendered a three-level branching tree while the CART algorithm created a tree with five levels of branching. It therefore can be hypothesized here that the CHAID algorithm could be interpreted much easier. This hypothesis stands akin to the finding of other similar studies (Olfaz et al., 2019;Ali et al., 2015). Table 1 shows the goodness of fit results. Pearson correlation coefficient between the egg weight and the predicted egg weights was found to be 0.907 (P <0.01) for the CHAID algorithm. The R 2 , Adj-R 2 , RMSE, RAE, SD ratio for the CHAID algorithm were estimated to be 0.823, 0.823, 2.23, 0.04, and 0.23 correspondingly. Pearson correlation coefficient between the egg weight and the predicted egg weights was found to be 0.771 (P <0.01) for the CART algorithm. The R 2 , Adj-R 2 , RMSE, RAE, SD ratio for the CART algorithm were estimated to be 0.593, 0.593, 2.32, 0.07, and 0.24 correspondingly.
Given the stronger prediction accuracy (R 2 ), lower values of RMSE and RAE exhibited by the CHAID predicted algorithm compared to the CART algorithm, the CHAID algorithm can be recommended for use in the analysis of egg quality traits as well as to estimate egg weight from quality traits of free-range indigenous chickens.