The analysis of variance (ANOVA) revealed significant differences among genotypes for all the traits under study in both seasons, confirming the presence of substantial variability in the experimental material. The correlation coefficient measures the interdependence between plant characters and identifies the component characters on which selection can rely for the genetic improvement of yield. The results of the correlation analysis are presented in Fig 1. Among the five yield-contributing characters evaluated, plant height and total number of pods per plant showed a significant positive correlation with grain yield; similar results were reported by Sandhiya and Saravanan (2018). Primary branches per plant and test weight (100-seed weight), in contrast, showed positive but non-significant correlations with grain yield, which is in agreement with the findings of Sheetal et al. (2014).
With respect to inter-trait correlations among the yield-attributing traits, 100-seed weight showed a significant positive association with plant height, whereas it exhibited a significant negative correlation with days to 50% flowering and primary branches per plant. These findings are in accordance with the results of Din et al. (2015). Furthermore, primary branches per plant and total number of pods per plant had a significant positive correlation with days to 50% flowering, corroborating earlier reports by Punia et al. (2014) and Shanthi et al. (2024).
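As an illustration of how a trait correlation and its significance can be assessed, the following minimal sketch computes Pearson's r and the associated t statistic with n - 2 degrees of freedom. It assumes Pearson correlation was the measure used, which the text does not state explicitly. The trait values are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values for five genotypes (NOT data from this study).
pods_per_plant = [1, 2, 3, 4, 5]
grain_yield = [2, 4, 5, 4, 5]

r = pearson_r(pods_per_plant, grain_yield)
t = t_statistic(r, len(pods_per_plant))
print(f"r = {r:.3f}, t = {t:.3f}")  # r = 0.775, t = 2.121
```

With only five observations (3 degrees of freedom), the two-sided 5% critical value of t is 3.182, so even this fairly large r would be reported as positive but non-significant. This is the same distinction drawn above between significant and non-significant positive correlations.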
When evaluating a large number of advanced genotypes, a broad spectrum of genetic variability is essential for the effective identification of superior genotypes for further advancement. Multivariate analysis serves as a robust statistical approach for the comprehensive evaluation of advanced homozygous genotypes and facilitates the identification of promising genotypes based on key agronomic traits (Rabbani et al., 1998). The principal component analysis (PCA) biplot is a graphical technique that enables the simultaneous assessment of relationships among genotypes and their associated traits. The results of the PCA are presented in Table 1. The PCA revealed that the first two principal components, with eigenvalues greater than unity, accounted for 68.56 per cent of the total variation present in the dataset (Table 1; Fig 2). Of the two, PC1, with an eigenvalue of 2.3, explained the highest proportion of the total variance (36.27 per cent) and was primarily associated with plant height, indicating that this trait contributed substantially to variability. A similar result was reported by Thippani et al. (2017). The second principal component (PC2), with an eigenvalue of 1.75, accounted for 29.78 per cent of the total variance and reflected the significant loading of grain yield per hectare, which was also reflected in its positive correlation with other yield-related traits. The remaining principal components (PC3 to PC6) contributed marginally and exhibited limited discriminatory power. Consequently, the most important yield and yield-contributing characters, particularly total number of pods per plant and plant height, were predominantly associated with the first two principal components. Earlier studies by Mahalingam et al. (2020), Mohan et al. (2021), Nayak et al. (2021), Jakhar and Kumar (2018) and Gayathri et al. (2023) similarly emphasized the importance of these traits in enhancing efficiency in future breeding programmes.
All variables included in the study were grouped together and exhibited positive correlations with one another. Grain yield exhibited a significant positive association with total number of pods per plant and plant height, which corresponded to higher loadings of these traits on the first principal component (Fig 3). Traits located farther from the origin in the biplot, namely grain yield, total number of pods per plant and plant height, contributed more substantially to overall variability than primary branches per plant, days to 50% flowering and 100-seed weight. Notably, none of the variables displayed negative correlations. PCA was conducted using a correlation matrix to transform correlated quantitative variables into a reduced set of uncorrelated principal components, thereby elucidating the underlying structure of trait relationships (Johnson and Wichern, 2007). A summary of the characters studied for correlation and principal component analysis is presented in Table 2.
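The correlation-matrix PCA and the eigenvalue-greater-than-unity rule described above can be sketched as follows. This is a minimal illustration and not the software used in the study. The trait values are hypothetical, and power iteration with deflation is chosen only to keep the example free of external libraries; a statistical package would normally be used instead.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix; rows are genotypes, columns are traits."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def eigen_decompose(mat, iters=2000):
    """Eigenpairs of a symmetric positive semi-definite matrix, largest
    first, found by power iteration with deflation."""
    p = len(mat)
    work = [row[:] for row in mat]
    start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(p)))
    values, vectors = [], []
    for _component in range(p):
        v = [(i + 1) / start_norm for i in range(p)]  # arbitrary unit start
        for _step in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(p))
                  for a in range(p))
        values.append(lam)
        vectors.append(v)
        # Deflate: remove the component just found from the working matrix.
        for a in range(p):
            for b in range(p):
                work[a][b] -= lam * v[a] * v[b]
    return values, vectors


# Hypothetical data (NOT from this study): plant height, pods per plant,
# grain yield for six genotypes.
data = [
    [45.0, 18.0, 620.0],
    [52.0, 25.0, 810.0],
    [48.0, 20.0, 700.0],
    [60.0, 30.0, 950.0],
    [55.0, 22.0, 760.0],
    [50.0, 27.0, 880.0],
]

corr = correlation_matrix(data)
eigenvalues, eigenvectors = eigen_decompose(corr)
explained = [lam / len(corr) for lam in eigenvalues]  # share of total variance
retained = [k for k, lam in enumerate(eigenvalues) if lam > 1.0]  # unity rule

for k, (lam, share) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"PC{k}: eigenvalue = {lam:.3f}, variance explained = {share:.1%}")
```

Because the analysis is run on a correlation matrix, the eigenvalues sum to the number of traits, so each component's share of the total variance is its eigenvalue divided by the number of traits.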
The quality of representation of the variables on the factor map, referred to as cos2 (Fig 4), indicated that plant height, total number of pods per plant and grain yield made the greatest contribution to total variation, whereas 100-seed weight and primary branches per plant contributed the least to the principal components. This finding is corroborated by the research of Mahalingam et al. (2020). The variables strongly associated with PC1 (Dim.1) and PC2 (Dim.2) are pivotal in explaining the variability within the dataset. Conversely, variables with weak or negligible association with the principal components contributed minimally and may be excluded to simplify the analysis without compromising interpretability.
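For a PCA run on a correlation matrix, the cos2 of a variable on a component is the squared correlation between the variable and that component, that is, the eigenvalue multiplied by the squared eigenvector weight. The sketch below illustrates this with a hypothetical two-trait case whose eigen-decomposition is known in closed form. The correlation of 0.8 and the trait names are assumptions made only for the example.

```python
import math


def cos2(eigenvalues, eigenvectors):
    """Return cos2[j][k], the squared correlation between variable j and
    component k.

    eigenvectors[k][j] is the weight of variable j in the unit-length
    eigenvector of component k (correlation-matrix PCA assumed).
    """
    n_vars = len(eigenvectors[0])
    return [
        [eigenvalues[k] * eigenvectors[k][j] ** 2
         for k in range(len(eigenvalues))]
        for j in range(n_vars)
    ]


# Hypothetical two-trait example: correlation matrix [[1, r], [r, 1]]
# has eigenvalues 1 + r and 1 - r with the eigenvectors given below.
r = 0.8
s = 1 / math.sqrt(2)
eigenvalues = [1 + r, 1 - r]
eigenvectors = [[s, s], [s, -s]]

quality = cos2(eigenvalues, eigenvectors)
for name, row in zip(["trait_a", "trait_b"], quality):
    print(name, [round(c, 3) for c in row], "total:", round(sum(row), 3))
```

Here each trait has a cos2 of 0.9 on the first component and 0.1 on the second. Across all components the cos2 values of a variable sum to 1, so a high cos2 on the first two dimensions means the variable is well represented on the factor map.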
In the present study, the integration of biplot analysis and attribute significance facilitated the construction of a unified biplot, in which attributes with similar cos2 scores were represented by the same color code (Fig 5). Attributes with high cos2 values, notably plant height and grain yield, were depicted in green, indicating their strong contribution to the principal components. Attributes with moderate cos2 values, including total number of pods per plant and 100-seed weight, were depicted in orange, reflecting their intermediate influence. Finally, attributes with low cos2 values, namely primary branches per plant and days to 50% flowering, were depicted in black, signifying their limited contribution to the principal components. Consequently, primary branches per plant and days to 50% flowering were identified as traits of lesser importance in explaining the overall variability. These findings are consistent with the results reported by Nayak et al. (2021).
Genetic diversity analysis
K-means clustering is a centroid-based, non-hierarchical classification technique used to partition n genotypes into k distinct clusters, wherein each genotype is assigned to the cluster with the nearest mean. In this method, ‘K’ represents the predefined number of clusters, and genotypes are grouped by minimizing within-cluster variance while maximizing inter-cluster divergence (Kanavi et al., 2020).
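The assignment and update steps described above can be sketched with a minimal version of the standard iterative (Lloyd) procedure. This is an illustration only and not the implementation used in the study. The trait values, the seeded random initialization and the function name are assumptions. In practice, traits would usually be standardized first so that no single trait dominates the distances.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Partition points into k clusters by alternating assignment and
    centroid-update steps until the assignments stop changing."""
    rng = random.Random(seed)
    # Initialize centroids from k randomly chosen observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each genotype joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized trait scores for six genotypes (NOT study data).
genotypes = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
]

labels, centroids = kmeans(genotypes, k=2)
print(labels)
```

The first three genotypes end up in one cluster and the last three in the other. Which cluster receives which label depends on the initialization.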
In the present investigation, K-means cluster analysis classified the genotypes into four distinct clusters, illustrated in Fig 6 and Fig 7. Earlier studies by Mohan et al. (2021) and Kanavi et al. (2020) reported the formation of four and seven clusters, respectively, in greengram. Cluster II comprised the highest number of genotypes (43), predominantly characterized by increased plant height. Cluster I included 36 genotypes, most of which exhibited moderate to high yield. Cluster III consisted of 13 genotypes, all identified as high-yielding, whereas cluster IV contained 15 genotypes that were primarily characterized by low yield (Table 3). The high-yielding genotypes in cluster III, namely VBN (Gg)2, VGG 20-088, VGG 20-232, VGG 20-233, VGG 20-069, VGG 20-230, VGG 20-092, VGG 20-068, VGG 20-234, VGG 20-071, VGG 20-010, VGG 20-091 and VGG 20-066, were identified as promising candidates for multi-location trials aimed at assessing their adaptability and suitability for variety release. To enhance crop adoption, it is crucial to select diverse parental lines based on component traits (Katiyar et al., 2020). The distribution of genotypes derived from diverse parental crosses across multiple clusters indicates a weak association between genetic divergence and geographical origin. Similar conclusions were drawn by Katiyar et al. (2009) and Singh et al. (2013), who emphasized that a high degree of genetic diversity is critical for generating substantial variability and achieving effective genetic gains through selection.