Legume Research

  • Chief Editor: J. S. Sandhu

  • Print ISSN 0250-5371

  • Online ISSN 0976-0571

  • NAAS Rating 6.80

  • SJR 0.391

  • Impact Factor 0.8 (2024)

Legume Research, volume 46, issue 9 (September 2023): 1134-1140

Application of Principal Component Analysis (PCA) for Blackgram [Vigna mungo (L.) Hepper] Germplasm Evaluation under Normal and Water Stressed Conditions

V.A. Mohanlal1,*, K. Saravanan1, T. Sabesan1
1Department of Genetics and Plant Breeding, Faculty of Agriculture, Annamalai University, Annamalai Nagar-608 002, Tamil Nadu, India.
  • Submitted: 23-05-2020

  • Accepted: 30-10-2020

  • First Online: 16-01-2021

  • doi: 10.18805/LR-4427

Cite article: Mohanlal V.A., Saravanan K., Sabesan T. (2023). Application of Principal Component Analysis (PCA) for Blackgram [Vigna mungo (L.) Hepper] Germplasm Evaluation under Normal and Water Stressed Conditions. Legume Research. 46(9): 1134-1140. doi: 10.18805/LR-4427.
Background: Blackgram [Vigna mungo (L.) Hepper] is a popular pulse crop in India, valued for its nutritional quality and adaptability to many cropping systems. The crop is mostly cultivated in areas experiencing water stress, which reduces its yield potential. Thus, it is imperative to assess the genetic variability present in the existing blackgram germplasm under drought conditions. For this, principal component analysis was carried out to visualize the complex dataset. This study aimed to identify key traits and drought tolerant genotypes.

Methods: Twenty-one blackgram genotypes were screened for water stress under field conditions in an experiment laid out in a randomized block design (RBD) with two replications. Principal component analysis was carried out on thirteen traits of these genotypes under normal and water stressed conditions.

Result: In the normal (T0) and water stressed (T1) conditions, more than 75% of the total variability among thirteen traits was explained by five and four principal component axes, respectively. Under water stress, pod length was highly correlated with seed yield per plant. Based on the interaction vectors and PC scores of genotypes, VBG-12062 had a positive interaction with seed yield. Thus, VBG-12062 can be a reliable candidate for breeding a high yielding, drought tolerant variety.
Blackgram [Vigna mungo (L.) Hepper] is one of the important pulse crops of the family Papilionaceae. It is rich in nutritional quality, with 24-27% protein, 1% fat, 57% carbohydrate, 3.8% fibre and 4.8% ash. It is grown in both summer and winter seasons. It is mostly cultivated under rainfed conditions and faces terminal drought, which adversely affects its productivity. Drought is a multi-dimensional stress affecting different plant growth stages (Abarshahr et al., 2011). To overcome water stress, it is important to develop drought tolerant varieties of blackgram. In any crop breeding programme, selection of promising genotype(s) is a challenging and important task. Because blackgram is a self-pollinated crop, genetic variation in its germplasm is very low. Identification of blackgram genotypes with high yield under drought stress is an important prerequisite for breeding drought-tolerant varieties. Thus, exploration of the available germplasm collection is essential for breeding programs, along with information about the contribution of different characters towards yield under drought.

Knowledge of the nature, extent and organization of variation could be useful for genetic improvement of crop species (Maji and Shaibu, 2012). Plant breeders often measure a large number of characters for germplasm evaluation, screening and characterization. With so many characters, it can be difficult to consolidate the information and draw a valid conclusion. In such situations, Principal Component Analysis (PCA) is used to reveal patterns and eliminate redundancy in data sets (Adams, 1977; Amy and Pritts, 1991). PCA is a useful technique for reducing a large data set with many variables into a few important principal components for a better understanding of the information. This statistical procedure is commonly used for compression, reduction and transformation of data. The PCA technique, which simultaneously analyses multiple measurements on each individual under investigation, is widely used in the analysis of genetic diversity and in the selection of elite genotypes. Keeping these points in view, the present investigation was conducted to assess the relationship among traits, to determine selection criteria and to select the best genotypes under normal and drought conditions.
A total of 21 blackgram genotypes were used in this study (Table 1). The crop was raised from January to April 2018 at the experimental farm of the Department of Genetics and Plant Breeding, Faculty of Agriculture, Annamalai University, Tamil Nadu, India, in two treatments (T0 – Irrigated and T1 – Irrigated up to the flowering stage). The experiment was laid out in a Randomized Block Design (RBD) with two replications. The standard agronomic package of practices was followed to raise a healthy crop. Observations were recorded for 11 quantitative characters viz., days to first flowering, plant height, number of branches, number of clusters per plant, number of pods per plant, pod length, pod weight, number of seeds per pod, seed size, 100-seed weight and seed yield per plant, and for two physiological characters viz., chlorophyll content and leaf protein content. Total chlorophyll content was estimated by the method suggested by Arnon (1949). The soluble protein content of the leaf was estimated as per the method given by Lowry et al. (1951). The data were subjected to PCA using STAR (Statistical Tool for Agricultural Research) software.

Table 1: List of genotypes selected for the present study.


 
The descriptive statistics viz., minimum, maximum, mean, Standard Deviation (SD) and Coefficient of Variation (CV) were computed for 13 characters under normal and water stressed conditions (Tables 2 and 3). The highest variation was observed for number of branches (CV of 12.23) in the normal condition and for number of pods per plant (CV of 21.29) in the water stressed condition. In the water stressed condition, the lowest coefficient of variation was observed for leaf protein content (0.34).
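
The descriptive statistics named above can be sketched in a few lines of Python. This is only an illustration: the input values are hypothetical rather than data from this study, the function name `describe` is made up for the example, and it assumes that CV is expressed as a percentage of the mean and that SD is the sample standard deviation (the source does not state which convention STAR uses).

```python
import statistics


def describe(values):
    """Return min, max, mean, sample SD and CV (%) for one trait."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # CV = SD / mean, as a percentage
    }


# Hypothetical pods-per-plant values for five genotypes (not from the paper)
pods_per_plant = [18.0, 22.0, 25.0, 30.0, 15.0]
summary = describe(pods_per_plant)
print(summary)
```

A trait with a larger CV varies more relative to its own mean, which is why CV rather than SD is used to compare traits measured in different units.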

Table 2: Characteristic means and variations for blackgram genotypes in normal condition (T0).



Table 3: Characteristic means and variations for blackgram genotypes in water stressed condition (T1).



The success of plant breeding depends on the availability of genetic variation, knowledge about desired traits and efficient selection strategies that make it possible to exploit existing genetic resources (Nachimuthu et al., 2014). Developing a new variety requires the collection of germplasm and its systematic evaluation for morphological, physiological and developmental characters, including special features such as stress tolerance and pest and disease resistance. Thus, the most appropriate and efficient approaches should be used for germplasm evaluation and characterization. After evaluation and characterization, the genotypes and characters must be analysed statistically to draw a valid conclusion. PCA plays an important role in studying a large data set by extracting the most significant information from the data points. Hotelling (1933) indicated that PCA is an exploratory tool designed by Pearson (1901) to identify unknown trends in a multi-dimensional data set. PCA can be used to uncover similarities between variables and to classify the genotypes (Leonard and Peter, 2009). PCA measures the importance and contribution of each component to the total variance.
 
Evaluation under normal condition (T0)            
 
In this study, the first five of the 13 principal components explained most of the total variation present in the genotypes. Principal components with eigen values greater than one were selected, as suggested by Brejda et al. (2000). The first five principal components, with eigen values >1, contributed about 78.89% of the total variability among the twenty-one blackgram genotypes evaluated for different morphological and physiological characters under normal condition (Table 4). The remaining eight components contributed only 21.11%. Principal Component (PC) 1 contributed the maximum variability (26.95%), followed by PC 2 (17.75%), PC 3 (14.21%), PC 4 (10.99%) and PC 5 (8.99%). Jeberson et al. (2018) estimated the principal components among twenty-five blackgram genotypes and reported that the first three components explained 84.52% of the total variation, with the remaining four components responsible for only 15.48%.
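
The eigen value >1 selection rule can be illustrated with a minimal NumPy sketch. It is not the STAR procedure used in the study: it assumes PCA on the correlation matrix of standardised traits, the function name `pca_kaiser` is hypothetical, and the 21 x 13 input matrix is random synthetic data standing in for the unavailable trait table. The last two lines only re-add the percentages reported above for T0.

```python
import numpy as np


def pca_kaiser(data):
    """PCA on the correlation matrix; retain components with eigenvalue > 1."""
    data = np.asarray(data, dtype=float)
    # Standardise each trait (column) to mean 0 and sample SD 1
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()  # % of total variance per PC
    n_keep = int((eigvals > 1.0).sum())          # eigenvalue > 1 rule
    scores = z @ eigvecs[:, :n_keep]             # genotype scores on kept PCs
    return eigvals, explained, n_keep, scores


# Synthetic stand-in: 21 genotypes x 13 traits (NOT the study's data)
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(21, 13))
eigvals, explained, n_keep, scores = pca_kaiser(synthetic)
print(n_keep, np.round(explained[:n_keep], 2))

# Cumulative contribution of the five PCs reported for T0
reported_t0 = np.array([26.95, 17.75, 14.21, 10.99, 8.99])
cumulative_t0 = np.cumsum(reported_t0)
print(np.round(cumulative_t0, 2))
```

With 13 standardised traits the eigen values sum to 13, so a component with an eigen value of exactly one explains 1/13 (about 7.69%) of the total variance; the rule therefore keeps only components that explain more than a single original trait would.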

Table 4: Eigen value, factor scores and contribution of the first five principal component axes to variation in blackgram genotypes in normal condition (T0).



Interpretation of the principal components is based on finding which variables are most strongly correlated with each component. Factor loading values close to -1 or 1 indicate that the variable strongly influences the component, whereas values close to 0 indicate a weak influence. The characters with the largest positive factor loadings on PC 1 were seed yield per plant (0.4774) and number of pods per plant (0.4616), followed by number of seeds per pod (0.3594) and plant height (0.3523). Days to first flowering (-0.3529) contributed to PC 1 negatively. PC 2 was contributed positively by pod weight (0.4658), pod length (0.4240) and 100-seed weight (0.3269), while number of branches (-0.5115) contributed negatively. For PC 3, number of seeds per pod (0.4346) and leaf protein content (0.3839) contributed positively, whereas chlorophyll content (-0.5506) and number of clusters per plant (-0.4419) contributed negatively. The first three principal component axes explained more than half of the total variability (58.91%). Hence, it indicated a high degree of correlation among the traits studied (Jain and Patel, 2016).
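
As a small illustration of this reading of the loadings, the sketch below splits the PC 1 loadings reported above for T0 into positive and negative contributors. The function name `dominant_traits` and the cut-off of 0.3 are assumptions made for the example (the source does not state a threshold), and only the five loadings quoted in the text are included.

```python
def dominant_traits(loadings, threshold=0.3):
    """Split traits into positive and negative contributors by |loading|."""
    positive = sorted(
        (t for t, w in loadings.items() if w >= threshold),
        key=lambda t: -loadings[t],  # strongest positive loading first
    )
    negative = sorted(
        (t for t, w in loadings.items() if w <= -threshold),
        key=lambda t: loadings[t],   # strongest negative loading first
    )
    return positive, negative


# PC 1 loadings quoted in the text for the normal condition (T0)
pc1_t0 = {
    "seed yield per plant": 0.4774,
    "number of pods per plant": 0.4616,
    "number of seeds per pod": 0.3594,
    "plant height": 0.3523,
    "days to first flowering": -0.3529,
}
positive, negative = dominant_traits(pc1_t0)
print(positive)
print(negative)
```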

However, PC 4 showed only negative factor loading values, for leaf protein content (-0.4842), number of clusters per plant (-0.4304), plant height (-0.3874), days to first flowering (-0.3072) and pod weight (-0.3022). In PC 5, 100-seed weight (0.6442) had the maximum positive factor loading value, while seed size (-0.6462) had the maximum negative factor loading value. As a whole, PCA was able to identify the important characters responsible for the variability in the population. Similar studies were also conducted by Jeberson et al. (2018) and Sridhar et al. (2020) in blackgram.

The scree plot shows the variation associated with each principal component by plotting the eigen values against the principal components (Fig 1).

Fig 1: Scree plot showing Eigen value variation in normal condition (T0).



The length of a vector reflects the contribution of the character to the principal components (Fig 2), and the angle between character vectors reflects the correlation of the variables. An angle of <90° between two trait vectors (an acute angle) indicates a positive correlation. The two vectors in the 4th quadrant viz., seed yield per plant and number of pods per plant, were highly correlated variables. Similarly, the vectors in the 3rd quadrant, number of seeds per pod and plant height, were highly correlated variables. These four variables were also strongly correlated with the first principal component according to the factor loading values. An angle of >90° between two traits (an obtuse angle) indicates a negative correlation, while an angle of 90° indicates no correlation between the characters. The character days to first flowering recorded a negative correlation with seed yield per plant.
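
The angle rule can be sketched as follows. The vector coordinates are hypothetical (the PC 2 coordinates of these traits are not given in the text), the function names are made up for the example, and the rule holds only approximately, to the extent that the first two components capture the variation of the traits involved.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 2-D trait vectors (PC 1, PC 2 loadings)."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def relation(u, v, tol=1e-9):
    """Classify the biplot relationship of two traits from their angle."""
    angle = angle_between(u, v)
    if abs(angle - 90.0) < tol:
        return "no correlation"
    return "positive" if angle < 90.0 else "negative"


# Hypothetical biplot coordinates, for illustration only
seed_yield = (0.48, -0.10)
pods_per_plant = (0.46, -0.15)
days_to_flowering = (-0.35, 0.05)

print(relation(seed_yield, pods_per_plant))     # acute angle
print(relation(seed_yield, days_to_flowering))  # obtuse angle
```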

Fig 2: Distribution of genotypes and variables across first two components in normal condition (T0).



The genotype G8 projects onto the vectors of seed yield per plant and number of pods per plant on the positive side of the origin, indicating a positive interaction (Fig 2). Thus, among the twenty-one genotypes compared, G8 was superior for seed yield per plant and number of pods per plant. The genotypes G7 and G10 also had a positive interaction with those characters.
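
One way to read "projects onto the vector" is as the signed length of the genotype's projection onto the trait vector: a positive value places the genotype on the same side of the origin as the trait. The sketch below assumes this reading; the function name and all coordinates are hypothetical and are not taken from Fig 2.

```python
import numpy as np


def projection_on_trait(genotype_score, trait_vector):
    """Signed length of a genotype's projection onto a trait vector."""
    point = np.asarray(genotype_score, dtype=float)
    trait = np.asarray(trait_vector, dtype=float)
    return float(np.dot(point, trait) / np.linalg.norm(trait))


# Hypothetical biplot coordinates (PC 1, PC 2), for illustration only
seed_yield_vector = (0.48, -0.10)
genotype_a = (3.3, -1.0)   # lies in the direction of the trait vector
genotype_b = (-2.0, 0.5)   # lies opposite to the trait vector

print(projection_on_trait(genotype_a, seed_yield_vector))
print(projection_on_trait(genotype_b, seed_yield_vector))
```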

Among the twenty-one genotypes, three genotypes namely G8 (VBG-11011), G10 (VBG-12062) and G7 (VBG-10010) formed a distinct cluster on the right side of the 3rd and 4th quadrants (Fig 3). The genotypes G11 (VBG-13017), G14 (RU-16-13), G15 (RU-16-14), G18 (VBN(Bg)-4) and G20 (VBN(Bg)-7) formed one cluster, and G1 (IC-343943), G3 (IC-343962), G6 (TBG-104), G13 (RU-16-9), G17 (T-9), G19 (VBN(Bg)-6) and G21 (MDU Local) formed another, between the 1st and 2nd quadrants. The genotypes G4 (ABG-11013), G5 (KU-11680), G12 (ADT-5) and G16 (KGB-28) formed a cluster in the 4th quadrant. The genotypes with the highest positive principal component scores for PC 1 were G7 (3.4322), followed by G10 (3.3172) and G8 (3.3007) (Table 6). These genotypes can be selected for this environment (T0) on the basis of their high principal component scores.

Fig 3: Scatter plot of the various blackgram genotypes represented in two major principal components in normal condition (T0).



Overall, seed yield per plant, number of pods per plant, number of seeds per pod, plant height and days to first flowering had a high influence on PC 1, and the genotypes G8, G7 and G10 had high principal component scores for PC 1. Based on the relationship of the characters and genotypes to PC 1, it can be concluded that the genotypes G8 (VBG-11011), G7 (VBG-10010) and G10 (VBG-12062) can be selected for the above said characters for breeding purposes in normal environments.
 
Evaluation under water stressed condition (T1)
 
In the water stressed condition, the principal component analysis condensed the thirteen traits into four major principal components, which accounted for 77.94% of the total variation (Table 5 & Fig 4). The first four principal component axes recorded eigen values greater than one, whereas the fifth and subsequent principal components recorded values less than one. Thus, those PCs (eigen value <1) could be discarded to reduce the data set further.

Table 5: Eigen value, factor scores and contribution of the first four principal component axes to variation in blackgram genotypes in water stressed condition (T1).



Fig 4: Scree plot showing Eigen value variation in water stressed condition (T1).



PCA is able to identify the key traits that are responsible for the variability in a population (Subramanian et al., 2019). PC 1 accounted for 38.14% of the total variability; chlorophyll content (0.3905) contributed to it positively, while seed yield per plant (-0.3569), pod length (-0.3569), number of pods per plant (-0.3200), plant height (-0.3037) and number of clusters per plant (-0.3005) contributed negatively. PC 2 accounted for 15.4% of the total variability. The positively related traits were 100-seed weight (0.4197) and number of clusters per plant (0.3110), whereas number of seeds per pod (-0.4168) was negatively related to PC 2. The first PC was thus related to seed yield and yield related traits such as pod length, number of pods per plant, number of clusters per plant and plant height, while PC 2 was related to 100-seed weight, number of clusters per plant and number of seeds per pod. The first two principal components explained more than half of the total variability (53.54%). Similarly, Ghanbari and Javan (2015) reported that the first two principal components explained 58.28% of the variability under drought stress in mungbean.

PC 3 contributed 14.59% to the total variability; seed size (0.5301) and 100-seed weight (0.3640) contributed to it positively, while number of pods per plant (-0.4107) and number of clusters per plant (-0.3392) contributed negatively. PC 4 contributed the remaining 9.81% of the 77.94% explained by the four retained components. The characters number of clusters per plant, plant height, number of pods per plant and 100-seed weight were grouped together in different principal components. Prominent characters that are placed together in different principal components and explain the variability tend to remain together (Mahendran et al., 2015). This may be taken into consideration when these characters are used in drought breeding programs.

The two vectors in the 1st quadrant, namely seed yield per plant and pod length, were highly correlated variables, and both were strongly and negatively associated with the first principal component according to the factor loading values (Fig 5). The characters leaf protein content and chlorophyll content showed a negative correlation with seed yield per plant of the blackgram genotypes. The PCA biplot has been used extensively by several researchers to dissect trait correlations in different crops (Aslam et al., 2017; Maqbool et al., 2016).

Fig 5: Distribution of genotypes and variables across first two components in water stressed condition (T1).



The genotype G9 projected in the positive direction of the seed yield per plant vector. This suggests that the genotype G9 (VBG-12005) is well adapted to the water stressed condition for the trait seed yield per plant.

A scatter plot of the first and second principal components depicted the pattern of genotype grouping in the factor plane (Fig 6). The distribution of genotypes based on PC 1 and PC 2 exhibits the phenotypic variation in the population and shows how widely the genotypes are dispersed along both axes. The genotypes G1 (IC-343943), G2 (IC-343947), G7 (VBG-10010), G8 (VBG-11011), G10 (VBG-12062), G12 (ADT-5) and G20 (VBN(Bg)-7) clustered as a group in the 1st quadrant. G13 (RU-16-9), G17 (T-9) and G18 (VBN(Bg)-4) grouped in the 1st and 4th quadrants. The genotypes G3 (IC-343962), G5 (KU-11680), G16 (KGB-28) and G21 (MDU Local) clustered in the 2nd and 3rd quadrants. In the 3rd and 4th quadrants, the genotypes G6 (TBG-104), G14 (RU-16-13), G15 (RU-16-14) and G19 (VBN(Bg)-6) clustered as another group. However, these groups were not distinctly separated in Fig 6, which could be because the principal component analysis was based on water stress. The genotype with the highest negative principal component score for PC 1 was G9 (-4.4696) (Table 6). Because seed yield per plant and its component traits load negatively on PC 1 in this condition, a large negative PC 1 score corresponds to high values of these traits. This genotype can therefore be selected for the water stressed environment (T1) on the basis of its principal component score.
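
Because the sign of a principal component is arbitrary, PC 1 scores are easiest to compare after orienting them by the sign of the seed yield loading. The sketch below illustrates this with only the scores and loadings quoted in the text; the helper name `orient_scores` is an assumption for the example, and the remaining Table 6 values are not reproduced here.

```python
import math


def orient_scores(scores, yield_loading):
    """Flip PC scores so that larger values go with higher seed yield."""
    sign = math.copysign(1.0, yield_loading)
    return {genotype: sign * score for genotype, score in scores.items()}


def rank_genotypes(scores, yield_loading):
    """Genotypes ordered from most to least favourable oriented score."""
    oriented = orient_scores(scores, yield_loading)
    return sorted(oriented, key=oriented.get, reverse=True)


# PC 1 scores quoted in the text (other genotypes in Table 6 are omitted)
t0_scores = {"G8": 3.3007, "G7": 3.4322, "G10": 3.3172}
t1_scores = {"G9": -4.4696}

# Seed yield per plant loadings on PC 1: 0.4774 in T0 and -0.3569 in T1
print(rank_genotypes(t0_scores, 0.4774))
print(orient_scores(t1_scores, -0.3569))
```

After orientation, the large negative T1 score of G9 becomes a large positive one, which matches its selection for the water stressed environment.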

Fig 6: Scatter plot of the various blackgram genotypes represented in two major principal components in water stressed condition (T1).



Table 6: First principal component scores for twenty-one blackgram genotypes in normal and water stressed condition (T0& T1).



On the whole, the characters seed yield per plant, pod length, number of pods per plant, number of clusters per plant, plant height and chlorophyll content had a high influence on PC 1, and the genotype G9 had the largest (negative) principal component score for PC 1. Based on the relationship of the characters and genotype to PC 1, it can be concluded that the genotype G9 (VBG-12005) can be recommended for the above said characters in drought breeding programs.
Principal component analysis was carried out to identify the best performing genotypes, to find grouping patterns of genotypes and to assess the relationship between traits under normal and water stressed conditions. Based on the interaction of the genotypes with the trait vectors and on the principal component scores, the genotypes VBG-11011, VBG-10010 and VBG-12062 can be selected for the characters seed yield per plant, number of pods per plant, number of seeds per pod and plant height in the normal condition, and the genotype VBG-12005 can be recommended for the characters seed yield per plant, pod length, number of pods per plant, number of clusters per plant and plant height in drought breeding programs.

  1. Abarshahr, M., Rabiei, B. and Lahigi, H.S. (2011). Assessing genetic diversity of rice varieties under drought stress conditions. Notulae Scientia Biologicae. 3(1): 114-123.

  2. Adams, M.W. (1977). An estimation of homogeneity in crop plants with special reference to genetic vulnerability in dry bean, Phaseolus vulgaris L. Euphytica. 26: 665-679.

  3. Amy, F.I. and Pritts, M.P. (1991). Application of principal component analysis to horticultural research. Hortscience. 26(4): 334-338.

  4. Arnon, D.I. (1949). Copper enzymes in isolated chloroplasts. Polyphenoloxidase in Beta vulgaris. Plant Physiology. 24: 1-15.

  5. Aslam, M., Maqbool, M.A., Zaman, Q.U., Shahid, M., Akhtar, M.A. and Rana, A.S. (2017). Comparison of different tolerance indices and PCA biplot analysis for assessment of salinity tolerance in lentil (Lens culinaris) genotypes. International Journal of Agriculture and Biology. 19(3): 470-478.

  6. Brejda, J.J., Moorman, T.B., Karlen, D.L. and Dao, T.H. (2000). Identification of regional soil quality factors and indicators. I. Central and Southern High Plains. Soil Science Society of America Journal. 64: 2115-2124.

  7. Ghanbari, M. and Javan, S.M. (2015). Study of the response of mung bean genotypes to drought stress by multivariate analysis. International Journal of Agriculture Innovations and Research. 3(4): 1298-1302.

  8. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 24(6): 417-441.

  9. Jain, S.K. and Patel, P.R. (2016). Genetic diversity and principle component analyses for fodder yield and their component traits in genotypes of forage sorghum [Sorghum bicolor (L.) Moench]. Annals of Arid Zone. 55: 17-23.

  10. Jeberson, S.M., Shashidhar, K.S. and Singh, A.K. (2018). Genetic variability, principal component and cluster analyses in black gram under Foot-hills conditions of Manipur. Legume Research. 42(4): 454-460.

  11. Leonard, K. and Peter, R.J. (2009). Finding Groups in Data: An Introduction to Cluster Analysis. pp.344.

  12. Lowry, O.H., Rosebrough, N.J., Farr, A.L. and Randall, R.J. (1951). Protein measurement with the Folin phenol reagent. The Journal of Biological Chemistry. 193(1): 265-275.

  13. Mahendran, R., Veerabadhiran, P., Robin, S. and Raveendran, M. (2015). Principal component analysis of rice germplasm accessions under high temperature stress. International Journal of Agricultural Science and Research. 5(3): 355-360.

  14. Maji, A.T. and Shaibu, A.A. (2012). Application of principal component analysis for rice germplasm characterization and evaluation. Journal of Plant Breeding and Crop Science. 4(6): 87-93.

  15. Maqbool, M.A., Aslam, M., Ali, H. and Shah, T.M. (2016). Evaluation of advanced chickpea (Cicer arietinum L.) accessions based on drought tolerance indices and SSR markers against different water treatments. Pakistan Journal of Botany. 48: 1421-1429.

  16. Nachimuthu, V.V., Robin, S., Sudhakar, D., Raveendran, M., Rajeswari, S. and Manonmani, S. (2014). Evaluation of rice genetic diversity and variability in a population panel by principal component analysis. Indian Journal of Science and Technology. 7(10): 1555-1562.

  17. Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine. 2(11): 559-572. 

  18. Sridhar, V., Prasad, B.V.V., Shivani, D. and Rao, S.S. (2020). Genetic Divergence Studies for Yield Components in Blackgram (Vigna mungo L.) Genotypes. International Journal of Current Microbiology and Applied Science. 9(1): 1816-1823.

  19. Subramanian, A., Raj, R.N. and Elangovan, M. (2019). Genetic variability and multivariate analysis in sorghum (Sorghum bicolour) under sodic soil conditions. Electronic Journal of Plant Breeding. 10(4): 1405-1414.
