Evaluating the Influence of Soil and Environmental Parameters in Terms of Crop Suitability using Machine Learning

¹Vishwakarma Institute of Information Technology, Pune-411 048, Maharashtra, India.

²Department of Computer Science and Engineering, GITAM Institute of Technology, GITAM University, Visakhapatnam-530 045, Andhra Pradesh, India.

ABSTRACT

Background: Agriculture is the key source of income in India, which is why farming is called the backbone of the Indian economy. To meet the needs of a growing population, crop yields must increase. The India Country Programming Framework states that the annual soil loss in India is about 5.3 billion tonnes.

Methods: Most farmers are small or marginal-scale and depend on natural resources such as soil quality, rainfall and environmental conditions for their yield. Farmers decide which crop to adopt based on experience, and the Government arranges trainings and exhibitions to enhance their skills.

Result: Land that gives a poor yield for one crop may produce an adequate yield for another crop or crops. To identify the possible suitable crop(s), the proposed machine learning model evaluates current and potential suitability for the available scenario.

KEYWORDS

INTRODUCTION

Some small and marginal-scale Indian farmers are becoming cautious about their crop production. They approach a nearby KVK center (the term used in Maharashtra state, India, for a center where farmers can get their soil samples tested to learn the available soil nutrients and their proportions) to obtain soil and water testing reports. The center also recommends a crop, but there is no certainty that the farmer will adopt it, for reasons such as economic condition or the natural resources available. On the other hand, much agricultural research focuses on a specific crop with a limited set of dependent parameters. The proposed methodology is a soft-computing-based decision support system that works only for major crops in India. An appropriate machine learning technique is adopted in which the input parameters are the existing levels of soil and environmental components and the output is the crop-specific suitability level. The report “India Country Programming Framework, 2016” by FAO states that land degradation constitutes a major threat to India’s food and environmental security. Increasing crop yield therefore plays a significant role in meeting the food requirements of a growing population of humans and livestock. Average annual soil loss in India is about 5.3 billion tonnes (India Country Programming Framework, 2016), due to various causes.

ALSE (Agriculture Land Suitability Evaluator) is one of the recent land suitability evaluators (Elsheikh et al., 2013). It is a decision-making tool that computes the suitability of crops such as mango, banana, papaya, citrus and guava and helps plan crops according to their suitability levels. The system is developed in a very basic language, Visual Basic, which does not support efficient programming and web features. It also does not target major crops or fragmented land. MicroLEIS (De La Rosa et al., 2004) performs agro-ecological land evaluation. It is designed once and standardized, so users have no flexibility to build their own expert system with it and cannot view results according to their current requirements. LEIGIS is a software application developed by Kalogirou (Kalogirou, 2002) that works as a crop planner for rural farmers for specified crops. It is independent of environmental properties, whereas the environment plays a key role. LIMEX is a system with multimedia capabilities (Kalogirou, 2002); its drawback is that it does not support a wide range of land evaluation problems.

All the above automated systems are used in different countries for large-scale croplands; none of them is precisely applicable to marginal or small-scale farming in India (Bhimanpallewar et al., 2015). In the national context, FAO (FAO, 2016) reports that over 70% of rural Indian households depend entirely on agriculture, including livestock, mainly through a combination of marginal and small-scale farms. All the successful automated systems assume an adequate water supply, whereas 62% of Indian farmland is rain-fed. Following collaboration with FAO, India gradually became a net food exporter. However, the share of agriculture in total GDP has gone down from 18% in 2013-14 to 7-8% by 2019-20.

The Economic Survey up to 2012-13 gives details of the yield per hectare of major crops in India, nationally and state-wise (Directorate of Economics and Statistics, 2013). Accordingly, this system is developed for major crops such as jowar, wheat and rice. FAO guidelines are available for land evaluation, and the proposed system uses those from the FAO Land Evaluation Framework published in 1976 (FAO, 1976; Cavayas, 2012) and the guidelines for different objectives in FAO Soils Bulletins 52 (Guidelines: land evaluation for rainfed agriculture), 55 (1985), 58 (1991), 67 (1993) and 73 (1996) (FAO, 1985; 1991; Onyeji et al., 1996; FAO, 1996). The important system parameters were chosen with reference to all of these.

In addition to the above facilities, the State and Central Governments and agricultural universities make continuous efforts to raise farmers' awareness through sessions, websites and published reports. However, these do not help farmers whenever and wherever they need it. The focus of the proposed system is to forward expert knowledge to farmers for cropping decisions.

MATERIALS AND METHODS

The traditional practice of rural farmers is to choose crops and suitable fertilizers based on local discussions and experience. To improve on this, the proposed method suggests the following:
i.    The farmer gets a soil report from the nearby KVK (once every two years).
ii.   Depending on the existing scenario, crop-specific suitability is identified and the crop(s) chosen accordingly.
iii.   Appropriate fertilizers are added in adequate proportion, as suggested.
This paper focuses on step (ii), computing crop-specific suitability.

Implementation details

The system is a software module. A real-time dataset was collected from the Agriculture University, Pune, for the years 2009-10 to 2013-14. An accurate training dataset is used. The database is stored using SQL and the algorithm is implemented in Java. Part of the following algorithm has already been published (Bhimanpallewar et al., 2017).

Algorithm: A hybrid machine learning algorithm

Probability distribution of a parameter P (input or output parameter):

$$P = (P_1, P_2, \dots, P_n)$$

Most of the logic is based on the decision tree algorithm, which can generate multiple outcomes (Malik and Tarique, 2014; Wu et al., 2008; Flach, 2012). Here, five outcomes are generated: S1, S2, S3, N1 and N2.

Steps of the algorithm

I. Divide the dataset into two parts: a training dataset and a testing dataset. Here, the output vector is the set of suitability levels for the cropland,

$$T = (T_1, T_2, T_3, T_4, T_5)$$

i.e. the suitability levels S = {S1, S2, S3, N1, N2}.
1.   X is the set of all input parameters, i.e. (X₁, X₂, ..., X_n),
where n is the number of input parameters and T is the output (class) parameter.
2.   Select one parameter, either the output (T) or an input (X_i), and call it p. Calculate the probability distribution of that parameter.
a.   If the parameter takes categorical values, use them directly [e.g. the parameter soil topography takes the values {1 (plain), 2 (gentle slope), 3 (deep slope)}].
b.   If the parameter takes continuous values, partition its values into ranges according to the probability of each range occurring under the class values T_i, and treat each range as a category [e.g. in Fig 1 the probability distribution of rainfall is p₁ (rainfall ≤ 38 cm) and p₂ (rainfall > 38 cm)].

$$p = (p_1, p_2, p_3, \dots, p_n)$$
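Step 2 can be illustrated with a minimal Python sketch. The record values are invented for illustration and the function names (`categorical_distribution`, `threshold_distribution`) are hypothetical. The 38 cm cut mirrors the rainfall split shown in Fig 1; how such thresholds are chosen in general is not reproduced here, so the threshold is simply passed in.

```python
from collections import Counter

def categorical_distribution(values):
    """Relative frequency of each category, e.g. topography codes 1, 2, 3."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in sorted(counts.items())}

def threshold_distribution(values, threshold):
    """Split a continuous parameter into two ranges, value <= threshold and
    value > threshold, and return the probability of each range."""
    total = len(values)
    low = sum(1 for v in values if v <= threshold)
    return {f"<= {threshold}": low / total, f"> {threshold}": (total - low) / total}

# Hypothetical records, for illustration only.
topography = [1, 1, 2, 1, 3, 2, 1, 2]           # 1 plain, 2 gentle slope, 3 deep slope
rainfall_cm = [30, 45, 52, 36, 41, 28, 60, 39]  # seasonal average rainfall
print(categorical_distribution(topography))     # {1: 0.5, 2: 0.375, 3: 0.125}
print(threshold_distribution(rainfall_cm, 38))  # {'<= 38': 0.375, '> 38': 0.625}
```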

Fig 1: Resultant tree for a sample farm.

3. Calculate the entropy of p using Equation (1).

$$I(p) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1}$$

[The more uniform the probability distribution, the greater its information (entropy).]
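Equation (1) translates directly into code. This sketch assumes base-2 logarithms (information measured in bits), as in the usual C4.5 formulation; any base gives the same ranking of parameters because it only rescales the values. The printed cases also illustrate the remark above: a uniform distribution has the largest entropy.

```python
import math

def entropy(p):
    """Equation (1): I(p) = -sum(p_i * log2(p_i)); terms with p_i = 0 contribute 0."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))   # 1.0: uniform over two ranges
print(entropy([0.9, 0.1]))   # about 0.469: skewed, so less information
print(entropy([0.2] * 5))    # about 2.322 = log2(5): uniform over five levels
```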
4. Let T be the set of records, partitioned by class value into C₁, C₂, ..., C_k. If p is the probability distribution of this partition, the information needed to identify the class of an element of T is given by Equation (2).

$$\mathrm{Info}(T) = I(p), \qquad p = \left(\frac{|C_1|}{|T|}, \frac{|C_2|}{|T|}, \dots, \frac{|C_k|}{|T|}\right) \tag{2}$$
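Equation (2) applies the same entropy to the class proportions of a record set. A minimal sketch, assuming the class labels of T are held in a plain Python list and using the hypothetical helper name `info`; the base-2 logarithm is again an assumption.

```python
import math
from collections import Counter

def entropy(p):
    """Equation (1), base-2 logarithm assumed."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def info(labels):
    """Equation (2): Info(T) = I(p) with p = (|C_1|/|T|, ..., |C_k|/|T|),
    where C_j are the records of T carrying class value j."""
    total = len(labels)
    p = [count / total for count in Counter(labels).values()]
    return entropy(p)

print(info(["S1", "S1", "S2", "S2"]))   # 1.0: two equally frequent classes
print(info(["S3", "S3", "S3", "S3"]))   # 0.0: a pure set needs no information
```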

5. After partitioning T into sets T₁, T₂, ..., T_n according to the values of parameter X_i, the information needed to identify the class of an element of T is the weighted average of the information needed to identify the class of an element of T_j, i.e. the weighted average of Info(T_j), as in Equation (3).

$$\mathrm{Info}(X_i, T) = \sum_{j=1}^{n} \frac{|T_j|}{|T|}\,\mathrm{Info}(T_j) \tag{3}$$
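Equation (3) can be sketched as follows, assuming each record's value of X_i has already been mapped to a category (step 2). The helper name `info_given` is hypothetical and the example values are invented.

```python
import math
from collections import Counter, defaultdict

def info(labels):
    """Equation (2): entropy of the class proportions of a record set."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_given(values, labels):
    """Equation (3): Info(X_i, T) = sum_j |T_j|/|T| * Info(T_j), where T_j holds
    the class labels of the records whose X_i value falls in category j."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    total = len(labels)
    return sum(len(part) / total * info(part) for part in partitions.values())

rainfall = ["<=38", "<=38", ">38", ">38"]   # categorized X_i values (invented)
labels = ["S3", "S3", "S1", "S2"]           # suitability level of each record
print(info_given(rainfall, labels))         # 0.5
```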

6. The information gain of parameter X_i is computed using Equation (4).

$$\mathrm{Gain}(X_i, T) = \mathrm{Info}(T) - \mathrm{Info}(X_i, T) \tag{4}$$

7. Repeat steps 2 to 6 for every input parameter to obtain the gain vector G.

$$G = [\mathrm{Gain}(X_1, T), \mathrm{Gain}(X_2, T), \dots, \mathrm{Gain}(X_n, T)] \tag{5}$$
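Equations (4) and (5), together with the selection in step 8, fit in a short sketch. It assumes records are Python dictionaries keyed by parameter name with already-categorized values; the parameter names, labels and helper names (`gain`, `gain_vector`) are illustrative only.

```python
import math
from collections import Counter, defaultdict

def info(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(values, labels):
    """Equation (4): Gain(X_i, T) = Info(T) - Info(X_i, T)."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    total = len(labels)
    info_x = sum(len(part) / total * info(part) for part in partitions.values())
    return info(labels) - info_x

def gain_vector(records, labels, parameters):
    """Equation (5): one gain per input parameter."""
    return {x: gain([r[x] for r in records], labels) for x in parameters}

# Invented, already-categorized records.
records = [
    {"rainfall": "<=38", "topography": 1},
    {"rainfall": "<=38", "topography": 2},
    {"rainfall": ">38", "topography": 1},
    {"rainfall": ">38", "topography": 2},
]
labels = ["S3", "S3", "S1", "S1"]
G = gain_vector(records, labels, ["rainfall", "topography"])
best = max(G, key=G.get)   # step 8: split on the parameter with the highest gain
print(G, best)             # {'rainfall': 1.0, 'topography': 0.0} rainfall
```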

8.   Choose the parameter X_i whose Gain(X_i, T) is the highest among the parameters considered; in the first iteration this becomes the root node. Identify the sub-branches under that node as follows. If the probability distribution of X_i (see step 2) is
$$p = (p_1, p_2, p_3, \dots, p_n)$$
then
p₁ = subset of the dataset belonging to sub-branch 1,
p₂ = subset of the dataset belonging to sub-branch 2,
⋮
p_n = subset of the dataset belonging to sub-branch n.
9.   Choose a sub-branch and its subset of the dataset, and repeat steps 1 to 9 until every sub-branch ends in a class (leaf) node.
10. Prune the tree if required (logically, not graphically).
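Steps 8 and 9 amount to a recursive tree construction. The sketch below is an ID3-style simplification under stated assumptions: all parameter values are already categorical, a leaf takes the majority suitability level, and pruning (step 10), gain ratio and missing-value handling from full C4.5 are omitted. `build_tree` and `classify` are hypothetical names, not the authors' Java implementation.

```python
import math
from collections import Counter, defaultdict

def info(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(records, labels, x):
    """Gain(X, T) for parameter x, Equations (3) and (4)."""
    partitions = defaultdict(list)
    for record, label in zip(records, labels):
        partitions[record[x]].append(label)
    total = len(labels)
    return info(labels) - sum(len(p) / total * info(p) for p in partitions.values())

def build_tree(records, labels, parameters):
    """Return a suitability level (leaf) or a node (parameter, {value: subtree})."""
    # Leaf: the subset is pure, or no parameters are left to split on.
    if len(set(labels)) == 1 or not parameters:
        return Counter(labels).most_common(1)[0][0]
    # Step 8: split on the parameter with the highest information gain.
    best = max(parameters, key=lambda x: gain(records, labels, x))
    remaining = [x for x in parameters if x != best]
    branches = {}
    # Step 9: recurse into each sub-branch with its subset of the dataset.
    for value in {r[best] for r in records}:
        idx = [i for i, r in enumerate(records) if r[best] == value]
        branches[value] = build_tree([records[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches)

def classify(tree, record):
    """Follow the branches to a leaf; assumes every value was seen in training."""
    while isinstance(tree, tuple):
        parameter, branches = tree
        tree = branches[record[parameter]]
    return tree

records = [
    {"rainfall": "<=38", "moisture": "low"},
    {"rainfall": "<=38", "moisture": "high"},
    {"rainfall": ">38", "moisture": "low"},
    {"rainfall": ">38", "moisture": "high"},
]
labels = ["S3", "S2", "S1", "S1"]
tree = build_tree(records, labels, ["rainfall", "moisture"])
print(tree)
print(classify(tree, {"rainfall": "<=38", "moisture": "low"}))  # S3
```

On these invented records, rainfall has the higher gain and becomes the root, and soil moisture is used only under the low-rainfall branch.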
11. Once the logical tree is ready, verify it on the testing dataset.
The allowed error in the output is ±1 suitability level.
If the error exceeds ±1 level, repeat steps 1 to 11 for the sub-branch identified as misclassified.
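The ±1-level tolerance in step 11 treats the five suitability classes as an ordinal scale. A minimal sketch, assuming the order S1, S2, S3, N1, N2 with one step between adjacent classes; the helper name `misclassified` and the example labels are hypothetical.

```python
# Assumed ordinal order of the five classes, best to worst.
LEVELS = ["S1", "S2", "S3", "N1", "N2"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def misclassified(predicted, actual, tolerance=1):
    """Indices of test records whose predicted level differs from the actual
    level by more than `tolerance` levels (step 11 allows +/-1 level)."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual))
            if abs(RANK[p] - RANK[a]) > tolerance]

predicted = ["S1", "S3", "N2", "S2"]
actual = ["S2", "S1", "N1", "S2"]
print(misclassified(predicted, actual))   # [1]: S3 vs S1 is two levels apart
```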
II. The model is trained on the training dataset and verified on the testing dataset.

Next, compare the suitability outcome of the decision tree approach with that of the raster scan system.
a. If the decision tree gives the correct output within the allowed error, display the current suitability and the improvements expected for potential suitability.
b. Otherwise, generate the suitability outcome using the raster scan system and display it. In this case the selected dataset is not appropriate, and the whole model needs to be retrained with another, appropriate dataset.
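The comparison in cases a and b can be sketched as a small decision function. It assumes the raster scan level for the same record is already available, treats it as the reference, and reuses the ordinal encoding with a ±1-level tolerance; `hybrid_suitability` is a hypothetical name.

```python
# Assumed ordinal order of the five classes, best to worst.
LEVELS = ["S1", "S2", "S3", "N1", "N2"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def hybrid_suitability(tree_level, raster_level, tolerance=1):
    """Return (level, source). Keep the decision-tree level when it agrees with the
    raster-scan level within `tolerance` (case a); otherwise report the raster-scan
    level and flag that the model should be retrained (case b)."""
    if abs(RANK[tree_level] - RANK[raster_level]) <= tolerance:
        return tree_level, "decision tree"
    return raster_level, "raster scan; retrain model"

print(hybrid_suitability("S2", "S1"))   # ('S2', 'decision tree')
print(hybrid_suitability("S1", "N1"))   # ('N1', 'raster scan; retrain model')
```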

RESULTS AND DISCUSSION

Input and output parameters

Input parameters

A few vital parameters are considered: rainfall, temperature, moisture, soil type, topography, pH (alkalinity) and the soil nutrients NPK (nitrogen, phosphorus and potash (potassium)) (Agriculture University, Pune 2008).

The parameter ranges in Table 1 are taken from reports published by the agriculture university (Agriculture University, 2008) and FAO (FAO, 2016; Flat et al., 2011). They are valid for the soil types of the Pune region, Maharashtra, India. This case study is for small-scale or marginal farmers holding fragmented land.

Table 1: Input ranges.

The assumption is that the required water supply is sufficient and depends indirectly on rainfall, so precipitation is one of the input parameters. Table 1 gives the valid input ranges and Table 2 the suitable parameter ranges for increasing the yield of jowar (Agriculture University, 2008; FAO, 2016; Vintrou et al., 2013; Arthapedia and Indian, 2016; Table Des Matières, 1991).

Table 2: Suitable ranges for Jowar.

Here, cropland soil is categorized as 1- black clayey loam, 2- heavy and light alluvium to red, 3- grey and yellow loam and 4- sandy soil. Sandy soil is not suitable for jowar, so it is not included in Table 2. Land topography is categorized as 1- plain, 2- gentle slope and 3- deep slope; in the Pune region the land is mostly either plain or gently sloping. Table 2 is used to prepare the raster scan system, a kind of matrix for verifying the predicted suitability outcome against expert knowledge.
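One way to picture the raster scan matrix is as a per-parameter table of ranges for each suitability class, combined with the most-limiting-factor rule commonly used in FAO-based land evaluation (the overall class is the worst class of any parameter). This is an assumption about how the matrix is evaluated, the mapping of out-of-range values to N2 is also assumed, and the numeric ranges below are placeholders, not the values of Table 2.

```python
# Assumed ordinal order of the five classes, best to worst.
LEVELS = ["S1", "S2", "S3", "N1", "N2"]

# Hypothetical raster matrix: for each parameter, the (level, low, high) ranges,
# best level first. Placeholder numbers only, NOT the values of Table 2.
RASTER = {
    "ph": [("S1", 6.0, 7.5), ("S2", 5.5, 8.0), ("S3", 5.0, 8.5)],
    "rainfall_cm": [("S1", 40, 100), ("S2", 30, 120), ("S3", 20, 140)],
}

def parameter_level(parameter, value):
    """Best level whose range contains the value; outside every range -> N2 (assumed)."""
    for level, low, high in RASTER[parameter]:
        if low <= value <= high:
            return level
    return "N2"

def raster_suitability(record):
    """Most-limiting-factor rule: the overall level is the worst parameter level."""
    levels = [parameter_level(p, record[p]) for p in RASTER]
    return max(levels, key=LEVELS.index)

print(raster_suitability({"ph": 7.0, "rainfall_cm": 35}))   # S2 (rainfall limits)
print(raster_suitability({"ph": 9.0, "rainfall_cm": 50}))   # N2 (pH out of range)
```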

Output parameters

Suitability levels are categorized into five classes according to their qualitative and quantitative features (FAO, 1974; Copy, 2001). The classes are S1 (suitable), S2 (moderately suitable), S3 (marginally suitable), N1 (not suitable for major economic reasons, otherwise moderately suitable) and N2 (not suitable for physical reasons).

Dataset description

A case study of jowar is used for computing suitability S. A sample dataset is given in Table 3.

Table 3: Sample dataset for Jowar crop

Average values are used for some monitored parameters; for example, rainfall is the average rainfall of the kharif season. Organic factors are measured at the KVK from random samples taken from the farmland.

Detailed region-wise aggregate reports are maintained at KRISHI BHAVAN, either seasonally or annually. One such report, for the 2009-10 EC measurements, is shown in Table 4; records for other years are similarly available. Environmental parameters are taken from reports published by the India Meteorological Department, Pune, for the respective period. The dataset is generated from these reports.

Table 4: Sample aggregate report year 2009-10 for EC Ksharata (Electrical conductivity).

Results for sample dataset

The results of the model on the sample dataset (Table 3) were observed. From the resulting classification rules, the decision tree in Fig 1 was plotted to make the results easier to understand.

Results

1. The node with the highest information gain is rainfall, and the next contributing node is soil moisture.
2. Correctly classified instances: 75%; incorrectly classified instances: 25%.

Results for real-time dataset

The dataset is divided into training and testing datasets. From the output classification rules of the approach, the decision tree shown in Fig 2 is generated.

Fig 2: Resultant tree for real-time dataset.

Results:

1. The inputs fall into three categories:
High information gain: rainfall, soil type, topography and temperature.
Moderate information gain: soil moisture, EC and pH.
Average information gain: N, P and K.
2. Instances are classified as follows:
    Correctly classified: 73938
    Incorrectly classified: 799
Suitability levels were compared with the fertility index (STCR Indian Government, n.d.), and the results are in agreement.

Advantages

1. Accuracy = 98.9%; error rate = 1.10%.
2. All types of input attributes are considered.
3. The tree is pruned efficiently.
4. After evaluating the performance of the model, the observed errors are:
Mean absolute error: 0.0072.
Root mean squared error: 0.0601.
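The summary accuracy can be cross-checked against the instance counts reported for the real-time dataset. This is plain arithmetic on those counts: 73938 / 74737 is about 98.93%, which rounds to the reported 98.9%. The error rate computed directly from the counts is about 1.07%, slightly below the listed 1.10%, which corresponds to 100% - 98.9%.

```python
correct, incorrect = 73938, 799   # instance counts reported for the real-time dataset
total = correct + incorrect       # 74737 classified test instances
accuracy = correct / total
error_rate = incorrect / total
print(f"{total} instances: accuracy {accuracy:.2%}, error rate {error_rate:.2%}")
# 74737 instances: accuracy 98.93%, error rate 1.07%
```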

The actual output generated by this system is the current suitability level. Some lagging parameters can be modified artificially to reach the expected values shown in Fig 4. If the lagging parameters are changeable, then

Potential suitability level = ++ current suitability level (i.e. one level higher)

where
N2 < current suitability level < S1.

The output for the sample case is shown in Fig 3. If an input value is not as expected, the expected value is indicated by a red side bar, as shown in Fig 4.
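The "++" rule can be read as raising the current level by one class. A minimal sketch under that reading, assuming the order N2 < N1 < S3 < S2 < S1 and that the rule applies only when the current level lies strictly between N2 and S1 and the lagging parameters are changeable; `potential_suitability` is a hypothetical name.

```python
# Assumed order of the five classes, worst to best.
LEVELS = ["N2", "N1", "S3", "S2", "S1"]

def potential_suitability(current, lagging_changeable):
    """Potential level = '++' current level: one class higher, applied only when
    N2 < current < S1 and the lagging parameters can be changed artificially."""
    i = LEVELS.index(current)
    if lagging_changeable and 0 < i < len(LEVELS) - 1:
        return LEVELS[i + 1]
    return current

print(potential_suitability("S3", True))   # S2
print(potential_suitability("S1", True))   # S1: already the best class
print(potential_suitability("N2", True))   # N2: outside N2 < current < S1
```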

Fig 3: Soil suitability by a hybrid machine learning model.

Fig 4: Graphical representation of expected parameters.

This hybrid approach is based on the C4.5 tree algorithm, so all types of attributes can be provided as inputs, and it also works with tuples that have some unknown attribute values. One drawback of this rule creation is that convergence is not 100%, so some worst cases are missed. To overcome this, a raster scan system is also provided.

Comparing the results on the sample dataset with those on the real-time dataset suggests that the performance of this approach improves with the variance in the dataset. The decision tree algorithm is sensitive to the dataset, so the raster scan is used in combination with it to achieve balance. In agricultural applications, the criteria for checking suitability may vary with geographical location, environment, season and so on. To account for these distinct scenarios, the model can be trained separately on different datasets.

CONCLUSION

This hybrid approach makes the model adaptable to real-time scenarios through the choice of a real-time dataset, and a model trained in this way gives practicable decisions. Users can therefore view the results according to their current requirements. The system is user friendly and simple to understand. It can serve as a blueprint for the attendant at the KVK, who is an approachable guide for farmers, and small-scale farmers can also use it as a guide for deciding which crop to cultivate under given conditions. It will contribute to increasing yield without degrading soil quality.

REFERENCES

Agriculture University. (2008). Krishi Darshani 2008. Pune.

Arthapedia, F., Indian, T. (2016). Cropping seasons of India Kharif and Rabi.

Bhimanpallewar, R. and Narasingarao, M.R. (2017). A Machine Learning Approach to Assess Crop Specific Suitability for Small/Marginal Scale Croplands. International Journal of Applied Engineering Research. 12(23): 13966-13973.

Cavayas, F. (2012). Table Des Matières.

Table Des Matières. (1991). doi:10.1515/mamm.1991.55.4.665.

Complete information on Jowar (Sorghum vulgare), n.d.

Copy, T.D. (2001). Chapter 1. FAO Corp. Doc. Repos. doi:10.1017/CBO9781107415324.004.

De La Rosa, D., Mayol, F., Diaz-Pereira, E., Fernandez, M., De La Rosa, D. (2004). A land evaluation decision support system (MicroLEIS DSS) for agricultural soil protection: With special reference to the Mediterranean region. Environ. Model. Softw. 19: 929-942. doi:10.1016/j.envsoft.2003.10.006.

Directorate of Economics and Statistics, (2013). Directorate of Economics and Statistics of respective State Governments and for All-India-CSO. As on August 14, 2012. 1-32.

Elsheikh, R., Mohamed Shariff, A.R.B., Amiri, F., Ahmad, N.B., Balasundram, S.K., Soom, M.A.M. (2013). Agriculture Land Suitability Evaluator (ALSE): A decision and planning support tool for tropical and subtropical crops. Comput. Electron. Agric. 93: 98-110. doi:10.1016/j.compag.2013.02.003.

FAO. (2016). Water and soil requirements [WWW Document]. URL http://www.fao.org/docrep/u3160e/u3160e04.htm.

FAO. (1991). Guidelines: land evaluation for extensive grazing. Soils Bull. 58. Food Agric. Organ. United Nations, Rome, Italy. 158.

FAO. (1985). Guidelines: Land evaluation for irrigated agriculture - FAO Soils Bulletin 55, FAO Soil Bulletin.

FAO. (1976). A framework for land evaluation: soil Bulletin 32. FAO soils Bull. n.32. doi: M-51.

FAO (Food and Agriculture Organization of the United Nations). (1996). Agro-ecological zoning guidelines. FAO Soils bulletin 73.

FAO (Food and Agriculture Organization of the United Nations), n.d. Sorghum bicolor.

Flach, P. (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. doi:10.1145/242224.242229.

Flat F., n.d. (2011). World Bank assisted Maharashtra Agricultural Competitiveness Project Marketing Strategy Supplement (MSS) District - Ahmednagar Project Implementation Unit (Agriculture), 1–109.

Food and Agriculture Organization of the United Nations (FAO). (1974). Approaches to Land Classification.

India Country Programming Framework. (2016).

Kalogirou, S. (2002). Expert systems and GIS: an application of land suitability evaluation. Computers, Environment and Urban Systems. 26: 89-120.

Malik, G., Tarique, M. (2014). On Machine Learning Techniques for Multi-class Classification 3: 6-9.

Onyeji, S.C., Fischer, G. and Kamau, W. (1996). Agro-Ecological Assessments for National Planning in Kenya: Database Structure for District Analysis.

Bhimanpallewar, R.N., Narasingarao, D.M.R. (2015). A Survey of Automated Advisory Systems in. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 4: 1022-1030.

STCR Indian Government, n.d. Districtwise Fertility Index.

Vintrou, E., Ienco, D., Begue, A., Teisseire, M. (2013). Data mining, a promising tool for large-area cropland mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 6: 2132–2138. doi:10.1109/JSTARS.2013.2238507.

Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinberg, D. (2008). Top 10 algorithms in data mining, Knowledge and Information Systems. doi:10.1007/s10115-007-0114-2.

Disclaimer :

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Copyright :

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Indian Journal of Agricultural Research

Research Article
