Traditionally, rural farmers choose their crop or crops and suitable fertilizers based on local discussions and experience. To improve on this, the proposed method suggests the following:
i. Farmers must get their soil report from the nearby KVK (once in two years).
ii. Depending on the existing scenario, identify crop-specific suitability and choose the crop or crops accordingly.
iii. Add appropriate fertilizers in adequate proportions as suggested.
This paper focuses on step (ii), computing crop-specific suitability.
Implementation details
The method is implemented as a software module. A real-time dataset was collected from the Agriculture University, Pune, covering the years 2009-10 to 2013-14, and an accurate training dataset is used. The database is stored using SQL and the algorithm is implemented in Java. Part of the following algorithm has already been published (Bhimanpallewar et al., 2017).
Algorithm: A hybrid machine learning algorithm
Probability distribution of the parameter P (input/output parameter):
P = (P1, P2, …, Pn)
Most of the logic is based on the decision tree algorithm, which allows us to generate multiple outcomes (Malik and Tarique, 2014; Wu et al., 2008; Flach, 2012). Here we generate five outcomes, i.e. S1, S2, S3, N1 and N2.
Steps for algorithm
I. Divide the dataset into two parts: a training dataset and a testing dataset. Here, the output vector is the set of suitability levels for the cropland,
T = (T1, T2, T3, T4, T5)
i.e. suitability levels S = {S1, S2, S3, N1, N2}
1. X is the set of all parameters considered as input, i.e. (X1, X2, ..., Xn),
where n = number of input parameters and T = output/class parameter.
2. Select one parameter, either the output (T) or an input (Xi), and call it p. Calculate the probability distribution of that parameter.
a. The parameter may take categorical values [Ex. parameter Soil Topography, which takes a value from {1 (Plain), 2 (Gentle slope), 3 (Deep slope)}].
b. If the parameter takes continuous values, split it depending on the probability of the ranges of parameter values occurring under the class value Ti; the parameter values are then partitioned into categories using ranges [Ex. in Fig 1 the probability distribution of the parameter rainfall is p1 (rainfall <= 38 centimeters), p2 (rainfall > 38 centimeters)].
p = (p1, p2, p3, …, pn)
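Step 2 can be illustrated with a minimal Python sketch. It assumes a single fixed threshold of 38 centimeters taken from the Fig 1 example; the function names and the rainfall readings are hypothetical and are not from the paper's Java implementation or its dataset. The paper derives the ranges from the class-wise probabilities, which this sketch does not attempt.

```python
from collections import Counter

def categorize_rainfall(rainfall_cm, threshold_cm=38.0):
    """Map a continuous rainfall value to a range category.

    The 38 cm threshold comes from the Fig 1 example; treating it as a
    fixed constant is an assumption of this sketch.
    """
    return "<=38" if rainfall_cm <= threshold_cm else ">38"

def probability_distribution(values):
    """Return {category: relative frequency} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

# Hypothetical rainfall readings in centimeters (illustrative only).
rainfall_cm = [20.0, 35.5, 38.0, 42.0, 60.0]
categories = [categorize_rainfall(r) for r in rainfall_cm]
p = probability_distribution(categories)
```

A categorical parameter such as Soil Topography can be passed to `probability_distribution` directly, without the categorization step.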
3. Calculate the entropy of p using Equation (1).
I (p) = -(p1 log2 p1 + p2 log2 p2 + … + pn log2 pn)..........(1)
[The more uniform the probability distribution, the higher its entropy, i.e. the more information it carries.]
4. Let T be the set of records, partitioned based on class values into C1, C2, ..., Ck. When p is the probability distribution of the partition (C1, C2, ..., Ck), the information needed to identify the class of an element of T is given by Equation (2).
Info (T) = I (p), where
p = (|C1|/|T|, |C2|/|T|, …, |Ck|/|T|)..........(2)
5. After partitioning T, based on the values of parameter Xi, into sets T1, T2, ..., Tn, the information needed to identify the class of an element of T is the weighted average of the information needed to identify the class of an element of each subset, i.e. the weighted average of Info (T1), ..., Info (Tn), as in Equation (3).
Info (Xi, T) = (|T1|/|T|) Info (T1) + (|T2|/|T|) Info (T2) + … + (|Tn|/|T|) Info (Tn)..........(3)
6. The information gain of parameter Xi is computed using Equation (4).
Gain (Xi, T) = Info (T) - Info (Xi, T)..........(4)
7. Repeat steps 2 to 6 for all the parameters to get the gain vector.
G = [Gain (X1, T), Gain (X2, T), ..., Gain (Xn, T)]..........(5)
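The entropy, information and gain computations of steps 3 to 7 can be sketched as follows. This is a minimal Python illustration, not the authors' Java code: it assumes base-2 logarithms and already-categorized parameter values, and the function names and four-record sample are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(p):
    """I(p): entropy of a probability distribution, skipping zero terms."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def info(labels):
    """Info(T): entropy of the class distribution of a set of records."""
    total = len(labels)
    return entropy([count / total for count in Counter(labels).values()])

def info_x(values, labels):
    """Info(Xi, T): weighted average of Info over the subsets induced by Xi."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    total = len(labels)
    return sum(len(s) / total * info(s) for s in subsets.values())

def gain(values, labels):
    """Gain(Xi, T) = Info(T) - Info(Xi, T)."""
    return info(labels) - info_x(values, labels)

# Hypothetical records: topography separates the classes, rainfall does not.
labels = ["S1", "S1", "S2", "S2"]
topography = [1, 1, 2, 2]
rainfall = ["<=38", ">38", "<=38", ">38"]

# Gain vector G over the two illustrative parameters.
G = [gain(topography, labels), gain(rainfall, labels)]
```

In this sample the topography parameter removes all uncertainty about the class while rainfall removes none, so topography would be chosen as the splitting parameter.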
8. Choose the parameter Xi such that Gain (Xi, T) is higher than that of the other parameters considered (for the first iteration, it will be the root node). Identify the sub-branches under that node as below. If the probability distribution of the parameter Xi (refer step 2) is
p = (p1, p2, p3, …, pn)
then,
p1 = subset of the dataset belonging to sub-branch 1.
p2 = subset of the dataset belonging to sub-branch 2.
...
pn = subset of the dataset belonging to sub-branch n.
9. Choose one of the sub-branches and the respective subset of the dataset. Repeat steps 1 to 9 until the last sub-branch is a class node/leaf node.
10. Prune the tree if required (logically, not graphically).
11. Once the logical tree is ready, verify it against the testing dataset.
Error allowed in the output: +1 level or -1 level.
If the error generated exceeds ±1 level, repeat steps 1 to 11 for the sub-branch that is identified as misclassifying.
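Steps 8 to 11 can be sketched in Python as a recursive tree builder with a tolerance check of one suitability level. This is an illustration under stated assumptions, not the published implementation: the level ordering S1 < S2 < S3 < N1 < N2, the dictionary-based tree, the majority-class fallback for unseen values, and all records are hypothetical. Pruning and the retraining of a misclassifying sub-branch are omitted.

```python
import math
from collections import Counter, defaultdict

LEVELS = ["S1", "S2", "S3", "N1", "N2"]  # assumed ordering of suitability levels

def info(labels):
    """Entropy of the class distribution of a list of labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def partition(rows, param):
    """Group records by their value of one parameter (the sub-branches)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[param]].append(row)
    return groups

def gain(rows, param):
    """Information gain of splitting the records on one parameter."""
    total = len(rows)
    weighted = sum(len(g) / total * info([r["T"] for r in g])
                   for g in partition(rows, param).values())
    return info([r["T"] for r in rows]) - weighted

def build_tree(rows, params):
    """Split on the highest-gain parameter until a leaf (class node) is reached."""
    labels = [r["T"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not params:
        return majority
    best = max(params, key=lambda p: gain(rows, p))
    rest = [p for p in params if p != best]
    branches = {value: build_tree(subset, rest)
                for value, subset in partition(rows, best).items()}
    return {"param": best, "branches": branches, "default": majority}

def predict(tree, row):
    """Walk the tree; unseen values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["param"]], tree["default"])
    return tree

def within_tolerance(predicted, actual, tolerance=1):
    """True if the predicted level is at most `tolerance` levels from the actual one."""
    return abs(LEVELS.index(predicted) - LEVELS.index(actual)) <= tolerance

# Hypothetical training and testing records (not from the paper's dataset).
train = [
    {"topography": 1, "rainfall": ">38", "T": "S1"},
    {"topography": 1, "rainfall": "<=38", "T": "S2"},
    {"topography": 2, "rainfall": ">38", "T": "S3"},
    {"topography": 3, "rainfall": "<=38", "T": "N2"},
    {"topography": 3, "rainfall": ">38", "T": "N1"},
]
test = [
    {"topography": 2, "rainfall": "<=38", "T": "N1"},
    {"topography": 1, "rainfall": ">38", "T": "S3"},
]

tree = build_tree(train, ["topography", "rainfall"])
results = [within_tolerance(predict(tree, row), row["T"]) for row in test]
```

Each entry of `results` marks whether a test record falls within the allowed error of one level; records that fall outside it point to the sub-branch that would be revisited.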
II. The model is trained using the training dataset and verified using the testing dataset.
Now compare the suitability outcome of the decision tree approach with that of the raster scan system.
a. If the decision tree gives correct output within the allowed error rate, display the current suitability and the expected improvements for potential suitability.
b. Otherwise, generate the outcome suitability using the raster scan system and display it. In this case, the conclusion is that the selected dataset is not appropriate, so the whole model needs to be trained again with another, appropriate dataset.
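The decision in step II might be sketched as below. The sketch assumes that the raster scan outcome serves as the reference against which the decision tree outcome is checked, and that "allowed error rate" means a difference of at most one suitability level; both are interpretations, and the function name is hypothetical. The raster scan system itself is not modeled here.

```python
LEVELS = ["S1", "S2", "S3", "N1", "N2"]  # assumed ordering of suitability levels

def choose_outcome(tree_level, raster_level, tolerance=1):
    """Return (level to display, whether retraining is needed).

    Assumption: the raster scan level is treated as the reference value.
    """
    difference = abs(LEVELS.index(tree_level) - LEVELS.index(raster_level))
    if difference <= tolerance:
        # Case a: the decision tree outcome is accepted and displayed.
        return tree_level, False
    # Case b: fall back to the raster scan outcome and flag retraining.
    return raster_level, True

accepted = choose_outcome("S2", "S3")
rejected = choose_outcome("S1", "N1")
```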