In this section, the strength of the relationship among demographic variables is measured with respect to the gender (male or female) of the respondents.
In contrast to western countries, consumers are more conscious of quality, convenience, and health concerns regarding dairy products and their value-addition (Silvia and Meiselman, 2010). There is a strong need to develop better awareness among consumers about the health benefits of milk, adulteration, harmonisation, and unfair trade practices (Reddy, 2016). From this study it has been observed that most of the respondents prefer the AMUL brand, which has the largest milk share as a dairy company. In some areas of north India, Mother Dairy is the leading brand and has the most prominent consumer base (Mangla Kumar Sachin et al., 2019). Consumers prefer A2 milk dairy farms at nearly 17.8%, and this is still an emerging market for dairy farms and milk processing units in India (Allison and Clarke, 2006). In dairy farming, sustainable supply chain practices and technological advancement are of paramount importance for efficiency and productivity (Sinha and Mishra, 2023; Shruti and Latika, 2016).
Here, the odds ratio is a measure of association between an outcome and an exposure. Odds ratios are used to compute the relative odds of the occurrence of the outcome of interest (satisfaction or dissatisfaction in this model) given exposure to the variable of interest, such as food habits, i.e. vegetarian or non-vegetarian. The odds ratio can also be used to determine whether a particular exposure is a risk factor for a specific outcome and to compare the magnitude of various risk factors.
Testing for logarithm function
In a logistic regression model, the logarithm of the odds (the logit) relates the probability of success or failure of the binomial response variable to the explanatory variables.
Logistic regression conveys the strength of the relationships among variables, depending on whether it increases above 0.1 or decreases below 0.1. The odds ratio can be written as (odds for PV+1)/(odds for PV), where PV is a predictor value, as shown in the odds ratio table (Table 2).
Odds ratio: Satisfied: Dissatisfied=0.1/0.8=0.125/1
Conditional odds
For males = Satisfied/Dissatisfied = 0.57/0.48 = 1.187.
If we consider satisfaction as success, the odds exceed 1, so male respondents are more likely to be satisfied than dissatisfied.
For females = Satisfied/Dissatisfied = 0.69/0.51 = 1.352.
The odds again exceed 1 and are higher than for males, so satisfaction is more likely among female respondents.
$\text{odds} = p / (1 - p)$
Where,
p= Probability of the event occurring.
So if p=0.1, the odds are equal to 0.1/0.9=0.111 (recurring).
Overall odds ratio
1.187/1.352=0.87
• From the overall odds ratio value of 0.87 (males relative to females), which is below 1, it is concluded that male consumers have slightly lower odds of being satisfied than female consumers.
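The conditional odds and the overall odds ratio above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `odds` and `odds_from_shares` are assumptions introduced here, and the inputs are the satisfied and dissatisfied shares quoted in the text.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_from_shares(satisfied, dissatisfied):
    """Conditional odds of satisfaction: satisfied share over dissatisfied share."""
    return satisfied / dissatisfied


# Shares quoted in the text for each gender.
male_odds = odds_from_shares(0.57, 0.48)
female_odds = odds_from_shares(0.69, 0.51)

# Odds ratio of males relative to females; a value below 1 means
# lower odds of satisfaction for males than for females.
overall_or = male_odds / female_odds

print("odds for p = 0.1:", round(odds(0.1), 3))
print("male odds:", round(male_odds, 3))
print("female odds:", round(female_odds, 3))
print("odds ratio (male/female):", round(overall_or, 3))
```

Dividing the unrounded odds gives a ratio of roughly 0.88, in line with the 0.87 reported from the rounded values.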
Model testing
Wald test is used to compute the statistical significance of each beta (b) coefficient in the logistic regression model.
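A Wald test calculates the z statistic, which is:
$z = \hat{\beta} / \mathrm{SE}(\hat{\beta})$, where $\mathrm{SE}(\hat{\beta})$ is the square root of the corresponding diagonal element of $\mathrm{Var}[\hat{\beta} \mid X] = \sigma^2 (X'X)^{-1}$.
Here $\sigma^2$ is the variance of the residuals, which is unknown and has to be estimated from the data, and $X$ is the design matrix.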
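As a minimal sketch of how a Wald z statistic and its two-sided p-value are obtained, the following Python snippet uses the standard normal distribution. The coefficient and standard error are hypothetical values chosen only for illustration and are not taken from the study's output tables; the function names are likewise assumptions.

```python
import math


def wald_z(beta_hat, std_error):
    """Wald z statistic: estimated coefficient divided by its standard error."""
    return beta_hat / std_error


def two_sided_p(z):
    """Two-sided p-value P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values, for illustration only.
beta_hat = 0.47
std_error = 0.20

z = wald_z(beta_hat, std_error)
p_value = two_sided_p(z)

print("z =", round(z, 3))
print("p-value =", round(p_value, 4))
```

A coefficient is judged significant at the five per cent level when this p-value falls below 0.05.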
Likelihood - Ratio Test (LRT)
This test compares the maximised value of the likelihood function for the simple model (L0) with the maximised value of the likelihood function for the entire model (L1).
The likelihood-ratio statistic equals:
$-2\log(L_0/L_1) = -2[\log(L_0) - \log(L_1)] = 2[\log(L_1) - \log(L_0)]$
The log transformation of the likelihood function yields a chi-square statistic that is considered in the case of backward stepwise elimination.
Results for model estimation
From the contingency table (Table 3), the following quantities are computed:
a. Probability of Food Habit if vegetarian: 336/446=0.07.
b. Probability of Food Habit if Non- vegetarian = 1-probability of Food Habit if vegetarian: 1-0.07 = 0.93 (i.e., 5/17).
c. Odds of non-vegetarian as opposed to vegetarian food habit = 140/336 = 0.42:1.
d. Likewise, the odds of vegetarian as opposed to non-vegetarian food habit are the inverse = 336/140 = 2.4:1.
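The odds in items c and d can be checked directly from the two counts. The sketch below is in Python and assumes that 336 and 140 are the numbers of vegetarian and non-vegetarian respondents respectively, as the ratios in those items suggest; the variable names are illustrative.

```python
# Counts as used in items c and d (assumed: vegetarian = 336, non-vegetarian = 140).
veg_count = 336
nonveg_count = 140

# Odds of one food habit relative to the other.
odds_nonveg_vs_veg = nonveg_count / veg_count
odds_veg_vs_nonveg = veg_count / nonveg_count

print("non-veg : veg =", round(odds_nonveg_vs_veg, 2), ": 1")
print("veg : non-veg =", round(odds_veg_vs_nonveg, 2), ": 1")
```

The two odds are reciprocals of each other, so their product is 1.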
The base model for the logit function is tested under H0: β = 0 (i.e., the constant is zero).
Generalised linear model (glm) ~ LOGIT model
In this study, customer satisfaction is the categorical dependent variable. Hence binomial function is used.
glm(formula = CS ~ 1, family = binomial(), data = COM_train)
A test statistic is used to build a model that identifies the probability of success for all the predictor variables through backward stepwise elimination.
The sample statistic in the output table (Table 4) ranges from a minimum of -1.8528 to a maximum of 0.6294. The first quartile (1Q) and the third quartile (3Q) are both 0.6294, as is the median, meaning the observations are concentrated around the centre. The sample is approximately normally distributed. (Dispersion parameter for binomial family taken to be 1).
In the base model’s summary output table (Table 5), no predictor variable is used. Hence, the null deviance and the residual deviance have the same value of 445.51 on 472 degrees of freedom. The same holds in the base model’s ANOVA table.
Base: log(odds ratio of food habit non-veg.-1) = -0.5447272
The intercept is significant (p < 2e-16***) and corresponds to the log odds for the target customer satisfaction variable. The logit can be converted back to odds by taking the exponent of the intercept: exp(-0.5447272) = 0.58:1.
The odds of 0.58 can also be converted back to a probability:
$\text{odds} = p / (1 - p)$, so $p = \text{odds} / (1 + \text{odds})$
p=0.58/1.58= 0.3670886
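The conversion from the intercept on the logit scale to odds and then to a probability can be sketched in Python as follows. The intercept is the value reported above; the helper names are assumptions made for this illustration.

```python
import math


def logit_to_odds(logit):
    """Odds are the exponential of the log odds."""
    return math.exp(logit)


def odds_to_probability(odds_value):
    """Invert odds = p / (1 - p) to get p = odds / (1 + odds)."""
    return odds_value / (1.0 + odds_value)


intercept = -0.5447272  # intercept of the base model, on the log-odds scale

base_odds = logit_to_odds(intercept)
base_prob = odds_to_probability(base_odds)

print("odds:", round(base_odds, 2))
print("probability:", round(base_prob, 4))
```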
The p-value of the Wald χ² is 0.367, which is not significant, that is, not different from zero (0). In other words, the food habits of consumers (vegetarian 0.93, non-vegetarian 0.07) have no significant effect on customers’ satisfaction level with milk consumption.
Model: Fit
In this model, customer satisfaction (CS) =
β0 + β1 (FOMI) + β2 (Food habit) + β3 (PROM) + e
where frequency of milk intake (FOMI), food habit and price of milk (PROM) are included as predictor variables in the model.
H0: β0 = β1 = β2 = β3 = 0
The sample statistic ranges from a minimum of -2.2359 to a maximum of 0.8971. The first quartile (1Q) is 0.4139 and the third quartile (3Q) is 0.7027, with a median of 0.5421, meaning the observations are concentrated around the centre. The sample is positively skewed and approximately normally distributed.
From the output table (Table 6), the p-values (Pr(>|z|)) of the explanatory variables PROM and FOMI are less than or equal to 0.05, while the p-value of the variable food habit is more than 0.05 in the logistic model fit. So, H0 is rejected, which suggests a significant effect of two variables on the dependent variable CS. The null deviance of model fit is 335.03 on 362 degrees of freedom and the residual deviance is 319.87 on 359 degrees of freedom. The residual deviance is lower than the null deviance. Hence, the deviance is considered a good indicator and the model is valid but needs further refinement.
Model: Fit 1
In this model, customer satisfaction (CS) =
β0 + β1 (FOMI) + β2 (PROM) + e
where frequency of milk intake (FOMI) and price of milk (PROM) are included as predictor variables in the model.
H0: β0 = β1 = β2 = 0
The sample statistic ranges from a minimum of -2.1929 to a maximum of 0.8663. The first quartile (1Q) is 0.4351 and the third quartile (3Q) is 0.6960, with a median of 0.5526, meaning the observations are concentrated around the centre. The sample is positively skewed and approximately normally distributed.
From the output table (Table 7), the p-values (Pr(>|z|)) of the explanatory variables PROM and FOMI are less than or equal to 0.05, so all the predictor variables significantly affect the target variable. Hence, H0 is rejected, which suggests a significant effect of the predictor variables on the dependent variable CS. The null deviance of model fit1 is 338.91 on 364 degrees of freedom and the residual deviance is 326.26 on 362 degrees of freedom. The residual deviance is lower than the null deviance. Hence, the deviance is a good indicator and the model is valid and robust.
For a given set of train data, the Akaike Information Criterion (AIC) estimates the relative quality of statistical models in terms of out-of-sample prediction error, that is, the quality of each model relative to the other models. The model with the lowest AIC among all the tested models is considered the best. The AIC of the base model is 447.51, that of model fit is 335.03 and that of model fit1 is 332.26. Model fit1 therefore has the lowest AIC and is considered the best fit for the data.
The likelihood ratio test (LRT) assesses whether one model fits a given data set better than another; here the existing model (fit1) is compared with the base model. The difference in log-likelihood between the current and base models is significant: the probability value for model fit1 is less than five per cent in the output table (Table 8). Hence, the null hypothesis is rejected, providing evidence against the base model and in favour of model fit1 as the better-fitting model.
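The likelihood-ratio statistic can be illustrated with the deviances reported for model fit1. The Python sketch below assumes that the null deviance of fit1 (338.91) stands for the intercept-only model on the same observations, so that the statistic is the drop in deviance with two degrees of freedom. The closed-form chi-square tail used here is valid only for an even number of degrees of freedom, and the function names are assumptions.

```python
import math


def lr_statistic(deviance_reduced, deviance_full):
    """Likelihood-ratio statistic as the drop in deviance between nested models."""
    return deviance_reduced - deviance_full


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


null_deviance = 338.91      # intercept-only, 364 degrees of freedom
residual_deviance = 326.26  # model fit1, 362 degrees of freedom
df_difference = 364 - 362

stat = lr_statistic(null_deviance, residual_deviance)
p_value = chi2_sf_even_df(stat, df_difference)

print("LR statistic:", round(stat, 2))
print("p-value:", round(p_value, 4))
```

Under these assumptions the p-value is well below five per cent, which agrees with the rejection of the null hypothesis described above.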
In the case of linear regression, the proportion of variance in the response variable explained by the predictor variables is termed R2. In logistic regression, on the other hand, the functional form of the equation contains a logarithmic function, so a pseudo-R2 is used instead of R2 for the multiple logistic regression model. McFadden’s R2 is defined as 1 - [ln(LM) / ln(L0)], where ln(LM) is the log-likelihood value for the fitted model and ln(L0) is the log-likelihood for the null model with only an intercept as the predictor. The measure ranges from zero to just under one, with a value close to zero indicating that the model possesses no predictive power. From the output table (Table 9), McFadden’s R2 is 0.03. A value above 0.2 is considered satisfactory, indicating that the variables explain a meaningful proportion of the variance in the dependent variable, so the value obtained here lies below that threshold.
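McFadden’s R2 can be approximated from the deviances reported for model fit1. The Python sketch below assumes ungrouped binary responses, for which each deviance equals minus twice the corresponding log-likelihood; the function name `mcfadden_r2` is an assumption.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R2: 1 - ln(LM) / ln(L0)."""
    return 1.0 - loglik_model / loglik_null


# Assumed relation for ungrouped binary data: log-likelihood = -deviance / 2.
loglik_model = -326.26 / 2.0  # from the residual deviance of model fit1
loglik_null = -338.91 / 2.0   # from the null deviance of model fit1

r2 = mcfadden_r2(loglik_model, loglik_null)
print("McFadden R2:", round(r2, 4))
```

Under this assumption the value comes out slightly below 0.04, which is in line with the reported 0.03 up to rounding.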
The Hosmer-Lemeshow statistic (Table 10) is a chi-square test statistic used to check whether the values predicted by the model differ significantly from the observed values. The fit is considered adequate if the probability value is not less than the five per cent level of significance, so that the null hypothesis is not rejected. For model fit1, the p-value of the Hosmer-Lemeshow goodness-of-fit test is 0.8072, which is greater than the 5% significance level, so the null hypothesis is not rejected. This suggests no significant difference between the observed and predicted values of model fit1, which is adequately fitted to the data.
From the output table (Table 11), it is observed that the variable price of milk (PROM) is strongly associated with the other variables and is relevant in the logistic regression model fit1.
The confusion matrix, or classification table, presents the actual and predicted values of the target variable in tabular form and is used to determine the model’s accuracy (Table 12). For the logistic regression model fit1, the recall (sensitivity) computed from the actual and predicted values is 0.9918 with a cutoff of 0.3 on the train data, which is taken to indicate a good fit.
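The way recall (sensitivity) follows from a cutoff on predicted probabilities can be sketched in Python as below. The actual labels and predicted probabilities are hypothetical toy values, not the study's data, and the function names are assumptions; only the cutoff of 0.3 is taken from the text.

```python
def confusion_counts(actual, predicted_prob, cutoff):
    """Return (tp, fp, tn, fn) after classifying probabilities at the cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def recall(tp, fn):
    """Recall (sensitivity): share of actual positives predicted as positive."""
    return tp / (tp + fn)


# Hypothetical toy data, for illustration only.
actual = [1, 1, 1, 0, 0, 1, 0, 1]
predicted_prob = [0.82, 0.41, 0.25, 0.35, 0.10, 0.66, 0.28, 0.90]

tp, fp, tn, fn = confusion_counts(actual, predicted_prob, cutoff=0.3)
sensitivity = recall(tp, fn)
accuracy = (tp + tn) / len(actual)

print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("recall:", sensitivity)
print("accuracy:", accuracy)
```

A low cutoff such as 0.3 classifies more observations as positive, which tends to raise recall at the cost of more false positives.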
From the logistic linear equation for model,
glm(formula = CS ~ PROM + Fscore, family = binomial(link = "logit"), data = COM_train)
The output is for the outcome coded as 1. For the customer satisfaction level with the observed variables, the coefficient of the price of milk is 0.47 and that of the F-score, i.e., frequency of milk intake, is -0.17, which indicates a negative linear relationship (Table 13). From the graph plotted (Fig 3), the fitted probability in model fit1 is about 95%.