Bhartiya Krishi Anusandhan Patrika, volume 36 issue 4 (December 2021) : 334-337

Outlier Removal in Sheep Farm Datasets Using Winsorization

Ambreen Hamadani, Nazir A. Ganai, Tariq Raja, Safeer Alam, Syed Mudasir Andrabi, Ishraq Hussain, Haider Ali Ahmad
Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Srinagar-190 006, Jammu and Kashmir, India.
  • Submitted: 22-11-2021

  • Accepted: 06-01-2022

  • First Online: 26-01-2022

  • doi: 10.18805/BKAP397

Cite article:- Hamadani Ambreen, Ganai A. Nazir, Raja Tariq, Alam Safeer, Andrabi Mudasir Syed, Hussain Ishraq, Ahmad Ali Haider (2022). Outlier Removal in Sheep Farm Datasets Using Winsorization. Bhartiya Krishi Anusandhan Patrika. 36(4): 334-337. doi: 10.18805/BKAP397.
Background: Sheep farm data are often biased by extreme values, which are generally introduced through errors in manual measurement. These values reduce the accuracy of estimates, especially those from state-of-the-art techniques such as machine learning.
Methods: Winsorization was therefore applied to remove outliers from 11 years (2011-2021) of sheep farm data on body weights at different ages. Some outliers were deliberately introduced into the data to check the efficiency of the technique. This study was conducted during the year 2021.
Result: Outlier values of 15.3, 42, 44, 60 and 90 for birth weight, weaning weight, 6-month, 9-month and 12-month body weight, respectively, which were far from the normal range, were removed using this technique. The mean and standard deviation values changed after winsorization. Winsorization works well for sheep farm data: it removes the bias introduced by outliers and, to a large extent, the need for manual outlier removal.
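The procedure described in the abstract can be illustrated with a minimal sketch. It assumes SciPy's `scipy.stats.mstats.winsorize` and a small made-up birth-weight sample into which the outlier 15.3 has been injected. The sample values and the 10% limits are assumptions chosen for illustration; the abstract does not state the limits or the data actually used.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Hypothetical birth weights (illustrative only, not the study's data).
# The last value, 15.3, plays the role of a deliberately introduced outlier.
birth_weights = np.array([3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3, 2.8, 15.3])

# Replace the lowest 10% and highest 10% of values with the nearest
# remaining value. The limits (0.1, 0.1) are an assumption.
winsorized = np.ma.getdata(winsorize(birth_weights, limits=(0.1, 0.1)))

# Compare summary statistics before and after winsorization.
print("max  before:", birth_weights.max(), "after:", winsorized.max())
print("mean before:", birth_weights.mean(), "after:", winsorized.mean())
print("SD   before:", birth_weights.std(ddof=1), "after:", winsorized.std(ddof=1))
```

Note that winsorization does not drop records: each extreme value is replaced by the nearest value inside the chosen limits, so the sample size is unchanged while the mean and standard deviation shift.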

