
Outlier Removal in Sheep Farm Datasets Using Winsorization

DOI: 10.18805/BKAP397 | Article Id: BKAP397 | Page: 334-337
Citation: Outlier Removal in Sheep Farm Datasets Using Winsorization. Bhartiya Krishi Anusandhan Patrika. 2021. (36): 334-337
Ambreen Hamadani, Nazir A. Ganai, Tariq Raja, Safeer Alam, Syed Mudasir Andrabi, Ishraq Hussain, Haider Ali Ahmad escritor005@gmail.com
Address : Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Srinagar-190 006, Jammu and Kashmir, India.
Submitted Date : 22-11-2021
Accepted Date : 6-01-2022

Abstract

Background: Sheep farm data are often biased by extreme values, which are generally introduced through errors in manual measurement. These values reduce the accuracy of estimates, especially in state-of-the-art techniques such as machine learning.
Methods: Winsorization was therefore applied to remove outliers from 11 years (2011-2021) of sheep farm data on body weights at different ages. Some outliers were deliberately introduced into the data to check the efficiency of the technique. The study was conducted during 2021.
Result: Outlier values of 15.3, 42, 44, 60 and 90 for birth weight, weaning weight, and 6-month, 9-month and 12-month body weight, respectively, which were far outside the normal range, were removed using this technique. The mean and standard deviation values changed after winsorization. Winsorization works well for sheep farm data: it removes the bias introduced by outliers and, to a large extent, the need for manual outlier removal.
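As an illustration of the technique named in the abstract, the following is a minimal sketch of winsorization in plain Python. Winsorization replaces the most extreme values with the nearest retained observation instead of deleting records. The function name `winsorize`, the 10% upper limit, and the sample birth weights are assumptions made for this example and are not taken from the study; only the injected outlier of 15.3 comes from the abstract.

```python
import statistics


def winsorize(values, lower=0.0, upper=0.0):
    """Clip the lowest `lower` and highest `upper` fractions of `values`
    to the nearest retained observation (hypothetical helper)."""
    if lower < 0 or upper < 0 or lower + upper >= 1:
        raise ValueError("limits must be non-negative and sum to less than 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    k_low = int(lower * n)    # how many values to clip at the bottom
    k_high = int(upper * n)   # how many values to clip at the top
    floor = ordered[k_low]
    ceiling = ordered[n - k_high - 1]
    # Keep the original order; only the extreme values are replaced.
    return [min(max(v, floor), ceiling) for v in values]


# Hypothetical birth weights in kg; 15.3 plays the role of an injected outlier.
birth_weights = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3, 2.8, 15.3]
cleaned = winsorize(birth_weights, upper=0.1)

print("before:", statistics.mean(birth_weights), statistics.stdev(birth_weights))
print("after: ", statistics.mean(cleaned), statistics.stdev(cleaned))
```

In this sketch the value 15.3 is replaced by 3.6, the largest retained weight, so both the mean and the standard deviation fall. The clipping limits are a modelling choice that would need to be tuned per trait; SciPy's `scipy.stats.mstats.winsorize` is one library alternative to a hand-written helper.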

Keywords

Body weights, Data correction, Outliers, Sheep data, Winsorization

