A new dataset for coffee rust detection in Colombian crops based on classifiers

  • David Camilo Corrales, Universidad del Cauca, Popayán
  • Agapito Ledezma, Universidad Carlos III, Madrid
  • Andrés J. Peña Q., Centro Nacional de Investigación del Café, Chinchiná
  • Javier Hoyos, Supracafé, Popayán
  • Apolinar Figueroa, Universidad del Cauca, Popayán
  • Juan Carlos Corrales, Universidad del Cauca, Popayán
Keywords: Coffee Rust, Classifier, SVR, BPNN, M5


Coffee production is the main agricultural activity in Colombia, and more than 350,000 Colombian families depend on the coffee harvest. Since coffee rust disease was first reported in the country in 1983, these families have faced severe consequences. Recently, machine learning approaches have been used to build a dataset for monitoring coffee rust incidence that involves weather conditions and physical crop properties. This background encouraged us to build a dataset for coffee rust detection in Colombian crops following a data mining process, the Cross Industry Standard Process for Data Mining (CRISP-DM). In this paper we define the data needed to generate accurate models; once the dataset is built, it is tested using classifiers such as Support Vector Regression, Backpropagation Neural Networks, and Regression Trees.



Author Biographies

David Camilo Corrales, Universidad del Cauca, Popayán

M.Sc. in Telematics Engineering and researcher in the Telematics Engineering Group and the Environmental Study Group at the University of Cauca, Colombia.

Agapito Ledezma, Universidad Carlos III, Madrid

Ph.D. in Sciences, specialty in Computer Engineering, and Full Professor at University Carlos III of Madrid.

Andrés J. Peña Q., Centro Nacional de Investigación del Café, Chinchiná

M.Sc. in Meteorology and researcher at the National Coffee Research Center (Colombia).

Javier Hoyos, Supracafé, Popayán

Agronomic Engineer and Farm Manager of Los Naranjos (Supracafé, Colombia).

Apolinar Figueroa, Universidad del Cauca, Popayán

Doctor of Biological Sciences, Full Professor, and Leader of the Environmental Study Group at the University of Cauca, Colombia.

Juan Carlos Corrales, Universidad del Cauca, Popayán

Doctor of Philosophy in Sciences, specialty in Computer Science, Full Professor, and Leader of the Telematics Engineering Group at the University of Cauca, Colombia.


Original Research