Water quality warnings based on cluster analysis in Colombian river basins

Edwin Ferney Castillo
Wilmer Fernando Gonzales
David Camilo Corrales
Iván Darío López
Miller Guzmán Hoyos
Apolinar Figueroa
Juan Carlos Corrales


Fresh water is considered one of the most important renewable natural resources in the world. Colombia is among the countries with the largest water supply and has five watersheds: the Caribbean, Orinoco, Amazon, Pacific and Catatumbo. It is therefore vital to study and evaluate the water quality of its rivers and other lotic systems. In recent studies, some researchers used biological indices to calculate water quality, while others detected water quality through machine learning techniques. However, these studies do not allow users to interpret the results easily. This limitation motivated us to propose a dataset for generating water quality alerts in the Piedras River basin, based on the K-Means clustering algorithm and the C4.5 classification technique.

Original Research
Author Biographies

Edwin Ferney Castillo, Universidad del Cauca, Popayán

Currently a final-semester undergraduate student in Electronics and Telecommunications Engineering at Universidad del Cauca, Colombia. His research interests focus on machine learning, data analysis, and telecommunications and telematics.

Wilmer Fernando Gonzales, Universidad del Cauca, Popayán

Currently a final-semester undergraduate student in Electronics and Telecommunications Engineering at Universidad del Cauca, Colombia. His research interests focus on machine learning, data analysis, and telematics.

David Camilo Corrales, Universidad del Cauca, Popayán

Received a degree in Informatics Engineering and a master's degree in Telematics Engineering from Universidad del Cauca, Colombia, in 2011 and 2014, respectively. Currently a PhD student in Telematics Engineering at Universidad del Cauca and in Science and Informatics Technologies at Universidad Carlos III de Madrid. His research interests focus on data mining, machine learning and data analysis.

Iván Darío López, Universidad del Cauca, Popayán

Received an engineering degree in Information Systems from Universidad del Cauca, Colombia, in 2011, and is an MSc student in Telematics Engineering at the same institution. His current research interests are applications of computational intelligence techniques to modeling and data mining problems.

Miller Guzmán Hoyos, Universidad del Cauca, Popayán

Biologist from Universidad del Cauca, Colombia, and currently an MSc student in Continental Hydrobiological Resources at the same institution. He is also a researcher in the hydrobiological component of the Group for Environmental Studies at Universidad del Cauca. His research interests focus on water quality based on benthic macroinvertebrates and the physical and chemical characteristics of water.

Apolinar Figueroa, Universidad del Cauca, Popayán

Received a degree in Biology from Universidad del Cauca, Colombia, in 1982, a master's degree in Ecology from Universidad de Barcelona, Spain, in 1986, and a PhD in Biological Sciences from Universidad de Valencia, Spain, in 1999. He is currently a full Professor and leads the Environmental Studies Group at Universidad del Cauca. His research interests focus on environmental impact assessment and biodiversity management.

Juan Carlos Corrales, Universidad del Cauca, Popayán

Engineer (1999) and Master in Telematics Engineering (2004) from Universidad del Cauca, Colombia, and PhD in Sciences, specializing in Computer Science, from the University of Versailles Saint-Quentin-en-Yvelines, France (2008). He is currently a full-time Professor and leads the Telematics Engineering Group at Universidad del Cauca. His research interests focus on service composition and data analysis.

