Water quality warnings based on cluster analysis in Colombian river basins

Edwin Ferney Castillo, Wilmer Fernando Gonzales, David Camilo Corrales, Iván Darío López, Miller Guzmán Hoyos, Apolinar Figueroa, Juan Carlos Corrales


Fresh water is one of the world's most important renewable natural resources. Colombia is among the countries with the highest water supply and has five watersheds: the Caribbean, Orinoco, Amazon, Pacific, and Catatumbo. It is therefore vital to study and evaluate the water quality of its rivers and other lotic systems. In recent studies, some researchers have used biological indices to calculate water quality, while others have assessed it with machine learning techniques. However, these studies do not let users interpret the results easily. This gap motivated us to propose a dataset for generating water quality alerts in the Piedras river basin, based on an analysis using the K-Means clustering algorithm and the C4.5 classification technique.
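To make the combination of clustering and classification concrete, the following is a minimal sketch, not the authors' implementation. It groups a few samples with plain K-Means and then derives one readable threshold rule using gain ratio, the split criterion that C4.5 builds on. The sample values, the feature names (`oxygen_mg_l`, `biotic_score`), and the helper functions are all hypothetical and are not taken from the Piedras river dataset.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n) for v in set(labels))

def best_threshold(values, labels):
    """Return (gain_ratio, t) for the best binary split 'value <= t'."""
    n = len(labels)
    base = entropy(labels)
    best = (0.0, None)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbours
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info
        if ratio > best[0]:
            best = (ratio, t)
    return best

# Hypothetical samples: (oxygen_mg_l, biotic_score). Invented for illustration only.
samples = [(8.1, 120.0), (7.9, 110.0), (8.4, 130.0),
           (3.2, 40.0), (2.9, 35.0), (3.5, 45.0)]

labels, centroids = kmeans(samples, k=2)
oxygen = [s[0] for s in samples]
ratio, threshold = best_threshold(oxygen, labels)
print(f"cluster labels: {labels}")
print(f"rule: oxygen_mg_l <= {threshold:.2f} separates the clusters (gain ratio {ratio:.2f})")
```

The single threshold stands in for a full decision tree: a real C4.5 tree would evaluate every attribute, recurse on each branch, and prune. The point of the sketch is only that a rule learned on cluster labels is easier to read than raw cluster assignments.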


Clustering; water quality data; aquatic macro-invertebrates; taxon; C4.5 decision tree.





DOI: http://dx.doi.org/10.18046/syt.v13i33.2077

