Text segmentation by language

Robin Cabeza Ruiz

doi:10.18046/syt.v14i38.2289

Authors

Robin Cabeza Ruiz, University of Holguín

DOI:

https://doi.org/10.18046/syt.v14i38.2289

Keywords:

Hidden Markov model, text segmentation by language, natural language processing.

Abstract

There are two approaches to text segmentation by language. The first assumes that language changes occur only at the boundaries between sentences, never within a sentence; the second assumes that language changes can occur anywhere in the text. This work presents a method for each approach. In the first proposal, the text is segmented into sentences and the language of each sentence is then identified; the second proposal adapts the hidden Markov model to this task. According to the experimental results, both methods outperform the state of the art.
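To make the second approach concrete, the following is a minimal sketch of how a hidden Markov model can label each word with a language, with languages as hidden states and Viterbi decoding to recover the most likely sequence. It is not the author's implementation: the toy word lists in `LEXICON`, the constants `P_STAY` and `P_KNOWN`, and all function names are assumptions made for illustration, and a real system would use trained emission models (for example, character n-grams) rather than word lists.

```python
import math

# Hypothetical toy lexicons (assumption); a real system would train emission models.
LEXICON = {
    "en": {"the", "is", "and", "this", "a", "of", "house", "very", "big"},
    "es": {"el", "la", "es", "y", "esta", "de", "casa", "muy", "grande"},
}
LANGS = sorted(LEXICON)
P_STAY = 0.9   # assumed probability that the next word keeps the same language
P_KNOWN = 0.9  # assumed emission score for a word found in a language's lexicon


def emission_logp(word, lang):
    """Log score of observing `word` in `lang` (toy model, not normalized)."""
    return math.log(P_KNOWN if word.lower() in LEXICON[lang] else 1.0 - P_KNOWN)


def transition_logp(prev, cur):
    """Log-probability of moving from language `prev` to language `cur`."""
    if prev == cur:
        return math.log(P_STAY)
    return math.log((1.0 - P_STAY) / (len(LANGS) - 1))


def viterbi(words):
    """Return the most likely language label for each word."""
    if not words:
        return []
    start = math.log(1.0 / len(LANGS))  # uniform prior over languages
    # Each row maps lang -> (best path score ending in lang, previous lang).
    rows = [{l: (start + emission_logp(words[0], l), None) for l in LANGS}]
    for word in words[1:]:
        row = {}
        for cur in LANGS:
            prev = max(LANGS, key=lambda p: rows[-1][p][0] + transition_logp(p, cur))
            score = rows[-1][prev][0] + transition_logp(prev, cur)
            row[cur] = (score + emission_logp(word, cur), prev)
        rows.append(row)
    # Backtrack from the best final state.
    lang = max(LANGS, key=lambda l: rows[-1][l][0])
    path = [lang]
    for row in reversed(rows[1:]):
        lang = row[lang][1]
        path.append(lang)
    path.reverse()
    return path


def segments(words):
    """Group consecutive words with the same label into (lang, text) pairs."""
    grouped = []
    for word, lang in zip(words, viterbi(words)):
        if grouped and grouped[-1][0] == lang:
            grouped[-1][1].append(word)
        else:
            grouped.append((lang, [word]))
    return [(lang, " ".join(ws)) for lang, ws in grouped]


if __name__ == "__main__":
    print(segments("the house is very big la casa es muy grande".split()))
```

Because switching languages is penalized by the transition probabilities, a word that is unknown to every lexicon tends to inherit the language of its neighbors, which is what allows a change of language to be placed anywhere in the text rather than only at sentence boundaries.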

Author Biography

Robin Cabeza Ruiz, University of Holguín

Bachelor’s degree in Computer Science from Universidad de Oriente (2015) and student in the Master’s program in Computer-Aided Design at the Universidad de Holguín [UHo], Cuba. He is currently a professor of programming and a member of the CAD/CAM Studies Center at the Faculty of Engineering of UHo, where he researches biomechanical
