Development of Methods and Algorithms for Augmenting the Texts with Additional Information

Zhamilya Bimagambetova, Arukyz Sundetulla, Syrym Moldash

Abstract


We have here explored different ways of text augmentation to explain each of them. The purpose of the article is to show methods of augmentation and calculate which one shows the best result in terms of the amount of new data created and the similarity of this data with the original. To do this, we use the subtitles for the movie as data and run our algorithm on each phrase in these subtitles.
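The abstract does not say which augmentation operations or similarity measure the authors use. The following is only a minimal sketch of the described evaluation loop: augment each subtitle phrase, count the distinct new phrases produced, and measure how similar they are to the original. It assumes two EDA-style token operations (random swap and random deletion) and character-level similarity from Python's `difflib.SequenceMatcher`. The function names and the `p`, `n_variants` and `seed` parameters are hypothetical illustrations, not the authors' method.

```python
import random
from difflib import SequenceMatcher


def random_swap(tokens, rng):
    """Hypothetical op: swap two randomly chosen token positions."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.2):
    """Hypothetical op: drop each token with probability p, keep at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def augment_phrase(phrase, n_variants=4, seed=0):
    """Apply the ops in turn and return the distinct variants that differ from the input."""
    rng = random.Random(seed)
    tokens = phrase.split()
    original = " ".join(tokens)  # whitespace-normalised original
    ops = [random_swap, random_deletion]
    variants = set()
    for k in range(n_variants):
        candidate = " ".join(ops[k % len(ops)](tokens, rng))
        if candidate != original:
            variants.add(candidate)
    return sorted(variants)


def similarity(a, b):
    """Character-level similarity in [0, 1] (an assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def evaluate(phrases, n_variants=4, seed=0):
    """Return (number of new phrases, mean similarity to their originals)."""
    total_new, scores = 0, []
    for idx, phrase in enumerate(phrases):
        for variant in augment_phrase(phrase, n_variants, seed + idx):
            total_new += 1
            scores.append(similarity(phrase, variant))
    mean_sim = sum(scores) / len(scores) if scores else 0.0
    return total_new, mean_sim


subtitles = ["I never said she stole my money", "Where are you going tonight"]
count, mean_sim = evaluate(subtitles)
print(count, round(mean_sim, 3))
```

Comparing methods in this setup means swapping in different operations and checking the trade-off between the two returned numbers. More aggressive edits tend to produce more distinct phrases but lower similarity to the original.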
