Neural-Combinatorial Classifiers for Arabic Decomposable Word Recognition
Abstract
Recognition tools and techniques for Arabic script are still under development, owing to the topological ambiguities of the script and the inflectional nature of the language. This paper presents an approach to Arabic word recognition that combines a combinatorial optimization technique with convolutional neural networks. We handle a large vocabulary of Arabic decomposable words. We adopt a design that resembles a molecular cloud, with words structured according to their roots and patterns. This design fits well with the Arabic linguistic principle of building words from their roots. Hence, each sub-vocabulary represents a sub-cloud, encompassing neighboring words derived from the same root and following different patterns and forms of derivation, inflection and agglutination (proclitic and enclitic). As a first step, we used a recognition approach based on the simulated annealing (SA) metaheuristic. In a second study, we integrated linguistic knowledge into the SA algorithm. Extending this work, we integrate a convolutional neural network into the recognition process of the SA algorithm to benefit from the advantages of both methods. Our experiments, which yielded promising results, use a corpus of Arabic words that includes samples and agglutinated words from the APTI database.
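To make the search idea concrete, the following is a minimal illustrative sketch of simulated annealing over a root-structured vocabulary. It is not the implementation evaluated in this work. The toy vocabulary (Latin transliterations), the function names, the jump probability and the cooling schedule are all assumptions made for illustration, and a string edit distance stands in for the image-based score that a convolutional neural network would supply.

```python
import math
import random

# Toy vocabulary (assumption): each root maps to its sub-cloud of derived
# words, written as Latin transliterations purely for illustration.
VOCAB = {
    "ktb": ["kataba", "kitab", "katib", "maktub", "maktaba"],
    "drs": ["darasa", "dars", "madrasa", "mudarris"],
    "elm": ["alima", "ilm", "muallim", "maalum"],
}


def energy(candidate, observed):
    """Stand-in cost (assumption): Levenshtein distance between strings.
    A real system would score the candidate against the word image."""
    prev = list(range(len(observed) + 1))
    for i, c in enumerate(candidate, 1):
        cur = [i]
        for j, o in enumerate(observed, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (c != o)))  # substitution
        prev = cur
    return prev[-1]


def neighbor(state, rng, jump_prob=0.2):
    """Usually move inside the current sub-cloud (same root);
    occasionally jump to a randomly chosen root."""
    root, _ = state
    if rng.random() < jump_prob:
        root = rng.choice(sorted(VOCAB))
    return root, rng.choice(VOCAB[root])


def anneal(observed, steps=500, t0=2.0, cooling=0.99, seed=0):
    """Return the lowest-cost (root, word) state visited and its cost."""
    rng = random.Random(seed)
    root = rng.choice(sorted(VOCAB))
    state = (root, rng.choice(VOCAB[root]))
    cost = energy(state[1], observed)
    best, best_cost = state, cost
    temp = t0
    for _ in range(steps):
        cand = neighbor(state, rng)
        cand_cost = energy(cand[1], observed)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, and accept worse
        # moves with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = state, cost
        temp *= cooling  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    # "maktuub" plays the role of a noisy observation of a vocabulary word.
    print(anneal("maktuub"))
```

Restricting most moves to the current sub-cloud mirrors the neighborhood notion described above, in which words sharing a root are neighbors, while the occasional jump between roots keeps the search from stalling in a single sub-cloud.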