Title: Design of a mathematical expression understanding system
Authors: Lee, HJ; Wang, JS
National Chiao Tung University
Department of Computer Science
Keywords: character segmentation;character recognition;expression formation;error correction
Issue Date: 1-Mar-1997
Abstract: A scientific document usually consists of text and mathematical expressions. In this paper, we present a system for segmenting and understanding text and mathematical expressions in a document. The system is divided into six stages: page segmentation and labeling, character segmentation, feature extraction, character recognition, expression formation, and error correction and expression extraction. After extracting all text lines in a document, we separate the symbols in each text line and calculate direction-feature vectors and aspect ratios for those symbols. A nearest-neighbor algorithm then recognizes the characters. In the expression formation stage, we build a symbol relation tree for each text line that represents the relationships among the symbols in that line. Each text line is decomposed into a collection of primitive tokens: operands, operators and separators. Heuristic rules based on these primitive tokens are used to correct text recognition errors. Finally, we extract all mathematical expressions according to basic expression forms. Several pages of documents were scanned to test the method. All mathematical expressions were understood, although a few symbols in the generated expressions were misrecognized. The average recognition rate was 96.16%. (C) 1997 Elsevier Science B.V.
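Note: the character recognition stage named in the abstract can be illustrated with a minimal nearest-neighbor sketch. The abstract does not give the feature dimensions, the distance measure, or the prototype set, so everything below is an assumption made for illustration: the five-element vectors (four direction-feature values plus one aspect ratio), the Euclidean distance, and the names `nearest_neighbor` and `PROTOTYPES` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical prototypes: (label, [four direction-feature values, aspect ratio]).
# The numbers are invented for illustration; they are not data from the paper.
PROTOTYPES = [
    ("1", [0.9, 0.0, 0.1, 0.0, 0.3]),
    ("-", [0.0, 0.9, 0.0, 0.1, 4.0]),
    ("x", [0.1, 0.1, 0.4, 0.4, 1.0]),
]


def nearest_neighbor(sample, prototypes):
    """Return the label of the prototype closest to `sample`.

    Euclidean distance is an assumption; the abstract does not say
    which distance measure the system uses.
    """
    best_label, best_dist = None, math.inf
    for label, vector in prototypes:
        dist = math.dist(sample, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# A wide, mostly horizontal symbol should match the "-" prototype.
print(nearest_neighbor([0.0, 0.8, 0.0, 0.2, 3.5], PROTOTYPES))
```

In a sketch like this the aspect ratio sits on a different scale from the direction features, so a practical system would probably normalize or weight the components before comparing them.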
URI: http://hdl.handle.net/11536/695
ISSN: 0167-8655
Volume: 18
Issue: 3
Begin Page: 289
End Page: 298
Appears in Collections: Articles