Tuesday, April 23, 2013

A visual approach to sketched symbol recognition

Tom Y. Ouyang and Randall Davis. 2009. A visual approach to sketched symbol recognition. In Proceedings of the 21st international jont conference on Artifical intelligence (IJCAI'09), Hiroaki Kitano (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1463-1468.


This paper proposes a new visual method for sketch recognition. The approach taken represents symbols as feature images, proposes a set of visual features for detecting orientation and endpoints, and introduces an efficient classification technique which is robust to deformations and rotation.

First, strokes are resampled at a constant spatial frequency. Next, each symbol is normalized by translating its center of mass to the origin and scaling it horizontally and vertically so it has unit standard deviation on each axis. Then, five features for each sample point are computed (four for orientation, one for endpoints). These five features are each rendered onto 24 x 24 feature grids. The grids span 2.5 standard deviations of the original symbol's space in each direction. The intensity of a pixel is determined by the maximum feature value of all sample points that fall within that cell. Next, to increase tolerance to shifts and distortions, a Gaussian blur is applied. The image is then down sampled using a MAX filter to 6x6 size.

For recognition, the computed feature images are distance matched (with an image deformation model) to existing templates.

It is necessary to optimize some of these processes so that recognition time remains low. First, a "coarse" metric is established and used before attempting exact matches, this eliminates many of the candidate matches. To do this, images are indexed using their first 128 principal components. Additionally, a branch and bound technique is used to generate a clustering tree. For rotational invariance, 32 orientations of the original feature image are considered.

Testing on the Pens Digits Set revealed 99.2% accuracy, while the HHReco Dataset yielded 98.2% accuracy and the Circuits Dataset showed 96.2% accuracy. The system was capable of classifying about 100 symbols per second on a 2.4GHz processor.

This paper presents a surprising result in that even after significant down-sampling, the feature images are rich enough to provide great accuracy in classification tasks. The paper also illustrates that optimizations can greatly improve the performance of a sketch recognition system, and carefully balances the need for features with the need for processing power. This is overall an elegant solution.

Envisioning sketch recognition: a local feature based approach to recognizing informal sketches

Oltmans, M. (2007). Envisioning sketch recognition: a local feature based approach to recognizing informal sketches (Doctoral dissertation, Massachusetts Institute of Technology).


This dissertation proposes a visual method for recognizing hand drawn sketches. The problem is approached as the recognition of individual "parts" of a sketch. To recognize a sketch, the strokes are first resampled, then a bullseye feature is placed every five pixels of stroke. The bullseye feature is a circular grid which acts as a histogram, counting the amount of ink in each bin. The circles are spaced logarithmically, allowing the outer rings to create context, while the inner rings capture detail. To ensure rotation invariance, the bullseye is rotated to be close to the direction of the stroke. Point orientations are also taken into account when examining a bullseye feature.

The computed features are compared to a codebook of previously learned features and a distance metric identifies the closest match for the bullseye. An SVM is used along with a one-vs-one strategy for classifying each feature. To convert an entire sketch this process is completed on windows of varying sizes across the sketch canvas. Using additional classifiers and overlap information, each area is classified.

This work was evaluated on two datasets and compared against an image-based classifier. When evaluated on the dataset of electrical circuits, the system achieved 89.5% accuracy. On the dataset of PowerPoint style shapes, the system achieved 94.4% accuracy.

The work presented here is a novel modification of vision methods for use on 2D sketches. This system is particularly impressive in its ability to ingest an entire sketch and classify all pieces. There are weaknesses however, for example in the electrical components case the system had to be somewhat modified to understand that unclassified pieces should be considered wires, making this approach not entirely domain independent. It would be interesting to see this technique combined with gesture recognition techniques. I would also like to test it on recognizing geons in 3D sketches.

Tuesday, April 9, 2013

A domain-independent system for sketch recognition

Bo Yu and Shijie Cai. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia (GRAPHITE '03). ACM, New York, NY, USA, 141-146.


This paper examines techniques for a domain independent sketch recognition system. The approach can be divided into two steps: imprecise stroke approximation and post-process. The first step is applied as the user sketches, while the second is applied after the entire drawing is completed. The output of the system is a hierarchy which includes raw stroke data, vertex and primitive shape data, and semantic information.

For stroke approximation, the system uses direction and curvature coupled with feature area. Vertex detection and primitive shape approximation are combined into a single incremental procedure. A recursive approach is first applied, splitting a stroke until its pieces are recognized as primitives. Lines are approximated by fitting a line to the stroke points or the curve graph. Feature area is then used to judge validity. If the ratio between feature area and the candidate line's area is less than 1.0, a line is detected otherwise an arc detector is applied. To detect a circle or ellipse the direction graph should be able to be fitted with a line of slope close to 2PI / n, where n is the total number of stroke points. For arcs, the slope of the line fit to the direction curve should be less t han 2PI/n.

The paper also addresses self intersection strokes. If given a stroke that has self intersection, two copies are made which each follow a different division strategy (one intersection point, one maximum curvature point). The recognition results are then compared and the better is chosen.

The second phase, post-process eliminates false elements, adjust the size and position of elements, and performs object recognition.

This paper contributes some novel techniques, specifically feature area, for performing domain independent recognition of shapes. However, it seems to only work for simple shape drawings. It would be interesting to see this applied to more complex examples.

Sketch recognition algorithms for comparing complex and unpredictable shapes

Martin Field, Stephanie Valentine, Julie Linsey, and Tracy Hammond. 2011. Sketch recognition algorithms for comparing complex and unpredictable shapes. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three (IJCAI'11), Toby Walsh (Ed.), Vol. Volume Three. AAAI Press 2436-2441.


Mechanix is a free sketch recognition system focused on allowing students to solve free body diagram and static truss problems through sketching. The primary purpose is to provide quick feedback to students, while easing the burden of grading a high number of paper submissions for professors and TAs.

Mechanix uses an online process, in which after each stroke is drawn an attempt is made to combine it with others into a more complex shape. Mechanix employs Paleo sketch for low level recognition of strokes. Body identification is a primary task for Mechanix, which is solved by first identifying closed shapes and then entering a comparison phase. The comparison phase is unique, because only one template exists for comparison. An average of the Hausdorff distance, modified Hausdorff distance, and tanimoto coefficient is used with an acceptance coefficient of .65.

Mechanix also faces the issue of identifying trusses as drawn by students. Identifying shared edges is a key step in solving this recognition problem. Mechanix forms a graph with each line segment of the truss being a node and the edges being intersections. Using an elimination of connections technique, a shared edge can then be identified. For the final comparison both the sketch and adjacency graph are considered.

Mechanix represents an application of sketch recognition techniques to a specific domain. It is notable that new techniques were developed and tuned to handle the specific cases of the domain. The paper does not detail the feasibility or conflicts that might arise from attempting to include these techniques in a general recognizer. Further work on a generalized or extended domains approach would be interesting.