Sketch Recognition - Spring 2013

Tuesday, April 23, 2013

A visual approach to sketched symbol recognition

Tom Y. Ouyang and Randall Davis. 2009. A visual approach to sketched symbol recognition. In Proceedings of the 21st international jont conference on Artifical intelligence (IJCAI'09), Hiroaki Kitano (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1463-1468.

This paper proposes a new visual method for sketch recognition. The approach taken represents symbols as feature images, proposes a set of visual features for detecting orientation and endpoints, and introduces an efficient classification technique which is robust to deformations and rotation.

First, strokes are resampled at a constant spatial frequency. Next, each symbol is normalized by translating its center of mass to the origin and scaling it horizontally and vertically so it has unit standard deviation on each axis. Then, five features for each sample point are computed (four for orientation, one for endpoints). These five features are each rendered onto 24 x 24 feature grids. The grids span 2.5 standard deviations of the original symbol's space in each direction. The intensity of a pixel is determined by the maximum feature value of all sample points that fall within that cell. Next, to increase tolerance to shifts and distortions, a Gaussian blur is applied. The image is then down sampled using a MAX filter to 6x6 size.

For recognition, the computed feature images are distance matched (with an image deformation model) to existing templates.

It is necessary to optimize some of these processes so that recognition time remains low. First, a "coarse" metric is established and used before attempting exact matches, this eliminates many of the candidate matches. To do this, images are indexed using their first 128 principal components. Additionally, a branch and bound technique is used to generate a clustering tree. For rotational invariance, 32 orientations of the original feature image are considered.

Testing on the Pens Digits Set revealed 99.2% accuracy, while the HHReco Dataset yielded 98.2% accuracy and the Circuits Dataset showed 96.2% accuracy. The system was capable of classifying about 100 symbols per second on a 2.4GHz processor.

This paper presents a surprising result in that even after significant down-sampling, the feature images are rich enough to provide great accuracy in classification tasks. The paper also illustrates that optimizations can greatly improve the performance of a sketch recognition system, and carefully balances the need for features with the need for processing power. This is overall an elegant solution.

Envisioning sketch recognition: a local feature based approach to recognizing informal sketches

Oltmans, M. (2007). Envisioning sketch recognition: a local feature based approach to recognizing informal sketches (Doctoral dissertation, Massachusetts Institute of Technology).

This dissertation proposes a visual method for recognizing hand drawn sketches. The problem is approached as the recognition of individual "parts" of a sketch. To recognize a sketch, the strokes are first resampled, then a bullseye feature is placed every five pixels of stroke. The bullseye feature is a circular grid which acts as a histogram, counting the amount of ink in each bin. The circles are spaced logarithmically, allowing the outer rings to create context, while the inner rings capture detail. To ensure rotation invariance, the bullseye is rotated to be close to the direction of the stroke. Point orientations are also taken into account when examining a bullseye feature.

The computed features are compared to a codebook of previously learned features and a distance metric identifies the closest match for the bullseye. An SVM is used along with a one-vs-one strategy for classifying each feature. To convert an entire sketch this process is completed on windows of varying sizes across the sketch canvas. Using additional classifiers and overlap information, each area is classified.

This work was evaluated on two datasets and compared against an image-based classifier. When evaluated on the dataset of electrical circuits, the system achieved 89.5% accuracy. On the dataset of PowerPoint style shapes, the system achieved 94.4% accuracy.

The work presented here is a novel modification of vision methods for use on 2D sketches. This system is particularly impressive in its ability to ingest an entire sketch and classify all pieces. There are weaknesses however, for example in the electrical components case the system had to be somewhat modified to understand that unclassified pieces should be considered wires, making this approach not entirely domain independent. It would be interesting to see this technique combined with gesture recognition techniques. I would also like to test it on recognizing geons in 3D sketches.

Tuesday, April 9, 2013

A domain-independent system for sketch recognition

Bo Yu and Shijie Cai. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia (GRAPHITE '03). ACM, New York, NY, USA, 141-146.

This paper examines techniques for a domain independent sketch recognition system. The approach can be divided into two steps: imprecise stroke approximation and post-process. The first step is applied as the user sketches, while the second is applied after the entire drawing is completed. The output of the system is a hierarchy which includes raw stroke data, vertex and primitive shape data, and semantic information.

For stroke approximation, the system uses direction and curvature coupled with feature area. Vertex detection and primitive shape approximation are combined into a single incremental procedure. A recursive approach is first applied, splitting a stroke until its pieces are recognized as primitives. Lines are approximated by fitting a line to the stroke points or the curve graph. Feature area is then used to judge validity. If the ratio between feature area and the candidate line's area is less than 1.0, a line is detected otherwise an arc detector is applied. To detect a circle or ellipse the direction graph should be able to be fitted with a line of slope close to 2PI / n, where n is the total number of stroke points. For arcs, the slope of the line fit to the direction curve should be less t han 2PI/n.

The paper also addresses self intersection strokes. If given a stroke that has self intersection, two copies are made which each follow a different division strategy (one intersection point, one maximum curvature point). The recognition results are then compared and the better is chosen.

The second phase, post-process eliminates false elements, adjust the size and position of elements, and performs object recognition.

This paper contributes some novel techniques, specifically feature area, for performing domain independent recognition of shapes. However, it seems to only work for simple shape drawings. It would be interesting to see this applied to more complex examples.

Sketch recognition algorithms for comparing complex and unpredictable shapes

Martin Field, Stephanie Valentine, Julie Linsey, and Tracy Hammond. 2011. Sketch recognition algorithms for comparing complex and unpredictable shapes. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three (IJCAI'11), Toby Walsh (Ed.), Vol. Volume Three. AAAI Press 2436-2441.

Mechanix is a free sketch recognition system focused on allowing students to solve free body diagram and static truss problems through sketching. The primary purpose is to provide quick feedback to students, while easing the burden of grading a high number of paper submissions for professors and TAs.

Mechanix uses an online process, in which after each stroke is drawn an attempt is made to combine it with others into a more complex shape. Mechanix employs Paleo sketch for low level recognition of strokes. Body identification is a primary task for Mechanix, which is solved by first identifying closed shapes and then entering a comparison phase. The comparison phase is unique, because only one template exists for comparison. An average of the Hausdorff distance, modified Hausdorff distance, and tanimoto coefficient is used with an acceptance coefficient of .65.

Mechanix also faces the issue of identifying trusses as drawn by students. Identifying shared edges is a key step in solving this recognition problem. Mechanix forms a graph with each line segment of the truss being a node and the edges being intersections. Using an elimination of connections technique, a shared edge can then be identified. For the final comparison both the sketch and adjacency graph are considered.

Mechanix represents an application of sketch recognition techniques to a specific domain. It is notable that new techniques were developed and tuned to handle the specific cases of the domain. The paper does not detail the feasibility or conflicts that might arise from attempting to include these techniques in a general recognizer. Further work on a generalized or extended domains approach would be interesting.

Wednesday, March 20, 2013

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Conﬁdence Values for Sketch Recognition

What!?! No Rubine Features?: Using Geometric-based Features to
Produce Normalized Conﬁdence Values for Sketch Recognition

This paper explores the merging of geometric recognition techniques and gesture-based recognition techniques into a single recognizer. The recognizer produced is capable of allowing natural sketches to be classified, while offering normalized confidence values for alternative interpretations. The major surprising result is that geometric features are more helpful for recognition than gesture-based features, when given naturally sketched data.

The paper utilizes a statistical classifier, a quadratic classifier, for examining features from both geometric and gesture space. Initially 44 features were used (31 geometric, and 13 Rubine). Testing was conducted using a 50% split of the data, split by user, so that the system was not trained on data from the participant whose sketches it would be attempting to classify. Feature Subset Selection was employed to order the entry of features. Using only the features which were present at least 50% of the time, the system is able to achieve results that are not statistically significantly different from PaleoSketch.

There are several interesting aspects to this paper. First, it is noted that with only the top six features, 93% accuracy is still achievable. Additionally, only one gesture-based feature, total rotation, was chosen as one of the top features. This is interesting because the testing setup was more typical of a real world use system, where the system is not trained with the user, but is independent between users. This provides further evidence that geometric based properties are more useful for user independent designs.

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Brandon Paulson and Tracy Hammond. 2008. PaleoSketch: accurate primitive sketch recognition and beautification. In Proceedings of the 13th international conference on Intelligent user interfaces (IUI '08). ACM, New York, NY, USA, 1-10. DOI=10.1145/1378773.1378775 http://doi.acm.org/10.1145/1378773.1378775

This paper presents PaleoSketch, a system which is capable of classifying eight primitive shapes as well as combinations of the primitives. The system has recognition rates around 98% and attributes its success to using geometric properties, including two new features and a new ranking algorithm for distinguishing polylines from curved segments.

The recognizer as implemented in the paper uses a three phased approach for recognition. First, duplicate points are removed from the stroke, and a series of measures are computed. Among these measures are, Normalized Distance Between Direction Extremes (NDDE) and Direction to Change Ration (DCR). The former, is a measure of the stroke length between the highest and lowest points on a direction plot, yielding higher readings for more gradual arcs, and lower values for more abrupt changes. The latter measurement, is the maximum change in direction divided by the average change in direction. Polylines typically have higher DCR values than curved strokes. An amount of overtrace is computed, and finally the figure is tested for being either open or closed. Next, the stroke data is fed into various recognizers, which each return a boolean flag. Finally, the results are sent to a hierarchy function which sorts for the best fit.

Robust testing was performed using a large dataset with Paleo sketch and indicated an 98.56% accuracy rate.

The most interesting aspect of Paleo Sketch is its ability to return alternative interpretations. This is a valuable resource to have, when higher level components may be able to reason given this information.

Visual Similarity of Pen Gestures

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458

This paper aims to determine which gestures users perceive as similar and develop a computational model for predicting perceived similarity of gestures. The primary contribution appears to be for gesture designers, however the paper contributes several new features for gesture recognition, as well as a model for gesture similarity.

Two studies were performed to hep gather data on perception of similarity between gestures. First, 21 participants were asked to view 364 triads of sketches and identify the sketch that was least like the others for each triad. To determine the similarity, MDS was run on the collected data. A regression analysis was then run to determine which of the geometric features correlated with the similarity. The features examined consisted of the Rubine features, and several additional features (mostly logarithmic interpretations) inspired by work done by Attneave, a psychology researcher.

A second trial was conducted with 20 participants and was designed to examine specific aspects of the features identified in the first study. The data from study 1 was used to predict the results of study 2. The derived model predicted accurately about 70% of the time between study 1 and 2. Further, the correlation between prediction of trial 1 and the data from trial two was about 56%.

This paper illustrates that there are several significant factors which affect the perception of similarity between pen gestures. Unsurprisingly, features which take into consideration the logarithmic nature of some perceptual processes provide great benefit to gesture recognition.