Tuesday, April 23, 2013

A visual approach to sketched symbol recognition

Tom Y. Ouyang and Randall Davis. 2009. A visual approach to sketched symbol recognition. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI'09), Hiroaki Kitano (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1463-1468.


This paper proposes a new visual method for sketch recognition. The approach taken represents symbols as feature images, proposes a set of visual features for detecting orientation and endpoints, and introduces an efficient classification technique which is robust to deformations and rotation.

First, strokes are resampled at a constant spatial frequency. Next, each symbol is normalized by translating its center of mass to the origin and scaling it horizontally and vertically so it has unit standard deviation on each axis. Then, five features are computed for each sample point (four for orientation, one for endpoints). These five features are each rendered onto 24 x 24 feature grids. The grids span 2.5 standard deviations of the original symbol's space in each direction. The intensity of a pixel is determined by the maximum feature value of all sample points that fall within that cell. Next, to increase tolerance to shifts and distortions, a Gaussian blur is applied. Each image is then downsampled to 6 x 6 using a MAX filter.
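The rendering pipeline above can be sketched roughly as follows. The grid size, the 2.5-standard-deviation span, and the MAX-filter downsampling follow the paper's description; the specific orientation-feature formula, the blur sigma, and the omission of the endpoint image are my own simplifications, not the authors' exact choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def feature_images(points, grid=24, out=6, sigma=1.0):
    """Render a resampled stroke into orientation feature images,
    blur them, and MAX-downsample, loosely following Ouyang & Davis.
    `points` is an (n, 2) array of sample coordinates."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)                 # center of mass -> origin
    std = pts.std(axis=0)
    std[std == 0] = 1.0
    pts = pts / std                              # unit std dev on each axis

    # Tangent angle at each sample point, compared to 4 reference
    # orientations (0, 45, 90, 135 degrees) to get 4 feature values.
    d = np.gradient(pts, axis=0)
    theta = np.arctan2(d[:, 1], d[:, 0])
    ref = np.deg2rad([0, 45, 90, 135])
    feats = np.maximum(0.0, np.cos(2 * (theta[:, None] - ref[None, :])))

    # Map normalized coords (spanning 2.5 std devs each way) to grid cells;
    # each cell keeps the MAX feature value of the points falling in it.
    cells = np.clip(((pts + 2.5) / 5.0 * grid).astype(int), 0, grid - 1)
    imgs = np.zeros((4, grid, grid))
    for (cx, cy), f in zip(cells, feats):
        imgs[:, cy, cx] = np.maximum(imgs[:, cy, cx], f)

    imgs = gaussian_filter(imgs, sigma=(0, sigma, sigma))  # shift tolerance
    k = grid // out
    # MAX-filter downsample: 24x24 -> 6x6, max over each 4x4 block.
    return imgs.reshape(4, out, k, out, k).max(axis=(2, 4))
```

Even at 6 x 6, these images retain enough orientation structure for the template matching described next.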

For recognition, the computed feature images are distance matched (with an image deformation model) to existing templates.

It is necessary to optimize some of these processes so that recognition time remains low. First, a "coarse" metric is established and used before attempting exact matches; this eliminates many of the candidate matches. To do this, images are indexed using their first 128 principal components. Additionally, a branch and bound technique is used to generate a clustering tree. For rotational invariance, 32 orientations of the original feature image are considered.
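The coarse-filtering idea can be illustrated with a small sketch: project flattened template images onto their top principal components and prune candidates by distance in that low-dimensional space before any expensive deformable matching. The function names and the `top` cutoff here are hypothetical; the paper's actual index uses the first 128 components together with branch-and-bound clustering, which this sketch omits.

```python
import numpy as np

def pca_index(templates, k=128):
    """Build a low-dimensional index over flattened template feature
    images (one per row) using the top-k principal components."""
    mean = templates.mean(axis=0)
    _, _, vt = np.linalg.svd(templates - mean, full_matrices=False)
    basis = vt[:k]                                # top-k principal axes
    return mean, basis, (templates - mean) @ basis.T

def coarse_candidates(query, index, top=10):
    """Return indices of the `top` templates closest to `query` in PCA
    space; only these proceed to the exact deformable match."""
    mean, basis, proj = index
    q = (query - mean) @ basis.T
    dists = np.linalg.norm(proj - q, axis=1)
    return np.argsort(dists)[:top]
```

Because distances in PCA space lower-bound nothing exactly, this is a heuristic filter; the paper's branch-and-bound search is what keeps the pruning safe.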

Testing on the Pendigits dataset revealed 99.2% accuracy, while the HHReco dataset yielded 98.2% accuracy and the Circuits dataset showed 96.2% accuracy. The system was capable of classifying about 100 symbols per second on a 2.4 GHz processor.

This paper presents a surprising result in that even after significant down-sampling, the feature images are rich enough to provide great accuracy in classification tasks. The paper also illustrates that optimizations can greatly improve the performance of a sketch recognition system, and carefully balances the need for features with the need for processing power. This is overall an elegant solution.

Envisioning sketch recognition: a local feature based approach to recognizing informal sketches

Oltmans, M. (2007). Envisioning sketch recognition: a local feature based approach to recognizing informal sketches (Doctoral dissertation, Massachusetts Institute of Technology).


This dissertation proposes a visual method for recognizing hand-drawn sketches. The problem is approached as the recognition of individual "parts" of a sketch. To recognize a sketch, the strokes are first resampled, then a bullseye feature is placed every five pixels along each stroke. The bullseye feature is a circular grid which acts as a histogram, counting the amount of ink in each bin. The rings are spaced logarithmically, allowing the outer rings to capture context while the inner rings capture detail. To ensure rotation invariance, the bullseye is rotated to align with the direction of the stroke. Point orientations are also taken into account when examining a bullseye feature.
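A minimal sketch of a bullseye-style descriptor: a log-polar histogram of nearby ink around a center point. The ring and wedge counts and the radii below are illustrative guesses rather than the dissertation's parameters, and the stroke-direction alignment and point-orientation channels are omitted.

```python
import numpy as np

def bullseye(points, center, n_rings=4, n_wedges=8, r_min=0.1, r_max=1.0):
    """Count ink points falling in each bin of a log-polar grid around
    `center`. Ring radii are log-spaced, so inner rings capture fine
    detail while outer rings capture coarse context."""
    pts = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(pts[:, 0], pts[:, 1])
    theta = np.mod(np.arctan2(pts[:, 1], pts[:, 0]), 2 * np.pi)

    ring_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_rings + 1)
    hist = np.zeros((n_rings, n_wedges))
    for ri, ti in zip(r, theta):
        if ri < ring_edges[0] or ri >= ring_edges[-1]:
            continue                     # ignore ink outside the bullseye
        ring = np.searchsorted(ring_edges, ri, side='right') - 1
        wedge = int(ti / (2 * np.pi) * n_wedges) % n_wedges
        hist[ring, wedge] += 1
    return hist
```

In the dissertation these histograms are then matched against a learned codebook; here the descriptor alone is shown.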

The computed features are compared to a codebook of previously learned features, and a distance metric identifies the closest match for the bullseye. An SVM is used along with a one-vs-one strategy for classifying each feature. To classify an entire sketch, this process is repeated on windows of varying sizes across the sketch canvas. Using additional classifiers and overlap information, each area is classified.

This work was evaluated on two datasets and compared against an image-based classifier. When evaluated on the dataset of electrical circuits, the system achieved 89.5% accuracy. On the dataset of PowerPoint style shapes, the system achieved 94.4% accuracy.

The work presented here is a novel modification of vision methods for use on 2D sketches. This system is particularly impressive in its ability to ingest an entire sketch and classify all pieces. There are weaknesses however, for example in the electrical components case the system had to be somewhat modified to understand that unclassified pieces should be considered wires, making this approach not entirely domain independent. It would be interesting to see this technique combined with gesture recognition techniques. I would also like to test it on recognizing geons in 3D sketches.

Tuesday, April 9, 2013

A domain-independent system for sketch recognition

Bo Yu and Shijie Cai. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia (GRAPHITE '03). ACM, New York, NY, USA, 141-146.


This paper examines techniques for a domain independent sketch recognition system. The approach can be divided into two steps: imprecise stroke approximation and post-processing. The first step is applied as the user sketches, while the second is applied after the entire drawing is completed. The output of the system is a hierarchy which includes raw stroke data, vertex and primitive shape data, and semantic information.

For stroke approximation, the system uses direction and curvature coupled with feature area. Vertex detection and primitive shape approximation are combined into a single incremental procedure. A recursive approach is first applied, splitting a stroke until its pieces are recognized as primitives. Lines are approximated by fitting a line to the stroke points or the curve graph. Feature area is then used to judge validity. If the ratio between the feature area and the candidate line's area is less than 1.0, a line is detected; otherwise an arc detector is applied. To detect a circle or ellipse, the direction graph should be able to be fitted with a line of slope close to 2PI/n, where n is the total number of stroke points. For arcs, the slope of the line fit to the direction graph should be less than 2PI/n.
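The direction-graph test can be illustrated as follows. This is a simplified stand-in: it classifies a stroke by the slope of a line fit to its unwrapped direction values, with tolerance thresholds that are my guesses rather than the paper's, and it omits the feature-area check entirely.

```python
import numpy as np

def classify_primitive(points, tol=0.3):
    """Classify a stroke as line, circle, or arc from its direction
    graph: slope near 0 suggests a line, slope near 2*pi/n a full
    circle/ellipse, anything in between an arc."""
    pts = np.asarray(points, dtype=float)
    d = np.diff(pts, axis=0)
    # Unwrapped segment directions form the "direction graph".
    direction = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    slope = np.polyfit(np.arange(len(direction)), direction, 1)[0]
    full_turn = 2 * np.pi / len(pts)      # expected slope for a circle
    if abs(slope) < tol * full_turn:
        return "line"
    if abs(abs(slope) - full_turn) < tol * full_turn:
        return "circle"
    return "arc"
```

A full circle accumulates 2π of direction change over its n points, which is exactly why the fitted slope lands near 2π/n.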

The paper also addresses self-intersecting strokes. Given a stroke with a self-intersection, two copies are made, each following a different division strategy (one splits at the intersection point, the other at the point of maximum curvature). The recognition results are then compared and the better one is chosen.

The second phase, post-processing, eliminates false elements, adjusts the size and position of elements, and performs object recognition.

This paper contributes some novel techniques, specifically feature area, for performing domain independent recognition of shapes. However, it seems to only work for simple shape drawings. It would be interesting to see this applied to more complex examples.

Sketch recognition algorithms for comparing complex and unpredictable shapes

Martin Field, Stephanie Valentine, Julie Linsey, and Tracy Hammond. 2011. Sketch recognition algorithms for comparing complex and unpredictable shapes. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three (IJCAI'11), Toby Walsh (Ed.). AAAI Press, 2436-2441.


Mechanix is a free sketch recognition system focused on allowing students to solve free body diagram and static truss problems through sketching. The primary purpose is to provide quick feedback to students, while easing the burden of grading a high number of paper submissions for professors and TAs.

Mechanix uses an online process: after each stroke is drawn, an attempt is made to combine it with others into a more complex shape. Mechanix employs PaleoSketch for low-level recognition of strokes. Body identification is a primary task for Mechanix, which is solved by first identifying closed shapes and then entering a comparison phase. The comparison phase is unique because only one template exists for comparison. An average of the Hausdorff distance, modified Hausdorff distance, and Tanimoto coefficient is used with an acceptance threshold of 0.65.

Mechanix also faces the issue of identifying trusses as drawn by students. Identifying shared edges is a key step in solving this recognition problem. Mechanix forms a graph with each line segment of the truss being a node and the edges being intersections. Using an elimination of connections technique, a shared edge can then be identified. For the final comparison both the sketch and adjacency graph are considered.

Mechanix represents an application of sketch recognition techniques to a specific domain. It is notable that new techniques were developed and tuned to handle the specific cases of the domain. The paper does not detail the feasibility of, or conflicts that might arise from, attempting to include these techniques in a general recognizer. Further work toward generalizing these techniques or extending them to other domains would be interesting.

Wednesday, March 20, 2013

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition



This paper explores the merging of geometric recognition techniques and gesture-based recognition techniques into a single recognizer. The recognizer produced is capable of allowing natural sketches to be classified, while offering normalized confidence values for alternative interpretations. The major surprising result is that geometric features are more helpful for recognition than gesture-based features, when given naturally sketched data.

The paper utilizes a statistical classifier, a quadratic classifier, for examining features from both geometric and gesture space. Initially 44 features were used (31 geometric, and 13 Rubine). Testing was conducted using a 50% split of the data, split by user, so that the system was not trained on data from the participant whose sketches it would be attempting to classify. Feature Subset Selection was employed to order the entry of features. Using only the features which were present at least 50% of the time, the system is able to achieve results that are not statistically significantly different from PaleoSketch.

There are several interesting aspects to this paper. First, it is noted that with only the top six features, 93% accuracy is still achievable. Additionally, only one gesture-based feature, total rotation, was chosen as one of the top features. This is interesting because the testing setup was more typical of real-world use, where the system is not trained on data from the individual user. This provides further evidence that geometric properties are more useful for user-independent designs.


PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Brandon Paulson and Tracy Hammond. 2008. PaleoSketch: accurate primitive sketch recognition and beautification. In Proceedings of the 13th international conference on Intelligent user interfaces (IUI '08). ACM, New York, NY, USA, 1-10. DOI=10.1145/1378773.1378775 http://doi.acm.org/10.1145/1378773.1378775


This paper presents PaleoSketch, a system which is capable of classifying eight primitive shapes as well as combinations of the primitives. The system has recognition rates around 98% and attributes its success to using geometric properties, including two new features and a new ranking algorithm for distinguishing polylines from curved segments.

The recognizer as implemented in the paper uses a three-phase approach for recognition. First, duplicate points are removed from the stroke, and a series of measures are computed. Among these measures are Normalized Distance Between Direction Extremes (NDDE) and Direction Change Ratio (DCR). The former is a measure of the stroke length between the highest and lowest points on a direction plot, yielding higher values for gradual arcs and lower values for abrupt changes. The latter is the maximum change in direction divided by the average change in direction; polylines typically have higher DCR values than curved strokes. The amount of overtracing is computed, and finally the figure is tested for being open or closed. Next, the stroke data is fed into various recognizers, which each return a boolean flag. Finally, the results are sent to a hierarchy function which sorts for the best fit.
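The two measures can be sketched as follows, based on my reading of the descriptions above; PaleoSketch's exact definitions (for example, any smoothing applied to the direction plot) may differ.

```python
import numpy as np

def ndde_dcr(points):
    """Compute rough NDDE and DCR values for a stroke.
    NDDE: fraction of stroke length lying between the direction
    extremes (high for gradual arcs, low for abrupt turns).
    DCR: max direction change / mean direction change
    (high for polylines with sharp corners, near 1 for arcs)."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    direction = np.unwrap(np.arctan2(seg[:, 1], seg[:, 0]))
    seg_len = np.hypot(seg[:, 0], seg[:, 1])
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length

    lo, hi = np.argmin(direction), np.argmax(direction)
    i, j = sorted((int(lo), int(hi)))
    ndde = (cum[j] - cum[i]) / cum[-1] if cum[-1] else 0.0

    change = np.abs(np.diff(direction))
    dcr = change.max() / change.mean() if change.mean() else 0.0
    return ndde, dcr
```

A smooth arc spreads its direction change evenly (DCR near 1) with its direction extremes at its endpoints (NDDE near 1), while a polyline concentrates all change at its corners.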

Robust testing was performed using a large dataset with PaleoSketch and indicated a 98.56% accuracy rate.

The most interesting aspect of PaleoSketch is its ability to return alternative interpretations. This is a valuable resource when higher-level components may be able to reason using this information.

Visual Similarity of Pen Gestures

A. Chris Long, Jr., James A. Landay, Lawrence A. Rowe, and Joseph Michiels. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, 360-367. DOI=10.1145/332040.332458 http://doi.acm.org/10.1145/332040.332458

This paper aims to determine which gestures users perceive as similar and develop a computational model for predicting perceived similarity of gestures. The primary contribution appears to be for gesture designers, however the paper contributes several new features for gesture recognition, as well as a model for gesture similarity.

Two studies were performed to help gather data on the perception of similarity between gestures. First, 21 participants were asked to view 364 triads of sketches and, for each triad, identify the sketch least like the others. To determine similarity, MDS was run on the collected data. A regression analysis was then run to determine which geometric features correlated with the similarity. The features examined consisted of the Rubine features and several additional features (mostly logarithmic variants) inspired by the work of Attneave, a psychology researcher.

A second trial was conducted with 20 participants and was designed to examine specific aspects of the features identified in the first study. The data from study 1 was used to predict the results of study 2. The derived model's predictions were accurate about 70% of the time, and the correlation between the predictions from trial 1 and the data from trial 2 was about 56%.

This paper illustrates that there are several significant factors which affect the perception of similarity between pen gestures. Unsurprisingly, features which take into consideration the logarithmic nature of some perceptual processes provide great benefit to gesture recognition.

Tuesday, March 5, 2013

Sketch Based Interfaces: Early Processing for Sketch Understanding



Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. 2006. Sketch based interfaces: early processing for sketch understanding. In ACM SIGGRAPH 2006 Courses (SIGGRAPH '06). ACM, New York, NY, USA, Article 22. DOI=10.1145/1185657.1185783 http://doi.acm.org/10.1145/1185657.1185783


This paper describes a sketch interface which uses multiple sources of knowledge to provide processing of freehand sketches. The approach consists of three phases: approximation, beautification, and basic recognition. The paper spends most of its time describing approximation, which uses data from both stroke direction and stroke speed. An interesting aspect of the work is the hybrid fit technique, which generates a set of potential fits and selects the best one. A practical approach to the discretization of Bezier curves is also provided for handling curves. Beautification is briefly discussed as adjusting the slopes of lines using a clustering method. Basic recognition is carried out using a set of hand-tailored templates. An evaluation was conducted in which 13 of 14 students liked this system better than a non-sketch alternative tool. The system displayed 96% accuracy.


This work has definite application domains, but the state of the system described in this paper seems unprepared for general usage. Specifically, the hand-tailored recognition templates seem a bit terrifying, but perhaps in a restricted domain that isn't an issue. The user study conducted for this system was very informal and probably shouldn't have been reported in this paper. My final complaint is about beautification. I don't like it in some cases, and I think depending on the software, the interface must support a more interactive experience using the pen, rather than just converting what I sketch into primitives.

Wednesday, February 27, 2013

Protractor: a fast and accurate gesture recognizer

Yang Li. 2010. Protractor: a fast and accurate gesture recognizer. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 2169-2172. DOI=10.1145/1753326.1753654 http://doi.acm.org/10.1145/1753326.1753654


This paper discusses Protractor, a template-based gesture recognizer focused on low memory requirements and fast classification. Protractor uses a nearest neighbor approach, learning from user-provided training data. Each gesture can be specified as orientation sensitive or orientation invariant. The preprocessing of the gesture is similar to the $1 recognizer, with 16 points used in total. Classification is accomplished using the optimal angular distance (the inverse cosine of the similarity between the template and sample vectors). Protractor uses a closed-form solution to find the rotation of the gesture that minimizes this distance. Protractor was compared to the $1 recognizer and was found to perform similarly, but with slightly faster response times.
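Protractor's closed-form angular distance can be sketched as follows, assuming both gestures have already been resampled to 16 points and vectorized as (x1, y1, ..., x16, y16); the variable names are mine, not the paper's.

```python
import numpy as np

def protractor_distance(t, g):
    """Optimal angular distance between two preprocessed gesture
    vectors: analytically find the rotation of g that maximizes its
    cosine similarity to t, then return the residual angle."""
    tx, ty = t[0::2], t[1::2]
    gx, gy = g[0::2], g[1::2]
    a = float(np.dot(tx, gx) + np.dot(ty, gy))
    b = float(np.dot(tx, gy) - np.dot(ty, gx))
    theta = np.arctan2(b, a)               # closed-form optimal rotation
    sim = (a * np.cos(theta) + b * np.sin(theta)) / (
        np.linalg.norm(t) * np.linalg.norm(g))
    return float(np.arccos(np.clip(sim, -1.0, 1.0)))
```

Because the best rotation falls out of one arctangent instead of an iterative search, each template comparison is a handful of dot products, which is what makes Protractor so cheap on mobile hardware.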

Protractor is a promising contribution for mobile devices constrained for processing power and memory. It offers a simple algorithm for user defined gestures.

Wednesday, February 6, 2013

"Those look similar!" issues in automating gesture design advice


A. Chris Long, James A. Landay, and Lawrence A. Rowe. 2001. "Those look similar!" issues in automating gesture design advice. In Proceedings of the 2001 workshop on Perceptive user interfaces (PUI '01). ACM, New York, NY, USA, 1-5. DOI=10.1145/971478.971510 http://doi.acm.org/10.1145/971478.971510

This paper discusses developing a system for application designers to use for judging the similarity of gestures. The paper claims that both human perceptual similarity and computer recognition similarity are considered in advising the designer. The human perceptual data is based on a very limited sampling of which gestures people believed to be similar or different. The machine recognition similarity is based on how well the gesture is distinguished using the Rubine algorithm.

Overall this paper's results are questionable at best. The paper makes several contradictory claims and uses a limited sample size as its basis. Furthermore, the authors follow their "intuition" rather than designing rigorous studies, and admit that their system is incorrect in several cases. Much of the paper is spent describing basic HCI which is not related to gestures. The paper offers very few technical descriptions. In summary, this paper is of limited use for modern gesture design.

Tuesday, January 29, 2013

Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes

Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th annual ACM symposium on User interface software and technology (UIST '07). ACM, New York, NY, USA, 159-168. DOI=10.1145/1294211.1294238 http://doi.acm.org/10.1145/1294211.1294238

This 2007 paper offers instructions for a simple gesture recognizer that will recognize gestures using a template matching technique. The major goal of the work is to provide prototypers with an easy but somewhat robust way to implement gesture recognition in their applications.

The algorithm is based on resampling, rotating, and translating the stroke data, followed by finding the nearest-neighbor example. The method supports adding new examples to the template database to allow for different stroke styles. Online data capture is not necessary, as temporal information is not analyzed. A Golden Section Search is used to find the appropriate rotation.
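The resampling step and the path distance that $1 minimizes over candidate rotations can be sketched as below (n = 64 is the paper's suggested point count; the Golden Section Search over rotations is omitted here).

```python
import numpy as np

def resample(points, n=64):
    """Resample a stroke to n points spaced evenly along its arc
    length, the first preprocessing step of the $1 recognizer."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative length
    targets = np.linspace(0.0, cum[-1], n)
    out = np.empty((n, 2))
    out[:, 0] = np.interp(targets, cum, pts[:, 0])
    out[:, 1] = np.interp(targets, cum, pts[:, 1])
    return out

def path_distance(a, b):
    """Average pointwise distance between two equal-length paths —
    the score $1 minimizes over candidate rotations."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))
```

After resampling, rotation, scaling, and translation, the candidate with the smallest path distance to the input wins.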

The authors conducted tests which show the $1 recognizer to be as accurate as the Rubine algorithm and Dynamic Time Warping. However, the algorithm cannot distinguish circles from ovals or squares from rectangles. Horizontal and vertical lines can also be troublesome, but this can be mitigated with a modification to the scaling method.

Specifying Gestures by Example

Dean Rubine. 1991. Specifying gestures by example. In Proceedings of the 18th annual conference on Computer graphics and interactive techniques (SIGGRAPH '91). ACM, New York, NY, USA, 329-337. DOI=10.1145/122718.122753 http://doi.acm.org/10.1145/122718.122753

This 1991 paper offers a simple feature-based method for detecting single-stroke gestures. The paper provides 13 features, two of which are optional (based on temporal information), and a simple linear classifier. The features are derived from single-stroke input (with temporal information) using simple geometric reasoning techniques. The features were derived from empirical observation of what worked well. The GRANDMA toolkit (an implementation) allows users to teach new gestures to the classifier.
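The classifier structure can be illustrated as below: each class gets a weight vector and a bias, and the class with the highest score for a feature vector wins. Note the training here is a simple least-squares stand-in for illustration, not Rubine's covariance-matrix estimator.

```python
import numpy as np

class LinearGestureClassifier:
    """Minimal linear classifier in the spirit of Rubine (1991):
    assign feature vector f to argmax_c (b_c + w_c . f)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes = sorted(set(y))
        A = np.hstack([np.ones((len(X), 1)), X])     # prepend bias column
        # One-hot targets, fit all class weight vectors at once.
        Y = np.array([[1.0 if yi == c else 0.0 for c in self.classes]
                      for yi in y])
        self.W, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        A = np.hstack([np.ones((len(X), 1)), X])
        scores = A @ self.W                          # one score per class
        return [self.classes[i] for i in scores.argmax(axis=1)]
```

In Rubine's system, X would hold the 13 stroke features; rejection of ambiguous gestures (scores too close together) is a further refinement the paper discusses.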

The paper admits that more work must be done to distinguish certain gestures, for which the 13 features are not enough. Additionally, the methods described may be extended to support eager classification or multi-finger recognition.

Overall, a very interesting paper for its time. It defines a simple, easy-to-use method for basic gesture recognition that makes use of temporal information.

Tuesday, January 22, 2013

Sketchpad - Ivan E. Sutherland

Paper Bibliography
Sutherland, I. E. (1964, January). Sketchpad: a man-machine graphical communication system. In Proceedings of the SHARE design automation workshop (pp. 6-329). ACM.

Ivan Sutherland's Sketchpad system created not only a new modality for interaction with a computer, but also contributed greatly to the development of object oriented programming, CAD software, and the graphical user interface in general.


As current-day computer scientists read this paper, they may be confused, at least until they see the date it was originally published: 1963. The paper spends ample time discussing not the novel interface of a pen, but rather the novelty of the algorithms and design which underpinned the content created through the pen. The early roots of object oriented programming can be seen as Sutherland describes the creation of instances from a master. Furthermore, the ideas of polymorphism and inheritance are touched upon: though instances are originally copies of their masters, they can also be modified to differ from their original masters.


Aside from pioneering major aspects of object oriented programming, Sutherland's program reflects many of the core components of modern day CAD software. The ability to apply constraints and relationships to and between drawn shapes was pivotal in giving computer software a role in drafting and design problems. With Sutherland's creation, the computer suddenly became a more competent and faster solution than traditional drafting.


Finally, the concept of screen graphics and pen input was extraordinarily ahead of its time. In an age where graphical displays were novel, Sutherland's work materialized concepts that were only seen as a distant vision in the course of technology. The undeniable impact of this work is evident by simply surveying the changes and innovations which have occurred since its publication. While there have been vast increases in speed and accuracy, along with expansions of this work, it is fair to say that much of the original work found in this paper remains the basis for modern systems.



Wednesday, January 16, 2013

Introduction




Email me at zmhenkel [at] tamu [dot] edu

Howdy! My name is Zack Henkel. This is my third semester as a PhD student at TAMU in Computer Engineering. I've answered all of the questions for this entry below, but if you want to know more about my research interests or me, you can visit my personal website at https://zmhenkel.com .

Why are you taking this class?
I'm taking this class because I would like to know more about sketch recognition and how it is accomplished. I'm not familiar with any of the techniques, algorithms, or limitations of sketch recognition, so I'm looking forward to learning them. I also think that sketch recognition is applicable to my area of study, human-robot interaction.

What experience do you bring to this class?
My undergrad is in Computer Science, so I bring with me all the fundamentals from that. More interestingly though, I enjoy working on the hardware side of things and have experience from doing robotics research for a few years. I also have a strong background in psychology and am experienced at running large-scale human participant studies. I also really enjoy working with video and anything to do with visual effects. I spent a few years self-employed making local commercials and other video projects. Finally, I enjoy web-based technologies and have a lot of experience building web-based solutions for a wide variety of applications.

What are your professional life goals?
My main goal professionally is to always work on technologies which have a social impact. More concretely, I've had fun developing a couple of small businesses and would like to spend more time doing things like that. I'm also extremely interested in teaching and leading research at the university level.

What are your personal life goals?
My personal life goals are vague. I want to experience as many things as possible and try to understand things from as many different perspectives as possible. I guess my main personal life goal is to contribute to the world in a positive way.

What do you want to do after you graduate?
Postdoc? A Faculty Position? Start a business? I don't know!

What do you expect to be doing in 10 years?
Something amazing. I can't be any more specific than that.

What do you think will be the next biggest technological advancement in computer science?
It seems like it will be in the area of human-computer interaction. Modalities like gesture and speech are becoming popular at the consumer level. I think eye-gaze tracking will become an important part of regular interactions soon. It will be the fusion of these different modalities into something that works and is enjoyable to use.

If you could travel back in time, who would you like to meet and why?
Jesus, the central figure of Christianity. I grew up studying Christianity and a lot of people I know make their decisions based on things that Jesus said, so I would have a ton of questions for him.

Describe your favorite shoes and why they are your favorite?
They are tannish colored with dark blue stripes. They are comfortable and functional.

If you could be fluent in any foreign language that you're not already fluent in, which one would it be and why?
German or Russian, because they generally sound kind of intense and angry when saying everyday phrases.

Give some interesting fact/story about yourself:
I really like live productions of any kind: theatre, concerts, etc. Specifically I like all the technical behind the scenes stuff. I used to do lighting design for local theatre and bands, and I still enjoy even the smallest technical roles.