<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss'><id>tag:blogger.com,1999:blog-7226341358299927342</id><updated>2009-09-08T23:11:40.700-07:00</updated><title type='text'>Kevin's Blog for Gesture Recognition</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default?start-index=26&amp;max-results=25'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>40</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8401154899672430155</id><published>2008-05-11T11:26:00.001-07:00</published><updated>2008-05-11T11:26:50.904-07:00</updated><title type='text'>TIKL: Development of a Wearable Vibrotactile Feedback Suit for Improved Human Motor Learning</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;The goal of this paper is to create a vibrotactile feedback suit that helps people learn motions. Based on this idea, the authors build a system called TIKL (Tactile Interaction for Kinesthetic Learning). 
With this suit, the learner can learn through multiple channels at once, because the suit gives tactile feedback on every joint simultaneously. In contrast, a traditional teacher can only correct one joint at a time. Such a system could be used in sports training, motor rehabilitation, dance, postural retraining for health, etc. The feedback system consists of four main modules: users, a Vicon motion capture system, control software and motor-system feedback.&lt;br /&gt;&lt;br /&gt;They tested it with a simple motion: holding a fixed position with the right arm. 40 people were tested on this motion, of whom only 20 were provided with the additional vibration feedback. Their results using a 5-DOF robotic suit show a 27% improvement in accuracy while performing the target motion, and an accelerated learning rate of up to 23%.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I like the idea of this paper. It would be particularly useful for training a group of people in the same motions. However, I have a few concerns about this paper. First, the experiment included in the paper was very simple, and it is hard to imagine how the system would improve learning of a more complex dynamic gesture. Second, different people have different skeleton sizes, so it is unclear how the teacher's reference motion would be retargeted to fit each learner's skeleton. 
This retargeting is needed because, for different users, the same joint values won't guarantee that the motions look the same or produce the same effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8401154899672430155?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8401154899672430155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8401154899672430155' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8401154899672430155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8401154899672430155'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/tikl-development-of-wearable.html' title='TIKL: Development of a Wearable Vibrotactile Feedback Suit for Improved Human Motor Learning'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-3915632653829194987</id><published>2008-05-11T10:41:00.000-07:00</published><updated>2008-05-11T10:42:00.551-07:00</updated><title type='text'>FreeDrawer – A Free-Form Sketching System on the Responsive Workbench</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents a sketching system for spline-based free-form surfaces based on a 'Responsive Workbench'. 
They propose 3D tools for curve drawing and deformation techniques for curves and surfaces. The user draws curves directly in the virtual environment, using a tracked stylus as the input device.&lt;br /&gt;&lt;br /&gt;They claim their interface has the following advantages: closed-form parametric representations, easy transfer into standard CAD packages, fast triangulation and evaluation algorithms, infinitesimal smoothness of curves and surfaces and efficient deformation algorithms based on variational modeling.&lt;br /&gt;&lt;br /&gt;A user can draw freely in the virtual 3D space to create a curve network, and can still change it after creation. Surfaces can then be filled in based on the created curves. They also offer a variety of modification tools: curve smoothing and sharpening, curve dragging and surface sculpting. The paper also demonstrates how a user can create a seat by going through the above steps.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I like this paper because the use of splines is pretty clever: it avoids undesired curve shapes when drawing in 3D. But the biggest problem, I think, is how a naive user would know which curves he or she should draw. Take a look at the teapot result: it would probably be hard for me to draw this. 
In addition, it seems this system can only generate relatively simple 3D objects.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-3915632653829194987?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/3915632653829194987/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=3915632653829194987' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3915632653829194987'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3915632653829194987'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/freedrawer-free-form-sketching-system.html' title='FreeDrawer – A Free-Form Sketching System on the Responsive Workbench'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-3534777792056911482</id><published>2008-05-11T10:08:00.001-07:00</published><updated>2008-05-11T10:08:53.079-07:00</updated><title type='text'>American Sign Language Finger Spelling Recognition System</title><content type='html'>&lt;p&gt;&lt;br /&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper presents a system that recognizes letters in American Sign Language and shows them on the screen along with the sound of the recognized letter. They train a neural network on the postures. 
Each posture is an 18-dimensional vector collected with a CyberGlove. &lt;/p&gt;&lt;p&gt;&lt;br /&gt;The neural network they use is a perceptron network, because they claim that it gives the best recognition results. The perceptron network was trained using an 18x24 input matrix and a 24x24 target matrix, which is an identity matrix. Because the letters 'J' and 'Z' are not static postures, they simply omitted them from the training set.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Their testing results showed that the perceptron network achieved a recognition accuracy of 90% in user-dependent testing. They didn't train or test the perceptron for the user-independent case.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This is only a two-page paper, so there is not a lot of information we can gain from it. Everything they showed here is pretty simple. The obvious drawback is that they omitted two letters ('J' and 'Z'), which means the recognition system does not work for the whole set of ASL letters. 
Another issue is that they only trained and tested postures for the user-dependent case, so it is hard to say how high the accuracy would be for general users.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-3534777792056911482?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/3534777792056911482/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=3534777792056911482' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3534777792056911482'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3534777792056911482'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/american-sign-language-finger-spelling.html' title='American Sign Language Finger Spelling Recognition System'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8776203685664831632</id><published>2008-05-11T09:54:00.003-07:00</published><updated>2008-05-11T09:54:57.380-07:00</updated><title type='text'>Invariant features for 3-D gesture recognition</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper compares the recognition results of 10 different feature vectors for gesture recognition. 
To investigate this, they compared recognition performance on a set of 18 T’ai Chi gestures. They construct a training set of 54 six-gesture “sentences” and test on another set of 54 sentences. A “sentence” is a set of six gestures performed in sequence, which captures the co-articulation exhibited by T’ai Chi.&lt;br /&gt;&lt;br /&gt;The recognition method they use is an HMM. It has 5 states with forward chaining, where a jump is possible to the same state or to each of the next two states. Every kind of feature is trained with this HMM. The feature vectors include the raw position, the Cartesian velocity, the polar velocity with an angular velocity term, the polar velocity with a tangential velocity term, and two sets with instantaneous speed and local curvature.&lt;br /&gt;&lt;br /&gt;They separate the testing data into three groups: 'original', 'shifted', and 'rotated'. The results showed that the feature vector (dr,dtheta,dz) had the best overall recognition rate (95%), while the raw feature vector (x,y,z) gave the lowest recognition rate (34%). In addition, all feature sets perform worse on shifted and rotated data.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;They only reported results for one specific HMM topology, so I wonder how the results would change with other topologies. In addition, their results for the different kinds of feature vectors only cover two-hand-and-head gestures. However, my biggest concern is why they use T'ai Chi gestures as their testing gestures. 
From the images in the paper, all the gestures are performed by a man sitting in a chair, but T'ai Chi should actually be performed with the full body.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8776203685664831632?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8776203685664831632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8776203685664831632' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8776203685664831632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8776203685664831632'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/invariant-features-for-3-d-gesture.html' title='Invariant features for 3-D gesture recognition'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8706810425350644129</id><published>2008-05-11T09:54:00.001-07:00</published><updated>2008-05-11T09:54:29.175-07:00</updated><title type='text'>Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents an interface, Cyber Composer, that lets users control the tonality and melody of the music through the hand motions and gestures they make. 
Pitch, rhythm and volume of the melody can be controlled and generated in real time by wearing a pair of CyberGloves and a Polhemus Fastrak.&lt;br /&gt;&lt;br /&gt;The Cyber Composer system is composed of several modules: the music interface, the CyberGlove interface, the background music generation module, the melody generation module, and the main program, which links all the components together. Seven musical expressions are mapped to specific gestures: rhythm, pitch, pitch-shifting, dynamics, volume, dual-instrument mode, and cadence.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;The biggest problem with this paper is that the authors don't mention how they did the gesture recognition or what the results looked like. They only defined a set of gestures to control the characteristics of the music. Although the gestures defined in this paper look reasonable, it's hard to say whether the system is practical to use without any experiments. In addition, we should note that they constrained the number of notes a user can play by specifying an overall tonal base, which would make the system work only for fairly simple music.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8706810425350644129?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8706810425350644129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8706810425350644129' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8706810425350644129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8706810425350644129'/><link rel='alternate' type='text/html' 
href='http://kevinhaptics.blogspot.com/2008/05/cyber-composer-hand-gesture-driven.html' title='Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8133318108984030085</id><published>2008-05-01T23:55:00.000-07:00</published><updated>2008-05-11T09:49:01.874-07:00</updated><title type='text'>Using Ultrasonic Hand Tracking to Augment Motion Analysis Based Recognition of Manipulative Gestures</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;Instead of using data gloves or accelerometers, this paper presents a novel hand gesture capture method based on ultrasonic sensors. Because using only ultrasonic sensors raises some issues, such as reflections and occlusions, the authors actually combine the ultrasonic sensors with accelerometers and gyroscopes.&lt;br /&gt;&lt;br /&gt;The authors tried different classification techniques: model-based classification, frame-based classification and fused classification.&lt;br /&gt;&lt;br /&gt;For the model-based classification, a set of left-right HMMs is trained to recognize gestures. For frame-based classification, they tried a C4.5 decision tree classifier and a k-nearest neighbor classifier. For the fused classification, the final classification is based on a combination of the above two classifiers' rankings and the associated probabilities.&lt;br /&gt;&lt;br /&gt;To validate their method of fusing inertial and ultrasonic sensor data, they set up an experiment comprising various manipulative gestures based on a bicycle repair task. 
In this experiment, three kinds of sensors are used: (a) ultrasonic sensors for distance measurement, (b) acceleration sensors and (c) gyroscopes, the latter two to capture the motion of relevant body parts of the user. Three users were asked to perform 21 bicycle repair tasks three separate times. The results showed that frame-based classification with the accelerometer and gyroscope data produced an accuracy of 84 percent. A model-based time series classification approach gave 65 percent accuracy. Fusing the two classifiers improved the classification accuracy. Using the ultrasonic sensor data resulted in 90% accuracy.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I like this paper since it is the first paper I have read that uses ultrasonic sensors to help collect hand gesture data. I know there was a SIGGRAPH paper last year about using ultrasonic sensors to capture human body motion. I think this paper is pretty novel, since the authors say they are the first to use ultrasonic sensors to classify gestures. But I don't know if this is the first paper to use ultrasonic sensors to capture motion data. 
This paper also pointed out the problems of using ultrasonic sensors alone, which somewhat limit the applicability of ultrasonic sensors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8133318108984030085?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8133318108984030085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8133318108984030085' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8133318108984030085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8133318108984030085'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/using-ultrasonic-hand-tracking-to.html' title='Using Ultrasonic Hand Tracking to Augment Motion Analysis Based Recognition of Manipulative Gestures'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-667736297890984734</id><published>2008-05-01T22:35:00.000-07:00</published><updated>2008-05-11T09:49:16.506-07:00</updated><title type='text'>Enabling Fast and Effortless Customisation in Accelerometer Based Gesture Interaction</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;The purpose of this paper is to make the training of HMMs easier and more efficient. 
The most time-consuming part of training HMMs may be data collection. To get high accuracy, the training data also needs to be segmented, which is very expensive since there is no automatic and practical segmentation method so far. This paper tries to alleviate this problem by adding noise to the captured data to generate a larger set of training data.&lt;br /&gt;&lt;br /&gt;In this paper, accelerometers are used as the gesture capturing device. They use a vector-quantized codebook of size 8 and then perform recognition using HMMs. They experimented with two types of noise: uniformly distributed noise and Gaussian distributed noise.&lt;br /&gt;&lt;br /&gt;For the experiments, the authors tested 8 gestures used to control a DVD player. For this set of eight gestures, with each gesture trained on two original gesture samples and two Gaussian noise-distorted duplicates, the average recognition accuracy was 97%; with two original gesture samples and four noise-distorted duplicates, the average recognition accuracy was 98%. Both results were cross-validated on a total data set of 240 gestures. They also found that Gaussian distributed noise is slightly better than uniformly distributed noise.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I think adding noise to enlarge the training data set was a good idea. This work's comparison between different kinds of noise also gives us some experimental evidence for this idea.&lt;br /&gt;&lt;br /&gt;However, there are a few things that still need to be considered. 
For example: how to choose the noise parameters efficiently, how to decide how many noise-distorted samples should be added, and whether the added noise could decrease the recognition accuracy for some other gestures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-667736297890984734?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/667736297890984734/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=667736297890984734' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/667736297890984734'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/667736297890984734'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/enabling-fast-and-effortless.html' title='Enabling Fast and Effortless Customisation in Accelerometer Based Gesture Interaction'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-2983095489684215261</id><published>2008-05-01T21:54:00.000-07:00</published><updated>2008-05-11T09:49:25.110-07:00</updated><title type='text'>3D Visual Detection of Correct NGT Sign Production</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;In this paper, the authors create a system that can help people in 
learning Dutch Sign Language (NGT). The recognition system is vision-based. Two calibrated video cameras are placed on top of a table where people perform their hand gestures. The user's head and hands are tracked by following skin-colored segments of the image from frame to frame. The head is used as a stationary reference point. &lt;/p&gt;&lt;p&gt;&lt;br /&gt;The adaptive chrominance model can work with different lighting and backgrounds. Skin color is modeled by a 2D Gaussian perpendicular to the main direction of the distribution of the positive skin samples in RGB space. Tracking the hands and head is done separately in both cameras by following their respective blobs over consecutive frames or, when hand blobs cannot be separated (due to occlusion), by performing a template search over skin areas using the gray image of the hand in the previous frame. The hand and head locations are reinitialized using the positions of the three largest skin blobs in the image and tracked by finding the nearest blob or best template match in the next frame.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;For classification, fifty different properties have been derived that are related to the 2D/3D location and movement of the hands. These properties are measured in each frame, and a separate classifier is trained for each property.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;A set of 120 different NGT signs performed by 70 individuals is used to test the sign classification. They also perform cross-validation, and the overall recognition accuracy is 95%. They compare their results with linear time warping and dynamic time warping. &lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;One problem with this vision-based recognition system is that it cannot recognize hand-crossing gestures correctly, since it simply treats the left blob as the left hand and the right blob as the right hand. 
Another problem, common to vision-based tracking, is occlusion. Because they put the two video cameras very close to each other (15cm), both pointing at the hands from a similar direction, it is hard to avoid some parts being occluded in the video.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Another thing is that I really don't think training a classifier on each feature separately is a good idea, since different features are related rather than independent. In addition, 50 features may be too many; why didn't they consider using some feature selection techniques?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-2983095489684215261?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/2983095489684215261/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=2983095489684215261' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2983095489684215261'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2983095489684215261'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/3d-visual-detection-of-correct-ngt-sign.html' title='3D Visual Detection of Correct NGT Sign Production'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-2290537313849828910</id><published>2008-05-01T21:01:00.000-07:00</published><updated>2008-05-11T09:49:46.415-07:00</updated><title type='text'>Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents a hand gesture recognition system that is applied to musical performance control. The gestures are captured with a 3D accelerometer. The authors extract a new set of feature parameters from the 3D acceleration instead of using the acceleration directly. The new feature space is still 3-dimensional; it is just different combinations of 2D projections.&lt;br /&gt;&lt;br /&gt;In order to recognize a gesture from the acceleration time series, the start of the gesture must be detected. They used a user-dependent magnitude of acceleration to identify the start of gestures. Musical performance control also needs recognition of the musical tempo, which is recognized by simply looking at the y-z accelerations. In addition, the rhythm points can be identified in real time by detecting the maxima that appear most periodically.&lt;br /&gt;&lt;br /&gt;In this paper, they also tested the musical performance control with 10 gestures. They achieved an accuracy of 100% for user-dependent training, and 70%~100% accuracy for user-independent training.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I think this is an interesting paper in terms of its application of hand gesture recognition. A big challenge in recognizing sequential gestures is segmenting the data streams. This paper did this by detecting the starting point of a gesture. In their experiments, the recognition performance is pretty good. But I guess that is probably because the starting points of the test gestures are easy to detect. 
I don't believe there is yet a simple and practical method for segmenting and recognizing complicated gestures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-2290537313849828910?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/2290537313849828910/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=2290537313849828910' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2290537313849828910'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2290537313849828910'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/gesture-recognition-using-acceleration.html' title='Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8943312596667076119</id><published>2008-05-01T13:14:00.001-07:00</published><updated>2008-05-11T09:49:55.449-07:00</updated><title type='text'>Hand gesture modelling and recognition involving changing shapes and trajectories, using a Predictive EigenTracker</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents an approach to recognize hand gestures in video. 
Both hand shape and hand position are used in the recognition process. They call their method the "Predictive EigenTracker". In addition, the method lets users choose a gesture vocabulary so as to maximize recognition accuracy.&lt;br /&gt;&lt;br /&gt;Basically, they first employ a particle filtering (Condensation) predictive framework to track the hand. The dynamic model used in the tracker is a second-order Markov chain with noise. The tracker is initialized by detecting the skin color of the hand.&lt;br /&gt;&lt;br /&gt;After the hand position is tracked in the video, a shape-trajectory eigenspace is modeled by principal component analysis. The Mahalanobis distance between gestures is then computed to help users select a gesture set with the highest accuracy.&lt;br /&gt;&lt;br /&gt;To demonstrate the performance, they showed an application of their tracker and recognizer: controlling an audio player with hand gestures. This application achieved 100% accuracy.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I don't think this system will be robust. The only result shown in the paper is a set of very simple gestures against a background of a totally different color, with the performer in a black shirt, which makes the tracking problem very easy. Moreover, I doubt this system would work on a larger set of gestures. 
The paper only shows a set of 8 gestures.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8943312596667076119?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8943312596667076119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8943312596667076119' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8943312596667076119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8943312596667076119'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/hand-gesture-modelling-and-recognition.html' title='Hand gesture modelling and recognition involving changing shapes and trajectories, using a Predictive EigenTracker'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8530066161324403499</id><published>2008-05-01T13:10:00.002-07:00</published><updated>2008-05-11T09:50:06.190-07:00</updated><title type='text'>Television Control by Hand Gestures</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents a vision-based approach for controlling a television using hand gestures. To make the gesture-based interface easier to use, the authors use visual feedback from the television display. 
This also avoids the problem of the user having to remember a lot of complicated hand gestures. The user uses only one gesture: the open hand, facing the camera. He controls the television by moving his hand. On the display, a hand icon appears and follows the user's hand. The user can then move his own hand to adjust various graphical controls with the hand icon.&lt;br /&gt;&lt;br /&gt;The open hand presents a characteristic image which the computer can detect and track. They perform a normalized correlation of a template hand with the image to analyze the user's hand. A local orientation representation is used to achieve some robustness to lighting variations.&lt;br /&gt;&lt;br /&gt;They built a real-time prototype that recognizes hand gestures on a computer and then translates the gestures into remote control signals to control a TV. They also demonstrate that there is a tradeoff between the system response time and the field of view.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I have to say that this television-control application is not that useful, since it is so easy to control a television with just a remote control. To make this work, an extra camera must come with the television, increasing its cost. In addition, performing hand gestures to control a TV is clearly much more tiring. 
More importantly, it is hard to guarantee that the recognition process is robust enough for this kind of control.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8530066161324403499?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8530066161324403499/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8530066161324403499' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8530066161324403499'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8530066161324403499'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/television-control-by-hand-gestures.html' title='Television Control by Hand Gestures'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-3686474740045426530</id><published>2008-05-01T13:10:00.001-07:00</published><updated>2008-05-11T09:50:15.415-07:00</updated><title type='text'>A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper presents a recognition method for sequential Japanese Sign Language (JSL) words as a part of automatic JSL interpretation. 
They extended previous work on sequential sign language recognition with the following techniques: (1) a method to detect the borders of signed words from ordinary sign-language gestures, (2) a method to detect whether the signed gesture is represented by one hand or both hands, and (3) a method for segregating the segments representing the signed words from the transitional gesture segments.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;A glove-based input device (CyberGlove) is used as input. The word-recognition step identifies each signed word represented in an input gesture. The signed words are described by combining gesture primitives, such as hand shape, palm direction, linear motion, and circular motion. During the recognition process, the gesture primitives are identified from the input gesture, then the signed word is recognized from the temporal and spatial relationships between the gesture primitives. The gesture-segmentation step detects the borders of the signed words and divides the gestures into several segments representing words or transitions. The hand-determination step determines whether the gesture in each segment is represented by one or both hands. The word-transition distinction step differentiates between the gestures that represent words and those that represent transitions. The word-allocation step analyzes the relationship between the recognized signed words and the segments, then assigns the words to the segments. Finally, the sequence-generation step combines the recognized signed words and generates a sequence of words.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;They collected 200 samples of JSL sentences. The samples included 960 words. Among them, 100 sentences were used to determine the parameters, and the other 100 sentences were used to evaluate the methods. The results show that the word accuracy improved from 77.6% to 86.6%, and the sentence accuracy improved from 46.0% to 58.0% with the developed methods. 
&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;As an extension of previous work, this paper adds three techniques that help to improve recognition of sequential gesture data. These three techniques are intuitive and reasonable additions to the system, so the recognition accuracies increased. &lt;/p&gt;&lt;p&gt;&lt;br /&gt;However, the recognition performance for a signed sentence (58%) is far from adequate for a practical system. The authors also mention this, and they suggest that the problem might be solved by improving recognition accuracy for signed words as well as developing a method to recognize non-manual gestures such as nods, glances, and facial expressions, which are used to convey grammatical information in sign language.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-3686474740045426530?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/3686474740045426530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=3686474740045426530' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3686474740045426530'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/3686474740045426530'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/method-for-recognizing-sequence-of-sign.html' title='A Method for Recognizing a Sequence of Sign Language Words Represented in a Japanese Sign Language Sentence'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty 
xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-6304111068347161460</id><published>2008-05-01T13:09:00.004-07:00</published><updated>2008-05-11T09:50:27.877-07:00</updated><title type='text'>American Sign Language Recognition in Game Development for Deaf Children</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper presents an American Sign Language (ASL) game, CopyCat, that helps deaf children learn and practice ASL. The system recognizes hand postures performed by the children and uses the recognized signs to control an animated character. The database of signing samples was collected from user studies of deaf children playing a Wizard of Oz version of the game at the Atlanta Area School for the Deaf (AASD). The dataset consisted of 541 phrase samples and 1,959 individual sign samples of five children signing game phrases from a 22-word vocabulary.&lt;br /&gt;&lt;br /&gt;The children wear small colored gloves with wireless accelerometers mounted on the back of their wrists. The hand shape information is captured from three-axis accelerometer data and a computer-vision-based method. The collected data are used as features to train hidden Markov models for recognition.&lt;br /&gt;&lt;br /&gt;Their recognition approach uses color histogram adaptation for robust hand segmentation and tracking. The vision data are combined with the (x, y, z) values from each accelerometer. This feature vector is then fed into a 4-state, left-right HMM implemented with the Georgia Tech Gesture Toolkit.&lt;br /&gt;&lt;br /&gt;They evaluated their approach using leave-one-out validation; this technique iterates through each child, training on data from four children and testing on the remaining child's data. 
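&lt;br /&gt;&lt;br /&gt;As a rough sketch of that leave-one-child-out loop in Python (the function name and the toy data are hypothetical, made up for illustration; this is not GT2K code and not the authors' implementation):&lt;br /&gt;&lt;pre&gt;
```python
def leave_one_child_out(samples_by_child):
    """Yield (held_out_child, train_samples, test_samples), one fold per child."""
    for held_out in sorted(samples_by_child):
        # Pool the samples of every other child for training.
        train = [
            sample
            for child, samples in samples_by_child.items()
            if child != held_out
            for sample in samples
        ]
        yield held_out, train, list(samples_by_child[held_out])


# Hypothetical toy data: five children, each with a few labelled sign samples.
toy_data = {
    "child1": ["cat", "blue"],
    "child2": ["wall"],
    "child3": ["bed", "cat"],
    "child4": ["box"],
    "child5": ["blue"],
}
folds = list(leave_one_child_out(toy_data))
```
&lt;/pre&gt;Each of the five folds trains on four children and tests on the fifth, so a model is never tested on a child it has already seen.&lt;br /&gt;&lt;br /&gt;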
They achieved average word accuracies per child ranging from 73.73% to 91.75% for the user-independent models, while the average sentence accuracies are very low: 68% for user-dependent models and 50% for user-independent models.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;I like this paper's idea of using a game to help deaf children learn ASL. I think it is a very good application of hand gesture recognition, and they have relatively complete user-study results.&lt;br /&gt;&lt;br /&gt;Their implementation is based on the Georgia Tech Gesture Toolkit. The word recognition accuracies are good, while the sentence recognition accuracies are poor (50% for the user-independent model). This points to a problem with hidden Markov model based recognition: the performance is not good enough on unsegmented data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-6304111068347161460?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/6304111068347161460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=6304111068347161460' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6304111068347161460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6304111068347161460'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/american-sign-language-recognition-in.html' title='American Sign Language Recognition in Game Development for Deaf Children'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty 
xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-7676702015507273913</id><published>2008-05-01T13:09:00.003-07:00</published><updated>2008-05-11T09:50:37.885-07:00</updated><title type='text'>A Hidden Markov Model Based Sensor Fusion Approach for Recognizing Continuous Human Grasping Sequences</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper uses hidden Markov models (HMMs) to recognize human grasping motions. This system is a part of Programming by Demonstration (PbD), which aims at teaching a robot to accomplish a task by learning from a human demonstration. In this paper, they want robots to 'understand' what a human grasp 'means'.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;To capture the human grasps, they use an 18-sensor CyberGlove and 16 pressure-sensitive sensors, because computer vision based methods usually can neither deal with the occlusion problem nor detect contact points between human hands and objects. In addition, the 16 pressure sensors also give force information rather than just 'touching or not touching'.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The authors classify grasps according to manipulation primitives defined by the Kamakura taxonomy. This taxonomy distinguishes 14 different grasp types: 5 power grasps, 4 intermediate grasps, 4 precision grasps, and 1 thumbless grasp. They chose this taxonomy because it places no restrictions on the handled objects or domain, and also because it focuses more on the hand shape and fingers involved than on the purpose of the grasp.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;They learn a hidden Markov model for each of the 14 grasp primitives and use the outputs from the CyberGlove and pressure sensors as data features. 
Another HMM is created to represent the rest motion. Besides these, a 'garbage model' is created to model other unwanted motions. Each HMM has 9 states and a flat topology.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The training is based on 112 example motions. They also test the HMMs on another 112 test examples. An accuracy of up to 92.2% for a single-user system and 90.9% for a multiple-user system could be achieved.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Grasping motion recognition is a hard problem, since contact information must be modeled in addition to the gesture itself. Even for a single object, there can be a lot of different grasps. This grasp recognition could be part of an augmented reality interface. Most hand gesture applications focus on the gesture itself without considering the interaction between the human and the environment. Those hand gesture recognition systems are more like 'command systems'. But one can easily imagine that a real augmented reality interface should make the interaction between the human and the visual environment as real as possible, so grasping motion is an essential topic.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-7676702015507273913?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/7676702015507273913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=7676702015507273913' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7676702015507273913'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7676702015507273913'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/hidden-markov-model-based-sensor-fusion.html' title='A Hidden Markov Model Based Sensor Fusion Approach for Recognizing Continuous Human Grasping Sequences'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-6815010716038325746</id><published>2008-05-01T13:09:00.001-07:00</published><updated>2008-05-11T09:50:54.226-07:00</updated><title type='text'>Computer Vision-Based Gesture Recognition for an Augmented Reality Interface</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper presents a computer vision-based gesture recognition system as a part of an augmented reality system. It can recognize a 3D pointing gesture, a click gesture, and five static gestures. Each of these five static gestures has one or several fingers outstretched, e.g., the third gesture has three fingers outstretched. These gestures were chosen because the authors think they are the minimum required for an AR interface and because they are easy to recognize.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The task of the low-level segmentation is to detect and recognise the PHO and pointers, as well as hands, in the 2D images captured with the HMC. They use normalised RGB, also called chromaticities, to achieve invariance to intensity; these are calculated by dividing the RGB elements by their first norm. 
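&lt;/p&gt;&lt;p&gt;&lt;br /&gt;To make the normalisation concrete, here is a minimal Python sketch of dividing a pixel by its first norm (R + G + B). The function name and the handling of black pixels are my own assumptions, not something taken from the paper.&lt;/p&gt;&lt;pre&gt;
```python
def chromaticity(rgb):
    """Divide each channel by the first norm (R + G + B) of the pixel."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined, so return equal shares
        # (an arbitrary convention chosen for this sketch).
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)


# The same surface colour at two different intensities.
bright = chromaticity((200, 100, 100))
dim = chromaticity((100, 50, 50))
```
&lt;/pre&gt;&lt;p&gt;The two pixels differ only in brightness, so they end up with the same chromaticity, which is the intensity invariance the segmentation relies on.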
&lt;/p&gt;&lt;p&gt;&lt;br /&gt;After segmenting the hand pixels from the image, they first detect the number of outstretched fingers to recognize the five static gestures above and then handle the point and click gestures. They simply count the number of "rectangles" that correspond to the fingers. The "point and click" gesture is recognized simply by being in the state where a single finger is extended; the user "clicks" by quickly extending the thumb and bringing it back in. &lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The biggest problem with this paper is that they didn't really provide any recognition results, even for this very simple gesture set. They only claimed their recognition was "sufficient" for use in an AR interface.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Their recognition method is also totally heuristic, and it is impossible to use it on a larger gesture set. Most of the effort actually goes into the computer-vision based segmentation.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-6815010716038325746?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/6815010716038325746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=6815010716038325746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6815010716038325746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6815010716038325746'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/computer-vision-based-gesture.html' title='Computer Vision-Based Gesture Recognition for an Augmented Reality 
Interface'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-7969262224106829522</id><published>2008-05-01T13:08:00.001-07:00</published><updated>2008-05-11T09:51:04.923-07:00</updated><title type='text'>A Spatio-temporal Extension to Isomap Nonlinear Dimension Reduction</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;This paper extends the standard Isomap algorithm to data with both spatial and temporal relationships. Basically, an embedding space is learned from the spatio-temporal data, and similar gestures are placed close together. Two instantiations of ST-Isomap are presented, one for sequentially continuous data and one for segmented data. Continuous ST-Isomap is suited for uncovering spatio-temporal manifolds of data exhibiting temporal coherence, where sequentially adjacent samples are incrementally different. Segmented ST-Isomap is suited for uncovering spatio-temporal clusters in segmented data, where the input data is prepartitioned.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;They added two more techniques to the standard Isomap: proximal disambiguation of spatially proximal data points in the input space that are structurally different, and distal correspondence of spatially distal data points in the input space that share common structure. 
With these two extra steps, the algorithm can separate behaviors (e.g., "wave left" and "wave right") that should lie at distal locations in the embedding space, and can bring together similar gestures (e.g., "low wave" and "high wave") that should be placed in proximity in the resulting embedding.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;They tested ST-Isomap on Robonaut sensor data and human motion data. For the Robonaut sensor data, they applied continuous ST-Isomap. The data are 57-dimensional grasping motions. They compare the embeddings from PCA and from ST-Isomap, and ST-Isomap gives a more reasonable embedding. For the human motions, ST-Isomap also gives an embedding with obvious structure. These tests suggest that ST-Isomap is a better dimensionality reduction algorithm for spatio-temporal data.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;I selected this paper because I thought it would give some hints on dealing with the temporal information embedded in hand gestures. We have already seen some papers using PCA, but PCA is not really suitable for hand gestures, which are not static. Isomap is superior to PCA, so I thought this extension of standard Isomap would help.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;From their results, I did see some advantages that other dimensionality reduction algorithms cannot offer. After the dimensionality reduction, one can perform recognition in the low-dimensional space more efficiently and easily, since the dimensionality reduction itself clusters similar hand gestures.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;However, ST-Isomap has some obvious disadvantages. For example, there are two very important parameters (C_CTN and C_ATN) that are hard to specify. 
These two parameters actually balance spatial similarity and temporal similarity.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-7969262224106829522?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/7969262224106829522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=7969262224106829522' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7969262224106829522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7969262224106829522'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/spatio-temporal-extension-to-isomap.html' title='A Spatio-temporal Extension to Isomap Nonlinear Dimension Reduction'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-6140274465603831814</id><published>2008-05-01T13:07:00.000-07:00</published><updated>2008-05-11T09:51:13.505-07:00</updated><title type='text'>Articulated Hand Tracking by PCA-ICA Approach</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper uses PCA and ICA to model hand gestures. PCA can only be used to represent the global features, so they use ICA (independent component analysis) to represent local features. 
To build the model, they first capture different hand gestures with a data glove. Each frame contains 20 degrees of freedom, and they resample each gesture to 100 frames to get a 2000-dimensional vector. As in older approaches, PCA is performed on these 2000-dimensional vectors to reduce their dimensionality. They choose the first 5 principal components, which preserve 95% of the data energy. Then they perform ICA on each individual principal component in the PCA subspace.&lt;br /&gt;&lt;br /&gt;They tested the learned PCA-ICA model by tracking a hand in video. In this experiment, they use particle filtering to perform the tracking, and the learned model is used as the dynamic model in the particle filter.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;The main idea is that the authors try to build a model to represent hand gestures. This might be useful: the learned model might be used to recognize hand gestures, synthesize new hand motions, or do video compression.&lt;br /&gt;&lt;br /&gt;However, I am not convinced by the vision tracking experiment, which is the only one they show, since I think the tracking result would be good even without the learned PCA-ICA model. 
They could probably do some cross validation to show that the learned PCA-ICA model actually covers the hand gesture space.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-6140274465603831814?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/6140274465603831814/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=6140274465603831814' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6140274465603831814'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6140274465603831814'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/05/articulated-hand-tracking-by-pca-ica.html' title='Articulated Hand Tracking by PCA-ICA Approach'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-5726013651728711045</id><published>2008-04-28T11:37:00.002-07:00</published><updated>2008-05-11T09:51:24.384-07:00</updated><title type='text'>Toward Natural Gesture Speech HCI A Case Study of Weather Narration</title><content type='html'>&lt;p&gt;&lt;br /&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper presents a case study of weather narration that analyzes the co-occurrence of different gestures with certain spoken keywords. 
It aims to study the interaction between speech and gesture, demonstrate the power of gesture and speech-based HCI, and show that speech can help the system increase the recognition accuracy for gestures.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The authors use HMMs to recognize the weatherman's gestures. A left-to-right HMM with three main phases is built. The three phases are a preparation phase, an actual stroke phase, and a retraction phase. The actual stroke phase includes four kinds of gestures: pointing, area, contour, and rest. They choose a 10-dimensional vector as the gesture feature set, including the distances between the centers of the hands and the center of the face, the angles between the vertical and the distance vectors, and the velocities of the above parameters. The models are trained on a set of 20 well-formed isolated gesture samples and tested on 12 isolated gesture samples and 4 continuous sequences.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;A co-occurrence analysis is performed on certain keywords, such as "here", location-related words and direction-related words. The results show that gestures and speech are closely correlated. For example, "here" occurred during a gesture phase with a probability of 83%.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;They then use speech to improve the performance of the gesture recognition. Of the four test sequences, the recognition accuracy of three increases by around 10% with the help of spoken keywords.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;I like this paper's idea, even though it's just a case study. Their co-occurrence results for gestures and speech can help people, to some degree, design a real gesture/speech based HCI, although they didn't come up with principles or rules for the design. 
But, for this case study itself, I think it would be better if they could test it on more sequences, since four test sequences might not be very convincing.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-5726013651728711045?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/5726013651728711045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=5726013651728711045' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/5726013651728711045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/5726013651728711045'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/toward-natural-gesture-speech-hci-case.html' title='Toward Natural Gesture Speech HCI A Case Study of Weather Narration'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-6142416240243060579</id><published>2008-04-28T11:37:00.001-07:00</published><updated>2008-05-11T09:51:33.531-07:00</updated><title type='text'>Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper introduces an HMM-based gesture recognition library, called the Georgia Tech Gesture Toolkit (GT2K), which leverages
Cambridge University's speech recognition toolkit, HTK. It abstracts the lower-level details of the HMM process and allows users to focus instead on high-level gesture recognition concepts. The Georgia Tech Gesture Toolkit provides users with tools for preparation, training, validation and recognition. It also provides tools that let novice users automatically generate models with different topologies.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;All the data put into this library must be annotated by the user, and each gesture is modeled using a separate HMM. In addition, GT2K accepts a rule-based or stochastic grammar to make use of knowledge about the structure of the data. It provides two kinds of training/validation techniques: cross-validation and leave-one-out validation.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper also shows four applications of GT2K. The first one is "Gesture Panel", which provides gesture recognition in automobiles to let users control a radio. It employs a black-and-white camera and a grid of 72 infrared lights, and has a recognition accuracy of 99.20%. The second, named "Prescott", is for blink pattern recognition. It aims to use a "blinkprint" as a way to identify people in a restricted area. The next system, "TeleSign", is a sign language recognition system for mobile environments. It achieved an accuracy of 90.48%. The fourth application recognizes human activities in a workshop, such as sawing, hammering and drilling. It achieved an accuracy of 93.33%.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;I welcome this kind of library since it hides the low-level details of HMM design, which makes it easy to build prototype systems. I also like the idea of specifying grammars to define data structures, which helps in recognizing data streams. However, since it uses HMMs, GT2K may suffer from the same problems as HMMs do.
For example, it may not be able to deal with data sets that have a large number of categories, and it may have low accuracy on unsegmented data.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-6142416240243060579?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/6142416240243060579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=6142416240243060579' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6142416240243060579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6142416240243060579'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/georgia-tech-gesture-toolkit-supporting.html' title='Georgia Tech Gesture Toolkit: Supporting Experiments in Gesture Recognition'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-2786561165003029204</id><published>2008-04-28T11:35:00.002-07:00</published><updated>2008-05-11T09:51:41.763-07:00</updated><title type='text'>A Survey of Hand Posture and Gesture Recognition Techniques and Technology</title><content type='html'>[Summary]&lt;br /&gt;&lt;br /&gt;This paper summarizes existing common algorithms for hand posture and gesture recognition, and discusses possible applications.
The author classifies the techniques into three categories: Feature extraction, statistics and models; Learning algorithms; and Miscellaneous techniques.&lt;br /&gt;&lt;br /&gt;Template matching can be used in both glove-based and vision-based solutions. It has two parts. The first is to create the templates by collecting data values for each posture in the posture set. The second is to compare the current sensor readings with the given set of templates to find the posture template that most closely matches the current data record. Template matching is simple to implement and accurate for small sets of postures, but it is not suited to hand gestures. Feature extraction analyzes low-level information from the raw data to produce higher-level semantic information, and then uses that higher-level information to recognize both postures and gestures. Performing PCA on images can be used to recognize 25 to 35 kinds of postures, but it requires training by more than one person for accurate results. Neural networks can be used to recognize relatively large posture and gesture data sets. The data can come either from a data glove or from images. They require adequate training to get high accuracy. One disadvantage of neural networks is that it is hard to determine which configuration is best without implementing them. Hidden Markov Models are trained separately for each gesture class, and then their probabilities are evaluated for each new gesture. They can be used in either a vision-based or glove-based solution, and can recognize relatively large data sets. Instance-based learning uses k-nearest neighbors to determine the category of a new posture.
It requires more time and memory, and works only on postures.&lt;br /&gt;&lt;br /&gt;The author then discusses possible applications: Sign Language, Gesture-to-Speech, Presentations, Virtual Environments, Television Control, 3D Modeling, Multimodal Interaction and Human/Robot Manipulation and Instruction.&lt;br /&gt;&lt;br /&gt;[Discussion]&lt;br /&gt;&lt;br /&gt;First, I think it would be better if the author separated gestures and postures into two parts, because they are really very different in terms of data and algorithms. For example, HMMs can only be used to recognize time-series data, not static postures. Second, the accuracies the author reports for each algorithm didn't tell me much about which algorithm is better, since all the tests are based on different data sets and different classes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-2786561165003029204?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/2786561165003029204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=2786561165003029204' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2786561165003029204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/2786561165003029204'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/survey-of-hand-posture-and-gesture.html' title='A Survey of Hand Posture and Gesture Recognition Techniques and Technology'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty
xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-8444339753465828411</id><published>2008-04-28T11:35:00.001-07:00</published><updated>2008-05-11T09:51:48.469-07:00</updated><title type='text'>A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models</title><content type='html'>&lt;p&gt;[Summary]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper adopts Hidden Markov Models to recognize continuous dynamic gestures. The gesture data are collected by a CyberGlove with 20 degrees of freedom. They reduce a gesture represented by a 20 x t matrix to a 20 x 1 vector by computing the standard deviation of each dimension over time. They claim this representation helps solve the spotting problem, which is the task of segmenting meaningful gesture patterns from non-gesture parts in a continuous sequence of hand motions.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Three simple gestures are used to control the rotation of a cube. Each type of gesture is trained with one HMM that has 20 hidden states. But no testing accuracy is given in this paper.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;[Discussion]&lt;/p&gt;&lt;p&gt;&lt;br /&gt;First, HMMs are designed for time-series rather than static data. If the standard deviation can separate the data set, why not just use a very simple linear classifier for this recognition task? Second, their experiment said almost nothing about their method.
There are no testing results, and the test data set is very simple (only three gesture types).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-8444339753465828411?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/8444339753465828411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=8444339753465828411' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8444339753465828411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/8444339753465828411'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/dynamic-gesture-interface-for-virtual.html' title='A Dynamic Gesture Interface for Virtual Environments Based on Hidden Markov Models'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-453291718524769926</id><published>2008-04-21T21:46:00.001-07:00</published><updated>2008-05-11T09:51:59.545-07:00</updated><title type='text'>Real-time Locomotion Control by Sensing Gloves</title><content type='html'>&lt;strong&gt;[Summary]&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;This paper proposes an intuitive character control method using data gloves. Users can control humans or animals (e.g., dogs) in real time. A P5 glove is used to collect hand gestures.
Then the collected hand gestures are mapped to the locomotion of 3D characters at runtime via a mapping function.&lt;br /&gt;&lt;br /&gt;The proposed method can be divided into a calibration stage and a control stage. In the calibration stage, the mapping function that defines the relationship between the motion of the fingers and the character is generated by mimicking the reference character motions with the hand. In the control stage, the player performs a new movement with the hand to generate a new motion.&lt;br /&gt;&lt;br /&gt;They tested this system with two different characters: a human and a dog. A human walking and a dog trotting were used as reference motions to build the mapping functions. After calibration, the user can perform new motions by moving the index and middle fingers. In addition, a hopping motion is tested in a 3D environment that requires the player to control a robot to jump over obstacles.&lt;br /&gt;&lt;br /&gt;Four users performed the same task: controlling a human character to run through a maze and reach the goal as quickly as possible without hitting the walls and obstacles. The results showed that, when using the data glove, the average time needed to accomplish the task was longer and the number of collisions was lower.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;[Discussion]&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;First, I'd like to say that this hand gesture-based character control is very intuitive. It reminded me that I used this kind of gesture to mimic character motion when I was a child.&lt;br /&gt;&lt;br /&gt;However, it would have very limited controllability, since our hands have far fewer joints and degrees of freedom than the full human body. That is why it can only generate coarse locomotion.
In other words, it may generate walking, running and hopping with different step sizes and speeds, but there's no way for this interface to produce more detailed motions, such as upper-body motions.&lt;br /&gt;&lt;br /&gt;Also, I think a keyboard would be a better interface for controlling game characters, since this data glove interface would be really tiring to use. Moreover, there is another issue: how to map the physical space to the virtual space.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-453291718524769926?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/453291718524769926/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=453291718524769926' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/453291718524769926'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/453291718524769926'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/real-time-locomotion-control-by-sensing.html' title='Real-time Locomotion Control by Sensing Gloves'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-5113878472235645912</id><published>2008-04-21T10:48:00.001-07:00</published><updated>2008-05-11T09:52:07.539-07:00</updated><title type='text'>Wiizards
3D Gesture Recognition for Game Play Input</title><content type='html'>&lt;strong&gt;&lt;/strong&gt;&lt;br /&gt;&lt;strong&gt;[Summary]&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;This paper presents a two-player zero-sum game called Wiizards, which uses real-time user-performed gestures as input. The game allows users to cast three kinds of spells: Actions, Modifiers and Blockers. The gestures are captured by a Wiimote's accelerometers. A sequence of 3-dimensional accelerometer vectors is collected when the user performs a gesture.&lt;br /&gt;&lt;br /&gt;They then feed the collected data into a set of learned Hidden Markov Models to compute the probability of the input data under each HMM, and classify the gesture by comparing the probability scores. They gathered training data from 7 different users. Each user was presented with images of the gestures from the game, and performed each gesture over 40 times. The HMMs were created with the data from all of the users. A recognition rate of over 90% was achieved with ten states, and a 93% recognition rate with 15 states. But the learned HMMs reach only 50% accuracy when tested on gestures collected from users who were not among the training performers.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;[Discussion]&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;First, I am wondering how they segmented the stream of gesture data. Is there a button used to separate gestures while performing them? Second, I don't know how novel this paper was, since HMM-based hand gesture recognition had already been widely used.
But, at least, I would say this specific game application is sort of interesting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-5113878472235645912?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/5113878472235645912/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=5113878472235645912' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/5113878472235645912'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/5113878472235645912'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/wiizards-3d-gesture-recognition-for.html' title='Wiizards 3D Gesture Recognition for Game Play Input'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-7835666296007272167</id><published>2008-04-21T00:38:00.001-07:00</published><updated>2008-05-11T09:52:34.083-07:00</updated><title type='text'>The 3D Tractus: A Three-Dimensional Drawing Board</title><content type='html'>&lt;strong&gt;[Summary]&lt;/strong&gt;&lt;br /&gt;&lt;strong&gt;&lt;/strong&gt;&lt;br /&gt;This paper tries to build a 3D drawing system that lets users draw 3D curves or models directly in physical space.
The drawing system consists of two components: a tablet PC and a table that can be moved up and down. The user draws with a pen on the tablet PC, and the depth of the drawing is controlled by the height of the moving table, which has a height sensor to measure the actual physical depth. On the software side, an overview window is provided to let users see what they've drawn.&lt;br /&gt;&lt;br /&gt;They also tried different visual cues to show the drawn curves: gray-scale intensity and color-scale intensity. But, finally, they chose not to show the parts above the interaction surface. They also considered two projections, orthographic and perspective, and chose perspective projection. A deletion tool is also provided to users.&lt;br /&gt;&lt;br /&gt;Three people tested the 3D Tractus, and 4 drawing results are shown: a gum package, an Aibo bone, a game controller and a stuffed animal.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;[Discussion]&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;It's hard to imagine drawing on a surface while, at the same time, the surface is moved up and down. I don't think that would be easy to use, because it probably takes some "skill" to get the 3D drawing that you really want. Just imagine how you would draw a straight line that is neither perpendicular nor parallel to the drawing surface.
After taking a look at the drawing results, especially the 'stuffed animal', I don't think this system would be practical at all.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-7835666296007272167?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/7835666296007272167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=7835666296007272167' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7835666296007272167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/7835666296007272167'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/3d-tractus-three-dimensional-drawing.html' title='The 3D Tractus: A Three-Dimensional Drawing Board'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7226341358299927342.post-6273546467474510989</id><published>2008-04-20T21:50:00.000-07:00</published><updated>2008-04-20T21:51:26.773-07:00</updated><title type='text'>Taiwan sign language (TSL) recognition based on 3D data and neural networks</title><content type='html'>&lt;p&gt;&lt;br /&gt;&lt;strong&gt;[Summary]&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;This paper performs Taiwan sign language recognition using neural networks. Only 20 static postures are used as reference examples.
The training and testing data are collected by the Vicon system, an optical capture system with multiple cameras that capture the 3D positions of reflective markers attached to one of the performer's hands.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Fifteen geometric distances are adopted as the feature representation of the hand gestures, for example, the distances between the fingertips and the palm, and the distances between fingertips. The posture data are collected from 10 students, each performing the 20 hand gestures 15 times. All the performed gestures start with gesture '0' and end with the assigned gesture.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;A back-propagation neural network was adopted for recognition. It has 15 neurons in the input layer, 20 neurons in the output layer and two hidden layers. Their results showed that the recognition accuracy rose as the number of hidden neurons increased. They obtained the highest recognition accuracy, 94.65%, when using 250 hidden neurons in each hidden layer. They said their recognition algorithm was robust because the recognition accuracies on the testing data and on the training data are similar, 94.65% and 98.5% respectively.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;strong&gt;[Discussion]&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;I don't know how novel this paper is. All they did was put posture data into a neural network and test different numbers of hidden neurons. They did use a different feature set (geometric distances), but they didn't even say why they selected it.
I guess it's only because this is the simplest coordinate-invariant feature set.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7226341358299927342-6273546467474510989?l=kevinhaptics.blogspot.com'/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kevinhaptics.blogspot.com/feeds/6273546467474510989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=7226341358299927342&amp;postID=6273546467474510989' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6273546467474510989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7226341358299927342/posts/default/6273546467474510989'/><link rel='alternate' type='text/html' href='http://kevinhaptics.blogspot.com/2008/04/taiwan-sign-language-tsl-recognition.html' title='Taiwan sign language (TSL) recognition based on 3D data and neural networks'/><author><name>Kevin Wei</name><uri>http://www.blogger.com/profile/17730823343016542169</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='17482539782253888880'/></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry></feed>