Gestural Interaction Survey

This website contains the tabular description of a literature review on the use of gestures in human-machine interaction (HMI) scenarios.

This literature review refers to a new gesture taxonomy built upon state-of-the-art gesture taxonomies. As a preliminary note, we stress that our taxonomy focuses on the issues that each gesture characterisation implies, without any sociological connotation. In the discussion that follows, we first define the main characteristics, and then we analyse how they affect the problem statement. We consider four characteristics, which we refer to as effect, time, focus, and space.

Effect

The effect describes how a gesture affects the machine the human is interacting with. According to their effect, gestures can be either continuous or discrete. The information yielded by a continuous gesture is mapped at each time instant to a change in the interface state, while for discrete gestures such a change is atomically associated with the whole gesture. A gesture's effect also has implications for the system state, which should be continuous or discrete according to the selected kind of gestures. Drawing a parallel with the mouse-keyboard interface, continuous gestures correspond to mouse movements, while discrete gestures correspond to keystrokes.
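A minimal sketch of this distinction follows; the `interface` object and its methods (`set_cursor`, `previous_page`, `next_page`) and the gesture data layout are our own illustrative assumptions, not taken from the surveyed works:

```python
# Sketch of the "effect" distinction between continuous and discrete gestures.

def handle_continuous(sample, interface):
    """Continuous gesture: every incoming sample (e.g., the tracked hand
    position) is immediately mapped to a change of the interface state,
    much like a mouse movement."""
    interface.set_cursor(sample["x"], sample["y"])

def handle_discrete(gesture_label, interface):
    """Discrete gesture: the whole gesture is atomically associated with a
    single state change, much like a keystroke."""
    if gesture_label == "swipe_left":
        interface.previous_page()
    elif gesture_label == "swipe_right":
        interface.next_page()
```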

Time

In continuity with state-of-the-art taxonomies, the time characteristic classifies gestures as static or dynamic. However, differently from what is typically postulated in the literature, where phases are only foreseen for dynamic gestures, we organise all gestures into three phases, i.e., preparation, stroke and retraction. Classifying a gesture as static or dynamic describes the human behaviour during the stroke phase. Therefore, in our case, a person performing a static gesture first moves to the desired pose (preparation), then keeps a static pose for an arbitrary amount of time (stroke), and finally returns to the rest position (retraction).
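As an illustration of the three phases for a static gesture, the following sketch (our own, assuming a pre-computed per-frame motion-energy signal) labels the initial movement as preparation, the low-motion plateau as stroke, and the final movement as retraction:

```python
import numpy as np

def split_phases(motion_energy, threshold=0.1):
    """Rough segmentation of a static gesture into preparation, stroke and
    retraction, given per-frame motion energy (an assumed pre-computed signal,
    e.g. the magnitude of joint velocities)."""
    still = np.asarray(motion_energy) < threshold
    if not still.any():
        return None  # no static plateau: not a static gesture
    stroke_start = int(np.argmax(still))                    # first static frame
    stroke_end = len(still) - int(np.argmax(still[::-1]))   # one past last static frame
    return {
        "preparation": (0, stroke_start),
        "stroke": (stroke_start, stroke_end),
        "retraction": (stroke_end, len(still)),
    }

# Example: movement, a static plateau, then movement again
energy = [0.5, 0.4, 0.05, 0.04, 0.05, 0.3, 0.6]
print(split_phases(energy))
# {'preparation': (0, 2), 'stroke': (2, 5), 'retraction': (5, 7)}
```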

Our taxonomy enforces that static gestures cannot be considered continuous. In fact, although a static gesture per se may be considered continuous, it is not possible to associate a static gesture with a continuous system state. To make this possible, it would be necessary to define an infinite number of postures, one for each possible value of the continuous system state, which is not only impractical but de facto impossible.

Focus

In the taxonomy, focus describes which body parts are relevant for a gesture. The name of this characteristic has been chosen to highlight the fact that the relevance of a body part is determined a priori by the HMI requirements, and it does not depend on the specific action. For example, in an application where the focus is on the arm, any gesture performed by the hand is ignored. The focus characteristic does not divide gestures into distinct classes, but describes each gesture using the names of the relevant body parts. Referring again to our gesture definition, the focus on a specific body part implies that we are interested only in the status of the joints relevant to that body part, e.g., in a hand gesture the relevant joints are essentially the ones modelling the wrist. For this reason, a waving gesture is first of all an arm gesture, since the motion is generated by arm joints, and can be considered a combined arm-hand gesture if we are also interested in the status of the wrist. It is noteworthy that focus can also model gestures referring to multiple, or even not directly connected, body parts, e.g., bi-manual gestures.
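A minimal sketch of the focus characteristic as an a-priori filter over a skeletal model follows; the joint names and their grouping into body parts are illustrative assumptions:

```python
# Focus as a filter over a skeletal model.
SKELETON = {
    "arm": ["shoulder", "elbow"],
    "hand": ["wrist"],
    "fingers": ["metacarpals", "phalanges"],
    "head": ["neck"],
}

def joints_of_interest(focus):
    """Return the joints whose status matters for a gesture with the given
    focus; everything outside the focus is ignored regardless of what the
    user actually does."""
    return [joint for part in focus for joint in SKELETON[part]]

# A waving gesture treated as a combined arm-hand gesture
print(joints_of_interest(("arm", "hand")))  # ['shoulder', 'elbow', 'wrist']
```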

Space

The last gesture characteristic we include, namely space, determines whether the meaning associated with a gesture is influenced by the physical location of the body part performing it. According to this distinction, gestures can be spatially related or unrelated. We have previously seen how the discrete gesture of pressing a mid-air virtual button can be spatially related. Similarly, a manipulative gesture for dragging and dropping a virtual object is spatially related, whereas a manipulative gesture for controlling the velocity of a mobile robot by means of arm inclination and rotation, as done in some of the surveyed works, is spatially unrelated.
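Putting the four characteristics together, the sketch below (ours, not part of the surveyed works) shows how a single gesture could be described along the effect, time, focus and space axes, using the waving example discussed above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class Effect(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"

class Time(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"

@dataclass
class GestureCharacterisation:
    effect: Effect              # continuous or discrete interface change
    time: Time                  # behaviour during the stroke phase
    focus: Tuple[str, ...]      # relevant body parts, e.g. ("arm", "hand")
    spatially_related: bool     # does the meaning depend on physical location?

# Waving used as a discrete command: a dynamic, discrete, arm-focused,
# spatially unrelated gesture.
waving = GestureCharacterisation(Effect.DISCRETE, Time.DYNAMIC, ("arm",), False)
print(waving)
```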

Problem Statement

From an operational and engineering perspective, the Gestural Interaction for User Interfaces (GI-UI) problem statement is structured as the interplay among three interrelated sub-problems, i.e., sensing, data processing and system reaction. Although the structure of the problem statement is fixed and well-defined, i.e., sensing feeds data processing, whose output is mapped to a certain system reaction, the design choices for each sub-problem are tightly related to the gestures an HMI interface can consider.

Sensing

The kind of sensors used to collect data about gestures depends on the particular application requirements, e.g., privacy, price or computational efficiency. Two conflicting paradigms affect gesture sensing, namely come as you are and wearability. The phrase come as you are refers to a system design whereby users should not have to wear anything to interact with the machine. On the contrary, wearability refers to systems assuming that users wear interaction tools such as data gloves or smartwatches, i.e., they bring the HMI interface with them. This distinction is typically used to classify sensing approaches as non-image-based and image-based. Non-image-based methods include both wearable devices, such as instrumented gloves, wristbands and body suits, and non-wearable devices, using radio frequency or electric field sensing. Image-based approaches are probably one of the most widely explored research fields, as many literature reviews specifically address issues and solutions related to this sensing modality only. However, the list of sensing solutions is wide and not easy to analyse or classify. Our aim is neither to list different approaches nor to propose a classification. Instead, we want to highlight the influence that the kind of gestures we consider can have on the choice of the sensing device, while taking into consideration that, as we have seen previously, other factors may affect this choice as well.

According to our observations, the temporal and the spatial characteristics have a significant influence on the sensing strategy. In fact, static and dynamic gestures can generally be sensed using the same sensors. However, some sensors are better suited to specific kinds of gestures. For instance, in stationary conditions an accelerometer can be used to determine its inclination with respect to the horizontal plane, therefore making it easy to monitor simple static gestures. Instead, if we consider a dynamic gesture, accelerometers alone are not enough to track the sensor pose, and information from other sensors, such as gyroscopes or magnetometers, should be integrated. Similarly, in order to recognise a spatial gesture, such as a pointing gesture, one must select a sensor allowing for the extraction of spatial information, e.g., a camera. Finally, we can observe that the influence of focus on the sensing strategy is limited, and it is mainly associated with wearable devices. In fact, depending on the gesture focus, the sensor should be placed so as to have visibility of the movement, e.g., for a gesture involving the fingers, IMUs should be placed on the fingers and not on the arm.
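As an illustration of the accelerometer case mentioned above, the following is a minimal sketch (our own, with an assumed axis convention) of how, in stationary conditions, the inclination with respect to the horizontal plane can be recovered from the gravity vector alone, which is sufficient for monitoring simple static gestures:

```python
import numpy as np

def inclination_from_accelerometer(ax, ay, az):
    """In stationary conditions the accelerometer measures only gravity, so
    the sensor inclination with respect to the horizontal plane can be
    recovered from the measured acceleration vector (any consistent unit,
    e.g. m/s^2). Standard roll/pitch formulas; the axis convention is an
    assumption."""
    roll = np.degrees(np.arctan2(ay, az))
    pitch = np.degrees(np.arctan2(-ax, np.sqrt(ay ** 2 + az ** 2)))
    return roll, pitch

# Sensor lying flat (gravity entirely along z): roughly zero roll and pitch
print(inclination_from_accelerometer(0.0, 0.0, 9.81))
```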

Data Processing

Many factors can influence data processing. One in particular, however, drastically changes the nature of the problem an HMI designer must solve, i.e., the effect. As a matter of fact, depending on whether we consider continuous or discrete gestures, the problem statement changes completely. In the case of continuous gestures, the gesture representation at each instant must be directly associated with a machine reaction, or function. In certain cases, this is possible using raw data; in general, however, data should be processed to extract relevant, possibly semantic, features, such as the 2D position of the hand with respect to the image plane. As is customary, we refer to this procedure as feature extraction. Instead, for discrete gestures feature extraction is part of a more complex problem: raw data or the extracted features should be analysed to determine the start and end points of the gesture, and to classify it according to a predefined dictionary, a process usually termed gesture recognition.
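As a toy illustration of feature extraction for a continuous gesture, the sketch below computes the 2D hand position in the image plane as the centroid of a binary segmentation mask; the mask and its origin are assumptions about the upstream processing, not a method taken from the surveyed works:

```python
import numpy as np

def hand_position_2d(mask):
    """Toy feature extraction: given a binary segmentation mask of the hand
    (an assumption about the upstream processing), return the hand's 2D
    position in the image plane as the centroid of the foreground pixels."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # hand not visible in this frame
    return float(xs.mean()), float(ys.mean())

# With continuous gestures this feature would drive the interface state at
# every frame; with discrete gestures it would feed detection/classification.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 2:4] = True
print(hand_position_2d(mask))  # (2.5, 1.5)
```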

Gesture recognition has been described as the process whereby specific gestures are recognised and interpreted. Some studies consider gesture recognition a three-step process, composed of identification, tracking and classification, or alternatively of detection, feature extraction and classification. Other studies consider it a two-step process, made up of detection and classification. In the literature, the characterisation of the gesture recognition problem is highly influenced by the considered sensors and the target application. Here, we try to give a general definition. The gesture recognition problem is the process that, given sensory data and a dictionary of discrete gestures, determines whether any gesture has occurred, and which one. To this aim, sensory data are supposed to undergo three computational steps, namely pre-processing and feature extraction, detection, and classification. In the first phase, raw sensory data are manipulated to remove noise and to extract relevant features. In the detection phase (also referred to as segmentation or spotting), filtered data are analysed to determine the occurrence of a gesture, together with its start and end moments. Detection is usually performed using filtering techniques or threshold-based mechanisms. The classification phase determines which gesture from the dictionary has been performed. Often, classification is probabilistic, and together with the label it returns a confidence value. The literature has explored different approaches to gesture recognition, ranging from purely model-based techniques to machine learning, with the adopted techniques being highly dependent on the data source. The order used to discuss the three phases is not binding, especially with reference to detection and classification. In fact, gesture detection can be direct, when it is performed on sensory data, or indirect, when it is performed on classification results.
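The following is a minimal sketch of the detection and classification steps under simple assumptions: pre-processing is reduced to a per-sample energy signal, detection is direct and threshold-based, and classification relies on a scikit-learn-style probabilistic classifier trained on the gesture dictionary. None of these choices corresponds to a specific surveyed work.

```python
import numpy as np

def detect_segments(energy, threshold=1.5, min_length=10):
    """Direct, threshold-based detection (spotting): a gesture is assumed to
    occur wherever the de-noised energy signal stays above a threshold for at
    least `min_length` samples. Returns (start, end) index pairs."""
    active = np.asarray(energy) > threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_length:
        segments.append((start, len(active)))
    return segments

def classify(segment_features, classifier):
    """Classification over the gesture dictionary; a scikit-learn-style
    probabilistic classifier is assumed, so the result is a label together
    with a confidence value."""
    probabilities = classifier.predict_proba([segment_features])[0]
    best = int(np.argmax(probabilities))
    return classifier.classes_[best], float(probabilities[best])
```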

Other gesture characteristics can have minor effects on data processing. Static and dynamic gestures imply intrinsically different data, since gestures belonging to the former class are not related to time, whereas dynamic gestures are. Therefore, the techniques adopted to process static and dynamic gestures are different. Similarly, the focus of a gesture influences the kind of collected data, or the features that the system should extract, e.g., head and hand gestures extracted from video streams imply different data processing pipelines. Finally, spatially related gestures require the feature extraction process to consider the position of the body part performing the gesture as a feature.

System Reaction

The way the HMI system responds to a given gesture varies depending on the application. As an example, the response may consist of switching an interface menu (HCI) or of a velocity command for a mobile robot (HRI). Obviously enough, system responses are the result of a mapping between data processing outputs and machine behaviours. This mapping attributes a semantics to gestures in the context of the interaction process. The reaction is to a large extent agnostic to gesture characteristics, since their effects are absorbed by the sensing and data processing phases. The effect characteristic, of course, does influence the system reaction, since continuous and discrete gestures by their very nature imply continuous and discrete system functions, respectively.
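As a minimal sketch of this mapping (gesture labels and reaction functions are purely illustrative), discrete gesture labels can be bound to system functions, while a continuous gesture would instead drive a continuous function such as a velocity command:

```python
# Sketch of the mapping between data-processing output and machine behaviour.

def switch_menu():
    print("interface menu switched")       # HCI-style reaction

def stop_robot():
    print("velocity command set to zero")  # HRI-style reaction

REACTION_MAP = {
    "swipe_left": switch_menu,
    "open_palm": stop_robot,
}

def react(gesture_label):
    """Discrete gestures trigger discrete system functions through the map;
    a continuous gesture would instead drive a continuous function, e.g. a
    velocity command proportional to arm inclination."""
    action = REACTION_MAP.get(gesture_label)
    if action is not None:
        action()

react("open_palm")  # prints "velocity command set to zero"
```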

Repository content

The repository contains four files:

How to contribute

It is possible to contribute to this survey by submitting a pull request in which all the files contained in the repository have been properly updated to include a new reference. Each pull request should involve a single new article. Contributors will be acknowledged on the website page.

Authors

Alessandro Carfì, University of Genoa, alessandro.carfi@dibris.unige.it
Fulvio Mastrogiovanni, University of Genoa, fulvio.mastrogiovanni@dibris.unige.it

This work is part of the PhD thesis of Alessandro Carfì (University of Genoa) and has been submitted for review to the IEEE Transactions on Cybernetics.

Table columns: Article (Ref, Year) | Sensing (Sensor 1, Sensor 2, Sensor 3) | Reaction (HCI/HRI) | Processing (Problem, User Defined, Size) | Gestures (Effect, Temporal, Focus, Spatial)
Kortenkamp et al. 1996 1996 Stereo Monochrome Camera - - HRI Gesture Recognition No 6 Discrete Static Arm Yes
Cipolla et al. 1996 1996 Stereo Monochrome Camera - - HRI Feature Extraction No - Continuous Dynamic Fingers Yes
Starner et al. 2000 2000 Monochrome Camera Infrared Illumination Button - Gesture Recognition No 8 Discrete Continuous Static Dynamic Fingers Arm No
Gesture Recognition Yes 6 Discrete Dynamic (Bi)Fingers Hand No
Yeasin et al. 2000 2000 Camera - - - Gesture Recognition No 5 Discrete Dynamic Fingers Hand No
Kuno et al. 2000 2000 Camera - - HRI Gesture Recognition Yes - Discrete Dynamic (Bi) Hand No
Mu-Chun Su 2000 2000 Electro-mechanical Strain Gauges - - - Gesture Classification No 90 Discrete Dynamic (Bi) Fingers No
Rogalla et al. 2002 2002 Camera - - - Gesture Classification No 6 Discrete Static Fingers Hand No
Ramamoorthy et al. 2003 2003 Camera - - - Gesture Recognition No 5 Discrete Dynamic Fingers Hand No
Chao Hu et al. 2003 2003 RGB Camera - - - Gesture Classification No 6 Discrete Static Fingers Hand No
Hasanuzzaman et al. 2004 2004 RGB Camera - - HRI Gesture Recognition No 8 Discrete Static (Bi) Fingers Arm No
Gesture Recognition No 2 Discrete Dynamic Head No
Mäntyjärvi et al. 2004 2004 Accelerometer Button - - Gesture Recognition No 8 Discrete Dynamic Hand No
Kang et al. 2004 2004 RGB Camera - - HCI Gesture Recognition No 10 Discrete Dynamic (Bi) Arm Torso No
Kela et al. 2006 2006 Accelerometer Button - HCI Gesture Recognition No 8 Discrete Dynamic Hand No
Feature Extraction No - Continuous Dynamic Hand No
Nickel and Stiefelhagen 2007 2007 Stereo Camera - - - Gesture Recognition No 1 Discrete Dynamic Hand Yes
Kim et al. 2007 2007 Proximity Sensors - - - Gesture Recognition No 5 Discrete Dynamic Static Hand No
Bailador et al. 2007 2007 Accelerometer Button - - Gesture Recognition No 8 Discrete Dynamic Hand No
Schlomer et al. 2008 2008 Accelerometer Button - - Gesture Recognition No 5 Discrete Dynamic Hand No
Neto et al. 2009 2009 Accelerometer - - HRI Gesture Classification No 12 Discrete Dynamic Arm No
Gesture Recognition No 2 Discrete Static Arm No
Liu et al. 2009 2009 Accelerometer Button - HCI Gesture Recognition No 8 Discrete Dynamic Hand No
Gesture Recognition Yes - Discrete Dynamic Hand No
Wu et al. 2010 2010 Accelerometer - - HRI Gesture Recognition No 6 Discrete Dynamic Arm No
Akl and Valaee 2010 2010 Accelerometer Button - - Gesture Recognition No 18 Discrete Dynamic Hand No
Ni et al. 2011 2011 Motion Capture Button - HCI Feature Extraction No - Continuous Dynamic Arm No
Zhang et al. 2011 2011 Accelerometer EMG - HCI Gesture Recognition No 72 Discrete Dynamic Hand Fingers No
Gesture Recognition No 18 Discrete Dynamic Static Hand Fingers No
Chen et al. 2012 2012 Accelerometer Button Marker Based MoCap - Gesture Recognition No 20 Discrete Dynamic Hand No
Khan et al. 2012 2012 Accelerometer - - - Gesture Classification No 8 Discrete Dynamic Hand No
Gupta et al. 2012 2012 Microphone Speaker - HCI Gesture Recognition No 5 Discrete Dynamic (Bi) Hand No
Ruppert et al. 2012 2012 RGB-D Camera - - HCI Gesture Recognition No 4 Discrete Dynamic Hand No
Feature Extraction No - Continuous Dynamic Hand Yes
Porzi et al. 2013 2013 Accelerometer Touch Screen - HCI Gesture Recognition No 8 Discrete Dynamic Arm No
Lee and Cho 2013 2013 Accelerometer - - - Gesture Classification No 20 Discrete Dynamic Hand No
Murugappan et al. 2013 2013 RGB-D Camera - - - Gesture Recognition No 3 Continuous Discrete Dynamic Static Hand Fingers Yes
Zhou et al. 2014 2014 Accelerometer Camera - - Gesture Classification No 10 Discrete Dynamic Hand No
Iengo et al. 2014 2014 RGB-D Camera - - - Gesture Recognition No 5 Discrete Dynamic Arm No
- Gesture Recognition No 12 Discrete Dynamic Full Body No
HRI Gesture Recognition No 3 Discrete Dynamic Arm No
Lu et al. 2014 2014 EMG Accelerometer - HCI Gesture Recognition No 4 Discrete Static Fingers Hand No
Gesture Recognition No 15 Discrete Static Dynamic Fingers Arm No
Lee et al. 2014 2014 Stereo Dynamic Vision Sensor - - - Gesture Recognition No 11 Discrete Dynamic Hand No
Liu et al. 2014 2014 RGB-D Camera 6-axis IMU - - Gesture Classification No 5 Discrete Dynamic Static Arm No
Yin et al. 2014 2014 Accelerometer Button - - Gesture Recognition No 8 Discrete Dynamic Hand No
Ohn-Bar and Trivedi 2014 2014 RGB-D Camera - - - Gesture Recognition No 19 Discrete Dynamic Fingers Hand No
Duffner et al. 2014 2014 6-axis IMU - - - Gesture Classification No 9 Discrete Dynamic Hand No
Du et al. 2014 2014 Infrared Camera - - HRI Gesture Recognition No 2 Continuous Discrete Dynamic Static Hand Fingers Yes
Xie et al. 2015 2015 Accelerometer - - - Gesture Recognition No 12 Discrete Dynamic Fingers No
Caramiaux et al. 2015 2015 Infrared Camera - - HCI Gesture Recognition No 3 Discrete Dynamic Hand No
Marqués and Basterretxea 2015 2015 Accelerometer - - - Gesture Classification No 7 Discrete Dynamic Hand No
Cicirelli et al. 2015 2015 RGB-D Camera - - HRI Gesture Recognition No 10 Discrete Dynamic Arm No
Kılıboz and Güdükbay 2015 2015 Magnetic 3D Position Tracker - - HCI Gesture Recognition No 11 Discrete Dynamic Hand No
Hsu et al. 2015 2015 9-axis IMU - - - Gesture Recognition No 10 Discrete Dynamic Hand No
Gesture Recognition No 26 Discrete Dynamic Hand No
Gesture Recognition No 8 Discrete Dynamic Hand No
Georgi et al. 2015 2015 EMG 6-axis IMU Button - Gesture Recognition No 12 Discrete Dynamic Fingers Hand No
Wang et al. 2015 2015 RGB-D - - - Gesture Classification No 10 Discrete Static Fingers No
Ramani Vinayak 2015 2015 RGB-D Camera Infrared Camera - - HCI Feature Extraction No - Continuous Dynamic Hand Fingers Yes
Li et al. 2015 2015 9-axis IMU - - - Gesture Recognition No 11 Discrete Dynamic Hand No
Ahmad et al. 2015 2015 Infrared Camera - - - Feature Extraction No 1 Discrete Dynamic Fingers Yes
Galka et al. 2016 2016 7 Accelerometer - - - Gesture Classification No 40 Discrete Dynamic Fingers Hand Arm No
Moazen et al. 2016 2016 Accelerometer - - - Gesture Recognition No 5 Discrete Dynamic Arm No
Gesture Recognition No 5 Discrete Dynamic Arm No
Hong et al. 2016 2016 Accelerometer Button - - Gesture Recognition No 9 Discrete Dynamic Hand No
Gesture Recognition No 7 Discrete Dynamic Hand No
Wen et al. 2016 2016 6-axis IMU - - - Gesture Recognition No 5 Discrete Dynamic Fingers No
Xie and Cao 2016 2016 Accelerometer Button - - Gesture Recognition No 8 Discrete Dynamic Hand No
Gesture Recognition No 16 Discrete Dynamic Hand No
Gupta et al. 2016 2016 6-axis IMU - - HCI Gesture Recognition No 6 Discrete Dynamic Hand No
Gupta et al. 2016 2016 RGB-D Camera Stereo Infrared Camera - - Gesture Classification No 25 Discrete Dynamic Fingers Hand No
Lai et al. 2016 2016 RGB-D Camera - - HRI Gesture Recognition No 1 Discrete Static (Bi) Arm Yes
Feature Extraction No - Continuous Dynamic Arm No
Molchanov et al. 2016 2016 RGB-D Camera Stereo Infrared Camera - HCI Gesture Recognition No 25 Discrete Dynamic Fingers Hand No
Zhou et al. 2016 2016 RGB Camera - - - Gesture Classification No 14 Discrete Static Hand Fingers No
Haria et al. 2017 2017 RGB Camera - - HCI Gesture Recognition No 6 Discrete Static Fingers No
Gesture Recognition No 1 Discrete Dynamic Hand No
Coronado et al. 2017 2017 Accelerometer - - HRI Feature Extraction No - Continuous Dynamic Arm No
Shin et al. 2017 2017 Epidermal Tactile Sensor - - - Gesture Classification No 5 Discrete Static Fingers Hand No
Mendes et al. 2017 2017 Infrared Camera - - HRI Gesture Recognition No 12 Discrete Static Fingers Hand No
Gesture Recognition No 10 Discrete Dynamic Hand No
Naguri and Bunescu 2017 2017 Infrared Camera - - - Gesture Recognition No 6 Discrete Dynamic Fingers Hand No
Oyedotun and Khashman 2017 2017 RGB Camera - - - Gesture Classification No 24 Discrete Static Fingers Hand No
Bao et al. 2017 2017 RGB Camera - - - Gesture Classification No 7 Discrete Static Fingers No
Liang et al. 2017 2017 RGB-D Camera - - - Feature Extraction No - Continuous Dynamic Hand Yes
Bolano et al. 2018 2018 RGB-D Camera - - HRI Feature Extraction No - Continuous Dynamic Hand Yes
Buddhikot et al. 2018 2018 RGB Camera - - HCI Gesture Recognition No 6 Discrete Dynamic Fingers Hand No
Zeng et al. 2018 2018 Infrared Camera - - - Gesture Classification No 10 Discrete Dynamic Fingers No
Gesture Classification No 26 Discrete Dynamic Fingers No
Hu et al. 2018 2018 RGB Camera - - - Gesture Recognition No 5 Discrete Dynamic Fingers Arm No
Ma et al. 2018 2018 RGB-D Camera - - - Gesture Recognition No 8 Discrete Dynamic (Bi) Hand Arm No
Carfi et al. 2018 2018 Accelerometer - - - Gesture Recognition No 6 Discrete Dynamic Arm No
Kim et al. 2019 2019 Accelerometer Button - - Gesture Recognition No 10 Discrete Dynamic Hand No
Islam et al. 2019 2019 RGB Camera - - HRI Gesture Recognition No 8 Discrete Static Fingers No
Huang et al. 2019 2019 RGB-D Camera - - - Gesture Recognition No 10 Discrete Dynamic Hand Arm No
- Gesture Recognition No 10 Discrete Dynamic Hand Arm No
HCI Gesture Recognition No 10 Discrete Dynamic Hand Arm No
HCI Gesture Recognition No 10 Discrete Dynamic Hand Arm No
HCI Gesture Recognition No 15 Discrete Continuous Static Dynamic Fingers Arm No
Park and Lee 2019 2019 Infrared Camera - - HRI Feature Extraction No - Continuous Dynamic Hand No
HCI Gesture Recognition No 1 Discrete Continuous Static Dynamic Finger Hand Yes
Pomboza-Junez et al. 2019 2019 EMG - - HCI Gesture Recognition No 4 Discrete Static Fingers Hand No
Avola et al. 2019 2019 Stereo Infrared Camera - - - Gesture Classification No 30 Discrete Static Dynamic Fingers Hand No
Feng et al. 2019 2019 Stereo Infrared Camera - - - Gesture Classification No 10 Discrete Static Fingers No
Zhang et al. 2019 2019 EMG - - - Gesture Recognition No 5 Discrete Dynamic Fingers Hand No