Multimodal interaction on mobile devices
Mobile devices are an integral part of our lives and we use them for a variety of purposes ranging from communication to entertainment. As the number of features available on mobile devices increases, users spend more visual attention on interacting with their device and therefore less on their day-to-day activities. Alternative methods of interaction are necessary so that users can operate their device while continuing with their regular tasks. The aim, therefore, is to move towards interfaces for mobile devices that provide a means for non-visual interaction. An example of such an interface is Shoogle, which allows users to interact with their mobile devices through their auditory and haptic senses alone, and only when required. The user scenarios provide examples of how and where the Shoogle system would be used, and the prototypes show how such an interface has been implemented on PDAs and standard Nokia phones.
Multimodal, vibrotactile, sonification, haptic
ACM Classification Keywords
H.5.2 User Interfaces: Haptic I/O; H.5.2 User Interfaces:
Interaction Styles; H.5.2 User Interfaces: Auditory (non-speech) feedback
As mobile devices become more complex, they offer a variety of functionality and features which demand more of the user’s time and attention. Most of these features use the small screen for visual display and feedback, and take input from the small keypad or, on devices with touch screens, through a stylus or fingers (Hinckley, Pierce, Horvitz, & Sinclair, 2005, p. 31-32), all of which require a significant amount of attention from the user while interacting with the device. This, however, is not very convenient when a user is busy with real-world tasks such as driving, walking, eating, or attending meetings or important social events, where the user cannot afford to allocate the majority of his or her attention to the mobile device (Hinckley et al., 2005, p. 31-32). Moreover, audio notifications, alerts or alarms can be disruptive and distracting to the user and, in some situations, socially inappropriate (Williamson, Murray-Smith, & Hughes, 2007a, p. 121).
As identified by D. Ley (2007, p. 64) in “Ubiquitous Computing”, the idea is to move towards a more user-centred computing that adapts to our needs and preferences and remains in the background until required. It is useful and necessary to aim for such a computing interface for mobile devices as well, one which demands less attention from the user and thus allows them to carry on with their day-to-day tasks. Therefore, alternative modes of interaction with mobile devices have been explored to overcome the restrictions posed by small screens, keypads and touch-based interaction. Input through speech recognition is not ideal in noisy environments, as noise can reduce recognition accuracy, and it can also be inappropriate depending on where the user is (Linjama, Häkkilä & Ronkainen, 2005). Input through visual recognition of human gestures and actions from video captured by a camera has been discussed by R. O’Hagan and A. Zelinsky (1997), but this requires complex algorithms and processing to make it a robust input mode. Such input also demands a fair amount of the user’s attention, and the gestures themselves have to be inconspicuous so as not to draw other people’s attention towards the user when used as an input mode for mobile devices.
Getting a mobile device to detect and accept haptic gestures like shaking, tilting and tapping as input requires only single-handed interaction, and such gestures do not demand a significant amount of the user’s attention. Along with such gestures, the use of non-speech audio and vibrotactile feedback as device output makes the whole interaction non-visual, since the user is able to feel and hear the feedback without having to take their attention off the task they are currently engaged in (Pirhonen, Brewster & Holguin, 2002, p. 291). A system that allows this type of multimodal interaction with mobile devices, known as Shoogle, has been implemented by J. Williamson, R. Murray-Smith and S. Hughes (2007a). It is an interactive interface for mobile devices that allows users to use haptic gestures such as shaking or tapping as input to the device. On applying an input gesture, the user can hear and feel the contents of the device moving and bouncing around inside it (Williamson et al., 2007a, p. 121).
The feedback the user perceives is rich and realistic (Williamson et al., 2007a). Shoogle provides a completely non-visual interface (Williamson et al., 2007a), thus allowing users to carry on with their day-to-day tasks without much interruption. The Shoogle system incorporates the following concepts: active sensing techniques, sonification of device content and its impact, inertial sensing of haptic gestures using sensing hardware, simulation of the underlying physics, and realistic non-speech audio and vibrotactile display and feedback. This report discusses work similar or related to Shoogle, followed by an overview of the system, a discussion of the various concepts it incorporates, a conclusion on the findings and, finally, future work for Shoogle.
A prototype similar to Shoogle, called the ‘Ubiquitous Haptic Device’, was implemented by Y. Sekiguchi, K. Hirota and M. Hirose (2005). The idea was that when a user shakes the device he or she can feel the sensation of a virtual object moving inside it. The device could, for example, represent a mobile phone, with the object inside representing an email received on the phone. The prototype was quite simple: a custom-built box containing a simple acceleration sensor and two linear solenoids to create the impact and the sensation of an object’s presence inside the device. The simulation driving the collision of the object with the inner walls of the device ran on a desktop PC, to which the device had to remain connected in order to function (Sekiguchi et al., 2005), which meant that it was not truly mobile any more. Another related piece of work, known as ‘ActionCube’, was presented by J. Linjama, P. Korpipää, J. Kela and T. Rantakokko (2008) and was meant for phones with sensors that can sense tactile input. The aim of this interactive tutorial application was to introduce users who were new to the idea of multimodal interaction to interacting with mobile devices through haptic gestures such as tilting, tapping, turning and shaking. The screen displayed a visual animation of a cube bouncing around inside the device, coupled with vibration as tactile feedback, when the user applied a haptic gesture as input. However, the focus of the ‘ActionCube’ application was to provide minimalist visual and tactile effects (Linjama et al., 2008).
‘ActionCube’ itself was based on another prototype, presented by J. Linjama, J. Häkkilä and S. Ronkainen (2005), that used tap input to drive the motion of a virtual ball inside a mobile device. Tapping has also been used as input to mobile devices, as shown by the experiment conducted by S. Ronkainen, J. Häkkilä, S. Kaleva, A. Colley and J. Linjama (2007), where a tapping gesture was used to carry out tasks such as muting the phone, snoozing the alarm, scrolling, previewing messages and interacting with the music player. Another common form of input gesture is tilting the device. Both works by P. Eslambolchilar and R. Murray-Smith (2004; 2008) discuss the use of tilt as user input, with non-speech audio and haptic feedback as output, for scrolling and zooming in browsing and navigation tasks on sensor-equipped mobile devices.
Overview of the Shoogle System
Shoogle provides the users of mobile devices with an interface that promotes multimodal interaction with the device. The system is based on a metaphor in which a number of balls can be felt bouncing around in a box when the box is shaken in order to hear and feel its content. A user can shake, tilt or tap the mobile device to find out about the presence of SMS messages or email messages, or, say, the amount of battery life remaining, all without actually having to look at the device (Williamson et al., 2007a, p. 121). When the user applies such an action, the user can immediately hear as well as feel objects moving around inside the device, indicating the presence of, for example, SMS messages. The feedback is thus strictly related to the sort of action the user applies to the device, so the user needs to know which action will provide feedback about the type of information they want, in order to interpret the output in that context. Some of the user scenarios presented in the research paper (Williamson et al., 2007a) are as follows:
Eyes-Free Message Inbox (Williamson et al., 2007a, p. 122)
In this scenario the user shakes the phone lightly without looking at it, and the contents of the inbox turn into virtual balls that bounce around inside the device. One ball sounds deep and metallic and feels heavier than the rest, indicating that a long text message has been received from a close friend. As messages are received by the phone they turn into balls and drop into the device with the corresponding sound and impact.
Keys in a Pocket (Williamson et al., 2007a, p. 122)
In this scenario the user carries the phone in a pocket. While walking, the user can hear and feel objects jangling around in the phone, just like keys or coins, subtly indicating that a message has arrived.
Liquid Battery Life (Williamson et al., 2007a, p. 122)
Another scenario is where the user wants to know how much battery life is left in the phone and thus shakes the phone rapidly. If the battery is full then the phone feels and sounds like a bucket full of water sloshing around, and when there is only a small amount of life left then it feels and sounds as if water droplets are splashing around.
Figure 1 Eyes-Free Message Inbox (Williamson, Murray-Smith, & Hughes, 2007b, p. 2015)
Figure 2 Keys or Coins in a Pocket (Williamson, Murray-Smith, & Hughes, 2007b, p. 2015)
Figure 3 Liquid Battery Life (Williamson, Murray-Smith, & Hughes, 2007b, p. 2015)
The above scenarios are based on the user’s intuition of how objects such as metallic balls, keys, coins or water would sound and feel if placed inside a container and shaken. These properties of the objects convey the meta-information such as the length of a text message or the amount of battery life left, things which the user is interested in finding out.
The user’s actions applied to the device are sensed by sensing hardware built into the device. The hardware detects the different types of user input gestures and accordingly generates the non-speech audio and vibrotactile output. The user can thus hear audio cues, for example of the balls bouncing and colliding inside the device, and can also feel, through vibrations generated by the hardware, the impact created by the objects’ motion on the hand holding the device. Such a combination of output is rich in information and realistic for the user.
The Shoogle interface has been implemented on the HP iPAQ 5550 PDA with the help of the MESH inertial sensing hardware, which provides a transducer, an internal vibration motor, an accelerometer, a gyroscope and a magnetometer, used to generate the vibrotactile display and feedback. Shoogle has also been implemented on a standard Nokia 6660 phone using the custom-built Bluetooth SHAKE (Sensing Hardware for Kinaesthetic Expression) wireless inertial sensor pack, which also provides an accelerometer, a magnetometer, channels for input and sensing, and an internal vibrating motor.
DISCUSSION OF CONCEPTS
Active sensing techniques
In “Active Perception”, R. Bajcsy (1988) stated that “the problem of Active Sensing can be stated as a problem of controlling strategies applied to the data acquisition process which will depend on the current state of the data interpretation and the goal or the task of the process.” In line with this idea, the Shoogle system provides feedback about the content inside the device only when the user actually applies a haptic gesture to it, or when new content such as an SMS message or email arrives. That is, the feedback depends on the current state of the system as well as on the input gesture sensed by the system (Williamson et al., 2007a). One of the scenarios demonstrates that the user’s natural motions are also sensed by the device: when the user walks around with the mobile device in a pocket, his or her gait motion is sensed too (Williamson et al., 2007a, p. 121). This implies that the system is actively waiting for such events and senses them as and when they happen, concurrently providing the corresponding feedback to the user. The system thus takes the context of these events and user inputs into account when providing feedback, avoiding unnecessary distraction and interruption. Where objects drop into the device as messages arrive, they do so subtly, as opposed to loud alerts or notifications, thus minimizing the distraction to the user. The user also excites the system, in most cases, with a specific intention and prior knowledge of what to expect as output, since he or she already knows what gesture or motion they have applied to the device.
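The active-sensing behaviour described above can be sketched as a simple event handler: feedback is produced only in response to a user gesture or newly arrived content, and the system otherwise stays silent. This is an illustrative sketch, not the authors' implementation; the event names and feedback strings are hypothetical.

```python
def respond(event, state):
    """Return a feedback description for an event, or None.

    Feedback is produced only when the user gestures at the device
    or new content arrives; at all other times the system remains
    in the background (illustrative sketch, hypothetical events).
    """
    if event == "shake":
        # Active query: sonify the current inbox contents.
        return f"sonify {state['inbox']} messages"
    if event == "message_arrived":
        # New content drops in subtly rather than triggering a loud alert.
        state["inbox"] += 1
        return "subtle drop sound"
    return None  # no gesture, no new content: stay silent
```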
Sonification of content and their impact
The Shoogle system relies on the user’s intuition, perceptual knowledge and sensation of real-world objects such as balls, keys, coins and liquids, and of the sort of impact these would create if placed inside a container and moved around. ‘Model-based Sonification’ is a concept discussed by T. Hermann and H. Ritter (1999): the idea is to define a ‘virtual physics’ in which data of any kind is represented by real-world objects or materials, which then exhibit the corresponding properties in terms of how they feel when touched and the sound and impact they produce when interacting with each other and with their surrounding environment on being excited by external forces.
This concept has been well incorporated into the Shoogle system, where SMS or email messages are represented by virtual balls. Meta-data such as message length is represented by the weight and the type of impact produced by the balls. For example, in the ‘Eyes-Free Message Inbox’ (Williamson et al., 2007a, p. 121) scenario, long messages were represented by heavy metallic balls, and shorter ones by lighter, glassy balls. The user perceives all this information by shaking the device, which generates these virtual balls according to the messages received or stored in the inbox and their meta-properties. The shaking causes them to bounce around in the device, which represents a box, and this generates sound and impact.
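A minimal sketch of this mapping from message meta-data to virtual-object properties might look as follows; the length threshold and the mass/material values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Ball:
    mass: float    # heavier balls for longer messages
    material: str  # determines the impact timbre ("metal" vs "glass")

def ball_for_message(text: str, long_threshold: int = 160) -> Ball:
    """Map an SMS message to a virtual ball whose weight and timbre
    reflect the message length (hypothetical mapping)."""
    if len(text) >= long_threshold:
        return Ball(mass=2.0, material="metal")  # long message: heavy, metallic
    return Ball(mass=0.5, material="glass")      # short message: light, glassy
```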
In the ‘Keys or Coins in a Pocket’ (Williamson et al., 2007a, p. 121) scenario, the jangling of virtual keys or coins was used to represent the arrival of messages, and the environment was that of a user’s pocket. In the ‘Liquid Battery Life’ (Williamson et al., 2007a, p. 121) scenario, the amount of battery life left in the device is represented by the amount of water left inside a bucket. The user can sense the impact of water sloshing around inside a bucket when he or she shakes the device and, depending on the sound and impact generated, can figure out how much battery is left.
Inertial sensing using sensor hardware
Sensing the user’s gestures and motions reliably and accurately is very important for the system to be able to provide the corresponding feedback (Williamson et al., 2007a, p. 122). In their research, K. Hinckley, J. Pierce, M. Sinclair and E. Horvitz (2000) state that a linear accelerometer can detect the linear accelerations caused by a shaking motion applied to a device. Both Shoogle prototypes incorporate accelerometers for sensing the user’s haptic gestures and motions. The MESH inertial sensing hardware for PDAs and the SHAKE inertial sensor pack for standard mobile phones both contain a number of sensors for sensing movements and motion applied to the device to which they are attached. They also come with magnetometers, which determine the orientation of the device relative to the world’s co-ordinates. As the Shoogle prototypes show, these hardware technologies can easily be incorporated into mobile devices, allowing them to have more interactive interfaces and providing exciting and interesting features to users.
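A common way to detect a shake gesture from accelerometer data, in the spirit of the inertial sensing described above, is to test whether the acceleration magnitude rises well above gravity; the threshold used here is an illustrative assumption, not a value from the Shoogle paper.

```python
import math

def is_shake(samples, threshold=15.0):
    """Detect a shake from 3-axis accelerometer samples (m/s^2).

    A stationary device reads roughly 9.81 m/s^2 (gravity alone);
    a vigorous shake pushes the magnitude well beyond that. The
    15.0 m/s^2 threshold is an illustrative assumption.
    """
    return any(math.sqrt(x * x + y * y + z * z) > threshold
               for x, y, z in samples)
```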
Realistic audio and vibrotactile feedback
When sonifying the contents of the mobile device to represent real-world objects and their interactions with the surrounding environment, it is necessary to generate realistic-sounding non-speech audio as well as impact, so that users get a natural feel and sensation of the objects and containers simulated in the mobile device. The experiments carried out by M. Rath and D. Rocchesso (2005) and H.-Y. Yao and V. Hayward (2006) suggest that simulating realistic behaviour of objects inside a device leads to a richer and more effective user experience than static vibration alerts. This can be achieved by providing audio and vibrotactile feedback coupled tightly with the motion and impact of the objects inside the device. Such feedback is sensed directly through the user’s auditory and haptic senses and thus leads to a more continuous interaction.
The Shoogle system provides a very rich and realistic interaction interface for mobile devices. Shoogle uses a sample-based technique in which a number of impact sounds of objects moving around inside containers have been pre-recorded using real-world objects and containers (Williamson et al., 2007a, p. 124). The real-world impact sounds make it easier for users to identify the contents and their motion. The vibrotactile feedback is generated by the internal vibration motors of the MESH and SHAKE sensing hardware packs. These motors generate waveforms whose distinctive properties represent the different kinds of impact produced by the motion of objects inside specific types of container (Williamson et al., 2007a, p. 123). The experiment conducted by S. O’Modhrain and G. Essl (2004) involved recording the sounds generated by the movement of pebbles in a box and by grabbing actions applied to bags containing cereal, cornflakes and styrofoam. The distinct properties of the recorded sounds were used to create a granular synthesis engine, to be used as a haptic controller generating the feel and sounds that represent granular events (O’Modhrain & Essl, 2004, p. 75). This granular approach has also been used in Shoogle, where sine waves generated by the vibration motors inside the mobile device simulate a dense, gritty texture that enhances the impact, making the whole experience seem very realistic to the user (Williamson et al., 2007a, p. 124). However, the authors of the Shoogle system note that the drawback of pre-recorded impact sounds is that they require more effort to produce and offer less flexibility (Williamson et al., 2007a, p. 124).
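The sample-based approach can be sketched as a lookup that picks a pre-recorded impact sound by material and impact-speed band; the file names and band boundaries here are hypothetical, not from the paper.

```python
def impact_sample(material: str, speed: float) -> str:
    """Choose a pre-recorded impact sound for a collision.

    Sketch of a sample-based scheme: the object/container material
    selects the timbre and the impact speed selects a loudness band.
    File names and band boundaries are illustrative assumptions.
    """
    band = "soft" if speed < 0.5 else "medium" if speed < 2.0 else "hard"
    return f"{material}_{band}.wav"
```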
Simulation of the physics
The metaphor that Shoogle is based on is that of a virtual container containing virtual balls that bounce around and collide inside the device. In order to represent this metaphor the underlying physics of the motion of the balls needs to be simulated. The acceleration values detected by the sensors were directly used in the calculations of the balls’ motion. The positioning of these balls in the device was random and was based on Hooke’s law for springs (Williamson et al., 2007a, p. 123). So the simulation of motion applied to the virtual objects inside the device depends on the type of haptic gesture or motion the user applies to the device. These calculations also take into account the intensity and force of the motion of these objects in the different types of containers (Williamson et al., 2007a, p. 124).
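A one-dimensional sketch of such a simulation step is shown below, assuming a semi-implicit Euler integrator: the sensed device acceleration drives the ball while a Hooke's-law spring pulls it back towards its rest position. The constants are illustrative assumptions, not values from the paper.

```python
def step_ball(pos, vel, device_acc, k=5.0, damping=0.9, dt=0.01):
    """Advance a virtual ball by one time step (1-D sketch).

    device_acc is the acceleration sensed by the accelerometer;
    -k * pos is the Hooke's-law restoring force anchoring the ball.
    k, damping and dt are illustrative assumptions.
    """
    acc = device_acc - k * pos        # external drive plus spring force
    vel = (vel + acc * dt) * damping  # semi-implicit Euler with damping
    pos = pos + vel * dt
    return pos, vel
```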
Multimodal interaction using haptic gestures such as shaking, tilting and tapping provides alternative modes of interaction with mobile devices, helping to move beyond the restrictions posed by input means such as the keypad, touch screen, speech recognition and human body gesture recognition. The Shoogle system provides this type of multimodal interaction and has been implemented with the help of inertial sensing hardware technologies such as MESH and SHAKE, which provide all the necessary means for detecting the user’s tactile input. They also simultaneously provide corresponding audio and vibrotactile feedback about the contents of a mobile device and their properties, feedback which is entirely non-visual. The sonification and simulation of the device and its contents to represent real-world objects, and their types of impact with the surrounding environment, truly enhance the user experience, making it rich and realistic.
Future improvements to Shoogle would involve applying a variety of audio and vibrotactile transformations to messages, to obtain a variety of objects and motions inside the virtual container based on the meta-data of the SMS or email messages, such as the language used, the level of positivity or negativity of the content, or certain keywords used in the message (Williamson et al., 2007a, p. 124). Simulating a variety of container types would also make the interface richer and more effective. The Shoogle system is still in prototype form and requires more usability studies to establish how useful and helpful it is to end users. The research paper on Shoogle (Williamson et al., 2007a) does not specify how much it would cost to create a final commercial product, which needs to be known in order to be sure that it would make a commercially viable product.
Bajcsy, R. (1988). Active Perception. Proceedings of the IEEE, 76(8), 966 – 1005. Retrieved from http://www-inst.eecs.berkeley.edu/~cs294-6/fa06/papers/Bajcsy.active%20Perception.pdf
Eslambolchilar, P., & Murray-Smith, R. (2008). Interact, Excite, and Feel. Proceedings of the 2nd international conference on Tangible and embedded interaction, 131 – 138. Retrieved from http://eprints.pascal-network.org/archive/00003504/01/EslMur08.pdf
Eslambolchilar, P., & Murray-Smith, R. (2004). Tilt-Based Automatic Zooming and Scaling in Mobile Devices – a state-space implementation. Lecture Notes in Computer Science, 3160, 120 – 131. Retrieved from http://www.invensense.com/shared/pdf/zooming_scaling_mobile.pdf
Hermann, T., & Ritter, H. (1999). Listen to your data: Model-based sonification for data analysis. Paper presented at Advances in intelligent computing and multimedia systems Conference, 1999. Retrieved from the Neuroinformatics Group Website: http://ni.www.techfak.uni-bielefeld.de/files/HermannRitter1999-LTY.pdf
Hinckley, K., Pierce, J., Horvitz, E., & Sinclair, M. (2005). Foreground and Background Interaction with Sensor-Enhanced Mobile Devices. ACM Transactions on Computer-Human Interaction (TOCHI), 12(1), 31 – 52. Retrieved from http://doi.acm.org/10.1145/1057237.1057240
Hinckley, K., Pierce, J., Sinclair, M., & Horvitz, E. (2000). Sensing Techniques for Mobile Interaction. Proceedings of the 13th annual ACM symposium on User interface software and technology, 91 – 100. Retrieved from http://www.cc.gatech.edu/classes/AY2004/cs4470_fall/readings/hinckley-etal-uist2000.pdf
Hughes, S., Oakley, I., & O’Modhrain, S. (2004). MESH: Supporting Mobile Multi-modal Interfaces. UIST, 2004. Retrieved from http://whereveriam.org/work/palpable/162_HughesCopyright.pdf
Ley, D. (2007). Ubiquitous Computing. Emerging Technologies for Learning, 2, 64-79. Retrieved from http://www.canterbury.ac.uk/education/protected/ppss/docs/emerg-tech07-chap6.pdf
Linjama, J., Korpipää, P., Kela, J., & Rantakokko, T. (2008). ActionCube – A Tangible Mobile Gesture Interaction Tutorial. Proceedings of the 2nd international conference on Tangible and embedded interaction, 169 – 172. Retrieved from http://doi.acm.org/10.1145/1347390.1347428
Linjama, J., Häkkilä, J., & Ronkainen, S. (2005). Gesture interfaces for mobile devices - minimalist approach for haptic interaction. Proceedings from CHI Workshop: Hands on Haptics: Exploring Non-Visual Visualisation Using the Sense of Touch. Retrieved from University of Glasgow Website: http://www.dcs.gla.ac.uk/haptic/haptics%20web%20pages_files/Linjama%20et%20al..pdf
O'Hagan, R., & Zelinsky, A. (1997). Finger Track - A Robust and Real-Time Gesture Interface. Proceedings of Australian Joint Conference on Artificial Intelligence. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.3269&rep=rep1&type=pdf
O’Modhrain, S., & Essl, G. (2004). Pebblebox and crumblebag: Tactile interfaces for granular synthesis. Proceedings of the 2004 conference on New interfaces for musical expression, 74 - 79. Retrieved from http://www.suac.net/NIME/NIME04/paper/NIME04_2A02.pdf
Pirhonen, A., Brewster, S., Holguin, C. (2002). Gestural and Audio Metaphors as a Means of Control for Mobile Devices. Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, 291 – 298. Retrieved from http://doi.acm.org/10.1145/503376.503428
Rath, M., & Rocchesso, D. (2005). Continuous sonic feedback from a rolling ball. IEEE MultiMedia, 12(2), 60–69. Retrieved from http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1423935
Ronkainen, S., Häkkilä, J., Kaleva, S., Colley, A., & Linjama, J. (2007). Tap Input as an Embedded Interaction Method for Mobile Devices. Proceedings of the 1st international conference on Tangible and embedded interaction, 267 – 270. Retrieved from http://www.cc.gatech.edu/classes/AY2003/cs4803h_spring/readings/james-reischel-mobile-text-input.ps
Sekiguchi, Y., Hirota, K., & Hirose, M. (2005). The Design and Implementation of Ubiquitous Haptic Device. Proceeding from Eurohaptics Conference, 2005, 527 – 528. Retrieved from IEEE Xplore 2.0 database.