Supporting IKIGAI in Older Adults with the QT Robot

Indiana University Bloomington: Weslie Khoo, Natasha Randall, Swapna Joshi, Long-Jing Hsu, Waki Kamino, Shekinah Lungu, Wei-Chu Chen, Manasi Swaminathan, David Crandall, and Selma Šabanovic

Contact Selma at

Toyota Research Institute: Abhijeet Agnihotri and Kate Tsui

Previous research in human-robot interaction has explored using robots to increase objective and hedonic aspects of well-being and quality of life, but little work has examined how robots might support eudaimonic aspects of well-being, such as meaning in life. A sense of meaning has been shown to positively affect health and longevity. We frame our study around the Japanese concept of ikigai, which is widely used with Japanese older adults to enhance their everyday lives and is closely related to the Western concept of eudaimonic well-being (EWB). 

We started by conducting interviews and workshops with ikigai experts to ground our understanding of the term. Then, using a mixed-methods approach involving interviews, surveys, and live interactions with QT, we explored how older adults in the US and Japan experience ikigai and how QT might further support it. These studies allowed us to build and empirically test a model of cross-cultural ikigai.

Ongoing work

We are in the process of defining a number of activities and interventions that can support older adults’ ikigai, such as storytelling (timeslips) and reflection-based prompts.

Examples of storytelling (timeslips).

Our technically oriented ongoing work involves building deep learning models to estimate ikigai and engagement during these activities. We combine computer vision to detect when older adults are engaged with QT and discussing a topic related to their ikigai (an iki-iki face), natural language processing (NLP) to analyze conversational content from interactions with QT, and audio waveform analysis to detect emotion in speech. 

Computer vision to detect iki-iki face

We define an iki-iki face as the face someone makes when they experience ikigai. Currently we are using OpenFace 2.0 (T. Baltrušaitis, A. Zadeh, Y. C. Lim, and L.-P. Morency, "OpenFace 2.0: Facial Behavior Analysis Toolkit," IEEE International Conference on Automatic Face and Gesture Recognition, 2018) to analyze eye gaze and facial expressions.

Example of eye gaze estimation.

To estimate eye gaze, we use a Constrained Local Neural Field (CLNF) landmark detector to locate the eyelids, iris, and pupil; the landmark detector was trained on the SynthesEyes dataset. We use the detected pupil and eye location to compute a gaze vector for each eye individually: we fire a ray from the camera origin through the center of the pupil in the image plane and compute its intersection with the eyeball sphere. This gives us the pupil location in 3D camera coordinates. The vector from the 3D eyeball center to this pupil location is our estimated gaze vector. This is a fast and accurate method for person-independent eye-gaze estimation in webcam images.
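The ray-sphere intersection step described above can be sketched as follows. This is a minimal NumPy illustration of the geometry, not the OpenFace implementation; the function name, camera matrix, and eyeball-sphere parameters are illustrative assumptions.

```python
import numpy as np

def estimate_gaze(pupil_px, camera_matrix, eyeball_center, eyeball_radius=0.012):
    """Estimate a gaze direction by intersecting a camera ray with the eyeball sphere.

    All 3D quantities are in camera coordinates (meters). The eyeball center and
    radius would come from a fitted face model; the values here are placeholders.
    """
    # Back-project the 2D pupil location into a unit ray from the camera origin.
    ray = np.linalg.inv(camera_matrix) @ np.array([pupil_px[0], pupil_px[1], 1.0])
    ray /= np.linalg.norm(ray)

    # Intersect the ray with the sphere: solve ||t*ray - c||^2 = r^2 for t.
    b = -2.0 * (ray @ eyeball_center)
    c = eyeball_center @ eyeball_center - eyeball_radius ** 2
    disc = b ** 2 - 4.0 * c
    if disc < 0:
        return None  # the ray misses the eyeball sphere
    t = (-b - np.sqrt(disc)) / 2.0  # nearest intersection

    pupil_3d = t * ray  # pupil location in 3D camera coordinates
    gaze = pupil_3d - eyeball_center  # vector from eyeball center to pupil
    return gaze / np.linalg.norm(gaze)
```

For an eyeball centered directly in front of the camera and a pupil detected at the image center, the estimated gaze vector points straight back at the camera.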

Examples of facial expression recognition.

To account for personal differences when processing videos, the median value of the features is subtracted from the current frame. To correct for person-specific bias in Action Unit (AU) intensity prediction, we take the lowest nth percentile (learned on validation data) of the predictions for a specific person and subtract it from all of that person's predictions. The models are trained on the DISFA, SEMAINE, BP4D, UNBC-McMaster, Bosphorus, and FERA 2011 datasets; where AU labels overlap across multiple datasets, we train on them jointly.
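The two per-person normalization steps above can be sketched as follows. This is an illustrative sketch, not the OpenFace code; the function names are assumptions, and the percentile value would in practice be learned on validation data.

```python
import numpy as np

def neutralize_features(features):
    """Subtract the per-video median of each feature (rows = frames, cols = features)
    to reduce person-specific appearance differences."""
    return features - np.median(features, axis=0)

def correct_au_intensities(au_preds, percentile=5):
    """Person-specific AU bias correction (rows = frames, cols = AUs).

    Subtract the lowest nth percentile of each AU's predictions for this person,
    clamping at zero so corrected intensities stay non-negative. The percentile
    here is a placeholder for a value tuned on validation data.
    """
    baseline = np.percentile(au_preds, percentile, axis=0)
    return np.clip(au_preds - baseline, 0.0, None)
```

Both corrections are applied per person (per video), so a subject whose neutral face already activates an AU detector is not scored as constantly expressive.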


NLP analysis of interview transcripts

The word co-occurrence network represents frequently co-occurring word pairs in dialogues between the interviewer and the interviewee. In this study, a dialogue comprises a conversation that starts with the interviewer(s) and ends with the interviewee. 
A node in the plot represents a word, and a link between two nodes indicates that the two words appeared in the same dialogue at least 15 times across the collection of 17 interview transcripts. 

Note: The relationship (i.e., a link between two vocabularies) is undirected. 
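Counting undirected co-occurrence pairs of this kind can be sketched as follows. This is an illustrative Python sketch assuming simple whitespace tokenization; the function name and signature are assumptions, with the 15-occurrence threshold taken from the description above.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(dialogues, min_count=15):
    """Build undirected word co-occurrence edges from a list of dialogue strings.

    Two words are linked if they appear together in the same dialogue at least
    min_count times across all transcripts. Pairs are stored as sorted tuples,
    so (a, b) and (b, a) count as the same undirected edge.
    """
    counts = Counter()
    for dialogue in dialogues:
        words = set(dialogue.lower().split())  # each word counted once per dialogue
        counts.update(combinations(sorted(words), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

The resulting dictionary maps word pairs to counts and can be fed directly into a graph library for plotting the network.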

The IU Computer Vision Lab's projects and activities have been funded, in part, by grants and contracts from the Air Force Office of Scientific Research (AFOSR), the Defense Threat Reduction Agency (DTRA), Dzyne Technologies, EgoVid, Inc., ETRI, Facebook, Google, Grant Thornton LLP, IARPA, the Indiana Innovation Institute (IN3), the IU Data to Insight Center, the IU Office of the Vice Provost for Research through an Emerging Areas of Research grant, the IU Social Sciences Research Commons, the Lilly Endowment, NASA, National Science Foundation (IIS-1253549, CNS-1834899, CNS-1408730, BCS-1842817, CNS-1744748, IIS-1257141, IIS-1852294), NVidia, ObjectVideo, Office of Naval Research (ONR), Pixm, Inc., and the U.S. Navy. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.