Multimodal Focused Interaction Dataset

Recording of daily life experiences from a first-person perspective has become more prevalent with the increasing availability of wearable cameras and sensors. This dataset was captured during development of a system for automatic detection of social interactions in such data streams, and in particular focused interactions, in which co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation. Existing public datasets for social interaction captured from a first-person perspective tend to be limited in terms of duration, number of people appearing, continuity, and variability of the recording.

We contribute the Focused Interaction Dataset which includes video acquired using a shoulder-mounted GoPro Hero 4 camera, as well as inertial sensor data and GPS data, and output from a voice activity detector. The dataset contains 377 minutes (including 566,000 video frames) of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations. The sessions include periods in which the camera wearer is engaged in focused interactions, in unfocused interactions, and in no interaction. Annotations are provided for all focused and unfocused interactions for the complete duration of the dataset. Anonymised IDs for 13 people involved in the focused interactions are also provided. In addition to development of social interaction analysis, the dataset may be useful for applications such as activity detection, personal location of interest understanding, and person association.

The dataset includes:

  • RGB 1080p video (without audio) data at 25 Hz
  • Voice activity detection audio features at 25 Hz
  • Inertial sensors (accelerometer, gyroscope, magnetometer) and GPS data captured at 2 Hz
  • Focused interaction annotations in ELAN format
  • Focused interaction annotations for each video frame as CSV files
  • The 6-fold cross-validation data split used in our experiments
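The per-frame CSV annotations lend themselves to a simple frame-index-to-label lookup. The sketch below shows one way to load such a file; the column names and label values are illustrative assumptions, not taken from the dataset documentation, so check them against the actual CSV headers before use.

```python
import csv
import io

# Hypothetical annotation format (column names are assumptions):
# one row per video frame, giving the frame index and interaction label.
SAMPLE_CSV = """frame,label
0,no_interaction
1,focused
2,focused
3,unfocused
"""

def load_frame_annotations(fileobj):
    """Return a dict mapping video frame index -> interaction label."""
    reader = csv.DictReader(fileobj)
    return {int(row["frame"]): row["label"] for row in reader}

labels = load_frame_annotations(io.StringIO(SAMPLE_CSV))
print(labels[1])  # focused
```

Because the video runs at 25 Hz and the inertial/GPS streams at 2 Hz, aligning modalities amounts to mapping a frame index `f` to the nearest sensor sample, e.g. `round(f * 2 / 25)`.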



The Multimodal Focused Interaction Dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. In addition, reference must be made to the following publication whenever research making use of this dataset is reported in any academic publication or research report:

Sophia Bano, Tamas Suveges, Jianguo Zhang and Stephen J. McKenna
Multimodal Egocentric Analysis of Focused Interactions
IEEE Access, Vol. 6, 2018.




This research received funding from the UK EPSRC project (EP/N014278/1) titled ACE-LP: Augmenting Communication using Environmental Data to drive Language Prediction.
Special thanks to Annalu Waller (University of Dundee), the entire ACE-LP project team and members of the CVIP group (University of Dundee) for useful discussions and assistance with dataset collection.



For comments, suggestions or feedback, or if you experience any problems with this website or the dataset, please contact Stephen McKenna.