B Datasets
This appendix lists and describes all the datasets used in this book. A compressed file containing most of the datasets can be downloaded here: https://github.com/enriquegit/behavior-free-datasets
I recommend that you download the compilation file and extract its contents to a local directory. Because some datasets are very large or have license restrictions, not all of them are included in the compiled set, but you can download those separately. Even when a dataset is not included in the compiled set, it still has a corresponding directory with a README file that explains how to obtain it.
Each dataset in the following list states whether or not it is included in the compiled set. The datasets are ordered alphabetically.
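After extracting the compilation file, it can be handy to see which dataset directories already contain data and which hold only a README with download instructions. The following is a minimal Python sketch of such a check. The function name `list_dataset_dirs` is hypothetical, and the rule that instruction files have names starting with "README" is an assumption about the layout, not something the compilation guarantees.

```python
from pathlib import Path


def list_dataset_dirs(root):
    """Map each sub-directory of `root` to True if it holds data files.

    ASSUMPTION: a directory is treated as "README only" when every file
    inside it has a name that starts with "README" (case-insensitive).
    """
    status = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue  # ignore loose files at the top level
        files = [p for p in entry.rglob("*") if p.is_file()]
        has_data = any(not p.name.upper().startswith("README") for p in files)
        status[entry.name] = has_data
    return status
```

Calling `list_dataset_dirs("path/to/extracted/datasets")` returns a dictionary keyed by directory name. Entries with the value `False` are the ones that still need to be downloaded separately.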
B.1 COMPLEX ACTIVITIES
Included: Yes.
This dataset was collected with a smartphone and contains \(5\) complex activities: ‘commuting’, ‘working’, ‘at home’, ‘shopping at the supermarket’, and ‘exercising’. An Android 2.2 application running on an LG Optimus Me cellphone was used to collect the accelerometer data from each of the axes (x, y, z). The sampling rate was set at \(50\) Hz. The cellphone was placed on the user’s belt. A training set and a test set were collected on different days. The duration of the activities varies from about \(5\) minutes to a couple of hours. The total recorded data consists of approximately \(41\) hours. The data was collected by one user. Each file contains a whole activity.
B.2 DEPRESJON
Included: Yes.
This dataset contains motor activity recordings of \(23\) unipolar and bipolar depressed patients and \(32\) healthy controls. Motor activity was monitored with an actigraph watch worn on the right wrist (Actiwatch, Cambridge Neurotechnology Ltd, England, model AW4). The sampling frequency was \(32\) Hz. The device uses the inertial sensor data to compute an activity count every minute, which is stored as an integer value in the memory unit of the actigraph watch. The number of counts is proportional to the intensity of the movement. The dataset also contains some additional information about the patients and the control group. For more details, please see Garcia-Ceja, Riegler, Jakobsen, et al. (2018).
B.3 ELECTROMYOGRAPHY
Included: Yes.
This dataset was made available by Kirill Yashuk. The data was collected using an armband device that has \(8\) sensors placed on the skin surface that measure electrical activity from the right forearm at a sampling rate of \(200\) Hz. A video of the device can be seen here: https://youtu.be/OuwDHfY2Awg.
The data contains \(4\) different gestures: 0-rock, 1-scissors, 2-paper, 3-OK, and has \(65\) columns. The first \(64\) columns are electrical measurements: \(8\) consecutive readings for each of the \(8\) sensors. The last column is the class label from \(0\) to \(3\). Each gesture was recorded \(6\) times for \(20\) seconds. For more details, please see Yashuk (2019).
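The column layout described above can be unpacked with a few lines of code. This minimal Python sketch splits one row into an \(8 \times 8\) grid of measurements plus its class label. The function name `split_emg_row` is hypothetical, and the column order is an assumption: the sketch treats the \(64\) values as reading-major (all \(8\) sensors for the first reading, then all \(8\) sensors for the second, and so on). Check the dataset documentation before relying on that order.

```python
# Gesture codes as listed in the dataset description.
GESTURES = {0: "rock", 1: "scissors", 2: "paper", 3: "OK"}


def split_emg_row(row, n_sensors=8, n_readings=8):
    """Split one 65-value row into (readings, label).

    readings[t][s] is the value of sensor s at consecutive reading t.
    ASSUMPTION: columns are ordered reading-major, i.e. the first
    n_sensors values belong to reading 0, the next n_sensors to reading 1.
    """
    expected = n_sensors * n_readings + 1
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    values, label = row[:-1], int(row[-1])
    readings = [
        list(values[t * n_sensors:(t + 1) * n_sensors])
        for t in range(n_readings)
    ]
    return readings, label
```

For a row read from the file, `readings, label = split_emg_row(row)` gives the measurement grid, and `GESTURES[label]` gives the gesture name.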
B.4 FISH TRAJECTORIES
Included: Yes.
The Fish4Knowledge (Beyan and Fisher 2013) project made this database available. It contains \(3102\) trajectories belonging to the Dascyllus reticulatus fish observed in the Taiwanese coral reef. Each trajectory is labeled as ‘normal’ or ‘abnormal’. The trajectories were extracted from underwater video as bounding box coordinates over time. The data contains only these final coordinates, not the video images. The dataset compilation in this book also includes a .csv file with features extracted from the trajectories.
B.5 HAND GESTURES
Included: Yes.
The data was collected with the accelerometer of an LG Optimus Me smartphone. It was collected by \(10\) subjects who performed \(5\) repetitions of each of \(10\) different gestures (‘triangle’, ‘square’, ‘circle’, ‘a’, ‘b’, ‘c’, ‘1’, ‘2’, ‘3’, ‘4’), giving a total of \(500\) instances. The sensor is a tri-axial accelerometer that returns values for the x, y, and z axes. The sampling rate was set at \(50\) Hz. To record a gesture, the user presses the phone screen with a thumb, performs the gesture, and then stops pressing the screen. For more information, please see Garcia-Ceja, Brena, and Galván-Tejada (n.d.).
B.6 HOME TASKS
Included: Yes.
Sound and accelerometer data were collected by \(3\) volunteers while performing \(7\) different home task activities: ‘mop floor’, ‘sweep floor’, ‘type on computer keyboard’, ‘brush teeth’, ‘wash hands’, ‘eat chips’, and ‘watch t.v.’. Each volunteer performed each activity for approximately \(3\) minutes. If the activity lasted less than \(3\) minutes, another session was recorded until the \(3\) minutes were completed. The data were collected with a wrist-band (Microsoft Band 2) and a cellphone. The wrist-band was used to collect accelerometer data and was worn by the volunteers on their dominant hand. The accelerometer sensor returns values from the x, y, and z axes, and the sampling rate was set to \(31\) Hz. The cellphone was used to record environmental sound with a sampling rate of \(8000\) Hz and was placed on a table in the same room where the user was performing the activity. To preserve privacy, the dataset does not contain the raw audio recordings but only extracted features: sixteen features from the accelerometer sensor and \(12\) Mel frequency cepstral coefficients from the audio recordings. For more information, please see Garcia-Ceja, Galván-Tejada, and Brena (2018).
B.7 HOMICIDE REPORTS
Included: Yes.
This dataset was compiled and made available by the Murder Accountability Project, founded by Thomas Hargrove.
It contains information about homicides in the United States. This dataset includes the age, race, sex, and ethnicity of victims and perpetrators, in addition to the relationship between the victim and perpetrator and the weapon used. The original dataset includes the database.csv file. The files processed.csv and transactions.RData were generated with the R scripts included in the examples code of the corresponding sections to facilitate the analysis.
B.8 INDOOR LOCATION
Included: Yes.
This dataset contains Wi-Fi signal recordings of access points from different locations in a building, including their MAC address and signal strength. The data was collected with an Android 2.2 application running on an LG Optimus Me cell phone. To generate a single instance, the device scans and records the MAC address and signal strength of the nearby access points. A delay of \(500\) ms is set between scans. For each location, approximately \(3\) minutes of data were collected while the user walked around the specific location. The data includes four different locations: ‘bedroomA’, ‘bedroomB’, ‘tv room’, and the ‘lobby’. To preserve privacy, the MAC addresses are encoded as integer numbers. For more information, please see Garcia and Brena (2012).
B.9 SHEEP GOATS
Included: No.
The dataset was made available by Kamminga et al. (2017) and can be downloaded from https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:76131. The researchers placed inertial sensors on sheep and goats and tracked their behavior during one day. They also video-recorded the session and annotated the data with different types of behaviors such as ‘grazing’, ‘fighting’, ‘scratch-biting’, etc. The device was placed on the neck with random orientation and it collects acceleration, orientation, magnetic field, temperature, and barometric pressure. In this book, only data from one of the sheep is used (Sheep/S1.csv).
B.10 SKELETON ACTIONS
Included: No.
The authors of this dataset are Chen, Jafari, and Kehtarnavaz (2015). The data was recorded by \(8\) subjects with a Kinect camera and an inertial sensor unit and each subject repeated each action \(4\) times. The number of actions is \(27\) and some of the actions include: ‘right hand wave’, ‘two hand front clap’, ‘basketball shoot’, ‘front boxing’, etc. More information about the collection process and pictures can be consulted on the website https://personal.utdallas.edu/~kehtar/UTD-MHAD.html. You only need to download the Skeleton_Data.zip file.
B.11 SMARTPHONE ACTIVITIES
Included: Yes.
This dataset is called WISDM and was made available by Kwapisz, Weiss, and Moore (2010). The dataset includes \(6\) different activities: ‘walking’, ‘jogging’, ‘walking upstairs’, ‘walking downstairs’, ‘sitting’, and ‘standing’. The data was collected by \(36\) volunteers with the accelerometer of an Android phone located in the users’ pants pocket and with a sampling rate of \(20\) Hz.
B.12 SMILES
Included: No.
This dataset contains color face images of \(64 \times 64\) pixels and is published here: http://conradsanderson.id.au/lfwcrop/. This is a cropped version (Sanderson and Lovell 2009) of the Labeled Faces in the Wild (LFW) database (Gary B. Huang et al. 2008). Please download the color version (lfwcrop_color.zip) and copy all ppm files into the faces/ directory.
A subset of the database was labeled by O. A. Arigbabu et al. (2016) and O. Arigbabu (2017). The labels are provided as two text files (SMILE_list.txt, NON-SMILE_list.txt), each containing the list of files that correspond to smiling and non-smiling faces, respectively (CC BY 4.0 https://creativecommons.org/licenses/by/4.0/legalcode). The smiling set has \(600\) pictures and the non-smiling set has \(603\) pictures.
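The two label files can be combined into a single lookup from file name to class. The following is a minimal Python sketch. The function name `build_smile_labels` is hypothetical, and it assumes each list holds one file name per line, which may not match the actual files exactly (for example, the listed extensions may differ from the ppm images). The file names in the example are made up for illustration.

```python
def build_smile_labels(smile_lines, non_smile_lines):
    """Return a {file_name: label} dictionary from two lists of lines.

    ASSUMPTION: each line holds exactly one file name; blank lines
    and surrounding whitespace are ignored.
    """
    labels = {}
    for label, lines in (("smile", smile_lines), ("non-smile", non_smile_lines)):
        for line in lines:
            name = line.strip()
            if name:  # skip blank lines
                labels[name] = label
    return labels


# Example with made-up file names (not taken from the real lists).
example = build_smile_labels(
    ["person_a_0001.ppm\n", "person_b_0001.ppm\n"],
    ["person_c_0001.ppm\n", "\n"],
)
```

With the real files, the two arguments could come from `open("SMILE_list.txt").readlines()` and `open("NON-SMILE_list.txt").readlines()`. If the lists match the counts given above and do not overlap, the resulting dictionary should have \(1203\) entries.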
B.13 STUDENTS’ MENTAL HEALTH
Included: Yes.
This dataset contains \(268\) survey responses that include variables related to depression, acculturative stress, social connectedness, and help-seeking behaviors reported by international and domestic students at an international university in Japan. For a detailed description, please see Nguyen et al. (2019).