B Datasets

This appendix lists and describes all the datasets used in this book. A compressed file containing most of the datasets can be downloaded here: https://github.com/enriquegit/behavior-datasets.

I recommend that you download the compiled datasets file and extract its contents to a local directory. Because some datasets are large or have license restrictions, not all of them are included in the compiled set, but you can download those separately. Every dataset that is not included still has a corresponding directory with a README file that explains how to get it.

Each entry in the following list states whether or not the dataset is included in the compiled set. The datasets are ordered alphabetically.


Included: Yes.

This dataset was collected with a smartphone and contains \(5\) complex activities: commuting, working, being at home, shopping at the supermarket, and exercising. An Android 2.2 application running on an LG Optimus Me cellphone was used to collect the accelerometer data from each of the axes (x, y, z). The sampling rate was set at \(50\) Hz. The cellphone was placed on the user's belt. A training set and a test set were collected on different days. The duration of the activities varies from about \(5\) minutes to a couple of hours. In total, approximately \(41\) hours of data were recorded. The data was collected by one user. Each file contains a whole activity.


Included: Yes.

This dataset contains motor activity recordings of \(23\) unipolar and bipolar depressed patients and \(32\) healthy controls. Motor activity was monitored with an actigraph watch worn on the right wrist (Actiwatch, Cambridge Neurotechnology Ltd, England, model AW4). The sampling frequency was \(32\) Hz. The device uses the inertial sensor data to compute an activity count every minute, which is stored as an integer value in the memory unit of the actigraph watch. The number of counts is proportional to the intensity of the movement. The dataset also contains some additional information about the patients and the control group. For more details, please see Garcia-Ceja, Riegler, Jakobsen, et al. (2018).


Included: Yes.

This dataset was made available by Kirill Yashuk. The data was collected using an armband device that has \(8\) sensors placed on the skin surface that measure electrical activity from the right forearm at a sampling rate of \(200\) Hz. A video of the device can be seen here: https://youtu.be/1u5-G6DPtkk.

The data contains \(4\) different gestures (0-rock, 1-scissors, 2-paper, 3-OK) and has \(65\) columns. The first \(64\) columns are electrical measurements: \(8\) consecutive readings for each of the \(8\) sensors. The last column is the class label, from \(0\) to \(3\). Each gesture was recorded \(6\) times for \(20\) seconds. For more details, please see Yashuk (2019).
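The column layout described above can be illustrated with a minimal Python sketch that splits one row into its \(64\) measurements and its label. The function name `split_emg_row` is hypothetical, and the reading-major ordering (all \(8\) sensors for the first reading, then all \(8\) for the second, and so on) is an assumption; check the dataset documentation before relying on it.

```python
N_SENSORS = 8    # sensors on the armband
N_READINGS = 8   # consecutive readings stored in each row

# Gesture codes as listed in the dataset description.
GESTURES = {0: "rock", 1: "scissors", 2: "paper", 3: "OK"}


def split_emg_row(row):
    """Split one 65-value row into a grid of readings and a class label.

    Assumption: the 64 measurements are stored reading by reading,
    so grid[i][j] is reading i of sensor j.
    """
    expected = N_SENSORS * N_READINGS + 1
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    values, label = row[:-1], int(row[-1])
    grid = [
        list(values[i * N_SENSORS:(i + 1) * N_SENSORS])
        for i in range(N_READINGS)
    ]
    return grid, label
```

If the columns turn out to be ordered sensor by sensor instead, transposing the grid (for example with `list(zip(*grid))`) gives the other view.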


Included: Yes.

The Fish4Knowledge (Beyan and Fisher 2013) project made this database available. It contains \(3102\) trajectories belonging to the Dascyllus reticulatus fish observed in the Taiwanese coral reef. Each trajectory is labeled as ‘normal’ or ‘abnormal’. The trajectories were extracted from underwater video by tracking the bounding box coordinates of each fish over time. The data does not contain the video images, only the final coordinates. The dataset compilation in this book also includes a .csv file with features extracted from the trajectories.


Included: Yes.

The data was collected with the accelerometer of an LG Optimus Me smartphone. \(10\) volunteers each performed \(5\) repetitions of \(10\) different gestures (triangle, square, circle, a, b, c, 1, 2, 3, 4), giving a total of \(500\) instances. The sensor is a tri-axial accelerometer which returns values for the x, y, and z axes.

The sampling rate was set at \(50\) Hz. To record a gesture, the user presses the phone screen with his/her thumb, performs the gesture, and stops pressing the screen. For more information, please see Garcia-Ceja, Brena, and Galván-Tejada (2014).


Included: Yes.

The sound and accelerometer data were collected by \(3\) volunteers while performing \(7\) different home task activities: mop floor, sweep floor, type on computer keyboard, brush teeth, wash hands, eat chips, and watch TV. Each volunteer performed each activity for approximately \(3\) min. If the activity lasted less than \(3\) min, another session was recorded until completing the \(3\) min. The data were collected with a wrist-band (Microsoft Band 2) and a cellphone. The wrist-band was used to collect accelerometer data and was worn by the volunteers on their dominant hand. The accelerometer sensor returns values from the x, y, and z axes, and the sampling rate was set to \(31\) Hz. The cellphone was used to record environmental sound with a sampling rate of \(8000\) Hz, and it was placed on a table in the same room where the user was performing the activity. To preserve privacy, the dataset does not contain the raw recordings but extracted features: \(16\) features from the accelerometer sensor and \(12\) Mel frequency cepstral coefficients from the audio recordings. For more information, please see Garcia-Ceja, Galván-Tejada, and Brena (2018).


Included: Yes.

This dataset was compiled and made available by the Murder Accountability Project, founded by Thomas Hargrove. It contains information about homicides in the United States, including the age, race, sex, and ethnicity of victims and perpetrators, the relationship between the victim and the perpetrator, and the weapon used. The original dataset includes the database.csv file. The files processed.csv and transactions.RData were generated with the R scripts included in the examples code to facilitate the analysis.


Included: Yes.

This dataset contains WiFi signal recordings from different locations in a building, including the MAC address and signal strength. The data was collected with an Android 2.2 application running on an LG Optimus Me cellphone. To generate a single instance, the device scans and records the MAC address and signal strength of the nearby access points. A delay of \(500\) ms is set between scans. For each location, approximately \(3\) minutes of data were collected while the user walked around the specific location. The data covers four different locations: bedroomA, bedroomB, tv room, and the lobby. To preserve privacy, the MAC addresses are encoded as integer numbers. For more information, please see Garcia and Brena (2012).
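The source does not say how the MAC addresses were mapped to integers. As an illustration only, the following Python sketch shows one simple scheme: each distinct address gets the next unused integer in order of first appearance. The function name `encode_macs`, the scan representation (a list of `(mac, signal_strength)` pairs), and the example addresses are all hypothetical.

```python
def encode_macs(scans):
    """Replace MAC addresses with integers assigned by first appearance.

    `scans` is a list of scans; each scan is a list of
    (mac_address, signal_strength) pairs. Returns the encoded scans
    and the mapping that was built.
    """
    mac_to_id = {}
    encoded = []
    for scan in scans:
        encoded_scan = []
        for mac, strength in scan:
            # Assign the next free integer the first time a MAC is seen.
            if mac not in mac_to_id:
                mac_to_id[mac] = len(mac_to_id)
            encoded_scan.append((mac_to_id[mac], strength))
        encoded.append(encoded_scan)
    return encoded, mac_to_id


# Made-up example: two scans that share one access point.
example_scans = [
    [("aa:bb:cc:00:00:01", -40), ("aa:bb:cc:00:00:02", -71)],
    [("aa:bb:cc:00:00:02", -69), ("aa:bb:cc:00:00:03", -80)],
]
encoded_scans, mapping = encode_macs(example_scans)
```

A mapping like this keeps the identity of each access point consistent across scans, which is what matters for location recognition, while hiding the real addresses as long as the mapping itself is not published.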


Included: No.

The dataset was made available by Kamminga et al. (2017) and can be downloaded from https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:76131. The researchers placed inertial sensors on sheep and goats and tracked their behavior during one day. They also video-recorded the session and annotated the data with different types of behaviors such as grazing, fighting, scratch-biting, etc. The device was placed on the neck with random orientation and collected acceleration, orientation, magnetic field, temperature, and barometric pressure. In this book, only data from one of the sheep is used (Sheep/S1.csv).


Included: No.

The authors of this dataset are Chen, Jafari, and Kehtarnavaz (2015). The data was recorded by \(8\) subjects with a Kinect camera and an inertial sensor unit. Each subject repeated each action \(4\) times. There are \(27\) actions, including right hand wave, two hand front clap, basketball shoot, and front boxing. More information about the collection process, along with pictures, is available on the website https://personal.utdallas.edu/~kehtar/UTD-MHAD.html.


Included: Yes.

This dataset is called WISDM and was made available by Kwapisz, Weiss, and Moore (2010). The dataset has \(6\) different activities: walking, jogging, walking upstairs, walking downstairs, sitting, and standing. The data was collected by \(36\) volunteers with the accelerometer of an Android phone located in the users’ pants pocket and with a sampling rate of \(20\) Hz.


Included: No.

This dataset contains color face images of \(64 \times 64\) pixels and is published here: http://conradsanderson.id.au/lfwcrop/. This is a cropped version (Sanderson and Lovell 2009) of the Labeled Faces in the Wild (LFW) database (Huang et al. 2008). Please download the color version (lfwcrop_color.zip) and copy all ppm files into the faces/ directory.

A subset of the database was labeled by Arigbabu et al. (2016) and Arigbabu (2017). The labels are provided as two text files (SMILE_list.txt, NON-SMILE_list.txt), each containing the list of files that correspond to smiling and non-smiling faces, respectively. The smiling set has \(600\) pictures and the non-smiling set has \(603\) pictures.
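As a rough illustration of how the two list files can be turned into labels, the Python sketch below builds a dictionary from file name to class. It assumes each list holds one file name per line, which should be checked against the actual files; the function names and the 1/0 label coding are hypothetical choices.

```python
def parse_name_list(lines):
    """Return the non-empty, whitespace-stripped entries of a name list."""
    return [line.strip() for line in lines if line.strip()]


def build_labels(smile_lines, non_smile_lines):
    """Map each listed file name to 1 (smiling) or 0 (non-smiling).

    Both arguments are iterables of text lines, for example open
    file objects.
    """
    labels = {name: 1 for name in parse_name_list(smile_lines)}
    for name in parse_name_list(non_smile_lines):
        labels[name] = 0
    return labels
```

With the files in the working directory, this could be called as `build_labels(open("SMILE_list.txt"), open("NON-SMILE_list.txt"))`. The names in the lists may not match the ppm file names in faces/ exactly (for instance, a different extension), so they may need to be adjusted before looking up the images.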


Included: Yes.

This dataset contains \(268\) survey responses that include variables related to depression, acculturative stress, social connectedness, and help-seeking behaviors reported by international and domestic students at an international university in Japan. For a detailed description, please see Nguyen et al. (2019).


Arigbabu, Olasimbo. 2017. Dataset for Smile Detection from Face Images. http://dx.doi.org/10.17632/yz4v8tb3tp.5.

Arigbabu, Olasimbo Ayodeji, Saif Mahmood, Sharifah Mumtazah Syed Ahmad, and Abayomi A Arigbabu. 2016. “Smile Detection Using Hybrid Face Representation.” Journal of Ambient Intelligence and Humanized Computing 7 (3): 415–26.

Beyan, Cigdem, and Robert B Fisher. 2013. “Detecting Abnormal Fish Trajectories Using Clustered and Labeled Data.” In 2013 IEEE International Conference on Image Processing, 1476–80. IEEE.

Chen, Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. 2015. “UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor.” In 2015 IEEE International Conference on Image Processing (ICIP), 168–72. IEEE.

Garcia, Enrique A., and Ramon F. Brena. 2012. “Real Time Activity Recognition Using a Cell Phone’s Accelerometer and Wi-Fi.” In Workshop Proceedings of the 8th International Conference on Intelligent Environments, 13:94–103. Ambient Intelligence and Smart Environments. IOS Press. https://doi.org/10.3233/978-1-61499-080-2-94.

Garcia-Ceja, Enrique, Ramon Brena, and Carlos E. Galván-Tejada. 2014. “Contextualized Hand Gesture Recognition with Smartphones.” In Pattern Recognition, edited by José Francisco Martínez-Trinidad, Jesús Ariel Carrasco-Ochoa, José Arturo Olvera-Lopez, Joaquín Salas-Rodríguez, and Ching Y. Suen, 8495:122–31. Lecture Notes in Computer Science. Springer International Publishing. https://doi.org/10.1007/978-3-319-07491-7_13.

Garcia-Ceja, Enrique, Carlos E Galván-Tejada, and Ramon Brena. 2018. “Multi-View Stacking for Activity Recognition with Sound and Accelerometer Data.” Information Fusion 40: 45–56.

Garcia-Ceja, Enrique, Michael Riegler, Petter Jakobsen, Jim Tørresen, Tine Nordgreen, Ketil J. Oedegaard, and Ole Bernt Fasmer. 2018. “Depresjon: A Motor Activity Database of Depression Episodes in Unipolar and Bipolar Patients.” In Proceedings of the 9th ACM on Multimedia Systems Conference. MMSys’18. Amsterdam, The Netherlands: ACM. https://doi.org/10.1145/3204949.3208125.

Huang, Gary B, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments.”

Kamminga, Jacob W, Helena C Bisby, Duc V Le, Nirvana Meratnia, and Paul JM Havinga. 2017. “Generic Online Animal Activity Recognition on Collar Tags.” In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, 597–606.

Kwapisz, Jennifer R., Gary M. Weiss, and Samuel A. Moore. 2010. “Activity Recognition Using Cell Phone Accelerometers.” In Proceedings of the Fourth International Workshop on Knowledge Discovery from Sensor Data (at KDD-10), Washington DC.

Nguyen, Minh-Hoang, Manh-Toan Ho, Quynh-Yen T. Nguyen, and Quan-Hoang Vuong. 2019. “A Dataset of Students’ Mental Health and Help-Seeking Behaviors in a Multicultural Environment.” Data 4 (3). https://doi.org/10.3390/data4030124.

Sanderson, Conrad, and Brian C Lovell. 2009. “Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference.” In International Conference on Biometrics, 199–208. Springer.

Yashuk, Kirill. 2019. Classify Gestures by Reading Muscle Activity: A Recording of Human Hand Muscle Activity Producing Four Different Hand Gestures. https://www.kaggle.com/kyr7plus/emg-4.