Chapter 9 Multi-User Validation

Every person is different. We all have different physical and mental characteristics. Every person reacts in different ways to the same stimulus and conducts physical and motor activities in particular ways. As we have seen, predictive models rely on the training data; and for user-oriented applications, this data encodes their behaviors. When building predictive models, we want them to be general and to perform accurately on new unseen instances. Sometimes this generalization capability comes at a price, especially in multi-user settings. A multi-user setting is one in which the results depend heavily on the target user, that is, the user on which the predictions are made. Take, for example, a hand gesture recognition system. At inference time, a specific person (the target user) performs a gesture and the system should recognize it. The input data comes directly from the user. On the other hand, a non-multi-user system does not depend on a particular person. A classifier that labels fruits on images or a regression model that predicts house prices does not depend directly on a particular person.

Some time ago I had to build an activity recognition system based on inertial data from a wrist band. So I collected the data, trained the models, and evaluated them. The performance results were good. However, it turned out that when the system was tested on a new sample group it failed. The reason? The training data was collected from people within a particular age group (young) but the target market of the product was for much older people. Older people tend to walk more slowly, thus, the system was predicting ‘no movement’ when in fact, the person was walking at a very slow pace. This is an extreme example but even within the same age groups, there are going to be differences between users (inter-user variance). Or even the same user will evolve over time and will change her/his behaviors (intra-user variance).

So, how can we evaluate multi-user systems to reduce the unexpected effects once the system is deployed? Most of the time there’s going to be surprises when testing a system on new users so we need to be aware of that. But in this chapter, I will present \(3\) types of models that will help you to reduce that uncertainty to some extent so you will have a better idea of how the system will behave when tested on more realistic conditions. The models are: mixed models, user-independent models, and user-dependent models. We will see how to train each type of model using a database with actions recorded with a motion capture system. After that, I will also show you how to build adaptive models that are tuned with the objective of increasing their performance for a particular user.

9.1 Mixed Models

Mixed models are trained and validated as usual without considering information about mappings between data points and users. Suppose we have a dataset as shown in Figure 9.1. The first column is the user id, the second column the label we want to predict and the last two columns are two arbitrary features.

Example dataset with a binary label and 2 features.

Figure 9.1: Example dataset with a binary label and 2 features.

With a mixed model we would just remove the userid column and perform \(k\)-fold cross-validation or hold-out validation as usual. In fact, this is what we have been doing so far. By doing so, some random data points will end up in the train set and others in the test set regardless of which data point was generated by which user. The user rows are just mixed, thus the mixed model name. This model assumes that the data was generated by a single user. One of the disadvantages of validating a system using a mixed model is that the performance results could be overestimated. When randomly splitting into train and test sets, some data points for a given user will end up in each of the splits. At inference time, when showing a test sample belonging to a particular user to the model, it is likely that the training set of that model included some data from that particular user. Thus, the model already knows a little bit about that user so we can expect an accurate prediction. However, this assumption not always holds true. If the model is to be used on a new user that the model has never seen before, then, it may not produce very accurate predictions.

When should a mixed model be used to validate a system?

  1. When you know that you will have some available data to train the model for all the users that are intended to use the system.
  2. In many cases, a dataset has already missing information about the mapping between rows and users. That is, a userid column is not present. In those cases, the best performance estimation would be through the use of a mixed model.

To demonstrate the differences between the different types of models (mixed, user-independent, and user-dependent) I will use the SKELETON ACTIONS dataset. First, a brief description of the dataset is presented including details about how the features were extracted. Then, the dataset is used to train a mixed model and in the following subsections, it is used to train user-independent and user-dependent models.

9.1.1 Skeleton Action Recognition with Mixed Models

preprocess_skeleton_actions.R classify_skeleton_actions.R

To demonstrate the \(3\) different types of models I chose the UTD-MHAD dataset (Chen, Jafari, and Kehtarnavaz 2015) and from now on, I will refer to it as the SKELETON ACTIONS dataset. This database is suitable because it was collected by \(8\) persons (\(4\) females/\(4\) males) and each file has a subject id, thus, we know which actions were collected by which users. The number of actions is \(27\) and some of the actions are: ‘right-hand wave’, ‘two hand front clap’, ‘basketball shoot’, ‘front boxing’, etc.

The data was recorded using a Kinect camera and an inertial sensor unit and each subject repeated each of the \(27\) actions \(4\) times. More information about the collection process and pictures can be consulted on the original dataset website

For our examples, I will only focus on the skeleton data generated by the Kinect camera. These data consists of human body joints (\(20\) joints). Each file contains one action for one user and one repetition. The file names have the structure aA_sS_tT_skeleton.mat. The A is the action id, the S is the subject id and the T is the trial (repetition) number. For each time frame, the \(3\)D positions of the \(20\) joints are recorded.

The script preprocess_skeleton_actions.R shows how to read the files and plot the actions. The files are stored in Matlab format. The library R.matlab (Bengtsson 2018) can be used to read the files.

# Path to one of the files.
filepath <- "/skeleton_actions/a7_s1_t1_skeleton.mat"

# Read skeleton file.
df <- readMat(filepath)$d.skel

# Print dimensions.
#> [1] 20  3 66

From the file name, we can see that this corresponds to action \(7\) (basketball shoot), from subject \(1\) and trial \(1\). The readMat() function is used to read the file contents and store them as a \(3\)D array in df. If we print the dimensions we see that the first one corresponds to the number of joints, the second one are the positions (x, y, z), and the last dimension is the number of frames, in this case \(66\) frames.

We can extract the first time-frame as follows:

# Select the first frame.
frame <- data.frame(df[, , 1])

# Print dimensions.
#> [1] 20  3

Each frame can then be plotted. The plotting code is included in the script. The Figure below shows how the skeleton looks like for \(6\) of the time frames. The script also has code to play the action as an animation.

Skeleton of basketball shoot action. 6 frames sampled from the entire sequence.

Figure 9.2: Skeleton of basketball shoot action. 6 frames sampled from the entire sequence.

We will represent each action (file) as a feature vector. The same script also shows the code to extract the feature vectors from each action. To extract the features, a reference point in the skeleton is selected, in this case the spine (joint \(3\)). Then, for each time frame, the distance between all joints (excluding the reference point) and the reference point is calculated. Finally, for each distance, the mean, min, and max are computed across all time frames. Since there are \(19\) joints (excluding the spine), we end up with \(19*3=57\) features. Figure 9.3 shows how the final dataset looks like. Only showing the first \(4\) features out of the \(57\) and user id and labels.

First rows of the skeleton dataset after feature extraction showing the first 4 features.

Figure 9.3: First rows of the skeleton dataset after feature extraction showing the first 4 features.

The following examples assume that the file dataset.csv with the extracted features already exsits in the skeleton_actions/ directory. To generate this file, run the feature extraction code in the script preprocess_skeleton_actions.R.

Once the dataset in a suitable format, we can proceed to train our mixed model. The script containing the full code for training the different types of models is classify_skeleton_actions.R. This script makes use of the dataset.csv file.

First, the auxiliary functions are loaded because we will use the normalize() function to normalize the data. We will use a Random Forest for the classification and the caret package to compute the performance metrics.


# Path to the csv file containing the extracted features.
# preprocess_skeleton_actins.R contains
# the code used to extract the features.
filepath <- file.path(datasets_path,
# Load dataset.
dataset <- read.csv(filepath, stringsAsFactors = T)

# Extract unique labels.
unique.actions <- as.character(unique(dataset$label))

# Print the unique labels.
#>  [1] "a1"  "a10" "a11" "a12" "a13" "a14" "a15" "a16" "a17"
#> [10] "a18" "a19" "a2"  "a20" "a21" "a22" "a23" "a24" "a25"
#> [19] "a26" "a27" "a3"  "a4"  "a5"  "a6"  "a7"  "a8"  "a9" 

The unique.actions variable stores the name of all actions. We will need it later to define the levels of the factor object. Next, we generate \(10\) folds and define some variables to store the performance metrics including the accuracy, recall, and precision. In each iteration during cross-validation, we will compute and store those performance metrics.

k <- 10 # Number of folds.
folds <- sample(k, nrow(dataset), replace = TRUE)
accuracies <- NULL; recalls <- NULL; precisions <- NULL

In the next code snippet, the actual cross-validation is performed. This is just the usual cross-validation procedure. The normalize() function defined in the auxiliary functions is used to normalize the data by only learning the parameters from the train set and applying them to the test set. Then, the actual Random Forest is fitted with the train set. One thing to note here is that the userid field is removed: trainset[,-1] since we are not using users’ information in the mixed model. Then, predictions on the test set are obtained and the accuracy, recall, and precision are computed during each iteration.

# Perform k-fold cross-validation.
for(i in 1:k){
  trainset <- dataset[which(folds != i,),]
  testset <- dataset[which(folds == i,),]
  res <- normalize(trainset, testset)
  trainset <- res$train
  testset <- res$test
  rf <- randomForest(label ~., trainset[,-1])
  preds.rf <- as.character(predict(rf,
                                   newdata = testset[,-1]))
  groundTruth <- as.character(testset$label)
  cm.rf <- confusionMatrix(factor(preds.rf,
                                  levels = unique.actions),
                                  levels = unique.actions))
  accuracies <- c(accuracies, cm.rf$overall["Accuracy"])
  metrics <- colMeans(cm.rf$byClass[,c("Recall",
                      na.rm = TRUE)
  recalls <- c(recalls, metrics["Recall"])
  precisions <- c(precisions, metrics["Precision"])

Finally, the average performance across folds of each of the metrics is printed.

# Print performance metrics.
#> [1] 0.9277258
#> [1] 0.9372515
#> [1] 0.9208455

The results look promising with an average accuracy of \(92.7\%\), a recall of \(93.7\%\), and a precision of \(92\%\). One important thing to remember is that the mixed model assumes that the training data contains instances belonging to users in the test set. Thus, the model already knows a little bit about the users in the test set.

Imagine that now you want to estimate what would be the performance of the model in a situation where a completely new user is shown to the model, that is, the model does not know anything about this user. We can model these situations using a user-independent model which is the topic of the next section.

9.2 User-Independent Models

The user-independent model allows us to estimate the performance of a system on new users. That is, the model does not contain any information about the target user. This resembles a scenario when the user wants to use a service out-of-the-box without having to go through a calibration process or having to collect training data. To build a user-independent model we just need to make sure that the training data does not contain any information about the users on the test set. We can achieve this by splitting the dataset into two disjoint groups of users based on their ids. For example, assign \(70\%\) of the users to the train set and the remaining to the test set. If the dataset is small, one way to optimize it is by performing leave-one-user-out validation: If the dataset has \(n\) users, then, \(n\) iterations are performed. In each iteration, one user is selected as the test set and the remaining ones are used as the train set. Figure 9.4 illustrates leave-one-user-out validation for the first \(2\) iterations of the procedure.

First 2 iterations of leave-one-user-out validation.

Figure 9.4: First 2 iterations of leave-one-user-out validation.

By doing this, we guarantee that the model knows anything about the target user. To implement this leave-one-user-out validation method in our skeleton recognition case, let’s first define some initialization variables. These include the unique.users variable which stores the ids of all users in the database. As before, we will compute the accuracy, recall, and precision so we define variables to store those metrics for each user.

# Get a list of unique users.
unique.users <- as.character(unique(dataset$userid))

# Print the unique user ids.
#> [1] "s1" "s2" "s3" "s4" "s5" "s6" "s7" "s8"

accuracies <- NULL; recalls <- NULL; precisions <- NULL

Then, we iterate through each user, build the corresponding train and test sets, and train the classifiers. Here, we make sure that the test set only includes data points belonging to a single user.


for(user in unique.users){
  testset <- dataset[which(dataset$userid == user),]
  trainset <- dataset[which(dataset$userid != user),]
  # Normalize. Not really needed since RF
  # is not affected by different features scale.
  res <- normalize(trainset, testset)
  trainset <- res$train
  testset <- res$test
  rf <- randomForest(label ~., trainset[,-1])
  preds.rf <- as.character(predict(rf, newdata = testset[,-1]))
  groundTruth <- as.character(testset$label)
  cm.rf <- confusionMatrix(factor(preds.rf,
                                  levels = unique.actions),
                                  levels = unique.actions))
  accuracies <- c(accuracies, cm.rf$overall["Accuracy"])
  metrics <- colMeans(cm.rf$byClass[,c("Recall",
           na.rm = TRUE)
  recalls <- c(recalls, metrics["Recall"])
  precisions <- c(precisions, metrics["Precision"])

Now we print the average performance metrics across users.

#> [1] 0.5807805
#> [1] 0.5798611
#> [1] 0.6539715

Those numbers are surprising! In the previous section, our mixed model had an accuracy of \(92.7\%\) and now the user-independent model has an accuracy of only \(58\%\)! This is because the latter didn’t know anything about the target user. Since each person different, the user-independent model was not able to capture the patterns of new users and this had a big impact on the performance.

When should a user-independent model be used to validate a system?

  1. When you expect the system to be used out-of-the-box by new users and the system does not have any data from those new users.

The main advantage of the user-independent model is that it does not require training data from the target users so they can start using it right away at the expense of lower accuracy.

The opposite case is when a model is trained specifically for the target user. This model is called the user-dependent model and will be presented in the next section.

9.3 User-Dependent Models

A user-dependent model is trained with data belonging only to the target user. In general, this type of model performs better compared to the mixed model and user-independent model. This is because the model captures the particularities of a specific user. The way to evaluate user-dependent models is to iterate through each user, and for each user, build and test a model only with her/his data. The per-user evaluation can be done using \(k\)-fold cross-validation, for example. For the skeleton database, we only have \(4\) instances per type of action. The number of unique classes (\(27\)) is also high compared to the number of instances per action. If we do, for example, \(10\)-fold cross-validation, it is very likely that the train sets will not contain examples for several of the possible actions. To avoid this, we will do leave-one-out validation within each user. This means that we iterate through each instance. In each iteration, the instance is used as the test set and the remaining ones are used for the train set.

unique.users <- as.character(unique(dataset$userid))

accuracies <- NULL; recalls <- NULL; precisions <- NULL


# Iterate through each user.
for(user in unique.users){
  print(paste0("Evaluating user ", user)) <- dataset[which(dataset$userid == user), -1]
  # Leave-one-out cross validation within each user.
  predictions.rf <- NULL; groundTruth <- NULL
  for(i in 1:nrow({
    # Normalize. Not really needed since RF
    # is not affected by different features scale.
    testset <-[i,]
    trainset <-[-i,]
    res <- normalize(trainset, testset)
    testset <- res$test
    trainset <- res$train
    rf <- randomForest(label ~., trainset)
    preds.rf <- as.character(predict(rf, newdata = testset))
    predictions.rf <- c(predictions.rf, preds.rf)
    groundTruth <- c(groundTruth, as.character(testset$label))
  cm.rf <- confusionMatrix(factor(predictions.rf,
                                  levels = unique.actions),
                                  levels = unique.actions))
  accuracies <- c(accuracies, cm.rf$overall["Accuracy"])
  metrics <- colMeans(cm.rf$byClass[,c("Recall",
                      na.rm = TRUE)
  recalls <- c(recalls, metrics["Recall"])
  precisions <- c(precisions, metrics["Precision"])
} # end of users iteration.

We iterated through each user and performed the leave-one-out-validation for each, independently of the others and stored their results. We can now compute the average performance across all users.

# Print average performance across users.

#> [1] 0.943114
#> [1] 0.9425154
#> [1] 0.9500772

This time, the average accuracy was \(94.3\%\) which is higher than the accuracy achieved with the mixed model and the user-independent model. The average recall and precision were also higher compared to the other types of models. The high performance is because each model was targeted to a particular user.

When should a user-dependent model be used to validate a system?

  1. When the model will be trained only using data from the target user that will use the system.

In general, user-dependent models have the best accuracy. The disadvantage is that they require training data from the target user and for some applications, collecting training data can be very difficult and expensive.

Can we have a system that has the best of both worlds from user dependent/independent models? That is, a model that is as accurate as a user-dependent model but requires small quantities of training data from the target user. The answer is yes, and how we can do that is covered in the next section User-Adaptive Models.

9.4 User-Adaptive Models

We have already talked about some of the limitations of user-dependent and user-independent models. On one hand, user-dependent models require training data from the target user. In many situations, collecting training data is difficult. On the other hand, user-independent models do not need data from the target user but are less accurate. To overcome those limitations, models that evolve over time as more information is available can be built. One can start with a user-independent model and as more data is available from the target user, the model can be updated accordingly. In this situation, there is no need to wait to start using the system and as new feedback is available, the model gets better and better by learning the specific patterns of a particular user.

In this section, I will explain how a technique called transfer learning can be used to build an adaptive model that updates itself as new training data is available. First, in the following subsection the idea of transfer learning is introduced and next, the method is used to build an adaptive model for activity recognition.

9.4.1 Transfer Learning

In machine learning, transfer learning refers to the idea of using the knowledge gained when solving a problem to solve another different problem. The new problem can be a similar one but also could be very unrelated. For example, a model trained to detect smiles from images could also be used to predict gender (of course with some fine-tuning). In humans, learning is a lifelong process in which many tasks are interrelated. When we are faced with a new problem, we tend to find solutions that have worked in the past for similar problems. However, in machine learning most of the time the models are trained from scratch for every new problem. For many tasks, training a model from scratch is very time consuming and requires a lot of effort, especially during the data collection and labeling phase.

The idea of transfer learning dates back to 1991 (Pratt et al. 1991) but with the advent of deep learning and in particular, with convolutional neural networks (see chapter 8), it has gained popularity because it has proven to be a valuable tool when solving challenging problems. In 2014 a CNN architecture called VGG16 was proposed by Simonyan and Zisserman (2014) and won the ILSVR image recognition competition. This CNN was trained with more than \(1\) million images to recognize \(1000\) categories and it consists of several convolution layers, max pooling operations, and fully connected layers. In total, the network has \(\approx 138\) million parameters and it took some weeks to train.

What if you wanted to add a new category to the \(1000\) labels? Or maybe, you only want to focus on a subset of the categories? With transfer learning you can take advantage of a network that has already been trained and adapt it to your particular problem. In the case of deep learning, the approach consists of ‘freezing’ the first layers of a network and only retrain and/or modify the last layers to adapt them for the particular problem. During training, the frozen layers’ parameters will not change and the unfrozen ones are updated as usual during the gradient descent procedure. As discussed in chapter 8, the first layers can act as feature extractors and can be reused. With this approach, you can easily retrain a VGG16 network in an average computer and within a reasonable time. In fact, Keras already provides interfaces to common pre-trained architectures that you can reuse.

In the following section we will use this idea to build a user-adaptive model for activity recognition using transfer learning.

9.4.2 A User-Adaptive Model for Activity Recognition


For this example, we will use the SMARTPHONE ACTIVITIES dataset encoded as images . In chapter 7 (section: Images) I showed you how timeseries data can be represented as an image with an example of how an activity can be represented as an RBG color image where each channel corresponds to one of the acceleration axes (x, y, z). We will use the file images.txt that already contains the activities in image format. The procedure of converting the raw data into this format was explained in chapter 7 and the corresponding code is in script timeseries_to_images.R. Since the input data are images, we will use a convolutional neural network (see chapter 8).

The main objective will be to build an adaptive model with a small amount of training data from the target user. What we will do first is build a user-independent model. That is, we will select one of the users as the target user. We will train a user-independent model with data from the remaining users (excluding the target user). Then, we will apply transfer learning on this model to adapt it to the target user.

The target user’s data will be split into a test set and an adaptive set. The test set will be used to evaluate the performance of the model and the adaptive set will be used to fine-tune the model. The adaptive set is used to simulate that we have obtained new data from the target user.

The complete code is in script keras/adaptive_cnn.R. First, we start by reading the images file. Each row corresponds to one activity. The last two columns are the userid and the class. The first \(300\) columns correspond to the image pixels. Each image has a size of \(10 \times 10 \times 3\) (height, width, depth).

# Path to smartphone activities in image format.
filepath <- file.path(datasets_path,

# Read data.
df <- read.csv(filepath, stringsAsFactors = F)

# Shuffle rows.
df <- df[sample(nrow(df)),]

The rows happen to be ordered by user and activity, so we shuffle them to ensure that the model is not biased towards the last users and activities.

Since we will train a CNN using keras, we need the classes to be in integer format. The following code is used to append a new column intlabel to the database. This new column contains the classes as integers. We also create a variable mapping to keep track of the mapping between integers and the actual labels. By printing the mapping variable we can see that for example, the ‘Walking’ label has a corresponding integer value of \(0\), ‘Downstairs’ \(1\), and so on.

## Convert labels to integers starting at 0. ##

# Get the unique labels.
labels <- unique(df$label)

mapping <- 0:(length(labels) - 1)

names(mapping) <- labels

#>    Walking Downstairs    Jogging   Standing   Upstairs    Sitting 
#>         0          1          2          3          4          5 

# Append labels as integers at the end of data frame.
df$intlabel <- mapping[df$label]

Now we store the unique users’ ids in the users variable. After printing it, you can notice that there are \(19\) distinct users in this database. The original database has more users but we only kept those that performed all the activities. Then, we can select one of the users to act as the target user. I will just select one of them at random (turned out to be user \(24\)). Feel free to select another user if you want.

# Get the unique user ids.
users <- unique(df$userid)

# Print all user ids.
#> [1] 29 20 18  8 32 27  3 36 34  5  7 12  6 21 24 31 13 33 19

# Choose one user at random to be the target user.
targetUser <- sample(users, 1)

Next, we split the data into two sets. The first set trainset contains the data from all users but excluding the target user. Then we create two variables: train.y and train.x. The first one has the labels as integers and the second one has the actual image pixels (features).

# Split into train and target user sets.
# The train set includes data from all users excluding targetUser.
trainset <- df[df$userid != targetUser,]

# Save train labels in a separate variable.
train.y <- trainset$intlabel

# Save train pixels in a separate variable.
train.x <- as.matrix(trainset[,-c(301,302,303)])

Now we split the target’s user data into \(50\%\) test data and \(50\%\) adaptive data such that we end up with \(4\) variables:

  1. target.adaptive.y Integer labels for the adaptive data of the target user.
  2. target.adaptive.x Pixels of the adaptive data of the target user.
  3. target.test.y Integer labels for the test data of the target user.
  4. target.test.x Pixels of the test data of the target user.
# This contains all data from the target user. <- df[df$userid == targetUser,]

# Split the target user's data into 50% adaptive
# and 50% test data.
idxs <- sample(nrow(,
               size = nrow( * 0.5,
               replace = FALSE)

# Target user adaptive set.
# This data is used to adapt the model to the target user.
target.adaptive <-[idxs,]

# Save target.adaptive labels in a separate variable.
target.adaptive.y <- target.adaptive$intlabel

# Save target.adaptive pixels in a separate variable.
target.adaptive.x <- as.matrix(target.adaptive[,-c(301,302,303)])

# Target user test set.
# This data is used to test the performance.
target.test <-[-idxs,]

# Save target.test labels in a separate variable.
target.test.y <- target.test$intlabel

# Save target.test pixels in a separate variable.
target.test.x <- as.matrix(target.test[,-c(301,302,303)])

We also need to normalize the data and reshape it into the actual image format since in their current form, the pixels are stored into \(1\)-dimensional arrays. We learn the normalization parameters only from the train set and then we used the normalize.reshape() function (defined in the same script file) to perform the actual normalization and formatting.

# Learn min and max values from train set for normalization.
maxv <- max(train.x)
minv <- min(train.x)

# Normalize and reshape. May take some minutes.
train.x <- normalize.reshape(train.x, minv, maxv)
target.adaptive.x <- normalize.reshape(target.adaptive.x, minv, maxv)
target.test.x <- normalize.reshape(target.test.x, minv, maxv)

Let’s inspect how the structure of the final datasets looks like.

#> [1] 6399   10   10    3
#> [1] 124  10  10   3
#> [1] 124  10  10   3

Here, we see that the train set has \(6399\) instances (images). The adaptive and test sets both have \(124\) instances.

Now that we are finally done with all the preprocessing, it is time to build the CNN model! This one will be the initial user-independent model and is trained with all the training data train.x, train.y.

model <- keras_model_sequential()

model %>%
  layer_conv_2d(name = "conv1",
                filters = 8,
                kernel_size = c(2,2),
                activation = 'relu',
                input_shape = c(10,10,3)) %>%
  layer_conv_2d(name = "conv2",
                filters = 16,
                kernel_size = c(2,2),
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(name = "hidden1", units = 32,
              activation = 'relu') %>%
  layer_dropout(0.25) %>%
  layer_dense(units = 6, activation = 'softmax')

This CNN has two convolutional layers followed by a max pooling operation, a fully connected layer, and an output layer. One important thing to note is that we have specified a name for each layer with the name parameter. For example, the first convolution’s name is conv1, the second one is conv2, and the fully connected layer was named hidden1. These names must be unique because they will allow us to later select specific layers fo freeze and unfreeze.

If we print the model’s summary we can see that in total it has \(9,054\) trainable parameters and \(0\) non-trainable parameters. This means that all the parameters of the network will be updated during the gradient descent procedure, as usual.

# Print summary.
Summary of initial user-independent model.

Figure 9.5: Summary of initial user-independent model.

The next code will compile the model and initiate the training phase.

# Compile model.
model %>% compile(
  loss = 'sparse_categorical_crossentropy',
  optimizer = optimizer_sgd(lr = 0.01),
  metrics = c("accuracy")

# Fit the user-independent model.
history <- model %>% fit(
  train.x, train.y,
  epochs = 50,
  batch_size = 8,
  validation_split = 0.15,
  verbose = 1,
  view_metrics = TRUE

Note that this time the loss was defined as loss = 'sparse_categorical_crossentropy' instead of the usual loss = 'categorical_crossentropy'. Here, the sparse_ suffix was added. You may have noted that in this example we did not one-hot encode the labels but they were only transformed into integers. By adding the sparse_ suffix we are telling Keras that our labels are not one-hot-encoded but encoded as integers starting at \(0\) so it will perform the one-hot encoding for us. This is a little trick that saved us some time.

We save the model so we can load it later. Let’s also estimate the model’s performance on the target user test set.

# Save model.
save_model_hdf5(model, "user-independent.hdf5")

# Compute performance (accuracy) on the target user test set.
model %>% evaluate(target.test.x, target.test.y)
#>      loss  accuracy 
#> 2.1888916 0.6129032

The overall accuracy of this user-independent model when tested on the target user was \(61.2\%\) (quite low). Now, we can apply transfer learning and see if we can do better. We will ‘freeze’ the first convolution layer and only update the second convolution layer and the remaining fully connected layers using the target user-adaptive data. The following code loads the previously trained user-independent model. Then all the CNN’s weights are frozen using the freeze_weights() function. The from parameter specifies the first layer (inclusive) from which the parameters are to be frozen. Here, we set it to \(1\) so all parameters in the network are ‘frozen’. Then, we can use the unfreeze_weights() function to specify from which layer (inclusive) the parameters should be unfrozen. In this case, we want to retrain from the second convolutional layer so we set it to conv2 which is how we named this layer earlier.

adaptive.model <- load_model_hdf5("user-independent.hdf5")

# Freeze all layers.
freeze_weights(adaptive.model, from = 1)

# Unfreeze layers from conv2.
unfreeze_weights(adaptive.model, from = "conv2")

After defining those changes we need to compile the model so the modifications take effect.

# Compile model. We need to compile after freezing/unfreezing weights.
adaptive.model %>% compile(
  loss = 'sparse_categorical_crossentropy',
  optimizer = optimizer_sgd(lr = 0.01),
  metrics = c("accuracy")

Summary of user-independent model after freezing first convolutional layer.

Figure 9.6: Summary of user-independent model after freezing first convolutional layer.

After printing the summary, now you can note that the number of trainable and non-trainable parameters has changed. Now, the non-trainable parameters are \(104\) but before they were \(0\). These \(104\) parameters correspond to the first convolutional layer but this time they will not be updated during the gradient descent training phase.

The following code will retrain the model using the adaptive data but keeping the first convolutional layer fixed.

# Update model with adaptive data.
history <- adaptive.model %>% fit(
  target.adaptive.x, target.adaptive.y,
  epochs = 50,
  batch_size = 8,
  validation_split = 0,
  verbose = 1,
  view_metrics = TRUE
Note that this time the validation_split was set to \(0\). This is because the target user data set is very small so there is not enough data to use as validation set. One possible approach to overcome this is to leave a percentage of users out when building the train set for the user-independent model. Then, use those left-out users to find which are the most appropriate layers to keep frozen. Once you are happy with the results, evaluate the model with the target user.
# Compute performance (accuracy) on the target user test set.
adaptive.model %>% evaluate(target.test.x, target.test.y)
#>      loss  accuracy 
#> 0.4652017 0.8548387

If we evaluate the adaptive model performance on the target’s user test set, now we see that the accuracy is \(85.4\%\) which is a considerable increase! (\(\approx 24\%\) increase).

At this point, you may be wondering that this accuracy increase was due to the fact that the model was trained for an additional \(50\) epochs. To check this, we can re-train the initial user-independent model for \(50\) more epochs.

retrained_model <- load_model_hdf5("user-independent.hdf5")

# Fit the user-independent model for 50 more epochs.
history <- retrained_model %>% fit(
  train.x, train.y,
  epochs = 50,
  batch_size = 8,
  validation_split = 0.15,
  verbose = 1,
  view_metrics = TRUE

# Compute performance (accuracy) on the target user test set.
retrained_model %>% evaluate(target.test.x, target.test.y)
#>      loss  accuracy 
#> 2.7197671 0.6612903 

After re-training the user-independent model for \(50\) more epochs, its accuracy increased to \(66.1\%\). On the other hand, the adaptive model was trained with much less data and produced a much better result (\(85.4%\)) with only \(124\) instances as compared to the user-independent model \(\approx 5440\) instances. That is, the \(6399\) instances minus \(15\%\) used as the validation set. This also highlights one of the advantages of transfer learning which is a reduction in the needed amount of training data.

9.5 Summary

Many real-life scenarios involve multi-user settings. That is, the system heavily depends on the specific behavior of a given final user. This chapter covered different types of models that can be used to evaluate the performance of a system in such a scenario.

  • A multi-user setting is one in which its results depend heavily on the target user.
  • Inter/Intra -user variance are the differences between users and within the same users, respectively.
  • Mixed models are trained without considering unique users (user ids) information.
  • User-independent models are trained without including data from the target user.
  • User-dependent models are trained only with data from the target user.
  • User-adaptive models can be adapted to a particular target user as more data is available.
  • Transfer learning is a method that can be used to adapt a model to a particular user without requiring big quantities of data.


Bengtsson, Henrik. 2018. R.matlab: Read and Write Mat Files and Call Matlab from Within R.

Chen, Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. 2015. “UTD-Mhad: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor.” In 2015 Ieee International Conference on Image Processing (Icip), 168–72. IEEE.

Pratt, Lorien Y, Jack Mostow, Candace A Kamm, and Ace A Kamm. 1991. “Direct Transfer of Learned Information Among Neural Networks.” In Aaai, 91:584–89.

Simonyan, Karen, and Andrew Zisserman. 2014. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” arXiv Preprint arXiv:1409.1556.