When training a model you need to keep track of its performance. This tells you when the model is finished training (preventing overfitting), whether training is proceeding as expected, and otherwise lets you gauge the effectiveness of your model, training set, and hyperparameters.
The basic training loop is simply:

1. initialise the model
2. calculate the loss for a batch of training data
3. use the loss to calculate gradients
4. pass these on to the optimiser
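Here’s a minimal sketch of that loop in PyTorch; the linear model and random data are stand-ins for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# init model (a stand-in linear classifier and random data for illustration)
model = nn.Linear(10, 2)
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8
)

for inputs, targets in loader:
    optimiser.zero_grad()
    loss = criterion(model(inputs), targets)  # calculate loss for the batch
    loss.backward()                           # use loss to calculate gradients
    optimiser.step()                          # the optimiser updates the weights
```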
And that’s all well and good; but how can you decide how good the model is at the end of the day? Well, you could just check the loss, and that’s not a bad idea. Let’s update this training scheme and track the loss over batches using TensorBoard.
Tensorboard
TensorBoard is a great library for storing and visualising machine learning model metadata and training information.
The library keeps track of metrics by writing them out using helper objects. Here, we import and instantiate the SummaryWriter object:
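In PyTorch, SummaryWriter lives in torch.utils.tensorboard, and by default it logs to a timestamped subdirectory of runs/:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # logs to ./runs/<timestamp> by default
```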
This object will let you store metrics during the training process.
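For example, to track the loss over batches inside the loop above (assuming a loss tensor and a global_step counter exist):

```python
# record the batch loss against the global step
writer.add_scalar("loss/train", loss.item(), global_step)
```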
Then we can view the logs (if you’re using a Jupyter notebook, and gosh-darn it you should be) with:
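```
%load_ext tensorboard
%tensorboard --logdir runs
```

(This assumes the default runs/ logging directory.)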
Which looks something like:
In that screenshot I’ve included the learning rate, which shows the versatility and utility of TensorBoard; you can put anything on it! Looking at the bottom left of the screenshot, there is a garble of text; this is a way of discriminating between runs. You set it by changing the target logging directory in the SummaryWriter constructor:
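For instance (a sketch):

```python
writer = SummaryWriter(log_dir=f"runs/{hyper_params_str}")
```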
where hyper_params_str is some text describing the run.
Metrics
Ah! I haven’t talked properly about any metrics yet. D’oh!
A metric is a measurement over a pair of predictions and targets which describes how close the two are. The one that comes to mind is ‘accuracy’, but there are others:
- precision
- recall
- F-score
- intersection over union (IoU)
Quickly putting some maths to this, let’s start with the basics: true and false positives and negatives. True positives (TP) are predictions made by the model which are positive and correct. That means, if the model is designed to detect dogs, it is “positive” if it says “yes that is dog” and correct if it indeed was shown a dog. True negatives (TN) are negative and correct (shown a cat, it responds “that is not dog”). False positives (FP) and false negatives (FN) are where the model has gone wrong. Wikipedia has a nice image (a confusion matrix) laying this out.
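In these terms, the standard definitions of the metrics listed above are:

```latex
\begin{aligned}
\mathrm{accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{precision} &= \frac{TP}{TP + FP} \\
\mathrm{recall}    &= \frac{TP}{TP + FN} \\
F_1 &= 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \\
\mathrm{IoU} &= \frac{TP}{TP + FP + FN}
\end{aligned}
```

(F1 is the most common F-score; IoU is also known as the Jaccard index.)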
We don’t need to implement these in PyTorch ourselves (although it wouldn’t be difficult to do so). We can use the torchmetrics library, which contains these and a whole bunch more (saving a heap of time):
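A sketch of the set-up, assuming a binary classification task (recent torchmetrics versions require the task argument):

```python
import torchmetrics

# one metric object per name; all are called the same way on (preds, targets)
metrics = {
    "accuracy": torchmetrics.Accuracy(task="binary"),
    "precision": torchmetrics.Precision(task="binary"),
    "recall": torchmetrics.Recall(task="binary"),
    "f1": torchmetrics.F1Score(task="binary"),
    "iou": torchmetrics.JaccardIndex(task="binary"),
}
```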
This initialises a dictionary with a couple of metric-calculating functions, each called in a similar way:
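For example (preds and targets being tensors for the current batch):

```python
batch_accuracy = metrics["accuracy"](preds, targets)
```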
With this set up, we can add a section to our training loop to calculate the metrics and write them out to TensorBoard.
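A sketch of that section, assuming preds, targets, the metrics dictionary, the writer, and a global_step counter from earlier:

```python
# compute each metric for the batch and log it under its own tag
for name, metric in metrics.items():
    writer.add_scalar(f"metrics/{name}", metric(preds, targets).item(), global_step)
```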
Final
Putting all this together yields a training script which records the hyperparameters used, several metrics, and both the training and validation loss (cross-entropy in this case):
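Here’s a minimal end-to-end sketch along those lines. The model, data, and hyperparameter values are stand-ins so the script runs on its own, and the run-naming scheme is just one way to do it:

```python
import torch
import torch.nn as nn
import torchmetrics
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.tensorboard import SummaryWriter

# hypothetical hyperparameters, encoded into the run name for tensorboard
hparams = {"lr": 1e-3, "batch_size": 16, "epochs": 5}
hyper_params_str = "_".join(f"{k}={v}" for k, v in hparams.items())
writer = SummaryWriter(log_dir=f"runs/{hyper_params_str}")

# stand-in model and random data so the script is self-contained
model = nn.Linear(10, 2)
train_loader = DataLoader(
    TensorDataset(torch.randn(128, 10), torch.randint(0, 2, (128,))),
    batch_size=hparams["batch_size"],
)
val_loader = DataLoader(
    TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,))),
    batch_size=hparams["batch_size"],
)

optimiser = torch.optim.Adam(model.parameters(), lr=hparams["lr"])
criterion = nn.CrossEntropyLoss()
metrics = {
    "accuracy": torchmetrics.Accuracy(task="multiclass", num_classes=2),
    "f1": torchmetrics.F1Score(task="multiclass", num_classes=2),
}

global_step = 0
for epoch in range(hparams["epochs"]):
    model.train()
    for inputs, targets in train_loader:
        optimiser.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, targets)
        loss.backward()
        optimiser.step()

        # log the training loss and each metric for this batch
        writer.add_scalar("loss/train", loss.item(), global_step)
        preds = logits.argmax(dim=1)
        for name, metric in metrics.items():
            writer.add_scalar(
                f"metrics/{name}", metric(preds, targets).item(), global_step
            )
        global_step += 1

    # validation loss once per epoch
    model.eval()
    with torch.no_grad():
        val_loss = sum(
            criterion(model(x), y).item() for x, y in val_loader
        ) / len(val_loader)
    writer.add_scalar("loss/validation", val_loss, epoch)

# record the hyperparameters alongside a final summary metric
writer.add_hparams(hparams, {"final/val_loss": val_loss})
writer.close()
```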