Lecture 5 - Part *a*

Training and Evaluation

**Training** set - *train*

**Validation** set - *fine-tune*

**Test** set - *evaluate*

**Data items** - present in the *training* set and also in the *validation* or *evaluation* sets

**Features** - highly correlated to the prediction but not present in the production environment

- Cycle training and validation data several times
- Useful when the dataset is small

- Split the data into $n$ portions
- Train the model $n$ times, using $n-1$ portions for training and the remaining portion for validation
- Average the results
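The $n$-fold procedure above can be sketched in plain Python. This is illustrative only: `train_model` and `score` are hypothetical stand-ins for whatever model and metric a project actually uses.

```python
def k_fold_indices(n_items, n_folds):
    """Split item indices into n_folds roughly equal portions."""
    fold_size, remainder = divmod(n_items, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def cross_validate(data, labels, n_folds, train_model, score):
    """Train n_folds times on n_folds - 1 portions; validate on the held-out one."""
    folds = k_fold_indices(len(data), n_folds)
    results = []
    for i, held_out in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        model = train_model([data[j] for j in train_idx],
                            [labels[j] for j in train_idx])
        results.append(score(model,
                             [data[j] for j in held_out],
                             [labels[j] for j in held_out]))
    return sum(results) / len(results)   # average the per-fold results
```

Every data item is used for validation exactly once, which is why the averaged score is more stable than a single split when the dataset is small.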

**Metric** - How to measure errors?
- Both training and testing

**Training** - How to “help” the ML model to perform well?

**Validation** - How to pick the best ML model?

**Evaluation** - How to “help” the ML model to generalize?

- Errors are almost inevitable!

- How to measure errors?
- Select an evaluation procedure (a “metric”)

These are the most common questions:
- *How* is the prediction wrong?
- *How often* is the prediction wrong?
- What is the *cost* of wrong predictions?
- How does the *cost* vary by the wrong prediction type?
- How can *costs* be minimised?

$MAE = \frac{1}{N}\sum^N_{j=1}|p_j - v_j|$

Average of the absolute difference between the *expected* value ($v_j$) and the *predicted* value ($p_j$)

$MSE = \frac{1}{2N}\sum^N_{j=1}(p_j - v_j)^2$

Average of the square of the difference between the *expected* value ($v_j$) and the *predicted* value ($p_j$)

Square is easier to use during the training process (derivative)

More significant errors are more pronounced
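A quick numeric check of both formulas in plain Python; the $\frac{1}{2N}$ factor from the MSE slide is kept as-is.

```python
def mae(predicted, expected):
    """Mean absolute error: average of |p_j - v_j|."""
    n = len(predicted)
    return sum(abs(p - v) for p, v in zip(predicted, expected)) / n

def mse(predicted, expected):
    """Mean squared error with the lecture's 1/(2N) factor
    (the 1/2 cancels the 2 produced by differentiating the square)."""
    n = len(predicted)
    return sum((p - v) ** 2 for p, v in zip(predicted, expected)) / (2 * n)

# e.g. predicted [2, 4] vs expected [1, 2]:
#   MAE = (1 + 2) / 2 = 1.5
#   MSE = (1 + 4) / 4 = 1.25  -- the larger error dominates the sum
```

Note how the error of 2 contributes twice as much as the error of 1 to MAE, but four times as much to the squared sum: larger errors are more pronounced.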

Describes the complete performance of the model

$\frac{TP+TN}{TP+TN+FP+FN}$

The *percentage of times* that a model is correct

The model with the highest accuracy *is not necessarily the best*

Some errors (e.g., False Negative) can be more costly than others
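Counting the four confusion-matrix cells and applying the formula above, a minimal sketch for binary labels:

```python
def confusion_counts(predicted, actual, positive=1):
    """Tally TP, TN, FP, FN for a binary classifier."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    tn = sum(p != positive and a != positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN): fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. predicted [1, 1, 0, 0] vs actual [1, 0, 0, 1]
# gives one of each cell, so accuracy is 0.5 --
# it says nothing about WHICH errors were made.
```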

- Detecting the “Alexa” command?
- Pregnancy detection
- Cost of “false negatives”?
- Cost of “false positives”?

- Covid testing
- Cost of “false negatives”?
- Cost of “false positives”?

- Law enforcement?

$\frac{\frac{TP}{TP+FN}+\frac{TN}{FP+TN}}{2}$

Average of single class performance

Good to use when the distribution of data items in classes is imbalanced
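Why this helps with imbalance: a classifier that always predicts the majority (negative) class looks accurate but scores only 0.5 here. The counts are hypothetical, chosen for illustration.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Average of per-class performance: (TP/(TP+FN) + TN/(FP+TN)) / 2."""
    sensitivity = tp / (tp + fn)   # performance on the positive class
    specificity = tn / (fp + tn)   # performance on the negative class
    return (sensitivity + specificity) / 2

# 95 negatives, 5 positives; the model predicts "negative" every time:
tp, tn, fp, fn = 0, 95, 0, 5
plain_accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.95 -- looks great
balanced = balanced_accuracy(tp, tn, fp, fn)       # 0.5  -- reveals the problem
```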

$\frac{\frac{TP}{(TP+FN)\cdot w}+\frac{TN}{(FP+TN)\cdot (1-w)}}{2}$

**Weighted** average of single-class performance

Weight depends on the popularity of a class.

$\frac{TP}{TP + FP}$

Among the examples we classified as positive, how many did we correctly classify?

$\frac{TP}{TP + FN}$

Among the positive examples, how many did we correctly classify?

$F_1 = 2 * \frac{1}{\frac{1}{P}+\frac{1}{R}}$

The harmonic mean between *precision* and *recall*

What is the implicit assumption about the costs of errors?
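Precision, recall, and their harmonic mean in code (the counts are hypothetical):

```python
def precision(tp, fp):
    """Among examples classified as positive, the fraction actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Among the positive examples, the fraction we found."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean: F1 = 2 / (1/P + 1/R); stays low if either P or R is low."""
    return 2 / (1 / p + 1 / r)

# e.g. tp=8, fp=2, fn=4 gives P = 0.8, R = 2/3, and F1 = 8/11 (about 0.727)
```

Unlike the arithmetic mean, the harmonic mean punishes imbalance: a model with P = 1.0 and R = 0.1 gets F1 ≈ 0.18, not 0.55.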

$\frac{TP}{FN + TP}$

Identification of the positively labeled data items

*Same as recall*

$\frac{TN}{FP + TN}$

Identification of the negatively labeled data items

*Not the same as precision*
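With concrete counts (hypothetical numbers, chosen for illustration), the three quantities separate cleanly:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, tn, fp, fn = 8, 90, 10, 2

sensitivity = tp / (fn + tp)   # identical to recall: TP / (FN + TP)
specificity = tn / (fp + tn)   # about the negatives:  TN / (FP + TN)
precision = tp / (tp + fp)     # about predicted positives: TP / (TP + FP)

# sensitivity is 0.8, specificity is 0.9, but precision is only about 0.44:
# specificity and precision really are different quantities.
```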

**Recall** and **sensitivity** - How many were correctly diagnosed as sick among the sick people (positives)?

**Precision** - Among the people diagnosed as sick, how many were sick?

**Specificity** - Among the healthy people (negatives), how many were correctly diagnosed as healthy?

**Recall** and **sensitivity** - How many were correctly deleted among the spam emails (positives)?

**Precision** - Among the deleted emails, how many were spam?

**Specificity** - Among the good emails (negatives), how many were correctly sent to the inbox?

- Constraint: **high precision** - false positives are tolerable but should be minimised

- Among the available models, pick the one with the highest recall
- **Metric**: *Recall at Precision* = $x$%
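One way to implement this metric, sketched under the assumption that the model emits a score per example: sweep candidate decision thresholds, keep those that meet the precision constraint, and report the best recall among them. The scores and labels below are hypothetical.

```python
def recall_at_precision(scores, labels, min_precision):
    """Among thresholds whose precision >= min_precision,
    return the highest achievable recall (0.0 if none qualifies)."""
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and l for p, l in zip(predicted, labels))
        fp = sum(p and not l for p, l in zip(predicted, labels))
        fn = sum((not p) and l for p, l in zip(predicted, labels))
        if tp + fp == 0:
            continue  # no positive predictions: precision undefined
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / (tp + fn) if tp + fn else 0.0)
    return best

# e.g. scores [0.9, 0.8, 0.4, 0.3] with labels [True, True, False, True]:
# at precision = 100% the best reachable recall is 2/3
# (thresholding at 0.8 keeps both high-scoring positives, no false positives)
```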

- One team builds the model
- Data scientists / ML engineers

- Many teams will make use of it
- e.g., product team, management team
