Let me ask you a few simple questions. Answer them.
Are the consequences of False Negatives highly severe?
If the answer is yes, choose Recall.
If not, ask again:
Are the consequences of False Positives highly severe?
If the answer to this question is yes, go with Precision.
OK, let's dig a little deeper with some examples.
Predicting whether or not a patient has a tumor.
In this example,
A False Negative occurs when the patient has a tumor BUT the model didn't detect it. This case has very severe consequences: since the tumor was not detected, the doctor will not carry out the necessary treatment. Fatal.
But False Positives are not really our concern. Why? Say the model detected a tumor that doesn't actually exist. The patient will still be safe, as further diagnosis will likely reveal that the prediction was incorrect and that no more treatment is required. No problem.
That's why we go for Recall.
Recall / True Positive Rate / Sensitivity / Probability of Detection
What fraction of all positive instances does the classifier correctly identify as positive?
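As a minimal sketch of that definition, recall is the number of true positives divided by all actual positives (true positives plus false negatives). The function name and the tumor counts below are made up purely for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives that the classifier found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # Convention chosen for this sketch: with no positives, report 0.
        return 0.0
    return true_positives / actual_positives


# Hypothetical screening: 100 patients have a tumor, the model flags 90 of them.
print(recall(true_positives=90, false_negatives=10))  # 0.9
```

Every missed tumor is a false negative, so it lowers this number directly.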
Now take another example: predicting whether a candidate is a good fit for a job position. A False Positive occurs when the model classifies a not-so-good candidate as absolutely perfect for the position. This has serious consequences for the recruiter, as it increases the time the recruitment process takes to complete.
Candidates who are good enough for the position but are classified as incapable, i.e. False Negatives, might not make it to the interview. But from the recruiter's point of view, this saves time.
Here we would go with Precision.
Precision answers: what fraction of positive predictions are correct?
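A minimal sketch of that definition: precision is the number of true positives divided by everything the model predicted as positive (true positives plus false positives). The function name and the shortlist counts are hypothetical.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of positive predictions that were actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Convention chosen for this sketch: with no positive predictions, report 0.
        return 0.0
    return true_positives / predicted_positives


# Hypothetical shortlist: the model recommends 20 candidates, 15 are truly a good fit.
print(precision(true_positives=15, false_positives=5))  # 0.75
```

Every poor candidate who reaches the interview is a false positive, so it lowers this number directly.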
In the last example you might think it is wrong to discard perfectly good candidates. Yeah, you are right. But from the recruiter's viewpoint time is precious, and it doesn't justify interviewing a poor candidate.
Well, that's where we arrive at the Precision/Recall trade-off. Usually we prefer precision when the result is shown to end users or used in customer-facing applications (as they remember failures).
As seen previously, we actually need both Precision and Recall to be high. We don't want to waste time interviewing an incapable candidate, but at the same time we don't want to let go of a perfectly acceptable, or maybe even perfect, candidate.
Unfortunately, you can't have both: for a given classifier, raising the decision threshold generally increases precision and reduces recall, and vice versa. This is called the Precision/Recall trade-off.
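To see the trade-off in numbers, here is a small pure-Python sketch that recomputes precision and recall at a few thresholds. The labels, scores, and the helper name `precision_recall_at` are all made up for illustration; a real classifier's scores would come from something like `decision_function`.

```python
def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    pairs = list(zip(y_true, y_scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Conventions for this sketch: precision is 1.0 when nothing is predicted
    # positive, recall is 0.0 when there are no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up ground truth (1 = positive) and classifier scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.2, 0.5, 0.85):
    p, r = precision_recall_at(y_true, y_scores, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up numbers, precision climbs from about 0.57 to 1.00 as the threshold rises, while recall drops from 1.00 to 0.25.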
Plotting Precision and Recall against the decision threshold gives you a way to select a threshold with a good precision/recall trade-off.
In case you were wondering how I plotted this, here is the code:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve
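# First, get out-of-fold decision scores for your classifier.
# `classifier`, `X_train`, and `y_train` are assumed to be defined already
# (a binary classifier that exposes decision_function, plus the training data).
y_scores = cross_val_predict(classifier, X_train, y_train, cv=3, method='decision_function')
# precisions and recalls each have one more element than thresholds.
precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)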
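The plotting step itself can be sketched as follows, assuming matplotlib is installed. The helper name `plot_precision_recall_vs_threshold` is hypothetical, and this is one reasonable way to draw the curves rather than necessarily the exact code behind the figure.

```python
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    """Plot precision and recall as functions of the decision threshold."""
    # Imported inside the function; this sketch assumes matplotlib is installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # precision_recall_curve returns one more precision/recall value than
    # thresholds, so the last element is dropped to align the sequences.
    ax.plot(thresholds, precisions[:-1], "b--", label="Precision")
    ax.plot(thresholds, recalls[:-1], "g-", label="Recall")
    ax.set_xlabel("Threshold")
    ax.set_ylim(0, 1)
    ax.legend(loc="center left")
    return ax


# Usage, with the arrays computed above:
# plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
# followed by matplotlib.pyplot.show() to display the figure.
```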
Now you can use the threshold you picked to make predictions: any instance whose score is at or above the threshold is predicted positive.
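Here is a minimal sketch of that step. Both helper names are hypothetical, and the example numbers are made up; the only assumption about the inputs is that they are laid out the way `precision_recall_curve` returns them (thresholds in increasing order, with `precisions[i]` belonging to `thresholds[i]`).

```python
def threshold_for_precision(precisions, thresholds, target_precision):
    """Return the lowest threshold whose precision reaches the target, or None."""
    # zip stops at the shorter sequence, which skips the extra final
    # precision value that has no matching threshold.
    for p, t in zip(precisions, thresholds):
        if p >= target_precision:
            return t
    return None


def predict_with_threshold(y_scores, threshold):
    """Predict positive (True) for every score at or above the threshold."""
    return [score >= threshold for score in y_scores]


# Made-up values shaped like precision_recall_curve output.
example_precisions = [0.57, 0.75, 1.0, 1.0]
example_thresholds = [0.3, 0.6, 0.9]

chosen = threshold_for_precision(example_precisions, example_thresholds, 0.7)
print(chosen)                                             # 0.6
print(predict_with_threshold([0.2, 0.6, 0.95], chosen))   # [False, True, True]
```

If `y_scores` is a NumPy array, as it is when it comes from `cross_val_predict`, the comparison `y_scores >= chosen` gives the same predictions in one step.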
Another way to select a good precision/recall trade-off is to plot precision directly against recall.
Again, this graph is generated from the same precisions and recalls arrays.
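A minimal sketch of that plot, again assuming matplotlib is installed. The helper name `plot_precision_vs_recall` is hypothetical, and this is not necessarily the exact code behind the figure.

```python
def plot_precision_vs_recall(precisions, recalls):
    """Plot precision (y-axis) directly against recall (x-axis)."""
    # Imported inside the function; this sketch assumes matplotlib is installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(recalls, precisions, "b-")
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    return ax


# Usage, with the arrays returned by precision_recall_curve:
# plot_precision_vs_recall(precisions, recalls)
# followed by matplotlib.pyplot.show() to display the figure.
```

A common way to read this curve is to look for the point where precision starts to fall sharply and pick a trade-off just before it.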
Once we have a threshold (returned from the function), predictions are made the same way as before: compare each score against the threshold.