# Reading notes on loss functions

## Proper loss

### Reference

- Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. https://pdfs.semanticscholar.org/d670/6b6e626c15680688b0774419662f2341caee.pdf
- Reid, M. D., & Williamson, R. C. (2009). Surrogate regret bounds for proper losses. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8. https://doi.org/10.1145/1553374.1553489

These papers helped me understand the properties of “proper losses.”

### TL;DR

- A proper loss is always associated with a convex function (the negative of its point-wise Bayes risk). The regret of the loss can be reconstructed from this function as a Bregman divergence.

- Proper losses are surrogates for the cost-sensitive zero-one loss (0-1-c loss), for any cost c.
- Some margin losses are not proper, e.g., the hinge loss.
- The hinge loss satisfies the surrogate property only for c = 0.5.

- A proper loss can be decomposed into a weighted combination of cost-sensitive zero-one losses, and this decomposition holds point-wise.
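The Bregman reconstruction above can be checked numerically. A minimal sketch for log loss, whose point-wise Bayes risk is the binary Shannon entropy: the regret of predicting p when the true class probability is eta equals the Bregman divergence of the negated Bayes risk (here it comes out to KL(eta || p)). The function names are my own, not from the papers.

```python
import math

def bayes_risk(eta):
    """Point-wise Bayes risk of log loss: the binary Shannon entropy."""
    return -eta * math.log(eta) - (1.0 - eta) * math.log(1.0 - eta)

def bayes_risk_grad(eta):
    """Derivative of the point-wise Bayes risk."""
    return math.log((1.0 - eta) / eta)

def regret_via_bregman(eta, p):
    """Bregman divergence of -bayes_risk between eta and p."""
    return bayes_risk(p) + (eta - p) * bayes_risk_grad(p) - bayes_risk(eta)

def regret_direct(eta, p):
    """Conditional regret: risk of predicting p minus risk of predicting eta."""
    def risk(q):
        return eta * (-math.log(q)) + (1.0 - eta) * (-math.log(1.0 - q))
    return risk(p) - risk(eta)

eta, p = 0.7, 0.4
assert abs(regret_via_bregman(eta, p) - regret_direct(eta, p)) < 1e-12
```

The same check works for squared loss by swapping in its Bayes risk, eta * (1 - eta).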

## Class conditional estimation and density ratio estimation

### Reference

- Menon, A. K., & Ong, C. S. (2016). Linking losses for density ratio and class-probability estimation. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).

### TL;DR

- Minimizing a proper loss for class-probability estimation corresponds to minimizing a Bregman divergence for density ratio estimation.
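The identity behind this link is that, with equal class priors, the class probability eta(x) = P(y = 1 | x) and the density ratio r(x) = p(x) / q(x) determine each other via r = eta / (1 - eta). A small numerical sketch with two hypothetical Gaussian densities (my own choice for illustration):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Numerator and denominator densities (hypothetical, for illustration only).
p = lambda x: gauss_pdf(x, 1.0, 1.0)
q = lambda x: gauss_pdf(x, -1.0, 1.0)

def class_prob(x):
    """With equal class priors, eta(x) = P(y = 1 | x) = p(x) / (p(x) + q(x))."""
    return p(x) / (p(x) + q(x))

for x in (-2.0, 0.0, 0.5, 3.0):
    eta = class_prob(x)
    # Density ratio recovered from the class probability: r(x) = eta / (1 - eta).
    assert abs(eta / (1.0 - eta) - p(x) / q(x)) < 1e-9
```

In practice this means a probabilistic classifier trained to separate samples from p (labeled 1) and q (labeled 0) yields a density ratio estimate via eta_hat / (1 - eta_hat).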