- Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. https://pdfs.semanticscholar.org/d670/6b6e626c15680688b0774419662f2341caee.pdf
- Reid, M. D., & Williamson, R. C. (2009). Surrogate regret bounds for proper losses. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8. https://doi.org/10.1145/1553374.1553489
This paper helped me understand the properties of “proper losses.”
- A proper loss is always associated with a concave function, the point-wise Bayes risk; its negative is convex.
The regret of the loss can be reconstructed from that convex function as a Bregman divergence.
- Proper losses are surrogates for the cost-sensitive zero-one loss (the 0-1-c loss), for any cost c.
- Some margin losses are not proper, e.g., hinge loss.
- Hinge loss satisfies a surrogate property only for c = 0.5 (the symmetric-cost case).
- Any proper loss admits an integral representation as a weighted mixture of cost-sensitive zero-one losses, with the weight function determined point-wise by the Bayes risk.
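The Bregman-divergence representation above can be sanity-checked numerically for log loss, whose point-wise Bayes risk is the (concave) binary entropy. This is a minimal sketch, not code from the papers; all function names are mine:

```python
import math

def log_loss(y, p):
    """Per-example log loss for label y in {0, 1} and prediction p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def expected_loss(eta, p):
    """Expected log loss when P(y = 1) = eta and we predict p."""
    return eta * log_loss(1, p) + (1 - eta) * log_loss(0, p)

def bayes_risk(eta):
    """Point-wise Bayes risk: expected loss of the optimal prediction p = eta.
    For log loss this is the binary entropy, which is concave."""
    return expected_loss(eta, eta)

def neg_bayes_risk(p):
    """The convex function associated with log loss (negative entropy)."""
    return -bayes_risk(p)

def d_neg_bayes_risk(p):
    """Derivative of the negative entropy: the logit."""
    return math.log(p / (1 - p))

def bregman(phi, dphi, a, b):
    """Bregman divergence D_phi(a, b) = phi(a) - phi(b) - phi'(b) (a - b)."""
    return phi(a) - phi(b) - dphi(b) * (a - b)

eta, p = 0.7, 0.4
regret = expected_loss(eta, p) - bayes_risk(eta)
div = bregman(neg_bayes_risk, d_neg_bayes_risk, eta, p)
print(regret, div)  # the two quantities coincide (here both equal KL(eta || p))
```

The regret of predicting p when the true class probability is eta equals the Bregman divergence, generated by the negative Bayes risk, between eta and p.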
Class-probability estimation and density ratio estimation
- Menon, A. K., & Ong, C. S. (2016). Linking losses for density ratio and class-probability estimation. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).
- Minimizing a proper loss for class-probability estimation corresponds to minimizing a Bregman divergence to the density ratio.
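The link rests on a simple identity: for a balanced mixture of densities p (class 1) and q (class 0), the class probability eta(x) determines the density ratio via r(x) = eta(x) / (1 - eta(x)). A minimal sketch with two hypothetical Gaussian densities (my own illustration, not from the paper):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities, equal class priors.
def p(x):          # density of class 1
    return gauss_pdf(x, 1.0, 1.0)

def q(x):          # density of class 0
    return gauss_pdf(x, -1.0, 1.0)

def eta(x):
    """Class probability P(y = 1 | x) for the balanced mixture."""
    return p(x) / (p(x) + q(x))

def ratio_from_eta(x):
    """Density ratio p(x)/q(x) recovered from the class probability."""
    e = eta(x)
    return e / (1 - e)

x = 0.3
print(ratio_from_eta(x), p(x) / q(x))  # identical by construction
```

So a class-probability estimate obtained by minimizing a proper loss immediately yields a density-ratio estimate, which is why the two estimation problems share the same Bregman structure.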