Proper loss

Reference

• Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. https://pdfs.semanticscholar.org/d670/6b6e626c15680688b0774419662f2341caee.pdf
• Reid, M. D., & Williamson, R. C. (2009). Surrogate regret bounds for proper losses. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8. https://doi.org/10.1145/1553374.1553489

These papers helped me understand the properties of “proper losses.”

TL;DR

• A proper loss is always associated with the point-wise Bayes risk, which is a concave function of the class probability (equivalently, its negation is convex).
• The loss can be reconstructed from the point-wise Bayes risk: the point-wise regret (conditional risk minus Bayes risk) equals the Bregman divergence generated by the negative Bayes risk.

• Proper losses are surrogates for the cost-weighted 0-1-c loss (for any cost c).
• Some margin losses are not proper, e.g., hinge loss.
• Hinge loss satisfies the surrogate property only for c = 0.5.
• Any proper loss can be expressed, point-wise, as a weighted integral of cost-weighted 0-1-c losses.
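The Bregman-divergence view in the bullets above can be checked numerically. A minimal sketch of my own (not code from the papers), using log loss: its point-wise Bayes risk is the binary entropy, and the regret of predicting `eta_hat` when the true class probability is `eta` equals the Bregman divergence generated by the negative entropy (which works out to binary KL divergence):

```python
import numpy as np

def log_loss_risk(eta, eta_hat):
    # Expected log loss when P(y=1|x) = eta and we predict eta_hat.
    return -eta * np.log(eta_hat) - (1 - eta) * np.log(1 - eta_hat)

def bayes_risk(eta):
    # Point-wise Bayes risk of log loss = binary entropy (concave).
    return log_loss_risk(eta, eta)

def bregman_neg_entropy(eta, eta_hat):
    # Bregman divergence of phi = negative binary entropy (convex).
    phi = lambda p: p * np.log(p) + (1 - p) * np.log(1 - p)
    dphi = lambda p: np.log(p) - np.log(1 - p)  # phi' is the logit
    return phi(eta) - phi(eta_hat) - dphi(eta_hat) * (eta - eta_hat)

eta, eta_hat = 0.7, 0.4
regret = log_loss_risk(eta, eta_hat) - bayes_risk(eta)
print(np.isclose(regret, bregman_neg_entropy(eta, eta_hat)))  # True
```

The same identity holds for other proper losses with their own generators, e.g., squared loss with phi(p) = p².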

Class conditional estimation and density ratio estimation

Reference

• Menon, A. K., & Ong, C. S. (2016). Linking losses for density ratio and class-probability estimation. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).

TL;DR

• Minimizing a proper loss for class-probability estimation corresponds to minimizing a Bregman divergence to the true density ratio.