Reading notes on loss functions.

Proper loss

References

  • Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. https://pdfs.semanticscholar.org/d670/6b6e626c15680688b0774419662f2341caee.pdf
  • Reid, M. D., & Williamson, R. C. (2009). Surrogate regret bounds for proper losses. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8. https://doi.org/10.1145/1553374.1553489

These papers helped me understand the properties of “proper losses.”

TL;DR

  • A proper loss is always associated with a concave function: its point-wise (conditional) Bayes risk. Equivalently, the negative Bayes risk is convex.
  • The loss can be reconstructed from the point-wise Bayes risk, and its point-wise regret is the Bregman divergence generated by the negative Bayes risk.

  • Proper losses are surrogates for the cost-sensitive 0-1 loss (the “0-1-c” loss) for every cost c ∈ (0, 1).
  • Some margin losses are not proper, e.g., the hinge loss.
    • The hinge loss satisfies the surrogate property only for c = 0.5.
  • Point-wise, any proper loss can be written as a weighted combination of cost-sensitive 0-1 losses, with the weight at cost c determined by the curvature of the Bayes risk (see the formulas after this list).
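
To make the bullets above concrete, here is the standard representation of a proper loss in terms of its point-wise Bayes risk, in (roughly) Reid & Williamson’s notation: L̲ is the point-wise Bayes risk, ℓ_1 and ℓ_{-1} are the partial losses, and ℓ_c is the cost-sensitive 0-1 loss (its exact normalization varies between papers).

```latex
% Point-wise risk and point-wise Bayes risk (the latter is concave in \eta):
L(\eta, \hat{\eta}) = \eta\,\ell_1(\hat{\eta}) + (1-\eta)\,\ell_{-1}(\hat{\eta}),
\qquad
\underline{L}(\eta) = \inf_{\hat{\eta} \in [0,1]} L(\eta, \hat{\eta}).

% Savage representation of a (differentiable) proper loss:
\ell_1(\hat{\eta}) = \underline{L}(\hat{\eta}) + (1-\hat{\eta})\,\underline{L}'(\hat{\eta}),
\qquad
\ell_{-1}(\hat{\eta}) = \underline{L}(\hat{\eta}) - \hat{\eta}\,\underline{L}'(\hat{\eta}).

% Point-wise regret = Bregman divergence of the convex function -\underline{L}:
L(\eta, \hat{\eta}) - \underline{L}(\eta)
  = D_{-\underline{L}}(\eta, \hat{\eta})
  = \underline{L}(\hat{\eta}) - \underline{L}(\eta) + (\eta - \hat{\eta})\,\underline{L}'(\hat{\eta}).

% Integral representation as a weighted family of cost-sensitive 0-1 losses:
\ell_y(\hat{\eta}) = \int_0^1 \ell_{c,y}(\hat{\eta})\, w(c)\, dc,
\qquad
w(c) = -\underline{L}''(c).
```

Sanity check: for log loss, L̲(η) = -η ln η - (1-η) ln(1-η), so w(c) = 1/(c(1-c)); for the Brier (squared) loss, L̲(η) = η(1-η) and the weight is constant.
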

Class-probability estimation and density ratio estimation

Reference

  • Menon, A. K., & Ong, C. S. (2016). Linking losses for density ratio and class-probability estimation. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).

TL;DR

  • Minimizing a proper loss for class-probability estimation corresponds to minimizing a Bregman divergence for density ratio estimation; a link function maps the estimated class probability to the density ratio (sketched below).
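
A minimal sketch of one well-known instance of this link: train a probabilistic classifier with a proper composite loss (logistic loss here) to separate samples from p and q, then map the estimated class probability to a density ratio. The data, model, and variable names below are my own illustration, not the paper’s setup.

```python
# Density ratio p(x)/q(x) via class-probability estimation with the
# logistic loss (a proper composite loss).  Illustrative sketch only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Samples from the numerator density p and the denominator density q.
x_p = rng.normal(loc=1.0, scale=1.0, size=(2000, 1))   # p = N(1, 1)
x_q = rng.normal(loc=0.0, scale=1.0, size=(2000, 1))   # q = N(0, 1)

# Pool the samples and label them: y = 1 for draws from p, y = 0 for draws from q.
X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])

# Logistic regression minimizes the logistic loss, so predict_proba
# estimates the class probability eta(x) = P(y = 1 | x).
clf = LogisticRegression().fit(X, y)
eta = clf.predict_proba(X)[:, 1]

# Link from class probability to density ratio:
#   p(x)/q(x) = ((1 - pi) / pi) * eta(x) / (1 - eta(x)),  where pi = P(y = 1).
pi = y.mean()
ratio_hat = (1 - pi) / pi * eta / (1 - eta)

# For these two Gaussians the true log-ratio is x - 0.5, so we can eyeball the fit.
true_log_ratio = X[:, 0] - 0.5
print(np.corrcoef(np.log(ratio_hat), true_log_ratio)[0, 1])
```

The paper’s result generalizes this recipe: any Bregman-divergence objective for the density ratio corresponds to some proper composite loss for the classification problem, with the appropriate link function.
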