Universality of invertible neural networks

Invertible neural networks (INNs) are neural network architectures that are invertible by design. Thanks to their invertibility and the tractability of their Jacobians, INNs have found various machine learning applications such as probabilistic modeling and feature extraction. However, these attractive properties come at the cost of restricted layer designs, which raises a question about their representation power: can we use these models to approximate a sufficiently diverse class of functions?
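For instance, in flow-based probabilistic modeling, the model density is evaluated via the change-of-variables formula, which relies on exactly these ingredients: a tractable Jacobian determinant for evaluating the density and the inverse map for sampling (standard notation below, not tied to any particular paper):

```latex
% Change of variables: an invertible model f maps data x to a latent
% variable z = f(x) with a simple base density p_Z; the model density is
p_X(x) = p_Z\bigl(f(x)\bigr)\,\bigl|\det Df(x)\bigr|
```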

Coupling-based invertible neural networks are universal diffeomorphism approximators

In this research, we developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem from differential geometry. More specifically, the framework allows us to show that INNs can universally approximate a large class of diffeomorphisms. We applied the framework to Coupling-Flow-based INNs (CF-INNs), one of the first-choice INN architectures, and elucidated their high representation power.
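To make the object of study concrete, here is a minimal NumPy sketch of an affine coupling layer, the basic building block of CF-INNs (the scale and shift networks `s_net` and `t_net` are toy placeholders, and actual CF-INN architectures may parametrize the layer differently). The layer is invertible in closed form, and its Jacobian is triangular, so the log-determinant reduces to a simple sum.

```python
import numpy as np

def s_net(x1):  # placeholder scale network
    return np.tanh(x1)

def t_net(x1):  # placeholder shift network
    return 0.5 * x1

def coupling_forward(x):
    # Split the input into two halves; transform only the second half,
    # conditioned on the first.
    x1, x2 = np.split(x, 2, axis=-1)
    y2 = x2 * np.exp(s_net(x1)) + t_net(x1)
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y):
    # The inverse is available in closed form, without any iterative solver.
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([y1, x2], axis=-1)

def log_det_jacobian(x):
    # The Jacobian is block-triangular, so its log-determinant is the sum
    # of the log scale factors.
    x1, _ = np.split(x, 2, axis=-1)
    return np.sum(s_net(x1), axis=-1)

x = np.random.randn(4, 6)
assert np.allclose(coupling_inverse(coupling_forward(x)), x)
```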

Universal approximation property of neural ordinary differential equations

We extended the framework to analyze Neural Ordinary Differential Equations (NODEs), another popular building block of INNs, and we showed their universal approximation property for a certain large class of diffeomorphisms.
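As a rough illustration (not the construction used in our analysis), a NODE defines an invertible map as the time-one flow of an ordinary differential equation, and its inverse is obtained by integrating the same vector field backward in time. In the sketch below, `vector_field` is a toy placeholder for a neural network, and the explicit Euler scheme stands in for a proper ODE solver, so inversion holds only up to discretization error.

```python
import numpy as np

def vector_field(z, t):
    # Toy time-dependent vector field standing in for a neural network.
    return np.tanh(z) * (1.0 - t)

def flow(z, t0, t1, steps=1000):
    # Integrate dz/dt = vector_field(z, t) from t0 to t1 with explicit Euler.
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        z = z + dt * vector_field(z, t)
        t += dt
    return z

z0 = np.random.randn(5, 3)
z1 = flow(z0, 0.0, 1.0)             # forward map: time-one flow
z0_rec = flow(z1, 1.0, 0.0)         # inverse map: integrate backward in time
print(np.max(np.abs(z0 - z0_rec)))  # small, up to discretization error
```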

Causality for Machine Learning

Humans care about the causality underlying the world. But why? Is it useful, and if so, how? How can causal knowledge benefit statistical learning and inference tasks that are conventionally considered "non-causal"?

Few-shot domain adaptation by causal mechanism transfer

This work approaches the central question of domain adaptation (DA): "when is it possible?" DA is an approach to enable learning from small data by using additional data sets that are related to, but different from, the data of interest. The central question in DA is "what do we assume about the relations among the data sets?" (the question of the transfer assumption). To provide one possible answer, we focus on the causal mechanisms behind the data and consider the scenario in which the different data sets share a common causal mechanism. Technically, we consider so-called structural causal models and assume that the set of structural equations is invariant across domains. In this setting, we propose causal mechanism transfer, a method that exploits this assumption to perform DA. We conducted solid theoretical analyses of the method and performed a proof-of-concept experiment to confirm its validity.
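The toy sketch below illustrates the transfer assumption itself (not the proposed estimator): two domains are generated by the same structural equations, while the distributions of the exogenous variables differ across domains. All variable names and functional forms are illustrative.

```python
import numpy as np

def structural_equations(e):
    # Shared causal mechanism: maps exogenous variables e = (e1, e2, e3)
    # to observed variables (x1, x2, y).
    e1, e2, e3 = e[:, 0], e[:, 1], e[:, 2]
    x1 = e1
    x2 = 0.8 * x1 + e2
    y = np.sin(x1) + 0.5 * x2 + 0.1 * e3
    return np.stack([x1, x2, y], axis=1)

rng = np.random.default_rng(0)

# Source domain: plenty of data, exogenous variables from one distribution.
e_src = rng.normal(0.0, 1.0, size=(1000, 3))
data_src = structural_equations(e_src)

# Target domain: only a handful of examples, a different exogenous
# distribution, but the *same* structural equations.
e_tgt = rng.laplace(0.5, 0.7, size=(10, 3))
data_tgt = structural_equations(e_tgt)
```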

Incorporating causal graphical prior knowledge into predictive modeling via simple data augmentation

In real-world tasks, there are situations where domain experts can provide a rough estimate of the causal graph underlying the data variables. A causal graph encodes conditional independence relations that should hold in the data, and these relations can serve as strong prior knowledge for predictive machine learning tasks when data is scarce. But how should we incorporate this knowledge into predictive modeling? This work addresses the question by proposing a model-agnostic data augmentation method that can be combined with virtually any supervised learning method.
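As a simplified sketch of how a conditional independence relation can drive data augmentation (not necessarily the exact procedure proposed in the paper): if the causal graph implies A ⫫ B | C with a discrete conditioning variable C, then within each stratum of C the joint distribution factorizes, so permuting the values of A among samples that share the same value of C yields new pseudo-samples consistent with the implied distribution. The variable names and the discrete-C setting below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def augment(df, a_col, c_col, rng):
    # For each stratum of the conditioning variable, permute the values of
    # the conditionally independent column to create pseudo-samples.
    augmented = []
    for _, group in df.groupby(c_col):
        permuted = group.copy()
        permuted[a_col] = rng.permutation(group[a_col].to_numpy())
        augmented.append(permuted)
    return pd.concat([df] + augmented, ignore_index=True)

rng = np.random.default_rng(0)
c = rng.integers(0, 3, size=30)
df = pd.DataFrame({
    "C": c,                                   # conditioning variable
    "A": c + rng.normal(scale=0.1, size=30),  # A and B depend on C,
    "B": -c + rng.normal(scale=0.1, size=30), # but are independent given C
})
df_aug = augment(df, "A", "C", rng)           # doubles the data set
print(len(df), "->", len(df_aug))
```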

Research on Restoring Clipped Low-rank Matrices

Clipping (a.k.a. censoring) is a salient type of measurement deficit that appears in various scientific areas such as biology and psychology, and it is known to obstruct the statistical analyses of the data. What can we do to mitigate this problem, especially after we have already made the measurements?

Clipped Matrix Completion: A Remedy for Ceiling Effects

We consider recovering a low-rank matrix from its clipped observations (clipped matrix completion; CMC). Matrix completion (MC) methods are known to provably recover a low-rank matrix from various information deficits such as random missingness. However, the existing theoretical guarantees for low-rank MC do not apply to clipped matrices because the deficit depends on the underlying values. Therefore, the feasibility of CMC is not trivial. In this paper, we first provide a theoretical guarantee for the exact recovery of CMC by using a trace-norm minimization algorithm, and we further propose practical CMC algorithms by extending ordinary MC methods. We also propose a novel regularization term tailored for CMC. We demonstrate the effectiveness of the proposed methods through experiments using both synthetic and benchmark data for recommendation systems.
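As a rough sketch of the "extend ordinary MC methods" direction (an illustrative estimator under simplifying assumptions, not necessarily the exact algorithm or the proposed regularization term): fit a low-rank factorization with a squared loss on unclipped observed entries and a one-sided penalty that only pushes predictions at clipped entries up toward the known clipping threshold.

```python
import numpy as np

def fit_cmc(O, observed, clipped, ceil, rank=5, lr=0.01, n_iters=2000, lam=0.1):
    # O: observed (clipped) matrix; observed/clipped: boolean masks;
    # ceil: known clipping threshold. Returns a low-rank estimate
    # obtained by gradient descent on the factors U, V.
    n, m = O.shape
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    unclipped = observed & ~clipped
    for _ in range(n_iters):
        M = U @ V.T
        R = np.zeros_like(M)
        # Squared loss on unclipped entries; one-sided squared penalty on
        # clipped entries that are predicted below the threshold.
        R[unclipped] = M[unclipped] - O[unclipped]
        R[clipped] = -np.maximum(0.0, ceil - M[clipped])
        U, V = U - lr * (R @ V + lam * U), V - lr * (R.T @ U + lam * V)
    return U @ V.T

# Toy usage: a rank-2 matrix observed with clipping at 1.0 on half the entries.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
ceil = 1.0
O = np.minimum(A, ceil)
observed = rng.random(A.shape) < 0.5
clipped = observed & (O >= ceil)
M_hat = fit_cmc(O, observed, clipped, ceil, rank=2)
```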