Hyperbolic secant representation of the logistic function: Application to probabilistic multiple instance learning for CT intracranial hemorrhage detection

Weak supervision is a long-standing problem in artificial intelligence: only some of the labels in the training data are observable, while the rest are unknown. Multiple Instance Learning (MIL) is a paradigm that addresses this issue. In MIL, the training data is divided into “bags”, each containing multiple instances. Only the label of each bag is observed, not the labels of its individual instances. The goal of MIL is to predict the labels of new bags, and of the instances they contain, from bag-level labels alone.

The MIL paradigm has been widely applied in various scientific fields, particularly excelling in the field of medical imaging. This paper focuses on a practical medical problem – Intracranial Hemorrhage (ICH) detection. In this problem, a CT scan is considered as a bag, and each slice of the scan is an instance. If at least one slice shows evidence of bleeding, the entire scan is labeled as positive (diseased); otherwise, it is labeled as negative (normal). We can only observe the label of each scan but not the specific label of each slice. MIL can greatly reduce the workload of radiologists since they only need to label each scan once, rather than labeling each slice individually.
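The labeling rule described above (a scan is positive if at least one slice is positive) can be sketched in a few lines of Python. This is only an illustration of the standard MIL assumption; the function name `bag_label` and the toy scans are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return int(any(instance_labels))


# Hypothetical toy scans: each list holds per-slice labels (1 = bleeding visible).
# In a real MIL setting these slice labels are hidden; only bag_labels is observed.
scans: Dict[str, List[int]] = {
    "scan_a": [0, 0, 1, 0],
    "scan_b": [0, 0, 0],
}
bag_labels = {name: bag_label(slices) for name, slices in scans.items()}
```

During training only `bag_labels` would be available, which is what makes the instance-level prediction task weakly supervised.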

Probabilistic MIL methods have received widespread attention in recent years. Among them, Gaussian Process (GP)-based methods have shown superior performance because they can both express complex models and quantify uncertainty. One of the most successful GP-based MIL methods is VGPMIL, which resorts to a variational bound to handle the intractability of the logistic function. Recent research has found that this method suffers from performance degradation in practice.

In this paper, the authors propose a new, equivalent, and tractable form of the logistic observation model by augmenting it with Pólya-Gamma (PG) random variables. Based on this, they reformulate the VGPMIL model into the PG-VGPMIL model. Interestingly, the authors find that the variational inference update equations of PG-VGPMIL are exactly the same as those of the original VGPMIL. The root cause lies in two equivalent representations of the hyperbolic secant density: one is a super-Gaussian form, and the other is a Gaussian Scale Mixture (GSM) form. VGPMIL uses the former representation, while PG-VGPMIL uses the latter.
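The link between the logistic function and the hyperbolic secant can be checked numerically. The sketch below assumes that the variational bound used by VGPMIL is the standard Jaakkola-Jordan bound, with coefficient lambda(xi) = tanh(xi/2) / (4 xi), and it uses the known mean of a PG(1, c) variable, tanh(c/2) / (2c). All function names are illustrative and are not taken from the authors' code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sech_form(x: float) -> float:
    # Logistic function written via the hyperbolic secant:
    # sigma(x) = exp(x / 2) / (2 * cosh(x / 2)) = 0.5 * exp(x / 2) * sech(x / 2)
    return math.exp(x / 2.0) / (2.0 * math.cosh(x / 2.0))


def jaakkola_lambda(xi: float) -> float:
    # Coefficient of the quadratic term in the Jaakkola-Jordan bound (xi != 0).
    return math.tanh(xi / 2.0) / (4.0 * xi)


def jaakkola_bound(x: float, xi: float) -> float:
    # Lower bound on sigma(x), tight at x = +xi and x = -xi.
    quad = jaakkola_lambda(xi) * (x * x - xi * xi)
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - quad)


def pg_mean(c: float) -> float:
    # Mean of a Polya-Gamma PG(1, c) random variable (c != 0).
    return math.tanh(c / 2.0) / (2.0 * c)


xs = [i / 4.0 for i in range(-32, 33)]  # grid on [-8, 8]
xi = 1.7                                # arbitrary variational parameter

max_sech_err = max(abs(sigmoid(x) - sech_form(x)) for x in xs)
min_bound_gap = min(sigmoid(x) - jaakkola_bound(x, xi) for x in xs)
mean_vs_lambda = abs(pg_mean(xi) - 2.0 * jaakkola_lambda(xi))
```

Here `max_sech_err` and `mean_vs_lambda` should be zero up to floating-point error, and `min_bound_gap` should be non-negative. The second identity, E[omega] = 2 lambda(xi), gives some intuition for why the two formulations can lead to the same update equations.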

Further analysis reveals that VGPMIL/PG-VGPMIL is actually a special case of a more general framework, ψ-VGPMIL, which is obtained by replacing the hyperbolic secant density with an arbitrary differentiable GSM density ψ. Based on this, the authors propose using the Gamma density instead of the PG density, leading to a new model called G-VGPMIL.
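To make the notion of a Gaussian Scale Mixture concrete, the following sketch numerically checks a textbook example: mixing a zero-mean Gaussian over its precision with a Gamma density yields a Student-t density. This is a generic illustration of a GSM with Gamma mixing, not necessarily the parametrization used in G-VGPMIL; the function names and the values of `a`, `b`, and `x` are arbitrary assumptions.

```python
import math


def gsm_gamma_numeric(x: float, a: float, b: float,
                      upper: float = 80.0, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral over w > 0 of
    N(x | 0, 1/w) * Gamma(w | shape=a, rate=b)."""
    h = upper / steps
    log_norm = a * math.log(b) - math.lgamma(a)  # log of b^a / Gamma(a)
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h  # midpoint of the k-th subinterval
        log_gauss = 0.5 * math.log(w / (2.0 * math.pi)) - 0.5 * w * x * x
        log_gamma = log_norm + (a - 1.0) * math.log(w) - b * w
        total += math.exp(log_gauss + log_gamma)
    return total * h


def student_t_closed_form(x: float, a: float, b: float) -> float:
    """Closed form of the same mixture: a Student-t density with 2a degrees
    of freedom and precision a / b."""
    log_c = (math.lgamma(a + 0.5) - math.lgamma(a)
             - 0.5 * math.log(2.0 * math.pi * b))
    return math.exp(log_c - (a + 0.5) * math.log1p(x * x / (2.0 * b)))


numeric = gsm_gamma_numeric(0.7, a=2.0, b=1.5)
closed = student_t_closed_form(0.7, a=2.0, b=1.5)
```

The two values should agree to within the quadrature error, which shows how the choice of mixing density determines the resulting heavy-tailed density.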

Experiments on multiple datasets (MNIST for controlled experiments, the two MUSK MIL benchmark datasets, and the real-world ICH detection datasets RSNA and CQ500) demonstrate that G-VGPMIL outperforms the original VGPMIL in both predictive performance and training efficiency, and that it also outperforms most other methods on the ICH detection task. This result validates the effectiveness of the proposed method and provides useful insights for further research in this field.

The main contributions of this paper include: 1) introducing Pólya-Gamma variables into the MIL domain; 2) discovering that PG-VGPMIL is an equivalent form of VGPMIL; 3) proposing a more general ψ-VGPMIL framework; 4) proposing a new G-VGPMIL model using the Gamma density as an example; 5) validating the superior performance of G-VGPMIL on multiple datasets. This work not only expands the theoretical foundation of the MIL field but also provides an efficient solution for practical applications such as ICH detection.