Weakly Supervised Segmentation Improves The Estimate of The Choroid Plexus Volume Application To Multiple Sclerosis
1 Introduction
The Choroid Plexus (ChP) is a vascular structure located inside the brain ven-
tricles. It is responsible for many brain functions, like the maintenance of brain
2 Valentina Visani et al.
clearance [1, 24], the regulation of inflammatory processes [13], and the production
of cerebrospinal fluid [5]. Thanks to its high contrast and resolution,
Magnetic Resonance Imaging (MRI) is one of the most suited imaging modal-
ities to visualize the ChP in-vivo. The ChP delineation and the quantification
of the ChP volume (ChPV) might help in discovering pathological alterations
that track inflammatory patterns in neurodegenerative disorders like Multiple
Sclerosis (MS) [8]. The gold-standard technique for delineating the ChP is manual
segmentation, referred to here as ground truth (GT), usually performed
by neuroradiologists on T1-weighted (T1-w) MRI images [31, 22, 30]. Unfortunately,
this technique is time-consuming and poorly reproducible within
and between operators because of the complex morphology of the ChP. To overcome
these drawbacks, automatic tools such as FreeSurfer [7] and a Gaussian Mixture
Model [25] have been proposed in the literature, although with unsatisfactory
performance.
The current state of the art in medical image segmentation relies on deep
learning techniques. However, deep learning approaches face significant
challenges when applied to medical data, primarily the substantial amount of
annotated data they require and the label discrepancies that arise
when multiple annotators, including experts, are involved [11]. To address these
limitations, weakly supervised learning has been introduced to construct predictive
models with less stringent assumptions on reference annotations [34]. In
this work, our goal is to investigate whether weakly supervised training relying
on non-expert data annotators [26] can be used to improve the ChP segmentation
task. To this end, we simulate the output of non-expert humans by creating
weak GTs using three different methods.
The main contributions of this paper are as follows:
1. We propose a reliable and easily reproducible test protocol in which training
and test sets are obtained using different MRI scanners.
2. We describe several methods for creating weak GTs that simulate the masks
drawn by humans who only coarsely select the areas they consider foreground.
3. We compare different methods for handling the weak GTs we have created, thus
providing a test-bed in which such methods can be compared fairly.
The remainder of the paper is structured as follows: Section 2 reviews
related work on weakly supervised segmentation approaches. Section 3 describes
the Choroid Plexus segmentation problem and the proposed approach. In Section
4, we provide a thorough evaluation of our proposed system. Finally, the last
Section concludes this work and outlines directions for further research on this topic.
2 Related Work
segmentation offers some relief from this challenge, it still struggles to deliver
practical results due to the absence of crucial information like object positions
and edges. A balance between these approaches comes from the weak supervi-
sion that garnered significant attention from researchers and has made notable
advancements in recent years [34].
Weakly supervised semantic segmentation (WSSS) methods leverage various forms of limited supervision, each of which
provides varying degrees of annotation granularity. These forms of supervision
include image-level labels [12, 14], where entire images are categorized without
specifying individual object boundaries. Additionally, bounding boxes [33, 15]
offer a more localized form of supervision by delineating the approximate bound-
aries of objects or regions of interest within the image. Point annotations [2, 10]
pinpoint specific key points or landmarks within the image, providing even finer-
grained guidance. Finally, scribbles [16, 27] offer a semi-structured approach, al-
lowing annotators to draw loose outlines or regions of interest, which can be less
time-consuming than pixel-level annotation but still provide valuable cues for
segmentation algorithms. In this paper, we focus on inaccurate bounding boxes
to simulate non-expert human annotators, such as students or users who
are familiar with the task but are not experts in the subject.
3.1 Dataset
Data were provided by the Multiple Sclerosis Centre of the University Hospital
of Verona and were acquired using two different MRI scanners, which will be
named Scanner 1 and Scanner 2. The total number of collected scans was 128.
All subjects (age 36.7 ± 10.1 years) gave their written informed consent to the
processing of the data collected. All procedures were performed in accordance
with the Declaration of Helsinki (2008). The local Ethical Committee approved
the study protocols. The Scanner 1 dataset is composed of 61 MS patients. T1-w MRI
images were acquired on a Philips Elition S with a 32-channel head coil (software
version R5.7.2.1). Parameters of the 3D T1-w MPRAGE sequence were: resolution:
1x1x1 mm; compressed SENSE acceleration factor: 4; TE/TR: 3.7/8.4 ms; FA:
8°; total acquisition time: 3 min 20 s. The Scanner 2 dataset is composed of 67 subjects:
24 healthy controls and 43 MS patients. T1-w MRI images were acquired on
a Philips Achieva TX with an 8-channel head coil (software version R3.2.3.2).
Parameters of the 3D T1-w MPRAGE sequence were: resolution: 1x1x1 mm; SENSE
acceleration factor: 2.5; TE/TR: 3.7/8.4 ms; FA: 9°; total acquisition time: 4 min
50 s. The images were not pre-processed to correct local inhomogeneities or other
artifacts and were used as acquired. Figure 1 shows a representative
subject for each scanner to highlight the intrinsic differences caused by the MRI
scanner hardware and software in the acquired images.
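Since both acquisitions have an isotropic 1x1x1 mm resolution, the ChP volume (ChPV) can be read off a binary segmentation by counting foreground voxels. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name `chp_volume_mm3` and its `voxel_size_mm` parameter are hypothetical and introduced only for illustration.

```python
import numpy as np

def chp_volume_mm3(mask, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Estimate a volume in mm^3 from a binary 3D mask.

    Assumption: `mask` is a 3D array whose non-zero voxels mark the ChP,
    and `voxel_size_mm` is the voxel spacing (1x1x1 mm in this dataset).
    """
    voxel_volume = float(np.prod(voxel_size_mm))  # mm^3 per voxel
    return float(np.count_nonzero(mask)) * voxel_volume

# Toy example: a 2x2x2 block of foreground voxels in a 4x4x4 volume.
toy_mask = np.zeros((4, 4, 4), dtype=np.uint8)
toy_mask[1:3, 1:3, 1:3] = 1
toy_volume = chp_volume_mm3(toy_mask)
```

With 1 mm isotropic voxels the volume in mm^3 simply equals the voxel count; the spacing argument only matters for other resolutions.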
Fig. 1: Coronal (left) and axial (right) views of T1-w MRI
images for a representative subject from each of the two scanners (Scanner 1,
Scanner 2). The red region is the Choroid Plexus (ChP) manually delineated by
experts.
Fig. 2: Example of (a) an image, (b) corresponding ground truth. Mask computed
using: (c) method A, (d) method B, and (e) method C.
Starting from the available images, we employ three distinct methods to
construct a weak training set. In the following, we describe each method:
1. Method A: for each image in the original dataset, we take the true
Region of Interest (ROI), i.e., the minimum-area rectangle containing
the Ground Truth (GT) mask, and enlarge it by adding a 2-pixel frame.
2. Method B: for each image in the original dataset, we draw a random number
rnd ∈ [−0.4, 0.4]. Then, given the coordinates (x, y) of the lower-left
corner of the ROI rectangle and its dimensions width and height, the
bounding box is shifted so that this corner moves to
(x + width×rnd, y + height×rnd).
3. Method C: same as Method B, but 33% of the ROIs are drawn at
random, so some ROIs may not contain any foreground pixels at all.
Figure 2 shows an example of an image with the corresponding GT, and
the masks resulting from applying the three methods for building the weak
training set.
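The three weak-mask constructions can be sketched as follows. This is an illustrative reading of the descriptions above, not the authors' code: it assumes 2D binary GT slices, applies the shift of Method B to the tight ROI with a single rnd for both axes, and, for the random ROIs of Method C, keeps the size of the GT box and places it uniformly inside the image (the text does not specify how random ROIs are drawn). All function names are hypothetical.

```python
import numpy as np

def bounding_box(mask):
    """Tight ROI of the foreground as inclusive (r0, r1, c0, c1)."""
    rows, cols = np.nonzero(mask)
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())

def box_mask(shape, r0, r1, c0, c1):
    """Binary mask with a filled rectangle, clipped to the image borders."""
    out = np.zeros(shape, dtype=np.uint8)
    r0, c0 = max(r0, 0), max(c0, 0)
    r1, c1 = min(r1, shape[0] - 1), min(c1, shape[1] - 1)
    if r0 <= r1 and c0 <= c1:
        out[r0:r1 + 1, c0:c1 + 1] = 1
    return out

def method_a(gt, margin=2):
    """Method A: tight ROI enlarged by a 2-pixel frame."""
    r0, r1, c0, c1 = bounding_box(gt)
    return box_mask(gt.shape, r0 - margin, r1 + margin, c0 - margin, c1 + margin)

def method_b(gt, rng, max_shift=0.4):
    """Method B: ROI shifted by rnd * (width, height), rnd in [-0.4, 0.4]."""
    r0, r1, c0, c1 = bounding_box(gt)
    height, width = r1 - r0 + 1, c1 - c0 + 1
    rnd = rng.uniform(-max_shift, max_shift)
    dr, dc = int(round(height * rnd)), int(round(width * rnd))
    return box_mask(gt.shape, r0 + dr, r1 + dr, c0 + dc, c1 + dc)

def method_c(gt, rng, p_random=0.33):
    """Method C: like Method B, but about 33% of ROIs are placed at random."""
    if rng.random() < p_random:
        r0, r1, c0, c1 = bounding_box(gt)
        height, width = r1 - r0 + 1, c1 - c0 + 1
        # Assumption: the random ROI keeps the GT box size and may miss the ChP.
        nr = int(rng.integers(0, gt.shape[0] - height + 1))
        nc = int(rng.integers(0, gt.shape[1] - width + 1))
        return box_mask(gt.shape, nr, nr + height - 1, nc, nc + width - 1)
    return method_b(gt, rng)

# Toy GT: a 4x5 foreground block in a 20x20 slice.
rng = np.random.default_rng(0)
gt = np.zeros((20, 20), dtype=np.uint8)
gt[8:12, 5:10] = 1
weak_a, weak_b, weak_c = method_a(gt), method_b(gt, rng), method_c(gt, rng)
```

Because the shift in Method B is at most 40% of the box size, the shifted box always overlaps the true ROI, whereas a random ROI from Method C may contain no foreground at all.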
3.6 Ensemble
Ensemble learning in machine learning refers to a powerful technique where
multiple models, often of diverse types or trained on different subsets of data,
are combined to improve overall predictive performance [17]. The idea behind
ensemble methods is that the collective decision of a group of models can be
more accurate and robust than that of individual models [4]. Popular ensemble
methods include bagging, boosting, and stacking. In this work, we focus on the
Weakly Supervised Segmentation Improves the ChP Volume Estimation 7
4 Experimental Results
We used the Dice score as the performance indicator, defined as
$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN},$$
where TP, FP, and FN represent the true positives,
false positives, and false negatives, respectively, A is the predicted
mask, and B is the GT map.
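As a minimal sketch of this metric, assuming binary masks of equal shape, the Dice score can be computed from the confusion counts as below. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice = 2*TP / (2*TP + FP + FN) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()    # predicted foreground, truly foreground
    fp = np.logical_and(pred, ~gt).sum()   # predicted foreground, truly background
    fn = np.logical_and(~pred, gt).sum()   # predicted background, truly foreground
    denom = 2 * tp + fp + fn
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2 * tp / denom)

# Toy example with TP = 1, FP = 1, FN = 1.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Since |A| = TP + FP and |B| = TP + FN, the set-based and count-based forms of the formula give the same value.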
In the first experiment, we do not use weak masks. Instead, we compare two
testing protocols to determine which one yields better performance: (1) Random:
a random split over all subjects, so that samples from both
scanners appear in the training set (92 subjects) and in the test set (36 subjects).
(2) LOS (Leave One Scanner out): all MRIs from a given scanner belong either to the
training set or to the test set; first we use Scanner 1 as the training set, then
Scanner 2, giving two folds. LOS x-y means that scanner 'x' is used
for training and scanner 'y' for testing.
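The two protocols above can be sketched as follows. This is a hypothetical illustration, assuming each subject is represented as an (identifier, scanner) pair and that exactly two scanners are present; the helper names `random_split` and `leave_one_scanner_out` are not from the paper.

```python
import random

def random_split(subjects, n_test, seed=0):
    """Random protocol: shuffle all subjects and hold out n_test of them."""
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def leave_one_scanner_out(subjects):
    """LOS protocol: train on one scanner, test on the remaining one(s)."""
    scanners = sorted({scanner for _, scanner in subjects})
    folds = []
    for train_scanner in scanners:
        others = [str(s) for s in scanners if s != train_scanner]
        name = f"LOS {train_scanner}-{'+'.join(others)}"
        train = [s for s in subjects if s[1] == train_scanner]
        test = [s for s in subjects if s[1] != train_scanner]
        folds.append((name, train, test))
    return folds

# Subject counts follow the dataset description: 61 on Scanner 1, 67 on Scanner 2.
subjects = [(f"s1_{i}", 1) for i in range(61)] + [(f"s2_{i}", 2) for i in range(67)]
train, test = random_split(subjects, n_test=36)
folds = leave_one_scanner_out(subjects)
```

In the Random protocol both scanners contribute to training and testing, while in each LOS fold the test images come from a scanner never seen during training.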
Baseline and Saliency (Bl. and Sal., respectively, in Table 2) improve on the
performance of the network trained using only the training set with GT
when methods A and B are used to create the weak training set. Things change
when method C is used to create the weak training set: in LOS
1-2 there is still a good improvement, but not in LOS 2-1. Given
the simplicity of the baseline, it is the method we suggest; interestingly, even
the weak masks created with method C (thus very erroneous) do not unduly
affect performance. In our view, this suggests the feasibility of crafting weakly
supervised methods using bounding boxes generated by non-expert humans.
Table 2: Comparison between the different approaches. Bl. stands for Baseline,
Sal. stands for Saliency. Best performance in bold.

                        Random                  LOS 1-2                 LOS 2-1
Method     Epoch   Bl.    CAM    Sal.      Bl.    CAM    Sal.      Bl.    CAM    Sal.
Method A   0       0.722  0.722  0.722     0.539  0.539  0.539     0.604  0.604  0.604
           1       0.798  0.802  0.792     0.591  0.607  0.605     0.591  0.602  0.602
           2       0.800  0.798  0.802     0.583  0.610  0.605     0.602  0.602  0.597
           3       0.808  0.810  0.802     0.619  0.599  0.602     0.597  0.609  0.602
           4       0.813  0.815  0.812     0.613  0.612  0.590     0.606  0.603  0.603
           5       0.811  0.807  0.810     0.599  0.593  0.557     0.610  0.599  0.606
Method B   0       0.722  0.722  0.722     0.539  0.539  0.539     0.604  0.604  0.604
           1       0.790  0.787  0.793     0.600  0.612  0.597     0.602  0.597  0.600
           2       0.801  0.795  0.804     0.586  0.586  0.597     0.614  0.604  0.575
           3       0.806  0.811  0.814     0.619  0.600  0.597     0.593  0.611  0.602
           4       0.815  0.816  0.813     0.594  0.575  0.568     0.602  0.612  0.612
           5       0.819  0.815  0.812     0.599  0.596  0.632     0.612  0.609  0.612
Method C   0       0.722  0.722  0.722     0.539  0.539  0.539     0.604  0.604  0.604
           1       0.795  0.803  0.795     0.580  0.560  0.573     0.595  0.602  0.587
           2       0.805  0.798  0.810     0.609  0.604  0.623     0.609  0.606  0.605
           3       0.806  0.804  0.800     0.570  0.624  0.605     0.605  0.608  0.610
           4       0.819  0.810  0.807     0.622  0.617  0.611     0.612  0.584  0.607
           5       0.802  0.806  0.812     0.585  0.595  0.610     0.604  0.601  0.595
Conclusion
In this study, we introduce a novel iterative weakly supervised training approach
for semantic segmentation using bounding box annotations. Despite the limited
spatial detail of bounding boxes, our method achieves performance competitive
with fully supervised models trained on segmentation masks. Our approach significantly
reduces annotation costs, a key challenge in semantic segmentation research, by
allowing flexible use of bounding box annotations. We highlight the importance
of data augmentation [3] in enhancing the effectiveness of bounding
box annotations, improving model generalization, and
mitigating overfitting. With our approach, researchers have the option to use
bounding box annotations as a source of supervision or to annotate segmentation
masks for only a small fraction of images while training models on a mix of GT
masks and bounding box annotations.
References
1. Balusu, S., Brkic, M., Libert, C., Vandenbroucke, R.E.: The choroid plexus-
cerebrospinal fluid interface in Alzheimer’s disease: more than just a barrier. Neural
Regeneration Research 11(4), 534–537 (2016)
2. Bearman, A., Russakovsky, O., Ferrari, V., Fei-Fei, L.: What’s the point: Semantic
segmentation with point supervision. In: European conference on computer vision.
pp. 549–565. Springer (2016)
3. Bravin, R., Nanni, L., Loreggia, A., Brahnam, S., Paci, M.: Varied image data
augmentation methods for building ensemble. IEEE Access 11, 8810–8823 (2023)
4. Cornelio, C., Donini, M., Loreggia, A., Pini, M.S., Rossi, F.: Voting with random
classifiers (VORACE): theoretical and experimental analysis. Autonomous Agents
and Multi-Agent Systems 35(2), 22 (2021)
5. Damkier, H.H., Brown, P.D., Praetorius, J.: Cerebrospinal fluid secretion by the
choroid plexus. Physiological Reviews 93(4), 1847–1892 (2013)
6. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The
PASCAL visual object classes (VOC) challenge. International Journal of Computer
Vision 88, 303–338 (2010)
7. Fischl, B.: FreeSurfer. NeuroImage 62(2), 774–781 (2012)
8. Fleischer, V., Gonzalez-Escamilla, G., Ciolac, D., Albrecht, P., et al.: Translational
value of choroid plexus imaging for tracking neuroinflammation in mice and hu-
mans. Proceedings of the National Academy of Sciences 118(36) (2021)
9. Hou, X., Harel, J., Koch, C.: Image signature: Highlighting sparse salient regions.
IEEE Transactions on Pattern Analysis and Machine Intelligence 34(1), 194–201
(2012)
10. Jiang, P.T., Yang, Y., Hou, Q., Wei, Y.: L2G: A simple local-to-global knowledge
transfer framework for weakly supervised semantic segmentation. In: Proceedings
of the IEEE/CVF conference on computer vision and pattern recognition. pp.
16886–16896 (2022)
11. Kahneman, D., Sibony, O., Sunstein, C.R.: Noise: a flaw in human judgment. Little,
Brown Spark (2021)
12. Kolesnikov, A., Lampert, C.H.: Seed, expand and constrain: Three principles for
weakly-supervised image segmentation. In: Computer Vision–ECCV 2016: 14th
European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Pro-
ceedings, Part IV 14. pp. 695–711. Springer (2016)
13. Lassmann, H.: Pathogenic mechanisms associated with different clinical courses of
multiple sclerosis. Frontiers in Immunology 9 (2019)
14. Li, J., Fan, J., Zhang, Z.: Towards noiseless object contours for weakly supervised
semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition. pp. 16856–16865 (2022)
15. Li, Q., Arnab, A., Torr, P.H.: Weakly-and semi-supervised panoptic segmentation.
In: Proceedings of the European conference on computer vision (ECCV). pp. 102–
118 (2018)
16. Lin, D., Dai, J., Jia, J., He, K., Sun, J.: Scribblesup: Scribble-supervised convolu-
tional networks for semantic segmentation. In: Proceedings of the IEEE conference
on computer vision and pattern recognition. pp. 3159–3167 (2016)
17. Nanni, L., Fantozzi, C., Loreggia, A., Lumini, A.: Ensembles of convolutional neural
networks and transformers for polyp segmentation. Sensors 23(10) (2023)
18. Nanni, L., Loreggia, A., Barcellona, L., Ghidoni, S.: Building ensemble of deep net-
works: Convolutional networks and transformers. IEEE Access 11, 124962–124974
(2023)
19. Nanni, L., Loreggia, A., Brahnam, S.: Comparison of different methods for building
ensembles of convolutional neural networks. Electronics 12(21), 4428 (2023)
20. Nanni, L., Lumini, A., Fantozzi, C.: Exploring the potential of ensembles of deep
learning networks for image segmentation. Information 14(12), 657 (2023)
21. Nanni, L., Lumini, A., Loreggia, A., Formaggio, A., Cuza, D.: An empirical study
on ensemble of segmentation approaches. Signals 3(2), 341–358 (2022)
22. Schmidt-Mengin, M., Ricigliano, V.A.G., Bodini, B., et al.: Axial multi-layer
perceptron architecture for automatic segmentation of choroid plexus in multiple
sclerosis. In: Išgum, I., Colliot, O. (eds.) Medical Imaging 2022: Image Processing.
SPIE (2022)
23. Senay, O., Seethaler, M., Makris, N., et al.: A preliminary choroid plexus volumet-
ric study in individuals with psychosis. Human Brain Mapping 44(6), 2465–2478
(2023)
24. Spector, R., Keep, R., Snodgrass, R., Smith, Q., Johanson, C.: A balanced view
of choroid plexus structure and function: Focus on adult humans. Experimental
Neurology 267, 78–86 (2015)
25. Tadayon, E., Moret, B., Sprugnoli, G., Monti, L., Pascual-Leone, A., Santarnecchi,
E., for the Alzheimer’s Disease Neuroimaging Initiative: Improving choroid plexus
segmentation in the healthy and diseased brain: Relevance for tau-PET imaging in
dementia. Journal of Alzheimer’s Disease 74(4), 1057–1068 (2020)
26. Tinati, R., Luczak-Roesch, M., Simperl, E., Hall, W.: An investigation of player
motivations in EyeWire, a gamified citizen science project. Computers in Human
Behavior 73, 527–540 (2017)
27. Vernaza, P., Chandraker, M.: Learning random-walk label propagation for weakly-
supervised semantic segmentation. In: Proceedings of the IEEE conference on com-
puter vision and pattern recognition. pp. 7158–7166 (2017)
28. Vinogradova, K., Dibrov, A., Myers, G.: Towards interpretable semantic segmenta-
tion via gradient-weighted class activation mapping. CoRR abs/2002.11434 (2020)
29. Visani, V., Natale, V., Colombi, A., Tamanti, A., Bertoldo, A., Marjin, C., Pizzini,
F.B., Calabrese, M., Castellaro, M.: The ensemble of optimized deep learning neu-
ral networks improves the estimate of the choroid plexus volume: application to
multiple sclerosis. In Proceedings of the Annual Meeting of ISMRM, Toronto,
Canada, p. 812 (2023)
30. Visani, V., Pizzini, F.B., Natale, V., Tamanti, A., Bertoldo, A., Calabrese, M.,
Castellaro, M.: Choroid plexus volume in multiple sclerosis can be estimated on
structural MRI avoiding contrast injection. medRxiv (2023)
31. Yazdan-Panah, A., Schmidt-Mengin, M., Ricigliano, V.A., Soulier, T., Stankoff,
B., Colliot, O.: Automatic segmentation of the choroid plexuses: Method and val-
idation in controls and patients with multiple sclerosis. NeuroImage: Clinical 38,
103368 (2023)
32. Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig,
G.: User-guided 3D active contour segmentation of anatomical structures: Significantly
improved efficiency and reliability. NeuroImage 31(3), 1116–1128 (2006)
33. Zhou, T., Zhang, M., Zhao, F., Li, J.: Regional semantic contrast and aggregation
for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 4299–4309 (2022)
34. Zhu, K., Xiong, N.N., Lu, M.: A survey of weakly-supervised semantic segmenta-
tion. In: 2023 IEEE 9th Intl Conference on Big Data Security on Cloud (BigDataSe-
curity), IEEE Intl Conference on High Performance and Smart Computing,(HPSC)
and IEEE Intl Conference on Intelligent Data and Security (IDS). pp. 10–15. IEEE
(2023)