COMPARATIVE PERFORMANCE ANALYSIS OF DEEP LEARNING ARCHITECTURES FOR PULMONARY NODULE CANDIDATE CLASSIFICATION: A COMPUTATIONAL STUDY USING PUBLIC BENCHMARK DATASETS

Koray Kaya KILIÇ, Nilay ÇAVUŞOĞLU YALÇIN, Mustafa YALÇIN, Nevfel KAHVECİOĞLU

Journal of Medicine and Palliative Care - 2026;7(2):363-368

Department of Radiology, Antalya Training and Research Hospital, Antalya, Turkiye

 

Aims: To systematically compare the candidate-level classification performance of four state-of-the-art deep learning architectures (U-Net, ResNet-50, Vision Transformer [ViT-B/16], and YOLOv8) in classifying pulmonary nodule candidates from patches extracted from public benchmark CT datasets, and to evaluate the trade-off between accuracy and computational efficiency.
Methods: This computational study used publicly available, de-identified datasets: LUNA16 (Lung Nodule Analysis 2016) and LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative). Candidate nodule locations were generated by applying a multi-scale Laplacian of Gaussian (LoG) blob detector to full CT volumes. A 64×64×64-voxel patch was extracted for each candidate and classified as a true nodule or a false positive. The dataset was partitioned at the patient level into training (70%), validation (15%), and held-out test (15%) sets. Stratified 5-fold cross-validation for hyperparameter optimization was performed exclusively within the training set. Four architectures were trained under identical protocols: U-Net (encoder-decoder), ResNet-50 (residual CNN), ViT-B/16 (self-attention transformer adapted to 3D input via 3D patch embedding), and YOLOv8 (real-time detector applied slice by slice with 3D aggregation). Primary performance metrics were sensitivity, specificity, F1-score, mAP@0.5, and area under the receiver operating characteristic curve (AUC). Free-response ROC (FROC) analysis followed the LUNA16 challenge protocol, reporting sensitivity at 0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan (FP/scan). AUCs were compared with the paired DeLong test, using Bonferroni correction for multiple comparisons. Bootstrap 95% confidence intervals (2,000 resamples) were computed for sensitivity, specificity, and F1-score.
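The evaluation protocol above depends on two details that are easy to get wrong: partitioning by patient rather than by candidate patch, and the resampling behind the bootstrap confidence intervals. The following is a minimal Python sketch of both, not the authors' implementation. It assumes each candidate is a dict with a hypothetical `patient_id` key, that predictions are already thresholded to 0/1, and that the bootstrap is a simple percentile bootstrap over candidates. The abstract does not state the random seed, the rounding rule for split sizes, or whether resampling was done per candidate or per patient, so all of these are illustrative choices.

```python
import math
import random


def patient_level_split(candidates, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign each patient, with all of their candidates, to exactly one split.

    `candidates` is a list of dicts with a hypothetical "patient_id" key.
    Splitting by patient (not by candidate) keeps patches from the same
    CT scan out of both the training and the test data.
    """
    patients = sorted({c["patient_id"] for c in candidates})
    random.Random(seed).shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    split_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"  # remainder, approximately fractions[2]
    splits = {"train": [], "val": [], "test": []}
    for c in candidates:
        splits[split_of[c["patient_id"]]].append(c)
    return splits


def sensitivity(pairs):
    """pairs: (true_label, predicted_label) with 1 = nodule, 0 = false positive."""
    predictions_on_positives = [pred for label, pred in pairs if label == 1]
    if not predictions_on_positives:
        return math.nan
    return sum(predictions_on_positives) / len(predictions_on_positives)


def bootstrap_ci(pairs, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample pairs with replacement, recompute metric."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=len(pairs))
        value = metric(sample)
        if not math.isnan(value):  # skip resamples that contain no positives
            stats.append(value)
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```

If each of the 888 scans is treated as one patient, this rounding gives 622/133/133 patients, which matches the 133-scan test set reported in the Results. The same `bootstrap_ci` can take a specificity or F1 function in place of `sensitivity`. A patient-level (cluster) bootstrap, which resamples whole patients, would be the more conservative alternative when patients contribute several candidates.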
Results: Across 888 CT scans (1,186 annotated nodules; LUNA16 test set: 133 scans, 178 nodules), ViT-B/16 achieved the highest candidate-level patch classification performance: sensitivity 94.2% (95% CI: 91.8-96.1%), specificity 92.8% (95% CI: 90.3-94.9%), F1-score 0.935, mAP@0.5 0.947, and AUC 0.971 (95% CI: 0.958-0.982). Pairwise DeLong comparisons indicated superior discrimination (higher AUC) for ViT-B/16 relative to each comparator architecture. In FROC analysis, ViT-B/16 achieved the highest competition performance metric (CPM, the mean sensitivity across the seven FP/scan operating points) at 0.847, outperforming ResNet-50 (0.798), YOLOv8 (0.781), and U-Net (0.762). However, compared with ResNet-50, ViT-B/16 required 3.2 times the inference time (8.4 vs. 2.6 seconds/scan) and 3.7 times as many trainable parameters. YOLOv8 was the most computationally efficient, with the shortest inference time (1.1 seconds/scan).
Conclusion: The attention-based ViT-B/16 achieved the best candidate-level patch classification performance for pulmonary nodule candidates, but this advantage must be weighed against its substantially higher computational cost. Architecture selection should be guided by deployment context: among the evaluated models, ResNet-50 offered the most favorable accuracy-efficiency balance for clinical deployment, while YOLOv8 is better suited to real-time screening applications.
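The CPM values reported in the Results reduce each FROC curve to a single number: the mean sensitivity at the seven FP/scan operating points listed in the Methods. The following Python sketch shows that computation under simplifying assumptions. It assumes that candidate-to-nodule matching (the hit criterion) has already been done upstream, and `froc_curve`, `cpm`, and the `(score, nodule_id)` tuple layout are hypothetical names. The official LUNA16 evaluation script additionally handles excluded annotations and interpolates along the curve, whereas this version uses a step-wise lookup, so its values may differ slightly from challenge-reported CPMs.

```python
from itertools import groupby

# FP/scan operating points used in the LUNA16 FROC analysis
OPERATING_POINTS = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def froc_curve(candidates, n_nodules, n_scans):
    """Build FROC points (fp_per_scan, sensitivity) by sweeping the score threshold.

    candidates: list of (score, nodule_id) tuples, where nodule_id is a
    hypothetical globally unique id of the annotated nodule the candidate
    hits, or None for a false positive. Extra hits on an already detected
    nodule count neither as true positives nor as false positives.
    """
    detected, false_positives = set(), 0
    curve = [(0.0, 0.0)]
    ordered = sorted(candidates, key=lambda c: c[0], reverse=True)
    # Candidates with tied scores enter the curve together (one threshold).
    for _, tied in groupby(ordered, key=lambda c: c[0]):
        for _, nodule_id in tied:
            if nodule_id is None:
                false_positives += 1
            else:
                detected.add(nodule_id)
        curve.append((false_positives / n_scans, len(detected) / n_nodules))
    return curve


def cpm(curve, points=OPERATING_POINTS):
    """Competition performance metric: mean sensitivity at the FP/scan points.

    Simplification: at each point, take the best sensitivity reached without
    exceeding that FP/scan rate (step function, no interpolation).
    """
    sensitivities = [max(s for fp, s in curve if fp <= p) for p in points]
    return sum(sensitivities) / len(sensitivities)
```

For example, a toy run with four nodules across two scans, in which two false positives precede the third detection, yields sensitivities of 0.5, 0.5, 0.75, 0.75, 1, 1, and 1 at the seven operating points, giving a CPM of 5.5/7 ≈ 0.786.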