Convolutional neural networks (CNNs) are well established across a wide range of computer vision problems and benchmarks. Visualization and analysis of the feature maps generated by convolutional layers can be an effective way to explore the hidden and complex characteristics of a CNN model. Convolutional layers produce diverse feature maps; however, the extent of this diversity needs to be explored. This research attempts to provide five insights into the ‘black box’ mechanism of CNNs using skin cancer dermoscopy and lung computed tomography (CT) scan datasets. Feature maps from three convolutional layers are analyzed statistically, layer by layer, using 17 geometrical and 6 intensity-based features to determine their characteristics and level of diversity. The five aspects explored are: the significance and difference of the feature maps layer by layer; black feature maps; the difference of the feature maps from each other and from the original image; variations among the feature maps when the model is run multiple times; and inter-class variation among the feature maps for different iterations. Various statistical methods are employed, including the t-test, analysis of variance (ANOVA), mean, median, mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), root mean squared error (RMSE), Dice similarity score (DSC), universal image quality index (UQI) and spectral angle mapper (SAM). Experimental results show that for the skin cancer dermoscopy dataset, a large number of black feature maps are produced (20–60%), while the proportion of black feature maps for the CT scan dataset is comparatively low (2–20%). This demonstrates that different datasets can produce feature maps with diverse characteristics. The layer-by-layer differences between the feature maps are evaluated using t-tests and ANOVA for the 17 geometrical and 6 intensity-based features. 
For both datasets, significant diversity is observed across most of the geometrical and intensity-based features. The difference of the feature maps from each other and from the original image is quite high: for the dermoscopy and CT scan datasets respectively, MSE values are in the range of 1860–31,399 and 171–6089, PSNR values 3–15 and 10–25, SSIM values 0.01–0.84 and 0.3–0.81, RMSE values 0.81–1 and 0.21–1, DSC values 0.37–0.53 and 0.47–0.75, UQI values 0.02–0.86 and 0.01–0.88, and SAM values 0.12–1.53 and 0.19–1.55. When the model is run multiple times (three iterations), notable iteration-by-iteration diversity is found in the mean, median, maximum and minimum values for most of the geometrical features. The inter-class variation among the feature maps for different iterations and layers is evaluated based on the F-value of the ANOVA test. For the dermoscopy dataset, the highest mean F-value is found for layer 1 and iteration 3, while for the CT scan dataset it is found for layer 3 and iteration 3, indicating that these feature maps show the highest inter-class dissimilarity. The findings of this study may aid in exploring the complex mechanism of convolutional layers, kernels and feature maps.
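
As an illustration of the simpler comparison measures named above, the following minimal Python sketch computes MSE, RMSE and PSNR between a feature map and a reference image, and the fraction of black feature maps in a set. It is not the study's implementation. The function names, the 8-bit intensity scale (`max_value=255`) and the definition of a black feature map as one whose pixels are all at or below a threshold are assumptions made for this example; the study may use different scaling or a different criterion.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D grids of intensities."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def rmse(a, b):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(a, b))


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB (assumes an 8-bit scale by default)."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def is_black(feature_map, threshold=0):
    """Assumed criterion: every pixel is at or below the threshold."""
    return all(v <= threshold for row in feature_map for v in row)


def black_fraction(feature_maps, threshold=0):
    """Share of feature maps in a collection that count as black."""
    flags = [is_black(fm, threshold) for fm in feature_maps]
    return sum(flags) / len(flags)


# Toy 2x2 example: a reference image and an all-zero (black) feature map.
image = [[0, 255], [255, 0]]
fmap = [[0, 0], [0, 0]]

print(mse(image, fmap))
print(rmse(image, fmap))
print(psnr(image, fmap))
print(black_fraction([fmap, image]))
```

Plain nested lists are used so that the sketch has no external dependencies. The remaining measures (SSIM, UQI, SAM and DSC) are more involved and are not covered here.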