Multi-modal medical image fusion has become a valuable auxiliary technique that combines information on normal tissue structure and abnormal alterations to improve the efficiency of medical diagnosis. To address the detail loss and spectral degradation that affect space-domain fusion, a two-channel frequency-domain multi-modal medical image fusion method is proposed in the joint bilateral filter (JBF) domain. The method decomposes each source image into a structure channel and an energy channel, which carry the texture detail information and the edge intensity information, respectively. In the structure channel, a local gradient energy operator is obtained by improving the gradient energy, which strengthens the representation of small-scale detail and the robustness to noise. In the energy channel, the non-subsampled contourlet transform improves the multi-directional and multi-scale characteristics of the model. In addition, a high-frequency sub-band processing framework that combines a local entropy detail enhancement operator with a pulse coupled neural network (PCNN) is proposed to enhance the structural and detail information in the energy channel. In experiments on the public Atlas dataset, the method is compared with six representative frequency-domain baselines, including methods based on multi-scale transform (MST), sparse representation, PCNN, and JBF. The results show that the similarity between the fused image and the source images is improved by 35.0%, and that the spatial frequency, edge intensity, and contrast ratio of the fused images are improved by 16.2%, 12.5%, and 11.2%, respectively. The visual quality of the fused images is also significantly better than that of the compared methods.
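
The two-channel decomposition described above can be illustrated with a minimal NumPy sketch. It rests on assumptions that the abstract does not state: the energy channel is taken to be the JBF-smoothed image, the structure channel is taken to be the residual, and the guidance image defaults to the source image itself. The function names and the parameters `radius`, `sigma_s`, and `sigma_r` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def joint_bilateral_filter(src, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth `src` using spatial weights and range weights computed on `guide`.

    Hypothetical helper: parameter names and defaults are illustrative only.
    """
    src = np.asarray(src, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    h, w = src.shape

    # Reflect-pad so every pixel has a full (2*radius+1)^2 neighbourhood.
    src_p = np.pad(src, radius, mode="reflect")
    guide_p = np.pad(guide, radius, mode="reflect")

    num = np.zeros_like(src)  # weighted sum of neighbour intensities
    den = np.zeros_like(src)  # sum of weights (normaliser)

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys, xs = radius + dy, radius + dx
            shifted_src = src_p[ys:ys + h, xs:xs + w]
            shifted_guide = guide_p[ys:ys + h, xs:xs + w]

            # Spatial Gaussian weight (a scalar for this offset).
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Range Gaussian weight, measured on the guidance image.
            w_r = np.exp(-((shifted_guide - guide) ** 2) / (2.0 * sigma_r ** 2))

            weight = w_s * w_r
            num += weight * shifted_src
            den += weight

    # The centre offset always contributes weight 1, so den > 0 everywhere.
    return num / den


def two_channel_decompose(img, guide=None, **jbf_kwargs):
    """Split `img` into (structure, energy) channels.

    Assumption: energy = JBF output, structure = residual detail.
    """
    img = np.asarray(img, dtype=np.float64)
    guide = img if guide is None else guide
    energy = joint_bilateral_filter(img, guide, **jbf_kwargs)
    structure = img - energy
    return structure, energy
```

Under these assumptions the two channels sum back to the source image, so fused channels can be recombined by simple addition. A call such as `structure, energy = two_channel_decompose(img)` expects a 2-D grayscale array with intensities scaled to roughly [0, 1], since `sigma_r` is expressed in intensity units.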