In two-step defect detection models, the segmentation network and the classification network are optimized toward inconsistent objectives, so the two are only loosely coupled, and errors accumulated in the segmentation network further degrade the performance of the classification network. To address these problems, a joint optimization model for defect detection, named MADD-Net, is proposed; it uses attention mechanisms to predict the location and category of defects simultaneously. First, the segmentation network fuses shallow and deep features through a mixed attention feature fusion module to extract richer information. Then, the classification network captures more discriminative features through a multi-receptive-field spatial attention module. Finally, the segmentation and classification networks are trained simultaneously under a joint optimization objective. Extensive experiments are conducted with the PyTorch framework on three public industrial defect detection datasets (DAGM 2007, MAGNETIC-TILE, and KolektorSDD2), and the proposed method achieves superior performance. Its accuracy is up to 28.02% higher than that of the piecewise algorithm and 8.3% higher than that of the U-Net-like algorithm. Its precision, recall, and F1-score also exceed those of other state-of-the-art models.
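
As an illustration only, a joint objective of this kind is commonly written as a weighted sum of a segmentation loss and a classification loss. The symbols $\mathcal{L}_{\text{seg}}$, $\mathcal{L}_{\text{cls}}$, $\theta_{s}$, $\theta_{c}$, and the weight $\lambda$ are assumptions introduced here for the sketch; the abstract does not state the exact form used by MADD-Net.

```latex
% Hypothetical form of a joint objective (assumed, not taken from the source):
% \theta_s = segmentation parameters, \theta_c = classification parameters,
% \lambda  = assumed trade-off weight between the two tasks.
\mathcal{L}_{\text{joint}}(\theta_{s}, \theta_{c})
  = \mathcal{L}_{\text{seg}}(\theta_{s})
  + \lambda \, \mathcal{L}_{\text{cls}}(\theta_{s}, \theta_{c}),
  \qquad \lambda > 0
```

Under this assumed form, and assuming the classifier consumes features produced by the segmentation network, the gradient of the classification term also reaches $\theta_{s}$, which is one way the two networks could be coupled during training rather than optimized in separate stages.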