Affective computing is an emerging research area that provides insight into a person's mental state through human-machine interaction. During this interaction, bio-signal analysis is essential for detecting changes in affective state. Machine learning methods for analysing bio-signals are currently the state of the art in affect detection, yet most empirical work deploys traditional machine learning methods rather than deep learning models because of the need for explainability. In this paper, we propose a deep learning model that processes multimodal-multisensory bio-signals for affect recognition. It supports batch training of signals with different sampling rates simultaneously, and our results show a significant improvement over the state of the art. Furthermore, we interpret the results at the sensor and signal levels to improve the explainability of our deep learning model.
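To make the batch-training claim concrete, the following is a minimal sketch of one way a model can accept modalities with different sampling rates in the same batch: each modality gets its own 1D-convolutional encoder, and adaptive pooling collapses the time axis so embeddings have a fixed size regardless of signal length. This is an illustrative assumption, not the paper's actual architecture; PyTorch, the modality names (`eda`, `ecg`), the sampling rates, and the pooling strategy are all hypothetical choices for the example.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """1D-conv encoder mapping a raw signal to a fixed-size embedding,
    so modalities sampled at different rates can share one batch."""

    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, embed_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapses time axis for any input length
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x).squeeze(-1)  # (batch, embed_dim)


class MultimodalAffectNet(nn.Module):
    """Concatenates per-modality embeddings and classifies affective state."""

    def __init__(self, modalities: dict, num_classes: int = 3, embed_dim: int = 64):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(ch, embed_dim) for name, ch in modalities.items()}
        )
        self.classifier = nn.Linear(embed_dim * len(modalities), num_classes)

    def forward(self, inputs: dict) -> torch.Tensor:
        feats = [self.encoders[name](x) for name, x in inputs.items()]
        return self.classifier(torch.cat(feats, dim=-1))


# Hypothetical example: EDA sampled at 4 Hz and ECG at 256 Hz over a
# 10-second window; each modality keeps its native length in the batch.
model = MultimodalAffectNet({"eda": 1, "ecg": 1})
batch = {"eda": torch.randn(8, 1, 40), "ecg": torch.randn(8, 1, 2560)}
logits = model(batch)  # shape: (8, 3)
```

Because each encoder pools over time, no resampling of the raw signals is required before batching; only signals within one modality need a common length per batch.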