Cost-sensitive hybrid neural networks for heterogeneous and imbalanced data

Abstract

Analyzing accumulated data has recently attracted considerable attention for its ability to generate value by identifying useful information and providing an edge in global business competition. However, heterogeneous data and imbalanced class distributions present two major challenges to machine learning with real-world business data. Traditional machine learning algorithms can typically be applied only to standard data sets, which are normally homogeneous and balanced. These algorithms reduce complex data to a homogeneous, balanced data space, an inefficient process that requires a significant amount of pre-processing. In this paper, we focus on an efficient solution to the challenges posed by heterogeneous and imbalanced data sets that does not require pre-processing. Our approach is a novel, unified, end-to-end cost-sensitive hybrid neural network that learns from real-world heterogeneous data via a parallel network architecture. A specifically designed cost-sensitive matrix then automatically generates a robust model for learning minority classes. The parameters of both the cost-sensitive matrix and the hybrid neural network are optimized alternately but jointly during training. The results of comparative experiments on six real-world data sets reflecting actual business cases, including insurance fraud detection and mobile customer demographics, indicate that the proposed approach outperforms the baseline procedures.
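To make the idea of a cost-sensitive matrix concrete, the sketch below computes a generic expected-misclassification-cost loss over softmax outputs. This is a swapped-in, textbook technique, not the authors' formulation. The names `softmax` and `expected_cost_loss`, the example cost values, and the toy logits are all hypothetical. In the paper the cost matrix is learned jointly with the network, whereas here it is a fixed, hand-chosen array used only for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Shift by the row-wise max for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def expected_cost_loss(logits: np.ndarray, labels: np.ndarray, cost: np.ndarray) -> float:
    """Mean expected misclassification cost over a batch (illustrative sketch).

    cost[i, j] is an assumed penalty for predicting class j when the true
    class is i; the diagonal is usually 0. This is NOT the paper's learned
    cost-sensitive matrix, only a fixed stand-in.
    """
    probs = softmax(logits)   # shape (n, K): predicted class probabilities
    row_costs = cost[labels]  # shape (n, K): cost row for each sample's true class
    # Expected cost per sample = sum_j cost[y, j] * p_j, then average over the batch
    return float(np.mean(np.sum(row_costs * probs, axis=1)))


# Hypothetical two-class example: class 1 is the rare class (e.g. fraud),
# so predicting class 0 for a true class-1 sample costs 10x more.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])
logits = np.array([[2.0, 0.0],   # leans towards class 0, true class is 0
                   [2.0, 0.0]])  # same prediction, but the true class is 1
labels = np.array([0, 1])
print(expected_cost_loss(logits, labels, cost))
```

Because the same wrong prediction is penalized far more on the minority-class sample, minimizing this loss pushes a model to pay more attention to the rare class, which is the general motivation behind cost-sensitive learning on imbalanced data.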

Publication
2018 International Joint Conference on Neural Networks (IJCNN) - 2018 Proceedings
Shirui Pan
Professor | ARC Future Fellow

My research interests include data mining, machine learning, and graph analysis.