The I/O bottleneck has been acknowledged as one of the main performance issues of high performance computing (HPC) systems for data-intensive scientific applications, and has been studied intensively in recent years. With the widening gap between computing bandwidth and I/O bandwidth in projected next-generation HPC systems, this issue will become even worse. In this paper, we present a novel decoupled I/O to address the fundamental I/O bottleneck issue. The decoupled I/O is a software stack including MPI extensions, compiler improvements, and runtime library support, based on a decoupled HPC system architecture. It allows users to treat the computation of data-intensive operations and the traditional I/O operations as an ensemble and offload them to dedicated data nodes, which are near the data source, to reduce the overhead of data movement and improve I/O bandwidth usage. The decoupled I/O is user-friendly and requires few changes to application code. Experiments were conducted to evaluate the performance of the decoupled I/O, and the results show that it outperforms existing solutions (such as active storage I/O) and provides an attractive I/O solution for data-intensive high performance computing.