Chips equipped with numerous simple cores and heterogeneous computing resources have become mainstream in current supercomputer design. However, for many real-world scientific applications, off-the-shelf parallel programming models cannot adapt effectively to such architectures, which makes both programming and exploiting system performance challenging. To address this problem, the Codelet model, a fine-grained, event-driven program execution model based on dataflow, has been proposed. By providing runtime support between the system interfaces and Codelet-based applications, fine-grained parallelism can be exploited and high utilization of computing resources can be obtained. In this paper, we design and implement a dataflow-based runtime system, SunwayFlow, on a real 100-PFlops system, the Sunway TaihuLight, which is the world's most powerful supercomputer at the time of writing, providing a user-friendly and promising way to fully utilize this machine. To evaluate the efficiency of SunwayFlow, we use HPCG as a case study and port it to SunwayFlow. We carefully rewrite the main computing kernels of HPCG, especially the most time-consuming and intricate one, the symmetric Gauss-Seidel relaxation, for which we achieve a speedup of 11.79X. Moreover, the overall HPCG performance reaches 2.47 GFlops on a single core group and 534.98 GFlops on 256 core groups.
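The event-driven firing rule underlying Codelet-style dataflow models can be illustrated with a minimal sketch: each codelet keeps a counter of unsatisfied dependencies, becomes ready when that counter reaches zero, and on completion signals its successors. The names `Codelet`, `then`, and `run_graph` are hypothetical and chosen for illustration only; they are not the SunwayFlow interface. The sketch also assumes a single-threaded ready queue, whereas a real runtime would dispatch ready codelets to many cores concurrently.

```python
from collections import deque


class Codelet:
    """Hypothetical codelet: a non-preemptive unit of work that fires
    only after all of its input dependencies have been signalled."""

    def __init__(self, name, body):
        self.name = name
        self.body = body          # work executed when the codelet fires
        self.remaining = 0        # number of unsatisfied dependencies
        self.successors = []      # codelets to signal on completion

    def then(self, other):
        """Add a dependency edge self -> other; other now waits on self."""
        self.successors.append(other)
        other.remaining += 1
        return other


def run_graph(codelets):
    """Single-threaded simulation of the dataflow firing rule.

    Codelets with no pending dependencies start in the ready queue.
    Completing a codelet decrements each successor's counter, and a
    successor whose counter reaches zero becomes ready to fire.
    Returns the names of the codelets in the order they fired."""
    ready = deque(c for c in codelets if c.remaining == 0)
    fired = []
    while ready:
        c = ready.popleft()
        c.body()
        fired.append(c.name)
        for s in c.successors:
            s.remaining -= 1
            if s.remaining == 0:
                ready.append(s)
    return fired


# Usage: a diamond-shaped graph A -> {B, C} -> D.
log = []
a = Codelet("A", lambda: log.append("A"))
b = Codelet("B", lambda: log.append("B"))
c = Codelet("C", lambda: log.append("C"))
d = Codelet("D", lambda: log.append("D"))
a.then(b)
a.then(c)
b.then(d)
c.then(d)
order = run_graph([a, b, c, d])
print(order)  # ['A', 'B', 'C', 'D']
```

Incrementing the successor's counter whenever an edge is added keeps the dependency counts consistent with the graph by construction. A codelet whose predecessors never complete simply never fires, which is how the model enforces data readiness without explicit barriers.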