The growth of data from the many applications running in clouds poses significant challenges to cloud storage systems. Delivering the best possible storage and I/O performance often requires understanding and leveraging the I/O characteristics of data accesses. A number of research studies have addressed this topic. However, most of them either use a limited number of data-access attributes, which restricts the general applicability of the method across applications, or rely heavily on domain knowledge and expertise about applications' I/O behaviors to select the most representative features, which introduces bias toward certain workloads. To overcome these limitations, we present a new I/O characteristic discovery methodology. The method captures as many data-access features as possible to eliminate human bias. It uses a machine-learning-based strategy to automatically derive the most important set of features, and it groups data objects with a clustering algorithm (DBSCAN) to reveal their I/O characteristics. The revealed I/O characteristics can guide I/O performance optimizations in numerous scenarios, such as data prefetching and data reorganization in cloud storage systems.
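As a rough illustration of the two-stage idea summarized above (automatic feature selection followed by DBSCAN clustering of data objects), the following minimal sketch may help. It is not the method evaluated in this study: the feature columns, the sample values, the `eps` and `min_pts` settings, and the variance-based selector standing in for the machine-learning-based strategy are all assumptions made for illustration only.

```python
import math
from statistics import pvariance


def select_features(rows, k):
    """Keep the k columns with the highest variance after min-max scaling.

    Assumption: this simple ranking is only a stand-in for the
    machine-learning-based feature selection described in the text.
    """
    n_cols = len(rows[0])
    scaled_cols = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Constant columns carry no information and scale to all zeros.
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    ranked = sorted(range(n_cols), key=lambda j: pvariance(scaled_cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[scaled_cols[j][i] for j in keep] for i in range(len(rows))]
    return keep, reduced


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query; the point itself is included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # core point: keep growing the cluster
    return labels


# Hypothetical per-object access features:
# [read ratio, mean request size in KB, a constant attribute]
rows = [
    [0.95, 4.0, 1.0],
    [0.90, 5.0, 1.0],
    [0.92, 4.5, 1.0],
    [0.10, 128.0, 1.0],
    [0.12, 120.0, 1.0],
    [0.08, 124.0, 1.0],
    [0.50, 60.0, 1.0],
]

keep, reduced = select_features(rows, k=2)
labels = dbscan(reduced, eps=0.2, min_pts=3)
print("selected feature columns:", keep)
print("cluster labels:", labels)
```

In this toy input, the constant third column is dropped by the selector, and objects with similar access behavior end up in the same cluster, while an object that sits in no dense region is labeled as noise. A real deployment would substitute the actual collected data-access features and a tuned selection model.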