Georeferenced social media data streams (social media geostreams) offer promising opportunities to gain new insights into spatiotemporal aspects of human interactions in cyberspace and their relation to real-world activities. In particular, such opportunities are motivating public health researchers to improve the surveillance of disease epidemics through spatiotemporal analysis of social media geostreams. One essential requirement for such geostream-based disease surveillance is a scalable data infrastructure capable of transforming massive geostreams in real time into spatiotemporally organized data to which analytical methods are readily applicable. To fulfill this requirement, this study develops a data pipeline solution in which multiple computational components are integrated to collect, process, and aggregate social media geostreams in near real time. As a test case, the solution focuses on one well-known social media geostream, the Twitter data stream, and one type of disease epidemic, the flu. The pipeline facilitates multiscale spatiotemporal analysis of flu risks by collecting geotagged tweets from the Twitter Streaming API, identifying flu-related tweets through keyword matching, aggregating tweets at multiple spatial granularities in near real time, and storing the tweets and aggregate statistics in a distributed NoSQL database. Although developed for the surveillance of flu epidemics, the pipeline could serve as a general framework for building scalable data infrastructures that support real-time spatiotemporal analysis of social media geostreams in application domains beyond disease mapping and public health.
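
The keyword-matching and multiscale aggregation steps named above can be illustrated with a minimal in-memory sketch. It is not the study's implementation: the keyword list, the grid cell sizes, the tweet record layout (`text`, `lat`, `lon`), and the function names are all assumptions made for illustration, and collection from the Twitter Streaming API and storage in a NoSQL database are left out.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; the source does not state the actual keywords.
FLU_KEYWORDS = ("flu", "influenza")

# Hypothetical grid cell sizes in degrees, standing in for
# "multiple spatial granularities" (coarse first, then fine).
CELL_SIZES_DEG = (1.0, 0.1)


def is_flu_related(text):
    """Return True if any keyword occurs as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(keyword in words for keyword in FLU_KEYWORDS)


def cell_of(lat, lon, size):
    """Map a coordinate to the integer index of its square grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def aggregate(tweets):
    """Count flu-related tweets per grid cell at every configured cell size."""
    counts = {size: Counter() for size in CELL_SIZES_DEG}
    for tweet in tweets:
        if not is_flu_related(tweet["text"]):
            continue
        for size in CELL_SIZES_DEG:
            counts[size][cell_of(tweet["lat"], tweet["lon"], size)] += 1
    return counts


# Made-up sample records; a real pipeline would consume a live stream.
sample_tweets = [
    {"text": "Home with the flu today", "lat": 40.71, "lon": -73.99},
    {"text": "Influenza season is here", "lat": 40.85, "lon": -73.97},
    {"text": "Great pizza downtown", "lat": 40.73, "lon": -74.00},
    {"text": "Flu shot done", "lat": 34.05, "lon": -118.24},
]

flu_counts = aggregate(sample_tweets)
for size, cells in flu_counts.items():
    print(size, dict(cells))
```

Whole-word matching is used here so that words such as "fluent" are not counted. In a streaming setting the same counters would be updated incrementally as each tweet arrives, typically keyed by a time window as well as a cell.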