Web services such as online banking, gaming, and social networking have evolved rapidly, as has people's reliance on them for everyday tasks. As a result, a large amount of information is uploaded to the Web every day. The openness of the Web gives criminals the opportunity to upload malicious content. Despite extensive research, email-based spam filtering techniques are unable to protect other web services. A countermeasure is therefore needed that generalizes across web services to protect users from phishing hosts. This paper describes an approach that automatically classifies URLs based on their lexical and host-based features. The usability of Mahout is demonstrated for scalable machine learning problems of this kind, and online learning is considered in preference to batch learning. The classifier achieves 93-97% accuracy, detecting a large number of phishing hosts while maintaining a modest false positive rate. The raw data is examined, and the effectiveness of various feature subsets is assessed. The relevance of bigrams is assessed and corroborated using the chi-squared and information gain attribute evaluation methods.
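To make the idea of classifying URLs from lexical features with an online learner concrete, the following is a minimal sketch and not the paper's implementation. It assumes that lexical features are binary tokens taken from the host name and path, that bigrams are pairs of adjacent tokens, and that the online learner is a plain logistic regression updated one example at a time. The function `lexical_features`, the class `OnlineLogisticRegression`, the feature-name prefixes, and the two example URLs are hypothetical; the class is a pure-Python stand-in and does not reproduce Mahout's API. Host-based features are omitted because they require external lookups.

```python
import math
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return a set of binary lexical features for a URL (assumed scheme)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    host_tokens = [t for t in re.split(r"[.\-]", host) if t]
    path_tokens = [t for t in re.split(r"[/._\-]", path) if t]

    feats = {"host:" + t for t in host_tokens}
    feats |= {"path:" + t for t in path_tokens}
    # Bigrams are assumed here to be pairs of adjacent tokens.
    feats |= {"host2:" + a + "_" + b
              for a, b in zip(host_tokens, host_tokens[1:])}
    feats |= {"path2:" + a + "_" + b
              for a, b in zip(path_tokens, path_tokens[1:])}
    return feats


class OnlineLogisticRegression:
    """Logistic regression trained by per-example gradient steps."""

    def __init__(self, learning_rate=0.1):
        self.weights = {}  # sparse weights, keyed by feature name
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, feats):
        z = self.bias + sum(self.weights.get(f, 0.0) for f in feats)
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, label):
        # One stochastic gradient step on the log-loss for one example.
        error = label - self.predict_proba(feats)
        self.bias += self.learning_rate * error
        for f in feats:
            self.weights[f] = (self.weights.get(f, 0.0)
                               + self.learning_rate * error)


# Hypothetical labelled stream: 1 = phishing, 0 = benign.
stream = [
    ("http://secure-login.example.test/account/verify", 1),
    ("http://www.example.org/docs/index.html", 0),
]

model = OnlineLogisticRegression()
for _ in range(20):
    for url, label in stream:
        model.update(lexical_features(url), label)
```

Because each update touches only the features present in the current URL, the model can be trained on a stream without holding the full data set in memory, which is the usual motivation for preferring online learning to batch learning at this scale.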