Although the spread of seasonal influenza is a spatio-temporal phenomenon, public surveillance systems aggregate data only spatially and are rarely predictive. We develop a machine learning tool based on hierarchical clustering that anticipates flu spread patterns from past spatio-temporal flu activity, using historical influenza-related emergency department (ED) records as a proxy for flu prevalence. Instead of conventional geographic clustering of hospitals, the analysis clusters hospitals by both the spatial and the temporal distance between their flu peaks, and uses these clusters to build a network showing whether flu spreads between each pair of clusters (direction) and how long that spread takes (magnitude). To overcome data sparsity, we take a model-free approach, treating the hospital clusters as a fully connected network in which arcs indicate flu transmission. We then perform predictive analysis on each cluster's time series of flu-related ED visits to determine the direction and magnitude of flu spread. Detecting recurrent spatio-temporal patterns may help policymakers and hospitals better prepare for outbreaks. We apply the tool to Ontario, Canada, using a five-year historical dataset of daily flu-related ED visits. In addition to the expected flu spread between major cities and airport regions, the analysis reveals previously unsuspected patterns of spread between non-major cities, providing new insights for public health officials. We also show that spatial clustering outperforms temporal clustering in predicting the direction of spread (81% for spatial vs. 71% for temporal), whereas temporal clustering performs far better in predicting the magnitude of the time lag (70% for temporal vs. 20% for spatial).
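The clustering-and-lag idea can be illustrated with a short Python sketch. It is not the authors' implementation. The distance function (a weighted sum of normalized kilometres and days, with hypothetical parameters `alpha`, `km_scale` and `day_scale`), the naive average-linkage clustering, and the use of lagged Pearson cross-correlation to read off direction (the sign of the lag) and magnitude (its absolute value) are all assumptions chosen for illustration. Hospital records are assumed to be `(x_km, y_km, peak_day)` tuples, and daily ED-visit series are assumed to be plain lists of equal length.

```python
import math
from itertools import combinations

# Hypothetical hospital record: (x_km, y_km, peak_day), i.e. projected
# coordinates plus the day on which that hospital's flu ED visits peaked.


def peak_distance(a, b, alpha=0.5, km_scale=100.0, day_scale=7.0):
    """Weighted sum of normalized spatial (km) and temporal (days) distance.

    alpha=1.0 gives a purely spatial distance, alpha=0.0 a purely temporal
    one. alpha, km_scale and day_scale are illustrative assumptions.
    """
    spatial = math.hypot(a[0] - b[0], a[1] - b[1]) / km_scale
    temporal = abs(a[2] - b[2]) / day_scale
    return alpha * spatial + (1 - alpha) * temporal


def average_linkage(points, n_clusters, dist=peak_distance):
    """Naive agglomerative (average-linkage) clustering; returns index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
            avg = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def cluster_series(series, members):
    """Sum the daily ED-visit series of the hospitals in one cluster."""
    return [sum(day) for day in zip(*(series[m] for m in members))]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def best_lag(src, dst, max_lag=14):
    """Lag (days) maximizing corr(src[t], dst[t + lag]), plus that correlation.

    A positive lag suggests flu activity moves src -> dst (direction) and
    takes |lag| days (magnitude); a negative lag suggests dst -> src.
    """
    best_l, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = src[:len(src) - lag], dst[lag:]
        else:
            x, y = src[-lag:], dst[:len(dst) + lag]
        r = pearson(x, y)
        if r > best_r:
            best_l, best_r = lag, r
    return best_l, best_r


if __name__ == "__main__":
    # Toy data: two nearby early-peaking hospitals, two distant later ones.
    hospitals = [(0, 0, 40), (10, 5, 42), (300, 200, 60), (310, 190, 61)]
    groups = average_linkage(hospitals, 2)
    print(groups)  # [[0, 1], [2, 3]]
    # Synthetic bell-shaped daily visit curves centred on each peak day.
    series = [[100 * math.exp(-((t - p) / 10) ** 2) for t in range(120)]
              for (_, _, p) in hospitals]
    a, b = (cluster_series(series, g) for g in groups)
    print(best_lag(a, b, max_lag=30))
```

Setting `alpha` to 1.0 or 0.0 yields a purely spatial or purely temporal clustering, loosely mirroring the two variants compared in the abstract. For real data, `scipy.cluster.hierarchy.linkage` (with `method="average"`) would be a more practical choice than the O(n³) loop shown here.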