Here are some projects I'm working on.

Detecting extreme traffic events via a context augmented graph autoencoder

event map

Accurate and timely detection of large events on urban transportation networks enables informed mobility management. This work tackles the problem of extreme event detection on large-scale transportation networks using origin-destination mobility data, which is now widely available. Such data is highly structured in time and space, but also high-dimensional and sparse. Current multivariate time series anomaly detection methods cannot fully address these challenges. To exploit the structure of mobility data, we formulate the event detection problem in a novel way, as detecting anomalies in a set of time-dependent directed weighted graphs. We further propose a Context augmented Graph Autoencoder (Con-GAE) model to solve the problem, which leverages graph embedding and context embedding techniques to capture the spatial and temporal patterns. Con-GAE adopts an autoencoder framework and detects anomalies via semi-supervised learning.
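As a rough illustration of the graph formulation (not of the Con-GAE model itself), the sketch below groups origin-destination records into one directed weighted graph per time slot. The record layout `(time_slot, origin, destination, travel_time)`, the function name `build_snapshots`, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def build_snapshots(records):
    """Group OD records into one directed weighted graph per time slot.

    records: iterable of (time_slot, origin, destination, travel_time).
    Returns {time_slot: {(origin, destination): mean travel time}}.
    The record layout is a hypothetical one chosen for this sketch.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for slot, origin, dest, travel_time in records:
        samples[slot][(origin, dest)].append(travel_time)
    return {
        slot: {edge: mean(times) for edge, times in edges.items()}
        for slot, edges in samples.items()
    }


# Made-up records: two time slots over zones "A" and "B".
records = [
    (0, "A", "B", 300.0),
    (0, "A", "B", 340.0),
    (0, "B", "A", 410.0),
    (1, "A", "B", 900.0),
]
snapshots = build_snapshots(records)
```

Edges are directed, so A to B and B to A carry separate weights, and an edge with no observations in a slot is simply absent, which is one way the sparsity mentioned above shows up.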

The performance of the method is assessed on several city-scale travel time datasets from Uber Movement, New York taxi and Chicago taxi, and compared to state-of-the-art approaches. The proposed Con-GAE achieves an improvement of 0.1 to 0.4 in the area under the curve (AUC) score compared to the baselines. We also discuss real-world traffic anomalies detected by Con-GAE.
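For readers unfamiliar with the AUC score, here is a minimal sketch of how it can be computed from anomaly scores and ground-truth labels. It uses the pairwise definition: the fraction of (anomalous, normal) pairs in which the anomalous sample receives the higher score, with ties counted as half. The labels and scores are invented for illustration and are not results from this work.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for anomalous samples, 0 for normal samples.
    scores: anomaly scores, where higher means more anomalous.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    # Count pairs ranked correctly; a tie contributes half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five time slots, two of them anomalous.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8]
auc = auc_score(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a gain of 0.1 to 0.4 is large on this scale.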

Publication and Products:

  • Y. Hu, A. Qu, D. Work. "Detecting extreme traffic events via a context augmented graph autoencoder," in submission. Download: Manuscript. Code: github.
  • Robust Tensor Recovery with Fiber Outliers for Traffic Events

    colored road map of Nashville

    This research project focuses on applying machine learning and optimization techniques to detect extreme events in urban transportation systems. Motivated by fast urbanization and the increasing frequency of extreme weather events, the need for methods to quantify infrastructure performance and resilience at city scales has become a priority. Research on extreme events can be greatly aided by the high volume of empirical data collected in recent years, such as the large taxi dataset published by New York City, or the Waze app dataset collected in Nashville. Such data may be sparse and is in some cases masked. However, the sheer volume of data from various sources provides an underexploited starting point for understanding how transportation systems respond to disruptions.

    This project aims to find a method that can overcome the constraints of high-volume traffic data and distinguish "extreme" behaviors from "regular" behaviors. Exploiting the regular patterns helps here: because traffic repeats on daily and weekly cycles, we can rearrange the traffic data into higher-dimensional arrays (tensors). The purpose of the project is to develop a reliable algorithm to analyze massive city traffic data in tensor format, and to give better insight into traffic patterns and extreme event behavior.
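The idea of rearranging traffic data into a tensor can be sketched as follows. This is only an illustration of the folding step, not the robust tensor recovery algorithm; the function name `fold_to_tensor`, the choice of a 4-step cycle, and the sample values are assumptions made for the example.

```python
def fold_to_tensor(series, period):
    """Fold per-location time series into a 3-way tensor.

    series: list of equal-length lists, one per road segment.
    period: number of time steps in one repeating cycle (e.g. a day).
    Returns tensor[location][cycle][step_in_cycle] as nested lists.
    """
    tensor = []
    for values in series:
        if len(values) % period != 0:
            raise ValueError("series length must be a multiple of period")
        tensor.append(
            [values[i:i + period] for i in range(0, len(values), period)]
        )
    return tensor


# Two hypothetical road segments observed over 3 cycles of 4 steps each.
series = [
    [10, 30, 30, 10, 10, 30, 30, 10, 10, 90, 30, 10],
    [5, 15, 15, 5, 5, 15, 15, 5, 5, 15, 15, 5],
]
tensor = fold_to_tensor(series, period=4)
```

After folding, the same step of each cycle lines up across cycles, so the regular pattern appears as repeated rows and a deviation (the value 90 in the third cycle of the first segment) stands out against them.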


  • National Science Foundation.
  • Publication and Products:

  • Y. Hu and D. Work. "Robust Tensor Recovery with Fiber Outliers for Traffic Events." Accepted by ACM Transactions on Knowledge Discovery from Data (TKDD), 2020. Download: preprint; Manuscript. Code: github.
  • Automatic Data Cleaning for Urban Sensor Networks

    aot node and recovered map

    Low-cost urban sensing networks enhance our understanding of cities and urban life. The impacts of mitigation strategies in communities can be measured at a fine-grained scale, informing context-aware policies and infrastructure design. However, fine-grained city-scale data analysis is complicated by common, tedious data cleaning tasks such as removing outliers and imputing missing data. To address the challenge of data cleaning, this project applies a robust low-rank tensor factorization method to automatically correct anomalies and impute missing entries in high-dimensional urban environmental datasets.

    Furthermore, to address the challenge of data with large spatial-temporal scales and patterns that shift over time, we propose an online robust tensor recovery (OLRTR) method to preprocess streaming high-dimensional urban environmental datasets. OLRTR handles the data sequentially in minibatches, ensuring computational and memory efficiency in streaming systems.

    The method is applied to the Array of Things (AoT) city-scale sensor network. Located in the City of Chicago, IL, AoT collects real time data on the city's environment and activity with more than 90 nodes. Further analysis of AoT data and its broader usages are also under way.

    Publication and Products:

  • Y. Hu, Y. Wang, C. Jiao, R. Sankaran, C. E. Catlett, D. Work. "Automatic data cleaning via tensor factorization for large urban environmental sensor networks." Tackling Climate Change with Machine Learning Workshop at NeurIPS, 2019. Download: Manuscript. Code: github.
  • Y. Hu, A. Qu, Y. Wang, D. Work. "Streaming data preprocessing via online tensor recovery for large environmental sensor networks," in submission. Code: github.
  • Quantifying traffic due to potential COVID-19 commute mode shifts

    rebound calculator

    This work was done in cooperation with teams from Cornell and UT Austin to quantify the sensitivity of commute travel times in about a hundred US metro areas to potential post-COVID-19 shifts in commute patterns from transit and carpooling to single-occupancy vehicles (SOV). A Bayesian regression model is applied to US census data to relate commute travel time to the number of passenger vehicles. The findings have been covered by more than 30 media outlets, including Reuters, Bloomberg CityLab, CBS News (also here), the New York Post, and others.

    The findings of The Rebound study are brought together in the Rebound Calculator. The tool estimates one-way commute travel times using models built around recent commuting data for most U.S. metro areas. As the number of vehicles on the roads increases, so too does the travel time, according to a standard traffic relationship known as the BPR (Bureau of Public Roads) model. Travel times therefore increase if existing commuters switch from transit or carpool to single-occupancy vehicles, and decrease if fewer vehicles are on the road due to unemployment or remote work. How mode shift could affect travel times is particularly important in the era of COVID-19, as it could impose high costs on commuters through increased time spent in traffic.
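To make the BPR relationship concrete, the sketch below uses the standard form of the function, t = t0 * (1 + alpha * (v / c) ** beta), with the commonly used default parameters alpha = 0.15 and beta = 4. It is not the Rebound Calculator's fitted model, and the free-flow time, volumes, and capacity are hypothetical numbers chosen for illustration.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Travel time under the standard BPR volume-delay function.

    free_flow_time: travel time with no congestion (any time unit).
    volume, capacity: vehicles per unit time, in the same units.
    alpha, beta: shape parameters; 0.15 and 4 are the usual defaults.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


# Hypothetical corridor: 20-minute free-flow commute, capacity 1000 veh/h.
baseline = bpr_travel_time(20.0, volume=1000, capacity=1000)
# Same corridor if a mode shift adds 20% more vehicles.
shifted = bpr_travel_time(20.0, volume=1200, capacity=1000)
extra_minutes = shifted - baseline
```

With these made-up inputs the commute grows from about 23.0 to about 26.2 minutes. Because volume enters through a fourth power, a 20% rise in vehicles adds roughly 14% to the travel time, and the penalty grows quickly as volume exceeds capacity.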

    Publication and Products:

  • Yue Hu, Will Barbour, Kun Qian, Christian Claudel, Samitha Samaranayake, and Dan Work, "Quantifying traffic due to potential Covid-19 commute mode shifts," under preparation, 2019. Blog: The rebound. Download: preprint.