Problems

Ongoing Research Themes

[T1] High-dimensional time series analysis

[T2] City-wide traffic state estimation and forecasting from partially observed data

  • Spatiotemporal traffic data collected from transportation systems are inevitably incomplete for many reasons, including sensor malfunction, communication failure, and sparse sensing. For example, the Uber Movement project gives access to many urban traffic speed data sets from around the world. However, these data sets are both sparse and high-dimensional. In such cases, missing data pose both methodological and practical challenges and make it difficult to recover the true signals from the data. In this research, we develop temporal matrix factorization frameworks (see technical questions) for missing traffic data imputation and for multi-step traffic forecasting in the presence of missing values.
  • Challenges:
    • Incompleteness & sparsity (insufficient sampling of ridesharing vehicles on the urban road network). Poor data quality undermines the credibility of any analysis based on such data.
    • High dimensionality (thousands of road segments at the city-wide scale).
    • Nonstationarity (periodicity & seasonality). The statistical properties of nonstationary signals change over time.
  • We are interested in several critical questions:
    • How can we manipulate and learn from partially observed data?
    • How can we identify traffic dynamics and patterns from partially observed data?
    • How can we forecast by making full use of the traffic dynamics learned from incomplete data?
  • The benefits of city-wide traffic state estimation:
    • Supporting solutions for congested traffic in urban areas (e.g., efficient traffic signal control schemes).
    • Providing trip planning suggestions for drivers.
    • Alerting on special events (e.g., traffic accidents).
    • Deploying stationary sensors (e.g., loop detectors, cameras) across the whole urban road network is costly and inefficient because a large number of road segments carry limited traffic.
  • Mobile sensors: In the past few years, the availability of GPS trajectory data from ridesharing vehicles has given transportation researchers and practitioners more opportunities to monitor urban traffic states. However, the limited penetration of ridesharing vehicles in total traffic usually results in partially observed data.
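
The imputation-then-forecasting idea in this theme can be illustrated with a minimal sketch. It assumes a simplified two-stage variant: a regularized low-rank factorization fitted by alternating least squares on the observed entries only, followed by a first-order vector autoregression fitted to the temporal factors. Temporal matrix factorization frameworks typically learn both parts jointly, so this is not the method developed in this research. The function names (`masked_als`, `var1_forecast`), the synthetic speed-like data, and all hyperparameters are hypothetical.

```python
import numpy as np

def masked_als(Y, mask, rank=2, lam=1e-3, n_iter=100, seed=0):
    """Factorize Y (N x T) as W @ X using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    W = 0.1 * rng.standard_normal((N, rank))   # spatial factors
    X = 0.1 * rng.standard_normal((rank, T))   # temporal factors
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each location's factor from the time steps observed there.
        for i in range(N):
            obs = mask[i]
            Xo = X[:, obs]
            W[i] = np.linalg.solve(Xo @ Xo.T + reg, Xo @ Y[i, obs])
        # Update each time step's factor from the locations observed then.
        for t in range(T):
            obs = mask[:, t]
            Wo = W[obs]
            X[:, t] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ Y[obs, t])
    return W, X

def var1_forecast(X, steps):
    """Fit X[:, t] = A @ X[:, t-1] by least squares and roll it forward."""
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    preds, x = [], X[:, -1]
    for _ in range(steps):
        x = A @ x
        preds.append(x)
    return np.stack(preds, axis=1)             # shape (rank, steps)

# Synthetic example (assumed data): 30 road segments, 120 hourly steps,
# a daily cycle, and roughly half of the entries missing at random.
rng = np.random.default_rng(1)
N, T = 30, 120
t = np.arange(T)
X_true = np.vstack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)])
Y_full = rng.standard_normal((N, 2)) @ X_true
mask = rng.random((N, T)) > 0.5
Y = np.where(mask, Y_full, 0.0)                # zeros are placeholders only

W, X = masked_als(Y, mask, rank=2)
Y_hat = W @ X                                  # imputed matrix
rmse = np.sqrt(np.mean((Y_hat[~mask] - Y_full[~mask]) ** 2))
Y_next = W @ var1_forecast(X, steps=6)         # 6-step-ahead forecast
print(f"imputation RMSE on missing entries: {rmse:.4f}")
```

The mask keeps missing entries out of every least-squares update, so the placeholder zeros never influence the fit. On real city-wide data the rank, the regularization weight, and the autoregressive order would all need to be tuned.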

[T3] Multidimensional time series data analysis

[T4] Multidimensional spatiotemporal data imputation

These ongoing research themes are highly related to our prior work. If you want to take a look at our past research, please check out our publication list.

A Collection of Supplementary Material

Multivariate time series imputation

Multivariate time series forecasting

  • Short time series data. One intuitive example is the daily passenger flow at a railway station during a special period such as Chinese New Year (about two to three weeks). How can machine learning models such as Hankel-structured low-rank matrix completion be applied to forecast short time series? How can they be generalized to the multivariate case?
  • One case of multidimensional short time series data comes from a competition on sales forecasting for the passenger vehicle segment market. The data contain the monthly sales of 60 types of vehicles in 22 provinces of China over two full years (2016 and 2017), so they can be represented as a 60-by-22-by-24 tensor.
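
To make the Hankel idea for short series concrete, here is a rough univariate sketch. It assumes a simple heuristic: append the unknown future values to the series, embed it in a Hankel matrix, and alternate between a truncated SVD (low-rank step) and anti-diagonal averaging (Hankel step) while keeping the observed values fixed. The helper names, the window length, the rank, and the three-week toy series with a weekly cycle are hypothetical choices. This is one of several ways to pose Hankel-structured completion and not necessarily the formulation intended above.

```python
import numpy as np

def hankelize(x, window):
    """Build the (window, len(x) - window + 1) Hankel matrix H[i, j] = x[i + j]."""
    n_cols = len(x) - window + 1
    return np.stack([x[i:i + n_cols] for i in range(window)])

def dehankelize(H):
    """Average the anti-diagonals of H back into a 1-D series."""
    window, n_cols = H.shape
    total = np.zeros(window + n_cols - 1)
    count = np.zeros(window + n_cols - 1)
    for i in range(window):
        total[i:i + n_cols] += H[i]
        count[i:i + n_cols] += 1
    return total / count

def hankel_forecast(x, steps, window, rank, n_iter=300):
    """Forecast `steps` values by low-rank completion of the Hankel matrix."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Initialize the unknown future values with the historical mean.
    z = np.concatenate([x, np.full(steps, x.mean())])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(hankelize(z, window), full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the forecast horizon is overwritten; observations stay fixed.
        z[n:] = dehankelize(low_rank)[n:]
    return z[n:]

# Toy example (assumed data): three weeks of daily values with a weekly cycle.
t = np.arange(24)
series = 10 + 3 * np.sin(2 * np.pi * t / 7)
history, truth = series[:21], series[21:]

forecast = hankel_forecast(history, steps=3, window=8, rank=3)
print("forecast:", np.round(forecast, 3))
print("truth:   ", np.round(truth, 3))
```

A constant plus one sinusoid gives a Hankel matrix of rank 3, which is why `rank=3` is used here; real passenger flow data would be noisy, and the rank and window would have to be selected, for example by validation on held-out days. One possible multivariate extension, again only as an assumption, is to stack the Hankel matrices of several series side by side before the low-rank step.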