Problems
Ongoing Research Themes
[T1] High-dimensional time series analysis
- High-dimensional time series data, a classical and representative type of multivariate time series data in which the number of series (vastly) exceeds the number of time points, pose both methodological and practical challenges. In this research, we develop high-dimensional time series models that address these challenges and support real-world applications (e.g., intelligent transportation systems).
- One example is the Kaggle web traffic time series forecasting competition, whose data set consists of approximately 145k time series; the first-place solution is available at https://github.com/Arturus/kaggle-web-traffic. There are also some good tutorials on this data set, including "Wikipedia traffic data exploration" and "Wiki Traffic Forecast Exploration - WTF EDA".
- References:
- Reduced-rank regression
- Raja P. Velu, Gregory C. Reinsel, Dean W. Wichern (1986). Reduced rank models for multiple time series. Biometrika, 73(1): 105-118.
- Alan Julian Izenman (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2): 248-264.
- Raja Velu, Gregory C. Reinsel (1998). Multivariate reduced-rank regression: Theory and applications. Springer.
- Yiyuan She, Kun Chen (2017). Robust reduced-rank regression. Biometrika, 104(3): 633-647.
- Sparse multivariate factor regression, arXiv preprint 2015.
- Learning low-rank deep neural networks via singular vector orthogonality regularization and singular value sparsification, CVPR 2020.
- Solving high-dimensional parabolic PDEs using the tensor train format, ICML 2021.
- State-space model
- James Durbin, Siem Jan Koopman (2014). Time series analysis by state space methods.
- Shane Barratt, Yining Dong, Stephen Boyd (2021). Low rank forecasting. arXiv:2101.12414.
- Deep learning model
- Yoshimasa Uematsu, Yingying Fan, Kun Chen, Jinchi Lv, Wei Lin (2019). SOFAR: Large-scale association network learning. IEEE Transactions on Information Theory, 65(8): 4924-4939.
- Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting, arXiv preprint, 2019.
- Diffusion convolutional recurrent neural network: Data-driven traffic forecasting
- Zhenyu Liao (2019). A random matrix framework for large dimensional machine learning and neural networks. PhD defense slides.
- Vector autoregressive model
- High-dimensional vector autoregressive time series modeling via tensor decomposition, Journal of the American Statistical Association, 2021. [Julia code]
- Vector autoregressive moving average identification for macroeconomic modeling: A new methodology.
- Sumanta Basu, Xianqi Li, George Michailidis (2019). Low rank and structured modeling of high-dimensional vector autoregressions. IEEE Transactions on Signal Processing, 67(5): 1207-1222.
- Ines Wilms, Sumanta Basu, Jacob Bien, David S. Matteson (2021). Sparse identification and estimation of large-scale vector autoregressive moving averages. Journal of the American Statistical Association.
- Di Wang, Ruey S. Tsay (2021). Robust estimation of high-dimensional vector autoregressive models. arXiv:2107.11002.
- Peiliang Bai, Abolfazl Safikhani, George Michailidis (2021). Multiple change point detection in reduced rank high dimensional vector autoregressive models. arXiv:2109.14783.
- Questions:
- Reduced-rank regression
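As a starting point for the reduced-rank regression question, the following is a minimal sketch of the classical rank-constrained least-squares estimator (identity weighting) applied to a first-order vector autoregression. The function name `reduced_rank_regression`, the synthetic data, and all sizes are illustrative assumptions, not taken from any of the references above.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Minimize ||Y - X C||_F subject to rank(C) <= rank (identity weighting)."""
    # Unconstrained least-squares coefficients.
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Project onto the leading right singular vectors of the fitted values.
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    V_r = Vt[:rank].T
    return C_ols @ V_r @ V_r.T

# Synthetic VAR(1) data with a low-rank transition matrix (assumed setup).
rng = np.random.default_rng(0)
n_series, n_steps, true_rank = 20, 200, 2
A = rng.standard_normal((n_series, true_rank)) @ rng.standard_normal((true_rank, n_series))
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale to spectral radius 0.8

data = np.zeros((n_steps, n_series))
data[0] = rng.standard_normal(n_series)
for t in range(1, n_steps):
    data[t] = data[t - 1] @ A.T + 0.1 * rng.standard_normal(n_series)

# Regress each time step on the previous one; rows are time points.
X, Y = data[:-1], data[1:]
C_hat = reduced_rank_regression(X, Y, rank=true_rank)  # estimates A.T
```

The rank constraint reduces the number of free parameters from `n_series**2` to roughly `2 * n_series * rank`, which is what makes this family of models attractive when the number of series is large relative to the number of time points.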
[T2] City-wide traffic state estimation and forecasting from partially observed data
- Spatiotemporal traffic data collected from transportation systems are inevitably incomplete for many reasons, including sensor malfunction, communication failure, and sparse sensing. For example, the Uber Movement project gives access to many urban traffic speed data sets from around the world. However, these data sets are both sparse and high-dimensional. In such cases, missing data pose both methodological and practical challenges and make it difficult to recover the true signals from the data. In this research, we develop temporal matrix factorization frameworks (see technical questions) for missing traffic data imputation and multi-step traffic forecasting in the presence of missing values.
- Challenges:
- Incompleteness & sparsity (insufficient sampling of ridesharing vehicles on the urban road network). Poor data quality undermines the credibility of any analysis based on such data.
- High-dimensionality (thousands of road segments at a city-wide scale).
- Nonstationarity (periodicity & seasonality). The statistical properties of nonstationary signals change over time.
- We are interested in several critical questions:
- How can we manipulate and learn from partially observed data?
- How can we identify traffic dynamics and patterns from partially observed data?
- How can we perform forecasting by making full use of the traffic dynamics learned from incomplete data?
- The benefits of city-wide traffic state estimation:
- Finding solutions for congested traffic in urban areas (e.g., efficient traffic signal control schemes).
- Providing trip planning suggestions for drivers.
- Issuing alerts about special events (e.g., traffic accidents).
- Stationary sensors: Deploying stationary sensors (e.g., loop detectors, cameras) across the whole urban road network is costly and inefficient because a large number of road segments carry limited traffic.
- Mobile sensors: In the past few years, the emergence and availability of GPS trajectory data from ridesharing vehicles have given transportation researchers and practitioners more opportunities to monitor urban traffic states. However, the limited penetration of ridesharing vehicles in total traffic usually results in partially observed data.
- References:
- Shape and time distortion loss for training deep time series forecasting models. NeurIPS 2019.
- R. Dahlhaus (1997). Fitting time series models to nonstationary processes.
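The imputation idea behind this theme can be illustrated with a plain low-rank matrix factorization fitted only on observed entries by alternating least squares. This is a simplified sketch: it omits the temporal (autoregressive) regularization that a temporal matrix factorization framework would add, and the function name `als_impute`, the regularization weight, and the synthetic (road segment, time step) data are all assumptions made for illustration.

```python
import numpy as np

def als_impute(Y, mask, rank=3, lam=0.1, n_iters=50, seed=0):
    """Fit Y ~ W @ X.T on observed entries (mask == True) and return the full reconstruction."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    W = 0.1 * rng.standard_normal((n, rank))  # spatial factors, one row per road segment
    X = 0.1 * rng.standard_normal((T, rank))  # temporal factors, one row per time step
    ridge = lam * np.eye(rank)
    for _ in range(n_iters):
        # Update each spatial factor from the time steps observed on that segment.
        for i in range(n):
            obs = mask[i]
            Xo = X[obs]
            W[i] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ Y[i, obs])
        # Update each temporal factor from the segments observed at that time step.
        for t in range(T):
            obs = mask[:, t]
            Wo = W[obs]
            X[t] = np.linalg.solve(Wo.T @ Wo + ridge, Wo.T @ Y[obs, t])
    return W @ X.T

# Synthetic low-rank "traffic" matrix with about half of the entries missing.
rng = np.random.default_rng(1)
n_segments, n_steps, rank = 30, 100, 3
Y_true = rng.standard_normal((n_segments, rank)) @ rng.standard_normal((rank, n_steps))
mask = rng.random(Y_true.shape) < 0.5
Y_obs = np.where(mask, Y_true, np.nan)  # NaN marks unobserved entries

Y_hat = als_impute(Y_obs, mask, rank=rank)
missing = ~mask
rmse_als = np.sqrt(np.mean((Y_hat[missing] - Y_true[missing]) ** 2))
rmse_mean = np.sqrt(np.mean((Y_obs[mask].mean() - Y_true[missing]) ** 2))  # global-mean baseline
print(f"RMSE (factorization): {rmse_als:.3f}, RMSE (global mean): {rmse_mean:.3f}")
```

Because the factors are fitted only on observed entries, the same routine works unchanged for any missing pattern encoded in `mask`; forecasting would additionally require a model for how the temporal factors evolve.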
[T3] Multidimensional time series data analysis
- Multidimensional time series data are now ubiquitous in a variety of disciplines and fields ranging from social science to engineering. Examples in transportation systems include (station, station, time step)-formatted travel time data and (vehicle type, pickup location, dropoff location, time step)-formatted mobility flow data (e.g., NYC taxi data); all of these are intrinsically multidimensional and are tensors by nature. In this research, we develop multidimensional time series models to characterize the multidimensional properties of these time series data.
- We are interested in several critical questions:
- How can we build a multidimensional time series model that uses tensor structure?
- What are the advantages of multidimensional time series models?
- References:
- Autoregressive models for matrix-valued time series, Journal of Econometrics, 2020.
- Learning from multiway data: Simple and efficient tensor regression, ICML 2016.
- Online topology identification from vector autoregressive time series, IEEE Transactions on Signal Processing, 2021.
- Factor models for high-dimensional tensor time series, arXiv preprint, 2019.
- Tensor-train networks for learning predictive modeling of multidimensional data, arXiv preprint, 2021.
- Dynamic and multi-faceted spatio-temporal deep learning for traffic speed forecasting, KDD 2021. [Data & Python code]
- Spatial-temporal graph ODE networks for traffic flow forecasting, KDD 2021. [Data & Python code]
- Walmart Recruiting - Store Sales Forecasting: Use historical markdown data to predict store sales
- Koh Takeuchi, Hisashi Kashima, Naonori Ueda (2017). Autoregressive tensor factorization for spatio-temporal predictions. ICDM 2017.
- Other examples:
- In environmental science, gas emission data may include many kinds of measurements (e.g., CO2, SO2) and involve spatial and temporal information. There are many emission data sets in the UCI Machine Learning Repository.
- In retail, consumer demand for certain products with spatial and temporal information is multidimensional. (M5 Forecasting - Accuracy: Estimate the unit sales of Walmart retail goods)
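One way to see the potential advantage of keeping the multidimensional structure is to compare a bilinear matrix autoregression, X_t = A X_{t-1} B^T + E_t, with the equivalent vector autoregression on the vectorized data. The sketch below is illustrative only: the bilinear form is assumed here in the spirit of the matrix-valued autoregressive models listed in the references, and the dimensions and coefficient matrices are arbitrary.

```python
import numpy as np

def vec(M):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, n = 4, 3                               # e.g., 4 locations by 3 measurement types
A = 0.5 * rng.standard_normal((m, m))     # row-wise (left) coefficient matrix
B = 0.5 * rng.standard_normal((n, n))     # column-wise (right) coefficient matrix
X_prev = rng.standard_normal((m, n))      # matrix-valued observation at time t-1

# Noise-free one-step prediction in matrix form.
X_next = A @ X_prev @ B.T

# The same prediction as a VAR(1) on vec(X), using vec(A X B^T) = (B kron A) vec(X).
x_next_var = np.kron(B, A) @ vec(X_prev)
same_prediction = np.allclose(vec(X_next), x_next_var)

n_params_matrix_ar = m * m + n * n        # 25 coefficients
n_params_var = (m * n) ** 2               # 144 coefficients
print(same_prediction, n_params_matrix_ar, n_params_var)
```

The structured model is a special case of the vector autoregression with a Kronecker-structured coefficient matrix, so it trades flexibility for far fewer parameters, and the gap widens quickly as the dimensions grow.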
[T4] Multidimensional spatiotemporal data imputation
- References:
These ongoing research themes are highly related to our prior work. If you want to take a look at our past research, please check out our publication list.
A Collection of Supplementary Material
Multivariate time series imputation
Non-random missing (Data are missing not at random in the matrix/tensor.)
Blackouts (Missing values occur at some consecutive time points for all time series. One recent work is HKMF-T: Recover from blackouts in tagged time series with Hankel matrix factorization.)
Count-valued data (Many data sets are count-valued, e.g., passenger/traffic flow. How can we develop machine/statistical learning models for them? One online imputation work is "Online tensor decomposition and imputation for count data".)
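To make the difference between missing patterns concrete, the sketch below builds boolean observation masks for a (series, time step) matrix: one with entries missing at random and one with a blackout, where every series is missing over the same consecutive window. The helper names, sizes, and window position are hypothetical choices for illustration.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, rng):
    """True marks an observed entry; each entry is dropped independently."""
    return rng.random(shape) >= missing_rate

def blackout_mask(shape, start, length):
    """True marks an observed entry; all series are missing on [start, start + length)."""
    mask = np.ones(shape, dtype=bool)
    mask[:, start:start + length] = False
    return mask

rng = np.random.default_rng(0)
shape = (5, 48)                                   # 5 series, 48 time steps
mask_random = random_missing_mask(shape, 0.3, rng)
mask_blackout = blackout_mask(shape, start=20, length=6)

# In a blackout, no series carries information inside the window, so an
# imputation method cannot borrow values from other series at those time steps.
fully_missing_steps = np.flatnonzero(~mask_blackout.any(axis=0))
print(fully_missing_steps)                        # time steps 20 through 25
```

Masks like these can be applied to a complete data set to evaluate imputation methods under each pattern separately.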
Multivariate time series forecasting
- Short time series data (One intuitive example is daily passenger flow at railway stations during special periods such as Chinese New Year (about two to three weeks). How can machine learning models such as Hankel-structured low-rank matrix completion be applied to forecast short time series? How can they be generalized to the multivariate time series case?)
- One example of multidimensional short time series data comes from the competition on sales forecasting for the passenger vehicle segment market. The data contain the monthly sales of 60 kinds of vehicles in 22 provinces of China over two full years (2016 and 2017), so they can be represented as a 60-by-22-by-24 tensor.
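The tensor representation described above can be sketched as follows. The sales values here are randomly generated placeholders (the real competition data are not reproduced), and the long-format layout of (vehicle, province, month) records is an assumption about how such data might be stored.

```python
import numpy as np

n_vehicles, n_provinces, n_months = 60, 22, 24

# Placeholder long-format records: one row per (vehicle, province, month) triple.
rng = np.random.default_rng(0)
vehicle_idx, province_idx, month_idx = (
    axis.ravel() for axis in np.indices((n_vehicles, n_provinces, n_months))
)
sales = rng.poisson(100, size=vehicle_idx.size)   # synthetic monthly sales counts

# Scatter the records into a third-order tensor.
tensor = np.zeros((n_vehicles, n_provinces, n_months))
tensor[vehicle_idx, province_idx, month_idx] = sales

# Flatten the first two modes to get an ordinary multivariate time series:
# each row is the 24-month series of one (vehicle, province) pair.
series_matrix = tensor.reshape(n_vehicles * n_provinces, n_months)
print(tensor.shape, series_matrix.shape)          # (60, 22, 24) (1320, 24)
```

With the default row-major reshape, row `r` of `series_matrix` corresponds to vehicle `r // 22` and province `r % 22`, so results from a matrix-based model can be mapped back to the tensor indices.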