In the course of ongoing research, the NSDPI reviews the state of the art in key topics in data science and emerging technology. We release these reviews to enable the community to track potentially relevant advances. This review characterizes recent applications of multimodal AI for time series analysis.
The ability of Language Models (LMs) to analyze time series has broad applications across various fields including healthcare (Harutyunyan et al., 2019) and finance (Li et al., 2024; S. Wang et al., 2024).
Scholars have recently proposed a variety of time series foundation models (TSFMs) specifically designed to address time series tasks (Ansari et al., 2024; Goswami et al., 2024; Y. Liu et al., 2025; Woo et al., 2024). These models perform well on time series tasks but fail to incorporate other modalities such as language. Other studies have adapted LMs to support time series tasks such as forecasting (e.g., Gruver et al., 2023; Jin, Wang, et al., 2024; Tan et al., 2024) by leveraging textual cues that help explain or anticipate temporal changes. For example, holiday indicators can inform future sales predictions (Zhang et al., 2024). To accomplish this, researchers manually annotate time series with textual descriptions that include signals related to future trends, enabling LMs to make more informed predictions (Williams et al., 2024). X. Wang et al. (2024) collect news sequences related to the time series, such as weather forecast texts, to aid in predicting electricity usage. Similarly, H. Liu et al. (2024) collect time series data from eight domains along with temporally aligned "factual" events or descriptions to support forecasting tasks.
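To make this style of prompting concrete, the sketch below serializes a short daily sales series together with a holiday indicator into a plain-text forecasting prompt. The helper name and prompt template are our own illustrative choices, not the format used by any of the cited works.

```python
# Sketch: serializing a time series plus a textual covariate (a holiday
# indicator) into a plain-text forecasting prompt for an LM.
# The template and helper name are illustrative, not from any cited paper.

def build_forecast_prompt(values, holidays, horizon=3):
    """Render daily sales values and holiday flags as a prompt string."""
    lines = []
    for day, (v, is_holiday) in enumerate(zip(values, holidays), start=1):
        note = " (holiday)" if is_holiday else ""
        lines.append(f"Day {day}: sales = {v}{note}")
    history = "\n".join(lines)
    return (
        "The following is a daily sales series; holidays are marked.\n"
        f"{history}\n"
        f"Predict the next {horizon} values, one per line."
    )

prompt = build_forecast_prompt(
    values=[120, 115, 310, 118],
    holidays=[False, False, True, False],
)
print(prompt)
```

The prompt string would then be sent to an LM; the holiday annotation gives the model a textual signal that the Day 3 spike is explainable rather than anomalous.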
Many studies highlight the importance of textual information in improving the understanding of time series by LMs (Jin, Zhang, et al., 2024; Kong et al., 2025). For example, J. Liu et al. (2024) incorporate AIOps (Artificial Intelligence for IT Operations) monitoring time series and domain knowledge into the prompt, enabling LMs to perform anomaly detection. Other studies, such as Cai et al. (2024), use synthetic data to evaluate the ability of LMs to recognize patterns in time series, finding that LMs perform well. However, pairing synthetic time series with associated text reveals that LMs still struggle with time series reasoning—for example, when matching a time series with the scenario that generated it (Merrill et al., 2024).
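A minimal sketch of how such synthetic evaluation data might be constructed follows; it is our own illustration in the spirit of Cai et al. (2024), not their actual generator. Each series is produced with a known pattern label, which can then be compared against an LM's answer to a pattern-recognition question.

```python
import random

# Sketch: generating labeled synthetic series for pattern-recognition
# evaluation. Pattern names and parameters are our own illustrative choices.

def make_series(pattern, n=50, seed=0):
    """Return a length-n series exhibiting the named pattern plus noise."""
    rng = random.Random(seed)
    noise = lambda: rng.gauss(0, 0.1)
    if pattern == "upward_trend":
        return [0.1 * t + noise() for t in range(n)]
    if pattern == "spike":
        s = [noise() for _ in range(n)]
        s[n // 2] += 5.0  # single large anomaly mid-series
        return s
    raise ValueError(f"unknown pattern: {pattern}")

series = make_series("spike")
# The (series, label) pair can be scored against an LM's response to
# "Which pattern does this series show?"
```

Because the generating pattern is known by construction, accuracy can be computed exactly, which is what makes synthetic data attractive for this kind of evaluation.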
Some works modify LM architectures to integrate them with time series data. Cao et al. (2024) impose time series-specific inductive biases to achieve better performance in time series tasks, including forecasting. They separate the trends, seasonality, and residual components of a time series, and design prompts to incorporate this information into language. Xie et al. (2024) interweave time series data with text descriptions to enable better LM understanding of these data. C. Wang et al. (2024) treat time series as a “foreign language” and expand the vocabulary of an existing LM to incorporate time series tokens. This approach far outperforms standard language-only models on time series tasks. Due to the difficulty in collecting large corpora of naturally occurring time series data, synthetic data generation is crucial for strong TSFM performance (X. Liu et al., 2025), but this area remains less explored for multimodal time series data (Xie et al., 2024).
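The decomposition step described above can be illustrated with a classical additive decomposition. The sketch below is our own simplified stand-in (a centered moving-average trend and per-phase seasonal means), not the method of Cao et al. (2024); its output components are the kind of information that could then be verbalized in a prompt.

```python
# Sketch: classical additive decomposition of a series into trend,
# seasonal, and residual components. A simplified illustration, not the
# implementation used by Cao et al. (2024).

def decompose(series, period):
    n = len(series)
    half = period // 2
    # Centered moving-average trend; windows are truncated at the edges.
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        window = series[lo:hi]
        trend.append(sum(window) / len(window))
    detrended = [x - m for x, m in zip(series, trend)]
    # Seasonal component: mean of detrended values at each phase.
    seasonal_means = [
        sum(detrended[p::period]) / len(detrended[p::period])
        for p in range(period)
    ]
    seasonal = [seasonal_means[t % period] for t in range(n)]
    residual = [x - m - s for x, m, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Toy series: linear trend plus a bump every 4th step.
series = [t * 0.5 + (2 if t % 4 == 0 else 0) for t in range(16)]
trend, seasonal, residual = decompose(series, period=4)
```

Each component could then be rendered in natural language (e.g., "the series trends upward by roughly 0.5 per step, with a recurring bump every 4 steps") before being passed to the LM.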
Along with models that can process both time series and language, recent work has introduced multimodal time series benchmarks to track performance in these areas. Cai et al. (2024) propose a benchmark using synthetic data to evaluate LMs’ understanding of time series, focusing on tasks such as pattern recognition. Similarly, Merrill et al. (2024) introduce synthetic time series data and relevant textual descriptions containing a single causal event to evaluate the performance of LMs in matching time series to the scenarios that generated them (i.e., etiological reasoning).