Multivariate time series (TS) forecasting is a challenging topic with the potential to greatly aid estimation, management, and decision-making in environmental fields such as natural disaster forecasting. Several Earth Observation (EO) sensors are used to collect these TS, so a massive amount of data is generated at an exponential rate. Managing this data and exploiting it for natural disaster forecasting presents several challenges. The first challenge is data volume and variety. EO data is vast and diverse, including satellite imagery, geospatial data, radar data, and so on. Managing and processing large volumes of data from various sources requires efficient storage, computational power, and data management techniques. In addition, integrating other types of data such as meteorological, biophysical, and social data adds complexity to the data management process. The second challenge is data quality and preprocessing. EO data often contains noise, missing values, and outliers. Ensuring data quality through preprocessing techniques such as correction, filtering, and interpolation is essential for accurate forecasting. The third challenge is data integration. Integrating data from diverse sources such as satellite imagery, climate models, and ground-based sensors is crucial for comprehensive forecasting, but it requires dealing with data heterogeneity and handling missing data. The fourth challenge is spatio-temporal dynamics and interactions. Natural disasters are dynamic processes influenced by diverse factors, so modeling the complex spatio-temporal dependencies between variables in multivariate TS forecasting is difficult. Researchers have proposed several forecasting models, but there is no rule for choosing the most appropriate one. 
An appropriate model should account for the complex properties of EO data and for the dependencies, causality, and lag effects between variables. To address these challenges, this thesis presents two main contributions. The first contribution is an advanced data management technique and a scalable storage infrastructure. This architecture is composed of three main layers: 1) data collection and preprocessing, 2) data loading and storage, and 3) visualization and interpretation. The experiments were carried out using data gathered from China for a drought application. The second contribution is a heterogeneous spatio-temporal graph built on multivariate Earth observation time series, named HetSPGraph. HetSPGraph extends the conventional heterogeneous graph by incorporating multiple modalities of data sources. The proposed methodology consists of three layers: spatial aggregation, temporal aggregation, and a forecasting network. The findings showed that HetSPGraph performed well for drought forecasting in China.
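To make the three-layer idea (spatial aggregation, temporal aggregation, forecasting network) more concrete, here is a minimal NumPy sketch. It is not the HetSPGraph implementation from the thesis. It assumes mean aggregation over neighbours for each relation type, an exponentially weighted average over time, and a single linear forecasting head. The relation names, the `decay` parameter, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def spatial_aggregate(x, adjs):
    """Average each node's features over its neighbours, per relation type.

    x    : array of shape (T, N, F) = time steps, nodes (regions), features.
    adjs : dict mapping a relation name to a binary (N, N) adjacency matrix.
    Assumption: relation types are combined by a plain average.
    """
    out = np.zeros_like(x, dtype=float)
    for adj in adjs.values():
        a = adj + np.eye(adj.shape[0])          # add self-loops
        a = a / a.sum(axis=1, keepdims=True)    # row-normalise
        out += np.einsum("ij,tjf->tif", a, x)   # neighbour mean at each step
    return out / len(adjs)

def temporal_aggregate(h, decay=0.5):
    """Collapse the time axis with exponentially decaying weights.

    h : (T, N, F) -> (N, F); the most recent step gets the largest weight.
    """
    steps = h.shape[0]
    w = decay ** np.arange(steps - 1, -1, -1)
    w = w / w.sum()
    return np.tensordot(w, h, axes=(0, 0))

def forecast(z, weights, bias=0.0):
    """Linear forecasting head: one scalar prediction per node."""
    return z @ weights + bias

# Toy data: 12 time steps, 4 regions, 3 variables (all values are synthetic).
rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4, 3))
adjs = {
    "geographic_neighbour": np.array([[0, 1, 0, 1],
                                      [1, 0, 1, 0],
                                      [0, 1, 0, 1],
                                      [1, 0, 1, 0]]),
    "same_climate_zone": np.array([[0, 0, 1, 0],
                                   [0, 0, 0, 1],
                                   [1, 0, 0, 0],
                                   [0, 1, 0, 0]]),
}
pred = forecast(temporal_aggregate(spatial_aggregate(x, adjs)), rng.normal(size=3))
print(pred.shape)  # (4,)
```

In a learned model, the fixed averaging weights would typically be replaced by trainable parameters (for example attention over neighbours and a recurrent or convolutional temporal layer); the sketch only shows how the three stages compose.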
On the building of a spatio-temporal heterogeneous graph-based architecture using big data: Application for drought forecasting
Defended by Hanen Balti at ENSI (Tunisia) on 19 December 2023. Co-supervised by Imed Riadh Farah, Nedra Nauwynk Mellouli, and Ali Ben Abbes, under a joint supervision (cotutelle) agreement between Université de Paris 8 and ENSI. Pleased to have served as a reviewer (rapporteur) of her thesis.