
Semantic Representation of a Heterogeneous Document Corpus for an Innovative Information Retrieval Model : Application to the Construction Industry

Abstract : Recent advances in Information and Communication Technology (ICT) have driven the development of several industries. Semantic technologies have proven beneficial for representing data and enabling reasoning over it, especially within Information Retrieval (IR) applications. They have, however, seen few industrial applications, as there are still unresolved issues, such as the shift from heterogeneous, interdependent documents to semantic data models and the representation of search results with relevant contextual information. In this thesis, we address two main challenges. The first is the representation of the collective knowledge embedded in a heterogeneous document corpus, covering both the domain-specific content of the documents and structural aspects such as their metadata and their dependencies (e.g., references). The second is providing users with innovative search results from the heterogeneous document corpus that help them interpret the information relevant to their inquiries and track cross-document dependencies.

To cope with these challenges, we first propose a semantic representation of a heterogeneous document corpus that generates a semantic graph covering both the structural and the domain-specific dimensions of the corpus. Then, we introduce a novel data structure for query answers, extracted from this graph, which embeds core information together with structure-based and domain-specific context. To provide such query answers, we propose an innovative query processing pipeline comprising query interpretation, search, ranking, and presentation modules, with a focus on the search and ranking modules.

Our proposal is generic, as it is applicable to different domains. In this thesis, however, it has been tested in the Architecture, Engineering and Construction (AEC) industry using real-world construction projects.

Cited literature : 113 references
Contributor : Sébastien Laborie
Submitted on : Tuesday, February 4, 2020 - 8:22:50 AM
Last modification on : Tuesday, February 15, 2022 - 3:41:37 AM
Long-term archiving on : Tuesday, May 5, 2020 - 1:15:56 PM




  • HAL Id : tel-02465630, version 1



Nathalie Charbel. Semantic Representation of a Heterogeneous Document Corpus for an Innovative Information Retrieval Model : Application to the Construction Industry. Multimedia [cs.MM]. LIUPPA - Laboratoire Informatique de l'Université de Pau et des Pays de l'Adour, 2018. English. ⟨NNT : 2018PAUU3025⟩. ⟨tel-02465630⟩


