Abstract
Natural language processing (NLP) has made great advances in extracting knowledge from general-domain texts. In geoscience contexts, however, this task remains challenging and under-investigated. Information extraction (IE), a sub-task of NLP, is difficult in geoscience because named entities can be nested, can require significant tacit knowledge to elicit, and can be composed of numerous remote words and independent terms. Consequently, pre-trained models are scarce, resulting in inefficient and erroneous data collection and the need for knowledge-driven conceptual models.
If IE were feasible in the geological domain, adopting theory-driven data science approaches could help uncover new knowledge by extracting patterns and relationships from texts (Karpatne et al., 2019). A promising outcome would be the data-driven validation of assumptions using a consistent knowledge base.
This poster explores how Named Entity Recognition (NER) and embedding techniques (Wang et al., 2017; Fan et al., 2020) can be used to construct a knowledge graph (KG) from geological texts without the graph becoming incomprehensible (Zhou et al., 2021). A geological KG built using NLP techniques (Wang, 2020) will enable structured queries and support knowledge discovery by identifying new relationships.
This work is part of a CSIRO project to assist the integration of its geodatabase into the national data infrastructure. The project's goals include promoting collaborative work that can facilitate information sharing, transfer of technology and knowledge, and improved decision-making amongst multidisciplinary teams and researchers in Australia. Specifically, this article will outline the development of a knowledge graph pipeline using NLP to support knowledge management and discovery in the geoscience domain.
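As a rough illustration of the idea of turning text into a knowledge graph, the sketch below matches entities against a small dictionary and links entities that appear in the same sentence. It is a toy example under stated assumptions, not the pipeline developed in this work: the `GAZETTEER` entries, the `co_occurs_with` relation, and the sample sentences are all hypothetical, and dictionary lookup stands in for a trained NER model.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> entity type (illustrative only).
GAZETTEER = {
    "granite": "ROCK",
    "quartz": "MINERAL",
    "feldspar": "MINERAL",
    "yilgarn craton": "LOCATION",
}


def find_entities(sentence):
    """Return (surface form, type) pairs found in the sentence, in text order."""
    lowered = sentence.lower()
    hits = []
    for term, label in GAZETTEER.items():
        # Whole-word, case-insensitive dictionary match (a stand-in for NER).
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((match.start(), term, label))
    return [(term, label) for _, term, label in sorted(hits)]


def build_graph(sentences):
    """Link each entity to every later entity in the same sentence."""
    graph = defaultdict(set)
    for sentence in sentences:
        entities = find_entities(sentence)
        for i, (head, _) in enumerate(entities):
            for tail, _ in entities[i + 1:]:
                if head != tail:
                    # Assumed placeholder relation; a real pipeline would
                    # classify the relation type instead.
                    graph[head].add(("co_occurs_with", tail))
    return graph


# Made-up sample sentences for illustration.
sentences = [
    "Granite in the Yilgarn Craton contains quartz and feldspar.",
    "Quartz veins cut the granite.",
]
graph = build_graph(sentences)
```

A structured query then reduces to a lookup such as `graph["granite"]`, which returns the set of (relation, entity) pairs recorded for that node. Nested entities and terms spread across remote words, the difficulties named above, are exactly what this simple matching cannot handle.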
Original language | English |
---|---|
Publication status | Published - 12 Dec 2022 |
Event | AGU Fall Meeting Abstracts - Duration: 12 Dec 2022 → 12 Dec 2022 |