Knowledge Mining Across Disciplines: A Survey

Thursday - August 17 2023, 12:05 UTC

tldr #

This work surveys knowledge mining across disciplines such as natural language processing (NLP), data mining (DM), and machine learning (ML). It analyzes and compares the representational traits and evaluation methods of knowledge bases constructed in different fields, offering a cross-disciplinary review intended to build bridges among those fields and stimulate ideas for further research.

content #

Knowledge mining is an active research area across disciplines such as natural language processing (NLP), data mining (DM), and machine learning (ML). The overall objective of extracting knowledge from data sources is to create a structured representation that allows researchers to better understand the data and operate on it to build applications. Each of these disciplines has produced an ample body of research, proposing different methods that apply to different data types, and many surveys have summarized the work within each discipline. However, no survey has presented a cross-disciplinary review that exposes traits from the different fields in order to stimulate research ideas and build bridges among them. In this work, published in Machine Intelligence Research, the researchers present such a survey.

Organizing knowledge has traditionally been done manually in a laborious way, involving manual inspection, selection, and annotation of pieces of knowledge

Automatic extraction of knowledge from diverse sources of data is a challenging task across different fields. For example, in natural language processing (NLP), research on extracting structured knowledge bases from natural language text has received much attention because of its applications. In data mining (DM), a wide area of research has focused on mining rules from structured databases; such rules can help people discover novel associations between items or features and make decisions in diverse contexts such as business or education. Furthermore, in machine learning (ML), considerable effort has been devoted to extracting knowledge, mainly in the form of logic rules, from both the predictions and the parameters of machine learning systems, in order to build an interpretable representation that helps explain a system's decisions (the so-called interpretability problem); this is highly sought after in medicine, for example. Extracting or mining knowledge from data (be it unstructured, structured, or behavioral data) is an open problem that has been tackled across different research fields. This wide scenario has led not only to different definitions and representations of the construct of knowledge (and, consequently, to different definitions of the task of knowledge mining), but also to diverse research perspectives, which seem to use different methodologies to extract knowledge and different metrics to evaluate the consistency of the knowledge extracted.
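To make the data mining case above concrete, here is a minimal sketch of mining single-item association rules by support and confidence. It is an illustration under stated assumptions, not a method taken from the survey: the transactions, the item names, the thresholds, and the helper names `support` and `mine_rules` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions; the item names are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for rules A -> B.

    Only rules with a single item on each side are considered, which keeps
    the sketch short; real miners also handle larger itemsets.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for ante, cons in ((a, b), (b, a)):
            # confidence(A -> B) = support(A and B) / support(A)
            confidence = pair_support / support({ante}, transactions)
            if confidence >= min_confidence:
                rules.append((ante, cons, pair_support, confidence))
    return rules


rules = mine_rules(transactions)
for ante, cons, s, c in rules:
    print(f"{ante} -> {cons} (support={s:.2f}, confidence={c:.2f})")
```

Support measures how often the items occur together, and confidence measures how often the consequent appears given the antecedent; these are two common examples of the field-specific evaluation metrics the paragraph refers to.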

Data mining can involve clustering user data to discover unknown relationships and patterns within the data

In the NLP field, a knowledge base is usually represented as a tensor structure in which each entry corresponds to a probabilistic assignment of the belief in a fact. Finally, in the field of machine learning, knowledge mining has been motivated by the need to understand and validate ML systems that, because of their complexity, are not easy to inspect manually. Accordingly, the choice of knowledge representation has been constrained to forms that humans can understand, and logic rules are a common and widely accepted representation in this area.
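The tensor representation described above can be sketched as a three-dimensional array of belief values. The (subject, relation, object) indexing, the entity and relation names, the probabilities, and the helpers `set_belief` and `facts_above` are all assumptions made for illustration; the survey text here does not specify the tensor layout.

```python
# Hypothetical vocabulary; names and probabilities are invented for illustration.
entities = ["Paris", "France", "Berlin"]
relations = ["capital_of", "located_in"]

e_idx = {name: i for i, name in enumerate(entities)}
r_idx = {name: i for i, name in enumerate(relations)}

# belief[s][r][o] holds the belief (a probability in [0, 1]) that the
# fact (subject s, relation r, object o) is true. Assumed layout.
belief = [[[0.0 for _ in entities] for _ in relations] for _ in entities]


def set_belief(subject, relation, obj, probability):
    """Store the belief that (subject, relation, obj) holds."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    belief[e_idx[subject]][r_idx[relation]][e_idx[obj]] = probability


def facts_above(threshold):
    """List (subject, relation, object, belief) entries above the threshold."""
    found = []
    for s, subject in enumerate(entities):
        for r, relation in enumerate(relations):
            for o, obj in enumerate(entities):
                p = belief[s][r][o]
                if p > threshold:
                    found.append((subject, relation, obj, p))
    return found


set_belief("Paris", "capital_of", "France", 0.95)
set_belief("Paris", "located_in", "France", 0.90)
set_belief("Berlin", "capital_of", "France", 0.02)

for fact in facts_above(0.5):
    print(fact)
```

Thresholding the beliefs, as `facts_above` does, is one simple way to turn the probabilistic tensor into a set of accepted facts.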

The goal of natural language processing is to automatically extract structured knowledge from unstructured input text

This brief overview of knowledge mining across fields shows the diversity of objectives and constructs involved, a wide scenario that this survey aims to collate, integrate, and explain.

hashtags #

knowledgemining datascience datamining nlp ml machinelearning
