Temporal quasi-semantic visualization and exploration of large scientific publication corpora



Description

The sheer volume of information contained in the rapidly growing corpus of scholarly literature presents a major bottleneck to research advancement. Artificial intelligence and modern machine learning techniques have the potential to overcome some of the associated challenges. The main contribution of this paper is a visualization tool that enables researchers to analyse large longitudinal corpora of scholarly literature in an intuitive, quasi-semantic fashion. The tool allows the user to search for particular topics, track their temporal interdependencies (e.g. ancestral or descendant topics), and examine their dominance within the corpus over time. Our visualization builds on a temporal topic model that can extract meaningful information from large longitudinal corpora and track complex temporal changes within them. The framework comprises: (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a model based on the hierarchical Dirichlet process (HDP), and (iii) a temporal similarity graph that allows complex topic changes to be modelled. Unlike previously proposed methods, our algorithm distinguishes between two groups of particularly challenging and pertinent topic evolution phenomena, namely topic splitting and speciation, and topic convergence and merging. It does so in addition to detecting the more widely recognized phenomena of topic emergence, disappearance, and gradual evolution. The evaluation is performed on a public corpus of medical literature concerned with a highly pertinent condition, the so-called metabolic syndrome.
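The three-stage framework can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that topics have already been discovered in each epoch (the abstract says this is done with an HDP-based model, which is not reproduced here) and that each topic is represented as a word distribution over a shared vocabulary. The helper names (`assign_epochs`, `hellinger_similarity`, `build_graph`, `classify_events`) are hypothetical. The Hellinger-based similarity, the fixed threshold, and the degree-based event rules are illustrative assumptions. The sketch also does not attempt the finer distinctions, such as splitting versus speciation, that the abstract attributes to the authors' algorithm.

```python
import math
from collections import defaultdict

def assign_epochs(years, width):
    """Stage (i), assumed scheme: fixed-width epochs counted from the earliest year."""
    start = min(years)
    return [(y - start) // width for y in years]

def hellinger_similarity(p, q):
    """Assumed similarity measure: 1 minus the Hellinger distance between two word distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))       # clamp guards against rounding error

def build_graph(prev_topics, next_topics, threshold):
    """Stage (iii): link topic i of epoch t to topic j of epoch t+1 if they are similar enough."""
    return [(i, j)
            for i, p in enumerate(prev_topics)
            for j, q in enumerate(next_topics)
            if hellinger_similarity(p, q) >= threshold]

def classify_events(edges, n_prev, n_next):
    """Hypothetical degree-based rules for labelling topic changes between two epochs."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for i, j in edges:
        out_deg[i] += 1
        in_deg[j] += 1
    events = []
    for i in range(n_prev):
        if out_deg[i] == 0:
            events.append(("disappearance", i))  # no successor in the next epoch
        elif out_deg[i] > 1:
            events.append(("split", i))          # one topic feeds several successors
    for j in range(n_next):
        if in_deg[j] == 0:
            events.append(("emergence", j))      # no predecessor in the previous epoch
        elif in_deg[j] > 1:
            events.append(("merge", j))          # several topics converge into one
    for i, j in edges:
        if out_deg[i] == 1 and in_deg[j] == 1:
            events.append(("evolution", (i, j)))  # one-to-one link: gradual evolution
    return events

# Toy example over a 4-word vocabulary: topic 1 splits into two narrower topics.
prev = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
nxt = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
edges = build_graph(prev, nxt, threshold=0.4)
print(classify_events(edges, len(prev), len(nxt)))
# [('split', 1), ('evolution', (0, 0))]
```

The similarity threshold controls how readily links are formed. A lower value produces more edges and therefore more splits and merges. A higher value produces more emergences and disappearances. Any real system would need to tune or learn this parameter.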

