Multifaceted Entity Relational Data Model

Documents are typically unstructured. Visualizing the content of a document corpus and the relationships between documents requires transforming these unstructured artifacts into a structured form. We proposed a multifaceted entity relational data model to represent this information in a structured way. This figure illustrates the processing pipeline used to transform a set of raw, unstructured documents into our data model.
The first stage in the transformation pipeline is facet segmentation, during which each document is segmented into facet snippets. While various techniques could be used, we typically employ a topic modeling technique such as LDA and treat each topic as a facet. When processing documents with a well-defined structure, we directly use the sections to define facet snippets.
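As a rough illustration of this stage, here is a minimal Python sketch of both segmentation routes under simplifying assumptions. `segment_by_sections` assumes each section of a structured document begins with a line of the form `Heading:`. `segment_by_topics` takes a hypothetical `dominant_topic` callable that stands in for a trained topic model such as LDA and groups sentences by their most probable topic. The function names, the heading convention, and the example facets (`Symptoms`, `Treatments`) are illustrative assumptions, not part of the original system.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def segment_by_sections(text: str, facet_names: List[str]) -> Dict[str, str]:
    """Split a structured document into one facet snippet per known section.

    A line such as "Symptoms:" starts a new section; the lines that follow,
    up to the next recognized heading, belong to that facet.
    """
    heading = re.compile(
        r"^\s*(" + "|".join(map(re.escape, facet_names)) + r")\s*:\s*$"
    )
    snippets: Dict[str, List[str]] = defaultdict(list)
    current: Optional[str] = None  # text before the first heading is ignored
    for line in text.splitlines():
        match = heading.match(line)
        if match:
            current = match.group(1)
        elif current is not None and line.strip():
            snippets[current].append(line.strip())
    return {facet: " ".join(lines) for facet, lines in snippets.items()}


def segment_by_topics(
    sentences: List[str], dominant_topic: Callable[[str], int]
) -> Dict[int, str]:
    """Group sentences into facet snippets keyed by their dominant topic.

    `dominant_topic` is a hypothetical stand-in for a trained topic model
    (e.g. LDA) that returns the index of the most probable topic.
    """
    snippets: Dict[int, List[str]] = defaultdict(list)
    for sentence in sentences:
        snippets[dominant_topic(sentence)].append(sentence)
    return {topic: " ".join(group) for topic, group in snippets.items()}


if __name__ == "__main__":
    doc = "Overview\nSymptoms:\nFever and cough.\nPersistent fatigue.\nTreatments:\nRest and fluids.\n"
    print(segment_by_sections(doc, ["Symptoms", "Treatments"]))
    # {'Symptoms': 'Fever and cough. Persistent fatigue.', 'Treatments': 'Rest and fluids.'}
```

Both routes return the same shape (facet key mapped to snippet text), so later stages do not need to know which segmentation technique produced the facets.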
Entity extraction is the second stage of the transformation pipeline. In this step, a named entity recognition algorithm is applied to each facet's document snippet to generate a set of typed entities. Domain-specific ontology models are used to recognize meaningful entities for each facet.
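A full named entity recognizer is beyond a short example, so this sketch substitutes simple dictionary (gazetteer) matching for it. Each facet has a hypothetical ontology mapping surface terms to entity types, and every term found as a whole word in that facet's snippet becomes a typed entity. The `extract_entities` function, the ontology format, and its contents are assumptions for illustration only.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical ontology format: surface term -> entity type.
Ontology = Dict[str, str]


def extract_entities(
    snippets: Dict[str, str], ontologies: Dict[str, Ontology]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Return {facet: {(term, entity_type), ...}} via per-facet dictionary lookup."""
    entities: Dict[str, Set[Tuple[str, str]]] = {}
    for facet, snippet in snippets.items():
        text = snippet.lower()
        found: Set[Tuple[str, str]] = set()
        # Only the ontology belonging to this facet is consulted.
        for term, entity_type in ontologies.get(facet, {}).items():
            # Case-insensitive whole-word match, so a short term such as
            # "rest" does not fire inside a longer word such as "restless".
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                found.add((term.lower(), entity_type))
        entities[facet] = found
    return entities


if __name__ == "__main__":
    snippets = {
        "Symptoms": "Fever and cough. Persistent fatigue.",
        "Treatments": "Rest and fluids.",
    }
    ontologies = {
        "Symptoms": {"fever": "Symptom", "cough": "Symptom", "rash": "Symptom"},
        "Treatments": {"rest": "Therapy", "fluids": "Therapy", "antibiotics": "Drug"},
    }
    for facet, found in extract_entities(snippets, ontologies).items():
        print(facet, sorted(found))
```

Looking terms up per facet mirrors the idea that each facet has its own domain-specific ontology; a statistical NER model could replace the lookup without changing the output shape.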
The third stage in the processing pipeline is relation building. In this stage, connections between extracted entities are established using two types of relations: internal relations and external relations. An internal relation connects entities within the same facet. An external relation is a connection between entities from different facets.
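The text above does not say how connections are detected, so this sketch assumes that two entities are related whenever they co-occur in the same document, and weights each relation by the number of documents in which the pair co-occurs. A pair is classified as internal when both entities share a facet and as external otherwise. The input format (one `{facet: set of entity names}` mapping per document) and the `build_relations` function are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (facet, entity name)


def build_relations(documents: List[Dict[str, Set[str]]]) -> Tuple[Counter, Counter]:
    """Count internal and external co-occurrence relations across documents.

    Each document maps facet -> set of entity names. Two entities that appear
    in the same document form an internal relation if they share a facet and
    an external relation otherwise; each count is the number of documents in
    which the pair co-occurs.
    """
    internal: Counter = Counter()
    external: Counter = Counter()
    for doc in documents:
        # Sorting gives every edge a canonical (smaller, larger) orientation.
        nodes: List[Entity] = sorted(
            {(facet, name) for facet, names in doc.items() for name in names}
        )
        for a, b in combinations(nodes, 2):
            if a[0] == b[0]:
                internal[(a, b)] += 1
            else:
                external[(a, b)] += 1
    return internal, external


if __name__ == "__main__":
    docs = [
        {"Symptoms": {"fever", "cough"}, "Treatments": {"rest"}},
        {"Symptoms": {"fever"}, "Treatments": {"rest", "fluids"}},
    ]
    internal, external = build_relations(docs)
    print("internal:", dict(internal))
    print("external:", dict(external))
```

Because each document contributes one snippet per facet, same-facet co-occurrence in this sketch is equivalent to co-occurring within a single facet snippet; a stricter variant could require sentence-level co-occurrence instead.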


Visual Analysis of Multifaceted Content, Internal, and External Relationships


Publication(s):

  • Yu-Ru Lin, Jimeng Sun, Nan Cao, Shixia Liu: ContexTour: Contextual Contour Analysis on Dynamic Multi-relational Clustering. SDM 2010: 418-429
  • Nan Cao, Jimeng Sun, Yu-Ru Lin, David Gotz, Shixia Liu, Huamin Qu: FacetAtlas: Multifaceted Visualization for Rich Text Corpora. IEEE Trans. Vis. Comput. Graph. 16(6): 1172-1181 (2010)
  • Nan Cao, David Gotz, Jimeng Sun, Yu-Ru Lin, Huamin Qu: SolarMap: Multifaceted Visual Analytics for Topic Exploration. ICDM 2011: 101-110