Application of a Similarity Measure for Graphs to Web-based Document Structures

Matthias Dehmer, Frank Emmert-Streib, Alexander Mehler, Juergen Kilian, Max Muehlhaeuser

Research output: Conference article, Scientific, peer-reviewed

Abstract

Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. We apply the well-known technique of sequence alignment to a novel and challenging problem: measuring the structural similarity of generalized trees. In other words, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem in order to develop an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.
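
The pipeline described in the abstract (graph to integer property strings, then alignment, then a similarity value) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the structural property (out-degrees per breadth-first level), the scoring function `match`, the zero gap penalty, the normalization, and the function names `level_degree_strings`, `align_score`, and `similarity`.

```python
def level_degree_strings(adj, root):
    """Return one integer string per BFS level of a rooted graph.

    Each string lists the out-degrees of the nodes on that level.
    Assumption: this property is chosen for illustration only.
    """
    levels = []
    seen = {root}
    frontier = [root]
    while frontier:
        levels.append([len(adj.get(v, [])) for v in frontier])
        nxt = []
        for v in frontier:
            for w in adj.get(v, []):
                if w not in seen:  # tolerate cross edges in generalized trees
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return levels


def align_score(a, b, gap=0.0):
    """Global (Needleman-Wunsch style) alignment score of two integer strings."""
    def match(x, y):
        # Hypothetical scoring: 1 for equal values, decaying with the difference.
        return 1.0 / (1.0 + abs(x - y))

    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[n][m]


def similarity(adj1, root1, adj2, root2):
    """Average normalized alignment score over levels; 1.0 for identical structures."""
    s1 = level_degree_strings(adj1, root1)
    s2 = level_degree_strings(adj2, root2)
    depth = max(len(s1), len(s2))  # levels missing in one graph contribute 0
    total = 0.0
    for l1, l2 in zip(s1, s2):
        total += align_score(l1, l2) / max(len(l1), len(l2))
    return total / depth


# Two small rooted structures given as adjacency lists (hypothetical data).
t1 = {"a": ["b", "c"], "b": ["d"]}
t2 = {"a": ["b"]}
print(similarity(t1, "a", t1, "a"))
print(similarity(t1, "a", t2, "a"))
```

With the zero gap penalty assumed here, each per-level score lies between 0 and 1, so the final value is also in that range and equals 1 when both graphs yield identical property strings.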

Original language: English
Title of host publication: Proceedings Of World Academy Of Science, Engineering And Technology, Vol 8
Editors: C Ardil
Publisher: WORLD ACAD SCI, ENG & TECH-WASET
Pages: 77-81
Number of pages: 5
Publication status: Published - 2005
Published externally: Yes
MoE publication type: A4 Article in conference proceedings
Event: Conference of the World-Academy-of-Science-Engineering-and-Technology - Budapest, Hungary
Duration: 26 Oct 2005 to 28 Oct 2005

Publication series

Name: Proceedings of World Academy of Science Engineering and Technology
Publisher: WORLD ACAD SCI, ENG & TECH-WASET
Volume: 8
ISSN (print): 1307-6884

Conference

Conference: Conference of the World-Academy-of-Science-Engineering-and-Technology
Country/Territory: Hungary
Period: 26/10/05 to 28/10/05
