TY - GEN
T1 - Towards Clustering of Web-based Document Structures
AU - Dehmer, Matthias
AU - Emmert-Streib, Frank
AU - Kilian, Juergen
AU - Zulauf, Andreas
PY - 2005
Y1 - 2005
AB - Given the number of documents available online, methods that organize web data into groups are important for analyzing web-based hypertext data and facilitating data availability. The task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than the usual directed rooted trees, e.g., DOM trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then we apply agglomerative clustering to the resulting similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. We run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach to Web Usage Mining as future work.
KW - Clustering methods
KW - graph-based patterns
KW - graph similarity
KW - hypertext structures
KW - web structure mining
M3 - Conference contribution
T3 - Proceedings of World Academy of Science, Engineering and Technology
SP - 289
EP - 294
BT - Proceedings of World Academy of Science, Engineering and Technology, Vol 10
A2 - Ardil, C
PB - WORLD ACAD SCI, ENG & TECH-WASET
T2 - Conference of the World Academy of Science, Engineering and Technology
Y2 - 16 December 2005 through 18 December 2005
ER -