Searching and Ranking Documents based on Semantic Relationships, Paper presentation, ICDE Ph.D....

  • Slide 1

Searching and Ranking Documents based on Semantic Relationships
Paper presentation, ICDE Ph.D. Workshop 2006
April 3rd, 2006, Atlanta, GA, USA
Boanerges Aleman-Meza
LSDIS lab, Computer Science, University of Georgia
This work is funded by NSF-ITR-IDM Award #0325464, titled "SemDIS: Discovering Complex Relationships in the Semantic Web", and NSF-ITR-IDM Award #0219649, titled "Semantic Association Identification and Knowledge Discovery for National Security Applications".

  • Slide 2

Searching and Ranking Documents based on Semantic Relationships, Boanerges Aleman-Meza, ICDE Ph.D. Workshop 2006
Outline
- Research Problem
- Proposed Solution
- Preliminary Results
- Outstanding Future Work
- Conclusions and Future work

  • Slide 3

Today's use of Relationships (for web search)
- href relationships between documents
- Documents as a whole
- No explicit relationships are used other than co-occurrence
- Implicit semantics, such as page importance
(some content from www.wikipedia.org)

  • Slide 4

But, more relationships are available
- Documents are connected through concepts & relationships, i.e., MREF [SS98]
- Named-entities can be identified with respect to existing data, such as ontologies
(some content from www.wikipedia.org)

  • Slide 5

Complex Relationships
- People will use Web search not only for documents, but also for information about semantic relationships [SFJMC02]
- Relationships play an important role in the continuing evolution of the Web [SAK03]

  • Slide 6

Complex Relationships
Semantic Relationships: named relationships connecting information items
- their semantic type is defined in an ontology
- go beyond the is-a relationship (i.e., class membership)
Have gained interest in the Semantic Web
- ρ operators: semantic associations [AS03]
- discovery and ranking [AHAS03, AHARS05, AMS05]
Relevant in emerging applications:
- content analytics
- business intelligence
- knowledge discovery
- national security

  • Slide 7

Research Problem
How can we exploit semantic relationships of named-entities to improve relevance in search and ranking of documents?

  • Slide 8

Proposed Solution: Diagram View
Builds upon the following capabilities:
- Populated Ontologies
- Semantic Annotation
- RDF databases
It can be done [ABEPS05]
- Demonstrated with a small dataset
Using explicit, named relationships [SRT05]
- Makes it possible to explain why a document is relevant

  • Slide 9

Research Challenges
- Ranking Complex Relationships
- Utilization of populated Ontologies
- Defining and measuring what is relevant
- Addressing Scalability

  • Slide 10

Proposed Solution: Big Picture
- Ranking Complex Relationships [AHAS03, AHARS05]
- Large Populated Ontologies [AHSAS04]
- User-defined Context for Document Retrieval [ABEPS05]
- Relevance Measures using Semantic Relationships [ANR+06] (current work)
- Searching and Ranking Documents based on Semantic Relationships

  • Slide 11

Goal: Search and Ranking of Documents using Relationships

  • Slide 12

Ranking Complex Relationships
[Diagram: AssociationRank with the criteria Popularity, Context, Subsumption, Trust, Association Length and Rarity; class hierarchy shown: Organization, Political Organization, Democratic Political Organization]

  • Slide 13

Populated Ontologies: SWETO
SWETO: Semantic Web Technology Evaluation Ontology [AHSAS04]
- Large scale test-bed ontology containing instances extracted from heterogeneous Web sources
- Domain: cs-publications, locations, terrorism
- Over 800K entities, 1.5M relationships (version 1.4)
- Developed using Freedom toolkit (www.semagix.com)

  • Slide 14

Defining what is relevant
Ultimately, many entities are inter-connected! Which ones are relevant?

  • Slide 15

Defining what is relevant
Relevance is determined by considering:
- type of next entity (from ontology)
- name of connecting relationship
- length of discovered path so far (short paths are preferred)
- cumulative relevance score
- other properties such as transitivity
- user-defined context (if any)

  • Slide 16

Defining what is relevant
Involves human-defined relevance of specific path segments
The simplest case, a YES/NO question:
- Is it relevant to discover entities through a "ticker" relationship? yes?
- Is it relevant to discover entities through an "industry focus" relationship? no?
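
As a rough illustration of the yes/no relevance idea above, the following Python sketch enumerates paths from a starting entity and follows only relationships judged relevant, up to a maximum path length. It is not the implementation from the presentation: the graph, the entity names, the `RELEVANT` table and the function names `build_index` and `relevant_paths` are all assumptions made for this example. Only the relationship names "ticker" and "industry focus", and their yes/no answers, come from the slide.

```python
from collections import defaultdict

# Hypothetical yes/no relevance judgments per relationship name.
# "ticker" (yes) and "industry focus" (no) follow the slide; "leader of" is assumed.
RELEVANT = {"leader of": True, "ticker": True, "industry focus": False}

# Toy graph of (subject, relationship, object) triples; all entity names are made up.
TRIPLES = [
    ("Alice", "leader of", "AcmeCorp"),
    ("AcmeCorp", "ticker", "ACME"),
    ("AcmeCorp", "industry focus", "Consulting"),
    ("OtherCorp", "industry focus", "Consulting"),
    ("Bob", "leader of", "OtherCorp"),
]


def build_index(triples):
    """Map each entity to its (relationship, neighbor) pairs, in both directions."""
    index = defaultdict(list)
    for subject, rel, obj in triples:
        index[subject].append((rel, obj))
        index[obj].append((rel, subject))
    return index


def relevant_paths(index, start, max_len=5, relevant=RELEVANT):
    """Enumerate simple paths from start, following only relevant relationships.

    A path is a list [entity, relationship, entity, ...]; max_len bounds the
    number of relationships in a path. Unknown relationships count as "no".
    """
    results = []

    def walk(path, visited):
        for rel, nxt in index[path[-1]]:
            if nxt in visited or not relevant.get(rel, False):
                continue  # skip cycles and segments judged not relevant
            new_path = path + [rel, nxt]
            results.append(new_path)
            if len(new_path) // 2 < max_len:
                walk(new_path, visited | {nxt})

    walk([start], {start})
    return results


index = build_index(TRIPLES)
pruned = relevant_paths(index, "Alice")
everything = relevant_paths(index, "Alice", relevant={r: True for _, r, _ in TRIPLES})
print(len(pruned), len(everything))  # prints: 2 5
```

On this toy graph, answering "no" for "industry focus" keeps the search from reaching every other company that shares the same industry, which is why the pruned result is smaller than the unrestricted one.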
[Diagram: a (Company) node with "ticker" and "industry focus" relationships to nodes x and y]

  • Slide 17

Measuring what is relevant
Information-loss: a measure that defines a cut-off on whether a sequence of relationships is still relevant (extending [MKIS00])
[Diagram: example graph with the entities Tina Sivinski, Electronic Data Systems, EDS, Plano, Fortune 500, Technology Consulting and NYSE:EDS; edge labels include leader of (20+), ticker, based at, listed in and has industry focus; the counts 499 and 7K+ appear on further edges]

  • Slide 18

Preliminary Results
- Using human-defined relevance: pruned to 5 relevant paths
- The naive method (all paths) results in over 24K paths (of up to length 5)
[Diagram: example graph with the entities Tina Sivinski, Electronic Data Systems, EDS, Plano, Fortune 500, Technology Consulting and NYSE:EDS; edge labels leader of, ticker, based at, listed in and has industry focus]

  • Slide 19

Outstanding Future Work
- Formalize relevance-threshold idea: leading to claim/lemma with proof
- Address Scalability Issues: refinement of current indexing techniques
- Release of SWETO-DBLP Ontology: enhanced ontology of DBLP data
- Comprehensive Evaluations: human-subjects & comparisons with related work

  • Slide 20

Future Work: Context: why, what, how?
- Context: Focused/Personalized Relevance
- Context captures the user's interest to provide him/her with relevant results
- By selecting concepts/relations/entities of the ontology
- Will build upon our previous work [AHAS03, ABEPS05]

  • Slide 21

Related Work
Semantic Searching and Ranking of entities on the Semantic Web
- Rocha et al. WWW2004
- Nie et al. WWW2005
- Guha et al. WWW2003
- Stojanovic et al. ISWC2003
- Zhuge et al. WWW2003

  • Slide 22

References
[ABEPS05] B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A.P. Sheth: An Ontological Approach to the Document Access Problem of Insider Threat, IEEE ISI-2005
[ASBPEA06] B. Aleman-Meza, A.P. Sheth, P. Burns, D. Palaniswami, M. Eavenson, I.B. Arpinar: Semantic Analytics in Intelligence: Applying Semantic Association Discovery to determine Relevance of Heterogeneous Documents, Adv. Topics in Database Research, Vol. 5, 2006 (in print)
[AHAS03] B. Aleman-Meza, C. Halaschek, I.B. Arpinar, A.P. Sheth: Context-Aware Semantic Association Ranking, First Intl Workshop on Semantic Web and Databases, September 7-8, 2003
[AHARS05] B. Aleman-Meza, C. Halaschek-Wiener, I.B. Arpinar, C. Ramakrishnan, A.P. Sheth: Ranking Complex Relationships on the Semantic Web, IEEE Internet Computing, 9(3):37-44
[AHSAS04] B. Aleman-Meza, C. Halaschek, A.P. Sheth, I.B. Arpinar, G. Sannapareddy: SWETO: Large-Scale Semantic Web Test-bed, Intl Workshop on Ontology in Action, Banff, Canada, 2004
[AMS05] K. Anyanwu, A. Maduko, A.P. Sheth: SemRank: Ranking Complex Relationship Search Results on the Semantic Web, WWW2005
[AS03] K. Anyanwu, A.P. Sheth: ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, WWW2003

  • Slide 23

References
[HAAS04] C. Halaschek, B. Aleman-Meza, I.B. Arpinar, A.P. Sheth: Discovering and Ranking Semantic Associations over a Large RDF Metabase, VLDB2004, Toronto, Canada (Demonstration Paper)
[MKIS00] E. Mena, V. Kashyap, A. Illarramendi, A.P. Sheth: Imprecise Answers in Distributed Environments: Estimation of Information Loss for Multi-Ontology Based Query Processing, Intl J. Cooperative Information Systems 9(4):403-425, 2000
[SAK03] A.P. Sheth, I.B. Arpinar, V. Kashyap: Relationships at the Heart of Semantic Web: Modeling, Discovering and Exploiting Complex Semantic Relationships, Enhancing the Power of the Internet, Studies in Fuzziness and Soft Computing (Nikravesh, Azvin, Yager, Zadeh, eds.)
[SFJMC02] U. Shah, T. Finin, A. Joshi, J. Mayfield, R.S. Cost: Information Retrieval on the Semantic Web, CIKM 2002
[SRT05] A.P. Sheth, C. Ramakrishnan, C. Thomas: Semantics for the Semantic Web: The Implicit, the Formal and the Powerful, Intl J. Semantic Web Information Systems 1(1):1-18, 2005
[SS98] K. Shah, A.P. Sheth: Logical Information Modeling of Web-Accessible Heterogeneous Digital Assets, ADL 1998

  • Slide 24

Data, demos, more publications at SemDis project web site, http://lsdis.cs.uga.edu/projects/semdis/
Thank You