Cebit2009new


alKhawarizmy Language Software 2009

Transcript of Cebit2009new

Page 1: Cebit2009new
Page 2: Cebit2009new

15 Years of Work before Deciding to Establish "alKhawarizmy"

"alKhawarizmy Language Software" (established in January 2006)

Although the company is young, the roots of its concept go back 15 years.

The founder of the company, Dr. Hossam ElDin Mahgoub, together with a team of researchers, developers and linguists, was engaged in NLP research applied to the Arabic language.

Dr. Hossam established the company in order to invest his experience and research in the NLP area for the benefit of the Arabic language and the Arab user.

The greatest challenge was to try to make the computer "understand" the Arabic language and to process it as simply as possible, in spite of its unique and special features.

Page 3: Cebit2009new

KSearch, from alKhawarizmy, is an Arabic search engine for websites, companies and organizations. It can search through thousands of Arabic web pages or documents, benefiting your business through the following features:

1. Speed: KSearch indexes web pages and documents at a rate of about 20,000 words/sec.

2. Automatic Indexing: KSearch's indexing engine can automatically index web pages and documents at an interval that you select.

3. Accuracy: KSearch's primary aim is to help your website's visitors or your company's employees by providing fast, comprehensive and accurate information retrieval.

4. Productivity: Search accuracy, fast retrieval of results and automatic indexing all make your Arabic content more effective. The information you retrieve will be more reliable, since it will be easier to reach than before.

Discover how KSearch can benefit your business!

Page 4: Cebit2009new

Arabic NLP Research
Arabic Applications based on NLP components

- Stress on software quality (targeting ‘zero defect’ S/W).
- Cooperate with the community, e.g. research students at universities (forming partnerships).
- Promote widespread use of affordable applications that take the special features of the Arabic language into account.
- Effectively serve the Arab region by catering for its users’ needs.
- Impact the way an Arabic user searches.

Page 5: Cebit2009new

The number of Arab Internet users is growing:
- 22 million users in 2006
- 43 million expected in 2008

The volume of Arabic e-content is increasing (on the web and in companies’ intranets):
- Around 100 million Arabic web pages
- About 5 million Arabic web sites

Page 6: Cebit2009new

- Arabic is a highly inflected language.
- Arabic morphology has a set of unique features.
- Proper Arabic e-content processing is deficient.
- Consequently, Arab users are unable to take full advantage of Arabic e-content, compared with users of other languages.

As an example, consider searching through Arabic content …

Page 7: Cebit2009new

Using :

- Search for “الحائزون على جوائز نوبل” produces about 238 results

Page 8: Cebit2009new

Using :

- Search for “الحائزون على جائزة نوبل” produces about 684 results

Page 9: Cebit2009new

Using :

- Search for “حاز على جائزة نوبل” produces about 16,700 results

Page 10: Cebit2009new

When used for Arabic search, traditional search engines produce:

- Incomplete results, i.e. not all inflected forms are found => a lot of useful information is missing.
- Redundant results, i.e. some results are inaccurate => they ‘bear no relation’ in form or in meaning to the search word(s).
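The gap between literal matching and morphological matching can be illustrated with a small sketch. It is not KSearch's implementation: the lemma table, the function names and the three sample documents are assumptions made for illustration, and a real morphological analyzer would derive base forms from a lexicon rather than from a hand-written dictionary.

```python
# Toy lemma table (hypothetical data): maps inflected forms to a shared base form.
TOY_LEMMAS = {
    "الحائزون": "حاز",  # "the winners" (plural)
    "الحائز": "حاز",    # "the winner" (singular)
    "حاز": "حاز",       # "won"
    "جوائز": "جائزة",   # "prizes"
    "جائزة": "جائزة",   # "prize"
}


def lemma(word):
    """Return the base form of a word, or the word itself if it is unknown."""
    return TOY_LEMMAS.get(word, word)


def exact_search(docs, query):
    """Ids of documents that contain every query word in exactly the typed form."""
    wanted = set(query.split())
    return {doc_id for doc_id, text in docs.items() if wanted <= set(text.split())}


def build_index(docs):
    """Map each base form to the set of document ids containing any of its forms."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(lemma(word), set()).add(doc_id)
    return index


def search(index, query):
    """Ids of documents that match every query word at the base-form level."""
    result = None
    for word in query.split():
        postings = index.get(lemma(word), set())
        result = postings if result is None else result & postings
    return result or set()


# The three query phrasings from the slides, used here as three tiny documents.
DOCS = {
    1: "الحائزون على جوائز نوبل",
    2: "الحائز على جائزة نوبل",
    3: "حاز على جائزة نوبل",
}
INDEX = build_index(DOCS)
```

With this toy data, `exact_search(DOCS, DOCS[1])` finds only document 1, while `search(INDEX, DOCS[1])` finds all three documents, because the singular, plural and verb forms share a base form in the table.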

Page 11: Cebit2009new
Page 12: Cebit2009new

An Arabic Search Model that:
(A) Provides Morphological Search => Comprehensive
(B) Differentiates between Meanings of Arabic Words => Improves Accuracy

In other words…

Let us see the same example, using KSearch…

Page 13: Cebit2009new

Search:
- Arabic Morphological Search (to produce comprehensive search results).
- Document, as well as Database Search.
- Differentiation between Word Meanings (to increase accuracy of search results, i.e. reduce redundancy).
- Search using Logical Operators (و, أو, ليس).
- Adjacency (Proximity) Search, in order of query words or not.
- Search using Wildcards (for proper nouns).
- Search words are highlighted in the results pages.

- Latin character support (English words).
- Spell checking of query words.
- Stem and Thesaurus Search.

= NEW (After Incubation Funding)
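Two of the listed search features, logical operators and adjacency (proximity) search, can be sketched as follows. This is a minimal illustration on whitespace-separated words, not the product's query engine. The function names and parameters (`matches`, `near`, `max_gap`, `ordered`) are hypothetical, and the three operators و, أو and ليس are mapped to AND, OR and NOT.

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean filter: all_of ~ و (AND), any_of ~ أو (OR), none_of ~ ليس (NOT)."""
    words = set(text.split())
    return (
        all(w in words for w in all_of)
        and (not any_of or any(w in words for w in any_of))
        and not any(w in words for w in none_of)
    )


def positions(text):
    """Map each word to the list of word positions where it occurs."""
    found = {}
    for i, word in enumerate(text.split()):
        found.setdefault(word, []).append(i)
    return found


def near(text, first, second, max_gap=1, ordered=True):
    """True if `second` occurs within `max_gap` words of `first`.

    With ordered=True, `second` must come after `first`;
    with ordered=False, either order is accepted.
    """
    found = positions(text)
    for i in found.get(first, []):
        for j in found.get(second, []):
            gap = j - i if ordered else abs(j - i)
            if 0 < gap <= max_gap:
                return True
    return False
```

For the text "حاز على جائزة نوبل", `near(text, "جائزة", "نوبل")` holds, the reversed word order holds only with `ordered=False`, and `matches(text, all_of=["جائزة"], none_of=["الدولة"])` accepts the text because it contains the first word and not the second.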

Page 14: Cebit2009new

Indexing:
- Comprehensive dictionary of contemporary Arabic (approximately 78,000 entries).
- Document, as well as Database Indexing.
- Fast Indexing Engine (≈ 20,000-56,000 words/sec on a PC with Intel Core 2 Duo CPU running at 2.33GHz, SATA HDD, 3GB RAM).
- Uses 64-bit Technology => Unlimited Index Size.
- Comprehensive Index Management: Capability of deleting, updating and merging indexes.
- The following document formats are supported, including UNICODE encoded documents: Text, RTF, MS Office, PDF.
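The index management operations named above (deleting, updating and merging) can be sketched on a small in-memory inverted index. The class and method names are hypothetical and the design is an assumption for illustration; the slides do not describe how KSearch stores its indexes.

```python
from collections import defaultdict


class InvertedIndex:
    """Minimal in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # document id -> list of its terms

    def add(self, doc_id, text):
        """Index a document; adding an id that already exists updates it."""
        if doc_id in self.docs:
            self.delete(doc_id)
        terms = text.split()
        self.docs[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and drop any term that no longer has postings."""
        for term in set(self.docs.pop(doc_id, [])):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def merge(self, other):
        """Merge another index into this one; on an id clash, `other` wins."""
        for doc_id, terms in other.docs.items():
            self.add(doc_id, " ".join(terms))

    def lookup(self, term):
        """Return a copy of the ids of documents containing the term."""
        return set(self.postings.get(term, set()))
```

A short usage example: build two indexes, merge the second into the first, then update and delete documents and observe that `lookup` reflects each change.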

Page 15: Cebit2009new

Architecture diagram (component labels only):
- Arabic Data Source (Database, Document, etc.)
- Arabic Morphological Analyzer
- Comprehensive + Contemporary Arabic Lexicon
- Indexing Engine
- Meta Data Repository
- Search Engine
- Arabic Lexical Semantic Analyzer
- Search Results

Page 16: Cebit2009new

Component-Oriented Architecture:

Software Integrated in:
- Websites: Web Edition.
- Enterprises (Intranets): Enterprise Edition.
- Single PCs: Desktop Edition.

Software as a Service (SaaS) – Future Direction:

On Dedicated Web Server.

Page 17: Cebit2009new

Employs KMorph, a fast Arabic morphological analyzer.

Uses a comprehensive Arabic lexicon of contemporary words.

KSpell Engine: Provides APIs for spelling verification and correction. For example, it may be integrated with content management systems to produce correctly spelled Arabic web content.
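The slides do not document the KSpell API, so the following is only a sketch of what spelling verification and correction against a lexicon might look like. The function names (`is_correct`, `suggest`), the four-word lexicon and the use of Levenshtein edit distance are all assumptions, not the product's actual interface or method.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical miniature lexicon; a real one would hold tens of thousands of entries.
TOY_LEXICON = ["جائزة", "جوائز", "نوبل", "حاز"]


def is_correct(word, lexicon=TOY_LEXICON):
    """Verification: a word is accepted if it appears in the lexicon."""
    return word in lexicon


def suggest(word, lexicon=TOY_LEXICON, max_distance=1):
    """Correction: lexicon entries within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]
```

For instance, the common misspelling "جائزه" (final ه instead of ة) is rejected by `is_correct`, and `suggest` proposes "جائزة", which is one substitution away.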

Page 18: Cebit2009new

Target Audience:

1- e-Government.
2- Web Publishers (news sites, web developers, etc.).
3- Web Content Management (CMS, e-library systems, helpdesk, etc.).
4- Arabic and Arabic-enabled Internet search sites.

Page 19: Cebit2009new

Competitive Advantage:

Price

Off-the-Shelf Installation

Online Demonstration

Page 20: Cebit2009new

Target Markets

Arabic Web Sites (3,000,000)

Corporations' Intranets (566,000)