- Document Filter Ver 3.x COPYRIGHT © 2007 SYNAPSOFT.COM ALL RIGHTS RESERVED.

Software that works!

▶ Overview

▶ Key features

▶ Block Diagram

▶ Specification

▶ Performance

Overview

• Synap Next™ Document Filter extracts text and document information from a wide range of document file formats, such as MS-Office Word / PowerPoint / Excel, PDF, Hansoft Hangul, and JustSystems Ichitaro.

Document file formats (input)

• MS-Word (DOC)

• MS-PowerPoint (PPT)

• MS-Excel (XLS)

• Hansoft Hangul (HWP)

• JustSystems Ichitaro (JTD)

• PDF

Synap Next Document Filter

• Format detection

• Error check

• Summary information

• Text extraction (full text, summary)

Systems that use the output

• KMS, EDMS, Groupware, CMS (various systems)

• Search engine

• Database

• E-mail, document security, personal information (security systems)

Key features

1. Stable, fast document filtering

2. Automatic detection of supported file formats

3. Support for many document file formats and platforms (OS)

4. Multi-thread support

5. Multilingual support based on Unicode

6. Filtering of document files attached to e-mail

7. Filtering of files embedded in compound document files

8. Filtering of document files inside zipped files

9. Easy-to-use API

10. Memory-management and file-filtering APIs are supported

11. A single unified library for all document file formats

12. A C/C++ API is included for customizing filter capability

13. The C/C++ API can be exported and called from Java, Python, VB, and Delphi
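
Feature 2 (automatic format detection) is commonly built on file signatures, the fixed "magic" bytes at the start of a file. The sketch below illustrates that general technique only; it is not Synap Next's detector, and the names `SIGNATURES`, `detect_format`, and `detect_file` are hypothetical. The signature values themselves are the well-known ones for each container.

```python
# Minimal sketch of signature-based format detection.
# Assumption: this is a generic illustration, not the product's detector.

# (magic bytes, label) pairs, checked in order against the start of the data.
SIGNATURES = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # OLE2 compound file (DOC/XLS/PPT container)
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"{\\rtf", "rtf"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
]


def detect_format(head: bytes) -> str:
    """Return a format label for the leading bytes of a file, or 'unknown'."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def detect_file(path: str) -> str:
    """Read only the first 16 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return detect_format(handle.read(16))


if __name__ == "__main__":
    print(detect_format(b"%PDF-1.4\n"))
    print(detect_format(b"plain text"))
```

An OLE2 match only identifies the container; telling Word, Excel, and PowerPoint apart would need a further look at the streams inside it, which this sketch does not attempt.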

Block Diagram

(Components recovered from the block diagram figure)

Input document data

• File Stream Interface / Memory Stream Interface

Entry points

• Docu2Txt (any document)

• doc2Txt, ppt2Txt, xls2Txt, hwp2Txt, Pdf2Txt

• Html2Txt, Text2Txt (HTML, text)

Extract-text processing

• File Format Detector

• DocInformationExtractor

• OLE2 Interface (Microsoft Office & HWP)

• WordAnalyzer: MS Word 95, 97, 2000/2003/2007, XP

• PowerPointAnalyzer: MS PowerPoint 95, 97, 2000/2003/2007, XP

• ExcelAnalyzer: MS Excel 95, 97, 2000/2003/2007, XP

• HwpAnalyzer: HWP 2.3.x, 96, 97, 2002/2007

• PDFAnalyzer: Adobe PDF

• HtmlAnalyzer

• Character Set Detector

Converters to Unicode

• Office Code (ASCII, DBS2, UCS2) to Unicode Converter

• HWP to Unicode Converter

• CID to Unicode Converter

• Special to Unicode Converter

Unicode Output Buffer Manager (UTF-16 pivot, surrogates unsupported, language-tag aware)

Output text data

• Unicode to KS5601 Converter

• Unicode to Unicode Converter

• Unicode to Japanese Converter

• Unicode to Chinese Converter
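
The diagram describes a Unicode output buffer that uses UTF-16 as a pivot: every input encoding is converted to UTF-16 first, and output converters then map UTF-16 to the target character set. The following sketch shows that pattern with Python's standard codecs. The function names are hypothetical, and the surrogate check is only one way to flag text that a buffer without surrogate support could not represent.

```python
# Minimal sketch of a UTF-16 pivot between legacy encodings.
# Assumption: function names are illustrative, not the product's API.
import struct


def to_utf16_pivot(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes in the source encoding and re-encode them as UTF-16LE."""
    return raw.decode(source_encoding).encode("utf-16-le")


def uses_surrogates(pivot: bytes) -> bool:
    """Return True if the UTF-16LE data contains surrogate code units."""
    count = len(pivot) // 2
    units = struct.unpack("<%dH" % count, pivot[: count * 2])
    return any(0xD800 <= unit <= 0xDFFF for unit in units)


def from_utf16_pivot(pivot: bytes, target_encoding: str) -> bytes:
    """Convert UTF-16LE pivot data to the target encoding.

    Characters the target cannot represent are replaced with '?'.
    """
    return pivot.decode("utf-16-le").encode(target_encoding, errors="replace")


if __name__ == "__main__":
    hangul = "\ud55c\uae00"  # the two syllables of the word "Hangul"
    korean_bytes = hangul.encode("euc_kr")
    pivot = to_utf16_pivot(korean_bytes, "euc_kr")
    print(uses_surrogates(pivot))
    print(from_utf16_pivot(pivot, "utf-8") == hangul.encode("utf-8"))
```

With a pivot, adding a new input or output character set needs one converter rather than one per pair of encodings.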

Specification

▶ Supported Documents

• Microsoft Office Word (DOC), PowerPoint (PPT), Excel (XLS): 95, 97, 2000, XP, 2003, 2007

• (Korean) Hansoft Hangul (HWP): 2.x, 3.x, 96, 97, Wordian, 2002, 2004, 2005, 2007

• Adobe Acrobat PDF: PDF 1.x

• Rich Text Format (RTF)

• (Korean) Handy Soft Arirang (HWD)

• (Japanese) JustSystems Ichitaro (JTD): 8 to 12

• Microsoft Document Imaging (MDI)

• Microsoft Outlook Message (MSG)

• OpenOffice (ODT, ODS, ODP): 1.x, 2.x

• WordPerfect (WP, WPD): 4.0, 5.0, 6.0 to X3

• Autodesk Drawing File (DWG): R11 to R14, 2000, 2004, 2005

• Flash Movie File (SWF): 2 to 8

• Compressed files (ZIP, TAR, GZIP, (Korean) ALZ, BZIP)

• XML/HTML, MHT, CHM, EML, MIME, TEXT, MP3 tag

▶ Supported Platforms (OS)

• IBM AIX 4.3, 5.x

• RedHat Linux 6.x, 7.x, 8.x, 9.x, 10.x

• RedHat Enterprise Linux 3, 4, 5, 6

• RedHat Fedora up to 7

• Solaris 7, 8, 9, 10, 10 x86

• HP-UX IA 11.x

• Windows 95/98/NT/2000/XP/2003/Vista

▶ Supported Compilers

• gcc 2.x, 3.x, 4.x

• MS-VC 6, 7, 8

• xlC

• aCC
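
Compressed files appear among the supported documents, which means a filter has to walk an archive and handle each member in turn. The sketch below shows that idea for ZIP using Python's standard `zipfile` module. It is an illustration, not the product's implementation: the function names are hypothetical, and only `.txt` members are decoded where a real filter would hand each member to the matching format analyzer.

```python
# Minimal sketch of filtering documents inside a ZIP archive held in memory.
# Assumption: names and the ".txt only" rule are illustrative simplifications.
import io
import zipfile


def iter_zip_members(archive_bytes: bytes):
    """Yield (member name, raw bytes) for each regular file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            yield info.filename, archive.read(info)


def extract_plain_text(archive_bytes: bytes, encoding: str = "utf-8") -> dict:
    """Return {member name: text} for members whose name ends in .txt."""
    texts = {}
    for name, raw in iter_zip_members(archive_bytes):
        if name.lower().endswith(".txt"):
            texts[name] = raw.decode(encoding, errors="replace")
    return texts


def build_demo_archive() -> bytes:
    """Create a small in-memory archive so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("notes/readme.txt", "hello filter")
        archive.writestr("report.pdf", b"%PDF-1.4 placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    print(extract_plain_text(build_demo_archive()))
```

Working on bytes rather than file paths mirrors the memory stream interface shown in the block diagram, so members never need to be unpacked to disk.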

Performance

▶ Test Data

| Item | doc | xls | ppt | pdf | txt | Total |
|---|---|---|---|---|---|---|
| Number of files | 258 | 298 | 58 | 412 | 190 | 1,216 |
| Total size (MB) | 100 | 135 | 70.2 | 100 | 3.57 | 408.77 |
| Average size (KB) | 398.15 | 463.99 | 1,240.26 | 249.68 | 19.24 | 2,371.32 |
| Output file size (KB) | 6,521 | 36,772 | 650 | 17,996 | 5,867 | 67,806 |

▶ Windows Test

OS: Windows XP Professional SP2, CPU: Intel Pentium 4 2.33 GHz, Memory: 256 MB

(Unit: seconds)

| Item | doc | xls | ppt | pdf | txt | Average |
|---|---|---|---|---|---|---|
| Total turnaround time | 7.15 | 18.92 | 2.36 | 47.24 | 2.94 | 15.72 |
| Total filtering time | 7.09 | 18.55 | 2.35 | 47.06 | 2.88 | 15.58 |
| Total text-file output time | 0.07 | 0.37 | 0.01 | 0.18 | 0.06 | 0.138 |
| Average turnaround time per file | 0.03 | 0.06 | 0.04 | 0.11 | 0.02 | 0.05 |

▶ Linux Test

OS: Redhat Fedora Core 6, Kernel 2.6.18, CPU: Intel Pentium 4 3.0 GHz, Memory: 256 MB

(Unit: seconds)

| Item | doc | xls | ppt | pdf | txt | Average |
|---|---|---|---|---|---|---|
| Total turnaround time | 5.06 | 15.91 | 2.55 | 27.00 | 1.06 | 10.31 |
| Total filtering time | 5.00 | 15.64 | 2.54 | 26.65 | 1.02 | 10.17 |
| Total text-file output time | 0.06 | 0.27 | 0.01 | 0.35 | 0.04 | 0.14 |
| Average turnaround time per file | 0.02 | 0.05 | 0.04 | 0.06 | 0.005 | 0.03 |
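
The figures above can be turned into throughput numbers, which are easier to compare across formats than raw totals. The sketch below derives megabytes per second and seconds per file from the published file counts, total sizes, and total turnaround times. The variable and function names are illustrative, and the calculation assumes each turnaround total covers the whole test set for that format.

```python
# Minimal sketch: derive throughput from the published test figures.
# Assumption: each turnaround total covers every file of that format.

FORMATS = ("doc", "xls", "ppt", "pdf", "txt")

# Values copied from the Test Data and turnaround rows of the slides.
FILE_COUNT = dict(zip(FORMATS, (258, 298, 58, 412, 190)))
TOTAL_SIZE_MB = dict(zip(FORMATS, (100, 135, 70.2, 100, 3.57)))
TURNAROUND_S = {
    "Windows": dict(zip(FORMATS, (7.15, 18.92, 2.36, 47.24, 2.94))),
    "Linux": dict(zip(FORMATS, (5.06, 15.91, 2.55, 27.00, 1.06))),
}


def throughput_mb_per_s(platform: str, fmt: str) -> float:
    """Input megabytes processed per second of total turnaround time."""
    return TOTAL_SIZE_MB[fmt] / TURNAROUND_S[platform][fmt]


def seconds_per_file(platform: str, fmt: str) -> float:
    """Average turnaround time for a single file of the given format."""
    return TURNAROUND_S[platform][fmt] / FILE_COUNT[fmt]


if __name__ == "__main__":
    for platform in TURNAROUND_S:
        for fmt in FORMATS:
            print(
                f"{platform:8s} {fmt}: "
                f"{throughput_mb_per_s(platform, fmt):6.2f} MB/s, "
                f"{seconds_per_file(platform, fmt):.4f} s/file"
            )
```

Because the two test machines have different CPUs, the platform-to-platform differences reflect hardware as well as the operating system.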

Software that works!

▶ Example case

▶ Customer

▶ Synap Next™

Example Case

1. Search Business

▶ Search Engine, Portal site

• Electronic document retrieval system: a search engine can index text from document files without a word processor.

• Search result preview

Naver (portal: 7 years running), Empas (portal: 5 years), Daum (portal: 5 years), Happycampus (document vortal: 2 years), 3Soft (K2 search engine: 4 years), Korea Wisenut (search: 4 years), KONAN (search: 4 years), OPENBASE (search: 5 years), Fujitsu Korea (search), Daumsoft (search: 5 years), Diquest (search: 5 years), REPIA (search: 5 years), etc.

Example Case

▶ Desktop Search

• Embedded in a desktop search product, the filter helps people find documents.

• NHN, Empas, and Konan Technology use the filter in their desktop search products.

▶ KMS, EDMS, EKP, CMS

• The filter lets KMS, EDMS, EKP, and CMS products search document files.

• Knowledge Cube, Shinsegae I&C, and OnTheIT have adopted the filter in their products.

Example Case

2. Security Business

(Figure: network layout showing Router, Firewall, Switch, Server (DB, Mail, File), and PC)

▶ Embedded in a security solution, the filter can inspect security-sensitive documents, e-mail, and other content on the network.

• Web application firewall (WAF): Secui.com, Piolink, Monitorapp, PentaSecurity, Inca Internet, Winstechnet, etc.

• Spam mail: Mobizen, JIRAN soft, Terracetech, etc.

• Personal information management system (PIMS): Sentineltechnology, Xcerenet, expernet, winnerdime, etc.

Customer

• Internet Service

• Search Engine Solution & SI

• Security Solution

• KM, EKP, CMS Solution

• Government Agency

National Security Research Institute

Electronics and Telecommunications Research Institute

Korea Information Security Agency

Korea Industrial Technology Foundation

Synap Next™ Products

SYNAP NEXT™

Document Filter

• Extracts text and document information from various document files

• MS-Office Word / PowerPoint / Excel, HWP, PDF, HTML, ZIP, etc.

• Supports various platforms (OS)

Converter

• Converts document files to HTML/XML

• MS-Office Word / PowerPoint / Excel, HWP

• Supports various platforms (OS)

Web Office

• Work on documents, spreadsheets, and presentations on the web

• Open beta service is expected in March 2008.

Software that works!

▶ Contact us

Contact Us

Synapsoft Corporation

Rm.706, Woolim e-BIZ Center II, 184-1, Guro 3-dong, Guro-gu, Seoul 152-769, Korea

TEL) 82-2-890-3400 FAX) 82-2-890-3414

Homepage : http://www.synap.co.kr , Blog : http://synap.tistory.com

▶ Technical Consulting & Support

• Sungyeon Lee

TEL) 82-2-890-3406 E-Mail) [email protected]

▶ Sales

• Jaesung Kim

TEL) 82-2-890-3402 E-Mail) [email protected]