Research Expert
Paul Varcholik and Joshua Thompson
EEL 6883 – Software Engineering II, Spring 2009
Background
- Academic research: literature reviews, conferences, journals
- Material collected from the Internet (Google Scholar)
- How do researchers organize the papers they find?
- Hard copies, on-disk directory structures
Background (cont.)
Needs:
- Storage and quick retrieval of research papers
- Collaboration with colleagues
- User-provided reviews
- Annotated references
Existing Tools
- 2collab.com
- Mendeley
- Zotero
- Papers (Mac-only)
- Wikipedia comparison
High-Level Architecture
Database, Data Layer, UI
5 assemblies:
- 1 Common
- 1 Data Layer
- 1 Unit Test
- 2 UI: 1 Web, 1 Windows Forms (WinForms)
First Iteration
- Requirements gathering, initial design, and implementation
- Web-based system
- Foundation set, key features available
- Large scope required feature pull-back
- UI lacking polish
Second Iteration
- Windows Forms (WinForms) UI
- Same base code: database and data layer, with some extensions
- Attempts at auto-extraction of metadata
Iteration Metrics Comparison

Metric                     First Iteration   Second Iteration
Files                      180               (not given)
ELOC                       ~4,500            ~9,650
Classes and enumerations   57                92
Database tables            15                16
Stored procedures          88                100
Unit tests                 87                96
UI Comparison
Web vs. Windows
Unit Testing
Discussion (cont.)
Low complexity

Cyclomatic complexity   Risk
1-10                    A simple, low-risk program
11-20                   A more complex program, moderate risk
21-50                   A complex, high-risk program
Greater than 50         An un-testable program (very high risk)

Assembly         Cyclomatic complexity
Unit Test        0.73
Data Layer       1.10
Common           1.54
Windows Client   1.16
Average          1.13
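Cyclomatic complexity is essentially one plus the number of decision points in a method. As a minimal sketch of the idea (the project itself would have used a .NET metrics tool, not this Python counter):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Rough cyclomatic complexity: 1 + number of decision points.

    Counts if/for/while/ternary/except branches and boolean operators;
    real metrics tools are more thorough, but the principle is the same.
    """
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp,
                             ast.ExceptHandler, ast.And, ast.Or)):
            decisions += 1
    return 1 + decisions

# Straight-line code scores 1; each branch adds 1.
simple = "def f(x):\n    return x + 1\n"
branchy = "def g(x):\n    if x > 0:\n        return 1\n    return 0\n"
```

An average near 1.13, as reported above, means most methods are straight-line code with few branches.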
Discussion (cont.)
High maintainability

Assembly         Maintainability
Unit Test        90.75
Data Layer       80.69
Common           82.64
Windows Client   64.34
Average          67.98 *

You can think of the score as a percentage grade; numbers closer to 100 are better.
* The underlying formula is logarithmic, so the numbers don't combine like simple sums.
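The 0-100 "percentage grade" scale matches the Visual Studio-style maintainability index; assuming that is the formula behind these scores, it can be sketched as:

```python
import math

def maintainability_index(halstead_volume: float,
                          cyclomatic: float, loc: int) -> float:
    """Visual Studio-style maintainability index, rescaled to 0-100.

    Logarithmic in Halstead volume and lines of code, which is why
    the per-assembly scores don't combine like simple sums.
    """
    raw = (171 - 5.2 * math.log(halstead_volume)
           - 0.23 * cyclomatic - 16.2 * math.log(loc))
    return max(0.0, raw * 100 / 171)
```

Higher cyclomatic complexity or more code pushes the score down, which is consistent with the Windows Client (the largest, most complex assembly) scoring lowest above.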
PDF Parsing
Metadata: issue heading, title, authors, abstract, keywords
PDF Parsing (cont.)
Using the PDFBox libraries for PDF reading and manipulation
Three methods for parsing PDFs:
- Automatic
- XML-based
- User-driven, image-based
PDF Parsing (cont.)
Automatic parsing
Uses heuristics to determine metadata:
- Font sizes
- Relative positioning
- Specific tokens
Pros:
- No user input required
- Can provide reasonable guesses
Cons:
- Makes assumptions
- Does not always work 100%
- Difficulties with text grabbing
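A font-size/position heuristic of the kind described might look like the following sketch. The `(text, font_size, y_position)` tuples are a hypothetical simplification of what a PDF extractor such as PDFBox actually reports:

```python
def guess_title(lines):
    """Guess the title: largest font on page 1, nearest the top.

    `lines` is a list of (text, font_size, y_position) tuples, with
    y_position measured from the top of the page (hypothetical shape;
    a real extractor supplies these attributes per text run).
    """
    if not lines:
        return None
    # Largest font wins; among ties, prefer the line closest to the top.
    return max(lines, key=lambda l: (l[1], -l[2]))[0]

page = [
    ("IEEE Transactions on ...", 8.0, 20.0),   # running header
    ("A Study of Widget Frobnication", 18.0, 60.0),
    ("Jane Doe and John Roe", 11.0, 90.0),
    ("Abstract: Widgets are ...", 9.0, 130.0),
]
```

This illustrates the "makes assumptions" con: a journal whose running header uses the largest font would fool the heuristic.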
PDF Parsing (cont.)
XML parsing
Paper formats are specified:
- Order of metadata
- Relative font sizes
- Token delimiters
Pros:
- More effective than automatic parsing
- No direct user input required
Cons:
- Requires manual input for each publication source
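One way to picture a per-source format specification is a small XML template listing the metadata order and cues. The element and attribute names here are invented for illustration, not the project's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source template: field order, font cues, delimiters.
TEMPLATE = """
<publication source="ACM">
  <field name="title"    order="1" relativeFontSize="large"/>
  <field name="authors"  order="2" relativeFontSize="medium"/>
  <field name="abstract" order="3" delimiter="Abstract"/>
</publication>
"""

def load_template(xml_text):
    """Parse a publication-format template into {field: attributes}."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): {k: v for k, v in f.attrib.items() if k != "name"}
            for f in root.findall("field")}
```

The con above follows directly: someone has to author one of these templates for every publication source the tool should recognize.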
PDF Parsing (cont.)
User-driven, image-based parsing
- Display page 1
- User draws rectangles around metadata
- Uses automatic parsing as an initial guess; the user can review/modify the results
Pros:
- Combines automatic and user-driven methods
Cons:
- Requires user input
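The user-drawn rectangles can be pictured as a filter over positioned words. The `(text, x, y)` word origins are a simplified stand-in for the boxes a PDF library would report:

```python
def text_in_rect(words, rect):
    """Collect the words whose origins fall inside a user-drawn rectangle.

    `words` is a list of (text, x, y) word origins and `rect` is
    (left, top, right, bottom) in page coordinates (hypothetical shape).
    """
    left, top, right, bottom = rect
    return " ".join(t for t, x, y in words
                    if left <= x <= right and top <= y <= bottom)

# Word positions from a rendered page 1.
words = [("A", 50, 60), ("Study", 80, 60), ("Jane", 50, 95), ("Doe", 80, 95)]
```

Seeding each rectangle from the automatic parser's guess, then letting the user adjust it, is what combines the two methods.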
PDF Parsing
Demonstration
Discussion
- Interesting uses of .NET Reflection: Object Registry
- Difficulties of PDF parsing
Approaches to resolving these difficulties:
- Publication source templates
- User input
- Cut-and-paste
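The slide names .NET Reflection and an Object Registry without detail; a Python analogue of the general idea (registering types by name and instantiating them reflectively, in the spirit of .NET's `Activator.CreateInstance`) might look like:

```python
# Hypothetical sketch of an object registry; not the project's actual design.
REGISTRY = {}

def register(cls):
    """Class decorator: record the class under its name for later lookup."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
class PdfDocument:
    def __init__(self, path):
        self.path = path

def create(type_name, *args, **kwargs):
    """Instantiate a registered type by name, reflection-style."""
    return REGISTRY[type_name](*args, **kwargs)
```

The payoff of such a registry is that new types become available by name without the creating code hard-coding a reference to each class.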
Future Work
- Integrated metadata parsing
- Group/User/Repository access roles
- Author ranking
- Advanced searching
- Annotated references
- Additional document types (e.g., MS Word)
- More UI polish
- Server selection
- Review attachment improvements
- Administration features
Questions?