Stat Design3 18 09

STAT Design (I)

3/18/09

Domain

Application

Technical Service

Weka API MinorThird API

Example Impl

Foundation

Util Math

John Impl Weka Wrapper

Clients

Extension of framework provided with the package

Extension of framework done by the user

edu.cmu.statproject.core

edu.cmu.statproject.impl

edu.cmu.statproject.wrapper

edu.cmu.statproject.math

edu.cmu.statproject.util

edu.cmu.statproject.impl.hideki

edu.cmu.statproject.impl.example

edu.cmu.statproject.impl.shilpa

edu.cmu.statproject.wrapper.weka

edu.cmu.statproject.wrapper.minorthird

Proposed packages V1

edu.cmu.statproject.core

edu.cmu.statproject.math

edu.cmu.statproject.util

edu.cmu.statproject.hideki

edu.cmu.statproject.example

edu.cmu.statproject.shilpa

edu.cmu.statproject.weka

edu.cmu.statproject.minorthird

Proposed packages V2

Different implementations of .core

CorpusReader

+ read(filename: String): Corpus
+ read(ds: DataSource): Corpus

Corpus

- documents: List<Document>
- pm: PartitionManager
- partitionFilter: String

+ add(d: Document)
+ remove(id: int)
+ get(id: int): Document
+ getPartitionManager(): PartitionManager
+ setPartitionManager(pm: PartitionManager)
+ iterator(): Iterator<Document>

Document

- id: int
- annotations: List<Annotation>
- labels: List<String>
- text: String
…
+ getAnnotations(id: int): List<Annotation>
+ getLabels(): List<String>
+ getText(): String
+ setText(text: String)
+ addLabel(label: String)

Annotator

+ annotate(c: Corpus)
+ annotate(d: Document)

FeatureExtractor

+ transform(c: Corpus): Dataset
+ transform(d: Document): Instance

Dataset

See next diagram for “Machine Learning” side

Annotation

- id: int
- begin: int
- end: int
- label: List<String>

+ getText(): String

edu.cmu.statproject.core: Text Processing
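The text-processing classes above could be sketched roughly as follows. This is only an illustration of how Corpus, Document and Annotation might fit together, not the project's implementation. Assumptions not stated on the slide: an Annotation keeps a reference to its owning Document so that getText() can resolve the [begin, end) span, getAnnotations() takes no argument here, and the annotate(begin, end, label) helper on Document is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TextProcessingSketch {

    /** A labeled character span [begin, end) over the text of one document. */
    static class Annotation {
        final int id;
        final int begin;
        final int end;
        final List<String> label = new ArrayList<>();
        private final Document owner; // assumption: the span resolves against its document

        Annotation(int id, int begin, int end, Document owner) {
            this.id = id;
            this.begin = begin;
            this.end = end;
            this.owner = owner;
        }

        String getText() {
            return owner.getText().substring(begin, end);
        }
    }

    static class Document {
        final int id;
        private String text;
        private final List<Annotation> annotations = new ArrayList<>();
        private final List<String> labels = new ArrayList<>();

        Document(int id, String text) {
            this.id = id;
            this.text = text;
        }

        String getText() { return text; }
        void setText(String text) { this.text = text; }
        void addLabel(String label) { labels.add(label); }
        List<String> getLabels() { return labels; }
        List<Annotation> getAnnotations() { return annotations; }

        /** Hypothetical helper: what an Annotator might call to mark a span. */
        Annotation annotate(int begin, int end, String label) {
            Annotation a = new Annotation(annotations.size(), begin, end, this);
            a.label.add(label);
            annotations.add(a);
            return a;
        }
    }

    static class Corpus implements Iterable<Document> {
        private final List<Document> documents = new ArrayList<>();

        void add(Document d) { documents.add(d); }

        Document get(int id) {
            for (Document d : documents) {
                if (d.id == id) return d;
            }
            return null; // assumption: unknown ids yield null
        }

        @Override
        public Iterator<Document> iterator() { return documents.iterator(); }
    }

    public static void main(String[] args) {
        Corpus corpus = new Corpus();
        Document doc = new Document(0, "Pittsburgh is in Pennsylvania");
        doc.addLabel("geography");
        doc.annotate(0, 10, "CITY");
        corpus.add(doc);
        for (Document d : corpus) {
            for (Annotation a : d.getAnnotations()) {
                System.out.println(a.label + " -> " + a.getText());
            }
        }
    }
}
```

Keeping offsets rather than copied strings in Annotation means the annotated text is always read from the document itself.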

PartitionManager

DataSource

FeatureExtractor

+ transform(c: Corpus): Dataset
+ transform(d: Document): Instance

Dataset

- instances: List<Instance>
- featureNames: List<String>
- featureIDs: HashMap<String, Integer>
- labelNames: List<String>
- labelIDs: HashMap<String, Integer>
- pm: PartitionManager
- partitionFilter: String

+ add(ins: Instance)
+ remove(idx: int)
+ get(idx: int): Instance
+ getPartitionManager(): PartitionManager
+ setPartitionManager(pm: PartitionManager)
+ iterator(): Iterator<Instance>
+ getSubset(partitionName: String): Dataset

Learner

- name: String
- settings: Settings

+ learn(d: Dataset): Model

- name: String

Instance

- id: int
- featureIDs: int[]
- featureValues: double[]
- sequence: int[]
- labels: int[]

+ getFeatureIDs(): int[]
+ getFeatureValues(): double[]
+ getSequence(): int[]
+ getLabel(): int
+ getLabels(): int[]
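The parallel featureIDs and featureValues arrays on Instance, together with the name-to-ID maps on Dataset, suggest a sparse feature representation. A minimal sketch of that idea follows, assuming bag-of-words counts as the feature values. FeatureIndex, fromTokens and valueOf are hypothetical names introduced for illustration, and treating the first entry of labels as the result of getLabel() is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SparseInstanceSketch {

    /** Maps feature names to dense integer IDs, in the spirit of Dataset.featureIDs. */
    static class FeatureIndex {
        private final Map<String, Integer> featureIDs = new HashMap<>();
        private final List<String> featureNames = new ArrayList<>();

        /** Returns the ID of a feature, assigning the next free ID on first sight. */
        int idOf(String name) {
            Integer id = featureIDs.get(name);
            if (id == null) {
                id = featureNames.size();
                featureIDs.put(name, id);
                featureNames.add(name);
            }
            return id;
        }

        String nameOf(int id) { return featureNames.get(id); }
    }

    /** Sparse instance: only features with a non-zero value are stored. */
    static class Instance {
        final int id;
        final int[] featureIDs;
        final double[] featureValues;
        final int[] labels;

        Instance(int id, int[] featureIDs, double[] featureValues, int[] labels) {
            this.id = id;
            this.featureIDs = featureIDs;
            this.featureValues = featureValues;
            this.labels = labels;
        }

        int getLabel() { return labels[0]; } // assumption: first label is the primary one

        /** Hypothetical lookup: value of a feature, 0.0 when it is absent. */
        double valueOf(int featureID) {
            for (int i = 0; i < featureIDs.length; i++) {
                if (featureIDs[i] == featureID) return featureValues[i];
            }
            return 0.0;
        }
    }

    /** Builds a bag-of-words instance: one feature per distinct token, valued by its count. */
    static Instance fromTokens(int id, String[] tokens, FeatureIndex index, int label) {
        Map<Integer, Double> counts = new TreeMap<>(); // TreeMap keeps IDs sorted
        for (String t : tokens) {
            counts.merge(index.idOf(t), 1.0, Double::sum);
        }
        int[] ids = new int[counts.size()];
        double[] values = new double[counts.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : counts.entrySet()) {
            ids[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new Instance(id, ids, values, new int[]{label});
    }

    public static void main(String[] args) {
        FeatureIndex index = new FeatureIndex();
        Instance ins = fromTokens(0, "a b a".split(" "), index, 1);
        for (int i = 0; i < ins.featureIDs.length; i++) {
            System.out.println(index.nameOf(ins.featureIDs[i]) + " = " + ins.featureValues[i]);
        }
    }
}
```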

Classifier

- name: String
- m: Model
- settings: Settings

+ setModel(m: Model)
+ classify(d: Dataset): Classification

Classification

- predictions: List<String>
- classifierName: String
- modelName: String

ClassificationEvaluator

- settings : Settings

+ eval(cl: Classification, d: Dataset): ClassificationEvaluation

ClassificationEvaluation

edu.cmu.statproject.core: Machine Learning
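The flow implied by the machine-learning diagram (a Learner produces a Model, a Classifier applies it, an evaluator scores the resulting Classification) can be illustrated with a deliberately trivial majority-class baseline. Everything below is a simplified stand-in rather than the designed API: instances carry only an integer label, predictions are integers instead of the List<String> in the diagram, and the evaluator returns plain accuracy instead of a ClassificationEvaluation object. The Majority* and AccuracyEvaluator names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {

    static class Instance {
        final int label;
        Instance(int label) { this.label = label; }
    }

    static class Dataset {
        final List<Instance> instances = new ArrayList<>();
        void add(Instance ins) { instances.add(ins); }
    }

    /** Model: just remembers the most frequent training label. */
    static class MajorityModel {
        final int label;
        MajorityModel(int label) { this.label = label; }
    }

    static class MajorityLearner {
        MajorityModel learn(Dataset d) {
            Map<Integer, Integer> counts = new HashMap<>();
            int best = -1;
            int bestCount = 0;
            for (Instance ins : d.instances) {
                int c = counts.merge(ins.label, 1, Integer::sum);
                if (c > bestCount) {
                    best = ins.label;
                    bestCount = c;
                }
            }
            return new MajorityModel(best);
        }
    }

    static class Classification {
        final List<Integer> predictions = new ArrayList<>();
    }

    static class MajorityClassifier {
        private MajorityModel m;

        void setModel(MajorityModel m) { this.m = m; }

        /** Predicts the stored majority label for every instance. */
        Classification classify(Dataset d) {
            Classification cl = new Classification();
            for (int i = 0; i < d.instances.size(); i++) {
                cl.predictions.add(m.label);
            }
            return cl;
        }
    }

    static class AccuracyEvaluator {
        /** Fraction of predictions that match the gold labels in d. */
        double eval(Classification cl, Dataset d) {
            if (d.instances.isEmpty()) return 0.0;
            int correct = 0;
            for (int i = 0; i < d.instances.size(); i++) {
                if (cl.predictions.get(i) == d.instances.get(i).label) correct++;
            }
            return (double) correct / d.instances.size();
        }
    }

    public static void main(String[] args) {
        Dataset train = new Dataset();
        for (int label : new int[]{1, 1, 0}) train.add(new Instance(label));
        Dataset test = new Dataset();
        for (int label : new int[]{1, 0}) test.add(new Instance(label));

        MajorityClassifier classifier = new MajorityClassifier();
        classifier.setModel(new MajorityLearner().learn(train));
        Classification cl = classifier.classify(test);
        System.out.println("accuracy = " + new AccuracyEvaluator().eval(cl, test));
    }
}
```

Separating the learner from the classifier, as the diagram does, lets a saved model be reloaded and applied without rerunning training.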

PartitionManager

[Diagram: items in shuffled order 6 2 1 3 8 7 4 5]

PartitionManager

- nparts: int
- partSizes: int[]
- partNames: String[]
- itemOrder: int[]

// Constructors
# PartitionManager(d: Dataset)
# PartitionManager(c: Corpus)
# PartitionManager(itemCount: int)

+ size()

// Partitioning
+ split(npart: int)
+ setPartName(partNo: int, partName: String)
+ split(partSizes: int[], partNames: String[])
+ split(partRatios: double[], partNames: String[])

// Cross-validation methods
+ splitForCrossValidation(npart: int, partNames: String[])
+ setCurrentFold(foldNo: int)

// Changing the order in which items are processed
+ shuffle()
+ shuffle(partNo: int)
+ shuffle(partName: String)

// Getters and setters are not shown for brevity

edu.cmu.statproject.core: Partitioning and Ordering

Dataset Corpus

partName partNo itemOrder

Dataset d;
...
// Default split method (without explicit splitter)
PartitionManager pm = new PartitionManager(d.size());
d.setPartitionManager(pm);
pm.shuffle();
pm.split(new double[]{0.8, 0.2}, new String[]{"train", "test"});

// Learn from the training subset
NaiveBayesLearner nbLearner = new NaiveBayesLearner();
NaiveBayesModel nbModel = nbLearner.learn(d.getSubset("train"));

// Classify the test subset
NaiveBayesClassifier nbCf = new NaiveBayesClassifier(nbModel);
NaiveBayesClassification nbCt = nbCf.classify(d.getSubset("test"));

// Or use cross validation
pm.splitForCrossValidation(10, new String[]{"train", "test"});
for (int i = 0; i < 10; ++i) {
    pm.setCurrentFold(i);
    nbModel = nbLearner.learn(d.getSubset("train"));
    nbCf.setModel(nbModel); // classify with the model trained on this fold
    nbCf.classify(d.getSubset("test"));
}
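The slides do not say how setCurrentFold maps items to the "train" and "test" subsets. One plausible reading, sketched below as an assumption, is that the items are cut into k contiguous folds of near-equal size, and the current fold serves as "test" while all other folds serve as "train". The foldOf and subset helpers are hypothetical and work on item positions only.

```java
import java.util.ArrayList;
import java.util.List;

class CrossValidationSketch {

    /** Assigns each of itemCount positions to one of k contiguous, near-equal folds. */
    static int[] foldOf(int itemCount, int k) {
        int[] fold = new int[itemCount];
        for (int i = 0; i < itemCount; i++) {
            fold[i] = (int) ((long) i * k / itemCount);
        }
        return fold;
    }

    /**
     * Positions belonging to "test" (the current fold) or to "train"
     * (every other fold). Assumption: only these two names are used.
     */
    static List<Integer> subset(int[] fold, int currentFold, String partName) {
        boolean wantTest = partName.equals("test");
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < fold.length; i++) {
            if ((fold[i] == currentFold) == wantTest) out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] fold = foldOf(10, 5);
        for (int f = 0; f < 5; f++) {
            System.out.println("fold " + f
                    + " test=" + subset(fold, f, "test")
                    + " train=" + subset(fold, f, "train"));
        }
    }
}
```

Under this reading every item is used for testing exactly once across the k iterations of the loop.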

How Partition Manager Works:

Sample Code:

pm.split(new int[]{2, 3, 1, 2}, new String[]{"name1", "name2", "name1", "name3"});
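A minimal sketch of how such a split might behave, assuming that consecutive runs of itemOrder are assigned to the parts in sequence and that parts sharing a name (here "name1") are read back as one subset. That merging behavior is an interpretation of the sample call, not something the slides state. The itemsOf method and the seeded shuffle(long) are hypothetical additions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PartitionSketch {

    static class PartitionManager {
        private final int[] itemOrder;
        private int[] partSizes = new int[0];
        private String[] partNames = new String[0];

        PartitionManager(int itemCount) {
            itemOrder = new int[itemCount];
            for (int i = 0; i < itemCount; i++) itemOrder[i] = i;
        }

        int size() { return itemOrder.length; }

        /** Fisher-Yates shuffle of the processing order (seeded for repeatability). */
        void shuffle(long seed) {
            Random rnd = new Random(seed);
            for (int i = itemOrder.length - 1; i > 0; i--) {
                int j = rnd.nextInt(i + 1);
                int tmp = itemOrder[i];
                itemOrder[i] = itemOrder[j];
                itemOrder[j] = tmp;
            }
        }

        /** Splits by explicit sizes; the sizes must add up to the item count. */
        void split(int[] partSizes, String[] partNames) {
            if (partSizes.length != partNames.length) {
                throw new IllegalArgumentException("one name per part is required");
            }
            int total = 0;
            for (int s : partSizes) total += s;
            if (total != itemOrder.length) {
                throw new IllegalArgumentException("part sizes must sum to " + itemOrder.length);
            }
            this.partSizes = partSizes.clone();
            this.partNames = partNames.clone();
        }

        /** Splits by ratios (assumed to sum to 1); the last part absorbs rounding. */
        void split(double[] partRatios, String[] partNames) {
            if (partRatios.length == 0) {
                throw new IllegalArgumentException("at least one part is required");
            }
            int[] sizes = new int[partRatios.length];
            int assigned = 0;
            for (int i = 0; i < sizes.length - 1; i++) {
                sizes[i] = (int) Math.round(partRatios[i] * itemOrder.length);
                assigned += sizes[i];
            }
            sizes[sizes.length - 1] = itemOrder.length - assigned;
            split(sizes, partNames);
        }

        /** Hypothetical accessor: item indices of every part carrying this name. */
        List<Integer> itemsOf(String partName) {
            List<Integer> items = new ArrayList<>();
            int pos = 0;
            for (int p = 0; p < partSizes.length; p++) {
                for (int k = 0; k < partSizes[p]; k++, pos++) {
                    if (partNames[p].equals(partName)) items.add(itemOrder[pos]);
                }
            }
            return items;
        }
    }

    public static void main(String[] args) {
        PartitionManager pm = new PartitionManager(8);
        pm.split(new int[]{2, 3, 1, 2}, new String[]{"name1", "name2", "name1", "name3"});
        System.out.println(pm.itemsOf("name1")); // [0, 1, 5]
        System.out.println(pm.itemsOf("name2")); // [2, 3, 4]
        System.out.println(pm.itemsOf("name3")); // [6, 7]
    }
}
```

Because the split is defined over itemOrder rather than over the data itself, a Dataset or Corpus can be re-partitioned or reshuffled without copying its items.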

Corpus

Dataset

<<interface>>

Persistable

+ save(filename: String)
+ load(filename: String)
+ save(ds: DataSource)
+ load(ds: DataSource)

CorpusWriter

+ write(c: Corpus, filename: String)…

CorpusReader

+ read(filename: String): Corpus…

DatasetReader

+ read(filename: String): Dataset…

DatasetWriter

+ write(d: Dataset, filename: String)…

Classification

Persistence

Settings

java.util.Properties
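The diagram relates Settings to java.util.Properties without showing the exact relationship. One possible reading, sketched here as an assumption, is that Settings extends Properties and adds typed getters, which also gives it text-based save and load through the standard store and load methods. The getInt and getDouble helpers and the key names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

class SettingsSketch {

    /** Assumption: Settings is a thin typed layer over java.util.Properties. */
    static class Settings extends Properties {

        int getInt(String key, int defaultValue) {
            String v = getProperty(key);
            return v == null ? defaultValue : Integer.parseInt(v.trim());
        }

        double getDouble(String key, double defaultValue) {
            String v = getProperty(key);
            return v == null ? defaultValue : Double.parseDouble(v.trim());
        }
    }

    public static void main(String[] args) throws IOException {
        Settings s = new Settings();
        s.setProperty("learner.smoothing", "0.5");
        s.setProperty("cv.folds", "10");

        // Round-trip through the standard properties text format.
        StringWriter out = new StringWriter();
        s.store(out, "learner settings");
        Settings copy = new Settings();
        copy.load(new StringReader(out.toString()));

        System.out.println(copy.getDouble("learner.smoothing", 1.0));
        System.out.println(copy.getInt("cv.folds", 3));
    }
}
```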
