A Short Introduction to Weka
Natural Language Processing
Thursday, November 5th
What is weka?
Java-based Machine Learning Tool
Implements numerous classifiers
3 modes of operation
GUI
Command Line
Java API (not discussed here)
Google: weka java
weka Homepage
[Link]
To run:
java -Xmx1024M -jar ~cs4705/bin/[Link] &
.arff file format
[Link]
% 1. Title: Iris Plants Database
%
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
.arff file format
@attribute attrName {numeric, string, <nominal>, date}
numeric: a number
nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
string: <arbitrary strings>
date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
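The header and data layout described above can be generated by a short script. The following is only a minimal sketch, not part of Weka: the function name `format_arff` and its argument shapes are assumptions made for illustration, and it does not quote values that contain spaces or commas.

```python
def format_arff(relation, attributes, rows):
    """Return the text of an ARFF file.

    attributes: list of (name, kind) pairs, where kind is "NUMERIC",
                "STRING", or a list of allowed nominal values.
    rows:       list of tuples with one value per attribute, in order.
    NOTE: hypothetical helper; values needing quotes are not handled.
    """
    lines = [f"@RELATION {relation}"]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # A nominal attribute lists its finite set of values in braces.
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(row)}")
        for value, (name, kind) in zip(row, attributes):
            # Reject nominal values that the header does not declare.
            if isinstance(kind, (list, tuple)) and value not in kind:
                raise ValueError(f"{value!r} is not a declared value of {name}")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


iris_attributes = [
    ("sepallength", "NUMERIC"),
    ("sepalwidth", "NUMERIC"),
    ("petallength", "NUMERIC"),
    ("petalwidth", "NUMERIC"),
    ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]),
]
iris_rows = [(5.1, 3.5, 1.4, 0.2, "Iris-setosa"), (4.9, 3.0, 1.4, 0.2, "Iris-setosa")]
print(format_arff("iris", iris_attributes, iris_rows))
```

Checking nominal values while writing catches label typos before Weka ever reads the file.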
Example Arff Files
~cs4705/bin/weka-3-4-11/data/
[Link] [Link] [Link]
To Classify with weka GUI
1. Run weka GUI
   (in Unix: java -jar [Link])
2. Click 'Explorer'
3. 'Open file...'
4. Click 'Classify' tab
5. 'Choose' a classifier
6. [Link] options
7. Click 'Start'
8. [Link]...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'
Classify
Some classifiers to start with.
NaiveBayes JRip J48 SMO
Find References by selecting a classifier
Use Cross-Validation!
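To make the cross-validation advice concrete, here is a minimal sketch of how N-fold cross-validation partitions a data set: each instance is held out for testing exactly once and used for training in every other fold. The function name `k_fold_indices` is an assumption for illustration, and the sketch does no shuffling or stratification, which a real tool would typically add.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds.

    NOTE: illustrative only; instances are assigned round-robin,
    with no shuffling and no stratification by class.
    """
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    # Fold i holds instances i, i + k, i + 2k, ...
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10, 5):
    print("train on", train, "test on", test)
```

The reported score is then the performance aggregated over all held-out folds, so no instance is ever evaluated by a model that saw it during training.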
Analyzing Results
Important tools for Homework 3
Accuracy
Correctly classified instances
F-measure
Confusion matrix
Save model
Visualization
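The measures listed above are all derived from the confusion matrix. The sketch below shows the arithmetic, assuming rows are actual classes and columns are predicted classes; the function names and the example counts are made up for illustration and are not Weka output.

```python
def accuracy(matrix):
    """Fraction of correctly classified instances (the diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f_measure(matrix, cls):
    """Harmonic mean of precision and recall for class index cls."""
    true_positives = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum: predicted as cls
    actual = sum(matrix[cls])                    # row sum: truly cls
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up two-class example: rows = actual, columns = predicted.
example = [
    [8, 2],
    [1, 9],
]
print("accuracy:", accuracy(example))
print("F-measure for class 0:", f_measure(example, 0))
```

Accuracy alone can look good on a skewed class distribution, which is why the per-class F-measure is worth checking as well.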
Running weka from the Command Line
[Link]
Running an N-fold cross validation experiment
java -cp ~cs4705/bin/[Link] [Link] -t [Link] -x N -i
Using a predefined test set
java -cp ~cs4705/bin/[Link] [Link] -t [Link] -T [Link]
Saving the model
java -cp ~cs4705/bin/[Link] [Link] -t [Link] -d [Link]
Classifying a test set
java -cp ~cs4705/bin/[Link] [Link] -l [Link] -T [Link]
Getting help
java -cp ~cs4705/bin/[Link] [Link] -?
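When running many experiments, it can help to assemble these command lines in a script. The following is a rough sketch using only the flags shown above (-t, -T, -x, -i, -d, -l). The helper name `weka_command`, the file names, and the jar path are placeholders, and the class name `weka.classifiers.bayes.NaiveBayes` is an assumption about where the NaiveBayes classifier lives in your Weka version.

```python
import shlex


def weka_command(jar, classifier, train=None, test=None, folds=None,
                 per_class_stats=False, save_model=None, load_model=None):
    """Build the argument list for one Weka command-line run.

    NOTE: hypothetical helper; it only arranges the flags from the slides.
    """
    cmd = ["java", "-cp", jar, classifier]
    if train:
        cmd += ["-t", train]          # training .arff file
    if test:
        cmd += ["-T", test]           # predefined test .arff file
    if folds:
        cmd += ["-x", str(folds)]     # number of cross-validation folds
    if per_class_stats:
        cmd.append("-i")              # per-class statistics, as in the N-fold command
    if save_model:
        cmd += ["-d", save_model]     # write the trained model to this file
    if load_model:
        cmd += ["-l", load_model]     # load a previously saved model
    return cmd


# Placeholder paths: substitute the real jar and .arff locations.
cmd = weka_command("weka.jar", "weka.classifiers.bayes.NaiveBayes",
                   train="train.arff", folds=10, per_class_stats=True)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.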
Homework 3 Weka Workflow
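[Figure: workflow diagram in three stages. Preprocessing (you): T1 ... TN go through Your Feature Extractor to produce an .arff file. Experimentation (you): Weka runs on the .arff file, giving results and a best model. Grading (us): S1, S2 ... SN go through Your Feature Extractor to produce a Test .arff file, which Weka and the best model turn into results.]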
Tips for Homework Success
Start early
Read instructions carefully
Start simply
Your system should always work
80/20 Rule
Add features incrementally
This way, you always have something you can turn in.
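One way to add features incrementally is to keep each feature as a small function in a registry, so the extractor keeps working after every addition. This is only a sketch of that idea: the registry, the decorator, and both example features are hypothetical and are not taken from the homework.

```python
FEATURES = []  # (name, ARFF type, function) in output order


def feature(name, arff_type):
    """Decorator that registers a feature function under an ARFF attribute."""
    def register(func):
        FEATURES.append((name, arff_type, func))
        return func
    return register


@feature("num_tokens", "NUMERIC")
def num_tokens(text):
    # Hypothetical feature: whitespace-separated token count.
    return len(text.split())


@feature("has_question_mark", "{yes,no}")
def has_question_mark(text):
    # Hypothetical nominal feature.
    return "yes" if "?" in text else "no"


def header_lines():
    """One @ATTRIBUTE line per registered feature."""
    return [f"@ATTRIBUTE {name} {kind}" for name, kind, _ in FEATURES]


def extract(text):
    """One comma-separated @DATA row for a single input text."""
    return ",".join(str(func(text)) for _, _, func in FEATURES)


print("\n".join(header_lines()))
print(extract("Is this spam ?"))
```

Because the header and the data rows are both driven by the same registry, adding a new decorated function updates both at once and the output stays a valid .arff fragment.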