Class DatasetSplitter
java.lang.Object
    org.apache.lucene.classification.utils.DatasetSplitter

public class DatasetSplitter extends java.lang.Object
Utility class for creating training / test / cross validation indexes from the original index.

Field Summary
Fields
Modifier and Type | Field | Description
private double | crossValidationRatio |
private double | testRatio |

Constructor Summary
Constructors
Constructor | Description
DatasetSplitter(double testRatio, double crossValidationRatio) | Creates a DatasetSplitter from the given test and cross validation index sizes.

Method Summary
Modifier and Type | Method | Description
private Document | createNewDoc(IndexReader originalIndex, FieldType ft, ScoreDoc scoreDoc, java.lang.String[] fieldNames) |
void | split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, java.lang.String classFieldName, java.lang.String... fieldNames) | Splits a given index into three indexes, for training, test and cross validation tasks respectively.

Constructor Detail

DatasetSplitter
public DatasetSplitter(double testRatio, double crossValidationRatio)
Creates a DatasetSplitter from the given test and cross validation index sizes.
Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
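
To make the two ratios concrete, the following minimal sketch computes how many documents each index might receive for a given source index size. It is not Lucene code: SplitSizes, checkRatio and expectedSizes are hypothetical names, and both the rounding rule (floor) and the requirement that the two ratios sum to at most 1.0 are assumptions of this sketch that the documentation above does not state.

```java
final class SplitSizes {
    private SplitSizes() {}

    /** Rejects ratios outside the documented range of 0.0 to 1.0. */
    static void checkRatio(String name, double ratio) {
        if (Double.isNaN(ratio) || ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException(
                name + " must be between 0.0 and 1.0, got " + ratio);
        }
    }

    /**
     * Returns the expected {training, test, crossValidation} document counts.
     * Assumption: counts are rounded down, and the training index receives
     * whatever is left after the test and cross validation shares.
     */
    static long[] expectedSizes(long totalDocs, double testRatio, double crossValidationRatio) {
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption: the two shares must leave a non-negative remainder for training.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must sum to at most 1.0");
        }
        long test = (long) Math.floor(totalDocs * testRatio);
        long crossValidation = (long) Math.floor(totalDocs * crossValidationRatio);
        return new long[] {totalDocs - test - crossValidation, test, crossValidation};
    }

    public static void main(String[] args) {
        long[] sizes = expectedSizes(1000, 0.25, 0.25);
        // prints "training=500 test=250 crossValidation=250"
        System.out.println("training=" + sizes[0] + " test=" + sizes[1]
            + " crossValidation=" + sizes[2]);
    }
}
```

Under these assumptions, a testRatio of 0.25 and a crossValidationRatio of 0.25 on a 1000-document index leave half of the documents for training.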

Method Detail

split
public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, java.lang.String classFieldName, java.lang.String... fieldNames) throws java.io.IOException
Splits a given index into three indexes, for training, test and cross validation tasks respectively.
Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - the Analyzer used to create the new documents
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this must be indexed with sorted doc values
fieldNames - names of the fields that need to be put in the new indexes, or null if all should be used
Throws:
java.io.IOException - if any writing operation fails on any of the indexes
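
The idea of routing each source document to exactly one of three targets can be illustrated without Lucene. The sketch below works on plain document ids held in memory; ThreeWaySplitSketch, Result and its split method are hypothetical names, and the fill-in-order policy (test first, then cross validation, then training) is an assumption chosen for simplicity, not a description of how DatasetSplitter assigns documents.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreeWaySplitSketch {
    private ThreeWaySplitSketch() {}

    /** Holds the three partitions produced by the sketch. */
    static final class Result {
        final List<Integer> training = new ArrayList<>();
        final List<Integer> test = new ArrayList<>();
        final List<Integer> crossValidation = new ArrayList<>();
    }

    /**
     * Assigns every id to exactly one partition.
     * Assumption: the test and cross validation partitions are filled first,
     * up to their share of the total, and the rest goes to training.
     */
    static Result split(List<Integer> docIds, double testRatio, double crossValidationRatio) {
        Result result = new Result();
        int total = docIds.size();
        for (int id : docIds) {
            if (result.test.size() < total * testRatio) {
                result.test.add(id);
            } else if (result.crossValidation.size() < total * crossValidationRatio) {
                result.crossValidation.add(id);
            } else {
                result.training.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            ids.add(i);
        }
        Result r = split(ids, 0.25, 0.25);
        System.out.println("test=" + r.test
            + " crossValidation=" + r.crossValidation
            + " training=" + r.training);
    }
}
```

A real split over labelled data would usually also try to keep the class distribution similar across the three partitions, which this sketch does not attempt.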

createNewDoc
private Document createNewDoc(IndexReader originalIndex, FieldType ft, ScoreDoc scoreDoc, java.lang.String[] fieldNames) throws java.io.IOException
Throws:
java.io.IOException