How do I attach a WEKA Instance to an object?
I am able to use the WEKA library (https://weka.sourceforge.io/doc.dev/overview-summary.html) to build a dataset from my objects:
import java.util.ArrayList;
import java.util.List;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public static Instances createWekaInstances(List<Ticket> tickets, String name) {
    // Create the numeric attributes "x" and "y"
    Attribute x = new Attribute("x"); // square root of the row position
    Attribute y = new Attribute("y"); // section CV
    // Collect the attributes in a list
    ArrayList<Attribute> attributes = new ArrayList<Attribute>();
    attributes.add(x);
    attributes.add(y);
    // Create the empty dataset "ticketInstances" with the above attributes
    Instances ticketInstances = new Instances(name, attributes, 0);
    ticketInstances.setClassIndex(ticketInstances.numAttributes() - 1);
    for (Ticket ticket : tickets) {
        // Create an empty instance with one slot per attribute
        Instance inst = new DenseInstance(ticketInstances.numAttributes());
        // Set the instance's values for the attributes "x" and "y"
        inst.setValue(x, Math.sqrt(ticket.getRowPosition()));
        inst.setValue(y, ticket.getSectionCVS());
        // Tell the instance which dataset it belongs to
        inst.setDataset(ticketInstances);
        // Add the instance to the dataset
        ticketInstances.add(inst);
    }
    return ticketInstances;
}
I am then able to run a nearest-neighbour search with
Instances neighbors = tree.kNearestNeighbours(ticketInstances.get(indexToSearch), 2);
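However, it returns instances that look like this -> {0 2.44949,1 0.4}
As a result, I cannot relate a returned instance back to its object. Is there a "WEKA way" to attach an ID or something similar, so that I can tell which object in this list of instances is closest to the target object?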
Update
OK, doing this seems to work for my use case:
BallTree bTree = new BallTree();
try {
    // Configure the distance function: no normalization, skip attribute 1 (the ID)
    EuclideanDistance euclideanDistance = new EuclideanDistance();
    euclideanDistance.setDontNormalize(true);
    euclideanDistance.setAttributeIndices("2-last");
    euclideanDistance.setInstances(dataset);
    bTree.setDistanceFunction(euclideanDistance);
    // setInstances() builds the tree, so call it after the distance function is set
    bTree.setInstances(dataset);
} catch (Exception e) {
    e.printStackTrace();
}
Comments (1)
Weka has no concept of unique IDs for weka.core.Instance objects. Instead, you need to create an additional attribute that allows you to identify your rows (e.g., the ticket ID or a numeric attribute with unique values). You can use the AddID filter to add a numeric attribute containing such an ID to your dataset, as mentioned in the Weka wiki article on Instance ID.
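To make the idea of an identifying attribute concrete, here is a minimal, library-free sketch in plain Java. It does not use Weka; it only illustrates carrying an ID alongside the feature values. The `Ticket` class, the `TicketRows` class, the `toRows` method, and the choice of 1-based IDs are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Library-free illustration (not Weka): the ID is stored as the first value of each row.
class TicketRows {

    // Hypothetical stand-in for the Ticket class from the question.
    static class Ticket {
        final String label;
        final double rowPosition;
        final double sectionCv;

        Ticket(String label, double rowPosition, double sectionCv) {
            this.label = label;
            this.rowPosition = rowPosition;
            this.sectionCv = sectionCv;
        }
    }

    // Builds rows of the form [id, sqrt(rowPosition), sectionCv] and records
    // id -> ticket in byId, so that a row can be traced back to its ticket.
    static double[][] toRows(List<Ticket> tickets, Map<Integer, Ticket> byId) {
        double[][] rows = new double[tickets.size()][];
        for (int i = 0; i < tickets.size(); i++) {
            Ticket t = tickets.get(i);
            int id = i + 1; // assumed convention: IDs are 1-based positions in the list
            byId.put(id, t);
            rows[i] = new double[] { id, Math.sqrt(t.rowPosition), t.sectionCv };
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Ticket> tickets = new ArrayList<>();
        tickets.add(new Ticket("A", 4.0, 0.4));
        tickets.add(new Ticket("B", 9.0, 0.1));

        Map<Integer, Ticket> byId = new HashMap<>();
        double[][] rows = toRows(tickets, byId);

        // Each row's first value leads back to the original ticket.
        for (double[] row : rows) {
            int id = (int) row[0];
            System.out.println(id + " -> " + byId.get(id).label);
        }
    }
}
```

In a Weka dataset the same role is played by the extra ID attribute: whatever instance a search returns, its ID value is the key for finding the original object.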
From your code, it seems that you are only using the nearest-neighbour search, without any classifier or clusterer involved (for those, you would use the FilteredClassifier/FilteredClusterer approach to remove the ID attribute from the data used for building the model). You therefore need to specify in the DistanceFunction which attributes to use for the distance calculation. This is done by supplying an attribute range to the setAttributeIndices(String) method. If your ID attribute is the first one, you would use 2-last.
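The effect of restricting the distance to the range 2-last can be sketched without Weka. The plain-Java example below is only an illustration of the principle, not Weka's implementation: it assumes each row is a `double[]` whose first value is the ID, computes an unnormalized Euclidean distance over the remaining values, and returns the IDs of the closest rows. The class and method names are hypothetical, and excluding the query row from its own results is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;

// Library-free illustration (not Weka): column 0 is an ID and is ignored by the distance.
class IdAwareNeighbours {

    // Unnormalized Euclidean distance over columns 1..last (the "2-last" range, 1-based).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 1; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the IDs of the k rows closest to rows[target], excluding the target row itself.
    static int[] nearestIds(final double[][] rows, int target, int k) {
        final double[] query = rows[target];
        Integer[] order = new Integer[rows.length];
        for (int i = 0; i < rows.length; i++) {
            order[i] = i;
        }
        // Sort row indices by their distance to the query row.
        Comparator<Integer> byDistance =
                Comparator.comparingDouble(i -> distance(rows[i], query));
        Arrays.sort(order, byDistance);

        int[] ids = new int[Math.max(0, Math.min(k, rows.length - 1))];
        int n = 0;
        for (int idx : order) {
            if (idx == target) {
                continue; // skip the query row itself
            }
            if (n == ids.length) {
                break;
            }
            ids[n++] = (int) rows[idx][0]; // read the ID back from column 0
        }
        return ids;
    }

    public static void main(String[] args) {
        double[][] rows = {
            { 1, 0.0, 0.0 },
            { 2, 1.0, 0.0 },
            { 3, 5.0, 5.0 },
            { 4, 0.0, 0.5 },
        };
        System.out.println(Arrays.toString(nearestIds(rows, 0, 2)));
    }
}
```

Because the ID never enters the distance, it can take any unique value without distorting the search, and the returned IDs can then be looked up in whatever ID-to-object map you maintain.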