Musk2 Data Set

Description

The task is to determine whether a drug molecule binds strongly to a target protein. Each molecule can adopt many different shapes, or conformations. A positive molecule has at least one conformation that binds well, although it is not known which one; a negative molecule has no conformation that binds well. The problem maps naturally onto multiple-instance learning (MIL): each molecule is a bag, and the conformations it can adopt are the instances in that bag.
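The bag-labelling rule described above can be illustrated with a minimal sketch. It assumes the standard MIL assumption (a bag is positive if at least one instance is positive). The molecule names and per-conformation flags are hypothetical toy values, not taken from the data set; in the real data the per-conformation labels are unknown and only the bag label is observed.

```python
from typing import Dict, List


def bag_label(instance_binds: List[bool]) -> int:
    """Return 1 if any instance in the bag is positive, else 0.

    This is the standard MIL assumption; it is used here only to
    illustrate how a bag label relates to hidden instance labels.
    """
    return int(any(instance_binds))


# Hypothetical toy molecules: each list holds one flag per conformation.
molecules: Dict[str, List[bool]] = {
    "mol_a": [False, True, False],  # one conformation binds well
    "mol_b": [False, False],        # no conformation binds well
}

labels = {name: bag_label(confs) for name, confs in molecules.items()}
print(labels)
```

A learner never sees the inner flags; it has to infer a bag-level decision from the feature vectors of all conformations in the bag.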

Dataset

The original data set was partitioned five times using a 10-fold cross-validation procedure, so five different 10-fold cross-validation partitions are available.
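As a rough illustration of what five repetitions of 10-fold partitioning look like, the sketch below splits a list of bag identifiers into folds, keeping every bag whole. It is an assumption-laden example, not the procedure used to build the distributed files: the function name `repeated_kfold`, the seed, the number of bags, and the unstratified shuffling are all hypothetical choices.

```python
import random
from typing import List, Sequence


def repeated_kfold(
    bag_ids: Sequence[str], k: int = 10, repeats: int = 5, seed: int = 0
) -> List[List[List[str]]]:
    """Return `repeats` independent partitions of bag_ids into k folds.

    Splitting is done at the bag level, so all instances of a molecule
    stay in the same fold. No stratification is applied (an assumption).
    """
    rng = random.Random(seed)
    partitions = []
    for _ in range(repeats):
        shuffled = list(bag_ids)
        rng.shuffle(shuffled)
        # Deal the shuffled bags round-robin into k folds.
        folds = [shuffled[i::k] for i in range(k)]
        partitions.append(folds)
    return partitions


# Hypothetical bag identifiers; the count is a placeholder.
bag_ids = [f"bag_{i}" for i in range(25)]
partitions = repeated_kfold(bag_ids)
print(len(partitions), len(partitions[0]))
```

In each repetition, every fold serves once as the test set while the remaining nine folds form the training set.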

10-fold cross validation    Files
Procedure 1                 musk2-10-proc1.arff
Procedure 2                 musk2-10-proc2.arff
Procedure 3                 musk2-10-proc3.arff
Procedure 4                 musk2-10-proc4.arff
Procedure 5                 musk2-10-proc5.arff