Each data set consists of four components.
All data sets have been generated from the Latent Block Model with normally distributed entries. They differ in the type of model used:
Then, the setups differ with respect to:
The following figure shows the original, unsorted data table on the left. The left and top sidebars represent the row and column class labels, respectively. The center panel shows the same table sorted according to one element of the set of possible row and column labels. The summary on the right shows the block averages of the sorted table, that is, the average of the entries that share the same row and column assignments.
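The sorting and block-averaging steps described above can be sketched in a few lines of plain Python. This is only an illustration: the function names `sort_table` and `block_averages` are hypothetical, and the toy table uses made-up values rather than entries from the data sets.

```python
from collections import defaultdict

def sort_table(table, row_labels, col_labels):
    """Reorder rows and columns so that members of the same class are contiguous."""
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    return [[table[i][j] for j in col_order] for i in row_order]

def block_averages(table, row_labels, col_labels):
    """Average the entries that share the same (row class, column class) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy 4 x 4 table with made-up values (not taken from the data sets).
table = [
    [1.0, 5.0, 1.0, 5.0],
    [9.0, 3.0, 9.0, 3.0],
    [1.0, 5.0, 1.0, 5.0],
    [9.0, 3.0, 9.0, 3.0],
]
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]

sorted_table = sort_table(table, row_labels, col_labels)
averages = block_averages(table, row_labels, col_labels)
```

Here `sorted_table` groups the two row classes and the two column classes into contiguous blocks, and `averages` maps each (row class, column class) pair to the mean of its block.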
As stated in the list above, we provide several (2000) possible row and column labels, sampled from the distribution of labels given the table (see details in the paper). These labels do not exhaustively cover the distribution of labels, but they make it possible to estimate the expected performance with respect to the distribution of true labels.
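One way to use the sampled labels is a Monte Carlo average: score an estimated partition against each sampled label vector and average the scores. The sketch below assumes the Rand index as the score, chosen here only because it does not depend on how classes are numbered; the criterion used in the paper may differ, and the names `rand_index` and `expected_score` are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same vs. different class)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def expected_score(estimated, sampled_labels, score=rand_index):
    """Monte Carlo estimate: average the score over the sampled label vectors."""
    return sum(score(estimated, s) for s in sampled_labels) / len(sampled_labels)

# Toy example with two sampled label vectors (the data sets provide 2000).
estimated = [0, 0, 1, 1]
sampled = [
    [1, 1, 0, 0],  # same partition as `estimated`, class names swapped
    [0, 0, 0, 1],  # a different partition
]
result = expected_score(estimated, sampled)
```

The same function applies to row labels and to column labels separately; how the two are combined into a single figure is left open here.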
Finally, the parameters that fully specify the generative model are provided for information, as they may be useful for interpretation.
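To illustrate what such parameters describe, here is a minimal sketch of sampling a table from a latent block model with normally distributed entries. It assumes class proportions for rows and columns, one mean per block, and a single standard deviation shared by all blocks; the actual parameterization and file layout of the data sets may differ, and `sample_lbm` and its argument names are hypothetical.

```python
import random

def sample_lbm(n_rows, n_cols, row_props, col_props, means, sigma, seed=0):
    """Draw row labels, column labels, and a table with Gaussian block entries."""
    rng = random.Random(seed)
    # Each row and each column is assigned a class independently.
    row_labels = rng.choices(range(len(row_props)), weights=row_props, k=n_rows)
    col_labels = rng.choices(range(len(col_props)), weights=col_props, k=n_cols)
    # Entry (i, j) is normal with the mean of block (row class, column class).
    table = [
        [rng.gauss(means[z][w], sigma) for w in col_labels]
        for z in row_labels
    ]
    return table, row_labels, col_labels

# Illustrative parameters only (assumed values, not those of the data sets).
table, row_labels, col_labels = sample_lbm(
    n_rows=6,
    n_cols=4,
    row_props=[0.5, 0.5],
    col_props=[0.3, 0.7],
    means=[[0.0, 2.0], [4.0, 6.0]],
    sigma=1.0,
)
```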
Each .zip archive corresponds to one type of data set, defined by the error, the size and the number of clusters. It contains 20 folders, one for each of 20 different data sets. Each folder is named AAA_err_BBB_n_CCC_g_DDD_tab_EEE, where:
Each folder contains four files in ASCII format (entries are tab-separated and lines end with line feeds):