File Format
The data sets have the following format (replace DS by the name of the data set):
Let
n = total number of nodes
m = total number of edges
N = number of graphs
DS_A.txt (m lines): sparse (block diagonal) adjacency matrix for all graphs; each line corresponds to (row, col), i.e. (node_id, node_id). All graphs are undirected. Hence, DS_A.txt contains two entries for each edge.
DS_graph_indicator.txt (n lines): column vector of graph identifiers for all nodes of all graphs; the value in the i-th line is the graph_id of the node with node_id i.
DS_graph_labels.txt (N lines): class labels for all graphs in the data set; the value in the i-th line is the class label of the graph with graph_id i.
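As an illustration of how the three mandatory files fit together, here is a minimal Python sketch that parses them and groups nodes and edges by graph. The function names (parse_edges, parse_int_column, group_by_graph, load_dataset) are hypothetical helpers, not part of any official tooling. The sketch assumes that node_ids start at 1 and that class labels are integers.

```python
from collections import defaultdict
from pathlib import Path


def parse_edges(lines):
    """Parse DS_A.txt lines of the form 'row, col' into (int, int) pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        row, col = line.split(",")
        edges.append((int(row), int(col)))
    return edges


def parse_int_column(lines):
    """Parse a one-value-per-line file into a list of ints."""
    return [int(line) for line in lines if line.strip()]


def group_by_graph(edges, graph_indicator):
    """Return {graph_id: {"nodes": [...], "edges": [...]}}.

    Assumption: node_ids are 1-based, so entry i - 1 of graph_indicator
    holds the graph_id of the node with node_id i.
    """
    graphs = defaultdict(lambda: {"nodes": [], "edges": []})
    for node_id, graph_id in enumerate(graph_indicator, start=1):
        graphs[graph_id]["nodes"].append(node_id)
    for u, v in edges:
        # The adjacency matrix is block diagonal, so both endpoints
        # belong to the same graph; looking up u is sufficient.
        graphs[graph_indicator[u - 1]]["edges"].append((u, v))
    return dict(graphs)


def load_dataset(directory, name):
    """Read the mandatory files of data set `name` stored in `directory`."""
    root = Path(directory)
    edges = parse_edges((root / f"{name}_A.txt").read_text().splitlines())
    indicator = parse_int_column(
        (root / f"{name}_graph_indicator.txt").read_text().splitlines()
    )
    labels = parse_int_column(
        (root / f"{name}_graph_labels.txt").read_text().splitlines()
    )
    return group_by_graph(edges, indicator), labels
```

Because every undirected edge is stored twice, each graph's edge list contains both (u, v) and (v, u). Keep only the pairs with u < v if a single entry per edge is wanted.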
There are optional files if the respective information is available:
DS_node_labels.txt (n lines): column vector of node labels; the value in the i-th line corresponds to the node with node_id i.
DS_edge_labels.txt (m lines; same size as DS_A.txt): labels for the edges in DS_A.txt.
DS_node_attributes.txt (n lines): matrix of node attributes; the comma-separated values in the i-th line are the attribute vector of the node with node_id i.
DS_edge_attributes.txt (m lines; same size as DS_A.txt): attributes for the edges in DS_A.txt.
DS_graph_attributes.txt (N lines): regression values for all graphs in the data set; the value in the i-th line is the attribute of the graph with graph_id i.
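Since these files may be absent, a loader has to handle the missing case explicitly. The following Python sketch shows one way to do so. The helpers parse_attribute_rows, read_optional and load_node_attributes are hypothetical names, and the sketch assumes that all attribute values are numeric.

```python
from pathlib import Path


def parse_attribute_rows(lines):
    """Parse comma-separated numeric rows into lists of floats."""
    return [
        [float(value) for value in line.split(",")]
        for line in lines
        if line.strip()  # skip blank lines
    ]


def read_optional(directory, name, suffix):
    """Return the lines of <name>_<suffix>.txt, or None if the file is absent."""
    path = Path(directory) / f"{name}_{suffix}.txt"
    if not path.is_file():
        return None
    return path.read_text().splitlines()


def load_node_attributes(directory, name):
    """Return one attribute vector per node, or None if not provided."""
    lines = read_optional(directory, name, "node_attributes")
    return None if lines is None else parse_attribute_rows(lines)
```

Row i - 1 of the returned list (0-based) then holds the attribute vector of the node with node_id i, assuming 1-based node_ids.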
The data sets can also be easily accessed from popular graph deep learning libraries, see here.