File Format
The data sets have the following format (replace DS by the name of the data set):
Let
  n = total number of nodes
  m = total number of edges
  N = number of graphs
DS_A.txt (m lines): sparse (block diagonal) adjacency matrix for all graphs, each line corresponds to (row, col) resp. (node_id, node_id). All graphs are undirected. Hence, DS_A.txt contains two entries for each edge.
DS_graph_indicator.txt (n lines): column vector of graph identifiers for all nodes of all graphs, the value in the i-th line is the graph_id of the node with node_id i.
DS_graph_labels.txt (N lines): class labels for all graphs in the data set, the value in the i-th line is the class label of the graph with graph_id i.
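As an illustration of how the three required files fit together, here is a minimal Python sketch that reads them and groups the edges by graph. The function name load_tu_dataset and its return layout are hypothetical. The sketch assumes that node and graph identifiers start at 1 (so that line i describes identifier i), that the two entries on each line of DS_A.txt are separated by a comma, and that class labels are integers.

```python
from pathlib import Path


def load_tu_dataset(folder, ds):
    """Read the three required files of the data set `ds` stored in `folder`."""
    folder = Path(folder)

    # DS_A.txt: one "row, col" pair of node ids per line (assumed comma-separated).
    edges = []
    for line in (folder / f"{ds}_A.txt").read_text().splitlines():
        if line.strip():
            row, col = line.split(",")
            edges.append((int(row), int(col)))

    # DS_graph_indicator.txt: line i holds the graph_id of the node with node_id i.
    indicator_text = (folder / f"{ds}_graph_indicator.txt").read_text()
    graph_of_node = [int(value) for value in indicator_text.split()]

    # DS_graph_labels.txt: line i holds the class label of the graph with graph_id i.
    labels_text = (folder / f"{ds}_graph_labels.txt").read_text()
    graph_labels = [int(value) for value in labels_text.split()]

    # Group every edge entry under the graph that owns its first endpoint.
    # Node ids are assumed to be 1-based, hence the "- 1" when indexing.
    edges_by_graph = {graph_id: [] for graph_id in sorted(set(graph_of_node))}
    for u, v in edges:
        edges_by_graph[graph_of_node[u - 1]].append((u, v))

    return edges, graph_of_node, graph_labels, edges_by_graph
```

Because the adjacency matrix is block diagonal, both endpoints of an edge belong to the same graph, so looking up the first endpoint is enough to assign the edge to its graph.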
There are optional files if the respective information is available:
DS_node_labels.txt (n lines): column vector of node labels, the value in the i-th line corresponds to the node with node_id i.
DS_edge_labels.txt (m lines; same size as DS_A.txt): labels for the edges in DS_A.txt.
DS_node_attributes.txt (n lines): matrix of node attributes, the comma-separated values in the i-th line are the attribute vector of the node with node_id i.
DS_edge_attributes.txt (m lines; same size as DS_A.txt): attributes for the edges in DS_A.txt.
DS_graph_attributes.txt (N lines): regression values for all graphs in the data set, the value in the i-th line is the attribute of the graph with graph_id i.
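Since these files may be missing, a reader has to check for them before parsing. The following Python sketch shows one way to do that for the two node-level files. The function name load_optional_node_data is hypothetical, and the sketch assumes that each line of DS_node_labels.txt holds a single integer and that attribute values are numeric.

```python
from pathlib import Path


def load_optional_node_data(folder, ds):
    """Return (node_labels, node_attributes); an entry is None if its file is absent."""
    folder = Path(folder)

    node_labels = None
    labels_file = folder / f"{ds}_node_labels.txt"
    if labels_file.exists():
        # Line i holds the label of the node with node_id i (assumed to be one integer).
        node_labels = [
            int(line) for line in labels_file.read_text().splitlines() if line.strip()
        ]

    node_attributes = None
    attributes_file = folder / f"{ds}_node_attributes.txt"
    if attributes_file.exists():
        # Line i holds the comma-separated attribute vector of the node with node_id i.
        node_attributes = [
            [float(value) for value in line.split(",")]
            for line in attributes_file.read_text().splitlines()
            if line.strip()
        ]

    return node_labels, node_attributes
```

The edge-level and graph-level files could be handled the same way, with one entry per line of DS_A.txt or per graph respectively.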
The datasets can also be easily accessed from popular graph deep learning libraries, see here.