GDR4GED

GDR4GED

Graph Data Repository For Graph Edit Distance

RFAI - LIFAT - Tours / France

To precisely evaluate the accuracy of Graph Edit Distance (GED) methods, Ground-Truth with information about the quality of the matching between pairs of graphs (low level information) is required in addition to higher level information (label of the object represented by the graph, classification rates) usually provided and used as GT to evluate the methods.
Most of the publicly available Graph repositories with associated ground-truths are dedicated to evaluating graph classification or exact graph matching methods and so the matching correspondences as well as the distance between each pair of graphs are not directly evaluated.
This dataset called GDR4GED consists in a graph database repository annotated with low level information like graph edit distances values (optimal and sub-optimal) and also the corresponding node to node matchings (Edit Paths).
Best distances and vertex-vertex matchings coming from 6 methods (3 exact and 3 approximate) computed on pairs of graphs coming from different subsets of GREC, Mutagenicity, Protein and CMU datasets are provided for the precise evaluation of the scalability of the GED methods.
The edit costs used for all these computation have been defined according to [4]. The java source codes of the used GED cost functions are provided within each repository.
More information about the repository is available in [1] - see reference below

Figure 1: GT provided for each pair of graphs

Figure 2: Some statitistics about GDR4GED

A set of performance evaluation metrics that can be computed using this dataset to assess the performance of GED methods is also proposed in association to this repository.
These metrics are computed under time and memory constraints.
- For a specific matching between 2 graphs: Deviation & Matching Dissimilarity compared to the reference (GT)
- For a set of graphs: Number of best found solutions,Number of optimal solutions, Number of unfeasible solutions, Number of Time-Out and Out-Of-Memory cases,Mean number of explored nodes, Mean running time, Running time-Deviation plot
More information about these metrics is available in [1] - see reference below.

Figure 3 : Illustration of the proposed metrics (deviation and matching dissimilarity)i

GREC - Contains 5 subsets from the GREC database [2], each of which has 10 graphs.
Mutagenicity - Contains 8 subsets from the Mutagenicity (MUTA) database [2], each of which has 10 graphs.
Protein - Contains 4 subsets from the Protein database [2], each of which has 10 graphs.
CMU - Contains 111 graphs from the CMU house database [3].
Exemples of evaluation Results

We hope that this repository will evolve in the future, So, please note the version you are using for your test.

Downloading and condition of uses

This database is publicly available and free for research staff.
All publications and works that use this database must reference the paper [1] - see references below
Also, please inform us about your work and, if possible, send a copy of your publication to the contacts mentioned below. We will eventually put a list of references on this web page.
You can also credit the "GDR4GED project" as the source of the data by referencing this web page in your publication.
Limitation of Liability

THE IMAGES CONTAINED IN THIS DATABASE IS PROVIDED 'AS IS' WITHOUT WARRANTY OF ANY KIND. THE ENTIRE RISK IS ASSUMED BY THE USER, AND IN NO EVENT WILL RFAI BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL OR DIRECT DAMAGES SUFFERED IN THE COURSE OF USING THE DATABASE.
Permission to use but not reproduce or distribute the database is granted to all researchers given that the following steps are properly followed:
Permission is NOT granted to reproduce the database or posted into any other webpage.
None economical profit can be obtained from this database.

[1] Zeina Abu-Aisheh, Romain Raveaux, Jean Yves Ramel, A Graph Database Repository and Performance Evaluation Metrics for Graph Edit Distance. IAPR Internatioanl workshop on Graph Based Representation. Beijing 'China). GbR2015. PDF - Slides of the talk
[2] Kaspar Riesen, Horst Bunke, Iam graph database repository for graph based pattern recognition and machine learning (2008) 287-297.
[3] CMU house and hotel datasets. http://vasc.ri.cmu.edu/idb/html/motion.
[4] K. Riesen, PhD manuscript, Graph Classification and Clustering Based on Vector Space Embedding, Univiersity of Bern. 2010

Technical questions about the database, the format, or problems obtaining the data should be directed to the database editors:

Lab. LIFAT - PolytechTours
64, Av. Jean Portalis
37200 TOURS - FRANCE
Tel: +33 2.47.36.14.26