Graph Data Repository For Graph Edit Distance
RFAI - LIFAT - Tours / France
- To precisely evaluate the
accuracy of Graph Edit Distance (GED)
methods, Ground-Truth with information about the quality
of the matching between pairs of graphs (low level information) is
required in addition to higher level information (label of the object
represented by the graph, classification rates) usually provided and
used as GT to evluate the methods.
- Most of the publicly
available Graph repositories with associated ground-truths are
dedicated to evaluating graph classification or exact graph matching
methods and so the matching correspondences as well as the distance
between each pair of graphs are not directly evaluated.
- This dataset called GDR4GED
consists in a graph
database repository annotated with low level information like graph
edit distances values (optimal and sub-optimal) and also the corresponding node to node
matchings (Edit Paths).
- Best distances and vertex-vertex
matchings coming from 6 methods (3 exact and 3 approximate) computed on
pairs of graphs coming from different subsets of GREC, Mutagenicity,
Protein and CMU datasets are provided for the precise evaluation
of the scalability of the GED methods.
edit costs used for all these computation have been defined according
to . The java source codes of the used GED cost functions are
provided within each repository.
- More information about the repository is available in  - see reference below
Figure 1: GT provided for each pair of graphs
Figure 2: Some statitistics about GDR4GED
- A set of performance evaluation metrics
that can be computed using this dataset to assess the performance of
GED methods is also proposed in association to this repository.
- These metrics are computed under time and memory constraints.
- For a specific matching between 2 graphs: Deviation & Matching Dissimilarity compared to the reference (GT)
a set of graphs: Number of best found solutions,Number of optimal
solutions, Number of unfeasible solutions, Number of Time-Out and
Out-Of-Memory cases,Mean number of explored nodes, Mean running time, Running time-Deviation plot
- More information about these metrics is available in  - see reference below.
Figure 3 : Illustration of the proposed metrics (deviation and matching dissimilarity)i
The GDR4GED (Version 2015_04) benchmark is
organized as follows:
- GREC -
Contains 5 subsets from the GREC database , each of which has 10
- Mutagenicity -
Contains 8 subsets from the Mutagenicity (MUTA) database , each of
which has 10 graphs.
- Protein -
Contains 4 subsets from the Protein database , each of which has 10
- CMU -
Contains 111 graphs from the CMU house database .
- Exemples of evaluation Results
We hope that this repository will evolve in the future, So, please note the version you are using for your test.
Downloading and condition of uses
- This database is publicly available and free for research staff.
publications and works that use this database must reference the
paper  - see references below
- Also, please inform us about your work and, if possible, send a copy of your publication to the contacts mentioned below. We will eventually put a list of references on this web page.
- You can also credit the "GDR4GED project" as the source of the data by referencing this web page in your publication.
- Limitation of Liability
IMAGES CONTAINED IN THIS DATABASE IS PROVIDED 'AS IS' WITHOUT WARRANTY
OF ANY KIND. THE ENTIRE RISK IS ASSUMED BY THE USER, AND IN NO EVENT
WILL RFAI BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL OR DIRECT DAMAGES
SUFFERED IN THE COURSE OF USING THE DATABASE.
to use but not reproduce or distribute the database is granted to all
researchers given that the following steps are properly followed:
- Permission is NOT granted to reproduce the database or posted into any other webpage.
- None economical profit can be obtained from this database.
Zeina Abu-Aisheh, Romain Raveaux, Jean Yves Ramel, A Graph Database
Repository and Performance Evaluation Metrics for Graph Edit Distance.
IAPR Internatioanl workshop on Graph Based Representation. Beijing
'China). GbR2015. PDF - Slides of the talk
Kaspar Riesen, Horst Bunke, Iam graph database repository for graph
based pattern recognition and machine learning (2008) 287-297.
-  CMU house and
hotel datasets. http://vasc.ri.cmu.edu/idb/html/motion.
-  K. Riesen, PhD manuscript, Graph Classification and Clustering Based on Vector Space Embedding, Univiersity of Bern. 2010
questions about the database, the format, or problems obtaining
the data should be directed to the database editors:
- Zeina Abu-Aisheh, "email@example.com"
- Romain Raveau, "firstname.lastname@example.org"
Lab. LIFAT -
64, Av. Jean Portalis
37200 TOURS - FRANCE
Tel: +33 18.104.22.168.26