The general aim of the internship is to exploit expert knowledge regarding properties of optimisation
algorithms and problems, represented in the formal frameworks of ontologies and conceptual
graphs, and to develop tools to automatically extract underlying correlations: the objective
is to allow understanding the reasons why an algorithm is more appropriate than others to solve a
problem depending on its characterisation and possibly to offer new tools to configure optimisation
algorithms.
In this context, the internship work will address the task of frequent pattern mining for data
represented as conceptual graphs, and apply such algorithms to real data representing knowledge
about optimisation algorithms and problems.
Regarding the considered application, the work aims at processing OPTION, the OPTImisation
algorithm benchmarking ONtology1 (Kostovska et al. 2021; 2022): OPTION groups and annotates
benchmarking data from black-box optimisation, hosting more than 200 algorithm performance data
from the BBOB collection of the COCO framework (Hansen et al., 2020) and from various benchmark
suits of the Nevergrad environment (Rapin & Teytaud, 2018). OPTION provides the vocabulary
needed for semantic annotation of the core entities involved in the process of benchmarking black-box
optimisation algorithms, such as algorithms, problems and evaluation measures. These very rich data
can be represented in the formalism of conceptual graphs (see e.g. Chein & Mugnier, 2008) which
offers the advantage of combining a logical semantic and efficient manipulation tools based on graph
theory.
Extracting knowledge from conceptual graphs can take the form of identifying frequent patterns,
i.e. subgraphs that occur frequently and can thus be interpreted as relevant regularities. This relates
to the domain of Frequent Subgraph Mining, for which numerous approaches have been proposed (see
e.g. Jiang et al., 2013). Dedicated algorithms have been proposed for the specific case of taxonomy
labelled graphs (Inokuchi et al., 2000; Cakmak & Ozsoyoglu, 2008; Petermann et al., 2017), that
exploit the particular characteristics of these types of graphs to improve both their efficiency and the
relevance of the extracted patterns. Recent works conducted in the LFI team focused further on the
case of conceptual graphs: the cgSpan algorithm (Faci et al., 2021) exploits the information about
relation arity to decrease efficiently the number of explored pattern candidates, the relation signatures
to prune redundant pattern candidates with little informativeness as well as inference rules to extend
candidates faster. The goal of the internship is to build upon that work and to extend it further.
The internship work aims at designing and implementing a comparison tool of algorithms of frequent
pattern mining and proposing new approaches for the case of conceptual graphs. After validating the
propositions on baseline and synthetic data, the second part of the internship will turn to the real case
of understanding black-box optimisation applying the approach to the OPTION data and developing
an XAI tool for this case.