State-of-the-art optimization algorithms typically come with a set of parameters that the user needs to set in order to configure the algorithm for the problem at hand. This parametrization offers great flexibility in algorithmic behavior, making the underlying approach applicable to a broad range of problems or problem instances. As a downside, the user is left with the task of choosing these parameters, and it is not unusual that good algorithmic performance requires a solid understanding of how different parameter choices affect the behavior of the algorithm. Obtaining such expert knowledge is a tedious task that requires significant time and experience, a prerequisite that is often not met in practice.
Automated algorithm configuration tools have been developed to aid users in choosing suitable parameter settings for their applications, thus trading expert time and knowledge for computational resources and a less person-biased, more scientific approach. Numerous applications in various industrial and academic contexts have demonstrated significant advantages of such methods. Among the most widely used configuration tools are \emph{irace}~\cite{irace}, \emph{ParamILS}~\cite{ParamILS}, \emph{SMAC}~\cite{SMAC}, and \emph{GGA}~\cite{GGA}. Our project will focus on the first-mentioned tool, the \emph{irace} R package, for which Manuel is a co-developer. The main working principle of \emph{irace} is an iterative racing approach: the search for good algorithm configurations is divided into several steps. In each step, a set of candidate parameter configurations is tested on a small set of instances. Statistical tests identify poorly performing configurations, which are removed from the race. At the end of each step, the surviving configurations are used to sample new configurations, which are added to the race of the next step. The race ends when a user-defined budget is exhausted, and the surviving configurations are suggested to the user. Statistics about the relevance of each individual parameter and the dependencies between parameters are recorded and can be analyzed in a post-processing step. irace has been successfully applied in a broad range of contexts, from machine learning and robotics to classical optimization tasks such as the traveling salesperson and scheduling problems.
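To make this working principle concrete, the following Python sketch captures the bare racing skeleton under strong simplifications: configurations are points in $[0,1]^d$, \texttt{evaluate(config, instance)} is a hypothetical user-supplied cost function, and a simple mean-based elimination stands in for the statistical tests that \emph{irace} actually uses. It illustrates the idea only and is not the \emph{irace} implementation.
\begin{verbatim}
import random
from statistics import mean

def iterated_race(evaluate, instances, dim, budget,
                  pop=20, keep=0.5, sigma=0.1):
    # Initial configurations drawn uniformly at random from [0, 1]^dim.
    configs = [tuple(random.random() for _ in range(dim))
               for _ in range(pop)]
    survivors, used = configs, 0
    while used + pop * len(instances) <= budget:
        # Score each configuration on the small instance set.
        scores = {c: mean(evaluate(c, i) for i in instances)
                  for c in configs}
        used += pop * len(instances)
        # Eliminate the worst performers
        # (a crude stand-in for irace's statistical tests).
        configs.sort(key=scores.get)
        survivors = configs[: max(1, int(pop * keep))]
        # Sample new candidates around the survivors for the next step.
        offspring = []
        for _ in range(pop - len(survivors)):
            parent = random.choice(survivors)
            offspring.append(tuple(
                min(1.0, max(0.0, x + random.gauss(0.0, sigma)))
                for x in parent))
        configs = survivors + offspring
    return survivors
\end{verbatim}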
In this project we will address the following questions:
- In the current version of irace, the initial parameter configurations are sampled uniformly at random. Can we obtain significant performance gains and/or other advantages, such as more robust output, by replacing this initialization with a quasi-random procedure? (A sketch of such an initialization follows this list.)
- The generation of new configuration samples is based on simple unbiased sampling, thus ignoring information that irace has accumulated in previous steps. It seems likely that additional performance gains are possible through better exploitation of the previous evaluations. How do other optimization procedures, such as the surrogate-assisted sampling sketched after this list, compare to the unbiased generation of new candidate configurations? Is this comparison uniform across different budgets and across different dimensions?
- irace has been developed with a machine learning setting of training and test instances in mind. There is, however, an increasing demand for tuning (hyper-)parameters on single instances, e.g., to understand how suitable parameter choices depend on the problem dimension; see~\cite{DoerrW18} for an example. Such settings are particularly common in setups where theoretical performance bounds are sought~\cite{DoerrD18chapter}. How well does irace perform in such single-instance settings, and how can we adjust it to better cope with this task?
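As an illustration of the first question, the snippet below contrasts i.i.d. uniform initialization with a scrambled Sobol design, using SciPy's quasi-Monte Carlo module; the discrepancy value quantifies how evenly each point set covers the unit cube (lower is more even). The dimension and sample size are arbitrary placeholders.
\begin{verbatim}
import numpy as np
from scipy.stats import qmc

dim, n = 5, 16  # n a power of two, as recommended for Sobol points
rng = np.random.default_rng(42)

uniform_init = rng.random((n, dim))        # current: i.i.d. uniform
sobol = qmc.Sobol(d=dim, scramble=True, seed=42)
sobol_init = sobol.random(n)               # proposed: quasi-random

# Lower discrepancy = more even coverage of [0, 1]^dim.
print("uniform:", qmc.discrepancy(uniform_init))
print("Sobol:  ", qmc.discrepancy(sobol_init))
\end{verbatim}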
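For the second question, one natural baseline that exploits the evaluation history is surrogate-assisted sampling: fit a cheap regression model to the (configuration, cost) pairs observed so far and propose the candidates with the lowest predicted cost. The sketch below uses a random forest surrogate; this is one possible design for comparison, not a description of how irace currently samples.
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def propose_candidates(hist_X, hist_y, n_proposals=256,
                       n_keep=8, seed=0):
    # Surrogate fitted on all (configuration, cost) pairs so far.
    surrogate = RandomForestRegressor(n_estimators=50,
                                      random_state=seed)
    surrogate.fit(hist_X, hist_y)
    # Draw uniform proposals, keep those with lowest predicted cost.
    rng = np.random.default_rng(seed)
    candidates = rng.random((n_proposals, hist_X.shape[1]))
    predicted = surrogate.predict(candidates)
    return candidates[np.argsort(predicted)[:n_keep]]
\end{verbatim}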