An Efficient Mean Field Approach to the Set Covering Problem

A mean field feedback artificial neural network algorithm is developed and explored for the set covering problem. A convenient encoding of the inequality constraints is achieved by means of a multilinear penalty function. An approximate energy minimum is obtained by iterating a set of mean field equations, in combination with annealing. The approach is numerically tested against a set of publicly available test problems with sizes ranging up to 5x10^3 rows and 10^6 columns. When comparing the performance with exact results for sizes where these are available, the approach yields results within a few percent from the optimal solutions. Comparisons with other approximate methods also come out well, in particular given the very low CPU consumption required -- typically a few seconds. Arbitrary problems can be processed using the algorithm via a public domain server.


Introduction
The set covering problem (SCP) is a well known NP-hard combinatorial optimization problem, which represents many real-world resource allocation problems. Exact solutions can be obtained by e.g. a branch-and-bound approach for modestly sized problems. For larger problems various approximative schemes have been suggested (see e.g. [1,2]). In this paper we develop a novel approach based on feedback Artificial Neural Networks (ANN), derived from the mean field approximation to the thermodynamics of spin systems.
ANN is a computer paradigm that has gained a lot of attention during the last 5-10 years. Most of the activities have been directed towards feed-forward architectures for pattern recognition or function approximation. ANN, in particular feedback networks, can also be used for difficult combinatorial optimization problems (e.g. [5]- [10]). Here ANN introduces a new method that, in contrast to most existing search and heuristics techniques, is not based on exploratory search to find the optimal configuration. Rather, the neural units find their way in a fuzzy manner through an interpolating, continuous space towards good solutions. There is a close connection between feedback ANN and spin systems in statistical physics. Consequently, many mathematical tools used for dealing with spin systems can be applied to feedback ANN. Two steps are involved when using ANN for combinatorial optimization: 1. Map the problem onto an energy function, e.g.
where S = {s i ; i = 1 . . . N } is a set of binary spin variables s i ∈ {0, 1}, representing the elementary choices involved in minimizing E, while the weights w ij encode the costs and constraints.
2. To find configurations with low E, iterate the mean field (MF) equations where T is a fictitious temperature while V = {v i }, where v i ∈ [0, 1] represents the thermal average s i T , and allows for a probabilistic interpretation.
Eqs. (1,2) only represent one example. More elaborate encodings have been considered, e.g. based on Potts spins allowing for more general basic decisions elements than simple binary ones [16]. A propagator formalism based on Potts neurons has been developed for handling topological complications in e.g. routing problems [9][6].
The ANN approach for SCP that we develop here differs from the one that was successfully used for the somewhat related knapsack problem in [13,12], in particular with respect to encoding the constraints. Whereas a non-linear step-function was used in [13,12], we will here use a multilinear penalty, which in addition to being theoretically more appealing, also appears to be very efficient. Furthermore, an automatic procedure for setting the relevant T -scale is devised.
The algorithm is extensively tested against a set of publicly available benchmark problems [14] with sizes (rows×columns) ranging from 200×1000 to 5000×10 6 . The approach yields results, typically within a few percent from the exact optimal solutions, for sizes where these are available. Comparisons with other approximate methods also come out well. The algorithm is extremely rapid -the typical CPU demand is only a few seconds (on a 400MHz Pentium II).
A public domain WWW server has been set up, where arbitrary problems can be solved interactively.
This paper is organized as follows: In Sect. 2 we define the set covering problem, and in Sect. 3 we describe its encoding in terms of a neural network energy function and discuss the mean field treatment. Sect. 4 contains numerical explorations and comparisons. A brief summary is given in Sect. 5. Appendices A and B contain a derivation of the mean field equations, and some algorithmic implementation issues, respectively. Tables from the numerical explorations are found in Appendix C, while Appendix D contains pointers and instructions for the WWW server. It should be stressed that this paper is self-contained -no prior knowledge of feedback neural networks or the mean field approximation is necessary.

The Set Covering Problem
The set covering problem (SCP) is the problem of finding a subset of the columns of an M xN zero-one matrix A = {a ki ∈ {0, 1}; k = 1, . . . , M ; i = 1, . . . , N } that covers all rows at a minimum cost, based on a set of column costs {c i ; i = 1, . . . , N }. SCP is conveniently described using a set of binary variables, S = {s i ∈ {0, 1}; i = 1, . . . , N }. More precisely, SCP is defined as follows: with Eq. (4) states that at least one column must cover a particular row. The special case where all costs c i are equal is called the unicost SCP. There is subclass of SCPs that has a nice graphical interpretation: If the zero-one matrix A has the property that each row contains exactly two 1's then we can interpret A as the vertex-edge matrix of a graph with N vertices and M edges. The unicost SCP is then a vertex covering problem, where the task is to find the minimal number of vertices that covers all edges of the graph. SCPs (including weighted vertex covering) are NP-hard combinatorial optimization problems. If the inequalities of Eq. (4) are replaced by equalities, one has the set partition problem (SPP). Both SCP and SPP have numerous resource allocation applications.
The following quantities will be used later: Column Sums: Row Sums: 3 The Mean Field Approach

The Energy Function for SCP
We start by mapping the SC problem of Eqs. (3,4,5) onto a spin energy function E(S) (step 1 in the introduction), The first term yields the total cost and the second one represents the covering constraint of Eq. (4) by imposing a penalty if a row is not covered by any column.
The constraint term is a multilinear polynomial in the spin variables s i , i.e. it is a linear combination of products s 1 s 2 . . . s K of distinct spins. This is attractive from a theoretical point of view, and the differentiability of E enables a more quantitative analysis of the dynamics of the mean field algorithm.
An alternative would be to implement the inequality constraints using a piecewise linear function [13], with φ(x) = xΘ(x) = x if x > 0 and 0 otherwise. This yields a non-differentiable energy function and generally an inferior performance as compared to the polynomial representation (9).

The Statistical Mechanics Framework
The next step is to minimize E(S). Using some local updating rule will most often yield a local minimum close to the starting point, with poor solutions as a result. Simulated annealing (SA) [8] is one way of escaping from local minima since it allows for uphill moves in E. In SA a sequence of configurations S is generated according to a stochastic algorithm, such as to emulate the probability distribution where the sum runs over all possible configurations S ′ . The parameter T (temperature) acts as a noise parameter. For large T the system will fluctuate heavily since P (S) is very flat. For the SCP this implies that the sequence contains mostly poor and infeasible solutions. On the other hand, for a small T , P (S) will be narrow, and the sequence will be strongly dependent upon the initial configuration and contain configurations only from a small neighborhood around the initial point. In SA one generates configurations while lowering T (annealing), thereby diminishing the risk of ending up in a suboptimal local minimum. This is quite CPU-consuming, since one has to generate many configurations for each temperature following a careful annealing schedule (typically T k = T 0 / log(1 + k) for some T 0 ) in order to be certain to find the global minimum.
In the mean field (MF) approach the costly stochastic SA is approximated by a deterministic process. MF also contains an annealing procedure. The original binary variables s i are replaced by continuous mean field variables v i ∈ [0, 1], with a dynamics given by iteratively solving of the MF equations for each T .
An additional advantage of the MF approach is that the continuous mean field variables can evolve in a space not accessible to the original variables. The intermediate configurations at non-zero T have a natural probabilistic interpretation.

Mean Field Theory Equations
Our objective is now to minimize E using the MF method. The binary spin variables s i are replaced by mean field variables v i , representing solutions to the MF equations (a derivation of these is given in where The set V (i) denotes the complementary set {v j , j = i}. From Eq. (9) we get The MF equations are solved iteratively while annealing in T . It should be noted that the equation for v i does not contain any feedback, i.e no explicit dependence on v i itself. This yields a smooth convergence, and it is usually only necessary with a few iterations at each value of T . What remains to be specified are the parameters α and T . The latter will be discussed next and we return to the choice of α in connection with the numerical explorations in Sect. 4.

Critical Temperatures
In the limit of high temperatures, T → ∞, the MF variables {v i } will, under the dynamics defined by iteration of Eq. (12), converge to a trivial symmetric fixed point with v i = 1/(1 + exp(0)) = 1 2 , corresponding to no decision taken. At a finite but high T , the corresponding fixed point will typically deviate slightly from the symmetric point.
For many problems, a bifurcation occurs (indicative of a transition from a disordered phase to an ordered one) at a critical temperature T c , where the trivial fixed point loses stability and other fixed points emerge, which as T → 0 converge towards definitive candidate solutions to the problem, in terms of v i ∈ {0, 1}. For some problems, a cascade of bifurcations occur, each at a distinct critical temperature, but in the typical case there is a single bifurcation.
It is then of interest to estimate of the position of T c , which defines a suitable starting point for the MF algorithm. Such an estimate can be obtained by means of a linear stability analysis for the dynamics close to the fixed point. For the special case of a symmetric unicost SCP with constant row and column sums for the matrix A, there will be a sharp transition around where ρ is defined in (6).
For non-unicost problems, T c is harder to estimate, and there might even be no bifurcation at all. For such problems, a suitable initial T is instead determined by means of a fast preliminary run of the algorithm (see below).

Implementation Details
The annealing schedule for T and the value of the constraint parameter α have to be determined before we can run the algorithm. The former is accomplished by a geometric decrease of T , where k is set to 0.80 and T 0 is determined by a fast prerun of the algorithm (see below). The number of iterations of Eqs. (12) for each value of T is not fixed; T is lowered only when all v i have converged.
In order to ensure a valid solution at low T , the size of the constraint term in Eq. (14) must be larger than the largest cost c max that is part of the solution. Using a too large α will however reduce the solution quality since E is then dominated by the covering constraint. Our choice of α therefore depends on c max . Ideally, M/(ρM ) = 1/ρ columns would suffice to cover each row of A. If we further optimistically assume that the 1/ρ smallest costs can be chosen for the solution, c max can easily be found. However, for most problems we need more than 1/ρ columns, which makes it difficult to estimate c max except for unicost problems where all column costs are equal.
To determine (an approximate) c max for a non-unicost SCP, we perform a fast prerun with a smaller annealing factor k = 0.65, and with α = 1.01. From this prerun one can also obtain an estimate the critical temperature T c as the T where the saturation (defined below) deviates from 0. The second run is then initated at T 0 = 2 T c , thereby avoiding unnecessary updates of v i at high T .
This procedure for setting α also requires rescaling of the costs; for all problems we set c i → c i / max j (c j ). See Table 1 for a summary of the parameters used.
The evolution of the MF variables {v i } is conveniently monitored by the saturation Σ,  Table 1: Summary of the parameters k, α and T 0 used in the algorithm.
A completely "undecided" configuration, Σ = 0, means that every v i has the value 1/2. During the annealing process, as the v i approach either 1 or 0, Σ converges to 1. The transition between Σ = 0 and Σ = 1 is usually smooth for a generic SCP. However, when a bifurcation is encountered (see above), Σ can change abruptly; this occurs e.g. for unicost SCP where there is no natural ordering among the costs {c i }.  Figure 1: Evolution of the mean field variables v i as T is lowered for 4.1 (a) and cyc09 (b) respectively. Also shown is Σ (Eq. (17)). Note that the number of iterations is intentionally large for visualization purposes.
A summary of the algorithm can be found in Appendix B, while Appendix D gives the address of and instructions for a WWW server, where the MF algorithm is applied to user-defined SCPs.

Numerical Results
The performance of the algorithm is evaluated using 16 problem sets found in the OR-Library benchmark database [14]. These 16 sets consist of 91 problems, out of which 19 are unicost SCP. The algorithm is coded in C and the computations are done on a 400MHz Pentium II PC. The details of the OR-library problems are given in Table C1 and our results can be found in Tables C2, C3 and C4 in Appendix C. The optimal or currently best known values are taken from refs. [1,2,4,7,3,11].
For each of the test SCPs, 10 trials of our algorithm are performed. In Tables C2-C4, the best and the average costs are listed for each problem. For 10 of the problems our method found the optimal solution. The MF results typically are within a few percent of the optimal solutions, as can be seen in Table 2. Large relative deviations from optimum are seen in the unicost CLR problems, which appear to be difficult for our approach. However, the optimal (integer) costs for these problems are low (23 -26); this gives a large effect on the relative deviations even for a small change in the found costs. Decreasing the obtained costs by unity will change the mean relative deviation from 16% to 10%. It is also important to notice that we use a common set of algorithm parameters (α, k, T 0 ) for all unicost problems, without any parameter optimization for each problem. Another set of parameters might be more advantageous for the CLR problems.  Table 2: Mean relative deviation from the optimal (or best known) solution.
In Compared to other heuristic approaches, ours does not find the optimal cost as often as e.g the genetic algorithm [2]. It is however very competitive with respect to speed. In ref. [4] nine different approximation algorithms were tested on a large number of unicost problems (including the set considered here); our approach is comparable to the top ones in both performance and speed.

Summary
We have developed a mean field feedback neural network approach for solving set covering problems. The method is applied to a standard set of benchmark problems available in the OR-library database. The method is also implemented in a public WWW server.
The inequality constraints involved are conveniently handled by means of a multi-linear penalty function that fits nicely into the mean field framework. The bifurcation structure of the mean field dynamics involved in the neural network approach is analyzed by means of a linearized dynamics. A simple and self-contained derivation of the mean field equations is provided.
High quality solutions are consistently found throughout a range of problem sizes ranging up to 5 × 10 3 rows and 10 6 columns for the OR-library problems without having to fine-tune the parameters, with a time consumption scaling as the number of non-zero matrix elements. The approach is extremely efficient, typically requiring a few seconds on a Pentium II 400 MHz computer.
The mean field approach to SCP can easily be modified to apply to the related, and more constrained, set partitioning problem. One simply has to replace the inequality constraint term with one that handles the equality constraint present in the set partitioning problem.

Appendix A. Mean Field Approximation
Here follows for completeness a derivation of the mean field equations (see e.g. [15]). Let E (S) be an energy function of a set of binary decision variables (spins) S = {s i |s i ∈ {0, 1}, i = 1, . . . , N }. If we assume a Boltzmann probability distribution for the spins, the average s i T will be given by where the sums run over all possible configurations S. We can manipulate this expression to obtain, where S (i) denotes the set {s j , j = i} of all spins but s i . If we now perform the sums over s i in the numerator, we get So far there are no approximations, the expression for s i T has just been rewritten. Eq. (A3) states that the expectation value of s i is equal to the expectation value of a nonlinear function f of all the other spins. The mean field approximation consists of approximating the expectation value f S (i) by f S (i) . With v i denoting s i , and V (i) the complementary set {v j , j = i}, this amounts to making the replacement in Eq. (A3). This results in a set of self-consistency equations for V, the mean field equations which in general must be solved numerically, e.g. by iteration.
The mean field approximation often becomes exact in the limit of infinite range interactions where each spin variable interacts with all the others. This can be seen from ∆E i (S (i) ) which then becomes a sum of many (approximately) independent random numbers, and a central limit theory can be applied.

Appendix B. Algorithm Details
Here follows a summary of the MF annealing algorithm for finding approximate solutions to set covering problems. The procedure presented below is used for all problems in this study except for the large rail problems where numerical problems caused by limited machine precision comes into play. The problem arises when calculating the product in Eq. (14), which for large problems can contain many factors. A work-around is implemented by a simple truncation, This numerical fix is only used when calculating the ∆E i V (i) in Eq. (14).
Algorithmic outline of our approach: Appendix C.

Benchmark Results
Problem   Table C2: Results for problems 4 -D (see Table C1). Solution time refers 400 MHz Pentium II CPU seconds and includes the prerun for non-unicost problems.  Table C4: Results for the unicost problems E -STS (see Table C1). Solution time refers 400 MHz Pentium II CPU seconds. The notation current best value may for some instances mean optimal value.