Simulated Annealing: A Proof of Convergence

There are numerous papers discussing applications of SA to various problems. We have already mentioned that SA is extensively used in image processing. In order to give an indication of its performance, we will review some of the work concerning the application of SA to combinatorial optimization problems.

In a comprehensive experimental study of SA, Johnson et al. tested the method on several combinatorial optimization problems, including the graph partitioning problem (GPP), graph coloring, number partitioning, and the traveling salesman problem. In general, the performance of SA was mixed: on some problems it outperformed the best known heuristics, while on others specialized heuristics performed better. More specifically, for the GPP, SA obtains final solutions that are at best some 5 percent better than those obtained by the best of the more traditional algorithms.

For sparse graphs, SA was better than repeated applications of the Kernighan-Lin heuristic, which is based on ideas of local optimization, whereas for some structured graphs the Kernighan-Lin heuristic was better. For the graph coloring problem, SA produces final solutions that are competitive with those obtained by a tailored heuristic (the one by Johri and Matula), which is considered the best available for this problem.

However, computation times for SA are considerably longer than those of the specialized heuristic. For the traveling salesman problem, SA consistently outperforms solutions found by repeated application of iterative improvement based on 2-opt or 3-opt transitions, but it is a consistent loser when compared with the well-known algorithm of Lin and Kernighan. The latter is based on k-opt transitions, and at each iteration it decides the value of k dynamically.
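To make the notion of a 2-opt transition concrete, the sketch below (our own illustration, not code from the studies cited) produces a neighboring tour by reversing a randomly chosen segment, which replaces exactly two edges of the tour:

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour, given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_neighbor(tour):
    """Return a random 2-opt neighbor: reverse the segment between
    two randomly chosen cut points, replacing two tour edges."""
    n = len(tour)
    i, j = sorted(random.sample(range(n), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```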

Another interesting point is that the choice of the cooling schedule influences the quality of the solution obtained; see van Laarhoven and Aarts for a discussion of cooling schedules. Another observation is that the computation times can be excessive for some problems. In addition to the above-mentioned developments in image processing, SA and various alternative versions loosely based on it have been used in statistical applications.

Bohachevsky et al. propose a generalized simulated annealing method for function optimization. Many researchers have considered SA as a tool in the development of optimal experimental designs; recent examples include Currin et al. and Meyer and Nachtsheim. Variants of SA based on Bayesian ideas have been proposed by Laud et al. and by van Laarhoven, Boender, Aarts, and Rinnooy Kan.

Overall, SA is a generally applicable and easy-to-implement probabilistic approximation algorithm that is able to produce good solutions for an optimization problem, even if we do not understand the structure of the problem well. We believe, however, that more research, both theoretical and experimental, is needed to assess further the potential of the method.
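As an illustration of how little problem structure the method needs, here is a minimal sketch of generic simulated annealing (our own code; the energy function, neighborhood, and logarithmic cooling schedule are placeholders to be supplied per problem):

```python
import math
import random

def simulated_annealing(x0, energy, neighbor, c=1.0, steps=100_000):
    """Minimal generic simulated annealing sketch.

    energy:   cost function J to minimize
    neighbor: returns a random neighbor of the current state
    c:        cooling constant in the schedule T_k = c / log(k + 2)
    """
    x, jx = x0, energy(x0)
    best, jbest = x, jx
    for k in range(steps):
        t = c / math.log(k + 2)          # slowly decreasing temperature
        y = neighbor(x)
        jy = energy(y)
        # Metropolis rule: always accept improvements, accept
        # deteriorations with probability exp(-(J(y) - J(x)) / T)
        if jy <= jx or random.random() < math.exp(-(jy - jx) / t):
            x, jx = y, jy
            if jx < jbest:
                best, jbest = x, jx
    return best, jbest
```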

References

Bohachevsky, I., M. Johnson, and M. Stein, Generalized simulated annealing for function optimization, Technometrics 28.
Cerny, V., Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, J. Optim. Theory Appl. 45.
Chiang, T.-S., and Y. Chow, On eigenvalues and optimal annealing rate, Math. Oper. Res.
Connors, D., and P. R. Kumar, Balance of recurrence order in time-inhomogeneous Markov chains with applications to simulated annealing.
Currin, C., T. Mitchell, M. Morris, and D. Ylvisaker, Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, J. Amer. Statist. Assoc.
Faigle, U.
Geman, S., and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. 6.
Gidas, B., Nonstationary Markov chains and convergence of the annealing algorithm, J. Statist. Phys. 39.
Hajek, B., Cooling schedules for optimal annealing, Math. Oper. Res. 13.
Hajek, B., and G. Sasaki, Simulated annealing—to cool or not, Syst. Control Lett. 12.
Holley, R., and D. Stroock, Simulated annealing via Sobolev inequalities, Commun. Math. Phys. 115.
Holley, R., S. Kusuoka, and D. Stroock, Asymptotics of the spectral gap with applications to the theory of simulated annealing, J. Funct. Anal. 83.
Jeng, F.-C., and J. W. Woods, Simulated annealing in compound Gaussian random fields, IEEE Trans. Inform. Theory 36.
Johnson, D. S., C. R. Aragon, L. A. McGeoch, and C. Schevon, Optimization by simulated annealing: An experimental evaluation, Part II: Graph coloring and number partitioning, Oper. Res.
Johnson, D. S., C. R. Aragon, L. A. McGeoch, and C. Schevon, Optimization by simulated annealing: An experimental evaluation, Part III: The traveling salesman problem, in preparation.
Johri, A., and D. Matula.
Kernighan, B., and S. Lin, An efficient heuristic procedure for partitioning graphs, Bell Syst. Tech. J. 49.
Kirkpatrick, S., C. D. Gelatt, and M. Vecchi, Optimization by simulated annealing, Science 220.
Kushner, H., Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: Global minimization via Monte Carlo, SIAM J. Appl. Math. 47.
Laud, P., L. Berliner, and P. Goel, A stochastic probing algorithm for global optimization; revised version to appear.
Liggett, T.
Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21.
Meyer, R., and C. Nachtsheim, Constructing exact D-optimal experimental designs by simulated annealing, Amer. J. Math. Management Sci.
Mitra, D., F. Romeo, and A. Sangiovanni-Vincentelli, Convergence and finite-time behaviour of simulated annealing, Adv. Appl. Probab. 18.
Sacks, J., in Gupta, ed.
Sasaki, G., and B. Hajek, The time complexity of maximum matching by simulated annealing, J. Assoc. Comput. Mach. 35.
Tsitsiklis, J., A survey of large time asymptotics of simulated annealing algorithms, in W. Fleming and P.-L. Lions, eds., Stochastic Differential Systems, Stochastic Control Theory and Applications.
van Laarhoven, P., and E. Aarts, Simulated Annealing: Theory and Applications, Reidel, Dordrecht, Netherlands.
van Laarhoven, P., C. Boender, E. Aarts, and A. Rinnooy Kan, A Bayesian approach to simulated annealing, Probab. Eng. Inf. Sci.
Ventcel, A., Dokl. Akad. Nauk SSSR.


In Section 3 we compute the infinitesimal generator of the noisy simulated annealing (NSA) algorithm. In Section 4 we compare it to that of the noise-free simulated annealing algorithm from [Holley88Simu].

This enables us to derive a differential inequality for an L^2 distance between the distributions of the two previously mentioned processes. In the same section, we show how to tune the parameters of the algorithm in order to optimize the performance bound, and we give the corresponding computational cost.

In Section 6 we provide some numerical insight through experiments on synthetic and real data. We first present our extension of simulated annealing to the stochastic case, whose pseudo-code can be found in Algorithm 1. What we mean by a good solution will be made precise below. The algorithm explores the state space in the following manner: at each step, a candidate state is drawn from the neighbourhood of the current one according to the proposition law, and it is accepted or rejected on the basis of a Monte Carlo estimate of the cost difference. The time t is then updated using independent exponential random variables, enabling us to consider the NSA as a continuous-time Markov process.

To state the convergence of Algorithm 1, we first need to describe formally the framework we are working in. Notations introduced in this section are valid for the whole paper unless explicitly mentioned otherwise. We can make a few remarks about the notation. The construction of N_t ensures that its value is a strictly positive integer at all times. The reason why we choose a randomly sized sample for the Monte Carlo estimation procedure is rather technical.

It enables us to generate a transition probability that is continuous in time, as can be noticed in Equation 3, and it eases the formulation of the infinitesimal generator in Equation 7.
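To fix ideas, here is a compact sketch of the mechanism just described (our own illustrative code, not the paper's Algorithm 1; the function and parameter names, and the distribution used for the sample size, are assumptions):

```python
import math
import random

def nsa_sketch(x0, noisy_cost, neighbors, beta, mean_n, t_end=100.0):
    """Illustrative noisy simulated annealing (NSA) sketch.

    noisy_cost(x): one unbiased noisy evaluation of J(x)
    neighbors(x):  list of direct neighbours S(x)
    beta(t):       inverse temperature schedule
    mean_n(t):     expected Monte Carlo sample size at time t
    """
    x, t = x0, 0.0
    while t < t_end:
        t += random.expovariate(1.0)      # exponential jump times (Poisson clock)
        y = random.choice(neighbors(x))   # proposal drawn from q0
        # Randomly sized Monte Carlo sample, kept >= 1 as in the paper's N_t
        n = 1 + int(random.expovariate(1.0 / mean_n(t)))
        est_x = sum(noisy_cost(x) for _ in range(n)) / n
        est_y = sum(noisy_cost(y) for _ in range(n)) / n
        # Metropolis test applied to the *estimated* cost difference
        if random.random() < math.exp(-beta(t) * max(0.0, est_y - est_x)):
            x = y
    return x
```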

For any x in E, we denote by S(x) the set of its direct neighbours. We consider a finite search space. This assumption could be replaced by coercivity assumptions on the function J, which would be more general but not really well suited for the applications we are looking at.

It is our most restrictive assumption. Nevertheless, it is in line with previous works on noisy global optimization, for example [gutjahrsimulated], [homemvariable], or [finkinverse].

It also corresponds to the historical use of simulated annealing for problems with huge finite search spaces, such as the traveling salesman problem [aartsboltzmann].

Mimicking [Holley88Simu], we might however relax this finiteness assumption. Doing so requires more technicalities, as in [aartssimulated], and is left for future work. We assume that the algorithm can visit and start from every point in the solution space, through the connectivity assumption.

The proposition law q0 is assumed irreducible: one can go from any state x to any other state y using the neighbourhood structure S in a finite number of steps. These last two assumptions are inherited from the classical Metropolis-Hastings sampling algorithm, which corresponds to the NSA algorithm with no cooling mechanism and no noise. They ensure that a run in this setting, starting from any point of the search space, converges to a stationary distribution, which is the Gibbs measure associated to J.
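For reference, one step of this noise-free, constant-temperature baseline can be sketched as follows (our own illustration; it assumes a symmetric proposal law over the neighbourhoods):

```python
import math
import random

def metropolis_step(x, cost, neighbors, temp):
    """One Metropolis step targeting the Gibbs measure ~ exp(-J(x)/temp).

    Assuming a symmetric proposal q0, accepting with probability
    min(1, exp(-(J(y) - J(x)) / temp)) leaves the Gibbs measure invariant.
    """
    y = random.choice(neighbors(x))  # proposal over S(x), assumed symmetric
    if random.random() < math.exp(-max(0.0, cost(y) - cost(x)) / temp):
        return y
    return x
```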

The assumption on the cost function reflects the practical setting where a simulation code crashes outside its definition domain: we associate infinite costs to such crashes. There is no reason to expect that the noisy context would be more favorable than the deterministic one.

First, for pedagogical purposes, we omit the temperature evolution and the noisy measurements. The NSA algorithm then becomes a simpler Markov chain exploring the state space, with the transition probability given in Equation 1. This reflects the transition mechanism introduced at the beginning of this section.

As the process is in fact a continuous-time one, we must also consider the time component. NSA jumps happen at stochastic times, and the probability of acceptance depends on these times. Combining the law of the jumping times with the previous mechanism, we can make their joint transition probability explicit (Equation 3). This construction is similar to that of the classical simulated annealing process [hajekcooling].
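For comparison, the classical noise-free transition probability has the familiar Metropolis form; written in the notation above (a standard expression, not copied from the paper), for y a neighbour of x:

\[
p_t(x,y) \;=\; q_0(x,y)\,\exp\!\bigl(-\beta_t\,(J(y)-J(x))^{+}\bigr),
\qquad
p_t(x,x) \;=\; 1-\sum_{y\neq x} p_t(x,y),
\]

where (u)^+ = max(u, 0) and \beta_t denotes the inverse temperature at time t. Equation 3, by contrast, involves the Monte Carlo estimate of this cost difference and an expectation over the random sample size.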

The state transition mechanism must also reflect the estimation procedure; therefore the form of Equation 3 differs from that of Equation 1. The jumping times, or evaluation times, of the process are given by the sequence (T_k). The expected value in the formula comes from the fact that, as mentioned in its definition, the Monte Carlo sample size N_t is random.

Finally, we obtain the NSA process by associating the two sub-processes. Let P_{x,y} denote the set of paths from x to y.

Minimizing this quantity over the set of possible paths P_{x,y} gives us the elevation of the cheapest path going from x to y. Denote this elevation by H_{x,y} = \min_{p \in P_{x,y}} \max_{z \in p} J(z). As mentioned before, it also represents the maximal depth of a well not containing a fixed global minimum.
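On small instances, this elevation can be computed exactly with a minimax (bottleneck) variant of Dijkstra's algorithm; the sketch below is ours and assumes the state graph is given as an adjacency map:

```python
import heapq

def elevation(graph, cost, x, y):
    """Minimal elevation H_{x,y}: the smallest, over paths from x to y,
    of the maximum cost J encountered along the path.

    graph: dict mapping each state to its direct neighbours S(state)
    cost:  dict giving J(state)
    """
    best = {x: cost[x]}
    heap = [(cost[x], x)]
    while heap:
        h, u = heapq.heappop(heap)
        if u == y:
            return h
        if h > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph[u]:
            hv = max(h, cost[v])          # elevation of the path through v
            if hv < best.get(v, float("inf")):
                best[v] = hv
                heapq.heappush(heap, (hv, v))
    return float("inf")                   # y unreachable from x
```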

The definition provided here is equivalent to the classical one; a proof of this statement can be found in Appendix B. Consider the setting of Section 2. This theorem is a natural extension of the result provided by [Holley88Simu].

There are two main interesting facts to point out. First, we obtain a balance between the expected number of Monte Carlo simulations at each step of the algorithm and the inverse of the temperature. Second, the convergence is stated in terms of a bound on the probability of not returning an optimal solution. Using the concentration speed of the Gibbs measure, one can deduce a rate of convergence of the algorithm. The theorem also provides insight into how the algorithm could be used in practice.

Running several noisy simulated annealing instances in parallel yields a probability of returning a bad solution that decreases as a power of the number of runs. Nevertheless, this benefit must be traded off against an additional selection cost: if we obtain K solutions from K parallel NSA realizations, we still face the problem of selecting the best one, since we only have access to estimates of the costs associated with each solution. The proof of this theorem is divided into three parts.
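The power-law decrease is elementary: if a single run returns a non-optimal solution with probability at most p (an assumed bound, for illustration), then K independent runs simultaneously fail with probability at most

\[
\Pr(\text{all } K \text{ runs fail}) \;\le\; p^{K}.
\]

For instance, p = 0.3 and K = 5 give a bound of 0.3^5 ≈ 0.0024; the remaining difficulty, as noted above, is selecting among the K returned candidates using only noisy cost estimates.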

First, in Section 3, we compute the infinitesimal generators of the classical (Equation 11) and noisy (Equation 7) simulated annealing processes. The semi-group characterization of the generator is given by the following definition.
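In its standard form (recalled here for completeness), the generator A_t of a continuous-time Markov process (X_t) acts on a bounded function f as

\[
A_t f(x) \;=\; \lim_{h \to 0^{+}} \frac{\mathbb{E}\left[f(X_{t+h}) \mid X_t = x\right] - f(x)}{h},
\qquad x \in E,
\]

whenever the limit exists.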

Using notations similar to those of Section 2, this is the natural extension of the simulated annealing process with discrete jumping times [hajekcooling] to continuous time. In this configuration, the jumping times are drawn from an i.i.d. sequence of exponential random variables of parameter 1.

In the homogeneous configuration, i.e., at constant temperature, the generator takes a standard form; the extension to the generator of the non-homogeneous process is not straightforward, so we propose to detail the computations. By definition, for any bounded function f, we must evaluate the limit above for the NSA process. Since T_k is a sum of independent exponential variables of parameter 1, one can remark that H_t is in fact a Poisson process of parameter 1.
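This remark is easy to check numerically; the short simulation below (ours) draws jump times as partial sums of Exp(1) variables and counts how many fall in [0, t_end]:

```python
import random

def count_jumps_by(t_end):
    """Number of jump times T_1 < T_2 < ... in [0, t_end], where T_k is a
    sum of k i.i.d. Exp(1) variables; this count is Poisson(t_end)."""
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > t_end:
            return k
        k += 1
```

Averaging count_jumps_by(10.0) over many trials returns values close to 10, the mean of a Poisson variable of parameter 10.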

In order to compute the above limit, we begin by calculating a more explicit form of the probabilities involved. The second case is slightly more involved. In what follows, for a random variable Y, we denote by f_Y its probability distribution.

Using these notations, and the classical O(·) notation in what follows, we can put all the terms together and replace them in Equation 2, which lets us rewrite the infinitesimal generator. Using the fact that f is bounded, that E is finite, and the upper bound given by Equation 10, one can easily check that the second term is zero.

Hence we obtain the desired expression. The only necessary property of this transition probability is its continuity with respect to t. Therefore, following the same argument, one can deduce Equation 7.

Here we can see the relevance of the randomness of N_t. It is well known that, for a temperature schedule that decreases slowly enough, the process generated by classical simulated annealing converges to the set of global minima of J.
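For completeness, "slowly enough" has a classical characterization due to Hajek [hajekcooling] (stated here as background; d* is, as above, the maximal depth of a well not containing a global minimum):

\[
\sum_{k \ge 1} \exp\!\left(-\frac{d^{*}}{T_k}\right) = \infty
\]

is necessary and sufficient for convergence in probability to the set of global minima; in particular, schedules of the form T_k = c / \log(k+1) satisfy it exactly when c \ge d^*.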

The noisy simulated annealing algorithm is built on the same principles, except that each time the value of the function J is needed, it is replaced by an estimate. A tight relation therefore exists between the two approaches.

Using the relations given by Equation 11 and Equation 7, we can write the quantity of interest explicitly. The main result of this section is the following lemma. Before going into the proof of this lemma, we present some preliminaries.

Recall that the algorithm aims at building a sequence of elements from E whose last element is drawn from a uniform probability law on the subset of global minima of J.


