You are currently browsing the tag archive for the ‘Search’ tag.
Four different snapshots (click to enlarge) from one of my latest books, recently published in Japan: Ajith Abraham, Crina Grosan, Vitorino Ramos (Eds.), “Swarm Intelligence in Data Mining” (群知能と データマイニング), Tokyo Denki University press [TDU], Tokyo, Japan, July 2012.
Fig.1 – (click to enlarge) The optimal shortest path among N=1265 points depicting a Portuguese Navalheira crab as a result of one of our latest Swarm-Intelligence based algorithms. The problem of finding the shortest path among N different points in space is NP-hard, known as the Travelling Salesmen Problem (TSP), being one of the major and hardest benchmarks in Combinatorial Optimization (link) and Artificial Intelligence. (V. Ramos, D. Rodrigues, 2012)
This summer my kids just grab a tiny Portuguese Navalheira crab on the shore. After a small photo-session and some baby-sitting with a lettuce leaf, it was time to release it again into the ocean. He not only survived my kids, as he is now entitled into a new World Wide Web on-line life. After the Shortest path Sardine (link) with 1084 points, here is the Crab with 1265 points. The algorithm just run as little as 110 iterations.
Fig. 2 – (click to enlarge) Our 1265 initial points depicting a TSP Portuguese Navalheira crab. Could you already envision a minimal tour between all these points?
As usual in Travelling Salesmen problems (TSP) we start it with a set of points, in our case 1084 points or cities (fig. 2). Given a list of cities and their pairwise distances, the task is now to find the shortest possible tour that visits each city exactly once. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods.
Fig. 3 – (click to enlarge) Again the shortest path Navalheira crab, where the optimal contour path (in black: first fig. above) with 1265 points (or cities) was filled in dark orange.
TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing. In these applications, the concept city represents, for example, customers, soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or a similarity measure between DNA fragments. In many applications, additional constraints such as limited resources or time windows make the problem considerably harder.
What follows (fig. 4) is the original crab photo after image segmentation and just before adding Gaussian noise in order to retrieve several data points for the initial TSP problem. The algorithm was then embedded with the extracted x,y coordinates of these data points (fig. 2) in order for him to discover the minimal path, in just 110 iterations. For extra details, pay a visit onto the Shortest path Sardine (link) done earlier.
Fig. 4 – (click to enlarge) The original crab photo after some image processing as well as segmentation and just before adding Gaussian noise in order to retrieve several data points for the initial TSP problem.
Figure (click to enlarge) – Cover from one of my books published last month (10 July 2012) “Swarm Intelligence in Data Mining” recently translated and edited in Japan (by Tokyo Denki University press [TDU]). Cover image from Amazon.co.jp (url). Title was translated into 群知能と データマイニング. Funny also, to see my own name for the first time translated into Japanese – wonder if it’s Kanji. A brief synopsis follow:
(…) Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from the biological examples by swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviours observed in flocks of birds, schools of fish, or swarms of bees, and even human social behaviour, from which the idea is emerged. Ant Colony Optimization (ACO) deals with artificial systems that is inspired from the foraging behaviour of real ants, which are used to solve discrete optimization problems. Historically the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm intelligence may seem that they do not have many properties in common. However, recent studies suggests that they can be used together for several real world data mining problems especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises of 11 chapters including an introductory chapters giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed. (…) (more)
Fig.1 – (click to enlarge) The optimal shortest path among N=1084 points depicting a Portuguese sardine as a result of one of our latest Swarm-Intelligence based algorithms. The problem of finding the shortest path among N different points in space is NP-hard, known as the Travelling Salesmen Problem (TSP), being one of the major and hardest benchmarks in Combinatorial Optimization (link) and Artificial Intelligence. (D. Rodrigues, V. Ramos, 2011)
Almost summer time in Portugal, great weather as usual, and the perfect moment to eat sardines along with friends in open air esplanades; in fact, a lot of grilled sardines. We usually eat grilled sardines with a tomato-onion salad along with barbecued cherry peppers in salt and olive oil. That’s tasty, believe me. But not tasty enough however for me and one of my colleagues, David Rodrigues (blog link/twitter link). We decided to take this experience a little further on, creating the first shortest path sardine.
Fig. 2 – (click to enlarge) Our 1084 initial points depicting a TSP Portuguese sardine. Could you already envision a minimal tour between all these points?
As usual in Travelling Salesmen problems (TSP) we start it with a set of points, in our case 1084 points or cities (fig. 2). Given a list of cities and their pairwise distances, the task is now to find the shortest possible tour that visits each city exactly once. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing. In these applications, the concept city represents, for example, customers, soldering points, or DNA fragments, and the concept distance represents travelling times or cost, or a similarity measure between DNA fragments. In many applications, additional constraints such as limited resources or time windows make the problem considerably harder. (link)
Fig. 3 – (click to enlarge) A well done and quite grilled shortest path sardine, where the optimal contour path (in blue: first fig. above) with 1084 points was filled in black colour. Nice T-shirt!
Even for toy-problems like the present 1084 TSP sardine, the amount of possible paths are incredible huge. And only one of those possible paths is the optimal (minimal) one. Consider for example a TSP with N=4 cities, A, B, C, and D. Starting in city A, the number of possible paths is 6: that is 1) A to B, B to C, C to D, and D to A, 2) A-B, B-D, D-C, C-A, 3) A-C, C-B, B-D and D-A, 4) A-C, C-D, D-B, and B-A, 5) A-D, D-C, C-B, and B-A, and finally 6) A-D, D-B, B-C, and C-A. I.e. there are (N–1)! [i.e., N–1 factorial] possible paths. For N=3 cities, 2×1=2 possible paths, for N=4 cities, 3x2x1=6 possible paths, for N=5 cities, 4x3x2x1=24 possible paths, … for N=20 cities, 121.645.100.408.832.000 possible paths, and so on.
The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using computational brute force search). The running time for this approach however, lies within a polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes impractical even for only 20 cities. One of the earliest and oldest applications of dynamic programming is the Held–Karp algorithm which only solves the problem in time O(n22n).
In our present case (N=1084) we have had to deal with 1083 factorial possible paths, leading to the astronomical number of 1.19×102818 possible solutions. That’s roughly 1 followed by 2818 zeroes! – better now to check this Wikipedia entry on very large numbers. Our new Swarm-Intelligent based algorithm, running on a normal PC was however, able to formulate a minimal solution (fig.1) within just several minutes. We will soon post more about our novel self-organized stigmergic-based algorithmic approach, but meanwhile, if you enjoyed these drawings, do not hesitate in asking us for a grilled cherry pepper as well. We will be pleased to deliver you one by email.
p.s. – This is a joint twin post with David Rodrigues.
Fig. 4 – (click to enlarge) Zoom at the end sardine tail optimal contour path (in blue: first fig. above) filled in black, from a total set of 1084 initial points.
Springer book “Swarm Intelligence in Data Mining” (Studies in Computational Intelligence Series, Vol. 34) published in late 2006, is receiving a fair amount of attention, so much so, that early this year, Tokyo Denki University press (TDU) decided to negotiate with Springer the translation rights and copyrights in order to released it over their country in Japanese language. The Japanese version will now become shortly available, and I do hope – being one of the scientific editors – it will receive increasing attention as well in Japan, being it one of the most difficult and extraordinary real-world areas we could work nowadays among computer science. Multiple Sequence Alignment (MSA) within Bio-informatics is just one recent example, Financial Markets another. The amount of data – 100000 DVD’s every year -, CERN’s Large Hadron Collider (LHC) will collect is yet another. In order to transform data into information, and information into useful and critical knowledge, reliable and robust Data Mining is more than ever needed, on our daily life.
Meanwhile, I wonder how the Japanese cover design will be?! Starting with it’s own title, which appears to be pretty hard to translate. According to Yahoo BabelFish the Japanese characters (群れの知性) – derived among other language scripts from Kanji – correspond to the English sentence “Swarm Intelligence“. I wonder if this translation is correct or not, since “swarm” in itself, is kind of difficult to translate. Some meanings of it point out to a spaghetti dish, as well, which kind of makes some logic too. Moreover, the technical translation of it is also difficult. I guess the best person to handle the translation (at least from the list of colleagues around the world I know) is Claus Aranha. (IBA Lab., University of Tokyo). Not only he works in Japan for several years now, as well as some of his works focus this precise area.
SIDM book (Swarm Int. in Data Mining) focus on the hybridization of these two areas. As you may probably now, Data Mining (see also; Knowledge Extraction) refers to a collection of techniques – many of them classical – that envisions to tackle large amounts of data, in order to perform classification, clustering, sorting, feature selection, search, forecasting, decision, meaningful extraction, association rule discovery, sequential pattern discovery, etc. In recent years however (1985-2000), state of the art Artificial Intelligence such as Evolutionary Computation was also used, since some of his problems could be seen as – or properly translated to – optimization problems (namely, combinatorial). The same now happens with Swarm Intelligence, since some of it’s unique self-organizing distributed features (allowing direct applications over Grid Computing) seems ideal to tackle some of the most complex data mining problems we may face today.
For those willing for more, I will leave you with it’s contents (chapters), a foreword to this book by James Kennedy (one of the founding fathers of PSO – Particle Swarm Optimization, along with Russell C. Eberhart, and Yuhui Shi) which I vividly recommend (starting with the sentence “Science is a Swarm“!), as well as a more detailed description to it:
Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from the biological examples by swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviors observed in flocks of birds, schools of fish, or swarms of bees, and even human social behavior, from which the idea is emerged. Ant Colony Optimization (ACO) deals with artificial systems that is inspired from the foraging behavior of real ants, which are used to solve discrete optimization problems. Historically the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm intelligence may seem that they do not have many properties in common. However, recent studies suggests that they can be used together for several real world data mining problems especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises of 11 chapters including an introductory chapters giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed.
The eleven chapters are organized as follows. In Chapter 1, Grosan et al. present the biological motivation and some of the theoretical concepts of swarm intelligence with an emphasis on particle swarm optimization and ant colony optimization algorithms. The basic data mining terminologies are explained and linked with some of the past and ongoing works using swarm intelligence techniques. Martens et al. in Chapter 2 introduce a new algorithm for classification, named AntMiner+, based on an artificial ant system with inherent selforganizing capabilities. AntMiner+ differs from the previously proposed AntMiner classification technique in three aspects. Firstly, AntMiner+ uses a MAX-MIN ant system which is an improved version of the originally proposed ant system, yielding better performing classifiers. Secondly, the complexity of the environment in which the ants operate has substantially decreased. Finally, AntMiner+ leads to fewer and better performing rules. In Chapter 3, Jensen presents a feature selection mechanism based on ant colony optimization algorithm to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. The proposed method is applied to two very different challenging tasks, namely web classification and complex systems monitoring. Galea and Shen in the fourth chapter present an ant colony optimization approach for the induction of fuzzy rules. Several ant colony optimization algorithms are run simultaneously, with each focusing on finding descriptive rules for a specific class. The final outcome is a fuzzy rulebase that has been evolved so that individual rules complement each other during the classification process. In the fifth chapter Tsang and Kwong present an ant colony based clustering model for intrusion detection. The proposed model improves existing ant-based clustering algorithms by incorporating some meta-heuristic principles. To further improve the clustering solution and alleviate the curse of dimensionality in network connection data, four unsupervised feature extraction algorithms are also studied and evaluated. Omran et al. in the sixth chapter present particle swarm optimization algorithms for pattern recognition and image processing problems. First a clustering method that is based on PSO is discussed. The application of the proposed clustering algorithm to the problem of unsupervised classification and segmentation of images is investigated. Then PSO-based approaches that tackle the color image quantization and spectral unmixing problems are discussed.
In the seventh chapter Azzag et al. present a new model for data clustering, which is inspired from the self-assembly behavior of real ants. Real ants can build complex structures by connecting themselves to each others. It is shown is this paper that this behavior can be used to build a hierarchical tree-structured partitioning of the data according to the similarities between those data. Authors have also introduced an incremental version of the artificial ants algorithm. Kazemian et al. in the eighth chapter presents a new swarm data clustering method based on Flowers Pollination by Artificial Bees (FPAB). FPAB does not require any parameter settings and any initial information such as the number of classes and the number of partitions on input data. Initially, in FPAB, bees move the pollens and pollinate them. Each pollen will grow in proportion to its garden flowers. Better growing will occur in better conditions. After some iterations, natural selection reduces the pollens and flowers and the gardens of the same type of flowers will be formed. The prototypes of each gardens are taken as the initial cluster centers for Fuzzy C Means algorithm which is used to reduce obvious misclassification errors. In the next stage, the prototypes of gardens are assumed as a single flower and FPAB is applied to them again. Palotai et al. in the ninth chapter propose an Alife architecture for news foraging. News foragers in the Internet were evolved by a simple internal selective algorithm: selection concerned the memory components, being finite in size and containing the list of most promising supplies. Foragers received reward for locating not yet found news and crawled by using value estimation. Foragers were allowed to multiply if they passed a given productivity threshold. A particular property of this community is that there is no direct interaction (here, communication) amongst foragers that allowed us to study compartmentalization, assumed to be important for scalability, in a very clear form. Veenhuis and Koppen in the tenth chapter introduce a data clustering algorithm based on species clustering. It combines methods of particle swarm optimization and flock algorithms. A given set of data is interpreted as a multi-species swarm which wants to separate into single-species swarms, i.e., clusters. The data to be clustered are assigned to datoids which form a swarm on a two-dimensional plane. A datoid can be imagined as a bird carrying a piece of data on its back. While swarming, this swarm divides into sub-swarms moving over the plane and consisting of datoids carrying similar data. After swarming, these sub swarms of datoids can be grouped together as clusters. In the last chapter Yang et al. present a clustering ensemble model using ant colony algorithm with validity index and ART neural network. Clusterings are visually formed on the plane by ants walking, picking up or dropping down projected data objects with different probabilities. Adaptive Resonance Theory (ART) is employed to combine the clusterings produced by ant colonies with different moving speeds. We are very much grateful to the authors of this volume and to the reviewers for their tremendous service by critically reviewing the chapters. The editors would like to thank Dr. Thomas Ditzinger (Springer Engineering Inhouse Editor, Studies in Computational Intelligence Series), Professor Janusz Kacprzyk (Editor-in-Chief, Springer Studies in Computational Intelligence Series) and Ms. Heather King (Editorial Assistant, Springer Verlag, Heidelberg) for the editorial assistance and excellent cooperative collaboration to produce this important scientific work. We hope that the reader will share our excitement to present this volume on ‘Swarm Intelligence in Data Mining’ and will find it useful.
April, 2006
Ajith Abraham, Chung-Ang University, Seoul, Korea
Crina Grosan, Cluj-Napoca, Babes-Bolyai University, Romania
Vitorino Ramos, IST Technical University of Lisbon, Portugal
Fig. – (Above) A 3D toroidal fast changing landscape describing a Dynamic Optimization (DO) Control Problem (8 frames in total). (Bellow) A self-organized swarm emerging a characteristic flocking migration behaviour surpassing in intermediate steps some local optima over the 3D toroidal landscape (above), describing a Dynamic Optimization (DO) Control Problem. Over each foraging step, the swarm self-regulates his population and keeps tracking the extrema (44 frames in total). [extra details + PDF]
[] Vitorino Ramos, Fernandes, C., Rosa, A.C., Abraham, A., Computational Chemotaxis in Ants and Bacteria over Dynamic Environments, in CEC´07 – Congress on Evolutionary Computation, IEEE Press, USA, ISBN 1-4244-1340-0, pp. 1009-1017, Sep. 2007.
Chemotaxis can be defined as an innate behavioural response by an organism to a directional stimulus, in which bacteria, and other single-cell or multicellular organisms direct their movements according to certain chemicals in their environment. This is important for bacteria to find food (e.g., glucose) by swimming towards the highest concentration of food molecules, or to flee from poisons. Based on self-organized computational approaches and similar stigmergic concepts we derive a novel swarm intelligent algorithm. What strikes from these observations is that both eusocial insects as ant colonies and bacteria have similar natural mechanisms based on stigmergy in order to emerge coherent and sophisticated patterns of global collective behaviour. Keeping in mind the above characteristics we will present a simple model to tackle the collective adaptation of a social swarm based on real ant colony behaviors (SSA algorithm) for tracking extrema in dynamic environments and highly multimodal complex functions described in the well-know De Jong test suite. Later, for the purpose of comparison, a recent model of artificial bacterial foraging (BFOA algorithm) based on similar stigmergic features is described and analyzed. Final results indicate that the SSA collective intelligence is able to cope and quickly adapt to unforeseen situations even when over the same cooperative foraging period, the community is requested to deal with two different and contradictory purposes, while outperforming BFOA in adaptive speed. Results indicate that the present approach deals well in severe Dynamic Optimization problems.
(to obtain the respective PDF file follow link above or visit chemoton.org)
Video – Thousands of starlings birds gathering in flocks, flying in formations while emerging complex patterns on S.W. Scotland (more photos & video by/at Fresh Pics, 2007). Here for an artificial version with different purposes. They are not birds, instead an entirely different new animal.
[…] In contrast to negative feedback, positive feedback (PF) generally promotes changes in the system (the majority of self-organizing SO systems use them). The explosive growth of the human population provides a familiar example of the effect of positive feedback. The snowballing autocatalytic effect of PF takes an initial change in a system (due to amplification of fluctuations; a minimal and natural local cluster of objects could be a starting point) and reinforces that change in the same direction as the initial deviation. Self-enhancement, amplification, facilitation, and autocatalysis are all terms used to describe positive feedback [9]. Another example could be provided by the clustering or aggregation of individuals. Many birds, such as seagulls nest in large colonies. Group nesting evidently provides individuals with certain benefits, such as better detection of predators or greater ease in finding food. The mechanism in this case is imitation (1): birds preparing to nest are attracted to sites where other birds are already nesting, while the behavioral rule could be synthesized as “I nest close where you nest”. The key point is that aggregation of nesting birds at a particular site is not purely a consequence of each bird being attracted to the site per se. Rather, the aggregation evidently arises primarily because each bird is attracted to others (check for further references on [7,9]). On social insect societies, PF could be illustrated by the pheromone reinforcement on trails, allowing the entire colony to exploit some past and present solutions. Generally, as in the above cases, positive feedback is imposed implicitly on the system and locally by each one of the constituent units. Fireflies flashing in synchrony [49] follow the rule, “I signal when you signal”, fish traveling in schools abide by the rule, “I go where you go”, and so forth. In humans, the “infectious” quality of a yawn of laughter is a familiar example of positive feedback of the form, “I do what you do”. Seeing a person yawning (2), or even just thinking of yawning, can trigger a yawn [9]. There is however one associated risk, generally if PF acts alone without the presence of negative feedbacks, which per si can play a critical role keeping under control this snowballing effect, providing inhibition to offset the amplification and helping to shape it into a particular pattern. Indeed, the amplifying nature of PF means that it has the potential to produce destructive explosions or implosions in any process where it plays a role. Thus the behavioral rule may be more complicated than initially suggested, possessing both an autocatalytic as well as an antagonistic aspect. In the case of fish [9], the minimal behavioral rule could be “I nest where others nest, unless the area is overcrowded” (HEY !! here we go again to the El Farol Bar problem!). In this case both positive and negative feedback may be coded into the behavioral rules of the fish. Finally, in other cases one finds that the inhibition arises automatically, often simply from physical constraints. […]
in, V. Ramos et al., “Social Cognitive Maps, Swarm Collective Perception and Distributed Search on Dynamic Landscapes“.
(1) See also on this subject the seminal sociological work of Gabriel Tarde; Tarde, G., Les Lois de l’Imitation, Eds. du Seuil (2001), 1st Edition, Eds. Alcan, Paris, 1890.
(2) Similarly, Milgram et al (Milgram, Bickerman and Berkowitz, “Note on the Drawing Power of Crowds of Different Size”, Journal of Personality and Social Psychology, 13, 1969) found that if one person stood in a Manhattan street gazing at a sixth floor window, 20% of pedestrians looked up; if five people stood gazing, then 80% of people looked up.
(to obtain the respective PDF file follow this link or visit chemoton.org)
Recent Comments