
Vitorino Ramos - Citations2016Jan

2016 – Up to now, a total of 1567 citations among 74 works (including 3 books) on GOOGLE SCHOLAR (https://scholar.google.com/citations?user=gSyQ-g8AAAAJ&hl=en) [with a Hirsch h-index of 19, and an average of 160.2 citations for each of my top five works] + 900 citations among 57 works on the new RESEARCH GATE site (https://www.researchgate.net/profile/Vitorino_Ramos).

Refs.: Science, Artificial Intelligence, Swarm Intelligence, Data-Mining, Big-Data, Evolutionary Computation, Complex Systems, Image Analysis, Pattern Recognition, Data Analysis.

Portugal at World Expo 1998 by Vitorino Ramos

Images – Portugal (1A – top left, original input satellite image below), geodesically stretched by one of my Mathematical Morphology algorithms in order to represent real travel times from each of the 18 regional districts in Portugal to the rest of the territory. Of the 18, three capital districts are represented here: departing from Lisbon (1B – top right), from Faro (1C – South of Portugal, bottom left), and from Bragança (1D – North-East region, bottom right). [World Exposition, Lisbon, Territory pavilion, 1998].

Recently one of my colleagues, who knows I love maps, pointed me to an old episode of “Câmara Clara”, a cultural TV show on RTP2, one of the main public Portuguese TV stations. The main reason for my interest was its theme: maps. My second reason was the guests: Joaquim Ferreira do Amaral (an ex-Minister with a passion for maps) and Manuel Lima, whose wonderful work on information visualization I have known for a long time (in one of my past posts I referred to one of his ongoing working sites: visualcomplexity).
 

To my complete and pleasant surprise, their interview ended with some new examples, one of my old works being among those mentioned (from 57m 12s up to 60m 26s on http://camaraclara.rtp.pt/#/arquivo/131 ). It’s a long story how I ended up doing these kinds of maps. Part of it is here. In 1998 the World Exposition was held in Portugal, and I was invited to present a set of 18 different maps of the Portuguese territory. So I decided to geodesically stretch the travel distances from each of the 18 capital districts to the rest of the territory, in order to represent travel time rather than distance, or distance as time. For that, I coded new algorithms based on Mathematical Morphology (MM), taking into account every road (from main roads to regional ones; check some images below), to which I applied different MM operators.
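To make the idea of "distance as time" concrete, here is a minimal sketch. It is not the Mathematical Morphology code used for the Expo maps; it swaps in a plain Dijkstra-style least-cost propagation, which yields a geodesic travel-time function over a cost grid. The function name `travel_time_map`, the toy grid and the cost values (1 for a fast road cell, 9 for slow terrain) are all assumptions made for illustration.

```python
import heapq

def travel_time_map(cost, start):
    """Least-cost (geodesic) travel time from `start` to every grid cell.

    cost[r][c] is a hypothetical time needed to cross cell (r, c):
    small values for fast roads, larger values for slow terrain.
    """
    rows, cols = len(cost), len(cost[0])
    times = [[float("inf")] * cols for _ in range(rows)]
    times[start[0]][start[1]] = 0.0
    heap = [(0.0, start[0], start[1])]
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > times[r][c]:
            continue  # stale queue entry, a faster route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Moving between two cells costs the mean of their crossing times.
                nt = t + 0.5 * (cost[r][c] + cost[nr][nc])
                if nt < times[nr][nc]:
                    times[nr][nc] = nt
                    heapq.heappush(heap, (nt, nr, nc))
    return times

# Toy territory: a fast road runs along the top row and down the right column.
grid = [
    [1, 1, 1, 1],
    [9, 9, 9, 1],
    [9, 9, 9, 1],
]
times = travel_time_map(grid, (0, 0))
```

In this toy grid the far corner, five road steps away, ends up with the same travel time as the off-road cell right next to the start. Redrawing each cell at a distance proportional to its value in `times` is what stretches the map.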

Unfortunately, many of those maps are now lost. I tried hard to find them in my old digital archives, but only found those above, which represent the departure from Lisbon (the capital), Faro and Bragança. So, if for any reason you happen to have some photos from the 1998 World Exposition in Lisbon, inside the Territory pavilion, I would love to receive them.

Os Portugueses e a Arte dos Mapas - Câmara Clara 131 - Maio 10 2009

Video (LINK) – “Câmara Clara” TV show by journalist Paula Moura Pinheiro dedicated to maps (nº 131), on one of the main public Portuguese TV stations (RTP2), broadcast on May 3 2009, in Portuguese.

A sketchy summary of this TV program goes something like this (the poor translation is mine): In the year Google promises to launch its first exhaustive, world-wide, open-access digital cartography of the African continent, Joaquim Ferreira do Amaral, passionate about the history of the Portuguese Discoveries and a collector of historical maps, joins Manuel Lima, the Portuguese information designer whom Creativity magazine recently considered one of the brightest minds, along with the Google and Amazon founders, to debate the importance of “navigating” reality with a map. From Portuguese cartographic history, known to be the best in the 15th and 16th centuries, up to the current state of the art in this area, in which Manuel Lima is considered to be one of the top researchers on a global scale.

Original + Layers Portugal at World Expo 1998 by Vitorino Ramos

Four different snapshots (click to enlarge) from one of my latest books, recently published in Japan: Ajith Abraham, Crina Grosan, Vitorino Ramos (Eds.), “Swarm Intelligence in Data Mining” (群知能とデータマイニング), Tokyo Denki University press [TDU], Tokyo, Japan, July 2012.

Figure (click to enlarge) – Cover of one of my books, “Swarm Intelligence in Data Mining”, translated and published in Japan last month (10 July 2012) by Tokyo Denki University press [TDU]. Cover image from Amazon.co.jp (url). The title was translated into 群知能とデータマイニング. It is also funny to see my own name translated into Japanese for the first time – I wonder if it’s Kanji. A brief synopsis follows:

(…) Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from the biological examples by swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviours observed in flocks of birds, schools of fish, or swarms of bees, and even human social behaviour, from which the idea is emerged. Ant Colony Optimization (ACO) deals with artificial systems that is inspired from the foraging behaviour of real ants, which are used to solve discrete optimization problems. Historically the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm intelligence may seem that they do not have many properties in common. However, recent studies suggests that they can be used together for several real world data mining problems especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises of 11 chapters including an introductory chapters giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed. (…) (more)

Figure – Web Usage Mining of Monash’s Univ. web site using self-organized ant-based clustering (initial and final classification maps). Web usage Data was collected from the Monash University’s Web site (Australia), with over 7 million hits every week.

[] Vitorino Ramos, Ajith Abraham, Evolving a Stigmergic Self-Organized Data-Mining, in ISDA-04, 4th Int. Conf. on Intelligent Systems, Design and Applications, Budapest, Hungary, ISBN 963-7154-30-2, pp. 725-730, August 26-28, 2004.

Self-organizing complex systems typically are comprised of a large number of frequently similar components or events. Through their process, a pattern at the global-level of a system emerges solely from numerous interactions among the lower-level components of the system. Moreover, the rules specifying interactions among the system’s components are executed using only local information, without reference to the global pattern, which, as in many real-world problems is not easily accessible or possible to be found. Stigmergy, a kind of indirect communication and learning by the environment found in social insects is a well know example of self-organization, providing not only vital clues in order to understand how the components can interact to produce a complex pattern, as can pinpoint simple biological non-linear rules and methods to achieve improved artificial intelligent adaptive categorization systems, critical for Data-Mining. On the present work it is our intention to show that a new type of Data-Mining can be designed based on Stigmergic paradigms, taking profit of several natural features of this phenomenon. By hybridizing bio-inspired Swarm Intelligence with Evolutionary Computation we seek for an entire distributed, adaptive, collective and cooperative self-organized Data-Mining. As a real-world / real-time test bed for our proposal, World-Wide-Web Mining will be used. Having that purpose in mind, Web usage Data was collected from the Monash University’s Web site (Australia), with over 7 million hits every week. Results are compared to other recent systems, showing that the system presented is by far promising.

(to obtain the respective PDF file follow link above or visit chemoton.org)

Journalism is dying, they say. I do agree. And while the argument continues, many of those interested in the issue are now debating what the real reason is. The point is that there is not one reason; there are many. Intricate ones. Do ponder on this: while newspapers are facing the immense, omnipresent and real-time competition from TV channels, TV itself is also dying (while, unexpectedly, … Radio is surging). On many broadcast programs, TV anchors are now more important than the invited guests who (supposedly) worked hard for years on that subject to provide that precise innovative content. As in large supermarkets and great malls, the packaging has become more important than the content itself. The related editorial and business pressure for quick news has become so intense and aggressive that contents are replaced every second without judgment and, once on the air, are hardly described, discussed, opposed or dissected. So, at large, TV CEOs and producers think that people are no longer waiting for new interesting content to appear; they are instead waiting for the anchor who hands it out as if it were peanuts. Peanuts are good, but in excess – we all agree – they are damn awful. And many keep watching, as an old passive addiction. Which means that in the long run, nothing remains (a fact for both sides); … And if they give me no opportunity at all to check content carefully, when I happen to be in the mood to, … then I move on. In this precise, simple way, media cannibalizes itself.

We all know that attention spans are getting narrower these days and that, e.g., yes… the great literary classics are no longer read. So Media CEOs say – “they have no time”. But, really … do mind that gap. Think twice. If the whole environment suddenly recognizes (this being one of the major questions – see below) that it has had enough of peanuts (and it really has), it will yearn for beef steaks. In fact, 1000 void peanuts take more time to consume than one large good steak! And there is a difference, … the steak stays in our body for several hours, not seconds.

It is quickly becoming a paradox, since Media CEOs, in their blind competition, take refuge in saying that they – us readers – have no time (when no solution is found in mediocrity, the easiest way is to repeat a mantra), and we (most of us) keep zapping through the news as never before. However, they never realized that we keep zapping because no news – delivered by these means – is of interest. It really has all become the same. And once it all looks the same, it all soon disappears from our minds. … We all wonder, in some respects, what really happened to investigative journalism, stories about new complex issues, strong content, explained in detail but still delivered in simple, eloquent ways? Come on, this long-tailed huge market niche, once yours, is now void!

Newspapers do have this wonderful singularity. They still have journalists (at least some, if they had enough vision to nourish them). They could provide insightful, detailed background stories, raise open questions, or debate new ones as no one else can in the public space. Moreover, they have their consumers’ time. That, at least, is the feedback I give to the Guardian every Sunday when I put my money on the newsstand counter in exchange for this newspaper, along with others like The Economist. But in the face of this great overall turmoil of news without sense, probably one of these days people will instead desire silence… or listening to their grandfather’s knowledge, good sense, and long-lived emotion (which keeps increasing, believe me). They will relate to him as never before. Not to newspapers. At least he does provide content.

But once the medium is set (and in some way, not all the way, the medium is the message, as postulated by Marshall McLuhan), the great gold rush will be on, … guess what, … content. And on relationships among content! Journalism will no longer be under atomization. Or crystallized.

Fig. – Spatial distribution of 931 items (words taken from an article at ABC Spanish newspaper) on a 61 x 61 non-parametric toroidal grid, at t=106. 91 ants used type 2 probability response functions, with k1=0.1 and k2=0.3. Some independent clusters examples are: (A) anunció, bilbao, embargo, titulos, entre, hacer, necesídad, tras, vida, lider, cualquier, derechos, medida.(B) dirigentes, prensa, ciu. (C) discos, amigos, grandes. (D) hechos, piloto, miedo, tipo, cd, informes. (E) dificil, gobierno, justicia, crisis, voluntad, creó, elección, horas, frente, técnica, unas, tarde, familia, sargento, necesídad, red, obra … (among other word semantic clusters; check paper article below).

For a long time, media decided to do nothing, while new media, including social media, was entering the stage, stronger than ever before. Let me give you one example. In order to understand how relations between news items could enhance newspaper reading and social awareness, back in 2002 I decided to run an experiment. Together with a colleague, I took one article from the Spanish newspaper ABC (photo above). The article was about Spanish political parties and corruption. It contained 931 words (snapshot above). In order to extract semantic meaning from it, as a pre-processing computer analysis, we started by applying Latent Semantic Analysis (LSA). Then, Swarm Intelligence algorithms were developed in order to have a glimpse of the relations among all those words in the newspaper article. Guess what? Some words like “big”, “friends” and “music discs” were segmented from the rest of the politics-related article (segregated on a remote semantic “island”); that is, it was possible not only to build a whole conceptual semantic atlas of that entire news section, but also to find unrelated issues (which formed uncorrelated semantic “islands”). Now, just imagine if this happened within a newspaper social network, live, 24 hours a day, while people reach for strongly correlated content and discuss it as it happens. One strong journal article could, in fact, evolve into collective social knowledge and awareness as never before. That, in reality, is something that classic journalism could use as an edge for its (nowadays awful) market approach. Providing not only good content but, along with it, an extra service not available anywhere (which is, in some way, priceless): the chance to provide correlated real-time meta-content. Not one view, but many aggregated views. Edited, real-world, real-time, good-quality journalism, which has the potential for an “endless” price, especially these days.
On the other hand, what we now see is that news CEOs, along with some editors, still keep their minds on 19th century journalism. Even worse, due to their legitimate panic. However, meanwhile, the world has indeed evolved.

[] Vitorino Ramos, Juan J. Merelo, Self-Organized Stigmergic Document Maps: Environment as a Mechanism for Context Learning, in AEB´2002 – 1st Spanish Conference on Evolutionary and Bio-Inspired Algorithms, E. Alba, F. Herrera, J.J. Merelo et al. (Eds.), pp. 284-293, Centro Univ. de Mérida, Mérida, Spain, 6-8 Feb. 2002.

Social insect societies and more specifically ant colonies, are distributed systems that, in spite of the simplicity of their individuals, present a highly structured social organization. As a result of this organization, ant colonies can accomplish complex tasks that in some cases exceed the individual capabilities of a single ant. The study of ant colonies behavior and of their self-organizing capabilities is of interest to knowledge retrieval/management and decision support systems sciences, because it provides models of distributed adaptive organization which are useful to solve difficult optimization, classification, and distributed control problems, among others. In the present work we overview some models derived from the observation of real ants, emphasizing the role played by stigmergy as distributed communication paradigm, and we present a novel strategy to tackle unsupervised clustering as well as data retrieval problems. The present ant clustering system (ACLUSTER) avoids not only short-term memory based strategies, as well as the use of several artificial ant types (using different speeds), present in some recent approaches. Moreover and according to our knowledge, this is also the first application of ant systems into textual document clustering.
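For readers unfamiliar with ant-based clustering, the sketch below shows the kind of pick-up and drop response functions such systems build on. It uses the classic Deneubourg-style formulas, not the "type 2" response functions of the paper, which are not reproduced in this post; only the thresholds k1=0.1 and k2=0.3 and the toroidal grid come from the figure caption above. The function names and the occupancy-only `local_density` measure are assumptions made for illustration (ACLUSTER also weighs how similar the neighbouring items are).

```python
def pick_probability(f, k1=0.1):
    """Chance that an unloaded ant picks up an item, given local density f in [0, 1].

    Isolated items (f near 0) are picked up almost always.
    """
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.3):
    """Chance that a loaded ant drops its item, given local density f in [0, 1].

    Items tend to be dropped where many items already sit.
    """
    return (f / (k2 + f)) ** 2

def local_density(grid, r, c):
    """Fraction of the 8 neighbours of (r, c) holding an item.

    The modulo wraps indices around the edges, so the square grid
    behaves like a torus (no borders).
    """
    n = len(grid)
    occupied = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and grid[(r + dr) % n][(c + dc) % n]:
                occupied += 1
    return occupied / 8

# A 4 x 4 toroidal grid with one item in the opposite corner:
# thanks to the wrap-around, it is a diagonal neighbour of cell (0, 0).
grid = [[False] * 4 for _ in range(4)]
grid[3][3] = True
density = local_density(grid, 0, 0)
```

Each ant only ever evaluates these functions on its immediate neighbourhood, which is what keeps the rules local while the clusters emerge globally.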

(to obtain the respective PDF file follow link above or visit chemoton.org)

Video – ABB FlexPicker Robots (Source: http://www.botjunkie.com/ + http://www.abb.com/)

There is also something at the lower, pre-processing engineering level, involving Pattern Recognition, Image Analysis and Classification. Not for brownies, cookies or sausages. Since this is summer time, it relates to clams and bivalves in general. From the video, everything appears to be rather easy. But it is not.

With Africa’s current dramatic need for contemporary maps (Google currently promises to launch its first exhaustive, world-wide, open-access digital cartography of the African continent very soon), back in 1999-2000 we turned a very simple idea into a research project (at my previous lab – CVRM IST). Instead of producing new maps in the regular standard way, which is costly (especially for African countries) as well as time consuming (imagine the amount of money and time needed to cover the whole continent with high resolution aerial photos), the idea then was to hybridize, through an automatic procedure (with the help of Artificial Intelligence), new current data coming from satellites with old data coming from the computational analysis of images of old colonial maps. For instance, old roads segmented in old maps will help us find the new ones in the current satellite images, as well as those that were lost. The same goes for bridges, buildings, numbers, letters on the map, etc. However, in order to do this, several preparatory steps were needed. One of those crucial steps was to obtain (segment – known to be one of the hardest procedures in image processing) the old roads, buildings and airports in the old maps. Back in 1999-2000, while dealing with several tasks in this research project (AUTOCARTIS – Automatic Methods for Updating Cartographic Maps), I started to think of using evolutionary computation in order to tackle and overcome this precise problem, in what later became one of the first uses of Genetic Algorithms in image analysis. The result can be seen below. Meanwhile, the experience gained with AUTOCARTIS was later useful not only for old digitized books (Visão Magazine, March 2002), but also for helping us find water on Mars (in the MARS EXPRESS European project – Expresso newspaper, May 2003), in which the CVRM lab was one of the European partners.
Very often in life, simple ideas (I owe this one to Prof. Fernando Muge and Prof. Pedro Pina) are the best ones. This is particularly true in science.

Figure – One original image (left – Luanda, Angola map) and two segmentation examples, rivers and roads respectively obtained through the Genetic Algorithm proposed (low resolution images). [at the same time this precise Map of Luanda, was used by me along with the face of Einstein to benchmark several dynamic image adaptive perception versus memory experiments via ant-like artificial life systems over what I then entitled Digital Image Habitats]

[] Vitorino Ramos, Fernando Muge, Map Segmentation by Colour Cube Genetic K-Mean Clustering, Proc. of ECDL´2000 – 4th European Conference on Research and Advanced Technology for Digital Libraries, J. Borbinha and T. Baker (Eds.), ISBN 3-540-41023-6, Lecture Notes in Computer Science, Vol. 1923, pp. 319-323, Springer-Verlag -Heidelberg, Lisbon, Portugal, 18-20 Sep. 2000.

Segmentation of a colour image composed of different kinds of texture regions can be a hard problem, namely to compute for an exact texture fields and a decision of the optimum number of segmentation areas in an image when it contains similar and/or non-stationary texture fields. In this work, a method is described for evolving adaptive procedures for these problems. In many real world applications data clustering constitutes a fundamental issue whenever behavioural or feature domains can be mapped into topological domains. We formulate the segmentation problem upon such images as an optimisation problem and adopt evolutionary strategy of Genetic Algorithms for the clustering of small regions in colour feature space. The present approach uses k-Means unsupervised clustering methods into Genetic Algorithms, namely for guiding this last Evolutionary Algorithm in his search for finding the optimal or sub-optimal data partition, task that as we know, requires a non-trivial search because of its NP-complete nature. To solve this task, the appropriate genetic coding is also discussed, since this is a key aspect in the implementation. Our purpose is to demonstrate the efficiency of Genetic Algorithms to automatic and unsupervised texture segmentation. Some examples in Colour Maps are presented and overall results discussed.
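As a rough illustration of how k-means can sit inside a Genetic Algorithm for colour clustering, the sketch below treats a candidate solution (a chromosome) as a list of colour centroids, scores it by its within-cluster squared error, and refines it with one k-means step. This is an assumed, simplified reading of the abstract, not the genetic coding discussed in the paper; the function names, the RGB tuples and the choice of fitness measure are illustrative only, and the selection, crossover and mutation operators are left out.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two colours in RGB space."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def within_cluster_error(pixels, centroids):
    """Fitness to minimise: each pixel's squared distance to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in pixels)

def kmeans_step(pixels, centroids):
    """One k-means refinement: assign pixels, then move centroids to cluster means."""
    clusters = [[] for _ in centroids]
    for p in pixels:
        distances = [squared_distance(p, c) for c in centroids]
        clusters[distances.index(min(distances))].append(p)
    refined = []
    for members, old in zip(clusters, centroids):
        if members:
            refined.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
        else:
            refined.append(old)  # keep an empty cluster's centroid unchanged
    return refined

# Four toy pixels (two reddish, two bluish) and one candidate chromosome
# holding two colour centroids.
pixels = [(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]
chromosome = [(200, 0, 0), (0, 0, 200)]

error_before = within_cluster_error(pixels, chromosome)
refined = kmeans_step(pixels, chromosome)
error_after = within_cluster_error(pixels, refined)
```

In a full Genetic Algorithm, a population of such chromosomes would be ranked by `within_cluster_error`, with the k-means step acting as a local improvement applied between generations.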

(to obtain the respective PDF file follow link above or visit chemoton.org)

[] Crina Grosan, Ajith Abraham, Sang Yong Han, Vitorino Ramos, Stock Market Prediction using Multi Expression Programming, in ALEA´05, Workshop on Artificial Life and Evolutionary Algorithms at EPIA´05 – Proc. of the 12th Portuguese Conference on Artificial Intelligence, C. Bento, A. Cardoso and G. Dias (Eds.), IEEE Press, pp. 73-78, 2005.

The use of intelligent systems for stock market predictions has been widely established. In this paper we introduce a genetic programming technique (called Multi-Expression programming) for the prediction of two stock indices. The performance is then compared with an artificial neural network trained using the Levenberg-Marquardt algorithm, a support vector machine, a Takagi-Sugeno neuro-fuzzy model, and a difference boosting neural network. We considered the Nasdaq-100 index of the Nasdaq Stock Market and the S&P CNX NIFTY stock index as test data.

(to obtain the respective PDF file follow link above or visit chemoton.org)

Figure – A sequential clustering task of corpses performed by a real ant colony. Here, 1500 corpses are randomly located in a circular arena with radius = 25 cm, where Messor sancta workers are present. The figure shows the initial state (above), and 2 hours, 6 hours and 26 hours (below) after the beginning of the experiment (from: Bonabeau E., M. Dorigo, G. Théraulaz. Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute Studies in the Sciences of Complexity, Oxford University Press, New York, Oxford, 1999).

The following research paper applies precisely this phenomenon to digital data.

[] Vitorino Ramos, Fernando Muge, Pedro Pina, Self-Organized Data and Image Retrieval as a Consequence of Inter-Dynamic Synergistic Relationships in Artificial Ant Colonies, in Javier Ruiz-del-Solar, Ajith Abraham and Mario Köppen (Eds.), Frontiers in Artificial Intelligence and Applications, Soft Computing Systems – Design, Management and Applications, 2nd Int. Conf. on Hybrid Intelligent Systems, IOS Press, Vol. 87, ISBN 1 5860 32976, pp. 500-509, Santiago, Chile, Dec. 2002.

Social insects provide us with a powerful metaphor to create decentralized systems of simple interacting, and often mobile, agents. The emergent collective intelligence of social insects “swarm intelligence” resides not in complex individual abilities but rather in networks of interactions that exist among individuals and between individuals and their environment. The study of ant colonies behavior and of their self-organizing capabilities is of interest to knowledge retrieval/ management and decision support systems sciences, because it provides models of distributed adaptive organization which are useful to solve difficult optimization, classification, and distributed control problems, among others. In the present work we overview some models derived from the observation of real ants, emphasizing the role played by stigmergy as distributed communication paradigm, and we present a novel strategy (ACLUSTER) to tackle unsupervised data exploratory analysis as well as data retrieval problems. Moreover and according to our knowledge, this is also the first application of ant systems into digital image retrieval problems. Nevertheless, the present algorithm could be applied to any type of numeric data.

(to obtain the respective PDF file follow link above or visit chemoton.org)

Figure – From top left to bottom right, a sequential data-items clustering task performed by an artificial ant colony. The system is able to cope with unforeseen data items in real-time, that is, as data appears on a continuous basis over a large period of time. Also, as time evolves, spatial entropy decreases.
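One simple way to quantify the statement that spatial entropy decreases is sketched below. It is an assumed definition, not necessarily the measure used in the paper: the grid is divided into square blocks, and the Shannon entropy of the share of items falling in each block is computed. The function name `spatial_entropy` and the `block` parameter are hypothetical.

```python
import math

def spatial_entropy(positions, block):
    """Shannon entropy (in bits) of how items are spread over square blocks.

    positions: list of (row, col) grid cells holding an item.
    block: side length of the blocks the grid is divided into.
    """
    counts = {}
    for r, c in positions:
        key = (r // block, c // block)
        counts[key] = counts.get(key, 0) + 1
    total = len(positions)
    # Sum of p * log2(1/p) over the occupied blocks.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Items scattered over four different blocks versus gathered in a single block.
scattered = [(0, 0), (0, 2), (2, 0), (2, 2)]
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]

entropy_scattered = spatial_entropy(scattered, block=2)
entropy_clustered = spatial_entropy(clustered, block=2)
```

Under this definition, items scattered evenly over many blocks give a high value, and the value falls towards zero as the ants gather the items into fewer blocks.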

[] Vitorino Ramos, Ajith Abraham, Swarms on Continuous Data, in CEC´03 – Congress on Evolutionary Computation, IEEE Press, ISBN 078-0378-04-0, pp.1370-1375, Canberra, Australia, 8-12 Dec. 2003.

While being it extremely important, many Exploratory Data Analysis (EDA) systems have the inability to perform classification and visualization in a continuous basis or to self-organize new data-items into the older ones (even more into new labels if necessary), which can be crucial in KDD – Knowledge Discovery, Retrieval and Data Mining Systems (interactive and online forms of Web Applications are just one example). This disadvantage is also present in more recent approaches using Self-Organizing Maps. On the present work, and exploiting past successes in recently proposed Stigmergic Ant Systems a robust online classifier is presented, which produces class decisions on a continuous stream data, allowing for continuous mappings. Results show that increasingly better results are achieved, as demonstrated by other authors in different areas.

(to obtain the respective PDF file follow link above or visit chemoton.org)

The Springer book “Swarm Intelligence in Data Mining” (Studies in Computational Intelligence Series, Vol. 34), published in late 2006, is receiving a fair amount of attention, so much so that early this year Tokyo Denki University press (TDU) decided to negotiate the translation rights and copyrights with Springer in order to release it in Japan in the Japanese language. The Japanese version will become available shortly, and I do hope – being one of the scientific editors – that it will receive increasing attention in Japan as well, this being one of the most difficult and extraordinary real-world areas we could work on in computer science nowadays. Multiple Sequence Alignment (MSA) within Bio-informatics is just one recent example, Financial Markets another. The amount of data that CERN’s Large Hadron Collider (LHC) will collect – 100000 DVDs every year – is yet another. In order to transform data into information, and information into useful and critical knowledge, reliable and robust Data Mining is needed more than ever in our daily life.

Meanwhile, I wonder what the Japanese cover design will look like?! Starting with its own title, which appears to be pretty hard to translate. According to Yahoo BabelFish the Japanese characters (群れの知性) – derived, among other language scripts, from Kanji – correspond to the English phrase “Swarm Intelligence”. I wonder if this translation is correct or not, since “swarm” in itself is kind of difficult to translate. Some of its meanings point to a spaghetti dish as well, which kind of makes some sense too. Moreover, the technical translation of it is also difficult. I guess the best person to handle the translation (at least from the list of colleagues around the world I know) is Claus Aranha (IBA Lab., University of Tokyo). Not only has he worked in Japan for several years now, but some of his work also focuses on this precise area.

The SIDM book (Swarm Int. in Data Mining) focuses on the hybridization of these two areas. As you probably know, Data Mining (see also: Knowledge Extraction) refers to a collection of techniques – many of them classical – that aim to tackle large amounts of data, in order to perform classification, clustering, sorting, feature selection, search, forecasting, decision, meaningful extraction, association rule discovery, sequential pattern discovery, etc. In recent years however (1985-2000), state of the art Artificial Intelligence such as Evolutionary Computation was also used, since some of its problems could be seen as – or properly translated into – optimization problems (namely, combinatorial ones). The same now happens with Swarm Intelligence, since some of its unique self-organizing distributed features (allowing direct applications over Grid Computing) seem ideal to tackle some of the most complex data mining problems we may face today.

For those wanting more, I will leave you with its contents (chapters), a foreword to this book by James Kennedy (one of the founding fathers of PSO – Particle Swarm Optimization, along with Russell C. Eberhart and Yuhui Shi), which I warmly recommend (starting with the sentence “Science is a Swarm”!), as well as a more detailed description of it:

Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from the biological examples by swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviors observed in flocks of birds, schools of fish, or swarms of bees, and even human social behavior, from which the idea is emerged. Ant Colony Optimization (ACO) deals with artificial systems that is inspired from the foraging behavior of real ants, which are used to solve discrete optimization problems. Historically the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm intelligence may seem that they do not have many properties in common. However, recent studies suggests that they can be used together for several real world data mining problems especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises of 11 chapters including an introductory chapters giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed.

The eleven chapters are organized as follows. In Chapter 1, Grosan et al. present the biological motivation and some of the theoretical concepts of swarm intelligence with an emphasis on particle swarm optimization and ant colony optimization algorithms. The basic data mining terminologies are explained and linked with some of the past and ongoing works using swarm intelligence techniques. Martens et al. in Chapter 2 introduce a new algorithm for classification, named AntMiner+, based on an artificial ant system with inherent self-organizing capabilities. AntMiner+ differs from the previously proposed AntMiner classification technique in three aspects. Firstly, AntMiner+ uses a MAX-MIN ant system, which is an improved version of the originally proposed ant system, yielding better performing classifiers. Secondly, the complexity of the environment in which the ants operate has substantially decreased. Finally, AntMiner+ leads to fewer and better performing rules. In Chapter 3, Jensen presents a feature selection mechanism based on the ant colony optimization algorithm to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. The proposed method is applied to two very different challenging tasks, namely web classification and complex systems monitoring.

Galea and Shen in the fourth chapter present an ant colony optimization approach for the induction of fuzzy rules. Several ant colony optimization algorithms are run simultaneously, with each focusing on finding descriptive rules for a specific class. The final outcome is a fuzzy rulebase that has been evolved so that individual rules complement each other during the classification process. In the fifth chapter Tsang and Kwong present an ant colony based clustering model for intrusion detection. The proposed model improves existing ant-based clustering algorithms by incorporating some meta-heuristic principles. To further improve the clustering solution and alleviate the curse of dimensionality in network connection data, four unsupervised feature extraction algorithms are also studied and evaluated. Omran et al. in the sixth chapter present particle swarm optimization algorithms for pattern recognition and image processing problems. First a clustering method that is based on PSO is discussed. The application of the proposed clustering algorithm to the problem of unsupervised classification and segmentation of images is investigated. Then PSO-based approaches that tackle the color image quantization and spectral unmixing problems are discussed.

In the seventh chapter Azzag et al. present a new model for data clustering, which is inspired by the self-assembly behavior of real ants. Real ants can build complex structures by connecting themselves to each other. It is shown in this paper that this behavior can be used to build a hierarchical tree-structured partitioning of the data according to the similarities between those data. The authors have also introduced an incremental version of the artificial ants algorithm. Kazemian et al. in the eighth chapter present a new swarm data clustering method based on Flowers Pollination by Artificial Bees (FPAB). FPAB does not require any parameter settings or any initial information such as the number of classes and the number of partitions on input data. Initially, in FPAB, bees move the pollens and pollinate them. Each pollen will grow in proportion to its garden flowers. Better growing will occur in better conditions. After some iterations, natural selection reduces the pollens and flowers, and gardens of the same type of flowers will be formed. The prototypes of each garden are taken as the initial cluster centers for the Fuzzy C-Means algorithm, which is used to reduce obvious misclassification errors. In the next stage, the prototypes of gardens are assumed as a single flower and FPAB is applied to them again. Palotai et al. in the ninth chapter propose an Alife architecture for news foraging. News foragers in the Internet were evolved by a simple internal selective algorithm: selection concerned the memory components, being finite in size and containing the list of most promising supplies. Foragers received reward for locating not yet found news and crawled by using value estimation. Foragers were allowed to multiply if they passed a given productivity threshold. A particular property of this community is that there is no direct interaction (here, communication) amongst foragers, which allowed us to study compartmentalization, assumed to be important for scalability, in a very clear form.

Veenhuis and Koppen in the tenth chapter introduce a data clustering algorithm based on species clustering. It combines methods of particle swarm optimization and flock algorithms. A given set of data is interpreted as a multi-species swarm which wants to separate into single-species swarms, i.e., clusters. The data to be clustered are assigned to datoids which form a swarm on a two-dimensional plane. A datoid can be imagined as a bird carrying a piece of data on its back. While swarming, this swarm divides into sub-swarms moving over the plane and consisting of datoids carrying similar data. After swarming, these sub-swarms of datoids can be grouped together as clusters. In the last chapter Yang et al. present a clustering ensemble model using an ant colony algorithm with a validity index and an ART neural network. Clusterings are visually formed on the plane by ants walking, picking up or dropping down projected data objects with different probabilities. Adaptive Resonance Theory (ART) is employed to combine the clusterings produced by ant colonies with different moving speeds.

We are very grateful to the authors of this volume and to the reviewers for their tremendous service by critically reviewing the chapters. The editors would like to thank Dr. Thomas Ditzinger (Springer Engineering Inhouse Editor, Studies in Computational Intelligence Series), Professor Janusz Kacprzyk (Editor-in-Chief, Springer Studies in Computational Intelligence Series) and Ms. Heather King (Editorial Assistant, Springer Verlag, Heidelberg) for the editorial assistance and excellent cooperative collaboration to produce this important scientific work. We hope that the reader will share our excitement to present this volume on ‘Swarm Intelligence in Data Mining’ and will find it useful.

April, 2006
Ajith Abraham, Chung-Ang University, Seoul, Korea
Crina Grosan, Cluj-Napoca, Babes-Bolyai University, Romania
Vitorino Ramos, IST Technical University of Lisbon, Portugal
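
Several of the chapters described above rely on ants that pick up and drop data objects "with different probabilities". As a rough illustration only, here is a minimal Python sketch of the classic pick/drop rules from the ant-clustering literature (Deneubourg-style probabilities fed by a Lumer-Faieta-style local similarity). The function names, the one-dimensional toy data, and the constants `k1`, `k2`, `alpha` and `cells` are assumptions made for this example; none of them is taken from a specific chapter of the book.

```python
def local_similarity(item, neighbours, alpha=0.5, cells=9):
    """Lumer-Faieta-style density: how similar `item` is to the items
    found in its neighbourhood of `cells` grid cells. Items here are
    plain numbers and dissimilarity is the absolute difference (a toy
    choice). The result is clipped at 0 from below."""
    total = sum(1.0 - abs(item - other) / alpha for other in neighbours)
    return max(0.0, total / cells)


def pick_probability(f, k1=0.1):
    """An unladen ant picks an item up with high probability when the
    item sits among dissimilar (or no) neighbours, i.e. when f is small."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """A laden ant drops its item with high probability when the
    neighbourhood is full of similar items, i.e. when f is large."""
    return (f / (k2 + f)) ** 2


# Hypothetical toy neighbourhoods: one dense with similar items, one empty.
dense = [0.5, 0.52, 0.48, 0.51, 0.49, 0.5, 0.53, 0.47]
f_dense = local_similarity(0.5, dense)
f_empty = local_similarity(0.5, [])

print("dense neighbourhood:", pick_probability(f_dense), drop_probability(f_dense))
print("empty neighbourhood:", pick_probability(f_empty), drop_probability(f_empty))
```

Iterating these two rules over many randomly walking ants is what gradually gathers similar items into clusters on the grid; the sketch covers only the decision rule, not the walk itself.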

Image Classification of Shellfish Larvae Digital Images using Swarm Intelligence. On the left a compendium of 9 raw images (out of 20 samples) used in the present study. Respective segmented images on the right.

[] Vitorino Ramos, Jonathan Campbell, John Slater, John Gillespie, Ivan F. Bendezu and Fionn Murtagh, Swarming around Shellfish Larvae Images, in WCLC-05, 2nd World Congress on Lateral Computing, Bangalore, India, 16-18 Dec., 2005.

The collection of wild larvae seed as a source of raw material is a major sub-industry of shellfish aquaculture. To predict when, where and in what quantities wild seed will be available, it is necessary to track the appearance and growth of planktonic larvae. One of the most difficult groups to identify, particularly at the species level, is the Bivalvia. This difficulty arises from the fact that fundamentally all bivalve larvae have a similar shape and colour. Identification based on gross morphological appearance is limited by the time-consuming nature of the microscopic examination and by the limited availability of expertise in this field. Molecular and immunological methods are also being studied. We describe the application of computational pattern recognition methods to the automated identification and size analysis of scallop larvae. For identification, the shape features used are binary invariant moments; that is, the features are invariant to shift (position within the image), scale (induced either by growth or differential image magnification) and rotation. Images of a sample of scallop and non-scallop larvae covering a range of maturities have been analysed. In order to address the automatic identification problem, as well as to allow the system to receive new unknown samples at any moment, a self-organized and unsupervised ant-like clustering algorithm based on Swarm Intelligence is proposed, followed by simple k-NNR nearest neighbour classification on the final map. Results achieve a recognition rate of 100% under several situations (k = 1 or 3).
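
To make the idea of shift- and rotation-invariant shape features a bit more concrete, here is a minimal Python sketch. It computes the first two Hu moment invariants of a binary shape and feeds them to a plain nearest-neighbour vote. This is an illustration under stated assumptions and not the pipeline from the paper: the toy shapes, the labels, the function names and the choice of only two invariants are all hypothetical, and the ant-like clustering step is left out entirely.

```python
import math
from collections import Counter


def central_moment(pixels, p, q):
    """Central moment mu_pq of a binary shape given as (x, y) foreground pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)


def hu_features(pixels):
    """First two Hu invariants (phi1, phi2), built from normalized central moments."""
    mu00 = float(len(pixels))

    def eta(p, q):
        # Dividing by mu00 ** (1 + (p + q) / 2) is the usual scale normalization.
        return central_moment(pixels, p, q) / mu00 ** (1 + (p + q) / 2)

    eta20, eta02, eta11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return (phi1, phi2)


def knn_predict(train, query, k=1):
    """Majority vote among the k training samples closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy shapes standing in for segmented larvae images.
bar = [(x, y) for x in range(6) for y in range(2)]
square = [(x, y) for x in range(4) for y in range(4)]

# The same bar, rotated by 90 degrees and shifted elsewhere in the image.
moved_bar = [(-y + 10, x + 5) for x, y in bar]

train = [(hu_features(bar), "bar"), (hu_features(square), "square")]
print(hu_features(bar), hu_features(moved_bar))
print(knn_predict(train, hu_features(moved_bar), k=1))
```

On a pixel grid these features are unchanged by translation and by rotations of 90 degrees (up to floating-point error), while other rotations and rescaling only preserve them approximately.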

(to obtain the respective PDF file follow link above or visit chemoton.org)

[...] People should learn how to play Lego with their minds. Concepts are building bricks [...] V. Ramos, 2002.
