
Vitorino Ramos - Citations2016Jan

2016 – Up to now, a total of 1567 citations across 74 works (including 3 books) on GOOGLE SCHOLAR (https://scholar.google.com/citations?user=gSyQ-g8AAAAJ&hl=en) [with a Hirsch h-index of 19, and an average of 160.2 citations per work for my top five] + 900 citations across 57 works on the new RESEARCH GATE site (https://www.researchgate.net/profile/Vitorino_Ramos).

Refs.: Science, Artificial Intelligence, Swarm Intelligence, Data-Mining, Big-Data, Evolutionary Computation, Complex Systems, Image Analysis, Pattern Recognition, Data Analysis.

Complete circuit diagram with pheromone - Cristian Jimenez-Romero, David Sousa-Rodrigues, Jeffrey H. Johnson, Vitorino Ramos; Figure – Neural circuit controller of the virtual ant (page 3, fig. 2). [URL: http://arxiv.org/abs/1507.08467 ]

Intelligence and decision in foraging ants. Individual or Collective? Internal or External? What is the right balance between the two? Can one have internal intelligence without external intelligence? Can one take examples from nature to build in silico artificial lives that present us with interesting patterns? In this paper we explore a model of foraging ants; it will be presented in early September in Exeter, UK, at UKCI 2015. (available on arXiv [PDF] and ResearchGate)

Cristian Jimenez-Romero, David Sousa-Rodrigues, Jeffrey H. Johnson, Vitorino Ramos; “A Model for Foraging Ants, Controlled by Spiking Neural Networks and Double Pheromones“, UKCI 2015 Computational Intelligence – University of Exeter, UK, September 2015.

Abstract: A model of an Ant System where ants are controlled by a spiking neural circuit and a second order pheromone mechanism in a foraging task is presented. A neural circuit is trained for individual ants and subsequently the ants are exposed to a virtual environment where a swarm of ants performs a resource foraging task. The model comprises an associative and unsupervised learning strategy for the neural circuit of the ant. The neural circuit adapts to the environment by means of classical conditioning. The initially unknown environment includes different types of stimuli representing food (rewarding) and obstacles (harmful) which, when they come in direct contact with the ant, elicit a reflex response in the motor neural system of the ant: moving towards or away from the source of the stimulus. The spiking neural circuit of the ant is trained to identify food and obstacles and move towards the former and avoid the latter. The ants are released on a landscape with multiple food sources where one ant alone would have difficulty harvesting the landscape to maximum efficiency. In this case the introduction of a double pheromone mechanism (positive and negative reinforcement feedback) yields better results than traditional ant colony optimization strategies. Traditional ant systems include mainly a positive reinforcement pheromone. This approach uses a second pheromone that acts as a marker for forbidden paths (negative feedback). This blockade is not permanent and is controlled by the evaporation rate of the pheromones. The combined action of both pheromones acts as a collective stigmergic memory of the swarm, which reduces the search space of the problem. This paper explores how the adaptation and learning abilities observed in biologically inspired cognitive architectures are synergistically enhanced by swarm optimization strategies.
The model portrays two forms of artificial intelligent behaviour: at the individual level the spiking neural network is the main controller, and at the collective level the pheromone distribution is a map towards the solution that emerges from the colony. The presented model is an important pedagogical tool, as it is also an easy-to-use library that allows access to the spiking neural network paradigm from inside NetLogo, a language used mostly in agent-based modelling and experimentation with complex systems.
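The double pheromone idea in the abstract (a positive trail that attracts, a negative trail that marks forbidden paths, both fading through evaporation) can be illustrated with a minimal sketch. This is not the authors' NetLogo implementation: the function names, the deposit amount, the evaporation rates and the weighting formula below are all assumptions chosen only for illustration.

```python
# Minimal sketch of a two-pheromone update (all names and constants are hypothetical).

def evaporate(trail, rate):
    """Decay every pheromone level by a fraction `rate` (0 < rate < 1)."""
    return {edge: (1.0 - rate) * level for edge, level in trail.items()}


def update_pheromones(positive, negative, good_path, bad_path,
                      deposit=1.0, rate_pos=0.1, rate_neg=0.3):
    """Evaporate both trails, then reinforce rewarding and unrewarding edges."""
    positive = evaporate(positive, rate_pos)
    negative = evaporate(negative, rate_neg)
    for edge in good_path:  # positive feedback on rewarding edges
        positive[edge] = positive.get(edge, 0.0) + deposit
    for edge in bad_path:   # "no entry" marker on unrewarding edges
        negative[edge] = negative.get(edge, 0.0) + deposit
    return positive, negative


def edge_weight(positive, negative, edge, alpha=1.0, beta=1.0):
    """Attractiveness grows with positive pheromone and shrinks with negative."""
    pos = positive.get(edge, 0.0)
    neg = negative.get(edge, 0.0)
    return (1.0 + pos) ** alpha / (1.0 + neg) ** beta


if __name__ == "__main__":
    pos, neg = update_pheromones({("nest", "a"): 1.0}, {},
                                 good_path=[("nest", "a")],
                                 bad_path=[("nest", "b")])
    print(edge_weight(pos, neg, ("nest", "a")), edge_weight(pos, neg, ("nest", "b")))
```

Because the negative trail evaporates like the positive one, a marked edge gradually recovers its neutral weight, which is one way to read the abstract's remark that the blockade is not permanent.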


David MS Rodrigues, “Reading the News Through its Structure: New Hybrid Connectivity Based Approaches”.

Figure – Two simplices a and b connected by the 2-dimensional face, the triangle {1;2;3}. In the analysis of the time-line of The Guardian newspaper, the system used feature vectors based on word frequencies and then computed the similarity between documents from those feature vectors. This is a purely statistical approach that requires great computational power and becomes difficult for problems with large feature vectors and many documents. Feature vectors with 100,000 or more items are common, and computing similarities between these documents becomes cumbersome. Instead of computing distance (or similarity) matrices between documents from feature vectors, the present approach explores the possibility of inferring the distance between documents from the Q-analysis description. Q-analysis offers a very natural notion of connectivity between the simplices of the structure; in the relation studied, documents are connected to each other through shared sets of tags entered by the journalists. Also in this framework, eccentricity is defined as a measure of the relatedness of one simplex in relation to another [7].
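For contrast with the Q-analysis route, here is a minimal sketch of the purely statistical baseline described above: word-frequency feature vectors compared by cosine similarity. Cosine similarity is only one common choice and is an assumption here, since the text does not say which measure the original system used; the helper names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word-frequency feature vector (naive whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = bag_of_words("ants forage for food")
    d2 = bag_of_words("ants forage for news")
    print(cosine_similarity(d1, d2))
```

With a vocabulary of 100,000 words and many documents, every pair needs such a computation, which is the cost the Q-analysis description tries to avoid.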

David M.S. Rodrigues and Vitorino Ramos, “Traversing News with Ant Colony Optimisation and Negative Pheromones” [PDF], accepted as preprint for oral presentation at the European Conference on Complex Systems (ECCS14) in Lucca, Italy, Sept. 22-26, 2014.

Abstract: The past decade has seen the rapid development of the online newsroom. News published online is now the main outlet of news, surpassing traditional printed newspapers. This poses challenges to the production and to the consumption of that news. With so many sources of information available it is important to find ways to cluster and organise the documents if one wants to understand this new system. Traditional approaches to the problem of clustering documents usually embed the documents in a suitable similarity space. Previous studies have reported on the impact of the similarity measures used for clustering of textual corpora [1]. These similarity measures are usually calculated for bag-of-words representations of the documents. This makes the final document-word matrix high dimensional. Feature vectors with more than 10,000 dimensions are common and algorithms have severe problems with the high dimensionality of the data. A novel bio-inspired approach to the problem of traversing the news is presented. It finds Hamiltonian cycles over documents published by the newspaper The Guardian. A Second Order Swarm Intelligence algorithm based on Ant Colony Optimisation was developed [2, 3] that uses a negative pheromone to mark unrewarding paths with a “no-entry” signal. This approach follows recent findings of negative pheromone usage in real ants [4].

In this case study the corpus of data is represented as a bipartite relation between documents and keywords entered by the journalists to characterise the news. A new similarity measure between documents is presented based on the Q-analysis description [5, 6, 7] of the simplicial complex formed between documents and keywords. The eccentricity between documents (two simplices) is then used as a novel measure of similarity between documents. The results show that the Second Order Swarm Intelligence algorithm performs better in benchmark problems of the travelling salesman problem, with faster convergence and optimal results. The addition of the negative pheromone as a no-entry signal improves the quality of the results. The application of the algorithm to the corpus of news of The Guardian creates a coherent navigation system among the news. This allows the users to navigate the news published during a certain period of time in a semantic sequence instead of a time sequence. This work has broader application, as it can be applied to many cases where the data is mapped to bipartite relations (e.g. protein expressions in cells, sentiment analysis, brand awareness in social media, routing problems), and it highlights the connectivity of the underlying complex system.

Keywords: Self-Organization, Stigmergy, Co-Evolution, Swarm Intelligence, Dynamic Optimization, Foraging, Cooperative Learning, Hamiltonian cycles, Text Mining, Textual Corpora, Information Retrieval, Knowledge Discovery, Sentiment Analysis, Q-Analysis, Data Mining, Journalism, The Guardian.
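The bipartite document-keyword relation in the abstract can be made concrete with a small sketch. Each document is treated as a simplex whose vertices are its tags; two documents sharing k tags share a face of dimension k - 1, as in the figure where the shared triangle {1;2;3} is a 2-dimensional face. The function names and the toy tags are hypothetical, and the eccentricity-based similarity of [7] is deliberately not reproduced, since its formula is not given in this text.

```python
def shared_face_dimension(tags_a, tags_b):
    """Dimension of the face shared by two simplices (documents).

    Sharing k vertices (tags) means sharing a (k - 1)-dimensional face;
    -1 means the two documents have no tag in common.
    """
    return len(set(tags_a) & set(tags_b)) - 1


def q_near_pairs(documents, q):
    """All document pairs that share a face of dimension at least q."""
    names = sorted(documents)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if shared_face_dimension(documents[a], documents[b]) >= q]


if __name__ == "__main__":
    # Toy corpus: document name -> set of journalist-entered tags (invented here).
    corpus = {
        "doc_a": {"ants", "swarm", "news", "science"},
        "doc_b": {"ants", "swarm", "news", "politics"},
        "doc_c": {"football"},
    }
    print(shared_face_dimension(corpus["doc_a"], corpus["doc_b"]))
    print(q_near_pairs(corpus, q=2))
```

Only set intersections over tags are needed, so no high-dimensional word vectors have to be built or compared.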

References:

[1] Alexander Strehl, Joydeep Ghosh, and Raymond Mooney. Impact of similarity measures on web-page clustering. In Workshop on Artificial Intelligence for Web Search (AAAI 2000), pages 58-64, 2000.
[2] David M. S. Rodrigues, Jorge Louçã, and Vitorino Ramos. From standard to second-order Swarm Intelligence phase-space maps. In Stefan Thurner, editor, 8th European Conference on Complex Systems, Vienna, Austria, September 2011.
[3] Vitorino Ramos, David M. S. Rodrigues, and Jorge Louçã. Second order Swarm Intelligence. In Jeng-Shyang Pan, Marios M. Polycarpou, Michał Woźniak, André C.P.L.F. Carvalho, Hector Quintian, and Emilio Corchado, editors, HAIS’13. 8th International Conference on Hybrid Artificial Intelligence Systems, volume 8073 of Lecture Notes in Computer Science, pages 411-420. Springer Berlin Heidelberg, Salamanca, Spain, September 2013.
[4] Elva J.H. Robinson, Duncan Jackson, Mike Holcombe, and Francis L.W. Ratnieks. No entry signal in ant foraging (Hymenoptera: Formicidae): new insights from an agent-based model. Myrmecological News, 10(120), 2007.
[5] Ronald Harry Atkin. Mathematical Structure in Human Affairs. Heinemann Educational Publishers, 48 Charles Street, London, 1st edition, 1974.
[6] J. H. Johnson. A survey of Q-analysis, part 1: The past and present. In Proceedings of the Seminar on Q-analysis and the Social Sciences, University of Leeds, September 1983.
[7] David M. S. Rodrigues. Identifying news clusters using Q-analysis and modularity. In Albert Diaz-Guilera, Alex Arenas, and Alvaro Corral, editors, Proceedings of the European Conference on Complex Systems 2013, Barcelona, September 2013.

Four different snapshots from one of my latest books, recently published in Japan: Ajith Abraham, Crina Grosan, Vitorino Ramos (Eds.), “Swarm Intelligence in Data Mining” (群知能とデータマイニング), Tokyo Denki University press [TDU], Tokyo, Japan, July 2012.

Figure – Cover from one of my books published last month (10 July 2012), “Swarm Intelligence in Data Mining”, recently translated and edited in Japan (by Tokyo Denki University press [TDU]). Cover image from Amazon.co.jp. The title was translated into 群知能とデータマイニング. Funny also, to see my own name for the first time translated into Japanese – wonder if it’s Kanji. A brief synopsis follows:

(…) Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from biological examples of swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviours observed in flocks of birds, schools of fish, or swarms of bees, and even human social behaviour, from which the idea emerged. Ant Colony Optimization (ACO) deals with artificial systems that are inspired by the foraging behaviour of real ants and are used to solve discrete optimization problems. Historically the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm Intelligence may seem not to have many properties in common. However, recent studies suggest that they can be used together for several real-world data mining problems, especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises 11 chapters, including an introductory chapter giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed. (…)

Video – “Journalism in the age of data” is a 50-minute documentary by Geoff McGhee on information visualization, data as medium, and its use in journalism. Produced during a 2009-2010 John S. Knight Journalism Fellowship at Stanford University.

“Thinking in images is perhaps the only effective process that intelligence has at its disposal to probe the great problems of philosophy, science and art” ~ Manuel Teixeira-Gomes, writer born in Portimão, 7th President of the Portuguese Republic, in Carnaval Literário, 1938.

Journalists are coping with the rising information flood by borrowing data visualization techniques from computer scientists, researchers and artists. Some newsrooms are already beginning to retool their staffs and systems to prepare for a future in which data becomes a medium. But how do we communicate with data, how can traditional narratives be fused with sophisticated, interactive information displays? For more, watch the full version with annotations and links at datajournalism.stanford.edu.

Figure – Web Usage Mining of Monash’s Univ. web site using self-organized ant-based clustering (initial and final classification maps). Web usage Data was collected from the Monash University’s Web site (Australia), with over 7 million hits every week.

[] Vitorino Ramos, Ajith Abraham, Evolving a Stigmergic Self-Organized Data-Mining, in ISDA-04, 4th Int. Conf. on Intelligent Systems, Design and Applications, Budapest, Hungary, ISBN 963-7154-30-2, pp. 725-730, August 26-28, 2004.

Self-organizing complex systems typically comprise a large number of frequently similar components or events. Through their process, a pattern at the global level of a system emerges solely from numerous interactions among the lower-level components of the system. Moreover, the rules specifying interactions among the system’s components are executed using only local information, without reference to the global pattern, which, as in many real-world problems, is not easily accessible or possible to find. Stigmergy, a kind of indirect communication and learning through the environment found in social insects, is a well-known example of self-organization. It not only provides vital clues for understanding how the components can interact to produce a complex pattern, but can also pinpoint simple biological non-linear rules and methods to achieve improved artificial intelligent adaptive categorization systems, critical for Data-Mining. In the present work it is our intention to show that a new type of Data-Mining can be designed based on stigmergic paradigms, taking advantage of several natural features of this phenomenon. By hybridizing bio-inspired Swarm Intelligence with Evolutionary Computation we seek an entirely distributed, adaptive, collective and cooperative self-organized Data-Mining. As a real-world / real-time test bed for our proposal, World-Wide-Web Mining will be used. With that purpose in mind, Web usage data was collected from the Monash University Web site (Australia), with over 7 million hits every week. Results are compared to other recent systems, showing that the system presented is very promising.

(to obtain the respective PDF file follow link above or visit chemoton.org)

From the author of “Rock, Paper, Scissors – Game Theory in everyday life”, dedicated to the evolution of cooperation in nature (published last year by Basic Books), a new book on related areas is now fresh on the stands (released Dec. 7, 2009): “The Perfect Swarm – The Science of Complexity in everyday life“. This time Len Fisher takes us into the realm of our interlinked modern lives, where complexity rules. But complexity also has rules. Understand these, and we are better placed to make sense of the mountain of data that confronts us every day. Fisher ranges far and wide to discover what tips the science of complexity has for us. Studies of human (one good example is Gum voting) and animal behaviour, management science, statistics and network theory all enter the mix.

One of the greatest discoveries of recent times is that the complex patterns we find in life are often produced when all of the individuals in a group follow similar simple rules. Even if the final pattern is complex, the rules are not. This process of “Self-Organization” reveals itself in the inanimate worlds of crystals and seashells but, as Len Fisher shows, it is also evident in living organisms, from fish to ants to human beings. Stigmergy is one among many cases of this type of Self-Organized behaviour, encompassing applications in several Engineering fields like Computer Science and Artificial Intelligence, Data-Mining, Pattern Recognition, Image Analysis and Perception, Robotics, Optimization, Learning, Forecasting, etc. Since I work in these precise areas, you may find several of my previous posts dedicated to these issues, such as Self-Organized Data and Image Retrieval systems, Stigmergic Optimization, Computer-based Adaptive Dynamic Perception, Swarm-based Data Mining, Self-regulated Swarms and Memory, Ant based Data Clustering, Generative computer-based photography and painting, Classification, Extreme Dynamic Optimization, Self-Organized Pattern Recognition, among other applications.

For instance, the coordinated movements of fish in schools arise from the simple rule: “Follow the fish in front.” Traffic flow arises from simple rules: “Keep your distance” and “Keep to the right.” Now, in his new book, Fisher shows how we can manage our complex social lives in an ever more chaotic world. His investigation encompasses topics ranging from “swarm intelligence” (check links above) to the science of parties (a beautiful example by ICOSYSTEM inc.) and the best ways to start a fad. Finally, Fisher sheds light on the beauty and utility of complexity theory. For those willing to understand a myriad of basic examples (Fisher gives us 33 nice food-for-thought examples in total) and to have a well-written introduction to this thrilling new branch of science, referred to by Stephen Hawking as the science for the current century (“I think complexity is the science for the 21st century”), Perfect Swarm will indeed be an excellent companion.

Journalism is dying, they say. I do agree. And while the argument continues, many of those interested in the issue are now debating what the real reason is. The point is that there is no single reason; there are many, and intricate ones. Do ponder on this: while newspapers are facing immense, omnipresent, real-time competition from TV channels, TV itself is also dying (while, unexpectedly, … radio is surging). On many broadcast programmes, TV anchors are now more important than the invited guests who (supposedly) worked hard for years on the subject to provide that precise innovative content. As in large supermarkets and great malls, the packaging has become more important than the content itself. The related business and editorial pressure for quick news has become so intense and aggressive that content is replaced every second without judgement and, once on the air, is hardly described, discussed, opposed or dissected. So, at large, TV CEOs and producers think that people are no longer waiting for new, interesting content to appear; they are instead waiting for the anchor who hands it down as if it were peanuts. Peanuts are good, but in excess – we all agree – they are damn awful. And many keep eating them, as an old passive addiction. Which means that in the long run nothing remains (a fact for both sides); … And if they give me no opportunity at all to check content carefully when I happen to be in the mood to, … well, I move on. In this precise, simple way, media cannibalizes itself.

We all know that attention spans are getting narrower these days and that, e.g., yes… the great literary classics are no longer read. So media CEOs say – “they have no time“. But, really … do mind that gap. Think twice. If the whole environment suddenly recognizes (this being one of the major questions – see below) that it has had enough of peanuts (and it really has), it will crave beef-steaks. In fact, eating 1000 empty peanuts takes more time than eating one large, good steak! And there is a difference, … the steak stays in our body for several hours, not seconds.

It is quickly becoming a paradox: media CEOs, in their blind competition, take refuge in saying that they – we readers – have no time (when mediocrity finds no solution, the easiest way out is to repeat a mantra), while we (most of us) keep zapping through news as never before. However, they never realized that we keep zapping because no news – delivered by these means – is of interest. It really has all become the same. And once it all looks the same, it all soon disappears from our minds. … In some respects we all wonder: what really happened to investigative journalism, to stories about new complex issues, to strong content explained in detail but still delivered in simple, eloquent ways? Come on, this huge long-tail market niche, once yours, is now void!

Newspapers do have this wonderful singularity. They still have journalists (at least some, if they had enough vision to nourish them). They could provide insightful, detailed background stories, open questions, or debate new ones as no one else can in the public space. Moreover, they have their consumers’ time. That, at least, is the feedback I give The Guardian every Sunday when I put my money on the newsstand counter in exchange for this newspaper, along with others like The Economist. But in the face of this great overall cascade of senseless news turmoil, probably one of these days people will instead desire silence… or listening to their grandfather’s knowledge, good sense, and long-lived emotion (which keeps increasing, believe me). They will relate to him as never before. Not to newspapers. At least he does provide content.

But once the media is set (and in some way, not all the way, the medium is the message, as postulated by Marshall McLuhan), the great gold rush will be on, … guess what, … content. And on the relationships among content! Journalism will no longer be atomized. Or crystallized.

Fig. – Spatial distribution of 931 items (words taken from an article at ABC Spanish newspaper) on a 61 x 61 non-parametric toroidal grid, at t=106. 91 ants used type 2 probability response functions, with k1=0.1 and k2=0.3. Some examples of independent clusters are: (A) anunció, bilbao, embargo, titulos, entre, hacer, necesídad, tras, vida, lider, cualquier, derechos, medida. (B) dirigentes, prensa, ciu. (C) discos, amigos, grandes. (D) hechos, piloto, miedo, tipo, cd, informes. (E) dificil, gobierno, justicia, crisis, voluntad, creó, elección, horas, frente, técnica, unas, tarde, familia, sargento, necesídad, red, obra … (among other semantic word clusters; check the paper below).

For a long time, the media decided to do nothing, while new media, including social media, was coming onto the stage stronger than ever before. Let me give you one example. In order to understand how relations between news items could enhance newspaper reading and social awareness, back in 2002 I decided to run an experiment. Together with a colleague, I took one article from the Spanish newspaper ABC (figure above). The article was about Spanish political parties and corruption. It contained 931 words (snapshot above). In order to extract semantic meaning from it, as a pre-processing step, we started by applying Latent Semantic Analysis (LSA). Then, Swarm Intelligence algorithms were developed in order to get a glimpse of the relations among all those words in the newspaper article. Guess what? Some words like “big”, “friends” and “music discs” were segmented from the rest of the politics-related article (segregated on a remote semantic “island”); that is, it was possible not only to build a whole conceptual semantic atlas of that entire news section, but also to find unrelated issues (which appeared as uncorrelated semantic “islands”). Now, just imagine if this happened within a newspaper social network, live, 24 hours a day, while people reach for strongly correlated content and discuss it as it happens. One strong newspaper article could, in fact, evolve into social collective knowledge and awareness as never before. That, in reality, is something that classic journalism could use as an edge for its (nowadays awful) market approach: providing not only good content but, along with it, an extra service not available anywhere else (which is, in some way, priceless), the chance to provide correlated real-time meta-content. Not one view, but many aggregated views. Edited, real-world, real-time, good-quality journalism, which has the potential of an “endless” price, especially these days.
On the other hand, what we now see is that news CEOs, along with some editors, still keep their minds on 19th-century journalism. Worse still, because of their legitimate panic. However, meanwhile, the world has indeed evolved.

Vitorino Ramos, Juan J. Merelo, Self-Organized Stigmergic Document Maps: Environment as a Mechanism for Context Learning, in AEB´2002 – 1st Spanish Conference on Evolutionary and Bio-Inspired Algorithms, E. Alba, F. Herrera, J.J. Merelo et al. (Eds.), pp. 284-293, Centro Univ. de Mérida, Mérida, Spain, 6-8 Feb. 2002.

Social insect societies, and more specifically ant colonies, are distributed systems that, in spite of the simplicity of their individuals, present a highly structured social organization. As a result of this organization, ant colonies can accomplish complex tasks that in some cases exceed the individual capabilities of a single ant. The study of ant colony behavior and of its self-organizing capabilities is of interest to knowledge retrieval/management and decision support systems sciences, because it provides models of distributed adaptive organization which are useful to solve difficult optimization, classification, and distributed control problems, among others. In the present work we overview some models derived from the observation of real ants, emphasizing the role played by stigmergy as a distributed communication paradigm, and we present a novel strategy to tackle unsupervised clustering as well as data retrieval problems. The present ant clustering system (ACLUSTER) avoids not only short-term memory based strategies, but also the use of several artificial ant types (using different speeds), both present in some recent approaches. Moreover, to our knowledge, this is also the first application of ant systems to textual document clustering.

(to obtain the respective PDF file follow link above or visit chemoton.org)


Pranav Mistry and SixthSense technology – Part 1 of 2


Pranav Mistry and SixthSense technology – Part 2 of 2

Figure – Book cover of Toby Segaran’s, “Programming Collective Intelligence – Building Smart Web 2.0 Applications“, O’Reilly Media, 368 pp., August 2007.

{scopus online description} Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting data-sets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general — all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application.

{even if I don’t totally agree, here’s an “over-rated” description – especially on the scientific side – by someone called “dwa” – link above} Programming Collective Intelligence is a new book from O’Reilly, written by Toby Segaran. The author graduated from MIT and is currently working at Metaweb Technologies, where he develops ways to put large public data-sets into Freebase, a free online semantic database. You can find more information about him on his blog: http://blog.kiwitobes.com/. Web 2.0 cannot exist without Collective Intelligence. The “giants” use it everywhere: YouTube recommends similar movies, Last.fm knows what you would like to listen to, Flickr knows which photos are your favorites, etc. This technology empowers intelligent search, clustering, price modeling and ranking on the web. I cannot imagine a modern service without data analysis. That is why it is worth starting to read about it. There are many titles about collective intelligence, but recently I have read two: this one and “Collective Intelligence in Action“. Both are very pragmatic, but the O’Reilly one is more focused on the substance of CI. Its code listings are much shorter (but the examples are written in Python, so that was easy). In general, comparing these books is like comparing Java with Python. If you would like to build a recommendation engine the “in Action”/Java way, you would have to read the whole book, attach extra jars and design dozens of classes. The rapid Python way requires reading only 15 pages and voilà, you have got your first recommendations. It is awesome!

So how about the rest of the book? There are still 319 pages! Further chapters cover discovering groups, searching, ranking, optimization, document filtering, decision trees, price models and genetic algorithms. The book explains how to implement Simulated Annealing, k-Nearest Neighbors, a Bayesian Classifier and many more. Take a look at the table of contents (here: http://oreilly.com/catalog/9780596529321/preview.html); it does not list all the algorithms, but you can find more information there. Each chapter has about 20-30 pages. You do not have to read them all; you can choose the most important ones and still know what is going on. Every chapter contains a minimal amount of theoretical introduction, which might not be enough for total beginners. I recommend this book to students who have taken a statistics course (not only in IT or computing science): it will show you how to use your knowledge in practice, and there are many inspiring examples. For those who do not know Python, do not be afraid: at the beginning you will find a short introduction to the language syntax. All listings are very short and well described by the author, sometimes line by line. The book also contains the necessary information about the basic standard libraries responsible for XML processing or downloading web pages. If you would like to start learning about collective intelligence, I would strongly recommend reading “Programming Collective Intelligence” first, then “Collective Intelligence in Action”. The first one shows how easy it is to implement basic algorithms; the second one shows you how to use existing open source projects related to machine learning.
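
To give a flavour of the "first recommendations" the reviewer mentions, here is a minimal sketch of user-based collaborative filtering with Pearson correlation. It is not the book's code: the ratings, the user names and the function names `pearson` and `recommend` are all invented for illustration, and a real system would need far more data and care.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: score}); invented purely for illustration.
RATINGS = {
    "ana":   {"A": 5.0, "B": 3.0, "C": 4.0},
    "bruno": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.5},
    "carla": {"A": 1.0, "B": 5.0, "C": 2.0, "D": 1.0},
}

def pearson(r1, r2):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    shared = [item for item in r1 if item in r2]
    n = len(shared)
    if n < 2:
        return 0.0
    m1 = sum(r1[i] for i in shared) / n
    m2 = sum(r2[i] for i in shared) / n
    cov = sum((r1[i] - m1) * (r2[i] - m2) for i in shared)
    v1 = sum((r1[i] - m1) ** 2 for i in shared)
    v2 = sum((r2[i] - m2) ** 2 for i in shared)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov / sqrt(v1 * v2)

def recommend(ratings, user):
    """Rank items the user has not rated by a similarity-weighted average score."""
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], theirs)
        if sim <= 0:
            continue  # ignore users whose tastes are unrelated or opposite
        for item, score in theirs.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)

if __name__ == "__main__":
    print(recommend(RATINGS, "ana"))
```

In this toy data set only "bruno" correlates positively with "ana", so the single unseen item is scored from his rating alone; "carla", whose tastes run the opposite way, is ignored.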

Crina Grosan, Ajith Abraham, Sang Yong Han, Vitorino Ramos, Stock Market Prediction using Multi Expression Programming, in ALEA´05, Workshop on Artificial Life and Evolutionary Algorithms at EPIA´05 – Proc. of the 12th Portuguese Conference on Artificial Intelligence, C. Bento, A. Cardoso and G. Dias (Eds.), IEEE Press, pp. 73-78, 2005.

The use of intelligent systems for stock market prediction has been widely established. In this paper we introduce a genetic programming technique (called Multi-Expression Programming) for the prediction of two stock indices. The performance is then compared with an artificial neural network trained using the Levenberg-Marquardt algorithm, a support vector machine, a Takagi-Sugeno neuro-fuzzy model, and a difference boosting neural network. We considered the Nasdaq-100 index of the Nasdaq Stock Market(SM) and the S&P CNX NIFTY stock index as test data.

(to obtain the respective PDF file follow link above or visit chemoton.org)

Figure – A sequential clustering task of corpses performed by a real ant colony. Here, 1500 corpses are randomly located in a circular arena with radius = 25 cm, where Messor sancta workers are present. The figure shows the initial state (above), and the state 2 hours, 6 hours and 26 hours (below) after the beginning of the experiment (from: Bonabeau E., M. Dorigo, G. Théraulaz. Swarm Intelligence: From Natural to Artificial Systems. Santa Fe Institute Studies in the Sciences of Complexity, Oxford University Press, New York, Oxford, 1999).

The following research paper applies precisely this phenomenon to digital data.

Vitorino Ramos, Fernando Muge, Pedro Pina, Self-Organized Data and Image Retrieval as a Consequence of Inter-Dynamic Synergistic Relationships in Artificial Ant Colonies, in Javier Ruiz-del-Solar, Ajith Abraham and Mario Köppen (Eds.), Frontiers in Artificial Intelligence and Applications, Soft Computing Systems – Design, Management and Applications, 2nd Int. Conf. on Hybrid Intelligent Systems, IOS Press, Vol. 87, ISBN 1 5860 32976, pp. 500-509, Santiago, Chile, Dec. 2002.

Social insects provide us with a powerful metaphor to create decentralized systems of simple interacting, and often mobile, agents. The emergent collective intelligence of social insects (“swarm intelligence”) resides not in complex individual abilities but rather in networks of interactions that exist among individuals and between individuals and their environment. The study of ant colony behavior and of its self-organizing capabilities is of interest to knowledge retrieval/management and decision support systems sciences, because it provides models of distributed adaptive organization which are useful to solve difficult optimization, classification, and distributed control problems, among others. In the present work we overview some models derived from the observation of real ants, emphasizing the role played by stigmergy as a distributed communication paradigm, and we present a novel strategy (ACLUSTER) to tackle unsupervised exploratory data analysis as well as data retrieval problems. Moreover, to our knowledge, this is also the first application of ant systems to digital image retrieval problems. Nevertheless, the present algorithm could be applied to any type of numeric data.
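
For readers who want to play with the corpse-clustering idea, here is a minimal sketch of the classic pick-up/drop rule from the basic model of Deneubourg et al. that underlies this family of algorithms. It is not ACLUSTER itself: the toy grid, the constants `k1` and `k2`, and the use of the 8-cell neighbourhood as the local density `f` are assumptions made only for this illustration.

```python
import random

def pick_probability(f, k1=0.1):
    """Chance that an unladen ant picks up an item; high when few items are nearby."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.3):
    """Chance that a laden ant drops its item; high when many items are nearby."""
    return (f / (k2 + f)) ** 2

def local_density(grid, x, y, size):
    """Fraction of the 8 neighbouring cells (toroidal grid) that hold an item."""
    count = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0) and grid[(x + dx) % size][(y + dy) % size]:
                count += 1
    return count / 8.0

def simulate(size=20, n_items=80, n_ants=10, steps=20000, seed=0):
    """Run the toy model; returns the final grid and the ants as [x, y, carrying]."""
    rng = random.Random(seed)
    grid = [[False] * size for _ in range(size)]
    cells = [(x, y) for x in range(size) for y in range(size)]
    for x, y in rng.sample(cells, n_items):
        grid[x][y] = True  # scatter the items at random
    ants = [[rng.randrange(size), rng.randrange(size), False] for _ in range(n_ants)]
    for _ in range(steps):
        ant = rng.choice(ants)
        # Random walk: move to one of the neighbouring cells (or stay put).
        ant[0] = (ant[0] + rng.choice((-1, 0, 1))) % size
        ant[1] = (ant[1] + rng.choice((-1, 0, 1))) % size
        x, y, carrying = ant
        f = local_density(grid, x, y, size)
        if not carrying and grid[x][y]:
            if rng.random() < pick_probability(f):
                grid[x][y] = False
                ant[2] = True
        elif carrying and not grid[x][y]:
            if rng.random() < drop_probability(f):
                grid[x][y] = True
                ant[2] = False
    return grid, ants

if __name__ == "__main__":
    final_grid, _ = simulate()
    for row in final_grid:
        print("".join("#" if cell else "." for cell in row))
```

Isolated items are almost always picked up and items are preferentially dropped next to other items, which is the positive feedback behind the piles in the figure above. No ant holds a global plan; the grid itself is the shared memory.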

(to obtain the respective PDF file follow link above or visit chemoton.org)

Figure – From top left to bottom right, a sequential data-item clustering task performed by an artificial ant colony. The system is able to cope with unforeseen data items in real-time, that is, as data appears on a continuous basis over a long period of time. Also, as time evolves, spatial entropy decreases.
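
One simple way to make the "spatial entropy decreases" remark concrete is to bin the item positions into coarse blocks and take the Shannon entropy of the resulting occupancy distribution. This is only an illustrative measure: the function name `spatial_entropy` and the block-based binning are assumptions, and the paper may define its entropy differently.

```python
from math import log2

def spatial_entropy(points, block):
    """Shannon entropy (in bits) of how items are shared among block x block cells."""
    counts = {}
    for x, y in points:
        key = (x // block, y // block)
        counts[key] = counts.get(key, 0) + 1
    total = len(points)
    return -sum((c / total) * log2(c / total) for c in counts.values())

if __name__ == "__main__":
    # 16 items, one per block of a 4 x 4 block layout: maximally spread out.
    scattered = [(bx * 5, by * 5) for bx in range(4) for by in range(4)]
    # 16 items packed into a single block: fully clustered.
    clustered = [(x, y) for x in range(4) for y in range(4)]
    print(spatial_entropy(scattered, block=5))
    print(spatial_entropy(clustered, block=5))
```

Under this measure, items spread evenly over many blocks give a high value and items piled into a few blocks give a low one, so a falling curve over time indicates that clusters are forming.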

Vitorino Ramos, Ajith Abraham, Swarms on Continuous Data, in CEC´03 – Congress on Evolutionary Computation, IEEE Press, ISBN 078-0378-04-0, pp. 1370-1375, Canberra, Australia, 8-12 Dec. 2003.

Despite the importance of the task, many Exploratory Data Analysis (EDA) systems are unable to perform classification and visualization on a continuous basis, or to self-organize new data-items among the older ones (or even under new labels, if necessary), which can be crucial in KDD – Knowledge Discovery, Retrieval and Data Mining Systems (interactive and online forms of Web Applications are just one example). This disadvantage is also present in more recent approaches using Self-Organizing Maps. In the present work, exploiting past successes of recently proposed Stigmergic Ant Systems, a robust online classifier is presented, which produces class decisions on a continuous data stream, allowing for continuous mappings. Results show that increasingly better results are achieved, as demonstrated by other authors in different areas.

(to obtain the respective PDF file follow link above or visit chemoton.org)

The Springer book “Swarm Intelligence in Data Mining” (Studies in Computational Intelligence Series, Vol. 34), published in late 2006, is receiving a fair amount of attention, so much so that early this year Tokyo Denki University Press (TDU) decided to negotiate the translation rights and copyrights with Springer in order to release it in Japan in the Japanese language. The Japanese version will become available shortly, and I do hope, being one of the scientific editors, that it will receive increasing attention in Japan as well, this being one of the most difficult and extraordinary real-world areas we could work on nowadays within computer science. Multiple Sequence Alignment (MSA) within Bio-informatics is just one recent example, Financial Markets another. The amount of data that CERN’s Large Hadron Collider (LHC) will collect, 100000 DVDs every year, is yet another. In order to transform data into information, and information into useful and critical knowledge, reliable and robust Data Mining is needed more than ever in our daily life.

Meanwhile, I wonder what the Japanese cover design will look like?! Starting with its own title, which appears to be pretty hard to translate. According to Yahoo BabelFish, the Japanese characters (群れの知性), derived from Kanji among other scripts, correspond to the English expression “Swarm Intelligence“. I wonder whether this translation is correct, since “swarm” in itself is kind of difficult to translate. Some of its meanings also point to a spaghetti dish, which kind of makes sense too. Moreover, the technical translation is also difficult. I guess the best person to handle the translation (at least from the list of colleagues I know around the world) is Claus Aranha (IBA Lab., University of Tokyo). Not only has he been working in Japan for several years now, but some of his work also focuses on this precise area.

The SIDM book (Swarm Intelligence in Data Mining) focuses on the hybridization of these two areas. As you probably know, Data Mining (see also: Knowledge Extraction) refers to a collection of techniques, many of them classical, that aim to tackle large amounts of data in order to perform classification, clustering, sorting, feature selection, search, forecasting, decision, meaningful extraction, association rule discovery, sequential pattern discovery, etc. In recent years, however (1985-2000), state-of-the-art Artificial Intelligence such as Evolutionary Computation was also used, since some of its problems could be seen as, or properly translated into, optimization problems (namely, combinatorial ones). The same now happens with Swarm Intelligence, since some of its unique self-organizing distributed features (allowing direct applications over Grid Computing) seem ideal to tackle some of the most complex data mining problems we may face today.

For those wanting more, I will leave you with its contents (chapters), a foreword to this book by James Kennedy (one of the founding fathers of PSO, Particle Swarm Optimization, along with Russell C. Eberhart and Yuhui Shi), which I strongly recommend (starting with the sentence “Science is a Swarm“!), as well as a more detailed description of it:

Swarm Intelligence (SI) is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from biological examples of swarming, flocking and herding phenomena in vertebrates. Particle Swarm Optimization (PSO) incorporates swarming behaviors observed in flocks of birds, schools of fish, or swarms of bees, and even human social behavior, from which the idea emerged. Ant Colony Optimization (ACO) deals with artificial systems that are inspired by the foraging behavior of real ants, and which are used to solve discrete optimization problems. Historically, the notion of finding useful patterns in data has been given a variety of names including data mining, knowledge discovery, information extraction, etc. Data Mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. In order to achieve this, data mining uses computational techniques from statistics, machine learning and pattern recognition. Data mining and Swarm Intelligence may not seem to have many properties in common. However, recent studies suggest that they can be used together for several real-world data mining problems, especially when other methods would be too expensive or difficult to implement. This book deals with the application of swarm intelligence methodologies in data mining. Addressing the various issues of swarm intelligence and data mining using different intelligent approaches is the novelty of this edited volume. This volume comprises 11 chapters, including an introductory chapter giving the fundamental definitions and some important research challenges. Chapters were selected on the basis of fundamental ideas/concepts rather than the thoroughness of techniques deployed.
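
As a concrete companion to the PSO description above, here is a minimal sketch of a global-best particle swarm with an inertia weight, written in plain Python. It is not taken from the book: the function names `pso` and `sphere`, the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) and the toy objective are common textbook choices assumed here for illustration.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's own best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, keeping the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)

if __name__ == "__main__":
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

Each particle balances its own memory against the swarm's shared best position. In data mining uses such as the PSO-based clustering described in the book, the position vector would typically encode something like cluster centres and the objective a clustering-quality measure.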

The eleven chapters are organized as follows. In Chapter 1, Grosan et al. present the biological motivation and some of the theoretical concepts of swarm intelligence, with an emphasis on particle swarm optimization and ant colony optimization algorithms. The basic data mining terminologies are explained and linked with some of the past and ongoing works using swarm intelligence techniques.

Martens et al. in Chapter 2 introduce a new algorithm for classification, named AntMiner+, based on an artificial ant system with inherent self-organizing capabilities. AntMiner+ differs from the previously proposed AntMiner classification technique in three aspects. Firstly, AntMiner+ uses a MAX-MIN ant system, which is an improved version of the originally proposed ant system, yielding better performing classifiers. Secondly, the complexity of the environment in which the ants operate has substantially decreased. Finally, AntMiner+ leads to fewer and better performing rules.

In Chapter 3, Jensen presents a feature selection mechanism based on an ant colony optimization algorithm to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. The proposed method is applied to two very different challenging tasks, namely web classification and complex systems monitoring.

Galea and Shen in the fourth chapter present an ant colony optimization approach for the induction of fuzzy rules. Several ant colony optimization algorithms are run simultaneously, with each focusing on finding descriptive rules for a specific class. The final outcome is a fuzzy rulebase that has been evolved so that individual rules complement each other during the classification process.

In the fifth chapter Tsang and Kwong present an ant colony based clustering model for intrusion detection. The proposed model improves existing ant-based clustering algorithms by incorporating some meta-heuristic principles. To further improve the clustering solution and alleviate the curse of dimensionality in network connection data, four unsupervised feature extraction algorithms are also studied and evaluated.

Omran et al. in the sixth chapter present particle swarm optimization algorithms for pattern recognition and image processing problems. First, a clustering method based on PSO is discussed. The application of the proposed clustering algorithm to the problem of unsupervised classification and segmentation of images is investigated. Then, PSO-based approaches that tackle the color image quantization and spectral unmixing problems are discussed.

In the seventh chapter Azzag et al. present a new model for data clustering, which is inspired by the self-assembly behavior of real ants. Real ants can build complex structures by connecting themselves to each other. It is shown in this paper that this behavior can be used to build a hierarchical tree-structured partitioning of the data according to the similarities between those data. The authors have also introduced an incremental version of the artificial ants algorithm.

Kazemian et al. in the eighth chapter present a new swarm data clustering method based on Flowers Pollination by Artificial Bees (FPAB). FPAB does not require any parameter settings or any initial information such as the number of classes and the number of partitions of the input data. Initially, in FPAB, bees move the pollens and pollinate them. Each pollen will grow in proportion to its garden flowers. Better growing will occur in better conditions. After some iterations, natural selection reduces the pollens and flowers, and gardens of the same type of flowers will be formed. The prototypes of each garden are taken as the initial cluster centers for the Fuzzy C-Means algorithm, which is used to reduce obvious misclassification errors. In the next stage, the prototypes of gardens are assumed to be a single flower and FPAB is applied to them again.

Palotai et al. in the ninth chapter propose an Alife architecture for news foraging. News foragers in the Internet were evolved by a simple internal selective algorithm: selection concerned the memory components, being finite in size and containing the list of most promising supplies. Foragers received a reward for locating not yet found news and crawled by using value estimation. Foragers were allowed to multiply if they passed a given productivity threshold. A particular property of this community is that there is no direct interaction (here, communication) amongst foragers, which allowed us to study compartmentalization, assumed to be important for scalability, in a very clear form.

Veenhuis and Köppen in the tenth chapter introduce a data clustering algorithm based on species clustering. It combines methods of particle swarm optimization and flock algorithms. A given set of data is interpreted as a multi-species swarm which wants to separate into single-species swarms, i.e., clusters. The data to be clustered are assigned to datoids which form a swarm on a two-dimensional plane. A datoid can be imagined as a bird carrying a piece of data on its back. While swarming, this swarm divides into sub-swarms moving over the plane and consisting of datoids carrying similar data. After swarming, these sub-swarms of datoids can be grouped together as clusters.

In the last chapter Yang et al. present a clustering ensemble model using an ant colony algorithm with validity index and an ART neural network. Clusterings are visually formed on the plane by ants walking, picking up or dropping down projected data objects with different probabilities. Adaptive Resonance Theory (ART) is employed to combine the clusterings produced by ant colonies with different moving speeds.

We are very grateful to the authors of this volume and to the reviewers for their tremendous service in critically reviewing the chapters. The editors would like to thank Dr. Thomas Ditzinger (Springer Engineering Inhouse Editor, Studies in Computational Intelligence Series), Professor Janusz Kacprzyk (Editor-in-Chief, Springer Studies in Computational Intelligence Series) and Ms. Heather King (Editorial Assistant, Springer Verlag, Heidelberg) for the editorial assistance and excellent cooperative collaboration to produce this important scientific work. We hope that the reader will share our excitement in presenting this volume on ‘Swarm Intelligence in Data Mining’ and will find it useful.

April, 2006
Ajith Abraham, Chung-Ang University, Seoul, Korea
Crina Grosan, Cluj-Napoca, Babes-Bolyai University, Romania
Vitorino Ramos, IST Technical University of Lisbon, Portugal

[...] People should learn how to play Lego with their minds. Concepts are building bricks [...] V. Ramos, 2002.
