<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xavier Llorà &#187; map-reduce</title>
	<atom:link href="http://www.xavierllora.net/tag/map-reduce/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xavierllora.net</link>
	<description>A notebook about data-intensive computing, genetics-based machine learning, semantic-web technology, cloud computing, and more.</description>
	<lastBuildDate>Thu, 15 Jul 2010 19:50:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Scaling eCGA Model Building via Data-Intensive Computing</title>
		<link>http://www.xavierllora.net/2010/04/08/scaling-ecga-model-building-via-data-intensive-computing/</link>
		<comments>http://www.xavierllora.net/2010/04/08/scaling-ecga-model-building-via-data-intensive-computing/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 16:17:39 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
				<category><![CDATA[Data-Intensive Computing]]></category>
		<category><![CDATA[Estimation of Distribution Algorithms]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[eCGA]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[map-reduce]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[pro]]></category>

		<guid isPermaLink="false">http://www.xavierllora.net/?p=664</guid>
		<description><![CDATA[I just uploaded the technical report of the paper we put together for CEC 2010 on how we can scale up eCGA using a MapReduce approach. Besides exploring the Hadoop implementation, the paper also presents some very compelling results obtained with MongoDB (a document-based store able to perform parallel MapReduce tasks via sharding). [...]


Related posts:<ol><li><a href='http://www.xavierllora.net/2009/10/09/scaling-genetic-algorithms-using-mapreduce/' rel='bookmark' title='Permanent Link: Scaling Genetic Algorithms using MapReduce'>Scaling Genetic Algorithms using MapReduce</a></li>
<li><a href='http://www.xavierllora.net/2009/01/29/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre'>Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre</a></li>
<li><a href='http://www.xavierllora.net/2009/07/13/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre-2/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre'>Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I just uploaded the technical report of the paper we put together for <a href="http://www.wcci2010.org/">CEC 2010</a> on how we can scale up eCGA using a MapReduce approach. Besides exploring the <a href="http://hadoop.apache.org/">Hadoop</a> implementation, the paper also presents some very compelling results obtained with <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB</a> (a document-based store able to perform parallel MapReduce tasks via sharding). The paper is available as <a href="http://www.illigal.uiuc.edu/pub/papers/IlliGALs/2010001.pdf">PDF</a> and <a href="http://www.illigal.uiuc.edu/pub/papers/IlliGALs/2010001.ps.Z">PS</a>.</p>
<p><strong>Abstract:</strong><br />
This paper shows how the extended compact genetic algorithm can be scaled using data-intensive computing techniques such as MapReduce. Two different frameworks (Hadoop and MongoDB) are used to deploy MapReduce implementations of the compact and extended compact genetic algorithms. Results show that both are good choices to deal with large-scale problems as they can scale with the number of commodity machines, as opposed to previous efforts with other techniques that either required specialized high-performance hardware or shared memory environments.</p>


<p>Related posts:<ol><li><a href='http://www.xavierllora.net/2009/10/09/scaling-genetic-algorithms-using-mapreduce/' rel='bookmark' title='Permanent Link: Scaling Genetic Algorithms using MapReduce'>Scaling Genetic Algorithms using MapReduce</a></li>
<li><a href='http://www.xavierllora.net/2009/01/29/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre'>Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre</a></li>
<li><a href='http://www.xavierllora.net/2009/07/13/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre-2/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre'>Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.xavierllora.net/2010/04/08/scaling-ecga-model-building-via-data-intensive-computing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Scaling Genetic Algorithms using MapReduce</title>
		<link>http://www.xavierllora.net/2009/10/09/scaling-genetic-algorithms-using-mapreduce/</link>
		<comments>http://www.xavierllora.net/2009/10/09/scaling-genetic-algorithms-using-mapreduce/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 15:51:19 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Data-Intensive Computing]]></category>
		<category><![CDATA[Estimation of Distribution Algorithms]]></category>
		<category><![CDATA[Publications]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Technical Reports]]></category>
		<category><![CDATA[genetic algorithms]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[map-reduce]]></category>

		<guid isPermaLink="false">http://www.xavierllora.net/?p=634</guid>
		<description><![CDATA[Below you may find the abstract of, and the link to, the technical report of the paper entitled &#8220;Scaling Genetic Algorithms using MapReduce&#8221; by Verma, A., Llorà, X., Campbell, R.H., and Goldberg, D.E., which will be presented next month at the Ninth International Conference on Intelligent Systems Design and Applications (ISDA) 2009. Abstract: Genetic algorithms (GAs) are increasingly [...]


Related posts:<ol><li><a href='http://www.xavierllora.net/2010/04/08/scaling-ecga-model-building-via-data-intensive-computing/' rel='bookmark' title='Permanent Link: Scaling eCGA Model Building via Data-Intensive Computing'>Scaling eCGA Model Building via Data-Intensive Computing</a></li>
<li><a href='http://www.xavierllora.net/2009/07/13/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre-2/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre'>Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre</a></li>
<li><a href='http://www.xavierllora.net/2009/01/29/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre'>Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Below you may find the abstract of, and the link to, the technical report of the paper entitled <em>&#8220;Scaling Genetic Algorithms using MapReduce&#8221;</em> by Verma, A., Llorà, X., Campbell, R.H., and Goldberg, D.E., which will be presented next month at the <a href="">Ninth International Conference on Intelligent Systems Design and Applications (ISDA) 2009</a>.</p>
<p><strong>Abstract:</strong> Genetic algorithms (GAs) are increasingly being applied to large-scale problems. The traditional MPI-based parallel GAs do not scale very well. MapReduce is a powerful abstraction developed by Google for making scalable and fault-tolerant applications. In this paper, we mould genetic algorithms into the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, the open source implementation of MapReduce. Our experiments demonstrate the convergence and scalability up to 10<sup>5</sup> variable problems. Adding more resources would enable us to solve even larger problems without any changes in the algorithms and implementation.</p>
<p>The draft of the paper can be downloaded as <a href="http://www.illigal.uiuc.edu/pub/papers/IlliGALs/2009007.pdf">IlliGAL TR. No. 2009007</a>. For more information see the <a href="http://www.illigal.uiuc.edu/web/technical-reports/2009/10/09/scaling-genetic-algorithms-using-mapreduce/">IlliGAL technical reports web site</a>.</p>


<p>Related posts:<ol><li><a href='http://www.xavierllora.net/2010/04/08/scaling-ecga-model-building-via-data-intensive-computing/' rel='bookmark' title='Permanent Link: Scaling eCGA Model Building via Data-Intensive Computing'>Scaling eCGA Model Building via Data-Intensive Computing</a></li>
<li><a href='http://www.xavierllora.net/2009/07/13/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre-2/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre'>Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre</a></li>
<li><a href='http://www.xavierllora.net/2009/01/29/data-intensive-computing-for-competent-genetic-algorithms-a-pilot-study-using-meandre/' rel='bookmark' title='Permanent Link: Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre'>Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using  Meandre</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.xavierllora.net/2009/10/09/scaling-genetic-algorithms-using-mapreduce/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Large Scale Data Mining using Genetics-Based Machine Learning</title>
		<link>http://www.xavierllora.net/2009/07/15/large-scale-data-mining-using-genetics-based-machine-learning/</link>
		<comments>http://www.xavierllora.net/2009/07/15/large-scale-data-mining-using-genetics-based-machine-learning/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 21:56:17 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
				<category><![CDATA[Data-Intensive Computing]]></category>
		<category><![CDATA[GBML & LCS]]></category>
		<category><![CDATA[Learning Classifier Systems]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[data-intensive flows]]></category>
		<category><![CDATA[genetics-based machine learning]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[LCS]]></category>
		<category><![CDATA[map-reduce]]></category>

		<guid isPermaLink="false">http://www.xavierllora.net/?p=568</guid>
		<description><![CDATA[Below you may find the slides of the GECCO 2009 tutorial that Jaume Bacardit and I put together. Hope you enjoy it. Slides Abstract We are living in the petabyte era. We have larger and larger data to analyze, process and transform into useful answers for the domain experts. Robust data mining tools, able to cope [...]


Related posts:<ol><li><a href='http://www.xavierllora.net/2006/12/13/observer-invariant-histopathology-using-genetics-based-machine-learning/' rel='bookmark' title='Permanent Link: Observer-Invariant Histopathology using Genetics-Based Machine Learning'>Observer-Invariant Histopathology using Genetics-Based Machine Learning</a></li>
<li><a href='http://www.xavierllora.net/2009/04/07/deadline-extended-for-special-issue-on-metaheuristics-for-large-scale-data-mining/' rel='bookmark' title='Permanent Link: Deadline extended for special issue on Metaheuristics for Large Scale Data Mining'>Deadline extended for special issue on Metaheuristics for Large Scale Data Mining</a></li>
<li><a href='http://www.xavierllora.net/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/' rel='bookmark' title='Permanent Link: [BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)'>[BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Below you may find the slides of the <a href="http://www.sigevo.org/gecco-2009/tutorials.html#lsdm">GECCO 2009 tutorial</a> that <a href="http://www.cs.nott.ac.uk/~jqb/">Jaume Bacardit</a> and I put together. Hope you enjoy it.</p>
<p><strong>Slides</strong></p>
<object width="425" height="348"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=gecco2009largegbmltutorial-090715163244-phpapp01"/><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=gecco2009largegbmltutorial-090715163244-phpapp01"  type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="348"></embed></object>
<p><strong>Abstract</strong></p>
<p>We are living in the petabyte era. We have larger and larger data to analyze, process and transform into useful answers for the domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality while producing human-understandable solutions, are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to have the capacity of processing these vast amounts of data and they need to process this data within reasonable time. Moreover, massive computation cycles are getting cheaper and cheaper every day, allowing researchers to have access to unprecedented parallelization degrees. Several topics are interlaced in these two requirements, among them: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts.</p>
<p>This tutorial will try to address these topics, following a roadmap that starts with the questions of what large means and why large is a challenge for GBML methods. Afterwards, we will discuss different facets in which we can overcome this challenge: efficiency enhancement techniques, representations able to cope with large dimensionality spaces, and scalability of learning paradigms. We will also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to better engineer them and get the best performance out of them for large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.</p>


<p>Related posts:<ol><li><a href='http://www.xavierllora.net/2006/12/13/observer-invariant-histopathology-using-genetics-based-machine-learning/' rel='bookmark' title='Permanent Link: Observer-Invariant Histopathology using Genetics-Based Machine Learning'>Observer-Invariant Histopathology using Genetics-Based Machine Learning</a></li>
<li><a href='http://www.xavierllora.net/2009/04/07/deadline-extended-for-special-issue-on-metaheuristics-for-large-scale-data-mining/' rel='bookmark' title='Permanent Link: Deadline extended for special issue on Metaheuristics for Large Scale Data Mining'>Deadline extended for special issue on Metaheuristics for Large Scale Data Mining</a></li>
<li><a href='http://www.xavierllora.net/2008/03/26/bdcsg2008-algorithmic-perspectives-on-large-scale-social-network-data-jon-kleinberg/' rel='bookmark' title='Permanent Link: [BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)'>[BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.xavierllora.net/2009/07/15/large-scale-data-mining-using-genetics-based-machine-learning/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
