<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xavier Llorà &#187; HBase</title>
	<atom:link href="http://www.xavierllora.net/tag/hbase/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xavierllora.net</link>
	<description>A notebook on data-intensive computing, genetics-based machine learning &#38; more.</description>
	<lastBuildDate>Sun, 08 Jan 2012 19:39:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The next generation of data bases</title>
		<link>http://www.xavierllora.net/2008/06/05/the-next-generation-of-data-bases/</link>
		<comments>http://www.xavierllora.net/2008/06/05/the-next-generation-of-data-bases/#comments</comments>
		<pubDate>Thu, 05 Jun 2008 11:39:34 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
				<category><![CDATA[Notes]]></category>
		<category><![CDATA[couchDB]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://www.xavierllora.net/?p=238</guid>
		<description><![CDATA[Yesterday I was reading an interview with Brian Aker (Director of Technology at MySQL), which I found via Slashdot, when something caught my attention. On the second side of this which may actually be more exciting is the issue of&#8211;instead of the structured data world of the relational database but the semi&#8211;the semi-structured world. You look at what is being [...]
]]></description>
			<content:encoded><![CDATA[<p>Yesterday I was reading an <a title="Brian Aker interview" href="http://news.oreilly.com/2008/06/brian-akers-vision-for-a-livab.html">interview with Brian Aker</a> (Director of Technology at MySQL), which I found via <a title="Slashdot" href="http://developers.slashdot.org/article.pl?sid=08/06/03/210224&amp;from=rss">Slashdot</a>, when something caught my attention.</p>
<blockquote><p>On the second side of this which may actually be more exciting is the issue of&#8211;instead of the structured data world of the relational database but the semi&#8211;the semi-structured world. You look at what is being done today with <a title="CouchDB" href="http://incubator.apache.org/couchdb/">CouchDB</a>, you look at Amazon ScaleDB, to a lesser extent but to a similar extent you&#8211;not ScaleDB, SimpleDB&#8211;to a lesser extent or a similar extent <a title="Tokyo cabinet" href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a>, those databases are really kind of fascinating because those databases are redefining really how we access data and how we are going to be searching and using data. So there&#8217;s a whole world out there that&#8217;s just starting to open up in that direction.</p></blockquote>
<p>For a while now, <a title="Previous post on metadata stores" href="http://www.xavierllora.net/?s=metadata+stores">I have been using different flavors of metadata stores</a>. Everything tends to work fine and dandy as long as you do not push the storage volume too far. For instance, together with <a title="Bernie Acs" href="http://www.ncsa.uiuc.edu/AboutUs/People/contact.php?id=885">Bernie Acs</a> at NCSA, we have run experiments where we could handle up to <a title="Virtuoso" href="http://virtuoso.openlinksw.com/wiki/main/">280 million triples using Virtuoso</a>, or up to 60 million triples using <a title="Jena" href="http://jena.sourceforge.net/">Jena</a> with a <a title="MySQL" href="http://www.mysql.com/">MySQL</a> back end, without much trouble and still run arbitrary SPARQL queries in a reasonable time. However, these were relatively small tests. The first represented only 120 documents in a collection, whereas the second was only a subset of the Wikipedia link graph. Yes, there are ways to scale beyond that through proper striping and replication of the data, but that is not, by default, a key concern of such engines. Another sad note was that we had to drop Mulgara because we had a pretty hard time pushing it that far (most of the bugs we ran into have been fixed since last year, and they have started a push toward adding SPARQL support, so it may be time to revisit it).</p>
<p>However, none of the above approaches was born out of a distributed environment. Lately, I have started looking for alternative large-scale storage born from the distributed-environment soup. After the <a title="Hadoop Summit and BCDSG" href="http://www.xavierllora.net/2008/03/26/summary-of-bdcsg2008-blogging/">Hadoop Summit/BDCSG 2008</a> trip, I started looking into <a title="HBase" href="http://wiki.apache.org/hadoop/Hbase">HBase</a> (the <a title="Hadoop" href="http://hadoop.apache.org/core/">Hadoop</a> community&#8217;s open-source take on <a title="Big Table" href="http://labs.google.com/papers/bigtable.html">Google&#8217;s Bigtable</a>). It is not a bad alternative if you can fit your application&#8217;s data needs into its structure. Since it runs on <a title="Hadoop" href="http://hadoop.apache.org/core/">Hadoop FS</a>, you get all its benefits, such as data replication and fault tolerance, for free. However, after reading the above-mentioned interview, I found myself intrigued by <a title="CouchDB" href="http://incubator.apache.org/couchdb/">CouchDB</a> and <a title="Tokyo cabinet" href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a>. I guess I had better go and take a look at them <img src='http://www.xavierllora.net/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.xavierllora.net/2008/06/05/the-next-generation-of-data-bases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

