<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: In-Memory Map Compression?</title>
	<atom:link href="http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/</link>
	<description>Technology and Geek Stuff by Eric Burke</description>
	<lastBuildDate>Tue, 24 Aug 2010 19:55:26 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Pi</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3187</link>
		<dc:creator>Pi</dc:creator>
		<pubDate>Tue, 18 Dec 2007 04:09:10 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3187</guid>
		<description>There is a problem here. A HashMap is usually used to access random keys, whereas partial decompression is usually done in blocks, which suits sequential access. In terms of performance, a compressed map might not be efficient enough.</description>
		<content:encoded><![CDATA[<p>There is a problem here. A HashMap is usually used to access random keys, whereas partial decompression is usually done in blocks, which suits sequential access. In terms of performance, a compressed map might not be efficient enough.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ashish</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3185</link>
		<dc:creator>ashish</dc:creator>
		<pubDate>Tue, 18 Dec 2007 02:40:42 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3185</guid>
		<description>1.
I would suggest looking into the JMS queue persistence mechanism. JMS providers use fairly fast file-based persistence for fail-over.

2.
If you are getting the data from a DB, fine-tuning the Hibernate cache and lazy loading might work.

3. If the data source is something other than a DB, then persisting to a file might be fastest (compared to a DB).</description>
		<content:encoded><![CDATA[<p>1.<br />
I would suggest looking into the JMS queue persistence mechanism. JMS providers use fairly fast file-based persistence for fail-over.</p>
<p>2.<br />
If you are getting the data from a DB, fine-tuning the Hibernate cache and lazy loading might work.</p>
<p>3. If the data source is something other than a DB, then persisting to a file might be fastest (compared to a DB).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Charlie Hubbard</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3165</link>
		<dc:creator>Charlie Hubbard</dc:creator>
		<pubDate>Mon, 17 Dec 2007 22:53:04 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3165</guid>
		<description>I think what you are referring to is a property of the Flyweight pattern: collapsing/sharing references reduces the memory footprint.  I have done something similar when working with JDBC and RMI before.  Say I&#039;m joining two tables together, and one of those tables has largely repeated data while the other is mostly unique.  For example, maybe I&#039;m joining to a types table or something like a tags table.  Lots of things are tagged with the same string, so why not store that tag just once?  To collapse that memory I pass all tags through a HashMap.  Something like:

import java.util.HashMap;
import java.util.Map;

public class Flyweight {
  // Maps each value to the first equal instance seen (its canonical copy).
  private final Map map = new HashMap();

  public Object collapse( Object key ) {
      if( !map.containsKey( key ) ) {
        map.put( key, key );
      }
      return map.get( key );
  }
}

Flyweight flyweight = new Flyweight();
obj.setTag( (String) flyweight.collapse( resultSet.getString( i ) ) );

That makes sure I collapse all references to equal Strings down to one actual object, and all references point at that shared object.  Using a HashSet might be better.  This works great for immutable data like String.  We did this to reduce the amount of data held in memory, but we also serialized the resulting structures over RMI.  Java serialization is awesome in this regard because it reconstructs the object graph, shared references included, on the client side, so the compression carries across the wire.  We cut memory usage by 75% in some cases using this technique.  It doesn&#039;t have to be just strings though.  String.intern() is a highly specialized version of this, but you have to be very careful with it because intern()&#039;ed strings can last the lifetime of the program.
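
A minimal sketch of the sharing effect described above, assuming a hypothetical FlyweightDemo class (the name and the main method are illustrative, not part of the original code). Two equal but distinct String objects come back as one shared instance after passing through the pool:

```java
import java.util.HashMap;
import java.util.Map;

public class FlyweightDemo {
    // Raw Map keeps angle brackets out of this plain-text feed; behaviour is unchanged.
    private final Map pool = new HashMap();

    // Returns the first instance seen that equals the argument.
    public Object collapse(Object value) {
        Object canonical = pool.get(value);
        if (canonical == null) {
            pool.put(value, value);
            canonical = value;
        }
        return canonical;
    }

    public static void main(String[] args) {
        FlyweightDemo pool = new FlyweightDemo();
        String a = new String("java");   // two distinct objects
        String b = new String("java");   // with equal contents
        System.out.println(a == b);                                 // false
        System.out.println(pool.collapse(a) == pool.collapse(b));   // true
    }
}
```

The single get() followed by a conditional put() does one lookup on the common path, where the value is already pooled.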

Charlie</description>
		<content:encoded><![CDATA[<p>I think what you are referring to is a property of the Flyweight pattern: collapsing/sharing references reduces the memory footprint.  I have done something similar when working with JDBC and RMI before.  Say I&#8217;m joining two tables together, and one of those tables has largely repeated data while the other is mostly unique.  For example, maybe I&#8217;m joining to a types table or something like a tags table.  Lots of things are tagged with the same string, so why not store that tag just once?  To collapse that memory I pass all tags through a HashMap.  Something like:</p>
<p>import java.util.HashMap;<br />
import java.util.Map;</p>
<p>public class Flyweight {<br />
  // Maps each value to the first equal instance seen (its canonical copy).<br />
  private final Map map = new HashMap();</p>
<p>  public Object collapse( Object key ) {<br />
      if( !map.containsKey( key ) ) {<br />
        map.put( key, key );<br />
      }<br />
      return map.get( key );<br />
  }<br />
}</p>
<p>Flyweight flyweight = new Flyweight();<br />
obj.setTag( (String) flyweight.collapse( resultSet.getString( i ) ) );</p>
<p>That makes sure I collapse all references to equal Strings down to one actual object, and all references point at that shared object.  Using a HashSet might be better.  This works great for immutable data like String.  We did this to reduce the amount of data held in memory, but we also serialized the resulting structures over RMI.  Java serialization is awesome in this regard because it reconstructs the object graph, shared references included, on the client side, so the compression carries across the wire.  We cut memory usage by 75% in some cases using this technique.  It doesn&#8217;t have to be just strings though.  String.intern() is a highly specialized version of this, but you have to be very careful with it because intern()&#8217;ed strings can last the lifetime of the program.</p>
<p>Charlie</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: afisna</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3135</link>
		<dc:creator>afisna</dc:creator>
		<pubDate>Mon, 17 Dec 2007 13:54:04 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3135</guid>
		<description>womble: 
It is too bold to talk about something Java-related without using or knowing anything about Java. I very much doubt that the overhead of the JDK maps is significant; I would urge you to check the HashMap implementation in the JDK, or the NIO classes for memory-mapped files. Plus, you can go the way you described by writing a custom map implementation using primitives. I am sure it will be easier than in C/C++, but I doubt it is worth the pain anyway.</description>
		<content:encoded><![CDATA[<p>womble:<br />
It is too bold to talk about something Java-related without using or knowing anything about Java. I very much doubt that the overhead of the JDK maps is significant; I would urge you to check the HashMap implementation in the JDK, or the NIO classes for memory-mapped files. Plus, you can go the way you described by writing a custom map implementation using primitives. I am sure it will be easier than in C/C++, but I doubt it is worth the pain anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: womble</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3077</link>
		<dc:creator>womble</dc:creator>
		<pubDate>Mon, 17 Dec 2007 01:30:24 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3077</guid>
		<description>A bunch of ways: an offset into a memory-mapped file, malloc it in, and so on. Was the problem the overhead introduced by the Java objects and HashMap? The usual way to do this in a non-GC language is to create a proxy object for each of your different types and create and destroy them on the fly (the flyweight pattern, I think it&#039;s called). You could probably do this in Java somehow; not my area.

cheers</description>
		<content:encoded><![CDATA[<p>A bunch of ways: an offset into a memory-mapped file, malloc it in, and so on. Was the problem the overhead introduced by the Java objects and HashMap? The usual way to do this in a non-GC language is to create a proxy object for each of your different types and create and destroy them on the fly (the flyweight pattern, I think it&#8217;s called). You could probably do this in Java somehow; not my area.</p>
<p>cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: afsina</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3023</link>
		<dc:creator>afsina</dc:creator>
		<pubDate>Sat, 15 Dec 2007 23:54:54 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3023</guid>
		<description>womble: err.. ok you hashed the data cool. where are you storing the &quot;actual&quot; data?</description>
		<content:encoded><![CDATA[<p>womble: err.. ok you hashed the data cool. where are you storing the &#8220;actual&#8221; data?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: womble</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-3021</link>
		<dc:creator>womble</dc:creator>
		<pubDate>Sat, 15 Dec 2007 22:58:12 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-3021</guid>
		<description>20MB, what&#039;s the problem? Just hash it: each entry uses a pointer - 4 bytes. 10,000 entries = 40K, done. For multiple entries, add a linked list to the hash entry. Oh, it&#039;s Java, tee hee, lots of luck.</description>
		<content:encoded><![CDATA[<p>20MB, what&#8217;s the problem? Just hash it: each entry uses a pointer &#8211; 4 bytes. 10,000 entries = 40K, done. For multiple entries, add a linked list to the hash entry. Oh, it&#8217;s Java, tee hee, lots of luck.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Miler</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-2995</link>
		<dc:creator>Alex Miler</dc:creator>
		<pubDate>Sat, 15 Dec 2007 04:03:29 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-2995</guid>
		<description>An embedded DB may be overkill, but if not, H2 and db4o are excellent options, although I&#039;d also throw Derby and Berkeley DB into the mix.</description>
		<content:encoded><![CDATA[<p>An embedded DB may be overkill, but if not, H2 and db4o are excellent options, although I&#8217;d also throw Derby and Berkeley DB into the mix.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: afsina</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-2979</link>
		<dc:creator>afsina</dc:creator>
		<pubDate>Fri, 14 Dec 2007 22:39:59 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-2979</guid>
		<description>I think an embedded database may be a solution for this; they have cached table support. H2 is a good choice IMO.
Another alternative may be db4o. It is very easy to use and pretty fast. It uses disk; I don&#039;t know the cache options.
Or, as mentioned earlier, Ehcache may be an alternative too.</description>
		<content:encoded><![CDATA[<p>I think an embedded database may be a solution for this; they have cached table support. H2 is a good choice IMO.<br />
Another alternative may be db4o. It is very easy to use and pretty fast. It uses disk; I don&#8217;t know the cache options.<br />
Or, as mentioned earlier, Ehcache may be an alternative too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Miler</title>
		<link>http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/comment-page-1/#comment-2974</link>
		<dc:creator>Alex Miler</dc:creator>
		<pubDate>Fri, 14 Dec 2007 20:17:24 +0000</pubDate>
		<guid isPermaLink="false">http://stuffthathappens.com/blog/2007/12/14/in-memory-map-compression/#comment-2974</guid>
		<description>Several things spring to mind in no organized fashion:

1) If many strings are identical, you could intern() them, although it sounds like they may already be literal/constant strings and auto-interned.

2) Or if many strings follow the same *pattern* with inserted params, you could store them as pattern+values instead of with the values substituted, and then your patterns could be interned.

3) You could use a cache that spills to disk (ehcache is quite good)

4) JSR 203 for Java 7 is NIO 2 (the new new IO :) ) and includes an overhaul of file system access.  Part of the design is a filesystem abstraction that should let you treat a zip archive file system just like your disk file system (and potentially just like an in-memory file system).  From talking to those guys, memory-based file systems were definitely a consideration.  I&#039;m not sure that solves your problem at all, just an interesting related thought. :)

5) You could implement Map, proxy another Map and compress/decompress big strings in the proxy.  That seems relatively straightforward.

6) Terracotta does big string compression like this automatically to avoid passing humongous strings around the system.  

7) If your big strings happen to be XML, there are several optimized object forms out there for XML that are much more efficient than a string.  Saxon has a TinyTree structure that&#039;s pretty great and I believe Nux has some good stuff too.  Plus there are some binary XML libs but I haven&#039;t worked with those.</description>
		<content:encoded><![CDATA[<p>Several things spring to mind in no organized fashion:</p>
<p>1) If many strings are identical, you could intern() them, although it sounds like they may already be literal/constant strings and auto-interned.</p>
<p>2) Or if many strings follow the same *pattern* with inserted params, you could store them as pattern+values instead of with the values substituted, and then your patterns could be interned.</p>
<p>3) You could use a cache that spills to disk (ehcache is quite good)</p>
<p>4) JSR 203 for Java 7 is NIO 2 (the new new IO <img src='http://stuffthathappens.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ) and includes an overhaul of file system access.  Part of the design is a filesystem abstraction that should let you treat a zip archive file system just like your disk file system (and potentially just like an in-memory file system).  From talking to those guys, memory-based file systems were definitely a consideration.  I&#8217;m not sure that solves your problem at all, just an interesting related thought. <img src='http://stuffthathappens.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>5) You could implement Map, proxy another Map and compress/decompress big strings in the proxy.  That seems relatively straightforward.</p>
<p>6) Terracotta does big string compression like this automatically to avoid passing humongous strings around the system.  </p>
<p>7) If your big strings happen to be XML, there are several optimized object forms out there for XML that are much more efficient than a string.  Saxon has a TinyTree structure that&#8217;s pretty great and I believe Nux has some good stuff too.  Plus there are some binary XML libs but I haven&#8217;t worked with those.</p>
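<p>A rough sketch of option 5, under some assumptions: the class name CompressedStringMap is hypothetical, gzip is just one possible codec, and only put/get are shown rather than the full Map interface. Values are compressed on the way in and decompressed on every read, so reads trade CPU for memory.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical wrapper: stores each value gzip-compressed in a backing map.
public class CompressedStringMap {
    // Raw Map is used only to keep angle brackets out of this HTML comment body.
    private final Map backing = new HashMap();

    public void put(String key, String value) {
        backing.put(key, compress(value));
    }

    // Returns the decompressed value, or null when the key is absent.
    public String get(String key) {
        byte[] packed = (byte[]) backing.get(key);
        return packed == null ? null : decompress(packed);
    }

    static byte[] compress(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(bytes);
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
            gzip.close(); // close() flushes the gzip trailer into bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decompress(byte[] packed) {
        try {
            GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
            gzip.close();
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>Short strings can come out larger than the original because of the gzip header, so a real proxy would probably compress only values above some size threshold.</p>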
]]></content:encoded>
	</item>
</channel>
</rss>

