
-----------------------------------
aiolos
Tue 10 Apr, 2007

Similarity searches accelerate P2P downloads by 30-70 percen
-----------------------------------
P2P file sharing offers the possibility of dramatically increased download speeds by avoiding the problems associated with a single <b style="color:#FFA34F"></b>(overloaded)<b style="color:#FFA34F"></b> server.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b> But even P2P downloads can bog down when files are first seeded or if few active users have a copy of the file.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b> A research team with members at Carnegie Mellon,<b style="color:#FFA34F"></b> Purdue,<b style="color:#FFA34F"></b> and Intel thinks they've found a way around some of these limitations using what they call Similarity-Enhanced Transfer <b style="color:#FFA34F"></b>(SET)<b style="color:#FFA34F"></b>,<b style="color:#FFA34F"></b> a technique they claim can speed up P2P downloads by anywhere from 30 to 70 percent.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b> They'll be presenting their technique at the 4th Symposium on Networked Systems Design and Implementation tomorrow.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
<b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
The <b style="color:#FFA34F"></b>"Similarity"<b style="color:#FFA34F"></b> portion of SET comes from the realization that many of the files being shared contain pieces of identical data.<b style="color:#FFA34F"></b> Examples include music files that differ only in terms of tags,<b style="color:#FFA34F"></b> movies or movie trailers that are dubbed in different languages,<b style="color:#FFA34F"></b> and updated versions of software.<b style="color:#FFA34F"></b> Like other P2P systems,<b style="color:#FFA34F"></b> SET divides large files into small segments.<b style="color:#FFA34F"></b> Once that process is complete,<b style="color:#FFA34F"></b> however,<b style="color:#FFA34F"></b> the SET software searches for similar files using a method called <b style="color:#FFA34F"></b>"handprinting,<b style="color:#FFA34F"></b>"<b style="color:#FFA34F"></b> which is similar to the pattern matching techniques used to cluster search results or filter spam.<b style="color:#FFA34F"></b> Once similar files are identified,<b style="color:#FFA34F"></b> they are scanned for any individual chunks that are identical to pieces of the file being downloaded.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
<b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
As a result,<b style="color:#FFA34F"></b> SET should greatly expand the available sources of any given file.<b style="color:#FFA34F"></b> In practice,<b style="color:#FFA34F"></b> it seemed to work pretty well.<b style="color:#FFA34F"></b> Using existing P2P networks,<b style="color:#FFA34F"></b> they were able to grab a 30MB movie trailer in only a third of the time,<b style="color:#FFA34F"></b> since their software was able to find other sources that shared about 50 percent similarity.<b style="color:#FFA34F"></b> The rate of an MP3 download shot up by over 70 percent.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
<b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
We may be able to see SET appearing in clients and distribution services soon.<b style="color:#FFA34F"></b> The presentation will come with actual implementation code,<b style="color:#FFA34F"></b> and the team hopes to see others put it to use.<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b>"This is a technique that I would like people to steal,<b style="color:#FFA34F"></b>"<b style="color:#FFA34F"></b> said David Andersen of Carnegie Mellon,<b style="color:#FFA34F"></b> <b style="color:#FFA34F"></b>"Developers should just take the idea and use it in their own systems.<b style="color:#FFA34F"></b>"<b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
<b style="color:#FFA34F"></b><b style="color:#FFA34F"></b>
http:<b style="color:#FFA34F"></b>/<b style="color:#FFA34F"></b>/arstechnica.com/news.ars/post/20070410-accelerated-p2p-by-similarity-searches.html
