Comments on Changing Bits: Lucene performance with the PForDelta codec
Michael McCandless
Updated 2023-09-01T03:38:08.236-04:00

Michael McCandless, 2011-04-28T12:37:18.019-04:00

Anonymous,

That's great! We have a branch off Lucene's trunk right now to improve how intblock codecs integrate into Lucene's low-level postings enumeration APIs...

Those two packages look very interesting; thanks for sharing.

Anonymous, 2011-04-27T21:22:06.418-04:00

Well, I found my way here searching for "PForDelta", and how interesting to find that someone is looking at it in relation to Lucene itself, which I also use. :)

This post makes me curious about http://code.google.com/p/javaewah/ and http://ricerca.mat.uniroma3.it/users/colanton/concise.html; in particular, concise seems to be nice for performing set operations.

Michael McCandless, 2010-12-14T06:16:37.943-05:00

Hi fancyerii,

Looks like you also posted on Lucene/Solr's dev list -- I just responded there.

On how I integrated PFor into Lucene, the patch here was hacked up, but we are working on the "real" integration under LUCENE-2723, which should make intblock codecs easy and high-performance to plug into Lucene as a Codec.

Mike

Unknown, 2010-12-14T03:09:30.005-05:00

Hi,

I tried integrating PForDelta into Lucene 2.9 but ran into a problem. I use the implementation at http://code.google.com/p/integer-array-compress-kit/. It implements a basic PForDelta algorithm and an improved one (called NewPForDelta; it had many bugs, which I have fixed). But compared with VInt and S9, its speed is very slow when decoding only a small number of integer arrays. For example, I decoded int[256] arrays whose values are randomly generated between 0 and 100. When decoding just one array, PFor (or NewPFor) is very slow. When it decodes many arrays in a row, such as 10000, it is faster than S9 and VInt.

Another strange phenomenon is that when I call the PFor decoder twice, the second call is faster. If I call PFor first and then NewPFor, NewPFor is faster; if I reverse the call sequence, whichever decoder is called second is faster. For example:

    ct.testNewPFDCodes(list);
    ct.testPFor(list);
    ct.testVInt(list);
    ct.testS9(list);

NewPFD decode: 3614705
PForDelta decode: 17320
VINT decode: 16483
S9 decode: 19835

When I call them in the following sequence:

    ct.testPFor(list);
    ct.testNewPFDCodes(list);
    ct.testVInt(list);
    ct.testS9(list);

PForDelta decode: 3212140
NewPFD decode: 19556
VINT decode: 16762
S9 decode: 16483

How did you integrate PFor into Lucene? In my implementation, I group docIDs and termDocFreqs into blocks of 128 integers. When SegmentTermDocs's next method (or read, or readNoTf) is called, it decodes a block and saves it to a cache.
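
The order-dependent timings in the last comment are consistent with JVM warm-up, where class loading and JIT compilation are charged to whichever decoder runs first. That explanation is an assumption and is not confirmed anywhere in this thread. A minimal sketch of a harness that runs a decoder untimed before measuring it is shown here. The `WarmupBench` class, the `BlockDecoder` interface, and the toy VInt codec are all hypothetical stand-ins written for illustration; they are not taken from Lucene or from integer-array-compress-kit.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;

public class WarmupBench {

    /** Hypothetical decoder interface; not part of Lucene or any codec library. */
    interface BlockDecoder {
        void decode(byte[] in, int[] out);
    }

    /** Toy VInt encoder: 7 payload bits per byte, high bit set on all but the last byte. */
    static byte[] encodeVInt(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                bytes.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            bytes.write(v);
        }
        return bytes.toByteArray();
    }

    /** Toy VInt decoder matching encodeVInt; fills out.length values. */
    static void decodeVInt(byte[] in, int[] out) {
        int pos = 0;
        for (int i = 0; i < out.length; i++) {
            int b = in[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = in[pos++];
                v |= (b & 0x7F) << shift;
            }
            out[i] = v;
        }
    }

    /** Runs untimed warm-up rounds, then returns the average nanoseconds per timed decode. */
    static long timeDecode(BlockDecoder decoder, byte[] encoded, int blockSize,
                           int warmupRounds, int timedRounds) {
        int[] out = new int[blockSize];
        for (int i = 0; i < warmupRounds; i++) {
            decoder.decode(encoded, out);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRounds; i++) {
            decoder.decode(encoded, out);
        }
        return (System.nanoTime() - start) / timedRounds;
    }

    public static void main(String[] args) {
        // 256 values between 0 and 100, as in the experiment described in the comment.
        int[] values = new Random(42).ints(256, 0, 101).toArray();
        byte[] encoded = encodeVInt(values);

        // One cold call, then an average taken after many untimed calls.
        long cold = timeDecode(WarmupBench::decodeVInt, encoded, values.length, 0, 1);
        long warm = timeDecode(WarmupBench::decodeVInt, encoded, values.length, 20_000, 1_000);
        System.out.println("first call, no warm-up: " + cold + " ns");
        System.out.println("after warm-up, average: " + warm + " ns");
    }
}
```

If warm-up is the cause, warming every decoder before timing any of them should make the results independent of call order. A dedicated harness such as JMH handles this more carefully than a hand-rolled loop.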