Package org.apache.lucene.util
Class BytesRefHash
java.lang.Object
    org.apache.lucene.util.BytesRefHash
All Implemented Interfaces: Accountable
public final class BytesRefHash extends java.lang.Object implements Accountable
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each added BytesRef.
Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2 bytes. The internal storage is limited to 2GB total byte storage.
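The contract described above can be illustrated with a toy model in plain Java. This is a minimal sketch, not Lucene's implementation: the class name ToyBytesHash and all of its fields are hypothetical, and a java.util.HashMap stands in for the real hash table. It only mirrors the documented behaviour: ids grow by one per new entry, all bytes live in one contiguous array, and re-adding known bytes yields -(id)-1.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names); it does not use or reproduce Lucene code.
class ToyBytesHash {
    private byte[] storage = new byte[16];  // one contiguous array holding every entry
    private int used = 0;                    // number of bytes of storage in use
    private int[] starts = new int[4];       // id -> start offset in storage
    private int[] lengths = new int[4];      // id -> length in bytes
    private int count = 0;                   // number of ids handed out so far
    // Assumption: a HashMap replaces the real open-addressing table for brevity.
    private final Map<ByteBuffer, Integer> index = new HashMap<>();

    /** Returns the new id, or -(id)-1 if the bytes were already present. */
    int add(byte[] bytes) {
        Integer existing = index.get(ByteBuffer.wrap(bytes));
        if (existing != null) {
            return -existing - 1;
        }
        if (used + bytes.length > storage.length) {
            int newLength = Math.max(storage.length * 2, used + bytes.length);
            storage = Arrays.copyOf(storage, newLength);
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(bytes, 0, storage, used, bytes.length);
        starts[count] = used;
        lengths[count] = bytes.length;
        used += bytes.length;
        // Key on a private copy so later changes to the caller's array cannot affect the index.
        index.put(ByteBuffer.wrap(bytes.clone()), count);
        return count++;
    }

    /** Returns the id of the given bytes, or -1 if they were never added. */
    int find(byte[] bytes) {
        Integer id = index.get(ByteBuffer.wrap(bytes));
        return id == null ? -1 : id;
    }

    /** Copies the bytes stored for the given id out of the contiguous storage. */
    byte[] get(int id) {
        return Arrays.copyOfRange(storage, starts[id], starts[id] + lengths[id]);
    }

    int size() {
        return count;
    }
}
```

In this model, adding "foo" and then "bar" yields ids 0 and 1, and adding "foo" a second time yields -1, that is, -(0)-1.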
-
-
Nested Class Summary
Nested Classes
Modifier and Type  Class  Description
static class BytesRefHash.BytesStartArray
  Manages allocation of the per-term addresses.
static class BytesRefHash.DirectBytesStartArray
  A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
static class BytesRefHash.MaxBytesLengthExceededException
-
Field Summary
Fields
Modifier and Type  Field  Description
private static long BASE_RAM_BYTES
(package private) int[] bytesStart
private BytesRefHash.BytesStartArray bytesStartArray
private Counter bytesUsed
private int count
static int DEFAULT_CAPACITY
private int hashHalfSize
private int hashMask
private int hashSize
private int[] ids
private int lastCount
(package private) ByteBlockPool pool
private BytesRef scratch1
-
Constructor Summary
Constructors
Constructor  Description
BytesRefHash()
BytesRefHash(ByteBlockPool pool)
  Creates a new BytesRefHash.
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
  Creates a new BytesRefHash.
-
Method Summary
All Methods  Instance Methods  Concrete Methods
Modifier and Type  Method  Description
int add(BytesRef bytes)
  Adds a new BytesRef.
int addByPoolOffset(int offset)
  Adds an "arbitrary" int offset instead of a BytesRef term.
int byteStart(int bytesID)
  Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
void clear()
void clear(boolean resetPool)
void close()
  Closes the BytesRefHash and releases all internally used memory.
int[] compact()
  Returns the ids array in arbitrary order.
private int doHash(byte[] bytes, int offset, int length)
private boolean equals(int id, BytesRef b)
int find(BytesRef bytes)
  Returns the id of the given BytesRef.
private int findHash(BytesRef bytes)
BytesRef get(int bytesID, BytesRef ref)
  Populates and returns a BytesRef with the bytes for the given bytesID.
long ramBytesUsed()
  Return the memory usage of this object in bytes.
private void rehash(int newSize, boolean hashOnData)
  Called when hash is too small (> 50% occupied) or too large (< 20% occupied).
void reinit()
  Reinitializes the BytesRefHash after a previous clear() call.
private boolean shrink(int targetSize)
int size()
  Returns the number of BytesRef values in this BytesRefHash.
int[] sort()
  Returns the values array sorted by the referenced byte values.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
Field Detail
-
BASE_RAM_BYTES
private static final long BASE_RAM_BYTES
-
DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
- See Also:
- Constant Field Values
-
pool
final ByteBlockPool pool
-
bytesStart
int[] bytesStart
-
scratch1
private final BytesRef scratch1
-
hashSize
private int hashSize
-
hashHalfSize
private int hashHalfSize
-
hashMask
private int hashMask
-
count
private int count
-
lastCount
private int lastCount
-
ids
private int[] ids
-
bytesStartArray
private final BytesRefHash.BytesStartArray bytesStartArray
-
bytesUsed
private Counter bytesUsed
-
-
Constructor Detail
-
BytesRefHash
public BytesRefHash()
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.
-
-
Method Detail
-
size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
- Returns:
- the number of BytesRef values in this BytesRefHash.
-
get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
- Parameters:
- bytesID - the id
- ref - the BytesRef to populate
- Returns:
- the given BytesRef instance populated with the bytes for the given bytesID
-
compact
public int[] compact()
Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at offset size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
-
sort
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
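As an illustration of what "sorted by the referenced byte values" means, here is a minimal sketch in plain Java that orders ids by the bytes each id refers to. The class SortIdsSketch and its byte[][] input are hypothetical stand-ins, and the use of unsigned lexicographic byte order is an assumption of this sketch; it does not reproduce the sorting algorithm used internally.

```java
import java.util.Arrays;

// Hypothetical sketch: ids are simply indexes into the values array.
class SortIdsSketch {
    /** Returns the ids 0..values.length-1 ordered by the bytes each id refers to. */
    static int[] sortedIds(byte[][] values) {
        Integer[] ids = new Integer[values.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Assumption: bytes compare as unsigned values, so 0xFF sorts after 'z'.
        Arrays.sort(ids, (a, b) -> Arrays.compareUnsigned(values[a], values[b]));
        int[] out = new int[ids.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = ids[i];
        }
        return out;
    }
}
```

Only the order of the returned ids changes; the stored bytes stay where they are, which is why the result is an int[] rather than an array of byte sequences.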
-
equals
private boolean equals(int id, BytesRef b)
-
shrink
private boolean shrink(int targetSize)
-
clear
public void clear(boolean resetPool)
-
clear
public void clear()
-
close
public void close()
Closes the BytesRefHash and releases all internally used memory.
-
add
public int add(BytesRef bytes)
Adds a new BytesRef.
- Parameters:
- bytes - the bytes to hash
- Returns:
- the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). The return value is therefore always >= 0 if the given bytes haven't been hashed before, and negative if they have.
- Throws:
- BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
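Because add(BytesRef) encodes "already present" as (-(id)-1), callers usually need to decode the result before using it as an id. The helper below is a minimal sketch of that decoding; the class AddResult and its method names are hypothetical and are not part of BytesRefHash.

```java
// Hypothetical helper for interpreting the int returned by add(BytesRef).
class AddResult {
    /** True if the bytes were not present before the call, i.e. a new id was assigned. */
    static boolean isNew(int addResult) {
        return addResult >= 0;
    }

    /** Recovers the id from either form of the return value: id, or -(id)-1. */
    static int id(int addResult) {
        return addResult >= 0 ? addResult : -addResult - 1;
    }
}
```

For example, a return value of -4 decodes to id 3 for bytes that were already present, while a return value of 3 is the id of a newly added entry.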
-
find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
- Parameters:
- bytes - the bytes to look for
- Returns:
- the id of the given bytes, or -1 if there is no mapping for the given bytes.
-
findHash
private int findHash(BytesRef bytes)
-
addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
-
rehash
private void rehash(int newSize, boolean hashOnData)
Called when hash is too small (> 50% occupied) or too large (< 20% occupied).
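The two occupancy thresholds can be expressed as a small decision function. This is a sketch under stated assumptions: the class ResizePolicySketch and the method nextSize are hypothetical, and the choice to double or halve the table is an assumption of this sketch, since the text above gives only the 50% and 20% thresholds and not the resize factors.

```java
// Hypothetical sketch of the occupancy thresholds; the resize factors are assumed.
class ResizePolicySketch {
    /** Returns the table size to use for the given current size and entry count. */
    static int nextSize(int hashSize, int count) {
        if (count * 2L > hashSize) {
            return hashSize * 2;  // more than 50% occupied: too small (assumed doubling)
        }
        if (count * 5L < hashSize && hashSize > 1) {
            return hashSize / 2;  // less than 20% occupied: too large (assumed halving)
        }
        return hashSize;          // occupancy between 20% and 50%: leave as is
    }
}
```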
-
doHash
private int doHash(byte[] bytes, int offset, int length)
-
reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
-
byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- Parameters:
- bytesID - the id to look up
- Returns:
- the bytesStart offset into the internally used ByteBlockPool for the given id
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
- ramBytesUsed in interface Accountable
-
-