AI

LSH Banding for Large-Scale Retrieval with Memory and Recall Constraints

Abstract

Locality Sensitive Hashing (LSH) is widely used for efficient retrieval of candidate matches in very large audio, video, and image systems. However, extremely large reference databases necessitate a guaranteed limit on the memory used by the table lookup itself, no matter how the entries crowd different parts of the signature space, a guarantee that LSH does not give. In this paper, we provide such guaranteed limits, primarily through the design of the LSH bands. When combined with data-adaptive bin splitting (needed on only 0.04% of the occupied bins), this approach provides the required guarantee on memory usage. At the same time, it avoids the reduced recall that more extensive use of bin splitting would give.