Fast kNN graph construction with locality sensitive hashing

Yan Ming Zhang, Kaizhu Huang, Guanggang Geng, Cheng Lin Liu

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

54 Citations (Scopus)

Abstract

The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role in graph-based learning methods. Despite its many elegant properties, the brute force kNN graph construction method has a computational complexity of O(n^2), which is prohibitive for large scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient algorithm for approximating kNN graphs with a time complexity of only O(l(d + log n)n), where d is the dimensionality and l is usually a small number. This is much faster than most existing fast methods. Specifically, we use the locality sensitive hashing technique to divide items into small subsets of equal size, and then build one kNN graph on each subset using the brute force method. To enhance the approximation quality, we repeat this procedure several times to generate multiple basic approximate graphs, and combine them to yield a high quality graph. Compared with existing methods, the proposed approach is (1) much faster; (2) applicable to generic similarity measures; and (3) easy to parallelize. Finally, on three large-scale benchmark data sets, our method clearly outperforms existing fast methods.
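The divide, solve and combine procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the hash functions, so the sketch assumes a single random projection per round as a stand-in for the locality sensitive hashing step, and it assumes squared Euclidean distance as the similarity measure. The names `approx_knn_graph`, `brute_force_knn`, `block_size` and `rounds` are hypothetical.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points, ids, k):
    """Exact kNN restricted to the subset `ids`: {i: [(dist, j), ...]}."""
    result = {}
    for i in ids:
        candidates = [(sq_dist(points[i], points[j]), j) for j in ids if j != i]
        result[i] = heapq.nsmallest(k, candidates)
    return result


def approx_knn_graph(points, k, block_size, rounds, seed=0):
    """Approximate kNN graph as a list of neighbor-index lists."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    best = [dict() for _ in range(n)]  # best[i][j] = squared distance to j

    for _ in range(rounds):
        # Divide (assumption): project onto one random direction and sort,
        # so that nearby points tend to end up next to each other.
        direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
        order = sorted(
            range(n),
            key=lambda i: sum(p * w for p, w in zip(points[i], direction)),
        )
        # Cut the ordering into blocks of equal size (the last may be smaller)
        # and solve each block exactly with the brute force method.
        for start in range(0, n, block_size):
            block = order[start:start + block_size]
            for i, neighbors in brute_force_knn(points, block, k).items():
                for dist, j in neighbors:
                    best[i][j] = dist

    # Combine: keep the k closest candidates seen across all rounds.
    graph = []
    for i in range(n):
        closest = heapq.nsmallest(k, ((dist, j) for j, dist in best[i].items()))
        graph.append([j for _, j in closest])
    return graph
```

In this sketch each round costs one sort plus a brute force pass over blocks of fixed size, and the rounds are independent of each other, which is consistent with the abstract's remark that the approach is easy to parallelize. Setting `block_size` to at least the number of points reduces the sketch to the exact brute force method.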

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
Pages: 660-674
Number of pages: 15
Edition: PART 2
Publication status: Published - 2013
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 - Prague, Czech Republic
Duration: 23 Sept 2013 to 27 Sept 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8189 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
Country/Territory: Czech Republic
City: Prague
Period: 23/09/13 to 27/09/13

Keywords

  • graph construction
  • graph-based machine learning
  • locality sensitive hashing
