TY - GEN
T1 - Fast kNN graph construction with locality sensitive hashing
AU - Zhang, Yan-Ming
AU - Huang, Kaizhu
AU - Geng, Guanggang
AU - Liu, Cheng-Lin
PY - 2013
Y1 - 2013
N2 - The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role in graph-based learning methods. Despite its many elegant properties, the brute-force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large-scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient algorithm for approximating kNN graphs with a time complexity of only O(l(d + log n)n), where d is the dimensionality and l is usually a small number. This is much faster than most existing fast methods. Specifically, we employ the locality sensitive hashing technique to divide items into small subsets of equal size, and then build one kNN graph on each subset using the brute-force method. To enhance the approximation quality, we repeat this procedure several times to generate multiple basic approximate graphs, and combine them to yield a high-quality graph. Compared with existing methods, the proposed approach is: (1) much faster; (2) applicable to generic similarity measures; (3) easy to parallelize. Finally, on three benchmark large-scale data sets, our method outperforms existing fast methods by a clear margin.
KW - graph construction
KW - graph-based machine learning
KW - locality sensitive hashing
UR - http://www.scopus.com/inward/record.url?scp=84886519795&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-40991-2_42
DO - 10.1007/978-3-642-40991-2_42
M3 - Conference Proceeding
AN - SCOPUS:84886519795
SN - 9783642409905
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 660
EP - 674
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
Y2 - 23 September 2013 through 27 September 2013
ER -