.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation .. .. SPDX-License-Identifier: CC-BY-4.0 .. highlight:: cpp .. default-domain:: cpp .. _alg_knn: ========================================= k-Nearest Neighbors Classification (k-NN) ========================================= :math:`k`-NN :capterm:`classification` algorithm infers the class for the new feature vector by computing majority vote of the :math:`k` nearest observations from the training set. .. |t_math| replace:: `Training `_ .. |t_brute_f| replace:: `Brute-force `_ .. |t_kd_tree| replace:: `k-d tree `_ .. |t_input| replace:: `train_input `_ .. |t_result| replace:: `train_result `_ .. |t_op| replace:: `train(...) `_ .. |i_math| replace:: `Inference `_ .. |i_brute_f| replace:: `Brute-force `_ .. |i_kd_tree| replace:: `k-d tree `_ .. |i_input| replace:: `infer_input `_ .. |i_result| replace:: `infer_result `_ .. |i_op| replace:: `infer(...) `_ =============== ============= ============= ======== =========== ============ **Operation** **Computational methods** **Programming Interface** --------------- --------------------------- --------------------------------- |t_math| |t_brute_f| |t_kd_tree| |t_op| |t_input| |t_result| |i_math| |i_brute_f| |i_kd_tree| |i_op| |i_input| |i_result| =============== ============= ============= ======== =========== ============ ------------------------ Mathematical formulation ------------------------ .. _knn_t_math: Training -------- Let :math:`X = \{ x_1, \ldots, x_n \}` be the training set of :math:`p`-dimensional feature vectors, let :math:`Y = \{ y_1, \ldots, y_n \}` be the set of class labels, where :math:`y_i \in \{ 0, \ldots, c-1 \}`, :math:`1 \leq i \leq n`. Given :math:`X`, :math:`Y` and the number of nearest neighbors :math:`k`, the problem is to build a model that allows distance computation between the feature vectors in training and inference sets at the inference stage. .. _knn_t_math_brute_force: Training method: *brute-force* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The training operation produces the model that stores all the feature vectors from the initial training set :math:`X`. .. _knn_t_math_kd_tree: Training method: *k-d tree* ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The training operation builds a :math:`k`-:math:`d` tree that partitions the training set :math:`X` (for more details, see :txtref:`k-d Tree `). .. _knn_i_math: Inference --------- Let :math:`X' = \{ x_1', \ldots, x_m' \}` be the inference set of :math:`p`-dimensional feature vectors. Given :math:`X'`, the model produced at the training stage and the number of nearest neighbors :math:`k`, the problem is to predict the label :math:`y_j'` for each :math:`x_j'`, :math:`1 \leq j \leq m`, by performing the following steps: #. Identify the set :math:`N(x_j') \subseteq X` of the :math:`k` feature vectors in the training set that are nearest to :math:`x_j'` with respect to the Euclidean distance. #. Estimate the conditional probability for the :math:`l`-th class as the fraction of vectors in :math:`N(x_j')` whose labels :math:`y_j` are equal to :math:`l`: .. math:: :label: p_predict P_{jl} = \frac{1}{| N(x_j') |} \Big| \big\{ x_r \in N(x_j') : y_r = l \big\} \Big|, \quad 1 \leq j \leq m, \; 0 \leq l < c. #. Predict the class that has the highest probability for the feature vector :math:`x_j'`: .. math:: :label: y_predict y_j' = \mathrm{arg}\max_{0 \leq l < c} P_{jl}, \quad 1 \leq j \leq m. .. _knn_i_math_brute_force: Inference method: *brute-force* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Brute-force inference method determines the set :math:`N(x_j')` of the nearest feature vectors by iterating over all the pairs :math:`(x_j', x_i)` in the implementation defined order, :math:`1 \leq i \leq n`, :math:`1 \leq j \leq m`. The final prediction is computed according to the equations :eq:`p_predict` and :eq:`y_predict`. .. _knn_i_math_kd_tree: Inference method: *k-d tree* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ K-d tree inference method traverses the :math:`k`-:math:`d` tree to find feature vectors associated with a leaf node that are closest to :math:`x_j'`, :math:`1 \leq j \leq m`. The set :math:`\tilde{n}(x_j')` of the currently-known nearest :math:`k`-th neighbors is progressively updated during tree traversal. The search algorithm limits exploration of the nodes for which the distance between the :math:`x_j'` and respective part of the feature space is not less than the distance between :math:`x_j'` and the most distant feature vector from :math:`\tilde{n}(x_j')`. Once tree traversal is finished, :math:`\tilde{n}(x_j') \equiv N(x_j')`. The final prediction is computed according to the equations :eq:`p_predict` and :eq:`y_predict`. ------------- Usage example ------------- Training -------- .. onedal_code:: oneapi::dal::knn::example::run_training Inference --------- .. onedal_code:: oneapi::dal::knn::example::run_inference --------------------- Programming Interface --------------------- All types and functions in this section shall be declared in the ``oneapi::dal::knn`` namespace and be available via inclusion of the ``oneapi/dal/algo/knn.hpp`` header file. Descriptor ---------- .. onedal_class:: oneapi::dal::knn::descriptor Method tags ~~~~~~~~~~~ .. onedal_tags_namespace:: oneapi::dal::knn::method Task tags ~~~~~~~~~ .. onedal_tags_namespace:: oneapi::dal::knn::task Model ----- .. onedal_class:: oneapi::dal::knn::model .. _knn_t_api: Training :expr:`train(...)` -------------------------------- .. _knn_t_api_input: Input ~~~~~ .. onedal_class:: oneapi::dal::knn::train_input .. _knn_t_api_result: Result ~~~~~~ .. onedal_class:: oneapi::dal::knn::train_result Operation ~~~~~~~~~ .. onedal_func:: oneapi::dal::knn::train .. _knn_i_api: Inference :expr:`infer(...)` ---------------------------- .. _knn_i_api_input: Input ~~~~~ .. onedal_class:: oneapi::dal::knn::infer_input .. _knn_i_api_result: Result ~~~~~~ .. onedal_class:: oneapi::dal::knn::infer_result Operation ~~~~~~~~~ .. onedal_func:: oneapi::dal::knn::infer