Benchmarks for neighbors

KNeighborsClassifier-brute-minimadelon

Benchmark setup

from sklearn.neighbors import KNeighborsClassifier
from deps import load_data

kwargs = {'n_neighbors': 5, 'algorithm': 'brute'}
X, y, X_t, y_t = load_data('minimadelon')
obj = KNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/KNeighborsClassifier-brute-minimadelon-step0-timing.png

Memory usage

_images/KNeighborsClassifier-brute-minimadelon-step0-memory.png

Additional output

cProfile

         63 function calls in 0.001 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.001    0.001 <f>:1(<module>)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     7    0.000    0.000    0.000    0.000 {isinstance}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
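
A listing like the one above can be collected with the standard library alone. The following is a minimal sketch, assuming Python 3 and a hypothetical stand-in function `toy_fit` in place of `obj.fit(X, y)`; it is not the vbench harness itself, and `profile_statement` is a name introduced here for illustration.

```python
import cProfile
import io
import pstats


def profile_statement(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" matches the "Ordered by: cumulative time" header above.
    stats.sort_stats("cumulative").print_stats()
    return result, buffer.getvalue()


def toy_fit(values):
    # Hypothetical stand-in for obj.fit(X, y): sort the distinct labels.
    return sorted(set(values))


result, report = profile_statement(toy_fit, [1, -1, 1, -1, 1])
print(report)
```

Sorting by cumulative time puts the outermost callers first, which is why the benchmark wrapper and `fit` lead the listing even though their own `tottime` is close to zero.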

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000515 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          121    121.0     23.5          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.6          self._y = y
   573         1          239    239.0     46.4          self._classes = np.sort(np.unique(y))
   574         1          152    152.0     29.5          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129                                                   X = atleast2d_or_csr(X)
   130
   131                                                   neigh_dist, neigh_ind = self.kneighbors(X)
   132                                                   pred_labels = self._y[neigh_ind]
   133
   134                                                   weights = _get_weights(neigh_dist, self.weights)
   135
   136                                                   if weights is None:
   137                                                       mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/KNeighborsClassifier-brute-minimadelon-step1-timing.png

Memory usage

_images/KNeighborsClassifier-brute-minimadelon-step1-memory.png

Additional output

cProfile

         188 function calls in 0.003 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.003    0.003 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.003    0.003 <f>:1(<module>)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/neighbors/classification.py:116(predict)
     1    0.000    0.000    0.002    0.002 /tmp/vb_sklearn/sklearn/neighbors/base.py:156(kneighbors)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:404(pairwise_distances)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:101(euclidean_distances)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/utils/extmath.py:70(safe_sparse_dot)
    10    0.001    0.000    0.001    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core._dotblas.dot}
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:52(check_pairwise_arrays)
     6    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.where}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
    28    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    14    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
    18    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1508(any)
     2    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
     1    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
    14    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0.002751 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129         1          146    146.0      5.3          X = atleast2d_or_csr(X)
   130
   131         1         1697   1697.0     61.7          neigh_dist, neigh_ind = self.kneighbors(X)
   132         1           78     78.0      2.8          pred_labels = self._y[neigh_ind]
   133
   134         1           22     22.0      0.8          weights = _get_weights(neigh_dist, self.weights)
   135
   136         1            3      3.0      0.1          if weights is None:
   137         1          782    782.0     28.4              mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141         1           23     23.0      0.8          return mode.flatten().astype(np.int)
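
The profile above shows `predict` spending its time in three places: pairwise Euclidean distances (computed through a dot product), an `argsort` to rank the neighbors, and a mode over the neighbor labels. A minimal sketch of that pipeline, assuming plain Python lists rather than NumPy arrays so it stays dependency-free; the function names are hypothetical and this is not the scikit-learn implementation:

```python
from collections import Counter


def squared_distances(queries, train):
    """Squared Euclidean distances via ||q||^2 - 2 q.t + ||t||^2."""
    train_norms = [sum(v * v for v in t) for t in train]
    rows = []
    for q in queries:
        q_norm = sum(v * v for v in q)
        rows.append([
            q_norm - 2.0 * sum(a * b for a, b in zip(q, t)) + t_norm
            for t, t_norm in zip(train, train_norms)
        ])
    return rows


def knn_predict(train, labels, queries, n_neighbors=5):
    """Majority vote over the n_neighbors closest training points."""
    predictions = []
    for row in squared_distances(queries, train):
        # Full sort of candidate indices, standing in for argsort.
        order = sorted(range(len(row)), key=row.__getitem__)
        votes = Counter(labels[i] for i in order[:n_neighbors])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels = [-1, -1, -1, 1, 1, 1]
print(knn_predict(train, labels, [[0.1, 0.1], [5.0, 5.1]], n_neighbors=3))
```

The brute-force approach does no work at fit time beyond input validation, so all of the distance computation lands in `predict`, as the timings show.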

KNeighborsClassifier-brute-madelon

Benchmark setup

from sklearn.neighbors import KNeighborsClassifier
from deps import load_data

kwargs = {'n_neighbors': 5, 'algorithm': 'brute'}
X, y, X_t, y_t = load_data('madelon')
obj = KNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/KNeighborsClassifier-brute-madelon-step0-timing.png

Memory usage

_images/KNeighborsClassifier-brute-madelon-step0-memory.png

Additional output

cProfile

         63 function calls in 0.003 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.003    0.003 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.003    0.003 <f>:1(<module>)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.003    0.003    0.003    0.003 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     7    0.000    0.000    0.000    0.000 {isinstance}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {len}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
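
In this listing nearly all of the fit time sits in a single `sum` call reached through `assert_all_finite`, which suggests the finiteness check is done with one reduction over the input rather than an element-by-element test. A minimal sketch of that idea, assuming plain Python lists of rows; `assert_all_finite_sketch` is a hypothetical name and the real helper may differ in detail:

```python
import math


def assert_all_finite_sketch(rows):
    """Raise ValueError if any entry is NaN or infinite."""
    # One pass over the data: a NaN or inf entry makes the total non-finite.
    total = sum(sum(row) for row in rows)
    if math.isfinite(total):
        return
    # The sum can also overflow on finite data, so confirm element-wise.
    if not all(math.isfinite(v) for row in rows for v in row):
        raise ValueError("Input contains NaN or infinity")


assert_all_finite_sketch([[1.0, 2.0], [3.0, 4.0]])  # passes silently
```

The cost of such a check grows with the number of entries in `X`, which fits the difference between this fit time and the one measured on the smaller minimadelon data.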

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.002887 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          115    115.0      4.0          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            2      2.0      0.1          self._y = y
   573         1          185    185.0      6.4          self._classes = np.sort(np.unique(y))
   574         1         2585   2585.0     89.5          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129                                                   X = atleast2d_or_csr(X)
   130
   131                                                   neigh_dist, neigh_ind = self.kneighbors(X)
   132                                                   pred_labels = self._y[neigh_ind]
   133
   134                                                   weights = _get_weights(neigh_dist, self.weights)
   135
   136                                                   if weights is None:
   137                                                       mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/KNeighborsClassifier-brute-madelon-step1-timing.png

Memory usage

_images/KNeighborsClassifier-brute-madelon-step1-memory.png

Additional output

cProfile

         188 function calls in 0.634 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.634    0.634 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.634    0.634 <f>:1(<module>)
     1    0.000    0.000    0.634    0.634 /tmp/vb_sklearn/sklearn/neighbors/classification.py:116(predict)
     1    0.001    0.001    0.632    0.632 /tmp/vb_sklearn/sklearn/neighbors/base.py:156(kneighbors)
     1    0.000    0.000    0.447    0.447 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:404(pairwise_distances)
     1    0.025    0.025    0.447    0.447 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:101(euclidean_distances)
     1    0.000    0.000    0.411    0.411 /tmp/vb_sklearn/sklearn/utils/extmath.py:70(safe_sparse_dot)
     1    0.411    0.411    0.411    0.411 {numpy.core._dotblas.dot}
     1    0.182    0.182    0.182    0.182 {method 'argsort' of 'numpy.ndarray' objects}
    10    0.013    0.001    0.013    0.001 {method 'sum' of 'numpy.ndarray' objects}
     6    0.000    0.000    0.009    0.001 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.000    0.000    0.007    0.007 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:52(check_pairwise_arrays)
     4    0.000    0.000    0.005    0.001 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     4    0.000    0.000    0.004    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
     2    0.000    0.000    0.004    0.002 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.where}
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
    28    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    18    0.000    0.000    0.000    0.000 {isinstance}
    14    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1508(any)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
     1    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
    14    0.000    0.000    0.000    0.000 {len}
     2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0.619438 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129         1          836    836.0      0.1          X = atleast2d_or_csr(X)
   130
   131         1       617841 617841.0     99.7          neigh_dist, neigh_ind = self.kneighbors(X)
   132         1           54     54.0      0.0          pred_labels = self._y[neigh_ind]
   133
   134         1            9      9.0      0.0          weights = _get_weights(neigh_dist, self.weights)
   135
   136         1            1      1.0      0.0          if weights is None:
   137         1          685    685.0      0.1              mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141         1           12     12.0      0.0          return mode.flatten().astype(np.int)
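
To read the madelon `predict` numbers as shares of the total, the cumulative times from the cProfile listing for this step can be divided by the 0.634 s total. A small worked example, with the values copied from that listing and the dictionary keys shortened for readability:

```python
# Cumulative times (seconds) copied from the cProfile listing for predict.
total = 0.634
cumtime = {
    "kneighbors": 0.632,
    "euclidean_distances": 0.447,
    "dot": 0.411,
    "argsort": 0.182,
}

shares = {name: 100.0 * seconds / total for name, seconds in cumtime.items()}
for name, share in shares.items():
    print(f"{name:>20s}: {share:5.1f}% of predict")
```

The dot product accounts for roughly 65% of the call and the `argsort` for roughly 29%, so together they cover most of the time attributed to `kneighbors`.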

KNeighborsClassifier-ball_tree-arcene

Benchmark setup

from sklearn.neighbors import KNeighborsClassifier
from deps import load_data

kwargs = {'n_neighbors': 5, 'algorithm': 'ball_tree'}
X, y, X_t, y_t = load_data('arcene')
obj = KNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/KNeighborsClassifier-ball_tree-arcene-step0-timing.png

Memory usage

_images/KNeighborsClassifier-ball_tree-arcene-step0-memory.png

Additional output

cProfile

         65 function calls in 0.028 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.028    0.028 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.028    0.028 <f>:1(<module>)
     1    0.000    0.000    0.028    0.028 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.025    0.025    0.027    0.027 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.003    0.003    0.003    0.003 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     5    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
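
A report in the same layout as the cProfile table above can be produced with the standard library alone. The following is a minimal sketch, not the vbench implementation: `build_index` is a hypothetical stand-in for `obj.fit(X, y)`, `profile_call` is an illustrative helper, and the code assumes Python 3 rather than the Python 2.7 interpreter used for the benchmark.

```python
import cProfile
import io
import pstats


def build_index(points):
    """Hypothetical stand-in for obj.fit(X, y): it only sorts the points."""
    return sorted(points)


def profile_call(func, *args):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    # Collect the report in memory instead of printing it directly.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" yields the "Ordered by: cumulative time" header.
    stats.sort_stats("cumulative").print_stats()
    return stream.getvalue()


report = profile_call(build_index, [3, 1, 2])
print(report)
```

Sorting by cumulative time puts the outermost call first, which is why the benchmark wrapper and `fit` lead the table while the function that actually consumes the time (`_fit`, by its `tottime`) appears a few rows down.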

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.027086 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          127    127.0      0.5          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.0          self._y = y
   573         1          111    111.0      0.4          self._classes = np.sort(np.unique(y))
   574         1        26845  26845.0     99.1          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129                                                   X = atleast2d_or_csr(X)
   130
   131                                                   neigh_dist, neigh_ind = self.kneighbors(X)
   132                                                   pred_labels = self._y[neigh_ind]
   133
   134                                                   weights = _get_weights(neigh_dist, self.weights)
   135
   136                                                   if weights is None:
   137                                                       mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/KNeighborsClassifier-ball_tree-arcene-step1-timing.png

Memory usage

_images/KNeighborsClassifier-ball_tree-arcene-step1-memory.png

Additional output

cProfile

         100 function calls in 0.287 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.287    0.287 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.287    0.287 <f>:1(<module>)
     1    0.000    0.000    0.287    0.287 /tmp/vb_sklearn/sklearn/neighbors/classification.py:116(predict)
     1    0.000    0.000    0.283    0.283 /tmp/vb_sklearn/sklearn/neighbors/base.py:156(kneighbors)
     1    0.280    0.280    0.280    0.280 {method 'query' of 'sklearn.neighbors.ball_tree.BallTree' objects}
     2    0.000    0.000    0.006    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     4    0.005    0.001    0.005    0.001 {method 'sum' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.005    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.where}
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     7    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
    10    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
    10    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     2    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
     2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     9    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0.288072 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129         1         2844   2844.0      1.0          X = atleast2d_or_csr(X)
   130
   131         1       284552 284552.0     98.8          neigh_dist, neigh_ind = self.kneighbors(X)
   132         1           49     49.0      0.0          pred_labels = self._y[neigh_ind]
   133
   134         1           14     14.0      0.0          weights = _get_weights(neigh_dist, self.weights)
   135
   136         1            2      2.0      0.0          if weights is None:
   137         1          586    586.0      0.2              mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141         1           25     25.0      0.0          return mode.flatten().astype(np.int)
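
The profiled `predict` body follows a simple pattern: find the nearest training points, look up their labels, and take the most frequent label. The sketch below illustrates that pattern in pure Python under stated assumptions. It uses a brute-force search instead of a ball tree, the names `k_nearest` and `predict_one` are hypothetical, and breaking ties toward the smallest label is a choice made for this sketch.

```python
from collections import Counter


def k_nearest(train_points, query, k):
    """Return indices of the k training points closest to query (brute force)."""
    distances = []
    for index, point in enumerate(train_points):
        # Squared Euclidean distance is enough for ranking neighbors.
        squared = sum((a - b) ** 2 for a, b in zip(point, query))
        distances.append((squared, index))
    distances.sort()
    return [index for _, index in distances[:k]]


def predict_one(train_points, train_labels, query, k=5):
    """Predict a label by unweighted majority vote among the k neighbors."""
    neighbor_indices = k_nearest(train_points, query, k)
    neighbor_labels = [train_labels[i] for i in neighbor_indices]
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    # Assumption of this sketch: ties go to the smallest label.
    return min(label for label, count in counts.items() if count == top)


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

near_origin = predict_one(train_points, train_labels, (0.2, 0.2), k=3)
near_corner = predict_one(train_points, train_labels, (5.5, 5.5), k=3)
print(near_origin, near_corner)
```

In the line profile above, the neighbor search (line 131) accounts for 98.8% of the time, while the label lookup and the vote together take well under 1%.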

KNeighborsClassifier-ball_tree-madelon

Benchmark setup

from sklearn.neighbors import KNeighborsClassifier
from deps import load_data

kwargs = {'n_neighbors': 5, 'algorithm': 'ball_tree'}
X, y, X_t, y_t = load_data('madelon')
obj = KNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/KNeighborsClassifier-ball_tree-madelon-step0-timing.png

Memory usage

_images/KNeighborsClassifier-ball_tree-madelon-step0-memory.png

Additional output

cProfile

         65 function calls in 0.156 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.156    0.156 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.156    0.156 <f>:1(<module>)
     1    0.000    0.000    0.156    0.156 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.153    0.153    0.156    0.156 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.003    0.003    0.003    0.003 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     5    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     6    0.000    0.000    0.000    0.000 {hasattr}
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.178033 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          126    126.0      0.1          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.0          self._y = y
   573         1          205    205.0      0.1          self._classes = np.sort(np.unique(y))
   574         1       177699 177699.0     99.8          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129                                                   X = atleast2d_or_csr(X)
   130
   131                                                   neigh_dist, neigh_ind = self.kneighbors(X)
   132                                                   pred_labels = self._y[neigh_ind]
   133
   134                                                   weights = _get_weights(neigh_dist, self.weights)
   135
   136                                                   if weights is None:
   137                                                       mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/KNeighborsClassifier-ball_tree-madelon-step1-timing.png

Memory usage

_images/KNeighborsClassifier-ball_tree-madelon-step1-memory.png

Additional output

cProfile

         100 function calls in 2.203 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    2.203    2.203 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    2.203    2.203 <f>:1(<module>)
     1    0.000    0.000    2.203    2.203 /tmp/vb_sklearn/sklearn/neighbors/classification.py:116(predict)
     1    0.000    0.000    2.201    2.201 /tmp/vb_sklearn/sklearn/neighbors/base.py:156(kneighbors)
     1    2.200    2.200    2.200    2.200 {method 'query' of 'sklearn.neighbors.ball_tree.BallTree' objects}
     4    0.002    0.001    0.002    0.001 {method 'sum' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.002    0.001 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     2    0.000    0.000    0.002    0.001 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.where}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     7    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
    10    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     2    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
    10    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
     2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     9    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 116
Total time: 2.2864 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   116                                               def predict(self, X):
   117                                                   """Predict the class labels for the provided data
   118
   119                                                   Parameters
   120                                                   ----------
   121                                                   X: array
   122                                                       A 2-D array representing the test points.
   123
   124                                                   Returns
   125                                                   -------
   126                                                   labels: array
   127                                                       List of class labels (one for each data sample).
   128                                                   """
   129         1          981    981.0      0.0          X = atleast2d_or_csr(X)
   130
   131         1      2284084 2284084.0     99.9          neigh_dist, neigh_ind = self.kneighbors(X)
   132         1           89     89.0      0.0          pred_labels = self._y[neigh_ind]
   133
   134         1           14     14.0      0.0          weights = _get_weights(neigh_dist, self.weights)
   135
   136         1            2      2.0      0.0          if weights is None:
   137         1         1199   1199.0      0.1              mode, _ = stats.mode(pred_labels, axis=1)
   138                                                   else:
   139                                                       mode, _ = weighted_mode(pred_labels, weights, axis=1)
   140
   141         1           27     27.0      0.0          return mode.flatten().astype(np.int)
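
The columns of the line profile above can be cross-checked by hand. The sketch below copies the Time values for `predict` from the table, converts them using the stated timer unit of 1e-06 s, and recomputes each line's share of the total. It assumes that % Time is the line's time divided by the sum over all lines, which is consistent with the figures shown.

```python
# Timer unit reported by the profiler, in seconds.
TIMER_UNIT_S = 1e-06

# Source line number -> Time column, copied from the table above.
line_times = {
    129: 981,      # X = atleast2d_or_csr(X)
    131: 2284084,  # neigh_dist, neigh_ind = self.kneighbors(X)
    132: 89,       # pred_labels = self._y[neigh_ind]
    134: 14,       # weights = _get_weights(...)
    136: 2,        # if weights is None:
    137: 1199,     # mode, _ = stats.mode(...)
    141: 27,       # return mode.flatten().astype(np.int)
}

total_units = sum(line_times.values())
total_seconds = total_units * TIMER_UNIT_S

# Share of the total per line, rounded to one decimal like the % Time column.
percent = {
    line: round(100.0 * units / total_units, 1)
    for line, units in line_times.items()
}

print("total: %.4f s" % total_seconds)
for line in sorted(percent):
    print(line, percent[line])
```

The recomputed total agrees with the reported 2.2864 s, and the neighbor query on line 131 comes out at 99.9%, so input validation and voting are negligible for this dataset.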

RadiusNeighborsClassifier-brute-minimadelon

Benchmark setup

from sklearn.neighbors import RadiusNeighborsClassifier
from deps import load_data

kwargs = {'radius': 500.0, 'algorithm': 'brute'}
X, y, X_t, y_t = load_data('minimadelon')
obj = RadiusNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RadiusNeighborsClassifier-brute-minimadelon-step0-timing.png

Memory usage

_images/RadiusNeighborsClassifier-brute-minimadelon-step0-memory.png

Additional output

cProfile

         63 function calls in 0.001 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.001    0.001 <f>:1(<module>)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {len}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000465 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          136    136.0     29.2          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.6          self._y = y
   573         1          163    163.0     35.1          self._classes = np.sort(np.unique(y))
   574         1          163    163.0     35.1          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289                                                   X = atleast2d_or_csr(X)
   290
   291                                                   neigh_dist, neigh_ind = self.radius_neighbors(X)
   292                                                   pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294                                                   if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303                                                       for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305                                                           if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312                                                   weights = _get_weights(neigh_dist, self.weights)
   313
   314                                                   if weights is None:
   315                                                       mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316                                                                         dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/RadiusNeighborsClassifier-brute-minimadelon-step1-timing.png

Memory usage

_images/RadiusNeighborsClassifier-brute-minimadelon-step1-memory.png

Additional output

cProfile

         798 function calls in 0.011 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.011    0.011 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.011    0.011 <f>:1(<module>)
     1    0.001    0.001    0.011    0.011 /tmp/vb_sklearn/sklearn/neighbors/classification.py:276(predict)
    20    0.002    0.000    0.007    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
     1    0.001    0.001    0.003    0.003 /tmp/vb_sklearn/sklearn/neighbors/base.py:334(radius_neighbors)
    20    0.001    0.000    0.002    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.002    0.002 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:404(pairwise_distances)
     1    0.000    0.000    0.002    0.002 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:101(euclidean_distances)
    48    0.001    0.000    0.001    0.000 {method 'sum' of 'numpy.ndarray' objects}
    42    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    60    0.001    0.000    0.001    0.000 {numpy.core.multiarray.where}
    88    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
    93    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
    20    0.001    0.000    0.001    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/utils/extmath.py:70(safe_sparse_dot)
    40    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
     1    0.001    0.001    0.001    0.001 {numpy.core._dotblas.dot}
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:52(check_pairwise_arrays)
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     6    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
    20    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
    20    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
    20    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
    40    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
    56    0.000    0.000    0.000    0.000 {isinstance}
    21    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    40    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    20    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
    28    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
    34    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
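
Note: the table above has the layout that cProfile prints when its statistics are sorted by cumulative time. The following is a minimal sketch of how a comparable report could be produced with the standard library alone, assuming Python 3 rather than the Python 2.7 interpreter used for this run. `profile_statement` and `toy_workload` are hypothetical names introduced for illustration; they are not part of vbench or scikit-learn.

```python
import cProfile
import io
import pstats


def profile_statement(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" corresponds to the "Ordered by: cumulative time" header.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def toy_workload(n):
    # Hypothetical stand-in for a benchmark statement such as obj.predict(X_t).
    return sum(sorted(range(n, 0, -1)))


result, report = profile_statement(toy_workload, 1000)
print(report)
```

In a real run, `toy_workload` would be replaced by the benchmark statement; the ncalls, tottime and cumtime columns then read the same way as in the listing above.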

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000465 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          136    136.0     29.2          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.6          self._y = y
   573         1          163    163.0     35.1          self._classes = np.sort(np.unique(y))
   574         1          163    163.0     35.1          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0.01136 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289         1          165    165.0      1.5          X = atleast2d_or_csr(X)
   290
   291         1         2322   2322.0     20.4          neigh_dist, neigh_ind = self.radius_neighbors(X)
   292        21          613     29.2      5.4          pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294         1            3      3.0      0.0          if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303        21           94      4.5      0.8              for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305        20           58      2.9      0.5                  if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312         1           11     11.0      0.1          weights = _get_weights(neigh_dist, self.weights)
   313
   314         1            2      2.0      0.0          if weights is None:
   315        21         7917    377.0     69.7              mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316         1          148    148.0      1.3                                dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322         1           27     27.0      0.2          return mode.flatten().astype(np.int)
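
Note: the Per Hit and % Time columns in the predict table above can be re-derived from Hits, Time, the timer unit and the total time. The sketch below does that arithmetic for four of the rows; the values are copied from the table, while `derive_columns` is a hypothetical helper written for this illustration.

```python
# Values copied from the LineProfiler output for predict above.
TIMER_UNIT = 1e-06      # seconds per timer unit ("Timer unit: 1e-06 s")
TOTAL_TIME_S = 0.01136  # "Total time: 0.01136 s"

# (source line number, hits, time in timer units)
rows = [
    (289, 1, 165),
    (291, 1, 2322),
    (292, 21, 613),
    (315, 21, 7917),
]


def derive_columns(hits, time_units, total_s=TOTAL_TIME_S, unit=TIMER_UNIT):
    """Return (per hit, percent of total), each rounded to one decimal."""
    per_hit = time_units / hits
    percent = 100.0 * time_units * unit / total_s
    return round(per_hit, 1), round(percent, 1)


derived = {line: derive_columns(hits, t) for line, hits, t in rows}
for line, (per_hit, percent) in sorted(derived.items()):
    print(line, per_hit, percent)
```

For line 315 this gives 377.0 timer units per hit and 69.7 percent of the total, matching the row that calls stats.mode.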

RadiusNeighborsClassifier-brute-blobs

Benchmark setup

from sklearn.neighbors import RadiusNeighborsClassifier
from deps import load_data

kwargs = {'radius': 500.0, 'algorithm': 'brute'}
X, y, X_t, y_t = load_data('blobs')
obj = RadiusNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RadiusNeighborsClassifier-brute-blobs-step0-timing.png

Memory usage

_images/RadiusNeighborsClassifier-brute-blobs-step0-memory.png

Additional output

cProfile

         63 function calls in 0.001 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.001    0.001 <f>:1(<module>)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     6    0.000    0.000    0.000    0.000 {hasattr}
     7    0.000    0.000    0.000    0.000 {isinstance}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {len}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000423 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          127    127.0     30.0          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.7          self._y = y
   573         1          141    141.0     33.3          self._classes = np.sort(np.unique(y))
   574         1          152    152.0     35.9          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289                                                   X = atleast2d_or_csr(X)
   290
   291                                                   neigh_dist, neigh_ind = self.radius_neighbors(X)
   292                                                   pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294                                                   if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303                                                       for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305                                                           if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312                                                   weights = _get_weights(neigh_dist, self.weights)
   313
   314                                                   if weights is None:
   315                                                       mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316                                                                         dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/RadiusNeighborsClassifier-brute-blobs-step1-timing.png

Memory usage

_images/RadiusNeighborsClassifier-brute-blobs-step1-memory.png

Additional output

cProfile

         19358 function calls in 0.239 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.239    0.239 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.239    0.239 <f>:1(<module>)
     1    0.003    0.003    0.239    0.239 /tmp/vb_sklearn/sklearn/neighbors/classification.py:276(predict)
   200    0.069    0.000    0.194    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
  2200    0.042    0.000    0.042    0.000 {numpy.core.multiarray.where}
     1    0.008    0.008    0.041    0.041 /tmp/vb_sklearn/sklearn/neighbors/base.py:334(radius_neighbors)
  2002    0.006    0.000    0.039    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
  2413    0.036    0.000    0.036    0.000 {numpy.core.multiarray.array}
  2008    0.030    0.000    0.030    0.000 {method 'sum' of 'numpy.ndarray' objects}
  2408    0.005    0.000    0.028    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
  2000    0.009    0.000    0.026    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
   200    0.005    0.000    0.016    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
   200    0.006    0.000    0.006    0.000 {numpy.core.multiarray.concatenate}
  2000    0.005    0.000    0.005    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.004    0.004 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:404(pairwise_distances)
     1    0.001    0.001    0.004    0.004 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:101(euclidean_distances)
  2016    0.004    0.000    0.004    0.000 {isinstance}
   200    0.004    0.000    0.004    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.003    0.003 /tmp/vb_sklearn/sklearn/utils/extmath.py:70(safe_sparse_dot)
     1    0.003    0.003    0.003    0.003 {numpy.core._dotblas.dot}
   200    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
   200    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
   201    0.001    0.000    0.001    0.000 {method 'flatten' of 'numpy.ndarray' objects}
   400    0.001    0.000    0.001    0.000 {numpy.core.multiarray.zeros}
     4    0.000    0.000    0.001    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     6    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:52(check_pairwise_arrays)
   200    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    14    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     4    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
   214    0.000    0.000    0.000    0.000 {len}
    28    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000423 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          127    127.0     30.0          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            3      3.0      0.7          self._y = y
   573         1          141    141.0     33.3          self._classes = np.sort(np.unique(y))
   574         1          152    152.0     35.9          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0.274331 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289         1          248    248.0      0.1          X = atleast2d_or_csr(X)
   290
   291         1        37585  37585.0     13.7          neigh_dist, neigh_ind = self.radius_neighbors(X)
   292       201         2496     12.4      0.9          pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294         1            3      3.0      0.0          if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303       201          515      2.6      0.2              for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305       200          588      2.9      0.2                  if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312         1           11     11.0      0.0          weights = _get_weights(neigh_dist, self.weights)
   313
   314         1            3      3.0      0.0          if weights is None:
   315       201       232612   1157.3     84.8              mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316         1          222    222.0      0.1                                dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322         1           48     48.0      0.0          return mode.flatten().astype(np.int)
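
In the line profile above, line 315 (the per-sample `stats.mode` call) accounts for 84.8% of the time spent in `predict`. The sketch below shows one possible cheaper way to take a majority vote, assuming the class labels are small non-negative integers. It is not the scikit-learn implementation and was not timed as part of this benchmark. `majority_label` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np


def majority_label(labels):
    """Return the most frequent label; ties go to the smallest label.

    Assumption: labels are non-negative integers, which np.bincount requires.
    """
    labels = np.asarray(labels, dtype=np.intp)
    if labels.size == 0:
        # Mirrors the "no neighbors found" case in the profiled code.
        raise ValueError("no neighbors found for a test sample")
    # bincount counts occurrences of each integer; argmax picks the first
    # (that is, smallest) label among those with the highest count.
    return int(np.bincount(labels).argmax())


# Hypothetical neighbor labels for three test samples.
pred_labels = [np.array([0, 1, 1]), np.array([2, 2, 0, 0]), np.array([1])]
mode = np.asarray([majority_label(pl) for pl in pred_labels], dtype=int)
print(mode)
```

Like `stats.mode`, this returns the smallest label when several labels are tied, but it does not handle negative or non-integer labels.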

RadiusNeighborsClassifier-ball_tree-minimadelon

Benchmark setup

from sklearn.neighbors import RadiusNeighborsClassifier
from deps import load_data

kwargs = {'radius': 500.0, 'algorithm': 'ball_tree'}
X, y, X_t, y_t = load_data('minimadelon')
obj = RadiusNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RadiusNeighborsClassifier-ball_tree-minimadelon-step0-timing.png

Memory usage

_images/RadiusNeighborsClassifier-ball_tree-minimadelon-step0-memory.png

Additional output

cProfile

         65 function calls in 0.001 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.001    0.001 <f>:1(<module>)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     5    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     7    0.000    0.000    0.000    0.000 {isinstance}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {len}
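
A report in the same shape as the cProfile listing above (call counts, ordered by cumulative time) can be produced with the standard library alone. The following is a minimal sketch written for Python 3, not the code vbench runs. `profile_statement` is a hypothetical helper name, and the profiled call is a stand-in for `obj.fit(X, y)`.

```python
import cProfile
import io
import pstats


def profile_statement(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Sort by cumulative time, as in the listing above, and keep 10 rows.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(10)
    return result, stream.getvalue()


# Stand-in workload; a real run would pass the benchmark statement instead.
result, report = profile_statement(sorted, [3, 1, 2])
print(report)
```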

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000555 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1          122    122.0     22.0          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            4      4.0      0.7          self._y = y
   573         1          135    135.0     24.3          self._classes = np.sort(np.unique(y))
   574         1          294    294.0     53.0          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289                                                   X = atleast2d_or_csr(X)
   290
   291                                                   neigh_dist, neigh_ind = self.radius_neighbors(X)
   292                                                   pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294                                                   if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303                                                       for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305                                                           if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312                                                   weights = _get_weights(neigh_dist, self.weights)
   313
   314                                                   if weights is None:
   315                                                       mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316                                                                         dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/RadiusNeighborsClassifier-ball_tree-minimadelon-step1-timing.png

Memory usage

_images/RadiusNeighborsClassifier-ball_tree-minimadelon-step1-memory.png

Additional output

cProfile

         704 function calls in 0.010 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.010    0.010 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.010    0.010 <f>:1(<module>)
     1    0.001    0.001    0.010    0.010 /tmp/vb_sklearn/sklearn/neighbors/classification.py:276(predict)
    20    0.002    0.000    0.007    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
    20    0.001    0.000    0.002    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.002    0.002 /tmp/vb_sklearn/sklearn/neighbors/base.py:334(radius_neighbors)
     1    0.001    0.001    0.002    0.002 {method 'query_radius' of 'sklearn.neighbors.ball_tree.BallTree' objects}
    40    0.001    0.000    0.001    0.000 {numpy.core.multiarray.where}
    40    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    42    0.001    0.000    0.001    0.000 {method 'sum' of 'numpy.ndarray' objects}
    20    0.001    0.000    0.001    0.000 {numpy.core.multiarray.concatenate}
    40    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
    85    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
    89    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
    20    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
    20    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
    20    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
    21    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    41    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
    45    0.000    0.000    0.000    0.000 {isinstance}
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
    40    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    20    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
    10    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    31    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     1    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0.01045 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289         1          142    142.0      1.4          X = atleast2d_or_csr(X)
   290
   291         1         1426   1426.0     13.6          neigh_dist, neigh_ind = self.radius_neighbors(X)
   292        21          630     30.0      6.0          pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294         1            3      3.0      0.0          if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303        21           57      2.7      0.5              for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305        20           57      2.9      0.5                  if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312         1            9      9.0      0.1          weights = _get_weights(neigh_dist, self.weights)
   313
   314         1            2      2.0      0.0          if weights is None:
   315        21         7971    379.6     76.3              mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316         1          133    133.0      1.3                                dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322         1           20     20.0      0.2          return mode.flatten().astype(np.int)
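
The columns of the line profile above are related by simple arithmetic: `Time` is expressed in timer units, `Per Hit` is `Time / Hits`, and `% Time` is the line's share of the function's total time. A short worked example using the values reported for line 315 (the variable names are illustrative only):

```python
# Values copied from the LineProfiler output above.
TIMER_UNIT = 1e-06      # seconds per timer unit ("Timer unit: 1e-06 s")
TOTAL_TIME = 0.01045    # seconds ("Total time" for predict)
hits, time_units = 21, 7971  # line 315: Hits and Time columns

seconds = time_units * TIMER_UNIT          # time spent on the line, in seconds
per_hit = time_units / hits                # "Per Hit" column, in timer units
percent = 100.0 * seconds / TOTAL_TIME     # "% Time" column

print(round(per_hit, 1), round(percent, 1))
```

Rounded to one decimal place, these reproduce the `379.6` and `76.3` shown in the table.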

RadiusNeighborsClassifier-ball_tree-blobs

Benchmark setup

from sklearn.neighbors import RadiusNeighborsClassifier
from deps import load_data

kwargs = {'radius': 500.0, 'algorithm': 'ball_tree'}
X, y, X_t, y_t = load_data('blobs')
obj = RadiusNeighborsClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RadiusNeighborsClassifier-ball_tree-blobs-step0-timing.png

Memory usage

_images/RadiusNeighborsClassifier-ball_tree-blobs-step0-memory.png

Additional output

cProfile

         65 function calls in 0.001 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.001    0.001 <f>:1(<module>)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:559(fit)
     1    0.001    0.001    0.001    0.001 /tmp/vb_sklearn/sklearn/neighbors/base.py:96(_fit)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:490(sort)
     2    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     5    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     7    0.000    0.000    0.000    0.000 {isinstance}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {len}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000765 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1           71     71.0      9.3          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            2      2.0      0.3          self._y = y
   573         1           75     75.0      9.8          self._classes = np.sort(np.unique(y))
   574         1          617    617.0     80.7          return self._fit(X)

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289                                                   X = atleast2d_or_csr(X)
   290
   291                                                   neigh_dist, neigh_ind = self.radius_neighbors(X)
   292                                                   pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294                                                   if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303                                                       for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305                                                           if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312                                                   weights = _get_weights(neigh_dist, self.weights)
   313
   314                                                   if weights is None:
   315                                                       mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316                                                                         dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322                                                   return mode.flatten().astype(np.int)

Benchmark statement

obj.predict(X_t)

Execution time

_images/RadiusNeighborsClassifier-ball_tree-blobs-step1-timing.png

Memory usage

_images/RadiusNeighborsClassifier-ball_tree-blobs-step1-memory.png

Additional output

cProfile

         19084 function calls in 0.125 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.125    0.125 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.125    0.125 <f>:1(<module>)
     1    0.002    0.002    0.125    0.125 /tmp/vb_sklearn/sklearn/neighbors/classification.py:276(predict)
   200    0.039    0.000    0.108    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/stats.py:586(mode)
  2000    0.003    0.000    0.022    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
  2000    0.022    0.000    0.022    0.000 {numpy.core.multiarray.where}
  2002    0.017    0.000    0.017    0.000 {method 'sum' of 'numpy.ndarray' objects}
  2000    0.005    0.000    0.015    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/shape_base.py:194(expand_dims)
     1    0.000    0.000    0.014    0.014 /tmp/vb_sklearn/sklearn/neighbors/base.py:334(radius_neighbors)
     1    0.014    0.014    0.014    0.014 {method 'query_radius' of 'sklearn.neighbors.ball_tree.BallTree' objects}
   200    0.003    0.000    0.008    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
  2405    0.003    0.000    0.008    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
  2409    0.005    0.000    0.005    0.000 {numpy.core.multiarray.array}
   200    0.003    0.000    0.003    0.000 {numpy.core.multiarray.concatenate}
  2001    0.003    0.000    0.003    0.000 {method 'reshape' of 'numpy.ndarray' objects}
  2005    0.002    0.000    0.002    0.000 {isinstance}
   200    0.001    0.000    0.001    0.000 {method 'sort' of 'numpy.ndarray' objects}
   200    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
   200    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/stats/_support.py:212(_chk_asarray)
   201    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
   400    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
   200    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
   211    0.000    0.000    0.000    0.000 {len}
     4    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
    10    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     1    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
     4    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/neighbors/base.py:44(_get_weights)
     1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/base.py
Function: fit at line 559
Total time: 0.000765 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   559                                               def fit(self, X, y):
   560                                                   """Fit the model using X as training data and y as target values
   561
   562                                                   Parameters
   563                                                   ----------
   564                                                   X : {array-like, sparse matrix, BallTree, cKDTree}
   565                                                       Training data. If array or matrix, then the shape
   566                                                       is [n_samples, n_features]
   567
   568                                                   y : {array-like, sparse matrix}, shape = [n_samples]
   569                                                       Target values, array of integer values.
   570                                                   """
   571         1           71     71.0      9.3          X, y = check_arrays(X, y, sparse_format="csr")
   572         1            2      2.0      0.3          self._y = y
   573         1           75     75.0      9.8          self._classes = np.sort(np.unique(y))
   574         1          617    617.0     80.7          return self._fit(X)
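
The Per Hit and % Time columns follow directly from Hits, Time and the function total. As a minimal sketch (the helper name `derive_columns` is hypothetical and is not part of line_profiler), the four executed lines of `fit` above can be recomputed as follows:

```python
def derive_columns(rows):
    """rows: list of (line_no, hits, time) with time in timer units (1e-06 s here)."""
    total = sum(time for _, _, time in rows)
    table = []
    for line_no, hits, time in rows:
        per_hit = time / hits          # "Per Hit" column
        pct = 100.0 * time / total     # "% Time" column
        table.append((line_no, "%.1f" % per_hit, "%.1f" % pct))
    return total, table


# Hits and Time copied from the executed lines of fit (lines 571-574).
fit_rows = [(571, 1, 71), (572, 1, 2), (573, 1, 75), (574, 1, 617)]
total_units, fit_table = derive_columns(fit_rows)

for row in fit_table:
    print(row)
print("total timer units:", total_units)
```

The total of 765 timer units at 1e-06 s per unit matches the reported total time of 0.000765 s.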

File: /tmp/vb_sklearn/sklearn/neighbors/classification.py
Function: predict at line 276
Total time: 0.141574 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   276                                               def predict(self, X):
   277                                                   """Predict the class labels for the provided data
   278
   279                                                   Parameters
   280                                                   ----------
   281                                                   X: array
   282                                                       A 2-D array representing the test points.
   283
   284                                                   Returns
   285                                                   -------
   286                                                   labels: array
   287                                                       List of class labels (one for each data sample).
   288                                                   """
   289         1           83     83.0      0.1          X = atleast2d_or_csr(X)
   290
   291         1        13462  13462.0      9.5          neigh_dist, neigh_ind = self.radius_neighbors(X)
   292       201         1471      7.3      1.0          pred_labels = [self._y[ind] for ind in neigh_ind]
   293
   294         1            2      2.0      0.0          if self.outlier_label:
   295                                                       outlier_label = np.array((self.outlier_label, ))
   296                                                       small_value = np.array((1e-6, ))
   297                                                       for i, pl in enumerate(pred_labels):
   298                                                           # Check that all have at least 1 neighbor
   299                                                           if len(pl) < 1:
   300                                                               pred_labels[i] = outlier_label
   301                                                               neigh_dist[i] = small_value
   302                                                   else:
   303       201          289      1.4      0.2              for pl in pred_labels:
   304                                                           # Check that all have at least 1 neighbor
   305       200          315      1.6      0.2                  if len(pl) < 1:
   306                                                               raise ValueError('no neighbors found for a test sample, '
   307                                                                                'you can try using larger radius, '
   308                                                                                'give a label for outliers, '
   309                                                                                'or consider removing them in your '
   310                                                                                'dataset')
   311
   312         1            6      6.0      0.0          weights = _get_weights(neigh_dist, self.weights)
   313
   314         1            2      2.0      0.0          if weights is None:
   315       201       125799    625.9     88.9              mode = np.asarray([stats.mode(pl)[0] for pl in pred_labels],
   316         1          121    121.0      0.1                                dtype=np.int)
   317                                                   else:
   318                                                       mode = np.asarray([weighted_mode(pl, w)[0]
   319                                                                          for (pl, w) in zip(pred_labels, weights)],
   320                                                                         dtype=np.int)
   321
   322         1           24     24.0      0.0          return mode.flatten().astype(np.int)
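
The profile above attributes about 89% of `predict` time to the per-sample `stats.mode` call on line 315. The following is a rough, dependency-free sketch of what that unweighted voting step computes. The names `majority_label` and `vote` are hypothetical, and breaking ties toward the smallest label is an assumption of this sketch, not something the listing confirms.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label; ties go to the smallest label (assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def vote(neighbor_labels):
    """neighbor_labels: one list of neighbor labels per test sample."""
    for labels in neighbor_labels:
        # Mirrors the check on line 305: every sample needs at least one neighbor.
        if len(labels) < 1:
            raise ValueError("no neighbors found for a test sample")
    return [majority_label(labels) for labels in neighbor_labels]


predictions = vote([[1, 1, 0], [2, 0, 2, 0], [3]])
print(predictions)  # [1, 0, 3]
```

Counting labels once per sample, as here, avoids the repeated array conversions that show up as `asarray` and `unique` calls in the cProfile listing. Whether it would be faster in the benchmarked code is not measured here.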

NearestCentroid-madelon

Benchmark setup

from sklearn.neighbors import NearestCentroid
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('madelon')
obj = NearestCentroid(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/NearestCentroid-madelon-step0-timing.png

Memory usage

_images/NearestCentroid-madelon-step0-memory.png

Additional output

cProfile

         57 function calls in 0.057 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.057    0.057 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.057    0.057 <f>:1(<module>)
     1    0.037    0.037    0.057    0.057 /tmp/vb_sklearn/sklearn/neighbors/nearest_centroid.py:74(fit)
     2    0.020    0.010    0.020    0.010 {method 'mean' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
    10    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     5    0.000    0.000    0.000    0.000 {isinstance}
     6    0.000    0.000    0.000    0.000 {hasattr}
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
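
A listing in this format can be produced with the standard library `cProfile` and `pstats` modules. The sketch below is only an illustration: `work` is a hypothetical stand-in for the benchmark statement, and this is not the code vbench itself uses.

```python
import cProfile
import io
import pstats


def work(n=10000):
    """Hypothetical workload standing in for the benchmark statement."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Render the statistics sorted by cumulative time, as in the listing above.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost callers first, which is why the benchmark wrapper and `fit` lead the table even when their own `tottime` is small.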

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/nearest_centroid.py
Function: fit at line 74
Total time: 0.05637 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    74                                               def fit(self, X, y):
    75                                                   """
    76                                                   Fit the NearestCentroid model according to the given training data.
    77
    78                                                   Parameters
    79                                                   ----------
    80                                                   X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    81                                                       Training vector, where n_samples is the number of samples and
    82                                                       n_features is the number of features.
    83                                                       Note that centroid shrinking cannot be used with sparse matrices.
    84                                                   y : array, shape = [n_samples]
    85                                                       Target values (integers)
    86                                                   """
    87         1           88     88.0      0.2          X, y = check_arrays(X, y, sparse_format="csr")
    88         1           12     12.0      0.0          if sp.issparse(X) and self.shrink_threshold:
    89                                                       raise ValueError("threshold shrinking not supported"
    90                                                                        " for sparse input")
    91
    92         1            2      2.0      0.0          n_samples, n_features = X.shape
    93         1          129    129.0      0.2          classes = np.unique(y)
    94         1            3      3.0      0.0          self.classes_ = classes
    95         1            2      2.0      0.0          n_classes = classes.size
    96         1            2      2.0      0.0          if n_classes < 2:
    97                                                       raise ValueError('y has less than 2 classes')
    98
    99                                                   # Mask mapping each class to its members.
   100         1            7      7.0      0.0          self.centroids_ = np.empty((n_classes, n_features), dtype=np.float64)
   101         3           23      7.7      0.0          for i, cur_class in enumerate(classes):
   102         2           31     15.5      0.1              center_mask = y == cur_class
   103         2           44     22.0      0.1              if sp.issparse(X):
   104                                                           center_mask = np.where(center_mask)[0]
   105         2        56023  28011.5     99.4              self.centroids_[i] = X[center_mask].mean(axis=0)
   106
   107         1            2      2.0      0.0          if self.shrink_threshold:
   108                                                       dataset_centroid_ = np.array(X.mean(axis=0))[0]
   109                                                       # Number of clusters in each class.
   110                                                       nk = np.array([np.sum(classes == cur_class)
   111                                                                      for cur_class in classes])
   112                                                       # m parameter for determining deviation
   113                                                       m = np.sqrt((1. / nk) + (1. / n_samples))
   114                                                       # Calculate deviation using the standard deviation of centroids.
   115                                                       variance = np.array(np.power(X - self.centroids_[y], 2))
   116                                                       variance = variance.sum(axis=0)
   117                                                       s = np.sqrt(variance / (n_samples - n_classes))
   118                                                       s += np.median(s)  # To deter outliers from affecting the results.
   119                                                       mm = m.reshape(len(m), 1)  # Reshape to allow broadcasting.
   120                                                       ms = mm * s
   121                                                       deviation = ((self.centroids_ - dataset_centroid_) / ms)
   122                                                       # Soft thresholding: if the deviation crosses 0 during shrinking,
   123                                                       # it becomes zero.
   124                                                       signs = np.sign(deviation)
   125                                                       deviation = (np.abs(deviation) - self.shrink_threshold)
   126                                                       deviation[deviation < 0] = 0
   127                                                       deviation = np.multiply(deviation, signs)
   128                                                       # Now adjust the centroids using the deviation
   129                                                       msd = np.multiply(ms, deviation)
   130                                                       self.centroids_ = np.array([dataset_centroid_ + msd[i]
   131                                                                                   for i in xrange(n_classes)])
   132         1            2      2.0      0.0          return self
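
Lines 107 to 131 record no hits because this benchmark passes no `shrink_threshold` (`kwargs = {}`). For reference, the soft-thresholding step on lines 124 to 127 can be sketched in plain Python as below. The function name `soft_threshold` and the sample values are hypothetical, and the sketch works on a flat list where the listing uses NumPy arrays.

```python
import math


def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`.

    Values whose magnitude does not exceed the threshold become 0.0,
    matching "if the deviation crosses 0 during shrinking, it becomes zero".
    """
    shrunk = []
    for value in values:
        magnitude = abs(value) - threshold
        if magnitude > 0:
            shrunk.append(math.copysign(magnitude, value))  # restore the sign
        else:
            shrunk.append(0.0)
    return shrunk


deviation = [3.0, -0.5, -2.0, 0.25]
shrunk_deviation = soft_threshold(deviation, 1.0)
print(shrunk_deviation)  # [2.0, 0.0, -1.0, 0.0]
```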

File: /tmp/vb_sklearn/sklearn/neighbors/nearest_centroid.py
Function: predict at line 134
Total time: 0 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   134                                               def predict(self, X):
   135                                                   """Perform classification on an array of test vectors X.
   136
   137                                                   The predicted class C for each sample in X is returned.
   138
   139                                                   Parameters
   140                                                   ----------
   141                                                   X : array-like, shape = [n_samples, n_features]
   142
   143                                                   Returns
   144                                                   -------
   145                                                   C : array, shape = [n_samples]
   146
   147                                                   Notes
   148                                                   -----
   149                                                   If the metric constructor parameter is "precomputed", X is assumed to
   150                                                   be the distance matrix between the data to be predicted and
   151                                                   ``self.centroids_``.
   152                                                   """
   153                                                   X = atleast2d_or_csr(X)
   154                                                   if not hasattr(self, "centroids_"):
   155                                                       raise AttributeError("Model has not been trained yet.")
   156                                                   return self.classes_[pairwise_distances(
   157                                                       X, self.centroids_, metric=self.metric).argmin(axis=1)]

Benchmark statement

obj.predict(X_t)

Execution time

_images/NearestCentroid-madelon-step1-timing.png

Memory usage

_images/NearestCentroid-madelon-step1-memory.png

Additional output

cProfile

         128 function calls in 0.004 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.004    0.004 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    0.004    0.004 <f>:1(<module>)
     1    0.000    0.000    0.004    0.004 /tmp/vb_sklearn/sklearn/neighbors/nearest_centroid.py:134(predict)
     1    0.000    0.000    0.004    0.004 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:404(pairwise_distances)
     1    0.001    0.001    0.004    0.004 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:101(euclidean_distances)
     7    0.002    0.000    0.002    0.000 {method 'sum' of 'numpy.ndarray' objects}
     5    0.000    0.000    0.002    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:10(assert_all_finite)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/utils/extmath.py:70(safe_sparse_dot)
     1    0.001    0.001    0.001    0.001 {numpy.core._dotblas.dot}
     3    0.000    0.000    0.001    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:70(atleast2d_or_csr)
     1    0.000    0.000    0.001    0.001 /tmp/vb_sklearn/sklearn/metrics/pairwise.py:52(check_pairwise_arrays)
     2    0.000    0.000    0.001    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:23(safe_asarray)
     2    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    24    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     1    0.000    0.000    0.000    0.000 {method 'argmin' of 'numpy.ndarray' objects}
     8    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
     5    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     3    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
    14    0.000    0.000    0.000    0.000 {isinstance}
     3    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
    11    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/neighbors/nearest_centroid.py
Function: predict at line 134
Total time: 0.004384 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   134                                               def predict(self, X):
   135                                                   """Perform classification on an array of test vectors X.
   136
   137                                                   The predicted class C for each sample in X is returned.
   138
   139                                                   Parameters
   140                                                   ----------
   141                                                   X : array-like, shape = [n_samples, n_features]
   142
   143                                                   Returns
   144                                                   -------
   145                                                   C : array, shape = [n_samples]
   146
   147                                                   Notes
   148                                                   -----
   149                                                   If the metric constructor parameter is "precomputed", X is assumed to
   150                                                   be the distance matrix between the data to be predicted and
   151                                                   ``self.centroids_``.
   152                                                   """
   153         1          614    614.0     14.0          X = atleast2d_or_csr(X)
   154         1            3      3.0      0.1          if not hasattr(self, "centroids_"):
   155                                                       raise AttributeError("Model has not been trained yet.")
   156         1            2      2.0      0.0          return self.classes_[pairwise_distances(
   157         1         3765   3765.0     85.9              X, self.centroids_, metric=self.metric).argmin(axis=1)]
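
The two profiled methods amount to a simple rule: `fit` stores one mean vector per class, and `predict` returns the class whose centroid is closest. A minimal, dependency-free sketch of that rule follows. The names `fit_centroids` and `predict_labels` and the toy data are hypothetical, and squared Euclidean distance is assumed as the metric.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: per-feature mean of the rows carrying that label}."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_labels(centroids, X):
    """Assign each row to the label of its nearest centroid."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [
        min(centroids, key=lambda label: squared_distance(row, centroids[label]))
        for row in X
    ]


X_train = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
y_train = [0, 0, 1, 1]
centroids = fit_centroids(X_train, y_train)
predictions = predict_labels(centroids, [[1.0, 1.0], [9.0, 11.0]])
print(centroids)    # {0: [0.0, 1.0], 1: [11.0, 10.0]}
print(predictions)  # [0, 1]
```

Prediction only compares each test row against one centroid per class, which is consistent with `predict` taking about 4 ms here while `fit` spends 99.4% of its 56 ms computing the class means.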
