Benchmarks for ensemble¶

RandomForestClassifier-arcene¶

Benchmark setup

from sklearn.ensemble import RandomForestClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('arcene')
obj = RandomForestClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RandomForestClassifier-arcene-step0-timing.png

Memory usage

_images/RandomForestClassifier-arcene-step0-memory.png

Additional output

cProfile

         4810 function calls (4660 primitive calls) in 2.108 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    2.109    2.109 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    2.109    2.109 <f>:1(<module>)
     1    0.000    0.000    2.108    2.108 /tmp/vb_sklearn/sklearn/ensemble/forest.py:211(fit)
     1    0.000    0.000    2.097    2.097 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:430(__call__)
     1    0.000    0.000    2.097    2.097 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:292(dispatch)
     1    0.000    0.000    2.096    2.096 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:121(__init__)
     1    0.627    0.627    2.096    2.096 /tmp/vb_sklearn/sklearn/ensemble/forest.py:62(_parallel_build_trees)
    10    0.002    0.000    1.433    0.143 /tmp/vb_sklearn/sklearn/tree/tree.py:171(fit)
    10    0.584    0.058    1.429    0.143 {method 'build' of 'sklearn.tree._tree.Tree' objects}
    18    0.000    0.000    0.843    0.047 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
    18    0.843    0.047    0.843    0.047 {method 'argsort' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.035    0.003 /tmp/vb_sklearn/sklearn/ensemble/base.py:49(_make_estimator)
 90/10    0.001    0.000    0.033    0.003 /tmp/vb_sklearn/sklearn/base.py:16(clone)
150/80    0.001    0.000    0.029    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:145(deepcopy)
    10    0.000    0.000    0.027    0.003 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:306(_reconstruct)
    10    0.025    0.003    0.025    0.003 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/random/__init__.py:93(__RandomState_ctor)
    65    0.012    0.000    0.012    0.000 {numpy.core.multiarray.array}
    23    0.000    0.000    0.011    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.000    0.000    0.011    0.011 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
    50    0.001    0.000    0.004    0.000 /tmp/vb_sklearn/sklearn/base.py:193(get_params)
    30    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:211(set_params)
    50    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:164(_get_param_names)
    50    0.000    0.000    0.002    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:801(getargspec)
    50    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:741(getargs)
    11    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
    83    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    20    0.000    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:234(_deepcopy_tuple)
    83    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
   150    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:267(_keep_alive)
   348    0.001    0.000    0.001    0.000 {hasattr}
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
   587    0.000    0.000    0.000    0.000 {getattr}
    11    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
    10    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
   395    0.000    0.000    0.000    0.000 {isinstance}
    18    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:436(__init__)
   256    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    18    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
    21    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:181(check_random_state)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:144(__init__)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:281(<genexpr>)
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
    10    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
    50    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
    10    0.000    0.000    0.000    0.000 {method '__setstate__' of 'mtrand.RandomState' objects}
    21    0.000    0.000    0.000    0.000 {method 'randint' of 'mtrand.RandomState' objects}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:67(ismethod)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:101(delayed)
   410    0.000    0.000    0.000    0.000 {id}
    80    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/base.py:56(<genexpr>)
   310    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/function_base.py:781(copy)
    11    0.000    0.000    0.000    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
    50    0.000    0.000    0.000    0.000 <string>:8(__new__)
    70    0.000    0.000    0.000    0.000 {range}
   100    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x781380}
    20    0.000    0.000    0.000    0.000 {max}
   168    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
    11    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
    18    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
    93    0.000    0.000    0.000    0.000 {setattr}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:142(isfunction)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:209(iscode)
    10    0.000    0.000    0.000    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
    51    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
    18    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
   159    0.000    0.000    0.000    0.000 {len}
    11    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {cPickle.dumps}
   110    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:198(_deepcopy_atomic)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:39(short_format_time)
     1    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:17(update_wrapper)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:390(retrieve)
    20    0.000    0.000    0.000    0.000 {issubclass}
    40    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:123(_partition_trees)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:23(_squeeze_time)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:289(ascontiguousarray)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:282(__init__)
     1    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:39(wraps)
     2    0.000    0.000    0.000    0.000 {time.time}
     1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:337(_print)
     1    0.000    0.000    0.000    0.000 {min}
     1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:49(_verbosity_filter)
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:108(delayed_function)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:126(get)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/forest.py
Function: fit at line 211
Total time: 2.21975 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   211                                               def fit(self, X, y):
   212                                                   """Build a forest of trees from the training set (X, y).
   213
   214                                                   Parameters
   215                                                   ----------
   216                                                   X : array-like of shape = [n_samples, n_features]
   217                                                       The training input samples.
   218
   219                                                   y : array-like, shape = [n_samples] or [n_samples, n_outputs]
   220                                                       The target values (integers that correspond to classes in
   221                                                       classification, real numbers in regression).
   222
   223                                                   Returns
   224                                                   -------
   225                                                   self : object
   226                                                       Returns self.
   227                                                   """
   228                                                   # Precompute some data
   229         1          152    152.0      0.0          X, y = check_arrays(X, y, sparse_format="dense")
   230         1           12     12.0      0.0          if getattr(X, "dtype", None) != DTYPE or \
   231                                                      X.ndim != 2 or not X.flags.fortran:
   232         1         9328   9328.0      0.4              X = array2d(X, dtype=DTYPE, order="F")
   233
   234         1           13     13.0      0.0          n_samples, self.n_features_ = X.shape
   235
   236         1            7      7.0      0.0          if self.bootstrap:
   237         1            6      6.0      0.0              sample_mask = None
   238         1            7      7.0      0.0              X_argsorted = None
   239
   240                                                   else:
   241                                                       if self.oob_score:
   242                                                           raise ValueError("Out of bag estimation only available"
   243                                                                   " if bootstrap=True")
   244
   245                                                       sample_mask = np.ones((n_samples,), dtype=np.bool)
   246
   247                                                       n_jobs, _, starts = _partition_features(self, self.n_features_)
   248
   249                                                       all_X_argsorted = Parallel(n_jobs=n_jobs)(
   250                                                           delayed(_parallel_X_argsort)(
   251                                                               X[:, starts[i]:starts[i + 1]])
   252                                                           for i in xrange(n_jobs))
   253
   254                                                       X_argsorted = np.asfortranarray(np.hstack(all_X_argsorted))
   255
   256         1           47     47.0      0.0          y = np.atleast_1d(y)
   257         1            7      7.0      0.0          if y.ndim == 1:
   258         1           19     19.0      0.0              y = y[:, np.newaxis]
   259
   260         1            7      7.0      0.0          self.classes_ = []
   261         1            7      7.0      0.0          self.n_classes_ = []
   262         1            8      8.0      0.0          self.n_outputs_ = y.shape[1]
   263
   264         1           11     11.0      0.0          if isinstance(self.base_estimator, ClassifierMixin):
   265         1           29     29.0      0.0              y = np.copy(y)
   266
   267         2           19      9.5      0.0              for k in xrange(self.n_outputs_):
   268         1          129    129.0      0.0                  unique = np.unique(y[:, k])
   269         1            9      9.0      0.0                  self.classes_.append(unique)
   270         1            9      9.0      0.0                  self.n_classes_.append(unique.shape[0])
   271         1           36     36.0      0.0                  y[:, k] = np.searchsorted(unique, y[:, k])
   272
   273         1           12     12.0      0.0          if getattr(y, "dtype", None) != DTYPE or not y.flags.contiguous:
   274         1           21     21.0      0.0              y = np.ascontiguousarray(y, dtype=DOUBLE)
   275
   276                                                   # Assign chunk of trees to jobs
   277         1           34     34.0      0.0          n_jobs, n_trees, _ = _partition_trees(self)
   278
   279                                                   # Parallel loop
   280         1           25     25.0      0.0          all_trees = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
   281         1            8      8.0      0.0              delayed(_parallel_build_trees)(
   282                                                           n_trees[i],
   283                                                           self,
   284                                                           X,
   285                                                           y,
   286                                                           sample_mask,
   287                                                           X_argsorted,
   288                                                           self.random_state.randint(MAX_INT),
   289                                                           verbose=self.verbose)
   290         1      2209689 2209689.0     99.5              for i in xrange(n_jobs))
   291
   292                                                   # Reduce
   293        11           81      7.4      0.0          self.estimators_ = [tree for tree in itertools.chain(*all_trees)]
   294
   295                                                   # Calculate out of bag predictions and score
   296         1            6      6.0      0.0          if self.oob_score:
   297                                                       if isinstance(self, ClassifierMixin):
   298                                                           self.oob_decision_function_ = []
   299                                                           self.oob_score_ = 0.0
   300
   301                                                           predictions = []
   302                                                           for k in xrange(self.n_outputs_):
   303                                                               predictions.append(np.zeros((n_samples,
   304                                                                                            self.n_classes_[k])))
   305
   306                                                           for estimator in self.estimators_:
   307                                                               mask = np.ones(n_samples, dtype=np.bool)
   308                                                               mask[estimator.indices_] = False
   309
   310                                                               p_estimator = estimator.predict_proba(X[mask, :])
   311                                                               if self.n_outputs_ == 1:
   312                                                                   p_estimator = [p_estimator]
   313
   314                                                               for k in xrange(self.n_outputs_):
   315                                                                   predictions[k][mask, :] += p_estimator[k]
   316
   317                                                           for k in xrange(self.n_outputs_):
   318                                                               if (predictions[k].sum(axis=1) == 0).any():
   319                                                                   warn("Some inputs do not have OOB scores. "
   320                                                                        "This probably means too few trees were used "
   321                                                                        "to compute any reliable oob estimates.")
   322                                                               decision = predictions[k] \
   323                                                                          / predictions[k].sum(axis=1)[:, np.newaxis]
   324                                                               self.oob_decision_function_.append(decision)
   325
   326                                                               self.oob_score_ += np.mean(y[:, k] \
   327                                                                                  == np.argmax(predictions[k], axis=1))
   328
   329                                                           if self.n_outputs_ == 1:
   330                                                               self.oob_decision_function_ = \
   331                                                                   self.oob_decision_function_[0]
   332
   333                                                           self.oob_score_ /= self.n_outputs_
   334
   335                                                       else:
   336                                                           # Regression:
   337                                                           predictions = np.zeros((n_samples, self.n_outputs_))
   338                                                           n_predictions = np.zeros((n_samples, self.n_outputs_))
   339
   340                                                           for estimator in self.estimators_:
   341                                                               mask = np.ones(n_samples, dtype=np.bool)
   342                                                               mask[estimator.indices_] = False
   343
   344                                                               p_estimator = estimator.predict(X[mask, :])
   345                                                               if self.n_outputs_ == 1:
   346                                                                   p_estimator = p_estimator[:, np.newaxis]
   347
   348                                                               predictions[mask, :] += p_estimator
   349                                                               n_predictions[mask, :] += 1
   350                                                           if (n_predictions == 0).any():
   351                                                               warn("Some inputs do not have OOB scores. "
   352                                                                    "This probably means too few trees were used "
   353                                                                    "to compute any reliable oob estimates.")
   354                                                               n_predictions[n_predictions == 0] = 1
   355                                                           predictions /= n_predictions
   356
   357                                                           self.oob_prediction_ = predictions
   358                                                           if self.n_outputs_ == 1:
   359                                                               self.oob_prediction_ = \
   360                                                                   self.oob_prediction_.reshape((n_samples, ))
   361
   362                                                           self.oob_score_ = 0.0
   363                                                           for k in xrange(self.n_outputs_):
   364                                                               self.oob_score_ += r2_score(y[:, k], predictions[:, k])
   365                                                           self.oob_score_ /= self.n_outputs_
   366
   367                                                   # Sum the importances
   368         1            6      6.0      0.0          if self.compute_importances:
   369                                                       self.feature_importances_ = \
   370                                                           sum(tree.feature_importances_ for tree in self.estimators_) \
   371                                                           / self.n_estimators
   372
   373         1            6      6.0      0.0          return self

RandomForestClassifier-madelon¶

Benchmark setup

from sklearn.ensemble import RandomForestClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('madelon')
obj = RandomForestClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/RandomForestClassifier-madelon-step0-timing.png

Memory usage

_images/RandomForestClassifier-madelon-step0-memory.png

Additional output

cProfile

         8912 function calls (8762 primitive calls) in 6.091 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    6.091    6.091 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    6.091    6.091 <f>:1(<module>)
     1    0.000    0.000    6.091    6.091 /tmp/vb_sklearn/sklearn/ensemble/forest.py:211(fit)
     1    0.000    0.000    6.080    6.080 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:430(__call__)
     1    0.000    0.000    6.080    6.080 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:292(dispatch)
     1    0.000    0.000    6.079    6.079 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:121(__init__)
     1    0.841    0.841    6.079    6.079 /tmp/vb_sklearn/sklearn/ensemble/forest.py:62(_parallel_build_trees)
    10    0.002    0.000    5.201    0.520 /tmp/vb_sklearn/sklearn/tree/tree.py:171(fit)
    10    2.390    0.239    5.194    0.519 {method 'build' of 'sklearn.tree._tree.Tree' objects}
   604    0.003    0.000    2.792    0.005 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
   604    2.789    0.005    2.789    0.005 {method 'argsort' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.035    0.003 /tmp/vb_sklearn/sklearn/ensemble/base.py:49(_make_estimator)
 90/10    0.001    0.000    0.033    0.003 /tmp/vb_sklearn/sklearn/base.py:16(clone)
150/80    0.001    0.000    0.029    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:145(deepcopy)
    10    0.000    0.000    0.027    0.003 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:306(_reconstruct)
    10    0.025    0.003    0.025    0.003 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/random/__init__.py:93(__RandomState_ctor)
   651    0.014    0.000    0.014    0.000 {numpy.core.multiarray.array}
    23    0.000    0.000    0.010    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.000    0.000    0.010    0.010 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
   604    0.002    0.000    0.006    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
   604    0.002    0.000    0.006    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
    50    0.001    0.000    0.004    0.000 /tmp/vb_sklearn/sklearn/base.py:193(get_params)
    30    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:211(set_params)
    50    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:164(_get_param_names)
    11    0.001    0.000    0.003    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
   604    0.002    0.000    0.002    0.000 {method 'fill' of 'numpy.ndarray' objects}
    50    0.000    0.000    0.002    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:801(getargspec)
   604    0.002    0.000    0.002    0.000 {numpy.core.multiarray.empty}
    11    0.001    0.000    0.001    0.000 {method 'sort' of 'numpy.ndarray' objects}
    50    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:741(getargs)
    20    0.000    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:234(_deepcopy_tuple)
    83    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    11    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
    11    0.001    0.000    0.001    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
    83    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
    21    0.001    0.000    0.001    0.000 {method 'randint' of 'mtrand.RandomState' objects}
   150    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:267(_keep_alive)
   348    0.001    0.000    0.001    0.000 {hasattr}
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
   587    0.000    0.000    0.000    0.000 {getattr}
    10    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
    11    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
   395    0.000    0.000    0.000    0.000 {isinstance}
    10    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:436(__init__)
   256    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    21    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:181(check_random_state)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:144(__init__)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:281(<genexpr>)
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/function_base.py:781(copy)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:67(ismethod)
    10    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
    50    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:101(delayed)
    80    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/base.py:56(<genexpr>)
   310    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
    50    0.000    0.000    0.000    0.000 <string>:8(__new__)
   168    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
   410    0.000    0.000    0.000    0.000 {id}
    10    0.000    0.000    0.000    0.000 {method '__setstate__' of 'mtrand.RandomState' objects}
    70    0.000    0.000    0.000    0.000 {range}
   100    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x781380}
    93    0.000    0.000    0.000    0.000 {setattr}
    11    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.000    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:142(isfunction)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:209(iscode)
    51    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
    20    0.000    0.000    0.000    0.000 {max}
   159    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {cPickle.dumps}
   110    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:198(_deepcopy_atomic)
     1    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:17(update_wrapper)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:39(short_format_time)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
    20    0.000    0.000    0.000    0.000 {issubclass}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:390(retrieve)
    40    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:123(_partition_trees)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:23(_squeeze_time)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:282(__init__)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:289(ascontiguousarray)
     1    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:39(wraps)
     2    0.000    0.000    0.000    0.000 {time.time}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:108(delayed_function)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:337(_print)
     1    0.000    0.000    0.000    0.000 {min}
     1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:49(_verbosity_filter)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:126(get)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/forest.py
Function: fit at line 211
Total time: 6.15436 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   211                                               def fit(self, X, y):
   212                                                   """Build a forest of trees from the training set (X, y).
   213
   214                                                   Parameters
   215                                                   ----------
   216                                                   X : array-like of shape = [n_samples, n_features]
   217                                                       The training input samples.
   218
   219                                                   y : array-like, shape = [n_samples] or [n_samples, n_outputs]
   220                                                       The target values (integers that correspond to classes in
   221                                                       classification, real numbers in regression).
   222
   223                                                   Returns
   224                                                   -------
   225                                                   self : object
   226                                                       Returns self.
   227                                                   """
   228                                                   # Precompute some data
   229         1          131    131.0      0.0          X, y = check_arrays(X, y, sparse_format="dense")
   230         1           11     11.0      0.0          if getattr(X, "dtype", None) != DTYPE or \
   231                                                      X.ndim != 2 or not X.flags.fortran:
   232         1        20345  20345.0      0.3              X = array2d(X, dtype=DTYPE, order="F")
   233
   234         1           13     13.0      0.0          n_samples, self.n_features_ = X.shape
   235
   236         1            7      7.0      0.0          if self.bootstrap:
   237         1            7      7.0      0.0              sample_mask = None
   238         1            7      7.0      0.0              X_argsorted = None
   239
   240                                                   else:
   241                                                       if self.oob_score:
   242                                                           raise ValueError("Out of bag estimation only available"
   243                                                                   " if bootstrap=True")
   244
   245                                                       sample_mask = np.ones((n_samples,), dtype=np.bool)
   246
   247                                                       n_jobs, _, starts = _partition_features(self, self.n_features_)
   248
   249                                                       all_X_argsorted = Parallel(n_jobs=n_jobs)(
   250                                                           delayed(_parallel_X_argsort)(
   251                                                               X[:, starts[i]:starts[i + 1]])
   252                                                           for i in xrange(n_jobs))
   253
   254                                                       X_argsorted = np.asfortranarray(np.hstack(all_X_argsorted))
   255
   256         1           46     46.0      0.0          y = np.atleast_1d(y)
   257         1            8      8.0      0.0          if y.ndim == 1:
   258         1           18     18.0      0.0              y = y[:, np.newaxis]
   259
   260         1            7      7.0      0.0          self.classes_ = []
   261         1            6      6.0      0.0          self.n_classes_ = []
   262         1            8      8.0      0.0          self.n_outputs_ = y.shape[1]
   263
   264         1           12     12.0      0.0          if isinstance(self.base_estimator, ClassifierMixin):
   265         1           32     32.0      0.0              y = np.copy(y)
   266
   267         2           19      9.5      0.0              for k in xrange(self.n_outputs_):
   268         1          218    218.0      0.0                  unique = np.unique(y[:, k])
   269         1            9      9.0      0.0                  self.classes_.append(unique)
   270         1            9      9.0      0.0                  self.n_classes_.append(unique.shape[0])
   271         1          112    112.0      0.0                  y[:, k] = np.searchsorted(unique, y[:, k])
   272
   273         1           11     11.0      0.0          if getattr(y, "dtype", None) != DTYPE or not y.flags.contiguous:
   274         1           25     25.0      0.0              y = np.ascontiguousarray(y, dtype=DOUBLE)
   275
   276                                                   # Assign chunk of trees to jobs
   277         1           35     35.0      0.0          n_jobs, n_trees, _ = _partition_trees(self)
   278
   279                                                   # Parallel loop
   280         1           26     26.0      0.0          all_trees = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
   281         1            8      8.0      0.0              delayed(_parallel_build_trees)(
   282                                                           n_trees[i],
   283                                                           self,
   284                                                           X,
   285                                                           y,
   286                                                           sample_mask,
   287                                                           X_argsorted,
   288                                                           self.random_state.randint(MAX_INT),
   289                                                           verbose=self.verbose)
   290         1      6133133 6133133.0     99.7              for i in xrange(n_jobs))
   291
   292                                                   # Reduce
   293        11           80      7.3      0.0          self.estimators_ = [tree for tree in itertools.chain(*all_trees)]
   294
   295                                                   # Calculate out of bag predictions and score
   296         1            7      7.0      0.0          if self.oob_score:
   297                                                       if isinstance(self, ClassifierMixin):
   298                                                           self.oob_decision_function_ = []
   299                                                           self.oob_score_ = 0.0
   300
   301                                                           predictions = []
   302                                                           for k in xrange(self.n_outputs_):
   303                                                               predictions.append(np.zeros((n_samples,
   304                                                                                            self.n_classes_[k])))
   305
   306                                                           for estimator in self.estimators_:
   307                                                               mask = np.ones(n_samples, dtype=np.bool)
   308                                                               mask[estimator.indices_] = False
   309
   310                                                               p_estimator = estimator.predict_proba(X[mask, :])
   311                                                               if self.n_outputs_ == 1:
   312                                                                   p_estimator = [p_estimator]
   313
   314                                                               for k in xrange(self.n_outputs_):
   315                                                                   predictions[k][mask, :] += p_estimator[k]
   316
   317                                                           for k in xrange(self.n_outputs_):
   318                                                               if (predictions[k].sum(axis=1) == 0).any():
   319                                                                   warn("Some inputs do not have OOB scores. "
   320                                                                        "This probably means too few trees were used "
   321                                                                        "to compute any reliable oob estimates.")
   322                                                               decision = predictions[k] \
   323                                                                          / predictions[k].sum(axis=1)[:, np.newaxis]
   324                                                               self.oob_decision_function_.append(decision)
   325
   326                                                               self.oob_score_ += np.mean(y[:, k] \
   327                                                                                  == np.argmax(predictions[k], axis=1))
   328
   329                                                           if self.n_outputs_ == 1:
   330                                                               self.oob_decision_function_ = \
   331                                                                   self.oob_decision_function_[0]
   332
   333                                                           self.oob_score_ /= self.n_outputs_
   334
   335                                                       else:
   336                                                           # Regression:
   337                                                           predictions = np.zeros((n_samples, self.n_outputs_))
   338                                                           n_predictions = np.zeros((n_samples, self.n_outputs_))
   339
   340                                                           for estimator in self.estimators_:
   341                                                               mask = np.ones(n_samples, dtype=np.bool)
   342                                                               mask[estimator.indices_] = False
   343
   344                                                               p_estimator = estimator.predict(X[mask, :])
   345                                                               if self.n_outputs_ == 1:
   346                                                                   p_estimator = p_estimator[:, np.newaxis]
   347
   348                                                               predictions[mask, :] += p_estimator
   349                                                               n_predictions[mask, :] += 1
   350                                                           if (n_predictions == 0).any():
   351                                                               warn("Some inputs do not have OOB scores. "
   352                                                                    "This probably means too few trees were used "
   353                                                                    "to compute any reliable oob estimates.")
   354                                                               n_predictions[n_predictions == 0] = 1
   355                                                           predictions /= n_predictions
   356
   357                                                           self.oob_prediction_ = predictions
   358                                                           if self.n_outputs_ == 1:
   359                                                               self.oob_prediction_ = \
   360                                                                   self.oob_prediction_.reshape((n_samples, ))
   361
   362                                                           self.oob_score_ = 0.0
   363                                                           for k in xrange(self.n_outputs_):
   364                                                               self.oob_score_ += r2_score(y[:, k], predictions[:, k])
   365                                                           self.oob_score_ /= self.n_outputs_
   366
   367                                                   # Sum the importances
   368         1            7      7.0      0.0          if self.compute_importances:
   369                                                       self.feature_importances_ = \
   370                                                           sum(tree.feature_importances_ for tree in self.estimators_) \
   371                                                           / self.n_estimators
   372
   373         1            6      6.0      0.0          return self

ExtraTreesClassifier-arcene¶

Benchmark setup

from sklearn.ensemble import ExtraTreesClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('arcene')
obj = ExtraTreesClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/ExtraTreesClassifier-arcene-step0-timing.png

Memory usage

_images/ExtraTreesClassifier-arcene-step0-memory.png

Additional output

cProfile

         5077 function calls (4927 primitive calls) in 1.558 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    1.558    1.558 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    1.558    1.558 <f>:1(<module>)
     1    0.000    0.000    1.558    1.558 /tmp/vb_sklearn/sklearn/ensemble/forest.py:211(fit)
     2    0.000    0.000    1.545    0.773 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:430(__call__)
     2    0.000    0.000    1.545    0.772 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:292(dispatch)
     2    0.000    0.000    1.545    0.772 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:121(__init__)
     1    0.000    0.000    1.464    1.464 /tmp/vb_sklearn/sklearn/ensemble/forest.py:62(_parallel_build_trees)
    10    0.001    0.000    1.426    0.143 /tmp/vb_sklearn/sklearn/tree/tree.py:171(fit)
    10    1.301    0.130    1.423    0.142 {method 'build' of 'sklearn.tree._tree.Tree' objects}
    46    0.000    0.000    0.198    0.004 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
    46    0.198    0.004    0.198    0.004 {method 'argsort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.081    0.081 /tmp/vb_sklearn/sklearn/ensemble/forest.py:146(_parallel_X_argsort)
    10    0.000    0.000    0.035    0.003 /tmp/vb_sklearn/sklearn/ensemble/base.py:49(_make_estimator)
 90/10    0.001    0.000    0.033    0.003 /tmp/vb_sklearn/sklearn/base.py:16(clone)
150/80    0.001    0.000    0.029    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:145(deepcopy)
    10    0.000    0.000    0.027    0.003 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:306(_reconstruct)
    10    0.025    0.002    0.025    0.002 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/random/__init__.py:93(__RandomState_ctor)
    95    0.015    0.000    0.015    0.000 {numpy.core.multiarray.array}
    24    0.000    0.000    0.015    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.000    0.000    0.011    0.011 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
    50    0.001    0.000    0.004    0.000 /tmp/vb_sklearn/sklearn/base.py:193(get_params)
    30    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:211(set_params)
    50    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:164(_get_param_names)
    50    0.000    0.000    0.002    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:801(getargspec)
    12    0.002    0.000    0.002    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:228(hstack)
    50    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:741(getargs)
    11    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
    20    0.000    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:234(_deepcopy_tuple)
    83    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    83    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
   150    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:267(_keep_alive)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
   350    0.001    0.000    0.001    0.000 {hasattr}
    46    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
    46    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
   592    0.001    0.000    0.001    0.000 {getattr}
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
    10    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:660(__init__)
   395    0.000    0.000    0.000    0.000 {isinstance}
    10    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:436(__init__)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:101(delayed)
   256    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
    21    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:181(check_random_state)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:144(__init__)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:250(<genexpr>)
    46    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
    10    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
    46    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
    50    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
    80    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/base.py:56(<genexpr>)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:67(ismethod)
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:281(<genexpr>)
   310    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
   410    0.000    0.000    0.000    0.000 {id}
    10    0.000    0.000    0.000    0.000 {method '__setstate__' of 'mtrand.RandomState' objects}
   100    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x781380}
    50    0.000    0.000    0.000    0.000 <string>:8(__new__)
    70    0.000    0.000    0.000    0.000 {range}
     2    0.000    0.000    0.000    0.000 {cPickle.dumps}
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/function_base.py:781(copy)
    13    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
    11    0.000    0.000    0.000    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
    96    0.000    0.000    0.000    0.000 {setattr}
    11    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
   171    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:209(iscode)
    52    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:142(isfunction)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:390(retrieve)
    10    0.000    0.000    0.000    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:17(update_wrapper)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:39(short_format_time)
   184    0.000    0.000    0.000    0.000 {len}
   110    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:198(_deepcopy_atomic)
     1    0.000    0.000    0.000    0.000 {map}
    11    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    20    0.000    0.000    0.000    0.000 {max}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:151(_partition_features)
    40    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
    20    0.000    0.000    0.000    0.000 {issubclass}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:23(_squeeze_time)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:282(__init__)
    11    0.000    0.000    0.000    0.000 {method 'randint' of 'mtrand.RandomState' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:123(_partition_trees)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
     2    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:39(wraps)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:289(ascontiguousarray)
     4    0.000    0.000    0.000    0.000 {time.time}
     2    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
     2    0.000    0.000    0.000    0.000 {min}
     2    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:337(_print)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:49(_verbosity_filter)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:126(get)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:108(delayed_function)
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/forest.py
Function: fit at line 211
Total time: 1.56702 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   211                                               def fit(self, X, y):
   212                                                   """Build a forest of trees from the training set (X, y).
   213
   214                                                   Parameters
   215                                                   ----------
   216                                                   X : array-like of shape = [n_samples, n_features]
   217                                                       The training input samples.
   218
   219                                                   y : array-like, shape = [n_samples] or [n_samples, n_outputs]
   220                                                       The target values (integers that correspond to classes in
   221                                                       classification, real numbers in regression).
   222
   223                                                   Returns
   224                                                   -------
   225                                                   self : object
   226                                                       Returns self.
   227                                                   """
   228                                                   # Precompute some data
   229         1          133    133.0      0.0          X, y = check_arrays(X, y, sparse_format="dense")
   230         1           11     11.0      0.0          if getattr(X, "dtype", None) != DTYPE or \
   231                                                      X.ndim != 2 or not X.flags.fortran:
   232         1         9217   9217.0      0.6              X = array2d(X, dtype=DTYPE, order="F")
   233
   234         1           13     13.0      0.0          n_samples, self.n_features_ = X.shape
   235
   236         1            7      7.0      0.0          if self.bootstrap:
   237                                                       sample_mask = None
   238                                                       X_argsorted = None
   239
   240                                                   else:
   241         1            7      7.0      0.0              if self.oob_score:
   242                                                           raise ValueError("Out of bag estimation only available"
   243                                                                   " if bootstrap=True")
   244
   245         1           42     42.0      0.0              sample_mask = np.ones((n_samples,), dtype=np.bool)
   246
   247         1           38     38.0      0.0              n_jobs, _, starts = _partition_features(self, self.n_features_)
   248
   249         1           25     25.0      0.0              all_X_argsorted = Parallel(n_jobs=n_jobs)(
   250         1            9      9.0      0.0                  delayed(_parallel_X_argsort)(
   251                                                               X[:, starts[i]:starts[i + 1]])
   252         1        98773  98773.0      6.3                  for i in xrange(n_jobs))
   253
   254         1         1438   1438.0      0.1              X_argsorted = np.asfortranarray(np.hstack(all_X_argsorted))
   255
   256         1           37     37.0      0.0          y = np.atleast_1d(y)
   257         1            8      8.0      0.0          if y.ndim == 1:
   258         1           17     17.0      0.0              y = y[:, np.newaxis]
   259
   260         1            8      8.0      0.0          self.classes_ = []
   261         1            7      7.0      0.0          self.n_classes_ = []
   262         1            8      8.0      0.0          self.n_outputs_ = y.shape[1]
   263
   264         1           11     11.0      0.0          if isinstance(self.base_estimator, ClassifierMixin):
   265         1           30     30.0      0.0              y = np.copy(y)
   266
   267         2           18      9.0      0.0              for k in xrange(self.n_outputs_):
   268         1          116    116.0      0.0                  unique = np.unique(y[:, k])
   269         1            9      9.0      0.0                  self.classes_.append(unique)
   270         1            8      8.0      0.0                  self.n_classes_.append(unique.shape[0])
   271         1           36     36.0      0.0                  y[:, k] = np.searchsorted(unique, y[:, k])
   272
   273         1           11     11.0      0.0          if getattr(y, "dtype", None) != DTYPE or not y.flags.contiguous:
   274         1           22     22.0      0.0              y = np.ascontiguousarray(y, dtype=DOUBLE)
   275
   276                                                   # Assign chunk of trees to jobs
   277         1           38     38.0      0.0          n_jobs, n_trees, _ = _partition_trees(self)
   278
   279                                                   # Parallel loop
   280         1           23     23.0      0.0          all_trees = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
   281         1            9      9.0      0.0              delayed(_parallel_build_trees)(
   282                                                           n_trees[i],
   283                                                           self,
   284                                                           X,
   285                                                           y,
   286                                                           sample_mask,
   287                                                           X_argsorted,
   288                                                           self.random_state.randint(MAX_INT),
   289                                                           verbose=self.verbose)
   290         1      1456787 1456787.0     93.0              for i in xrange(n_jobs))
   291
   292                                                   # Reduce
   293        11           88      8.0      0.0          self.estimators_ = [tree for tree in itertools.chain(*all_trees)]
   294
   295                                                   # Calculate out of bag predictions and score
   296         1            7      7.0      0.0          if self.oob_score:
   297                                                       if isinstance(self, ClassifierMixin):
   298                                                           self.oob_decision_function_ = []
   299                                                           self.oob_score_ = 0.0
   300
   301                                                           predictions = []
   302                                                           for k in xrange(self.n_outputs_):
   303                                                               predictions.append(np.zeros((n_samples,
   304                                                                                            self.n_classes_[k])))
   305
   306                                                           for estimator in self.estimators_:
   307                                                               mask = np.ones(n_samples, dtype=np.bool)
   308                                                               mask[estimator.indices_] = False
   309
   310                                                               p_estimator = estimator.predict_proba(X[mask, :])
   311                                                               if self.n_outputs_ == 1:
   312                                                                   p_estimator = [p_estimator]
   313
   314                                                               for k in xrange(self.n_outputs_):
   315                                                                   predictions[k][mask, :] += p_estimator[k]
   316
   317                                                           for k in xrange(self.n_outputs_):
   318                                                               if (predictions[k].sum(axis=1) == 0).any():
   319                                                                   warn("Some inputs do not have OOB scores. "
   320                                                                        "This probably means too few trees were used "
   321                                                                        "to compute any reliable oob estimates.")
   322                                                               decision = predictions[k] \
   323                                                                          / predictions[k].sum(axis=1)[:, np.newaxis]
   324                                                               self.oob_decision_function_.append(decision)
   325
   326                                                               self.oob_score_ += np.mean(y[:, k] \
   327                                                                                  == np.argmax(predictions[k], axis=1))
   328
   329                                                           if self.n_outputs_ == 1:
   330                                                               self.oob_decision_function_ = \
   331                                                                   self.oob_decision_function_[0]
   332
   333                                                           self.oob_score_ /= self.n_outputs_
   334
   335                                                       else:
   336                                                           # Regression:
   337                                                           predictions = np.zeros((n_samples, self.n_outputs_))
   338                                                           n_predictions = np.zeros((n_samples, self.n_outputs_))
   339
   340                                                           for estimator in self.estimators_:
   341                                                               mask = np.ones(n_samples, dtype=np.bool)
   342                                                               mask[estimator.indices_] = False
   343
   344                                                               p_estimator = estimator.predict(X[mask, :])
   345                                                               if self.n_outputs_ == 1:
   346                                                                   p_estimator = p_estimator[:, np.newaxis]
   347
   348                                                               predictions[mask, :] += p_estimator
   349                                                               n_predictions[mask, :] += 1
   350                                                           if (n_predictions == 0).any():
   351                                                               warn("Some inputs do not have OOB scores. "
   352                                                                    "This probably means too few trees were used "
   353                                                                    "to compute any reliable oob estimates.")
   354                                                               n_predictions[n_predictions == 0] = 1
   355                                                           predictions /= n_predictions
   356
   357                                                           self.oob_prediction_ = predictions
   358                                                           if self.n_outputs_ == 1:
   359                                                               self.oob_prediction_ = \
   360                                                                   self.oob_prediction_.reshape((n_samples, ))
   361
   362                                                           self.oob_score_ = 0.0
   363                                                           for k in xrange(self.n_outputs_):
   364                                                               self.oob_score_ += r2_score(y[:, k], predictions[:, k])
   365                                                           self.oob_score_ /= self.n_outputs_
   366
   367                                                   # Sum the importances
   368         1            6      6.0      0.0          if self.compute_importances:
   369                                                       self.feature_importances_ = \
   370                                                           sum(tree.feature_importances_ for tree in self.estimators_) \
   371                                                           / self.n_estimators
   372
   373         1            6      6.0      0.0          return self

ExtraTreesClassifier-madelon¶

Benchmark setup

from sklearn.ensemble import ExtraTreesClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('madelon')
obj = ExtraTreesClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/ExtraTreesClassifier-madelon-step0-timing.png

Memory usage

_images/ExtraTreesClassifier-madelon-step0-memory.png

Additional output

cProfile

         15409 function calls (15259 primitive calls) in 5.432 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    5.432    5.432 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000    5.432    5.432 <f>:1(<module>)
     1    0.000    0.000    5.432    5.432 /tmp/vb_sklearn/sklearn/ensemble/forest.py:211(fit)
     2    0.000    0.000    5.419    2.710 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:430(__call__)
     2    0.000    0.000    5.419    2.709 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:292(dispatch)
     2    0.000    0.000    5.419    2.709 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:121(__init__)
     1    0.000    0.000    5.265    5.265 /tmp/vb_sklearn/sklearn/ensemble/forest.py:62(_parallel_build_trees)
    10    0.001    0.000    5.217    0.522 /tmp/vb_sklearn/sklearn/tree/tree.py:171(fit)
    10    3.795    0.379    5.212    0.521 {method 'build' of 'sklearn.tree._tree.Tree' objects}
  1522    0.006    0.000    1.540    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
  1522    1.534    0.001    1.534    0.001 {method 'argsort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.153    0.153 /tmp/vb_sklearn/sklearn/ensemble/forest.py:146(_parallel_X_argsort)
    10    0.000    0.000    0.046    0.005 /tmp/vb_sklearn/sklearn/ensemble/base.py:49(_make_estimator)
 90/10    0.001    0.000    0.044    0.004 /tmp/vb_sklearn/sklearn/base.py:16(clone)
150/80    0.001    0.000    0.040    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:145(deepcopy)
    10    0.000    0.000    0.038    0.004 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:306(_reconstruct)
    10    0.036    0.004    0.036    0.004 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/random/__init__.py:93(__RandomState_ctor)
  1571    0.021    0.000    0.021    0.000 {numpy.core.multiarray.array}
  1522    0.005    0.000    0.014    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
    24    0.000    0.000    0.014    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
  1522    0.005    0.000    0.012    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
     1    0.000    0.000    0.010    0.010 /tmp/vb_sklearn/sklearn/utils/validation.py:62(array2d)
  1522    0.005    0.000    0.005    0.000 {method 'fill' of 'numpy.ndarray' objects}
    50    0.001    0.000    0.004    0.000 /tmp/vb_sklearn/sklearn/base.py:193(get_params)
  1522    0.004    0.000    0.004    0.000 {numpy.core.multiarray.empty}
    30    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:211(set_params)
    50    0.000    0.000    0.003    0.000 /tmp/vb_sklearn/sklearn/base.py:164(_get_param_names)
    11    0.001    0.000    0.003    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
    50    0.000    0.000    0.002    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:801(getargspec)
    12    0.002    0.000    0.002    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.001    0.001 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:228(hstack)
    11    0.001    0.000    0.001    0.000 {method 'sort' of 'numpy.ndarray' objects}
    50    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:741(getargs)
    20    0.000    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:234(_deepcopy_tuple)
    83    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
    11    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
    11    0.001    0.000    0.001    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
    83    0.001    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
   150    0.001    0.000    0.001    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:267(_keep_alive)
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
   350    0.001    0.000    0.001    0.000 {hasattr}
    10    0.000    0.000    0.001    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
   592    0.001    0.000    0.001    0.000 {getattr}
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
    10    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:660(__init__)
    10    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
   395    0.000    0.000    0.000    0.000 {isinstance}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:101(delayed)
    21    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:181(check_random_state)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:436(__init__)
   256    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
    12    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:6(atleast_1d)
    10    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/tree/tree.py:144(__init__)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:250(<genexpr>)
    10    0.000    0.000    0.000    0.000 {method 'max' of 'numpy.ndarray' objects}
    11    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/function_base.py:781(copy)
    50    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:281(<genexpr>)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:67(ismethod)
    80    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/base.py:56(<genexpr>)
   310    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {cPickle.dumps}
    10    0.000    0.000    0.000    0.000 {method '__setstate__' of 'mtrand.RandomState' objects}
   100    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x781380}
    50    0.000    0.000    0.000    0.000 <string>:8(__new__)
    70    0.000    0.000    0.000    0.000 {range}
   410    0.000    0.000    0.000    0.000 {id}
    96    0.000    0.000    0.000    0.000 {setattr}
    13    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:237(asanyarray)
    11    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    52    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
   171    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:209(iscode)
    10    0.000    0.000    0.000    0.000 {method '__deepcopy__' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:390(retrieve)
    50    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/inspect.py:142(isfunction)
    20    0.000    0.000    0.000    0.000 {issubclass}
   184    0.000    0.000    0.000    0.000 {len}
     2    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:17(update_wrapper)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:39(short_format_time)
   110    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/copy.py:198(_deepcopy_atomic)
     1    0.000    0.000    0.000    0.000 {map}
    20    0.000    0.000    0.000    0.000 {max}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:151(_partition_features)
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
    40    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/forest.py:123(_partition_trees)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/logger.py:23(_squeeze_time)
    10    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/shape_base.py:58(atleast_2d)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:282(__init__)
    11    0.000    0.000    0.000    0.000 {method 'randint' of 'mtrand.RandomState' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:289(ascontiguousarray)
     2    0.000    0.000    0.000    0.000 /sp/lib/python/cpython-2.7.2/lib/python2.7/functools.py:39(wraps)
     4    0.000    0.000    0.000    0.000 {time.time}
     2    0.000    0.000    0.000    0.000 {min}
     2    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
     2    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:108(delayed_function)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:337(_print)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:49(_verbosity_filter)
     2    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/externals/joblib/parallel.py:126(get)
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/forest.py
Function: fit at line 211
Total time: 5.38112 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   211                                               def fit(self, X, y):
   212                                                   """Build a forest of trees from the training set (X, y).
   213
   214                                                   Parameters
   215                                                   ----------
   216                                                   X : array-like of shape = [n_samples, n_features]
   217                                                       The training input samples.
   218
   219                                                   y : array-like, shape = [n_samples] or [n_samples, n_outputs]
   220                                                       The target values (integers that correspond to classes in
   221                                                       classification, real numbers in regression).
   222
   223                                                   Returns
   224                                                   -------
   225                                                   self : object
   226                                                       Returns self.
   227                                                   """
   228                                                   # Precompute some data
   229         1          142    142.0      0.0          X, y = check_arrays(X, y, sparse_format="dense")
   230         1           11     11.0      0.0          if getattr(X, "dtype", None) != DTYPE or \
   231                                                      X.ndim != 2 or not X.flags.fortran:
   232         1        21090  21090.0      0.4              X = array2d(X, dtype=DTYPE, order="F")
   233
   234         1           14     14.0      0.0          n_samples, self.n_features_ = X.shape
   235
   236         1            7      7.0      0.0          if self.bootstrap:
   237                                                       sample_mask = None
   238                                                       X_argsorted = None
   239
   240                                                   else:
   241         1            6      6.0      0.0              if self.oob_score:
   242                                                           raise ValueError("Out of bag estimation only available"
   243                                                                   " if bootstrap=True")
   244
   245         1           43     43.0      0.0              sample_mask = np.ones((n_samples,), dtype=np.bool)
   246
   247         1           37     37.0      0.0              n_jobs, _, starts = _partition_features(self, self.n_features_)
   248
   249         1           26     26.0      0.0              all_X_argsorted = Parallel(n_jobs=n_jobs)(
   250         1            8      8.0      0.0                  delayed(_parallel_X_argsort)(
   251                                                               X[:, starts[i]:starts[i + 1]])
   252         1       148770 148770.0      2.8                  for i in xrange(n_jobs))
   253
   254         1         1300   1300.0      0.0              X_argsorted = np.asfortranarray(np.hstack(all_X_argsorted))
   255
   256         1           35     35.0      0.0          y = np.atleast_1d(y)
   257         1            7      7.0      0.0          if y.ndim == 1:
   258         1           17     17.0      0.0              y = y[:, np.newaxis]
   259
   260         1            8      8.0      0.0          self.classes_ = []
   261         1            7      7.0      0.0          self.n_classes_ = []
   262         1            8      8.0      0.0          self.n_outputs_ = y.shape[1]
   263
   264         1           11     11.0      0.0          if isinstance(self.base_estimator, ClassifierMixin):
   265         1           34     34.0      0.0              y = np.copy(y)
   266
   267         2           18      9.0      0.0              for k in xrange(self.n_outputs_):
   268         1          209    209.0      0.0                  unique = np.unique(y[:, k])
   269         1           10     10.0      0.0                  self.classes_.append(unique)
   270         1            8      8.0      0.0                  self.n_classes_.append(unique.shape[0])
   271         1          101    101.0      0.0                  y[:, k] = np.searchsorted(unique, y[:, k])
   272
   273         1           11     11.0      0.0          if getattr(y, "dtype", None) != DTYPE or not y.flags.contiguous:
   274         1           24     24.0      0.0              y = np.ascontiguousarray(y, dtype=DOUBLE)
   275
   276                                                   # Assign chunk of trees to jobs
   277         1           39     39.0      0.0          n_jobs, n_trees, _ = _partition_trees(self)
   278
   279                                                   # Parallel loop
   280         1           23     23.0      0.0          all_trees = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
   281         1            8      8.0      0.0              delayed(_parallel_build_trees)(
   282                                                           n_trees[i],
   283                                                           self,
   284                                                           X,
   285                                                           y,
   286                                                           sample_mask,
   287                                                           X_argsorted,
   288                                                           self.random_state.randint(MAX_INT),
   289                                                           verbose=self.verbose)
   290         1      5208979 5208979.0     96.8              for i in xrange(n_jobs))
   291
   292                                                   # Reduce
   293        11           87      7.9      0.0          self.estimators_ = [tree for tree in itertools.chain(*all_trees)]
   294
   295                                                   # Calculate out of bag predictions and score
   296         1            7      7.0      0.0          if self.oob_score:
   297                                                       if isinstance(self, ClassifierMixin):
   298                                                           self.oob_decision_function_ = []
   299                                                           self.oob_score_ = 0.0
   300
   301                                                           predictions = []
   302                                                           for k in xrange(self.n_outputs_):
   303                                                               predictions.append(np.zeros((n_samples,
   304                                                                                            self.n_classes_[k])))
   305
   306                                                           for estimator in self.estimators_:
   307                                                               mask = np.ones(n_samples, dtype=np.bool)
   308                                                               mask[estimator.indices_] = False
   309
   310                                                               p_estimator = estimator.predict_proba(X[mask, :])
   311                                                               if self.n_outputs_ == 1:
   312                                                                   p_estimator = [p_estimator]
   313
   314                                                               for k in xrange(self.n_outputs_):
   315                                                                   predictions[k][mask, :] += p_estimator[k]
   316
   317                                                           for k in xrange(self.n_outputs_):
   318                                                               if (predictions[k].sum(axis=1) == 0).any():
   319                                                                   warn("Some inputs do not have OOB scores. "
   320                                                                        "This probably means too few trees were used "
   321                                                                        "to compute any reliable oob estimates.")
   322                                                               decision = predictions[k] \
   323                                                                          / predictions[k].sum(axis=1)[:, np.newaxis]
   324                                                               self.oob_decision_function_.append(decision)
   325
   326                                                               self.oob_score_ += np.mean(y[:, k] \
   327                                                                                  == np.argmax(predictions[k], axis=1))
   328
   329                                                           if self.n_outputs_ == 1:
   330                                                               self.oob_decision_function_ = \
   331                                                                   self.oob_decision_function_[0]
   332
   333                                                           self.oob_score_ /= self.n_outputs_
   334
   335                                                       else:
   336                                                           # Regression:
   337                                                           predictions = np.zeros((n_samples, self.n_outputs_))
   338                                                           n_predictions = np.zeros((n_samples, self.n_outputs_))
   339
   340                                                           for estimator in self.estimators_:
   341                                                               mask = np.ones(n_samples, dtype=np.bool)
   342                                                               mask[estimator.indices_] = False
   343
   344                                                               p_estimator = estimator.predict(X[mask, :])
   345                                                               if self.n_outputs_ == 1:
   346                                                                   p_estimator = p_estimator[:, np.newaxis]
   347
   348                                                               predictions[mask, :] += p_estimator
   349                                                               n_predictions[mask, :] += 1
   350                                                           if (n_predictions == 0).any():
   351                                                               warn("Some inputs do not have OOB scores. "
   352                                                                    "This probably means too few trees were used "
   353                                                                    "to compute any reliable oob estimates.")
   354                                                               n_predictions[n_predictions == 0] = 1
   355                                                           predictions /= n_predictions
   356
   357                                                           self.oob_prediction_ = predictions
   358                                                           if self.n_outputs_ == 1:
   359                                                               self.oob_prediction_ = \
   360                                                                   self.oob_prediction_.reshape((n_samples, ))
   361
   362                                                           self.oob_score_ = 0.0
   363                                                           for k in xrange(self.n_outputs_):
   364                                                               self.oob_score_ += r2_score(y[:, k], predictions[:, k])
   365                                                           self.oob_score_ /= self.n_outputs_
   366
   367                                                   # Sum the importances
   368         1            7      7.0      0.0          if self.compute_importances:
   369                                                       self.feature_importances_ = \
   370                                                           sum(tree.feature_importances_ for tree in self.estimators_) \
   371                                                           / self.n_estimators
   372
   373         1            6      6.0      0.0          return self

GradientBoostingClassifier-arcene¶

Benchmark setup

from sklearn.ensemble import GradientBoostingClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('arcene')
obj = GradientBoostingClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/GradientBoostingClassifier-arcene-step0-timing.png

Memory usage

_images/GradientBoostingClassifier-arcene-step0-memory.png

Additional output

cProfile

         9021 function calls in 36.665 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   36.665   36.665 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000   36.665   36.665 <f>:1(<module>)
     1    0.000    0.000   36.665   36.665 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:735(fit)
     1    0.001    0.001   36.665   36.665 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:507(fit)
   100    0.006    0.000   36.564    0.366 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:478(fit_stage)
   100   36.453    0.365   36.457    0.365 {method 'build' of 'sklearn.tree._tree.Tree' objects}
   100    0.017    0.000    0.088    0.001 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:143(update_terminal_regions)
     1    0.000    0.000    0.078    0.078 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
     1    0.078    0.078    0.078    0.078 {method 'argsort' of 'numpy.ndarray' objects}
   781    0.045    0.000    0.067    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:352(_update_terminal_region)
   205    0.015    0.000    0.015    0.000 {numpy.core.multiarray.array}
   982    0.003    0.000    0.013    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
  1763    0.013    0.000    0.013    0.000 {method 'sum' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.011    0.005 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
   100    0.002    0.000    0.008    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
   100    0.006    0.000    0.007    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:343(__call__)
   100    0.001    0.000    0.006    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
   100    0.005    0.000    0.006    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:349(negative_gradient)
  1662    0.005    0.000    0.005    0.000 {method 'take' of 'numpy.ndarray' objects}
   203    0.001    0.000    0.005    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
   881    0.004    0.000    0.004    0.000 {numpy.core.multiarray.where}
     1    0.003    0.003    0.003    0.003 {method 'astype' of 'numpy.ndarray' objects}
   984    0.002    0.000    0.002    0.000 {isinstance}
   100    0.002    0.000    0.002    0.000 {method 'apply' of 'sklearn.tree._tree.Tree' objects}
   100    0.001    0.000    0.001    0.000 {method 'max' of 'numpy.ndarray' objects}
   100    0.001    0.000    0.001    0.000 {method 'copy' of 'numpy.ndarray' objects}
   201    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
   100    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
   101    0.000    0.000    0.000    0.000 {range}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
   100    0.000    0.000    0.000    0.000 {getattr}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:83(fit)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:87(predict)
     3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:333(__init__)
     2    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
     6    0.000    0.000    0.000    0.000 {hasattr}
     4    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     1    0.000    0.000    0.000    0.000 {max}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:340(init_estimator)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:121(__init__)
     3    0.000    0.000    0.000    0.000 {len}
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py
Function: fit at line 735
Total time: 36.8206 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   735                                               def fit(self, X, y):
   736                                                   """Fit the gradient boosting model.
   737
   738                                                   Parameters
   739                                                   ----------
   740                                                   X : array-like, shape = [n_samples, n_features]
   741                                                       Training vectors, where n_samples is the number of samples
   742                                                       and n_features is the number of features. Use fortran-style
   743                                                       to avoid memory copies.
   744
   745                                                   y : array-like, shape = [n_samples]
   746                                                       Target values (integers in classification, real numbers in
   747                                                       regression)
   748                                                       For classification, labels must correspond to classes
   749                                                       ``0, 1, ..., n_classes_-1``
   750
   751                                                   Returns
   752                                                   -------
   753                                                   self : object
   754                                                       Returns self.
   755                                                   """
   756         1          133    133.0      0.0          self.classes_ = np.unique(y)
   757         1            4      4.0      0.0          self.n_classes_ = len(self.classes_)
   758         1           17     17.0      0.0          y = np.searchsorted(self.classes_, y)
   759         1            2      2.0      0.0          if self.loss == 'deviance':
   760         1            3      3.0      0.0              self.loss = 'mdeviance' if len(self.classes_) > 2 else 'bdeviance'
   761
   762         1     36820484 36820484.0    100.0          return super(GradientBoostingClassifier, self).fit(X, y)

GradientBoostingClassifier-madelon¶

Benchmark setup

from sklearn.ensemble import GradientBoostingClassifier
from deps import load_data

kwargs = {}
X, y, X_t, y_t = load_data('madelon')
obj = GradientBoostingClassifier(**kwargs)

Benchmark statement

obj.fit(X, y)

Execution time

_images/GradientBoostingClassifier-madelon-step0-timing.png

Memory usage

_images/GradientBoostingClassifier-madelon-step0-memory.png

Additional output

cProfile

         9117 function calls in 29.657 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000   29.657   29.657 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/vbench/benchmark.py:286(f)
     1    0.000    0.000   29.657   29.657 <f>:1(<module>)
     1    0.000    0.000   29.657   29.657 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:735(fit)
     1    0.002    0.002   29.656   29.656 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:507(fit)
   100    0.006    0.000   29.460    0.295 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:478(fit_stage)
   100   29.187    0.292   29.193    0.292 {method 'build' of 'sklearn.tree._tree.Tree' objects}
   100    0.032    0.000    0.214    0.002 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:143(update_terminal_regions)
   793    0.093    0.000    0.168    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:352(_update_terminal_region)
     1    0.000    0.000    0.138    0.138 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:598(argsort)
     1    0.138    0.138    0.138    0.138 {method 'argsort' of 'numpy.ndarray' objects}
   100    0.040    0.000    0.043    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:343(__call__)
   100    0.039    0.000    0.039    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:349(negative_gradient)
  1787    0.039    0.000    0.039    0.000 {method 'sum' of 'numpy.ndarray' objects}
   893    0.030    0.000    0.030    0.000 {numpy.core.multiarray.where}
   994    0.003    0.000    0.016    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1379(sum)
   205    0.014    0.000    0.014    0.000 {numpy.core.multiarray.array}
   100    0.010    0.000    0.010    0.000 {method 'apply' of 'sklearn.tree._tree.Tree' objects}
     2    0.000    0.000    0.010    0.005 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:325(asfortranarray)
  1686    0.009    0.000    0.009    0.000 {method 'take' of 'numpy.ndarray' objects}
   100    0.002    0.000    0.008    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1774(amax)
   100    0.001    0.000    0.006    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:32(_wrapit)
   203    0.001    0.000    0.005    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:167(asarray)
     1    0.004    0.004    0.004    0.004 {method 'astype' of 'numpy.ndarray' objects}
   996    0.002    0.000    0.002    0.000 {isinstance}
   100    0.001    0.000    0.001    0.000 {method 'max' of 'numpy.ndarray' objects}
   100    0.001    0.000    0.001    0.000 {method 'copy' of 'numpy.ndarray' objects}
   201    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
   100    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:449(isfortran)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/lib/arraysetops.py:90(unique)
   101    0.000    0.000    0.000    0.000 {range}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:94(check_arrays)
   100    0.000    0.000    0.000    0.000 {getattr}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:83(fit)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:757(searchsorted)
     1    0.000    0.000    0.000    0.000 {method 'searchsorted' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'sort' of 'numpy.ndarray' objects}
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/base.py:553(isspmatrix)
     1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
     2    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/scipy/sparse/sputils.py:116(_isinstance)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:87(predict)
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1044(ravel)
     3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
     3    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/utils/validation.py:87(_num_samples)
     2    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /home/slave/virtualenvs/cpython-2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:1791(ones)
     2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:333(__init__)
     4    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
     6    0.000    0.000    0.000    0.000 {hasattr}
     1    0.000    0.000    0.000    0.000 {max}
     3    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:340(init_estimator)
     1    0.000    0.000    0.000    0.000 /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py:121(__init__)
     4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
     2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

LineProfiler

   Timer unit: 1e-06 s

File: /tmp/vb_sklearn/sklearn/ensemble/gradient_boosting.py
Function: fit at line 735
Total time: 29.7849 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   735                                               def fit(self, X, y):
   736                                                   """Fit the gradient boosting model.
   737
   738                                                   Parameters
   739                                                   ----------
   740                                                   X : array-like, shape = [n_samples, n_features]
   741                                                       Training vectors, where n_samples is the number of samples
   742                                                       and n_features is the number of features. Use fortran-style
   743                                                       to avoid memory copies.
   744
   745                                                   y : array-like, shape = [n_samples]
   746                                                       Target values (integers in classification, real numbers in
   747                                                       regression)
   748                                                       For classification, labels must correspond to classes
   749                                                       ``0, 1, ..., n_classes_-1``
   750
   751                                                   Returns
   752                                                   -------
   753                                                   self : object
   754                                                       Returns self.
   755                                                   """
   756         1          235    235.0      0.0          self.classes_ = np.unique(y)
   757         1            5      5.0      0.0          self.n_classes_ = len(self.classes_)
   758         1          110    110.0      0.0          y = np.searchsorted(self.classes_, y)
   759         1            5      5.0      0.0          if self.loss == 'deviance':
   760         1            6      6.0      0.0              self.loss = 'mdeviance' if len(self.classes_) > 2 else 'bdeviance'
   761
   762         1     29784551 29784551.0    100.0          return super(GradientBoostingClassifier, self).fit(X, y)

Table Of Contents

Benchmarks for ensemble¶

RandomForestClassifier-arcene¶

RandomForestClassifier-madelon¶

ExtraTreesClassifier-arcene¶

ExtraTreesClassifier-madelon¶

GradientBoostingClassifier-arcene¶

GradientBoostingClassifier-madelon¶