Quick start

Scikit-learn-speed is a continuous benchmarking suite that accompanies the scikit-learn project. It is built on top of vbench.

Dependencies

In order to run the benchmarks, you must have the following packages installed:

The development version of vbench from Vlad’s repository.

Numpy, Scipy

Vbench itself requires:

git (it must be on your PATH; old versions might fail)

Pandas (and its dependencies)

sqlalchemy

memory_profiler (optional, for memory usage benchmarking)

line_profiler (optional, for line profiling)

For building the web pages (that you are looking at right now), you additionally need:

Sphinx

Matplotlib
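A quick way to see which of the packages listed above are present is to probe for them from Python. The following is a minimal sketch, not part of scikit-learn-speed: the import names are assumptions derived from the package names, and the helper names missing_modules and report are hypothetical.

```python
import importlib.util
import shutil

# Import names are assumptions based on the package names listed above.
REQUIRED = ["vbench", "numpy", "scipy", "pandas", "sqlalchemy"]
OPTIONAL = ["memory_profiler", "line_profiler"]
DOCS = ["sphinx", "matplotlib"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print a short summary of what is missing, grouped by purpose."""
    if shutil.which("git") is None:
        print("git was not found on the PATH")
    groups = [("required", REQUIRED), ("optional", OPTIONAL), ("docs", DOCS)]
    for label, names in groups:
        missing = missing_modules(names)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")


if __name__ == "__main__":
    report()
```

The check only tells you whether a module can be located; it does not verify versions, so an old git or a released (non-development) vbench would still pass.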

Installation

Just fetch the latest code from the GitHub repository.

Running the benchmarks

In order to run the benchmarks on your own machine, you need to:

Clone the repository somewhere, for example ~/code/scikit-learn-speed

Extract the datasets:

cd ~/code/scikit-learn-speed/benchmarks
tar jxvf data.tar.bz2

Create the configuration file ~/.vbench-skl. For example:

[setup]
repo_path = /Users/vene/code/scikit-learn
repo_url = git@github.com:scikit-learn/scikit-learn.git
db_path = /Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db
tmp_dir = /tmp/vb_sklearn

The values shown above are the hardcoded defaults; they are used for any setting that is missing from the configuration file. In particular, you don’t have to set repo_url and tmp_dir.

From the scikit-learn-speed folder, run make. This will call:

make run, which runs the benchmark suite,

make rst, which generates the Sphinx sources for the reports,

make html, which builds the HTML reports from the sources.
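The fallback behaviour described for ~/.vbench-skl in the configuration step can be illustrated with the standard configparser module. This is only a sketch under stated assumptions: load_settings is a hypothetical helper that mirrors the documented behaviour, not the loading code that scikit-learn-speed actually uses, and DEFAULTS simply copies the example values shown earlier.

```python
import configparser
import os

# Defaults copied from the example configuration shown earlier.
DEFAULTS = {
    "repo_path": "/Users/vene/code/scikit-learn",
    "repo_url": "git@github.com:scikit-learn/scikit-learn.git",
    "db_path": "/Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db",
    "tmp_dir": "/tmp/vb_sklearn",
}


def load_settings(path="~/.vbench-skl"):
    """Read the [setup] section of `path`, falling back to DEFAULTS.

    Hypothetical helper: it mirrors the documented behaviour only.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(os.path.expanduser(path))
    settings = dict(DEFAULTS)
    if parser.has_section("setup"):
        # Values present in the file take precedence over the defaults.
        settings.update(parser.items("setup"))
    return settings
```

With a file that sets only repo_path and db_path, the returned dictionary would still contain the default repo_url and tmp_dir.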

For more details about other make options, type make help. At the moment, the quick flag is passed by default, which means only the linear_model benchmarks are run, as a test that everything works. Another flag is historical, which makes vbench go back in time and run the suite on more releases, but this takes a long time. To pass these kinds of flags, set the SKL_SPEED_ARGS environment variable. For example:

SKL_SPEED_ARGS='quick historical' make
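If you drive the benchmarks from a script rather than a shell, the same environment variable can be set when launching make. The sketch below assumes only what is stated above (flags are passed as a space-separated string in SKL_SPEED_ARGS); make_env and run_make are hypothetical helper names.

```python
import os
import subprocess


def make_env(flags):
    """Return a copy of the environment with SKL_SPEED_ARGS set to `flags`."""
    env = dict(os.environ)
    env["SKL_SPEED_ARGS"] = " ".join(flags)
    return env


def run_make(flags, cwd="."):
    """Run `make` in `cwd` (the scikit-learn-speed folder) with the given flags."""
    # check=True raises CalledProcessError if make exits with an error.
    return subprocess.run(["make"], cwd=cwd, env=make_env(flags), check=True)
```

Calling run_make(["quick", "historical"], cwd=os.path.expanduser("~/code/scikit-learn-speed")) should be equivalent to the shell command above, assuming the repository was cloned to that location.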

The result of make run is a file called benchmarks.db in the benchmarks folder (or wherever you pointed the db_path setting to). You can look inside this file using sqlite3 or, better, by instantiating a vbench.db.BenchmarkDB object, like this:

In [1]: from vbench.db import BenchmarkDB

In [2]: db = BenchmarkDB('benchmarks/benchmarks.db')

In [3]: db.get_benchmarks()
Out[3]:
                                                                           name description
checksum
0ff90bcf3a75abe21cede6ede6674aba               LinearRegression-minimadelon-fit        None
1b296252fc235e4b6d1559013263074e           LinearRegression-minimadelon-predict        None
(...)

In [4]: db.get_benchmark_results('0ff90bcf3a75abe21cede6ede6674aba')
Out[4]:
                    revision ncalls    timing  timing_min  timing_max  timing_mean  timing_median  timing_std                                          profile    memory traceback
timestamp
2012-07-23 14:07:14  6aaf15f   None  0.003620    0.001490    0.001876     0.001627       0.001515    0.000176           78 function calls in 0.004 seconds   O  0.121094      None
2012-07-24 12:19:11  af2602e   None  0.002806    0.001489    0.001892     0.001670       0.001630    0.000167           78 function calls in 0.003 seconds   O  0.121094      None
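For the plain sqlite3 route mentioned above, a reasonable first step is to list the tables in the file before querying anything. The sketch below uses only Python's standard sqlite3 module and makes no assumption about the schema that vbench creates; list_tables is a hypothetical helper name.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at `db_path`."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a one-element tuple holding the table name.
    return [name for (name,) in rows]
```

For example, list_tables('benchmarks/benchmarks.db') would return the table names to explore further. Note that sqlite3.connect creates an empty file if the path does not exist, so double-check the path if the list comes back empty.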

Generating the documentation

To only generate the HTML files from the database, navigate to the scikit-learn-speed folder and execute:

make rst   # this generates the Sphinx sources
make html  # this builds the HTML reports

You can view the results by opening scikit-learn-speed/benchmarks/build/html/index.html in your favourite web browser. An internet connection is recommended, because jQuery and jQuery UI are loaded from Google’s CDN.
