Quick start
Scikit-learn-speed is a continuous benchmarking suite that accompanies the scikit-learn project. It is built on top of vbench.
Dependencies
In order to run the benchmarks, you must have the following packages installed:
- The development version of vbench from Vlad’s repository.
- Numpy, Scipy
Vbench itself requires:
- git (it must be on your PATH; old versions might fail)
- Pandas (and its dependencies)
- sqlalchemy
- memory_profiler (optional, for memory usage benchmarking)
- line_profiler (optional, for line profiling)
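Before running anything, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of scikit-learn-speed or vbench: the function names and the shape of the report are illustrative assumptions, and the check only tells you whether a module can be found, not whether it is the development version.

```python
import importlib.util
import shutil

# Import names of the packages listed above.
REQUIRED = ["vbench", "numpy", "scipy", "pandas", "sqlalchemy"]
OPTIONAL = ["memory_profiler", "line_profiler"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Hypothetical helper: summarize what is missing (layout is illustrative)."""
    return {
        "required": missing_modules(REQUIRED),
        "optional": missing_modules(OPTIONAL),
        "git_on_path": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

An empty "required" list and a true "git_on_path" value suggest the basic prerequisites are in place.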
For building the web pages (that you are looking at right now), you additionally need:
Installation
Just fetch the latest code from the GitHub repository.
Running the benchmarks
In order to run the benchmarks on your own machine, you need to:
Clone the repository somewhere, for example ~/code/scikit-learn-speed
Extract the datasets:
cd ~/code/scikit-learn-speed/benchmarks
tar jxvf data.tar.bz2
Create the configuration file ~/.vbench-skl. For example:
[setup]
repo_path = /Users/vene/code/scikit-learn
repo_url = git@github.com:scikit-learn/scikit-learn.git
db_path = /Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db
tmp_dir = /tmp/vb_sklearn
The values shown above are the hardcoded defaults. They are used when the configuration file doesn’t exist, and they fill in any setting you leave out. In particular, this means you don’t have to set repo_url and tmp_dir.
- From the scikit-learn-speed folder, run make. This will call:
- make run, which runs the benchmark suite,
- make rst, which generates the Sphinx sources for the reports,
- make html, which builds the HTML reports from the sources.
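The fallback behaviour described for ~/.vbench-skl (defaults apply unless the file sets a value) can be sketched with the standard configparser module. This is an illustration under stated assumptions, not the loader that scikit-learn-speed actually uses: load_settings and DEFAULTS are hypothetical names, and the default values are simply copied from the example configuration above.

```python
import configparser
from pathlib import Path

# Defaults copied from the example configuration above.
DEFAULTS = {
    "repo_path": "/Users/vene/code/scikit-learn",
    "repo_url": "git@github.com:scikit-learn/scikit-learn.git",
    "db_path": "/Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db",
    "tmp_dir": "/tmp/vb_sklearn",
}


def load_settings(path="~/.vbench-skl"):
    """Hypothetical loader: start from DEFAULTS, then apply values from [setup]."""
    settings = dict(DEFAULTS)
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist, so the defaults survive.
    parser.read(Path(path).expanduser())
    if parser.has_section("setup"):
        settings.update(parser["setup"])
    return settings


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key} = {value}")
```

With this approach, a file that sets only repo_path and db_path still yields a complete set of four settings.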
For more details about other make options, type make help. At the moment, the quick flag is passed by default, which means that only the linear_model benchmarks are run, as a test that everything works. Another flag is historical, which makes vbench go back in time and run the suite on more releases, but this takes a long time. To pass flags like these, set the SKL_SPEED_ARGS environment variable. For example:
SKL_SPEED_ARGS='quick historical' make
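If you drive the benchmarks from a script (for instance from a scheduled job), the same environment variable can be set programmatically. The sketch below is an assumption-laden illustration: make_env and run_make are hypothetical helpers, not part of the project, and run_make expects to be pointed at the scikit-learn-speed folder.

```python
import os
import subprocess


def make_env(flags, base_env=None):
    """Return a copy of the environment with SKL_SPEED_ARGS set to `flags`."""
    env = dict(os.environ if base_env is None else base_env)
    env["SKL_SPEED_ARGS"] = " ".join(flags)
    return env


def run_make(flags, cwd="."):
    """Run `make` in `cwd` with the given flags; raises if make fails."""
    return subprocess.run(["make"], cwd=cwd, env=make_env(flags), check=True)


if __name__ == "__main__":
    # Equivalent to: SKL_SPEED_ARGS='quick historical' make
    print(make_env(["quick", "historical"])["SKL_SPEED_ARGS"])
```

Copying the environment rather than mutating os.environ keeps the flags local to the one make invocation.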
The result of make run is a file called benchmarks.db in the benchmarks folder (or wherever you pointed the db_path setting to). You can look inside this file using sqlite3 or, better, by instantiating a vbench.db.BenchmarkDB object, like this:
In [1]: from vbench.db import BenchmarkDB
In [2]: db = BenchmarkDB('benchmarks/benchmarks.db')
In [3]: db.get_benchmarks()
Out[3]:
                                                                  name  description
checksum
0ff90bcf3a75abe21cede6ede6674aba      LinearRegression-minimadelon-fit         None
1b296252fc235e4b6d1559013263074e  LinearRegression-minimadelon-predict         None
(...)
In [4]: db.get_benchmark_results('0ff90bcf3a75abe21cede6ede6674aba')
Out[4]:
                     revision  ncalls    timing  timing_min  timing_max  timing_mean  timing_median  timing_std                               profile    memory  traceback
timestamp
2012-07-23 14:07:14   6aaf15f    None  0.003620    0.001490    0.001876     0.001627       0.001515    0.000176  78 function calls in 0.004 seconds O  0.121094       None
2012-07-24 12:19:11   af2602e    None  0.002806    0.001489    0.001892     0.001670       0.001630    0.000167  78 function calls in 0.003 seconds O  0.121094       None
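For the plain sqlite3 route mentioned earlier, a small script using only the standard library can list the tables and columns in the database file. This sketch deliberately assumes nothing about the schema vbench creates; describe_database is a hypothetical helper name. Note that sqlite3.connect creates an empty file if the path does not exist, so double-check the path first.

```python
import sqlite3
from contextlib import closing


def describe_database(db_path):
    """Return {table name: [column names]} for the SQLite file at `db_path`."""
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }


if __name__ == "__main__":
    for table, columns in describe_database("benchmarks/benchmarks.db").items():
        print(table, columns)
```

Once you know the table names, ordinary SELECT queries work as usual, although the BenchmarkDB object shown above returns the results in a more convenient tabular form.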
Generating the documentation
To only generate the HTML files from the database, navigate to the scikit-learn-speed folder and execute:
make rst # this generates the Sphinx sources
make html # this builds the HTML reports
You can view the results by opening scikit-learn-speed/benchmarks/build/html/index.html in your favourite web browser. An internet connection is recommended, because jQuery and jQuery UI are loaded from Google’s CDN.