Quick start
===========

Scikit-learn-speed is a continuous benchmarking suite that accompanies the
scikit-learn project. It is built on top of vbench.

Dependencies
------------

In order to run the benchmarks, you must have the following packages installed:

- The development version of vbench from Vlad's repository.
- Numpy, Scipy

Vbench itself requires:

- git (it must be on the path; old versions might fail)
- Pandas (and its dependencies)
- sqlalchemy
- memory_profiler (optional, for memory usage benchmarking)
- line_profiler (optional, for line profiling)

For building the web pages (that you are looking at right now), you
additionally need:

- Sphinx
- Matplotlib

Installation
------------

Just fetch the latest code from the GitHub repository.

Running the benchmarks
----------------------

In order to run the benchmarks on your own machine, you need to:

1. Clone the repository somewhere, for example ``~/code/scikit-learn-speed``

2. Extract the datasets::

       cd ~/code/scikit-learn-speed/benchmarks
       tar jxvf data.tar.bz2

3. Create the configuration file ``~/.vbench-skl``. For example::

       [setup]
       repo_path = /Users/vene/code/scikit-learn
       repo_url = git@github.com:scikit-learn/scikit-learn.git
       db_path = /Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db
       tmp_dir = /tmp/vb_sklearn

   The values shown above are the hardcoded defaults. They are used for any
   setting that is missing or skipped in the configuration file. In
   particular, this means you don't have to set ``repo_url`` and ``tmp_dir``.

4. From the ``scikit-learn-speed`` folder, run ``make``. This will call:

   - ``make run``, which runs the benchmark suite,
   - ``make rst``, which generates the Sphinx sources for the reports,
   - ``make html``, which builds the HTML reports from the sources.

   For more details about other ``make`` options, type ``make help``.

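The fallback behaviour described in step 3 can be illustrated with a short
sketch. This is not the loading code used by vbench or scikit-learn-speed; it
only assumes that ``~/.vbench-skl`` is an INI-style file with a ``[setup]``
section, as in the example, and reads it with Python's standard
``configparser``. The helper name ``load_settings`` is hypothetical.

```python
import configparser
import os

# Defaults copied from the example configuration in step 3.
DEFAULTS = {
    "repo_path": "/Users/vene/code/scikit-learn",
    "repo_url": "git@github.com:scikit-learn/scikit-learn.git",
    "db_path": "/Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db",
    "tmp_dir": "/tmp/vb_sklearn",
}


def load_settings(path="~/.vbench-skl"):
    """Hypothetical helper: return the [setup] values, filling gaps with DEFAULTS."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently ignores a missing file, so the defaults apply then too.
    parser.read(os.path.expanduser(path))
    settings = dict(DEFAULTS)
    if parser.has_section("setup"):
        # Values present in the file override the defaults.
        settings.update(parser.items("setup"))
    return settings
```

With a file that only sets ``repo_path`` and ``db_path``, the returned
dictionary would still contain the default ``repo_url`` and ``tmp_dir``.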
At the moment, the ``quick`` flag is passed by default, which means that only
the ``linear_model`` benchmarks are run, as a test that everything works.
Another flag is ``historical``, which makes vbench go back in time and run the
suite on more releases, but this takes a long time. To pass these kinds of
flags, set the ``SKL_SPEED_ARGS`` environment variable. For example::

    SKL_SPEED_ARGS='quick historical' make

The result of ``make run`` is a file called ``benchmarks.db`` in the
``benchmarks`` folder (or wherever you pointed the ``db_path`` setting to).
You can look inside this file using ``sqlite3`` or, better, by instantiating a
``vbench.db.BenchmarkDB`` object, like this:

.. code-block:: python

    In [1]: from vbench.db import BenchmarkDB

    In [2]: db = BenchmarkDB('benchmarks/benchmarks.db')

    In [3]: db.get_benchmarks()
    Out[3]:
                                                                      name description
    checksum
    0ff90bcf3a75abe21cede6ede6674aba      LinearRegression-minimadelon-fit        None
    1b296252fc235e4b6d1559013263074e  LinearRegression-minimadelon-predict        None
    (...)

    In [4]: db.get_benchmark_results('0ff90bcf3a75abe21cede6ede6674aba')
    Out[4]:
                         revision  ncalls    timing  timing_min  timing_max  timing_mean  timing_median  timing_std                               profile    memory  traceback
    timestamp
    2012-07-23 14:07:14   6aaf15f    None  0.003620    0.001490    0.001876     0.001627       0.001515    0.000176  78 function calls in 0.004 seconds O  0.121094       None
    2012-07-24 12:19:11   af2602e    None  0.002806    0.001489    0.001892     0.001670       0.001630    0.000167  78 function calls in 0.003 seconds O  0.121094       None

Generating the documentation
----------------------------

To only generate the HTML files from the database, navigate to the
``scikit-learn-speed`` folder and execute::

    make rst   # this generates the Sphinx sources
    make html  # this builds the HTML reports

You can view the results by opening
``scikit-learn-speed/benchmarks/build/html/index.html`` in your favourite web
browser. An internet connection is recommended, because jQuery and jQuery UI
are loaded from Google's CDN.
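If vbench is not importable, ``benchmarks.db`` can also be inspected with
Python's standard ``sqlite3`` module, as mentioned above. The sketch below
makes no assumption about the schema that vbench creates; it only lists
whatever tables the file contains. The helper name ``list_tables`` is
hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path, sorted."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, ``list_tables('benchmarks/benchmarks.db')`` returns the table
names, which can then be queried with ordinary ``SELECT`` statements.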