#################
Command line tool
#################

Two main issues may arise with results reported in the literature:

* Even though the same public datasets are used, the actual evaluation protocol may differ slightly from one paper to another.
* The implementation of the reported evaluation metrics may also differ.

The first objective of the ``pyannote.metrics`` library is to address these two problems and to provide a convenient way for researchers to evaluate their approaches in a reproducible and comparable manner. Here is an example use of the command line interface provided for this purpose:

.. code-block:: bash

    $ pyannote-metrics diarization --subset=development Etape.SpeakerDiarization.TV hypothesis.rttm
    Diarization (collar = 0 ms)               error    purity    coverage     total    correct      %     fa.      %    miss.     %    conf.      %
    --------------------------------------  -------  --------  ----------  --------  ---------  -----  ------  -----  -------  ----  -------  -----
    BFMTV_BFMStory_2011-03-17_175900          14.64     94.74       90.00   2582.08    2300.22  89.08   96.16   3.72    80.14  3.10   201.72   7.81
    LCP_CaVousRegarde_2011-02-17_204700       17.80     89.13       86.90   3280.72    2848.42  86.82  151.78   4.63   208.29  6.35   224.01   6.83
    LCP_EntreLesLignes_2011-03-18_192900      23.46     79.52       79.03   1704.97    1337.80  78.46   32.89   1.93   157.14  9.22   210.03  12.32
    LCP_EntreLesLignes_2011-03-25_192900      26.75     76.97       75.86   1704.13    1292.83  75.86   44.61   2.62   158.38  9.29   252.92  14.84
    LCP_PileEtFace_2011-03-17_192900          10.73     93.33       92.30   1611.49    1487.32  92.30   48.73   3.02    55.49  3.44    68.67   4.26
    LCP_TopQuestions_2011-03-23_213900        18.28     98.25       94.20    727.26     668.65  91.94   74.36  10.22    16.41  2.26    42.20   5.80
    LCP_TopQuestions_2011-04-05_213900        27.97     97.95       79.81    818.03     638.68  78.08   49.45   6.04    17.46  2.13   161.89  19.79
    TV8_LaPlaceDuVillage_2011-03-14_172834    21.43     92.89       89.64    996.12     892.04  89.55  109.36  10.98    11.80  1.18    92.28   9.26
    TV8_LaPlaceDuVillage_2011-03-21_201334    66.23     77.24       70.64   1296.86     691.76  53.34  253.80  19.57    29.16  2.25   575.95  44.41
    TOTAL                                     23.27     88.18       84.55  14721.65   12157.71  82.58  861.14   5.85   734.28  4.99  1829.67  12.43

Tasks
-----

Not only can the ``pyannote-metrics`` command line tool compute the diarization error rate (using the NIST implementation), it can also evaluate the four sub-modules typically found in speaker diarization systems:

.. image:: images/pipeline.png

In practice, the first positional argument (e.g. ``diarization``, above) is a flag indicating which task should be evaluated. Apart from the ``diarization`` flag used for evaluating speaker diarization results, the other available flags are ``detection`` (speech activity detection), ``segmentation`` (speaker change detection), and ``identification`` (supervised speaker identification). Depending on the task, a different set of evaluation metrics is computed; see the example at the end of the next section.

Datasets and protocols
----------------------

``pyannote.metrics`` provides an easy way to ensure that the same protocol (i.e. manual groundtruth and training/development/test split) is used for evaluation. Internally, it relies on a collection of Python packages that all derive from the main ``pyannote.database`` package, which provides a convenient API to define training/development/test splits along with groundtruth annotations. In the example above, the ``development`` set of the ``TV`` evaluation protocol of the ETAPE dataset is used. Results are reported both for each file in the selected subset and aggregated into one final metric value.

As of March 2017, ``pyannote.database`` packages exist for the ETAPE corpus, the REPERE corpus, and the AMI corpus. As more people contribute new ``pyannote.database`` packages, they will be added to the ``pyannote`` ecosystem.
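These protocol packages are regular Python packages, so installing one next to ``pyannote.metrics`` is all it takes to make its protocols available to the command line tool. Here is a minimal sketch, assuming the ETAPE plugin is distributed under the name ``pyannote.db.etape`` (the exact package name is an assumption; check the ``pyannote`` ecosystem for the published name):

.. code-block:: bash

    # install the evaluation tool...
    $ pip install pyannote.metrics
    # ...and a dataset plugin (package name is an assumption)
    $ pip install pyannote.db.etape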
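Once a protocol is available, every task listed above can be scored on any of its subsets with the same invocation pattern. For instance, the following sketch evaluates speech activity detection on the test set; it assumes that ``hypothesis.rttm`` contains detection output and that the protocol defines a ``test`` subset:

.. code-block:: bash

    # same protocol, different task and subset
    $ pyannote-metrics detection --subset=test Etape.SpeakerDiarization.TV hypothesis.rttm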
File formats
------------

Hypothesis files must use the `Rich Transcription Time Marked <https://web.archive.org/web/20170119114252/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf>`_ (RTTM) format.
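For illustration, each line of an RTTM file describes one speech turn. A hypothetical hypothesis line for the first file of the table above could read as follows (onset, duration, and speaker label are made up for this example):

.. code-block:: text

    SPEAKER BFMTV_BFMStory_2011-03-17_175900 1 12.34 5.67 <NA> <NA> speaker_1 <NA>

The fields are the segment type, file identifier, channel, onset and duration (in seconds), orthography, subtype, speaker name, and confidence; ``<NA>`` marks fields that are irrelevant for diarization. Some tools expect an additional trailing ``<NA>`` field, so check the linked evaluation plan for the exact variant.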