Changes between Version 20 and Version 21 of DatabaseBasedAnalysis/table2sql


Timestamp:
Oct 23, 2018, 6:55:08 PM (7 months ago)
Author:
tbretz
Comment:

--

  • DatabaseBasedAnalysis/table2sql

    v20 v21  
    168168To do the pre-deleting efficiently, it makes sense to create an INDEX for fast access. This can be achieved with the `--index` directive, which adds the corresponding statement to the table creation (`CREATE TABLE`).
    169169
     170== Splitting ==
     171
     172Sometimes it is necessary to split the data into several root-trees or ascii-files, for example to produce a test and a training sample. For this, two options exist: `--split-sequence` (shortcut `-S`) and `--split-quantile` (shortcut `-Q`).
     173
     174The first defines a fixed sequence, for example `-S 2 -S 1 -S 4` will write the first two events (2) to the first tree/file, the third event (1) to the second tree/file and the fourth to seventh (4) events to the third tree/file. To split even and odd events into two trees/files you have to use `-S 1 -S 1`.
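The sequence rule can be illustrated with a short Python sketch. This is not the tool's implementation: the function name `split_by_sequence` is hypothetical, and the sketch assumes that the sequence repeats once it is exhausted, which is what the even/odd example with `-S 1 -S 1` suggests.

```python
from bisect import bisect_right
from itertools import accumulate


def split_by_sequence(num_rows, sequence):
    """Return, for each row index, the index of the tree/file it goes to.

    Assumption (not confirmed by the tool's source): the sequence is
    applied cyclically, i.e. it starts over after sum(sequence) rows.
    """
    # Cumulative upper bounds of each block, e.g. [2, 1, 4] -> [2, 3, 7]
    bounds = list(accumulate(sequence))
    period = bounds[-1]
    # Position inside the current period decides the target tree/file
    return [bisect_right(bounds, row % period) for row in range(num_rows)]


if __name__ == "__main__":
    # Corresponds to `-S 2 -S 1 -S 4` applied to nine rows
    print(split_by_sequence(9, [2, 1, 4]))
    # Corresponds to `-S 1 -S 1`: alternating rows
    print(split_by_sequence(4, [1, 1]))
```

With `[2, 1, 4]`, rows one and two map to the first tree/file, row three to the second, and rows four to seven to the third, matching the description above.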
     175
     176To randomly split the data, use quantiles. For example, `-Q 0.5` splits the data equally into two samples, and `-Q 0.2 -Q 0.5 -Q 0.9` splits the data into four samples of 20%, 30%, 40% and 10%.
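A minimal Python sketch of how such a random split might work is given here. It is an illustration only, under the assumption that each `-Q` value is a cumulative boundary in the interval (0, 1) and that one uniform random number is drawn per row. The names `sample_for` and `split_by_quantile` are hypothetical and not part of the tool.

```python
import random
from bisect import bisect_right


def sample_for(value, quantiles):
    """Map a uniform random value in [0, 1) to a sample index.

    Assumption: `quantiles` is sorted ascending and holds cumulative
    boundaries, so n quantiles define n + 1 samples.
    """
    return bisect_right(quantiles, value)


def split_by_quantile(num_rows, quantiles, seed=None):
    """Draw one random number per row and return the sample index of each row."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [sample_for(rng.random(), quantiles) for _ in range(num_rows)]


if __name__ == "__main__":
    # Corresponds to `-Q 0.5`: two samples of roughly equal size
    assignment = split_by_quantile(10000, [0.5], seed=1)
    print(assignment.count(0), assignment.count(1))
```

Because the assignment is random, the sample sizes only approach the nominal fractions for large numbers of rows.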
     177
     178Note that splitting is defined on the rows which are received(!) from the database, i.e. before rows with NULL entries are excluded.
    170179
    171180== Debugging ==