Timestamp:
10/26/19 12:21:18 (5 years ago)
Author:
tbretz
Message:
Updated help text and added possibility to apply ignore also to the aliased column names.
File:
1 edited

  • trunk/FACT++/src/root2csv.cc

    r19794 r19795  
    5757        ("header",         var<uint16_t>(uint16_t(0)),"Type of header line (0: preceding #, 1: without preceding #, 2: none)")
    5858        ("add.*",          var<string>(),             "Define an additional column")
    59         ("selector",       var<string>("1"),          "Define a selector for the columns (columns where this evaluates to a value <=0 are discarded)")
     59        ("selector,s",     var<string>("1"),          "Define a selector for the columns (columns where this evaluates to a value <=0 are discarded)")
    6060        ("skip",           po_switch(),               "Discards all default leaves and writes only the columns defined by --add.*")
    6161        ("first",          var<int64_t>(int64_t(0)),  "First event to start with (default: 0), mainly for test purpose")
     
    9797        "refer to the output below to get the abbreviations.\n"
    9898        "\n"
    99         "This is a general purpose tool to fill the contents of a root file into a database "
    100         "as long as this is technically possible and makes sense. Note that root can even "
    101         "write complex data like a TH1F into a database; this is not the purpose of this "
    102         "program.\n"
     99        "Similar functionality is also provided by root2sql. In addition to root2sql, "
     100        "this tool is more flexible in the selection of columns and adds the possibility "
     101        "to use formulas (implemented through TTreeFormula) to calculate values for "
     102        "additional columns. Note that root can even write complex data like a TH1F "
     103        "into a file. Here, only numeric columns are supported.\n"
     104        "\n"
     105        "Input files are given as positional arguments or with --file. "
     106        "As files are read by adding them through TChain::Add, wildcards are "
     107        "supported in file names. Note that on the command line, file names "
     108        "with wildcards have to be escaped in quotation marks if the wildcards "
     109        "should be evaluated by the program and not by the shell. The base "
     110        "name of the output file(s) is given with --out.\n"
     111        "\n"
     112        "The format of the first line of the file is defined with the --header option:\n"
     113        "   0: '# Col1 Col2 Col3 ...'\n"
     114        "   1: 'Col1 Col2 Col3 ...'\n"
     115        "   2: first data row\n"
     116        "\n"
     117        "By default, existing files are not overwritten. To force overwriting use "
     118        "--force. To append data to existing files use --append. Note that no "
     119        "check is done whether this creates valid and reasonable files.\n"
    103120        "\n"
    104121        "Each root tree has branches and leaves (the basic data types). These leaves can "
     
    111128        "The name of each column to which data is filled from a leaf is obtained from "
    112129        "the leaves' names. The leaf names can be checked using --print-leaves. "
    113         "A --print-branches exists for convenience to print only the high-level branches. "
    114         "Sometimes these names might be quite inconvenient like MTime.fTime.fMilliSec or "
     130        "A --print-branches exists for convenience to print only the high-level branches.\n"
     131        "\n"
     132        "Assuming a leaf named MHillas.fWidth and a leaf named MHillas.fLength, "
     133        "a new column can be added with name Area by\n"
     134        "   --add.Area='TMath::TwoPi()*MHillas.fWidth*MHillas.fLength'\n"
     135        "\n"
     136        "To simplify expressions, root allows defining aliases, for example\n"
     137        "   --alias.Width='MHillas.fWidth'\n"
     138        "   --alias.Length='MHillas.fLength'\n"
     139        "\n"
     140        "This can then be used to simplify the above expression as\n"
     141        "   --add.Area='TMath::TwoPi()*Width*Length'\n"
     142        "\n"
     143        "Sometimes leaf names might be quite inconvenient like MTime.fTime.fMilliSec or "
    115144        "just MHillas.fWidth. To allow simplifying column names, regular expressions "
    116145        "(using boost's regex) can be defined to change the names. Note that these regular "
    117146        "expressions are applied one by one on each leaf's name. A valid expression could "
    118147        "be:\n"
    119         "   --map=MHillas\\.f/\n"
     148        "   --auto-alias=MHillas\\.f/\n"
    120149        "which would remove all occurrences of 'MHillas.f'. This option can be used more than "
    121         "once. They are applied in sequence. A single match does not stop the sequence.\n"
    122         "\n"
    123         "Sometimes it might also be convenient to skip a leaf. This can be done with "
     150        "once. They are applied in sequence. A single match does not stop the sequence. "
     151        "In addition to replacing the column names accordingly, an alias is created "
     152        "automatically, allowing the columns to be accessed in a formula under the new name.\n"
     153        "\n"
     154        "Sometimes it might also be convenient to skip a leaf, i.e. not write the "
     155        "corresponding column to the output file. This can be done with "
    124156        "the --ignore resource. If the given regular expression yields a match, the "
    125         "leaf will be ignored. Note that the regular expression works on the raw name "
    126         "of the leaf, not the readily mapped SQL column names. Example:\n"
     157        "leaf will be ignored. An automatic alias would still be created and the "
     158        "leaf could still be used in a formula. Example\n"
    127159        "   --ignore=ThetaSq\\..*\n"
    128         "will skip all leaves which start with 'ThetaSq.'. This option can be used "
    129         "more than once.\n"
    130         "\n"
    131         "The data type of each column is kept as close as possible to the leaves' data "
    132         "types. If for some reason this is not wanted, the data type of the SQL column "
    133         "can be overwritten with --sql-type sql-column/sql-type, for example:\n"
    134         "   --sql-type=FileId/UNSIGNED INT\n"
    135         "where the first argument is the name of the SQL column to which the data type "
    136         "should be applied. The second argument is the basic SQL data type. The option can "
    137         "be given more than once.\n"
    138         "\n"
    139         "Database interaction:\n"
    140         "\n"
    141         "To drop an existing table, --drop can be used.\n"
    142         "\n"
    143         "To create a table according to the SQL column names and data types, --create "
    144         "can be used. The query used can be printed with --print-create even if --create "
    145         "has not been specified.\n"
    146         "\n"
    147         "To choose the columns which should become primary keys, use --primary, "
    148         "for example:\n"
    149         "   --primary=col1\n"
    150         "To define more than one column as primary key, the option can be given more than "
    151         "once. Note that the combination of these columns must be unique.\n"
    152         "\n"
    153         "All columns are created as NOT NULL as default. To force a database engine "
    154         "and/or a storage format, use --engine and --row-format.\n"
    155         "\n"
    156         "Usually, the INSERT query would fail if the PRIMARY key exists already. "
    157         "This can be avoided using the 'ON DUPLICATE KEY UPDATE' directive. With the "
    158         "--duplicate, you can specify what should be updated in case of a duplicate key. "
    159         "To keep the row untouched, you can just update the primary key "
    160         "with the identical primary key, e.g. --duplicate='MyPrimary=VALUES(MyPrimary)'. "
    161         "The --duplicate resource can be specified more than once to add more expressions "
    162         "to the assignment_list. For more details, see the MySQL manual.\n"
    163         "\n"
    164         "For debugging purpose, or to just create or drop a table, the final insert "
    165         "query can be skipped using --no-insert. Note that for performance reasons, "
    166         "all data is collected in memory and a single INSERT query is issued at the "
    167         "end.\n"
    168         "\n"
    169         "Another possibility is to add the IGNORE keyword to the INSERT query by "
    170         "--ignore-errors, which essentially ignores all errors and turns them into "
    171         "warnings which are printed after the query succeeded.\n"
    172         "\n"
    173         "Using a higher verbosity level (-v), an overview of the written columns or all "
    174         "processed leaves is printed depending on the verbosity level. The output looks "
    175         "like the following\n"
    176         "   Leaf name [root data type] (SQL name)\n"
    177         "for example\n"
    178         "   MTime.fTime.fMilliSec [Long64_t] (MilliSec)\n"
    179         "which means that the leaf MTime.fTime.fMilliSec is detected to be a Long64_t "
    180         "which is filled into a column called MilliSec. Leaves with non basic data types "
    181         "are ignored automatically and are marked as (-n/a-). User-ignored columns "
    182         "are marked as (-ignored-).\n"
    183         "\n"
    184         "A constant value for the given file can be inserted by using the --const directive. "
    185         "For example --const.mycolumn=42 would insert 42 into a column called mycolumn. "
    186         "The column is created as INT UNSIGNED by default, which can be altered by "
    187         "--sql-type. A special case is a value of the form `/regex/format/`. Here, the given "
    188         "regular expression is applied to the filename and the result is formatted with "
    189         "the given format string, using the standard formatting rules to replace matches "
    190         "(those used by ECMAScript's replace method).\n"
    191         "\n"
    192         "Usually the previously defined constant values are helpful to create an index "
    193         "which relates unambiguously the inserted data to the file. It might be useful "
    194         "to delete all data which belongs to this particular file before new data is "
    195         "entered. This can be achieved with the `--delete` directive. It deletes all "
    196         "data from the table before inserting new data which fulfills the condition "
    197         "defined by the `--const` directives.\n"
    198         "\n"
    199         "The constant values can also be used for a conditional execution (--conditional). "
    200         "If any row with the given constant values is found, the execution is stopped "
    201         "(note that this happens after the table drop/create but before the delete/insert).\n"
    202         "\n"
    203         "To ensure efficient access for a conditional execution, it makes sense to have "
    204         "an index created for those columns. This can be done during table creation "
    205         "with the --index option.\n"
    206         "\n"
    207         "To create the index as a UNIQUE INDEX, you can use the --unique option which "
    208         "implies --index.\n"
    209         "\n"
    210         "If a query failed, the query is printed to stderr together with the error message. "
    211         "For the main INSERT query, this is only true if the verbosity level is at least 2 "
    212         "or the query has less than 80*25 bytes.\n"
     160        "will skip all leaves which start with 'ThetaSq.'. This directive can be given "
     161        "more than once. The ignore list defined this way is applied entry-wise, first to the "
     162        "raw leaf names, then to the aliased names.\n"
     163        "\n"
     164        "To select only certain entries from the file, a selector (cut) can be defined "
     165        "in the same style as the --add directives, for example:\n"
     166        "   --selector='MHillas.fLength*Width<0'\n"
     167        "Note that the selector is not evaluated as a boolean expression (==0 or !=0); "
     168        "all positive nonzero values are considered 'true' (select the entry) "
     169        "and all negative values are considered 'false' (discard the entry).\n"
     170        "\n"
     171        "For several purposes, it might be convenient to split the output to several "
     172        "files. This can be achieved using the --split-sequence (-S) "
     173        "and the --split-quantile (-Q) options. If a split sequence is defined as "
     174        "-S 1 -S 2 -S 1, the events are split 1:2:1 in this sequence order. If "
     175        "quantiles are given as -Q 0.5 -Q 0.6, the first file will contain 50% of "
     176        "the events, the second one 10% and the third one 40%. The corresponding seed value can "
     177        "be set with --seed. Filenames are then created by adding an index after(!) "
     178        "the extension, e.g. file.csv-0, file.csv-1, ...\n"
    213179        "\n"
    214180        "In case of success, 0 is returned, a value>0 otherwise.\n"
    215181        "\n"
    216         "Usage: root2sql [options] -uri URI rootfile.root\n"
     182        "Usage: root2csv input1.root [input2.root ...] -o output.csv [-t tree] [-u] [-f] [-n] [-vN] [-cN]\n"
    217183        "\n"
    218184        ;
     
    639605                }
    640606            }
     607            for (auto b=_ignore.cbegin(); b!=_ignore.cend(); b++)
     608            {
     609                if (boost::regex_match(name.c_str(), boost::regex(*b)))
     610                {
     611                    found = true;
     612                    if (verbose>2)
     613                        cout << " (-ignored-)";
     614                    break;
     615                }
     616            }
    641617
    642618            if (found)