Changeset 19795
- Timestamp: 10/26/19 12:21:18 (5 years ago)
- File: trunk/FACT++/src/root2csv.cc (1 edited)
Legend: in the diff below, unmodified (context) lines are prefixed with a space, added lines with '+', and removed lines with '-'.
trunk/FACT++/src/root2csv.cc
--- trunk/FACT++/src/root2csv.cc (r19794)
+++ trunk/FACT++/src/root2csv.cc (r19795)
@@ -57,5 +57,5 @@
 ("header", var<uint16_t>(uint16_t(0)),"Type of header line (0: preceeding #, 1: without preceeding #, 2: none)")
 ("add.*", var<string>(), "Define an additional column")
-("selector",   var<string>("1"), "Define a selector for the columns (colums where this evaluates to a value <=0 are discarded)")
+("selector,s", var<string>("1"), "Define a selector for the columns (colums where this evaluates to a value <=0 are discarded)")
 ("skip", po_switch(), "Discards all default leaves and writes only the columns defined by --add.*")
 ("first", var<int64_t>(int64_t(0)), "First event to start with (default: 0), mainly for test purpose")
@@ -97,8 +97,25 @@
 "refer to the output below to get the abbreviations.\n"
 "\n"
-"This is a general purpose tool to fill the contents of a root file into a database "
-"as long as this is technically possible and makes sense. Note that root can even "
-"write complex data like a TH1F into a database, this is not the purpose of this "
-"program.\n"
+"Similar functionaliy is also provided by root2sql. In addition to root2sql, "
+"this tool is more flexible in the slection of columns and adds the possibility "
+"to use formulas (implemented through TTreeFormula) to calculate values for "
+"additional columns. Note that root can even write complex data like a TH1F "
+"into a file. Here, only numeric columns are supported.\n"
+"\n"
+"Input files are given as positional arguments or with --file. "
+"As files are read by adding them through TChain::Add, wildcards are "
+"supported in file names. Note that on the command lines, file names "
+"with wildcards have to be escaped in quotation marks if the wildcards "
+"should be evaluated by the program and not by the shell. The output base "
+"name of the output file(s) is given with --out.\n"
+"\n"
+"The format of the first line on the file is defined with the --header option:\n"
+" 0: '# Col1 Col2 Col3 ...'\n"
+" 1: 'Col1 Col2 Col3 ...'\n"
+" 2: first data row\n"
+"\n"
+"As default, existing files are not overwritten. To force overwriting use "
+"--force. To append data to existing files use --append. Note that no "
+"check is done if this created valid and reasonable files.\n"
 "\n"
 "Each root tree has branches and leaves (the basic data types). These leaves can "
@@ -111,108 +128,57 @@
 "The name of each column to which data is filled from a leave is obtained from "
 "the leaves' names. The leave names can be checked using --print-leaves. "
-"A --print-branches exists for convenience to print only the high-level branches. "
-"Sometimes these names might be quite unconvenient like MTime.fTime.fMilliSec or "
+"A --print-branches exists for convenience to print only the high-level branches.\n"
+"\n"
+"Assuming a leaf with name MHillas.fWidth and a leaf with MHillas.fLength, "
+"a new column can be added with name Area by\n"
+" --add.Area='TMath::TwoPi()*MHillas.fWidth*MHillas.fLength'\n"
+"\n"
+"To simplify expression, root allows to define aliases, for example\n"
+" --alias.Width='MHillas.fWidth'\n"
+" --alias.Length='MHillas.fLength'\n"
+"\n"
+"This can then be used to simplyfy the above expression as\n"
+" --add.Area='TMath::TwoPi()*Width*Length'\n"
+"\n"
+"Sometimes leaf names might be quite unconvenient like MTime.fTime.fMilliSec or "
 "just MHillas.fWidth. To allow to simplify column names, regular expressions "
 "(using boost's regex) can be defined to change the names. Note that these regular "
 "expressions are applied one by one on each leaf's name. A valid expression could "
 "be:\n"
-" --map=MHillas\\.f/\n"
+" --auto-alias=MHillas\\.f/\n"
 "which would remove all occurances of 'MHillas.f'. This option can be used more than "
-"once. They are applied in sequence. A single match does not stop the sequence.\n"
-"\n"
-"Sometimes it might also be convenient to skip a leaf. This can be done with "
+"once. They are applied in sequence. A single match does not stop the sequence. "
+"In addition to replacing the column names accordingly, a alias is created "
+"automatically allowing to access the columns in a formula with the new name.\n"
+"\n"
+"Sometimes it might also be convenient to skip a leaf, i.e. not writing the "
+"coresponding column in the output file. This can be done with "
 "the --ignore resource. If the given regular expresion yields a match, the "
-"leaf will be ignored. Note that the regular expression works on the raw-name "
-"of the leaf not the readily mapped SQL column names. Example:\n"
+"leaf will be ignored. An automatic alias would still be created and the "
+"leaf could still be used in a formula. Example\n"
 " --ignore=ThetaSq\\..*\n"
-"will skip all leaved which start with 'ThetaSq.'. This option can be used"
-"more than once.\n"
-"\n"
-"The data type of each column is kept as close as possible to the leaves' data "
-"types. If for some reason this is not wanted, the data type of the SQL column "
-"can be overwritten with --sql-type sql-column/sql-ytpe, for example:\n"
-" --sql-type=FileId/UNSIGNED INT\n"
-"while the first argument of the name of the SQL column to which the data type "
-"should be applied. The second column is the basic SQL data type. The option can "
-"be given more than once.\n"
-"\n"
-"Database interaction:\n"
-"\n"
-"To drop an existing table, --drop can be used.\n"
-"\n"
-"To create a table according to theSQL column names and data types, --create "
-"can be used. The query used can be printed with --print-create even --create "
-"has not been specified.\n"
-"\n"
-"To choose the columns which should become primary keys, use --primary, "
-"for example:\n"
-" --primary=col1\n"
-"To define more than one column as primary key, the option can be given more than "
-"once. Note that the combination of these columns must be unique.\n"
-"\n"
-"All columns are created as NOT NULL as default. To force a database engine "
-"and/or a storage format, use --engine and --row-format.\n"
-"\n"
-"Usually, the INSERT query would fail if the PRIMARY key exists already. "
-"This can be avoided using the 'ON DUPLICATE KEY UPDATE' directive. With the "
-"--duplicate, you can specify what should be updated in case of a duplicate key. "
-"To keep the row untouched, you can just update the primary key "
-"with the identical primary key, e.g. --duplicate='MyPrimary=VALUES(MyPrimary)'. "
-"The --duplicate resource can be specified more than once to add more expressions "
-"to the assignment_list. For more details, see the MySQL manual.\n"
-"\n"
-"For debugging purpose, or to just create or drop a table, the final insert "
-"query can be skipped using --no-insert. Note that for performance reason, "
-"all data is collected in memory and a single INSERT query is issued at the "
-"end.\n"
-"\n"
-"Another possibility is to add the IGNORE keyword to the INSERT query by "
-"--ignore-errors, which essentially ignores all errors and turns them into "
-"warnings which are printed after the query succeeded.\n"
-"\n"
-"Using a higher verbosity level (-v), an overview of the written columns or all "
-"processed leaves is printed depending on the verbosity level. The output looks "
-"like the following\n"
-" Leaf name [root data type] (SQL name)\n"
-"for example\n"
-" MTime.fTime.fMilliSec [Long64_t] (MilliSec)\n"
-"which means that the leaf MTime.fTime.fMilliSec is detected to be a Long64_t "
-"which is filled into a column called MilliSec. Leaves with non basic data types "
-"are ignored automatically and are marked as (-n/a-). User ignored columns "
-"are marked as (-ignored-).\n"
-"\n"
-"A constant value for the given file can be inserted by using the --const directive. "
-"For example --const.mycolumn=42 would insert 42 into a column called mycolumn. "
-"The column is created as INT UNSIGNED as default which can be altered by "
-"--sql-type. A special case is a value of the form `/regex/format/`. Here, the given "
-"regular expression is applied to the filename and it is newly formated with "
-"the new format string. Uses the standard formatting rules to replace matches "
-"(those used by ECMAScript's replace method).\n"
-"\n"
-"Usually the previously defined constant values are helpful to create an index "
-"which relates unambiguously the inserted data to the file. It might be useful "
-"to delete all data which belongs to this particular file before new data is "
-"entered. This can be achieved with the `--delete` directive. It deletes all "
-"data from the table before inserting new data which fulfills the condition "
-"defined by the `--const` directives.\n"
-"\n"
-"The constant values can also be used for a conditional execution (--conditional). "
-"If any row with the given constant values are found, the execution is stopped "
-"(note that this happend after the table drop/create but before the delete/insert.\n"
-"\n"
-"To ensure efficient access for a conditonal execution, it makes sense to have "
-"an index created for those columns. This can be done during table creation "
-"with the --index option.\n"
-"\n"
-"To create the index as a UNIQUE INDEX, you can use the --unique option which "
-"implies --index.\n"
-"\n"
-"If a query failed, the query is printed to stderr together with the error message. "
-"For the main INSERT query, this is only true if the verbosity level is at least 2 "
-"or the query has less than 80*25 bytes.\n"
+"will skip all leaved which start with 'ThetaSq.'. This directive can be given "
+"more than once. The so defined ignore list is applied entry-wise, first to the "
+"raw leaf names, then to the aliased names.\n"
+"\n"
+"To select only certain extries from the file, a selector (cut) can be defined "
+"in the same style as the --add directives, for exmple:\n"
+" --selector='MHillas.fLength*Width<0'\n"
+"Note that the selctor is not evaluated to a boolean expression (==0 or !=0) "
+"but all positive none zero values are considered 'true' (select the entry) "
+"and all negative values are considered 'fales' (discard the entry).\n"
+"\n"
+"For several purposes, it might be convenient to split the output to several "
+"files. This can be achieved using the --split-sequence (-S) "
+"and the --split-quantile (-Q) options. If a split sequence is defined as "
+"-S 1 -S 2 -S 1 the events are split by 1:2:1 in this sequence order. If "
+"quantiles are given as -Q 0.5 -Q 0.6, the first tree will contain 50% of "
+"the second one 10% and the third one 40%. The corresponding seed value can "
+"be set with --seed. Filenames are then created by adding an index after(!) "
+"the extension, e.g. file.csv-0, file.csv-1, ...\n"
 "\n"
 "In case of success, 0 is returned, a value>0 otherwise.\n"
 "\n"
-"Usage: root2sql [options] -uri URI rootfile.root\n"
+"Usage: root2csv input1.root [input2.root ...] -o output.csv [-t tree] [-u] [-f] [-n] [-vN] [-cN]\n"
 "\n"
 ;
@@ -639,4 +605,14 @@
 }
 }
+for (auto b=_ignore.cbegin(); b!=_ignore.cend(); b++)
+{
+    if (boost::regex_match(name.c_str(), boost::regex(*b)))
+    {
+        found = true;
+        if (verbose>2)
+            cout << " (-ignored-)";
+        break;
+    }
+}
 
 if (found)
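The last hunk above is the functional core of the change: each leaf name is compared against every configured --ignore pattern with boost::regex_match, and the first match flags the leaf so that no column is written for it. The following stand-alone sketch replays that matching logic outside the program; the leaf names and the pattern are made-up test data, and only the loop structure is borrowed from the hunk above.

    // Sketch only: mirrors the ignore-list matching added in r19795,
    // compiled as a separate test program with invented input data.
    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        // patterns as they would arrive from one or more --ignore options
        const std::vector<std::string> _ignore = { "ThetaSq\\..*" };
        const int verbose = 3;

        // hypothetical leaf names for demonstration
        const std::vector<std::string> leaves = { "ThetaSq.fVal", "MHillas.fWidth" };
        for (const auto &name : leaves)
        {
            bool found = false;
            for (auto b=_ignore.cbegin(); b!=_ignore.cend(); b++)
            {
                // a leaf is dropped as soon as one pattern matches its full name
                if (boost::regex_match(name.c_str(), boost::regex(*b)))
                {
                    found = true;
                    if (verbose>2)
                        std::cout << name << " (-ignored-)" << std::endl;
                    break;
                }
            }
            if (!found)
                std::cout << name << " (written)" << std::endl;
        }
        return 0;
    }

Compiling the pattern inside the loop, as in the changeset, keeps the code simple; for long ignore lists the boost::regex objects could be constructed once up front.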
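The new help text describes the -S/--split-sequence and -Q/--split-quantile options only in words. The sketch below is not taken from root2csv; it is an assumed illustration of how a quantile list could map a uniformly distributed random number to an output-file index, reproducing the 50%/10%/40% split quoted for -Q 0.5 -Q 0.6 (the SplitIndex helper, the seed and the event counts are hypothetical).

    // Assumed illustration of quantile-based splitting, not the FACT++ code.
    #include <cstddef>
    #include <iostream>
    #include <random>
    #include <vector>

    // Return the index of the output file for one event, given its uniform draw u.
    static std::size_t SplitIndex(double u, const std::vector<double> &quantiles)
    {
        for (std::size_t i = 0; i < quantiles.size(); i++)
            if (u < quantiles[i])
                return i;
        return quantiles.size(); // the remaining fraction goes to the last file
    }

    int main()
    {
        const std::vector<double> quantiles = { 0.5, 0.6 }; // as with -Q 0.5 -Q 0.6
        std::mt19937 rng(42);                               // cf. the --seed option
        std::uniform_real_distribution<double> uniform(0.0, 1.0);

        std::vector<std::size_t> count(quantiles.size() + 1, 0);
        for (int i = 0; i < 100000; i++)
            count[SplitIndex(uniform(rng), quantiles)]++;

        // expected roughly 50%, 10% and 40% of the events
        for (std::size_t i = 0; i < count.size(); i++)
            std::cout << "file.csv-" << i << ": " << count[i] << " events\n";
        return 0;
    }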