[[TOC]]
This page describes two tools which are - apart from their input file format - nearly identical: `root2sql` and `fits2sql`.

== Purpose ==

Both file types, ROOT and FITS, provide storage engines for table-like data. In ROOT files these are trees with branches and leaves, and in FITS files these are FITS tables, the so-called ''extensions''. Although ''root'' offers a lot of printing and fitting options, sometimes access through a database has advantages. Therefore, a tool which maps ROOT files and FITS files to SQL tables helps.

== Introduction ==

First get the man page of `root2sql`, either with `root2sql --help` or (if installed) with `man root2sql`. This page mainly outlines how `root2sql` works, because `fits2sql` is slightly less complex. In short, the differences will be explained at the end of the document.

== Connection ==

How a connection to the database is established is outlined in DatabaseBasedAnalysis/Connection. By default, your connection will be compressed if the host address of the MySQL server is neither '''localhost''' nor '''127.0.0.1'''. To force compression use

{{{
uri fact:password@ihp-pc45.ethz.ch/factdata?compress=1
}}}

Generally, compression should be enabled when connecting from outside the local network. To test compression, the "''Hello World!''" example can be used.

In the following it is assumed that the database connection is working and tested, so that the resources ({{{uri=...}}}) are omitted.
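
For reference, a minimal sketch of a call which passes the database resource directly on the command line (the credentials are those of the example above, the input file name is a placeholder):

{{{
# sketch: pass the URI on the command line instead of a configuration file
root2sql --uri 'fact:password@ihp-pc45.ethz.ch/factdata?compress=1' --file inputfile.root
}}}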

== Basics ==

{{{
Root to SQL:
  -u [ --uri ] arg              Database link as in
                                user:password@server[:port]/database[?compress=
                                0|1].
  --file arg                    The root file to read from
  --force                       Force processing even if there is no database
                                connection
  --create                      Create the database if not existing
  --drop                        Drop the table (implies create)
  -t [ --tree ] arg (=Events)   Name of the root tree to convert
  --table arg                   Name of the table to use (default is the tree
                                name)
  --map arg                     A regular expression which is applied to the
                                leaf name before it is used as SQL column name
  --sql-type arg                Allows to overwrite the calculated SQL type for
                                a given column e.g. 'sql-column-name/UNSIGNED
                                INT'
  --ignore arg                  Ignore the given leaf, if the given regular
                                expression matches
  --primary arg                 List of columns to be used as primary keys
                                during table creation (in connection with
                                --create)
  --first arg (=0)              First event to start with (default: 0), mainly
                                for test purpose
  --max arg (=0)                Maximum number of events to process (0: all),
                                mainly for test purpose
  --engine arg                  Database engine to be used when a new table is
                                created
  --row-format arg              Defines the ROW_FORMAT keyword for table
                                creation
  --duplicate arg               Specifies an assignment_list for an 'ON
                                DUPLICATE KEY UPDATE' expression
  --ignore-errors               Adds the IGNORE keyword to the INSERT query
                                (turns errors into warnings, ignores rows with
                                errors)

Debug options:
  --no-insert                   Does not insert any data into the table
  --dry-run                     Skip any query which changes the database (might
                                result in consecutive failures)
  --print-connection            Print database connection information
  --print-branches              Print the branches found in the tree
  --print-leaves                Print the leaves found in the tree (this is
                                what is processed)
  --print-insert                Print the INSERT query (note that it contains
                                all data)
  --print-create                Print the CREATE query
  -v [ --verbose ] arg (=1)     Verbosity (0: quiet, 1: default, 2: more, 3,
}}}

Generally, ROOT files can store very complicated objects such as a TH1F. Although they could be entered into a database as a so-called BLOB (a binary object), this is not the idea of this tool. Thus, only basic data types are supported.

== Input ==

The input file is given as the first positional argument:

{{{
root2sql [--file] inputfile.root
}}}

Each root tree has branches and leaves (the basic data types). These leaves can be read independently of the classes which were used to write the root file. The default tree to read from is '''Events''', but the name can be overwritten using `-t` (an abbreviation of `--tree`), which would look like this:

{{{
root2sql inputfile.root -t MyTree
}}}

== Output ==

=== Table Name ===

The table which is filled with the data from the tree is by default expected to have the same name as the tree. If this is not the case, it can be changed with `--table`:

{{{
root2sql inputfile.root -t MyTree --table MyTable
}}}

In these three examples, the first one would read from a tree ''Events'' and write to a table ''Events'', the second one would read from a tree ''MyTree'' and write to a table ''MyTree'', and the third one would read from a tree ''MyTree'' and write to a table ''MyTable''.

=== Column Names ===

The name of each column into which data from a leaf is filled is derived from the leaf's name. The leaf names can be checked using `--print-leaves`. A `--print-branches` option exists for convenience to print only the high-level branches.

If the leaf with the basic data type is called ''MTime.fTime.fMilliSec'', the column will be called ''MTime.fTime.fMilliSec'' as well.
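
To inspect the leaf names (and thus the column names) without writing anything, a call like the following sketch can be used; the file name is a placeholder:

{{{
# print branches and leaves of the default tree; do not insert any data
root2sql inputfile.root --print-branches --print-leaves --no-insert
}}}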

== Changing Column Names ==

Sometimes the leaves' names might be quite inconvenient, like in the example above. To simplify column names, regular expressions (using boost's regex) can be defined to change the names. Note that these regular expressions are applied one by one to each leaf's name. A valid expression could be:
{{{
--map=MHillas\.f/
}}}

which would remove all occurrences of '''MHillas.f'''.

{{{
--map MHillas\.f/MHillas_
}}}

would replace all occurrences of '''MHillas.f''' by '''MHillas_'''.

The corresponding call in C++ is `boost::regex_replace(leafname, search, replace)` for `--map search/replace`.

As the `--map` option can be given more than once, many and complex replacement rules can be defined.

They are applied in sequence, one by one, to all leaves. A single match does not stop the sequence.
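
As an illustration, several rules could be combined in a single call; the patterns here are only examples:

{{{
# strip the MHillas.f prefix, then shorten the time leaves as well
root2sql inputfile.root --map 'MHillas\.f/' --map 'MTime\.fTime\.f/Time'
}}}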

== Skipping Columns ==

Sometimes it might also be convenient to skip a leaf. This can be done with the `--ignore regex` option. If the given regular expression yields a match, the leaf will be ignored.

Note that the regular expression works on the raw name of the leaf, not on the readily mapped SQL column names. Example:
{{{
--ignore=ThetaSq\..*
}}}
will skip all leaves which start with 'ThetaSq.'. This option can be used more than once.
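
A sketch of a call combining an ignore rule with a mapping rule (patterns as in the examples above):

{{{
# drop all ThetaSq.* leaves and strip the MHillas.f prefix from the remaining names
root2sql inputfile.root --ignore 'ThetaSq\..*' --map 'MHillas\.f/'
}}}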

== Column Data Types ==

The data type of each column is kept as close as possible to the leaves' basic data types. If for some reason this is not wanted, the data type of the SQL column can be overwritten with `--sql-type sql-column/sql-type`. This works like the following:

{{{
--sql-type=FileId/UNSIGNED INT
}}}
where the first argument is the name of the SQL column to which the data type should be applied, and the second argument is the basic SQL data type. The option can be given more than once.
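
In a shell, the value has to be quoted because it contains a space. A sketch with two overrides (''EvtNumber'' is a hypothetical column name):

{{{
# force explicit SQL types for two columns
root2sql inputfile.root --sql-type='FileId/UNSIGNED INT' --sql-type='EvtNumber/BIGINT UNSIGNED'
}}}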

== Database Interaction ==

To drop an existing table, `--drop` can be used.

To create a table according to the SQL column names and data types, `--create` can be used. The query used can be printed with `--print-create` even if `--create` has not been specified.

To choose the columns which should become primary keys, use `--primary`, for example:
{{{
--primary=col1
}}}
To define more than one column as primary key, the option can be given more than once. Note that the combination of these columns must be unique.
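
Putting this together, a sketch of a call which (re-)creates a table with a composite primary key; ''FileId'' and ''EvtNumber'' are hypothetical column names:

{{{
# drop an existing table and create it again with FileId+EvtNumber as primary key
root2sql inputfile.root --drop --create --primary=FileId --primary=EvtNumber

# only print the CREATE query that would be used, without touching the database
root2sql inputfile.root --print-create --dry-run --no-insert
}}}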

All columns are created as NOT NULL by default. To force a database engine and/or a storage format, use `--engine` and `--row-format`.
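
For example, a hedged sketch using common MySQL values (InnoDB and COMPRESSED are just examples, not a recommendation):

{{{
# create the table with an explicit storage engine and row format
root2sql inputfile.root --create --engine=InnoDB --row-format=COMPRESSED
}}}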

Usually, the INSERT query would fail if the PRIMARY key already exists. This can be avoided using the 'ON DUPLICATE KEY UPDATE' directive. With `--duplicate`, you can specify what should be updated in case of a duplicate key. To keep the row untouched, you can just update the primary key with the identical primary key, e.g. `--duplicate='MyPrimary=VALUES(MyPrimary)'`. The `--duplicate` option can be specified more than once to add more expressions to the assignment_list. For more details, see the MySQL manual.
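
A sketch using the expression from above; ''MyPrimary'' stands for whatever column is the primary key:

{{{
# effectively skip rows whose primary key already exists in the table
root2sql inputfile.root --duplicate='MyPrimary=VALUES(MyPrimary)'
}}}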

For debugging purposes, or to just create or drop a table, the final insert query can be skipped using `--no-insert`. Note that for performance reasons, all data is collected in memory and a single INSERT query is issued at the end.

Another possibility is to add the IGNORE keyword to the INSERT query with `--ignore-errors`, which essentially ignores all errors and turns them into warnings which are printed after the query succeeded.
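
A purely diagnostic run which changes nothing in the database could look like this sketch:

{{{
# show the queries that would be issued, but do not modify the database
root2sql inputfile.root --dry-run --no-insert --print-create -v 3
}}}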

Using a higher verbosity level (`-v`), an overview of the written columns or all processed leaves is printed, depending on the verbosity level. The output looks like the following:
{{{
Leaf name [root data type] (SQL name)
}}}
for example
{{{
MTime.fTime.fMilliSec [Long64_t] (MilliSec)
}}}
which means that the leaf MTime.fTime.fMilliSec is detected to be a Long64_t which is filled into a column called MilliSec. Leaves with non-basic data types are ignored automatically and are marked as (-n/a-). User-ignored columns are marked as (-ignored-).

If a query fails, the query is printed to stderr together with the error message. For the main INSERT query, this is only done if the verbosity level is at least 2 or the query is shorter than 80*25 bytes.

In case of success, 0 is returned, a value > 0 otherwise.
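
This makes it easy to check the outcome in a shell script, for example:

{{{
# report a non-zero exit code
root2sql inputfile.root || echo "root2sql failed with exit code $?"
}}}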