wiki:DatabaseBasedAnalysis/rootifysql

Version 18 (modified by tbretz, 5 months ago) (diff)

--

First get the man page of rootifysql either with rootifysql --help or (if installed) with man rootifysql.

Purpose

The idea of rootifysql is to conveniently request data from a MySQL database and write the contents returned into a tree in a root-file. Root trees offer a lot of nice features how to access the data fast and how to create plots easily.

For details, please refer either to the Draw-function of TTree (e.g. https://root.cern.ch/doc/master/classTTree.html#a73450649dc6e54b5b94516c468523e45) or checkout the root user manual (https://root.cern.ch/guides/users-guide), in particular the section about Trees. A mini-tutorial is given at the end of the page.

In the following basic knowledge on C++, root and root trees is assumed.

Connection

How a connection to the database is established is outlined in: DatabaseBasedAnalysis/Connection. Per default, your connection will be compressed if the the host-address of the mysql-server is neither localhost nor 127.0.0.1. To force compression use

uri fact:password@ihp-pc45.ethz.ch/factdata?compress=1

Generally, compression should be enable if connecting from the world. To test compression, the "Hello World! " example can be used.

In the following it is assumed that the database connection is working and tested, so that the resources (uri=...) are omitted.

Hello World!

To test if everything is working correctly, we run a "Hello World! " example. Credentials should be in the rootifysql.rc which should be write protected.

user@machine> rootifysql -d -n --query="SELECT 'Hello World\!'"
Reading global  options from 'fact++.rc'.
Reading default options from 'rootifysql.rc'.

------------------------ Rootify SQL -------------------------
Connecting to database...
Client Version: 5.7.23
Server Version: 5.7.23-0ubuntu0.18.04.1
Requesting data...
Opening file '/dev/null' [compression=1]...
Writing data to tree 'Result'
Trying to setup 1 branches...
Configured 0 branches.
Filling branches...

# Hello World!
Hello World!


1 rows fetched.
1 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.058136s (58.1 ms/row)
--------------------------------------------------------------

Voila!

The -d option (a shortcut for --display) prints the result on standard out, the -n option (a shortcut to --null) redirects all output to /dev/null instead to a file. For benchmarking your query, you can also add --no-fill which does not only write to a root tree at /dev/null but does not fill the root tree at all. -q is a shortcut for --query

Query in the file

Instead of typing your query on the command line, you can store in a text-file, for example

user@machine> echo "SELECT 'Hello World\!'" > query.sql
user@machine> rootifysql -d -n query.sql
[...]
------------------------ Rootify SQL -------------------------
Reading query from file 'query.sql'.
[...]

# Hello World!
Hello World!

[...]
--------------------------------------------------------------

A nice feature of bash is that you can make the query.sql file executable. First add the executable in the first line preceded by #!. It might then look like:

#!/path/to/rootifysql
SELECT 'Hello World!'

Now make it executable for yourself

user@machine> chmod u+x query.sql

and you can call your query like

user@machine> ./query.sql [options]

with the [options] from rootifysql. If you prefix your query file with

#!/path/to/rootifysql -C /path/to/resources.rc

you can even have a dedicated resource file.

Connection information

To get more information on the connection to the database, use --print-connection, for example

user@machine> user@machine> rootifysql -d -n --print-connection query.sql
[...]
------------------------ Rootify SQL -------------------------
[...]
Compression of database connection is ON
Connection to databases is ENCRYPTED (DHE-RSA-AES256-SHA)
[...]
--------------------------------------------------------------

Bytes_sent       398 
Bytes_received   546 

Note that this data is retrieved through additional requests, which in principle slow down the execution time and add additional traffic to the connection.

Writing to an ASCII file

We can redirect the result of the query to an ASCII file by

rootifysql -v 0 -d -n query.sql > output.txt

A verbosity level of 0 (-v 0) is required to turn off all the other output to standard out.

If you want to keep it you can directly write to a file by

rootifysql -n query.sql -w output.txt

Here -w is a shortcut to --write. The default delimiter is a tab (\t). It can be changed with -d (short for --delimiter)

Resources of the type --copy-* allow to copy the sha-bang (--copy-shabang), the header's comments (--copy-header), the query itself (--copy-query) and additional comments (--copy-comments) from the input file to the ASCII file. The last three can be abbreviated with --copy-all.

Writing to a root-file

Basics

Let's try our previous example

user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!'"
[...]
------------------------ Rootify SQL -------------------------
[...]
Opening file 'rootify.root' [compression=1]...
Writing data to tree 'Result'
Trying to setup 1 branches...

 - Hello World! [VARCHAR NOT NULL] {V}

Configured 0 branches.
Filling branches...
1 rows fetched.
1 rows filled into tree.
4 kB written to disk.
File closed.
Execution time: 0.0775619s (77.6 ms/row)
--------------------------------------------------------------

The option -f (which is a shortcut for --force) actually overwrites an existing root file (RECREATE). The verbosity option -v 2 prints some information on how the table columns are translated. As the output suggests, 1 branch could have been setup (one column was returned), but no branch was configured. Our "Hello World''" string is returned as a VARCHAR (or CHAR), but only values are supported. The leading - tells us that the column was skipped, the V is an internal abbreviation for the action (and C for CHAR). So second try:

user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1"
[...]
------------------------ Rootify SQL -------------------------
[...]
Opening file 'rootify.root' [compression=1]...
Writing data to tree 'Result'
Trying to setup 2 branches...

 - Hello World! [VARCHAR NOT NULL] {V}
 + 1 [BIGINT NOT NULL] {n}

Configured 1 branches.
Filling branches...
1 rows fetched.
1 rows filled into tree.
5 kB written to disk.
File closed.
Execution time: 0.064065s (64.1 ms/row)
--------------------------------------------------------------

Now one branch was setup successfully (indicated by the +) because it of a basic (numerical) type (indicated by the n).

All columns of a numerical type (n) get converted to a branch of type double in the root-file. This keeps reading the root-file simple. All SQL columns of types DATE, DATETIME and TIME are converted to UNIX time and also written as a double-branch.

To have another way to skip columns, all variables defined in a query are skipped:

fact@ihp-pc45:~/FACT++/build> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5"
[...]
------------------------ Rootify SQL -------------------------
[...]
Trying to setup 3 branches...

 - Hello World! [VARCHAR NOT NULL] {V}
 + 1 [BIGINT NOT NULL] {n}
 - @a:=5 [BIGINT NOT NULL] {@}

1 branches skipped due to name starting with @.
Configured 1 branches.
[...]
--------------------------------------------------------------

Some general root file options

The output file name can be changed with --out and the tree name with --tree.

fact@ihp-pc45:~/FACT++/build> rootifysql -f -q "SELECT 1" --out newfile.root --tree MyTree
[...]
Opening file 'newfile.root' [compression=1]...
Writing data to tree 'MyTree'
[...]

To update en existing file use --update.

fact@ihp-pc45:~/FACT++/build> rootifysql --update -q "SELECT 1" --out newfile.root --tree MyNewTree
[...]
Opening file 'newfile.root' [compression=1]...
Writing data to tree 'MyTree'
[...]

Note that this will always create a new tree in an existing file. You can not update an existing tree.

The compression can be changed with -c (short for --compression)

fact@ihp-pc45:~/FACT++/build> rootifysql -c 9--query="SELECT 1"
[...]
Opening file 'rootifysql.root' [compression=9]...
[...]

Changing the branch name

This can be circumvented by giving it a name

fact@ihp-pc45:~/FACT++/build> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5 AS MyName"
[...]
------------------------ Rootify SQL -------------------------
[...]
Trying to setup 3 branches...

 - Hello World! [VARCHAR NOT NULL] {V}
 + 1 [BIGINT NOT NULL] {n}
 + MyName [BIGINT NOT NULL] {n}

Configured 2 branches.
[...]
--------------------------------------------------------------

Usually the name of the root-branch coincides with the name of the column, but it can easily be overwritten in the query with the AS directive as illustrated above.

Ignoring columns

Now assuming you want to create a very general query but then skip some column in the root-file (usually, it is more efficient to just don't query them, because they are transferred from the server nevertheless). In this case you can do:

user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5 AS MyName" --ignore MyName
[...]
------------------------ Rootify SQL -------------------------
[...]
Trying to setup 3 branches...

 - Hello World! [VARCHAR NOT NULL] {V}
 + 1 [BIGINT NOT NULL] {n}
 - MyName [BIGINT NOT NULL] {-}

1 branches skipped due to ignore list.
Configured 1 branches.
[...]
--------------------------------------------------------------

The --ignore option can be specified more than once and it takes regular expression. So --ignore M.* would skip all columns starting with My.

Ignoring NULL

Datanse tables can contain field without an entry (or with NULL as an entry). A root-branch is not able to represent no entry. Therefore, rows which contain any field with NULL are skipped as a default.

user@machine> rootifysql -n -d -q "SELECT NULL AS Col0, 1 AS Col1, 2 AS Col2" --ignore Col0
[...]
------------------------ Rootify SQL -------------------------
[...]
Trying to setup 3 branches...
1 branches skipped due to ignore list.
Configured 2 branches.
Filling branches...

# Col0 Col1 Col2


1 rows fetched.
1 rows skipped due to NULL field.
0 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.0586302s (58.6 ms/row)
--------------------------------------------------------------

Note that this is even true for columns which contain NULL in a field which is only requested but not written to the file later on as you can see in the example above. Skipping rows with NULL fields can be turned off by the -i (short for --ignore-null) option.

user@machine> rootifysql -n -d -i -q "SELECT NULL AS Col0, 1 AS Col1, 2 AS Col2" 
[...]
------------------------ Rootify SQL -------------------------
[...]
Opening file '/dev/null' [compression=1]...
Writing data to tree 'Result'
Trying to setup 3 branches...
Configured 3 branches.
Filling branches...

# Col0 Col1 Col2
NULL    1       2


1 rows fetched.
1 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.0561399s (56.1 ms/row)
--------------------------------------------------------------

All fields containing NULL will be written with a 0-value.

User Variables

MySQL supports user variables which can be used in a query. They can be set using the --var.* option, for example

user@machine> rootifysql -n -d -v 3 -q "SELECT @myvar" --var.myvar=42
[...]
------------------------ Rootify SQL -------------------------
[...]
Setting user defined variables...

SET
   @myvar:=42

Requesting data...

SELECT @myvar

Opening file '/dev/null' [compression=1]...
Writing data to tree 'Result'
Trying to setup 1 branches...

 - @myvar [BIGINT NULL] {@}

1 branches skipped due to name starting with @.
Configured 0 branches.
Filling branches...

# @myvar
42


1 rows fetched.
1 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.0547669s (54.8 ms/row)
--------------------------------------------------------------

The -var.* option can be specified more than once for different variables. This shows another feature. Setting the verbosity level to 3 prints the user queries (not the internal ones).

Environment variables

Instead of using SQL user variables, a replacement can be done. Instead of --var.*, this is done with --env.*. Replaced are $ENV and ${ENV}. Note that the replacement is done linear, so recursive replacement is not foreseen. Here is an example:

user@machine> rootifysql -n -d -v 3 -q "SELECT \$myvar1, \${myvar2}, '\$myvar3'" --env.myvar1=1 --env.myvar2=2 --env.myvar3=3 --env.myvar3=4 
[...]
------------------------ Rootify SQL -------------------------
[...]
Requesting data...

SELECT 1, 2, '3, 4'

Opening file '/dev/null' [compression=1]...
Writing data to tree 'Result'
Trying to setup 3 branches...

 + 1 [BIGINT NOT NULL] {n}
 + 2 [BIGINT NOT NULL] {n}
 - 3, 4 [VARCHAR NOT NULL] {V}

Configured 2 branches.
Filling branches...

# 1 2 3, 4
1       2       3, 4


1 rows fetched.
1 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.055017s (55.0 ms/row)
--------------------------------------------------------------

This example also illustrates that if one environment variable is specified more than once, a comma-separated list is created. This is especially useful together with the SQL IN statement. Please be aware that if you specify a query on the command line, you have to escape the $ to \$, otherwise they will be replaced already by the shell with the shell's environment variables.

An alternative to specifying environment variables on the command line is to read them from a file with --list.*. Each line (except empty lines and lines starting with #) are considered a single value. Let's assume the file input.txt contains the following

# This is input.txt
42
21
10.5

you can do

user@machine> rootifysql -n -d -v 3 -q "SELECT \$myvar" --list.myvar=input.txt 
[...]
------------------------ Rootify SQL -------------------------
Found 3 list element(s) for ${myvar}
[...]
Requesting data...

SELECT 42, 21, 10.5

Opening file '/dev/null' [compression=1]...
Writing data to tree 'Result'
Trying to setup 3 branches...

 + 42 [BIGINT NOT NULL] {n}
 + 21 [BIGINT NOT NULL] {n}
 + 10.5 [DECIMAL NOT NULL] {n}

Configured 3 branches.
Filling branches...

# 42 21 10.5
42      21      10.5


1 rows fetched.
1 rows filled into tree.
0 B written to disk.
File closed.
Execution time: 0.0551879s (55.2 ms/row)
--------------------------------------------------------------

Note that --env.* and --list.* can be mixed.

Explain and profiling

The --explain option request the server to explain (EXPLAIN) the query. The result is displayed as JSON and as standard data. In addition the returned optimized query is displayed. Some simple formatting is done for an easier readability. This is preliminary. Please report any problems

The --profiling option turns on profiling (SET PROFILING=1) and request the profile (SHOW PROFILE) of the main query afterwards and shows it on the console.

Returns

In case of success, rootifysql returns 0, a value>0 otherwise.

Analysis with root

Here is a short example how to access the data from the root file, assuming a filename rootify.root, the default tree name Result and two branches with name Col0 and Col1

void example()
{
    // Create chain for the tree Result
    // This is just easier than using TFile/TTree
    TChain c("Result");

    // Add the input file to the
    c.AddFile("rootify.root");

    // Assume you want to convert Col1 from RAD to DEG
    c.SetAlias("col1deg", "Col1*TMath::RadToDeg()");

    // Plot Col0 if Col1>PI
    new TCanvas;
    c.Draw("Col0", "Col1>TMath::Pi()");

    // Plot Col1 in degree versus Col0
    new TCanvas;
    c.Draw("Col0:col1deg", "");
}

or

void example()
{
    // Create chain for the tree Result
    // This is just easier than using TFile/TTree
    TChain c("Result");

    // Add the input file to the
    c.AddFile("rootify.root");

    // All columns are stores as doubles
    double col0, col1;

    // Connect the variables to the coresponding leaves
    c.SetBranchAddress("Col0", &col0);
    c.SetBranchAddress("Col1", &col1);

    for (int i=0; i<c.GetEntries(); i++)
    {
        // read the i-th event from the file
        c.GetEntry(i);

        cout << col0 << " " << col1 << endl;
    }
}

Assuming the code is written to example.C, both are executed with root example.C. Instead of Draw() also Select() can be used to print the data to the console.

For details, please refer either to the Draw-function of TTree (e.g. https://root.cern.ch/doc/master/classTTree.html#a73450649dc6e54b5b94516c468523e45) or checkout the root user manual (https://root.cern.ch/guides/users-guide), in particular the section about Trees.