Version 16 (modified by 6 years ago) ( diff ) | ,
---|
Table of Contents
First get the man page of rootifysql
either with rootifysql --help
or (if installed) with man rootifysql
.
Purpose
The idea of rootifysql
is to conveniently request data from a MySQL database and write the contents returned into a tree in a root-file. Root trees offer a lot of nice features how to access the data fast and how to create plots easily.
For details, please refer either to the Draw-function of TTree (e.g. https://root.cern.ch/doc/master/classTTree.html#a73450649dc6e54b5b94516c468523e45) or checkout the root user manual (https://root.cern.ch/guides/users-guide), in particular the section about Trees. A mini-tutorial is given at the end of the page.
In the following basic knowledge on C++, root
and root trees is assumed.
Connection
How a connection to the database is established is outlined in: DatabaseBasedAnalysis/Connection. Per default, your connection will be compressed if the the host-address of the mysql-server is neither localhost nor 127.0.0.1. To force compression use
uri fact:password@ihp-pc45.ethz.ch/factdata?compress=1
Generally, compression should be enable if connecting from the world. To test compression, the "Hello World! " example can be used.
In the following it is assumed that the database connection is working and tested, so that the resources (uri=...
) are omitted.
Hello World!
To test if everything is working correctly, we run a "Hello World! " example. Credentials should be in the rootifysql.rc which should be write protected.
user@machine> rootifysql -d -n --query="SELECT 'Hello World\!'" Reading global options from 'fact++.rc'. Reading default options from 'rootifysql.rc'. ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file '/dev/null' [compression=1]... Writing data to tree 'Result' Trying to setup 1 branches... Configured 0 branches. Filling branches... # Hello World! Hello World! 1 rows fetched. 1 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.058136s (58.1 ms/row) --------------------------------------------------------------
Voila!
The -d
option (a shortcut for --display
) prints the result on standard out, the -n
option (a shortcut to --null
) redirects all output to /dev/null
instead to a file. For benchmarking your query, you can also add --no-fill
which does not only write to a root tree at /dev/null
but does not fill the root tree at all. -q
is a shortcut for --query
Query in the file
Instead of typing your query on the command line, you can store in a text-file, for example
user@machine> echo "SELECT 'Hello World\!'" > query.sql user@machine> rootifysql -d -n query.sql [...] ------------------------ Rootify SQL ------------------------- Reading query from file 'query.sql'. [...] # Hello World! Hello World! [...] --------------------------------------------------------------
A nice feature of bash is that you can make the query.sql file executable. First add the executable in the first line preceded by #!
. It might then look like:
#!/path/to/rootifysql SELECT 'Hello World!'
Now make it executable for yourself
user@machine> chmod u+x query.sql
and you can call your query like
user@machine> ./query.sql [options]
with the [options] from rootifysql
. If you prefix your query file with
#!/path/to/rootifysql -C /path/to/resources.rc
you can even have a dedicated resource file.
Connection information
To get more information on the connection to the database, use --print-connection
, for example
user@machine> user@machine> rootifysql -d -n --print-connection query.sql [...] ------------------------ Rootify SQL ------------------------- [...] Compression of database connection is ON Connection to databases is ENCRYPTED (DHE-RSA-AES256-SHA) [...] -------------------------------------------------------------- Bytes_sent 398 Bytes_received 546
Note that this data is retrieved through additional requests, which in principle slow down the execution time and add additional traffic to the connection.
Writing to an ASCII file
We can redirect the result of the query to an ASCII file by
rootifysql -v 0 -d -n query.sql > output.txt
A verbosity level of 0 (-v 0
) is required to turn off all the other output to standard out.
If you want to keep it you can directly write to a file by
rootifysql -n query.sql -w output.txt
Here -w
is a shortcut to --write
. The default delimiter is a tab (\t
). It can be changed with -d
(short for --delimiter
)
Writing to a root-file
Basics
Let's try our previous example
user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!'" [...] ------------------------ Rootify SQL ------------------------- [...] Opening file 'rootify.root' [compression=1]... Writing data to tree 'Result' Trying to setup 1 branches... - Hello World! [VARCHAR NOT NULL] {V} Configured 0 branches. Filling branches... 1 rows fetched. 1 rows filled into tree. 4 kB written to disk. File closed. Execution time: 0.0775619s (77.6 ms/row) --------------------------------------------------------------
The option -f
(which is a shortcut for --force
) actually overwrites an existing root file (RECREATE). The verbosity option -v 2
prints some information on how the table columns are translated. As the output suggests, 1 branch could have been setup (one column was returned), but no branch was configured. Our "Hello World''" string is returned as a VARCHAR (or CHAR), but only values are supported. The leading -
tells us that the column was skipped, the V
is an internal abbreviation for the action (and C
for CHAR
). So second try:
user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1" [...] ------------------------ Rootify SQL ------------------------- [...] Opening file 'rootify.root' [compression=1]... Writing data to tree 'Result' Trying to setup 2 branches... - Hello World! [VARCHAR NOT NULL] {V} + 1 [BIGINT NOT NULL] {n} Configured 1 branches. Filling branches... 1 rows fetched. 1 rows filled into tree. 5 kB written to disk. File closed. Execution time: 0.064065s (64.1 ms/row) --------------------------------------------------------------
Now one branch was setup successfully (indicated by the +
) because it of a basic (numerical) type (indicated by the n
).
All columns of a numerical type (n
) get converted to a branch of type double in the root-file. This keeps reading the root-file simple. All SQL columns of types DATE
, DATETIME
and TIME
are converted to UNIX time and also written as a double-branch.
To have another way to skip columns, all variables defined in a query are skipped:
fact@ihp-pc45:~/FACT++/build> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5" [...] ------------------------ Rootify SQL ------------------------- [...] Trying to setup 3 branches... - Hello World! [VARCHAR NOT NULL] {V} + 1 [BIGINT NOT NULL] {n} - @a:=5 [BIGINT NOT NULL] {@} 1 branches skipped due to name starting with @. Configured 1 branches. [...] --------------------------------------------------------------
Some general root file options
The output file name can be changed with --out
and the tree name with --tree
.
fact@ihp-pc45:~/FACT++/build> rootifysql -f -q "SELECT 1" --out newfile.root --tree MyTree [...] Opening file 'newfile.root' [compression=1]... Writing data to tree 'MyTree' [...]
To update en existing file use --update
.
fact@ihp-pc45:~/FACT++/build> rootifysql --update -q "SELECT 1" --out newfile.root --tree MyNewTree [...] Opening file 'newfile.root' [compression=1]... Writing data to tree 'MyTree' [...]
Note that this will always create a new tree in an existing file. You can not update an existing tree.
The compression can be changed with -c
(short for --compression
)
fact@ihp-pc45:~/FACT++/build> rootifysql -c 9--query="SELECT 1" [...] Opening file 'rootifysql.root' [compression=9]... [...]
Changing the branch name
This can be circumvented by giving it a name
fact@ihp-pc45:~/FACT++/build> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5 AS MyName" [...] ------------------------ Rootify SQL ------------------------- [...] Trying to setup 3 branches... - Hello World! [VARCHAR NOT NULL] {V} + 1 [BIGINT NOT NULL] {n} + MyName [BIGINT NOT NULL] {n} Configured 2 branches. [...] --------------------------------------------------------------
Usually the name of the root-branch coincides with the name of the column, but it can easily be overwritten in the query with the AS
directive as illustrated above.
Ignoring columns
Now assuming you want to create a very general query but then skip some column in the root-file (usually, it is more efficient to just don't query them, because they are transferred from the server nevertheless). In this case you can do:
user@machine> rootifysql -f -v 2 -q "SELECT 'Hello World\!', 1, @a:=5 AS MyName" --ignore MyName [...] ------------------------ Rootify SQL ------------------------- [...] Trying to setup 3 branches... - Hello World! [VARCHAR NOT NULL] {V} + 1 [BIGINT NOT NULL] {n} - MyName [BIGINT NOT NULL] {-} 1 branches skipped due to ignore list. Configured 1 branches. [...] --------------------------------------------------------------
The --ignore
option can be specified more than once and it takes regular expression. So --ignore M.*
would skip all columns starting with My.
Ignoring NULL
Datanse tables can contain field without an entry (or with NULL as an entry). A root-branch is not able to represent no entry. Therefore, rows which contain any field with NULL are skipped as a default.
user@machine> rootifysql -n -d -q "SELECT NULL AS Col0, 1 AS Col1, 2 AS Col2" --ignore Col0 [...] ------------------------ Rootify SQL ------------------------- [...] Trying to setup 3 branches... 1 branches skipped due to ignore list. Configured 2 branches. Filling branches... # Col0 Col1 Col2 1 rows fetched. 1 rows skipped due to NULL field. 0 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.0586302s (58.6 ms/row) --------------------------------------------------------------
Note that this is even true for columns which contain NULL in a field which is only requested but not written to the file later on as you can see in the example above. Skipping rows with NULL fields can be turned off by the -i
(short for --ignore-null
) option.
user@machine> rootifysql -n -d -i -q "SELECT NULL AS Col0, 1 AS Col1, 2 AS Col2" [...] ------------------------ Rootify SQL ------------------------- [...] Opening file '/dev/null' [compression=1]... Writing data to tree 'Result' Trying to setup 3 branches... Configured 3 branches. Filling branches... # Col0 Col1 Col2 NULL 1 2 1 rows fetched. 1 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.0561399s (56.1 ms/row) --------------------------------------------------------------
All fields containing NULL will be written with a 0-value.
User Variables
MySQL supports user variables which can be used in a query. They can be set using the --var.*
option, for example
user@machine> rootifysql -n -d -v 3 -q "SELECT @myvar" --var.myvar=42 [...] ------------------------ Rootify SQL ------------------------- [...] Setting user defined variables... SET @myvar:=42 Requesting data... SELECT @myvar Opening file '/dev/null' [compression=1]... Writing data to tree 'Result' Trying to setup 1 branches... - @myvar [BIGINT NULL] {@} 1 branches skipped due to name starting with @. Configured 0 branches. Filling branches... # @myvar 42 1 rows fetched. 1 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.0547669s (54.8 ms/row) --------------------------------------------------------------
The -var.*
option can be specified more than once for different variables. This shows another feature. Setting the verbosity level to 3 prints the user queries (not the internal ones).
Environment variables
Instead of using SQL user variables, a replacement can be done. Instead of --var.*
, this is done with --env.*
. Replaced are $ENV and ${ENV}. Note that the replacement is done linear, so recursive replacement is not foreseen. Here is an example:
user@machine> rootifysql -n -d -v 3 -q "SELECT \$myvar1, \${myvar2}, '\$myvar3'" --env.myvar1=1 --env.myvar2=2 --env.myvar3=3 --env.myvar3=4 [...] ------------------------ Rootify SQL ------------------------- [...] Requesting data... SELECT 1, 2, '3, 4' Opening file '/dev/null' [compression=1]... Writing data to tree 'Result' Trying to setup 3 branches... + 1 [BIGINT NOT NULL] {n} + 2 [BIGINT NOT NULL] {n} - 3, 4 [VARCHAR NOT NULL] {V} Configured 2 branches. Filling branches... # 1 2 3, 4 1 2 3, 4 1 rows fetched. 1 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.055017s (55.0 ms/row) --------------------------------------------------------------
This example also illustrates that if one environment variable is specified more than once, a comma-separated list is created. This is especially useful together with the SQL IN
statement. Please be aware that if you specify a query on the command line, you have to escape the $
to \$
, otherwise they will be replaced already by the shell with the shell's environment variables.
An alternative to specifying environment variables on the command line is to read them from a file with --list.*
. Each line (except empty lines and lines starting with #
) are considered a single value. Let's assume the file input.txt contains the following
# This is input.txt 42 21 10.5
you can do
user@machine> rootifysql -n -d -v 3 -q "SELECT \$myvar" --list.myvar=input.txt [...] ------------------------ Rootify SQL ------------------------- Found 3 list element(s) for ${myvar} [...] Requesting data... SELECT 42, 21, 10.5 Opening file '/dev/null' [compression=1]... Writing data to tree 'Result' Trying to setup 3 branches... + 42 [BIGINT NOT NULL] {n} + 21 [BIGINT NOT NULL] {n} + 10.5 [DECIMAL NOT NULL] {n} Configured 3 branches. Filling branches... # 42 21 10.5 42 21 10.5 1 rows fetched. 1 rows filled into tree. 0 B written to disk. File closed. Execution time: 0.0551879s (55.2 ms/row) --------------------------------------------------------------
Note that --env.*
and --list.*
can be mixed.
Explain and profiling
The --explain
option request the server to explain (EXPLAIN
) the query. The result is displayed as JSON and as standard data. In addition the returned optimized query is displayed. Some simple formatting is done for an easier readability. This is preliminary. Please report any problems
The --profiling
option turns on profiling (SET PROFILING=1
) and request the profile (SHOW PROFILE
) of the main query afterwards and shows it on the console.
Returns
In case of success, rootifysql
returns 0, a value>0 otherwise.
Analysis with root
Here is a short example how to access the data from the root file, assuming a filename rootify.root, the default tree name Result and two branches with name Col0 and Col1
void example() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("rootify.root"); // Assume you want to convert Col1 from RAD to DEG c.SetAlias("col1deg", "Col1*TMath::RadToDeg()"); // Plot Col0 if Col1>PI new TCanvas; c.Draw("Col0", "Col1>TMath::Pi()"); // Plot Col1 in degree versus Col0 new TCanvas; c.Draw("Col0:col1deg", ""); }
or
void example() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("rootify.root"); // All columns are stores as doubles double col0, col1; // Connect the variables to the coresponding leaves c.SetBranchAddress("Col0", &col0); c.SetBranchAddress("Col1", &col1); for (int i=0; i<c.GetEntries(); i++) { // read the i-th event from the file c.GetEntry(i); cout << col0 << " " << col1 << endl; } }
Assuming the code is written to example.C, both are executed with root example.C
. Instead of Draw()
also Select()
can be used to print the data to the console.
For details, please refer either to the Draw-function of TTree (e.g. https://root.cern.ch/doc/master/classTTree.html#a73450649dc6e54b5b94516c468523e45) or checkout the root user manual (https://root.cern.ch/guides/users-guide), in particular the section about Trees.