wiki:DatabaseBasedAnalysis

Version 16 (modified by tbretz, 6 years ago) ( diff )

--

Connecting to the Database

Host

The database is hosted at ihp-pc45.ethz.ch

User

First, you need a user. For the moment, a user 'fact' is available with the standard password. The user 'fact' is allowed to connect from everywhere if an encrypted connection is used. Usually, all reasonably recent mysql clients are using encrypted connections. So in most cases, a simple should be enough

> mysql -C -h ihp-pc45.ethz.ch -u fact -p factdata

To enforce encryption, --ssl (oder clients) or --ssl-mode=REQUIRED can be used. If you have problems with the connection, you can also try --protocol=TCP.

If you access the database from outside of ETH, it is wise to enable compression with the -C option. Inside ETH (in particular on ihp-pc45), enabling -C is certainly a performance drawback and should be avoided.

Note that the mysql client libraries at ISDC are too old and do not allow for encrypted connections. Thus no connection from ISDC is possible without tunnel. How to tunnel your connection is explained in the following. Note that it requires an account on ihp-pc45 (which I think should not be generally available). Thus this is mainly meant as a solution for automatic processes running at ISDC, for example, to update the database.

Forward Tunnel

If you are logged in at ISDC as 'user' and you have an account 'ethz' at ihp-pc45, you can use a tunnel. To setup a tunnel use

ISDC> ssh -x -C -n -N -q -L 10000:localhost:3306 ethz@ihp-pc45.ethz.ch

(It is wise to enable compression of the connection with the -C option)

Note that after log-in this process seems to stall (nothing happens anymore). This is correct. The tunnel is open. It will forward the local port 10000 from the ISDC machine to the port 3306 on a machine which is accessible as 'locahost' from ihp-pc45.

The mysql call would now look like

> mysql -h 127.0.0.1 -P 10000 -u fact -p factdata

Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.

As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.

Backward Tunnel

Assume that you are already logged into ihp-pc45.ethz.ch and want to execute a mysql at ISDC accessing ihp-pc45, a backward tunnel can be used:

ihp-pc45> ssh -x -C -n -N -q -R 10000:localhost:3306 user@isdc-nx.isdc.unige.ch

(It is wise to enable compression of the connection with the -C option)

This command will log you into isdc-nx and (in parallel) create a tunnel from port 10000 at isdc-nx to port 3306 of a machine which is called 'localhost' from where you started the ssh connection (ihp-pc45).

You can now do

ISDC> mysql -h localhost -P 10000 -u fact -p factdata

Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.

As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.

rootifysql

A convenient way to retrieve data is the rootifysql tool which is part of the FACT++ package. More details can be found either calling it with the --help option or at https://www.fact-project.org/logbook/showthread.php?tid=4192. The same access rules as if the native mysql client is used apply.

To simplify the usage, it is wise to write a rootifysql.rc file with the following contents:

uri=fact:password@iph-pc45.ethz.ch/factdata

The following tutorial assumes that such a file exists.

Other alternatives

Many possibilities exist to access a mysql database as a C API, MySQL++, Python (MySQL.Connector), PHP and others. You are free to use whatever tool you like. In the following, an analysis will be outlined using the rootifysql client and because it is most convenient.

PhpMyAdmin

To get a fast glimpse on the accessible databases and tables, you can log-in to PhpMyAdmin at http://iph-pc45.ethz.ch/phpmyadmin

The Analyis

Data Selection

For data selection only run-wise information should be relevant which are stored in the table RunInfo. The reason is that if you select data on a more fine grained level (e.g. event-wise zenith angle), right now there is no easy method to determine the corresponding observation time. So whenever data is selected event-wise make sure that you do not cut the data in a variable which cuts out events systematically and not randomly. For example, an event-wise cut in zenith angel usually keeps or discards two consecutive events because their zenith angle is correlated. For a cut in any image parameter (Width, Length, Size, ...), the result on two consecutive events is random because their image parameters are not correlated.

As an example we analyse the Crab data from our public sample (01/11/2013 - 06/11/2013).

Let's first have a look at the total observation time of all Crab data in this period:

SELECT 
   COUNT(*), 
   SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn/3600) AS EffOnTime, 
   MIN(fZenithDistanceMin) AS MinZd, 
   MAX(fZenithDistanceMax) AS MaxZd,
   MIN(fR750Cor/fR750Ref)  AS MinQ,
   MAX(fR750Cor/fR750Ref)  AS MaxQ
FROM 
   RunInfo 
WHERE 
   fSourceKey=5 
AND 
   fRunTypeKey=1 
AND 
   FileId BETWEEN 131101000 AND 131107000

The result (in mysql) is

+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | MinZd | MaxZd | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      435 | 32.53992354 |  6.36 | 67.89 | 0.01477 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.01 sec)

So we have 435 data runs from Crab with a total effective observation time of 32.5 hours in a zenith angle range between 6° and 68° and a bad weather factor between 0.01 (really bad) to 1.1 (extremely good).

Taking only good data by adding "AND fR750Cor/fR750Ref>0.9" to the WHERE-clause gives us

+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | ZdMin | ZdMax | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      328 | 24.58955887 |  6.36 | 67.86 | 0.90084 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.00 sec)

But we also want to restrict ourselves to "good" zenith angles (zenith angles at which there is no significnat efficiency loss). So we add "AND fZenithDistanceMax<35" to the WHERE-clause which yields

+----------+-------------+-------+-------+---------+---------+
| COUNT(*) | EffOnTime   | ZdMin | ZdMax | MinQ    | MaxQ    |
+----------+-------------+-------+-------+---------+---------+
|      244 | 19.06608557 |  6.36 | 34.90 | 0.90084 | 1.10584 |
+----------+-------------+-------+-------+---------+---------+
1 row in set (0.00 sec)

Now we need to get a list of these runs with

SELECT 
   FileId
FROM 
   RunInfo 
WHERE 
   fSourceKey=5 
AND 
   fRunTypeKey=1 
AND 
   FileId BETWEEN 131101000 AND 131107000
AND 
   fR750Ref/fR750Cor>0.9
AND 
   fZenithDistanceMax<35

This can later be JOINed with the following queries.

Let's write the list of runs into a file. There are plenty of options. Here are a few (assuming the query is in a file query.sql)

mysql -N [...] < query.sql > Crab.txt
cat query.sql | mysql -N [...] > Crab.txt
rootifysql [...] -n -v0 -d query.sql > Crab.txt
rootifysql [...] -n -w Crab.txt query.sql

The [...] is placeholder for additional options, in particular the login credentials (ideally they are kept in a file which can not be read by everyone).

An alternative is to put

#!/path/to/rootifysql --config=/path/to/resources.rc

in the first line of your query.sql. Make it executable "chmod u+x query.sql" and put your credentials (uri=) into resources.rc. Now you can call it directly

./query.sql -n -w Crab.txt

Data statistics

Now we want to get some statistics about the Crab data between 01/11/2013 and 06/11/2013 and see if we can do simple plots. For this we write the table into a root file.

rootifysql [...] --query " \
   SELECT               \
      *                 \
   FROM                 \
      RunInfo           \
   WHERE                \
      fSourceKey=5      \
   AND                  \
      fRunTypeKey=1     \
   AND                  \
      FileId BETWEEN 131101000 AND 131107000 \
   AND                  \
      fR750Ref/fR750Cor>0.9 \
   AND                  \
      fZenithDistanceMax<35 \
"

If the file already exists, either give it a different name (see --help for details) or overwrite it with --force.

The output should look similar to this

Reading global  options from 'fact++.rc'.
Reading default options from 'rootifysql.rc'.

------------------------ Rootify SQL -------------------------
Connecting to database...
Client Version: 5.7.23
Server Version: 5.7.23-0ubuntu0.18.04.1
Requesting data...
Opening file 'rootify.root' [compression=1]...
Trying to setup 120 branches...
Configured 115 branches.
Filling branches...
317 rows fetched.
317 rows skipped due to NULL field.
0 rows filled into tree.
10 kB written to disk.
File closed.
Execution time: 0.0537751s (169.6 us/row)
--------------------------------------------------------------

Per default rows which contain any NULL are not written to the file because all values are converted to a DOUBLE and there is no representation for a NULL-value in double. So, we need to force the output with --ignore-null and will get something like:

Reading global  options from 'fact++.rc'.
Reading default options from 'rootifysql.rc'.

------------------------ Rootify SQL -------------------------
Connecting to database...
Client Version: 5.7.23
Server Version: 5.7.23-0ubuntu0.18.04.1
Requesting data...
Opening file 'rootify.root' [compression=1]...
Trying to setup 120 branches...
Configured 115 branches.
Filling branches...
317 rows fetched.
317 rows filled into tree.
86 kB written to disk.
File closed.
Execution time: 0.072247s (227.9 us/row)
--------------------------------------------------------------

Now we can open the file in root and do plots. The easiest ist to use the tree viewer:

root rootify.root
root [0] 
Attaching file rootify.root as _file0...
root [1] TTree *T = (TTree*)_file0->Get("Result");
root [2] T->StartViewer();
root [3]

Or plot the zenith angle distribution directly:

root rootify.root
root [0] 
Attaching file rootify.root as _file0...
root [1] TTree *T = (TTree*)_file0->Get("Result");
root [2] T->Draw("fZenithDistanceMean");
root [3]

or zenith angle versus time (Hint: DATETIME columns are converted to Unix-time in seconds):

root rootify.root
root [0] 
Attaching file rootify.root as _file0...
root [1] TTree *T = (TTree*)_file0->Get("Result");
root [2] T->Draw("fZenithDistanceMean:fRunStart");
root [3]

Data retrieval

The events themselves are stored in a table named Events

Some statistics:

Simple Analysis

Note: See TracWiki for help on using the wiki.