Version 21 (modified by 6 years ago) ( diff ) | ,
---|
Connecting to the Database
Host
The database is hosted at ihp-pc45.ethz.ch
User
First, you need a user. For the moment, a user 'fact' is available with the standard password. The user 'fact' is allowed to connect from everywhere if an encrypted connection is used. Usually, all reasonably recent mysql clients are using encrypted connections. So in most cases, a simple should be enough
> mysql -C -h ihp-pc45.ethz.ch -u fact -p factdata
To enforce encryption, --ssl (oder clients) or --ssl-mode=REQUIRED can be used. If you have problems with the connection, you can also try --protocol=TCP.
If you access the database from outside of ETH, it is wise to enable compression with the -C option. Inside ETH (in particular on ihp-pc45), enabling -C is certainly a performance drawback and should be avoided.
Note that the mysql client libraries at ISDC are too old and do not allow for encrypted connections. Thus no connection from ISDC is possible without tunnel. How to tunnel your connection is explained in the following. Note that it requires an account on ihp-pc45 (which I think should not be generally available). Thus this is mainly meant as a solution for automatic processes running at ISDC, for example, to update the database.
Forward Tunnel
If you are logged in at ISDC as 'user' and you have an account 'ethz' at ihp-pc45, you can use a tunnel. To setup a tunnel use
ISDC> ssh -x -C -n -N -q -L 10000:localhost:3306 ethz@ihp-pc45.ethz.ch
(It is wise to enable compression of the connection with the -C option)
Note that after log-in this process seems to stall (nothing happens anymore). This is correct. The tunnel is open. It will forward the local port 10000 from the ISDC machine to the port 3306 on a machine which is accessible as 'locahost' from ihp-pc45.
The mysql call would now look like
> mysql -h 127.0.0.1 -P 10000 -u fact -p factdata
Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.
As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.
Backward Tunnel
Assume that you are already logged into ihp-pc45.ethz.ch and want to execute a mysql at ISDC accessing ihp-pc45, a backward tunnel can be used:
ihp-pc45> ssh -x -C -n -N -q -R 10000:localhost:3306 user@isdc-nx.isdc.unige.ch
(It is wise to enable compression of the connection with the -C option)
This command will log you into isdc-nx and (in parallel) create a tunnel from port 10000 at isdc-nx to port 3306 of a machine which is called 'localhost' from where you started the ssh connection (ihp-pc45).
You can now do
ISDC> mysql -h localhost -P 10000 -u fact -p factdata
Note that you need to use the IP address instead of localhost, otherwise the mysql client tries to use a socket connection (which will fail). You could also use --protocol=TCP.
As the mysql connection now comes via the loopback interface and not via the external IP, the connection of the mysql client is allowed to be unencrypted.
rootifysql
A convenient way to retrieve data is the rootifysql tool which is part of the FACT++ package. More details can be found either calling it with the --help option or at https://www.fact-project.org/logbook/showthread.php?tid=4192. The same access rules as if the native mysql client is used apply.
To simplify the usage, it is wise to write a rootifysql.rc file with the following contents:
uri=fact:password@iph-pc45.ethz.ch/factdata
The following tutorial assumes that such a file exists.
Other alternatives
Many possibilities exist to access a mysql database as a C API, MySQL++, Python (MySQL.Connector), PHP and others. You are free to use whatever tool you like. In the following, an analysis will be outlined using the rootifysql client and because it is most convenient.
PhpMyAdmin
To get a fast glimpse on the accessible databases and tables, you can log-in to PhpMyAdmin at http://iph-pc45.ethz.ch/phpmyadmin
The Analyis
Data Selection
For data selection only run-wise information should be relevant which are stored in the table RunInfo. The reason is that if you select data on a more fine grained level (e.g. event-wise zenith angle), right now there is no easy method to determine the corresponding observation time. So whenever data is selected event-wise make sure that you do not cut the data in a variable which cuts out events systematically and not randomly. For example, an event-wise cut in zenith angel usually keeps or discards two consecutive events because their zenith angle is correlated. For a cut in any image parameter (Width, Length, Size, ...), the result on two consecutive events is random because their image parameters are not correlated.
As an example we analyse the Crab data from our public sample (01/11/2013 - 06/11/2013).
Let's first have a look at the total observation time of all Crab data in this period:
SELECT COUNT(*), SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn/3600) AS EffOnTime, MIN(fZenithDistanceMin) AS MinZd, MAX(fZenithDistanceMax) AS MaxZd, MIN(fR750Cor/fR750Ref) AS MinQ, MAX(fR750Cor/fR750Ref) AS MaxQ FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000
The result (in mysql) is
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | MinZd | MaxZd | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 435 | 32.53992354 | 6.36 | 67.89 | 0.01477 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.01 sec)
So we have 435 data runs from Crab with a total effective observation time of 32.5 hours in a zenith angle range between 6° and 68° and a bad weather factor between 0.01 (really bad) to 1.1 (extremely good).
Taking only good data by adding "AND fR750Cor/fR750Ref>0.9
" to the WHERE-clause gives us
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 328 | 24.58955887 | 6.36 | 67.86 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec)
But we also want to restrict ourselves to "good" zenith angles (zenith angles at which there is no significnat efficiency loss). So we add "AND fZenithDistanceMax<35
" to the WHERE-clause which yields
+----------+-------------+-------+-------+---------+---------+ | COUNT(*) | EffOnTime | ZdMin | ZdMax | MinQ | MaxQ | +----------+-------------+-------+-------+---------+---------+ | 244 | 19.06608557 | 6.36 | 34.90 | 0.90084 | 1.10584 | +----------+-------------+-------+-------+---------+---------+ 1 row in set (0.00 sec)
Now we need to get a list of these runs with
SELECT FileId FROM RunInfo WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fR750Ref/fR750Cor>0.9 AND fZenithDistanceMax<35
This can later be JOINed with the following queries.
Let's write the list of runs into a file. There are plenty of options. Here are a few (assuming the query is in a file query.sql)
mysql -N [...] < query.sql > Crab.txt cat query.sql | mysql -N [...] > Crab.txt rootifysql [...] -n -v0 -d query.sql > Crab.txt rootifysql [...] -n -w Crab.txt query.sql
The [...] is placeholder for additional options, in particular the login credentials (ideally they are kept in a file which can not be read by everyone).
An alternative is to put
#!/path/to/rootifysql --config=/path/to/resources.rc
in the first line of your query.sql. Make it executable "chmod u+x query.sql
" and put your credentials (uri=) into resources.rc. Now you can call it directly
./query.sql -n -w Crab.txt
Data statistics
Now we want to get some statistics about the Crab data between 01/11/2013 and 06/11/2013 and see if we can do simple plots. For this we write the table into a root file.
rootifysql [...] --query " \ SELECT \ * \ FROM \ RunInfo \ WHERE \ fSourceKey=5 \ AND \ fRunTypeKey=1 \ AND \ FileId BETWEEN 131101000 AND 131107000 \ AND \ fR750Ref/fR750Cor>0.9 \ AND \ fZenithDistanceMax<35 \ "
If the file already exists, either give it a different name (see --help for details) or overwrite it with --force.
The output should look similar to this
Reading global options from 'fact++.rc'. Reading default options from 'rootifysql.rc'. ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows skipped due to NULL field. 0 rows filled into tree. 10 kB written to disk. File closed. Execution time: 0.0537751s (169.6 us/row) --------------------------------------------------------------
Per default rows which contain any NULL are not written to the file because all values are converted to a DOUBLE and there is no representation for a NULL-value in double. So, we need to force the output with --ignore-null
and will get something like:
Reading global options from 'fact++.rc'. Reading default options from 'rootifysql.rc'. ------------------------ Rootify SQL ------------------------- Connecting to database... Client Version: 5.7.23 Server Version: 5.7.23-0ubuntu0.18.04.1 Requesting data... Opening file 'rootify.root' [compression=1]... Trying to setup 120 branches... Configured 115 branches. Filling branches... 317 rows fetched. 317 rows filled into tree. 86 kB written to disk. File closed. Execution time: 0.072247s (227.9 us/row) --------------------------------------------------------------
Now we can open the file in root and do plots. The easiest ist to use the tree viewer:
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->StartViewer(); root [3]
Or plot the zenith angle distribution directly:
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean"); root [3]
or zenith angle versus time (Hint: DATETIME columns are converted to Unix-time in seconds):
root rootify.root root [0] Attaching file rootify.root as _file0... root [1] TTree *T = (TTree*)_file0->Get("Result"); root [2] T->Draw("fZenithDistanceMean:fRunStart"); root [3]
Data retrieval
The events themselves are stored in a table named Events. The position of the source in camera coordinates is stored in Position. To get them, you can run the following query
SELECT Events.*, Position.X, Position.Y FROM RunInfo LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Ref/fR750Cor>0.9
or with the list you wrote before
SELECT Events.*, Position.X, Position.Y LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) WHERE FileId IN ($MyList)
using --list.MyList=Crab.txt
as command-line option to rootifysql. Both queries are similar in execution time.
Let's assume the output file is crab-data-only.root (rootifysql --out=crab-data-only.root
). Requesting the data and writing the file took me about 60s.
To run an analysis on the data you can use the following root macro "ganymed.C". It produces a theta-square plot. Its execution took about 5s (root ganymed.C++
)
#include <iostream> #include <TMath.h> #include <TH1.h> #include <TChain.h> #include <TStopwatch.h> void ganymed() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab-data-only.root"); // Define variables for all leaves to be accessed // By definition rootifysql writes only doubles double X, Y, MeanX, MeanY, Width, Length, CosDelta, SinDelta, M3Long, SlopeLong, Leakage1, SlopeSpreadWeighted, Size, ConcCore, ConcCOG, NumIslands, NumUsedPixels; // Connect the variables to the cordesponding leaves c.SetBranchAddress("X", &X); c.SetBranchAddress("Y", &Y); c.SetBranchAddress("MeanX", &MeanX); c.SetBranchAddress("MeanY", &MeanY); c.SetBranchAddress("Width", &Width); c.SetBranchAddress("Length", &Length); c.SetBranchAddress("CosDelta", &CosDelta); c.SetBranchAddress("SinDelta", &SinDelta); c.SetBranchAddress("M3Long", &M3Long); c.SetBranchAddress("SlopeLong", &SlopeLong); c.SetBranchAddress("Leakage1", &Leakage1); c.SetBranchAddress("NumIslands", &NumIslands); c.SetBranchAddress("NumUsedPixels", &NumUsedPixels); c.SetBranchAddress("SlopeSpreadWeighted", &SlopeSpreadWeighted); c.SetBranchAddress("Size", &Size); c.SetBranchAddress("ConcCOG", &ConcCOG); c.SetBranchAddress("ConcCore", &ConcCore); // Set some constants (they could be included in the database // in the future) double mm2deg = +0.0117193246260285378; double abberation = 1.02; // -------------------- Source dependent parameter calculation ------------------- // Create a histogram for on- and off-data TH1F hon("on", "", 55, 0, 1); TH1F hoff("off", "", 55, 0, 1); // Loop over all events TStopwatch clock; for (int i=0; i<c.GetEntries(); i++) { // read the i-th event from the file c.GetEntry(i); // First calculate all cuts to speedup the analysis double area = TMath::Pi()*Width*Length; bool cutq = NumIslands<3.5 && NumUsedPixels>5.5 && Leakage1<0.1; bool cut0 = log10(area)<log10(Size)*898-1535; if (!cutq || !cut0) continue; // Loop over all wobble positions in the camera for (int angle=0; angle<360; angle+=60) { // -------------------- Source dependent parameter calculation ------------------- double cr = cos(angle*TMath::DegToRad()); double sr = sin(angle*TMath::DegToRad()); double px = cr*X-sr*Y; double py = cr*Y+sr*X; double dx = MeanX - px/abberation; double dy = MeanY - py/abberation; double dist = sqrt(dx*dx + dy*dy); double alpha = asin((CosDelta*dy - SinDelta*dx)/dist); double sgn = TMath::Sign(1., (CosDelta*dx + SinDelta*dy)/dist); // ------------------------------- Application ---------------------------------- double m3l = M3Long*sgn*mm2deg; double slope = SlopeLong*sgn/mm2deg; // --------------------------------- Analysis ----------------------------------- double xi = 1.39252 + 0.154247*slope + 1.67972 *(1-1/(1+4.86232*Leakage1)); double sign1 = m3l+0.07; double sign2 = (dist*mm2deg-0.5)*7.2-slope; double disp = (sign1<0 || sign2<0 ? -xi : xi)*(1-Width/Length)/mm2deg; double thetasq = (disp*disp + dist*dist - 2*disp*dist*cos(alpha))*mm2deg*mm2deg; // Fill the on- and off-histograms if (angle==0) hon.Fill(thetasq); else hoff.Fill(thetasq, 1./5); } } clock.Print(); // Plot the result hon.SetMinimum(0); hon.DrawCopy(); hoff.DrawCopy("same"); }
You can of course include all the calculations into your query already
SELECT Events.*, Angle, weight, @PX := cosa*X - sina*Y, @PY := cosa*Y + sina*X, @DX := MeanX-@PX/1.02, @DY := MeanY-@PY/1.02, @Norm := SQRT(@DX*@DX + @DY*@DY), @Dist := @Norm*0.0117193246260285378 AS Dist, PI()*Width*Length*0.0117193246260285378*0.0117193246260285378 AS Area, @LX := TRUNCATE((CosDelta*@DY - SinDelta*@DX)/@Norm, 6), @LY := TRUNCATE((CosDelta*@DX + SinDelta*@DY)/@Norm, 6), @Alpha := ASIN(@LX) AS Alpha, @Sign := SIGN(@LY) AS Sign, @M3L := M3Long*@Sign*0.0117193246260285378, @Slope := SlopeLong*@Sign/0.0117193246260285378 AS Slope, @Xi := 1.39 + 0.154*@Slope + 1.679*(1-1/(1+4.86*Leakage1)), @Sign1 := @M3L+0.07, @Sign2 := (@Dist-0.5)*7.2-@Slope, @Disp := IF (SIGN(@Sign1)<0 || SIGN(@Sign2)<0, -@Xi, @Xi) * (1-Width/Length), @ThetaSq := (@Disp*@Disp + @Dist*@Dist - 2*@Disp*@Dist*SQRT(1-@LX*@LX)) AS ThetaSq FROM RunInfo LEFT JOIN Events USING (FileId) LEFT JOIN Position USING (FileId, EvtNumber) CROSS JOIN ( SELECT 0 AS Angle UNION ALL SELECT 60 AS Angle UNION ALL SELECT 120 AS Angle UNION ALL SELECT 180 AS Angle UNION ALL SELECT 240 AS Angle UNION ALL SELECT 300 AS Angle ) Wobble WHERE fSourceKey=5 AND fRunTypeKey=1 AND FileId BETWEEN 131101000 AND 131107000 AND fZenithDistanceMax<35 AND fR750Ref/fR750Cor>0.9
Or you use the existing table for the standard 60° Wobble positions and do just CROSS JOIN Wobble
.
This will give you all you need in crab.root (rootifysql --out=crab.root
), but significantly increases computing time and the output file will be about six times larger.
A simple macro just applying all the cuts would then be enough to do a theta-square plot
void ganymed3() { // Create chain for the tree Result // This is just easier than using TFile/TTree TChain c("Result"); // Add the input file to the c.AddFile("crab.root"); // Set some constants (they could be included in the database // in the future) c.SetAlias("mm2deg", "+0.0117193246260285378"); // Define the cuts c.SetAlias("CutQ", "NumIslands<3.5 && NumUsedPixels>5.5 && Leakage1<0.1"); c.SetAlias("Cut0", "log10(Area)<log10(Size)*898-1535"); // Do one plot for each wobble position c.Draw("ThetaSq", "(ThetaSq<1 && CutQ && Cut0 && Angle==0)*( weight)"); c.Draw("ThetaSq", "(ThetaSq<1 && CutQ && Cut0 && Angle!=0)*(-weight)", "same"); }