Index: fact/tools/pyscripts/sandbox/dneise/fact_compress/basic.py
===================================================================
--- fact/tools/pyscripts/sandbox/dneise/fact_compress/basic.py	(revision 14206)
+++ fact/tools/pyscripts/sandbox/dneise/fact_compress/basic.py	(revision 14206)
@@ -0,0 +1,36 @@
+#!/usr/bin/python -tti
+#
+# 
+
+import sys
+import numpy as np
+from pyfact import RawData
+from ROOT import TFile, TH2F
+
+data_filename = '/media/DomsStick/20120223_205.fits.gz'
+calib_filename = '/media/DomsStick/20120223_206.drs.fits.gz'
+
+run = RawData(data_filename, calib_filename, use_CalFactFits=False, do_calibration=False)
+offset = run.blm / (2000. / 4096.)  # baseline mean converted back to raw ADC counts (2000 mV / 4096 counts)
+offset = offset.astype(int)
+roi = run.nroi    # region of interest: number of slices per pixel
+npix = run.npix   # number of pixels
+
+rootfile = TFile('test.root', "RECREATE")
+h = TH2F('h', 'diffs', npix, -0.5, npix-0.5, 2401, -1200.5, 1200.5)
+
+for event in run:
+    index = event['event_id'].value
+    print index, '/', run.nevents
+    data = event['data']
+    
+    cal_data = data.copy()
+    
+    for pixel in range(npix):
+        sc = event['start_cells'][pixel]               # DRS start cell for this pixel
+        cal_data[pixel,:] -= offset[pixel, sc:sc+roi]  # subtract baseline, aligned to the start cell
+        for d in np.diff(cal_data[pixel,:]):           # slice-to-slice differences
+            h.Fill(pixel, d)
+
+h.Write()
+rootfile.Close()
Index: fact/tools/pyscripts/sandbox/dneise/fact_compress/notes.txt
===================================================================
--- fact/tools/pyscripts/sandbox/dneise/fact_compress/notes.txt	(revision 14206)
+++ fact/tools/pyscripts/sandbox/dneise/fact_compress/notes.txt	(revision 14206)
@@ -0,0 +1,118 @@
+uncompressed FITS file:
+
+human-readable ASCII header,
+written as 80-character cards - no end-of-line characters.
+
+A card that looks like:
+END   <space chars fill up to 80>
+
+marks the end of the header, but after it
+I still see more blank space...
+
+Funnily enough, the first non-space characters
+start at address 0x00004ec0 = 20160 bytes = 7 x 2880 bytes
+(this was for file 20120117_016.fits.gz).
+So it seems that after the 'END    ' card the header is padded with
+spaces: FITS headers are written in 2880-byte logical blocks, and the
+data starts at the next block boundary.
+
+an event with ROI 1024 has the size:
+2952510 bytes
+this is made up of the following fields:
+event_num          4 byte
+Trigger_num        4 byte
+trigger_type       2 byte
+numboards          4 byte
+errors         4 x 1 byte
+softtrig           4 byte
+unixtime       2 x 4 byte
+boardtime     40 x 4 byte
+startcells   1440 x 2 byte
+startcellTIM  160 x 2 byte
+data      1474560 x 2 byte = 1440 x 1024 x 2 byte
+timeMarker      0 x 2 byte ????
+
+the sum is 2952510 bytes, so the timeMarker field really is zero bytes.
+the per-event header is 3390 bytes, while the data block is 2880 kB,
+so the header is about 1.15 permille of the data.
+
+in case of ROI = 300 this ratio is about 4 permille.
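The sums above can be reproduced directly from the field list (ROI = 1024, 1440 pixels); field names are as listed, grouped into a dict for the arithmetic:

```python
roi, npix = 1024, 1440

# per-event field sizes in bytes, as listed above
fields = {
    "event_num":    4,
    "Trigger_num":  4,
    "trigger_type": 2,
    "numboards":    4,
    "errors":       4 * 1,
    "softtrig":     4,
    "unixtime":     2 * 4,
    "boardtime":    40 * 4,
    "startcells":   npix * 2,
    "startcellTIM": 160 * 2,
    "data":         npix * roi * 2,
}

event_size  = sum(fields.values())         # 2952510 bytes
header_size = event_size - fields["data"]  # 3390 bytes
ratio = 1000.0 * header_size / fields["data"]  # ~1.15 permille
```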
+
+For the example of a pedestal file: all the uncompressed
+header information adds up to about 3329.5 kB for
+a file size of 2.75 GB.
+
+all the header information should be copied uncompressed
+into the output file.
+**BUT** since the compression algorithm delivers
+events of unequal size, the size of each event needs to be stored
+as well, so that we can still jump around in the file.
+
+We have two choices:
+
+1) We store a table of event addresses in the file header.
+   A file might store something like 1,000,000 events,
+   and a jump address is 4 bytes long, so this table
+   might be as large as 4 MB or so.
+
+2) We prepend each event with the address of its successor.
+   This means that in order to find event 10045 one has to read
+   (at least the headers of) the 10044 events before it :-(
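Choice 1 could be sketched like this (the function name is mine): since every event's size is known when it is written, the absolute address of each event is just a running sum, and reading event i is then a single seek.

```python
def build_offset_table(event_sizes, data_start):
    """Choice 1: absolute file address of each compressed event.

    event_sizes -- compressed size in bytes of each event, in order
    data_start  -- file offset where the first event begins
    """
    table, addr = [], data_start
    for size in event_sizes:
        table.append(addr)
        addr += size
    return table

# to read event i: seek(table[i]) -- no walking the chain as in choice 2
```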
+
+
+Now the compression algorithm should work like this:
+open a calibration file and retrieve only the mean offset
+data, converted back to ADC counts.
+size of the offset data = 1440 x 1024 x 2 byte = 2880 kB
+
+subtract offset from raw data.
+
+calculate (signed) diffs of subsequent slices.
+analyse diffs:
+    find groups of |diffs| < 127
+    find groups of |diffs| < 31
+define which groups are ideal :-)
+
+store a compression group header:
+start value: 2 byte
+bit width of followers: 4, 6, 8, or 16 bits --> 1 byte
+number of followers: 0...1023 --> 2 byte
+
+The last follower could maybe be the start value
+of the next group, so we can check for sanity,
+but on the other hand, this is just redundant.
+
+shift diff bits together.
+
+copy diff bits of followers behind the group header.
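The header layout and the "shift diff bits together" step might look like this. A sketch only: the note does not fix byte order or sign encoding, so little-endian header fields and two's-complement diffs are my assumptions.

```python
import struct

def pack_group(start, diffs, width):
    """One compressed group: 2-byte start value, 1-byte follower
    bit width, 2-byte follower count, then the diffs packed
    'width' bits each, padded up to whole bytes."""
    out = bytearray(struct.pack("<hBH", start, width, len(diffs)))
    acc = nbits = 0
    for d in diffs:
        # two's-complement representation of d in 'width' bits
        acc = (acc << width) | (d & ((1 << width) - 1))
        nbits += width
        while nbits >= 8:  # drain whole bytes from the accumulator
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:  # flush the last partial byte, left-aligned
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

With this layout a 6-bit group of 600 followers occupies exactly 5 + 450 bytes, as in the size estimate below.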
+
+The grouping algorithm should be smart, so that there are not
+a lot of small groups. There will always be some
+spikes or jumps in the data, so we can't assume to get away
+with only one group.
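One deliberately simple grouping policy, as a baseline to compare smarter ones against (the note leaves the real policy open, so this is only an assumption): start a new group whenever the required bit width of the next diff changes, then price each group as a 5-byte header plus follower bits rounded up to bytes.

```python
import numpy as np

WIDTHS = (4, 6, 8, 16)  # allowed follower bit widths

def bits_needed(d):
    """Smallest allowed signed width that can represent diff d."""
    for w in WIDTHS:
        if -(1 << (w - 1)) <= d < (1 << (w - 1)):
            return w
    raise ValueError("diff out of 16-bit range: %d" % d)

def group_diffs(samples):
    """Naive policy: group consecutive diffs of equal required width."""
    groups = []  # list of (width, n_followers)
    cur_w, cur_n = None, 0
    for d in np.diff(np.asarray(samples, dtype=np.int32)):
        w = bits_needed(int(d))
        if w == cur_w:
            cur_n += 1
        else:
            if cur_w is not None:
                groups.append((cur_w, cur_n))
            cur_w, cur_n = w, 1
    if cur_w is not None:
        groups.append((cur_w, cur_n))
    return groups

def compressed_size(groups, header_bytes=5):
    """5-byte group header plus follower bits rounded up to bytes."""
    return sum(header_bytes + (w * n + 7) // 8 for w, n in groups)
```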
+
+I guess it will look like this:
+5-byte header: 16-bit group,   5 followers -->  10 bytes following
+5-byte header:  6-bit group, 600 followers --> 450 bytes following
+5-byte header: 16-bit group,   3 followers -->   6 bytes following  -- spike
+5-byte header:  8-bit group,  80 followers -->  80 bytes following
+5-byte header: 16-bit group,   7 followers -->  14 bytes following  -- jump
+5-byte header:  6-bit group, 323 followers --> 243 bytes following
+
+total event size: 833 bytes per pixel,
+which is about 41% of the uncompressed pixel data (1024 x 2 byte = 2048 bytes).
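The arithmetic of this example checks out: the six start values plus 1018 followers account for all 1024 slices, and the byte counts sum to 833.

```python
# (width_bits, n_followers) for the six groups in the example above
groups = [(16, 5), (6, 600), (16, 3), (8, 80), (16, 7), (6, 323)]

followers = sum(n for _, n in groups)
slices = followers + len(groups)  # each group also stores one start value
total = sum(5 + (w * n + 7) // 8 for w, n in groups)
ratio = float(total) / (1024 * 2)  # vs. uncompressed pixel data
```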
+
+thus one could end up at roughly 1.2GB compared to 1.7GB,
+
+but this has to be checked of course, and therefore
+the grouping algorithm needs to be implemented.
+
