source: fact/tools/pyscripts/sandbox/dneise/fact_compress/notes.txt

Last change on this file was 14206, checked in by neise, 12 years ago
initial
File size: 3.4 KB
uncompressed FITS file:

human readable ASCII header.
80 character lines, no end-of-line character

a line which looks like:
END <space chars fill up to 80>

marks the end of the header, but after it
I still see more empty lines...

funnily enough, the first non-space characters
start at address: 0x00004ec0 = 20160 = 7 x 2880 bytes (about 19.7 kB)
(this was for file 20120117_016.fits.gz)
so it seems that after the 'END' line the header is padded with
spaces, so that the data starts at the next multiple of 2880 bytes,
which is the FITS block size (36 lines of 80 characters).
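
The padding rule can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it looks at a single header only, while a real file may contain more than one header (each padded in the same way).

```python
FITS_BLOCK = 2880  # bytes: 36 header lines of 80 characters
LINE = 80          # length of one header line


def padded_size(n_bytes, block=FITS_BLOCK):
    """Round n_bytes up to the next multiple of the block size."""
    return -(-n_bytes // block) * block


def data_start(buf, start=0):
    """Offset of the first byte after the padded header that begins at start.

    Walks the header in 80-byte lines until the 'END' line is found.
    """
    for pos in range(start, len(buf), LINE):
        if buf[pos:pos + LINE].rstrip() == b"END":
            return start + padded_size(pos + LINE - start)
    raise ValueError("no END line found")
```

For example, a header of 250 lines followed by the END line is 20080 bytes long and is padded up to 20160 bytes.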

an event with ROI 1024 has the size:
2952510 bytes
this is made up of the following fields:
event_num      4 byte
Trigger_num    4 byte
trigger_type   2 byte
numboards      4 byte
errors         4 x 1 byte
softtrig       4 byte
unixtime       2 x 4 byte
boardtime      40 x 4 byte
startcells     1440 x 2 byte
startcellTIM   160 x 2 byte
data           1474560 x 2 byte = 1440 x 1024 x 2 byte
timeMarker     0 x 2 byte ????

the sum is 2952510 bytes, so the timeMarker field is really zero bytes.
the event header (all fields except data) is 3390 bytes, while the data size is 2880 kB
so the event header is about 1.15 permille of the data.

in case of roi = 300 this is about 4 permille.
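
The arithmetic above can be reproduced like this. The field names are the informal ones from the list, not necessarily the real column names in the file, and the function names are only for illustration.

```python
NPIX = 1440  # number of pixels


def event_layout(roi):
    """(name, number of items, bytes per item) for each field in the list above."""
    return [
        ("event_num", 1, 4),
        ("Trigger_num", 1, 4),
        ("trigger_type", 1, 2),
        ("numboards", 1, 4),
        ("errors", 4, 1),
        ("softtrig", 1, 4),
        ("unixtime", 2, 4),
        ("boardtime", 40, 4),
        ("startcells", NPIX, 2),
        ("startcellTIM", 160, 2),
        ("data", NPIX * roi, 2),
        ("timeMarker", 0, 2),
    ]


def sizes(roi):
    """Return (total, header, data) sizes of one event in bytes."""
    layout = event_layout(roi)
    total = sum(n * b for _, n, b in layout)
    data = sum(n * b for name, n, b in layout if name == "data")
    return total, total - data, data


def header_permille(roi):
    """Event header size relative to the data size, in permille."""
    _, header, data = sizes(roi)
    return 1000.0 * header / data
```

sizes(1024) gives (2952510, 3390, 2949120), and header_permille(300) comes out at about 3.9.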

For the example of a pedestal file: all uncompressed
header information is about 3329.5 kB for
a file size of 2.75 GB.

all the header information should be copied uncompressed
into the output file.
**BUT** since the compression algo delivers
events of unequal size, the size of each event needs to be stored
as well, so that we can still jump around in the file.

We have two choices:
1
we store a table of event addresses in the header of the file.
A file might store something like 1000000 events
and a jump address is 4 bytes long, so this table
might be as large as 4 MB or so.
2
we prepend each event with the address of its successor.
This means in order to find event 10045 one has to read 10044 events first :-(
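
A rough sketch of both choices, to make the trade-off concrete. Everything here is an assumption for illustration: the function names, the little-endian unsigned 4-byte address format, and the toy event payloads.

```python
import struct


def build_offset_table(event_sizes, first_offset=0):
    """Choice 1: the start address of every event, computed from the event sizes."""
    offsets, pos = [], first_offset
    for size in event_sizes:
        offsets.append(pos)
        pos += size
    return offsets


def pack_table(offsets):
    """Serialise the table as 4-byte little-endian addresses (assumed format)."""
    return struct.pack("<%dI" % len(offsets), *offsets)


def chain_events(payloads):
    """Choice 2: prepend each event with the 4-byte address of its successor."""
    out = bytearray()
    for payload in payloads:
        successor = len(out) + 4 + len(payload)
        out += struct.pack("<I", successor) + payload
    return bytes(out)


def find_by_chain(buf, index):
    """Follow the successor addresses from the first event up to event `index`."""
    pos = 0
    for _ in range(index):
        (pos,) = struct.unpack_from("<I", buf, pos)
    return pos
```

With the table, event n is found by one lookup, offsets[n]. With the chain, find_by_chain has to do n reads, one per preceding event, which is the drawback noted above.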


Now the compression algorithm should work like this:
open a calibration file and retrieve only the mean offset
data. convert back to ADC counts.
size of offset = 1440 x 1024 x 2 byte = 2880 kB

subtract offset from raw data.

calculate (signed) diffs of subsequent slices.
analyse diffs:
 find groups of |diffs| < 127 (these fit into 8 bits)
 find groups of |diffs| < 31 (these fit into 6 bits)
define which groups are ideal :-)
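
The first steps could look roughly like this for the slices of one pixel. The function names are made up. The limits for 6 and 8 bits are the ones given above (|diff| < 31 and |diff| < 127); the limits for 4 and 16 bits follow the same pattern and are an assumption.

```python
def subtract_offset(raw, offset):
    """Subtract the mean offset (in ADC counts) slice by slice."""
    return [r - o for r, o in zip(raw, offset)]


def diffs(samples):
    """Signed differences of subsequent slices."""
    return [b - a for a, b in zip(samples, samples[1:])]


def needed_bits(d):
    """Smallest of the widths 4, 6, 8, 16 bits whose limit the diff d stays below.

    Limit per width: |d| < 2**(bits - 1) - 1, i.e. 7, 31, 127, 32767.
    """
    for bits in (4, 6, 8, 16):
        if abs(d) < (1 << (bits - 1)) - 1:
            return bits
    raise ValueError("diff %d does not fit into 16 bits" % d)
```

Finding the groups then means looking for long runs of equal (or smaller) needed_bits values; how to choose the runs is the part that still has to be designed.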

store compression group header:
start value: 2 byte
size of followers = 4 bits, 6 bits, 8 bits, or 16 bits. --> 1 byte
number of followers = 0...1023 --> 2 bytes
(so the group header is 5 bytes long)

The last follower should maybe be the start value
of the next group, so we can check for sanity.
but on the other hand, this is just stupid.

shift diff bits together.

copy diff bits of followers behind the group header.
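
A sketch of how one group could be written and read back. The layout details are assumptions, not decided in these notes: little-endian header, signed 2-byte start value, the bit width itself stored in the size byte, diffs stored as two's complement, most significant bits first, last byte padded with zero bits.

```python
import struct

GROUP_HEADER = "<hBH"  # start value (2 byte), follower size (1 byte), follower count (2 byte)


def pack_group(start, followers, bits):
    """Pack a 5-byte group header followed by the diffs shifted together."""
    header = struct.pack(GROUP_HEADER, start, bits, len(followers))
    mask = (1 << bits) - 1
    acc = 0
    for d in followers:
        acc = (acc << bits) | (d & mask)  # two's complement in `bits` bits
    nbits = bits * len(followers)
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits  # pad the last byte with zero bits
    return header + acc.to_bytes(nbytes, "big")


def unpack_group(buf):
    """Inverse of pack_group: return (start value, list of diffs)."""
    start, bits, n = struct.unpack_from(GROUP_HEADER, buf, 0)
    nbits = bits * n
    nbytes = (nbits + 7) // 8
    acc = int.from_bytes(buf[5:5 + nbytes], "big") >> (nbytes * 8 - nbits)
    mask = (1 << bits) - 1
    out = []
    for i in range(n):
        v = (acc >> (bits * (n - 1 - i))) & mask
        if v >= 1 << (bits - 1):  # restore the sign
            v -= 1 << bits
        out.append(v)
    return start, out
```

For example, pack_group(0, [1] * 600, 6) is 5 + 450 = 455 bytes long.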

The grouping algo should be smart, so that there are not
a lot of small groups. There will always be some
spikes or jumps in the data, so we can't expect to get by
with only one group.

I guess it will look like this
5-byte header: 16bit group 5 followers
10 byte following.
5-byte header: 6bit group 600 followers
450 byte following.
5-byte header: 16bit group 3 followers -- spike
6 byte following.
5-byte header: 8bit group 80 followers
80 byte following.
5-byte header: 16bit group 7 followers -- jump
14 byte following.
5-byte header: 6bit group 323 followers
243 byte following.

total size: 833 bytes for the 1024 slices of one pixel
(6 start values + 1018 followers; 2048 bytes uncompressed).
this is 41% of the uncompressed size.
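
The numbers of this guessed example can be recomputed as follows; the group list is just the one written out above, and the names are for illustration.

```python
HEADER_BYTES = 5  # 2 byte start value + 1 byte follower size + 2 byte follower count


def group_size(bits, followers):
    """Bytes needed for one group: header plus the packed diffs, rounded up."""
    return HEADER_BYTES + (bits * followers + 7) // 8


# (follower size in bits, number of followers) for the six groups above
groups = [(16, 5), (6, 600), (16, 3), (8, 80), (16, 7), (6, 323)]

total = sum(group_size(bits, n) for bits, n in groups)
samples = sum(n + 1 for _, n in groups)  # followers plus one start value per group
ratio = total / (samples * 2.0)          # uncompressed: 2 bytes per sample
```

This gives total = 833 bytes for samples = 1024 slices, i.e. a ratio of about 0.41.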

thus one could end at 1.2 GB compared to 1.7 GB,

but this has to be checked of course, and therefore
the grouping algo needs to be implemented.

shit