rabin \- rabin fingerprinting
.EX
include "rabin.m";
rabin := load Rabin Rabin->PATH;
Rcfg, Rfile: import rabin;

init: fn(bufio: Bufio);
open: fn(rcfg: ref Rcfg, b: ref Iobuf, min, max: int): (ref Rfile, string);

Rcfg: adt {
	mk: fn(prime, width, mod: int): (ref Rcfg, string);
};

Rfile: adt {
	read: fn(r: self ref Rfile): (array of byte, big, string);
};
.EE
.B Rabin
implements a data fingerprinting algorithm.
A rolling checksum is calculated while reading data.
Certain checksum values are taken to be data boundaries and are used to split the data into chunks.
.PP
.B Rcfg
represents the parameters to the algorithm;
.B Rcfg.mk
creates a new instance or returns an error string.
.I Prime
should be a prime number.
.I Width
is the width of the rolling checksum window in bytes.
A wider window results in more diverse boundary patterns.
A window of 30 bytes should be reasonable for most uses.
.I Mod
effectively sets the desired mean chunk size.
The rolling checksum is calculated modulo
.IR mod .
All three parameters influence where chunk boundaries will be found.
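.PP
The interaction between the three parameters can be illustrated with a small stand-alone sketch of a rolling checksum.
It is not the code in
.BR rabin.b :
the module name
.BR Rollsketch ,
the constants, and the rule that a boundary occurs when the checksum equals
.IR mod \-1
are all assumptions made for the example.
If the checksum values are roughly evenly distributed, one position in every
.I mod
matches, which is why
.I mod
acts as the mean chunk size.
.EX
implement Rollsketch;

include "sys.m";
	sys: Sys;
include "draw.m";

Rollsketch: module {
	init:	fn(nil: ref Draw->Context, nil: list of string);
};

# Illustrative values only; none of them are taken from rabin.b.
Prime:	con 269;
Width:	con 4;
Mod:	con 64;

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	data := array of byte "the quick brown fox jumps over the lazy dog";

	# pw = Prime**Width mod Mod, the weight of the byte leaving the window
	pw := 1;
	for(i := 0; i < Width; i++)
		pw = pw*Prime % Mod;

	sum := 0;
	for(i := 0; i < len data; i++){
		# byte that falls out of the window; zero until the window is full
		out := 0;
		if(i >= Width)
			out = int data[i-Width];
		sum = (sum*Prime + int data[i] - pw*out) % Mod;
		if(sum < 0)
			sum += Mod;
		# assumed boundary rule: one checksum value out of Mod
		if(sum == Mod-1)
			sys->print("boundary after offset %d\en", i);
	}
}
.EE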
.PP
.B Rfile
represents a file to read chunks from.
.B Open
returns an initialised Rfile or an error string.
.I Min
and
.I max
are the minimum and maximum size in bytes of chunks that will be returned.
Only the last chunk in a file can be smaller than the minimum chunk size.
Note that enforcing these limits may shift the actual mean chunk size away from the value set by
.IR mod .
Data is read from
.B Iobuf
.IR b .
.B Rfile.read
returns successive chunks of data together with the file offset at which each was found, or an error message.
After end of file, the returned chunks are zero bytes long.
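.PP
As an illustration of how the calls fit together, the following is a minimal sketch of a program that prints the offset and length of each chunk of a file.
The module name
.BR Chunks ,
the helper
.BR fail ,
and the values passed to
.B Rcfg.mk
and
.B open
are assumptions chosen for the example, not values recommended by the implementation.
.EX
implement Chunks;

include "sys.m";
	sys: Sys;
include "draw.m";
include "bufio.m";
	bufio: Bufio;
	Iobuf: import bufio;
include "rabin.m";
	rabin: Rabin;
	Rcfg, Rfile: import rabin;

Chunks: module {
	init:	fn(nil: ref Draw->Context, args: list of string);
};

init(nil: ref Draw->Context, args: list of string)
{
	sys = load Sys Sys->PATH;
	bufio = load Bufio Bufio->PATH;
	rabin = load Rabin Rabin->PATH;
	rabin->init(bufio);

	if(len args != 2)
		fail("usage: chunks file");
	b := bufio->open(hd tl args, Bufio->OREAD);
	if(b == nil)
		fail(sys->sprint("open: %r"));

	# prime 269, 30-byte window, mean chunk size of about 8 kbytes (assumed values)
	(rcfg, err) := Rcfg.mk(269, 30, 8*1024);
	if(err != nil)
		fail("rcfg: "+err);

	# chunks between 1 and 64 kbytes (assumed limits)
	rfile: ref Rfile;
	(rfile, err) = rabin->open(rcfg, b, 1024, 64*1024);
	if(err != nil)
		fail("open: "+err);

	for(;;){
		(d, off, rerr) := rfile.read();
		if(rerr != nil)
			fail("read: "+rerr);
		if(len d == 0)
			break;	# zero-length chunk: end of file
		sys->print("offset %bd length %d\en", off, len d);
	}
}

fail(s: string)
{
	sys->fprint(sys->fildes(2), "%s\en", s);
	raise "fail:"+s;
}
.EE
The loop stops at the first zero-length chunk, since that is how end of file is reported.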
.B /appl/lib/rabin.b