# faultreiber
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fbloodstalker%2Ffaultreiber.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fbloodstalker%2Ffaultreiber?ref=badge_shield)
`faultreiber` generates a parser library in C for a structured (binary) file format. The input is an XML file that describes the format.<br/>
The C source code will be in the form of multiple source and header files.<br/>
The headers have header guards and are already `extern "C"`ed.<br/>
## Demo
For a practical example, look at the example XML file under `resources`. The XML file describes the format of a WASM object file.<br/>
To run the demo, run `run.sh`, then go to the `test` directory and run `make test` to run the executable.<br/>
To run the executable under `valgrind --leak-check=yes`, run `make valgrind`.<br/>
## Memory Leaks
Code generated by faultreiber should not leak any memory if everything went according to plan during code generation. If that's not the case, let me know.<br/>
## How to Use
A function named `read_aggr_{name}` will be generated that takes an `int _fd` file descriptor for the file that it will read. `{name}` is what you pass to faultreiber with the `--name` option.<br/>
The return type will be a C structure with type `{name}_lib_ret_t`. The struct is defined as:<br/>
```C
typedef struct {
  name_obj_t obj;
  void** void_train;
  uint64_t* current_void_size;
  uint64_t* current_void_count;
} name_lib_ret_t;
```
`{name}_obj_t` is a C structure defined in `aggregate.h` that holds all the read modules.<br/>
A function named `release_all_{name}` will be generated in `aggregate.c` that releases almost all the memory.<br/>
The proper order of releasing the memory in the client code is shown below, assuming the return value of `read_aggr_{name}` is stored in `lib_ret` and `--name` was passed the value `wasm`:<br/>
```C
release_all_wasm(lib_ret->void_train, lib_ret->current_void_count);
free(lib_ret->obj);
free(lib_ret);
```
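The fields `void_train`, `current_void_size` and `current_void_count` suggest that the generated reader records its allocations in a growable array of `void*` so that they can be freed in one pass. The sketch below illustrates that general pattern only. It is not the generated code, and the names `ptr_train_t`, `train_push` and `train_release` are hypothetical.<br/>
```C
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical registry of heap pointers; illustrates the pattern only. */
typedef struct {
  void **slots;   /* recorded allocations */
  uint64_t size;  /* capacity of slots */
  uint64_t count; /* number of slots in use */
} ptr_train_t;

/* Record ptr, growing the array by grow_factor when it is full.
   Returns 0 on success, -1 if the array could not be grown. */
int train_push(ptr_train_t *t, void *ptr, uint64_t grow_factor) {
  if (grow_factor < 2) grow_factor = 2; /* a factor below 2 would never grow */
  if (t->count == t->size) {
    uint64_t new_size = t->size ? t->size * grow_factor : 4;
    void **grown = realloc(t->slots, new_size * sizeof(void *));
    if (grown == NULL) return -1;
    t->slots = grown;
    t->size = new_size;
  }
  t->slots[t->count++] = ptr;
  return 0;
}

/* Free every recorded allocation, then the registry itself. */
void train_release(ptr_train_t *t) {
  for (uint64_t i = 0; i < t->count; ++i) free(t->slots[i]);
  free(t->slots);
  t->slots = NULL;
  t->size = 0;
  t->count = 0;
}
```
In this sketch the starting capacity and growth factor play the roles that `--voidbuffersize` and `--voidbuffgrowfactor` appear to play for the generated code; that mapping is an assumption.<br/>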
## faultreiber XML file
The root node should have two children, named exactly `READ` and `DEFINITION` (the order is not important).<br/>
The `READ` node will include the actual structures that the parser will read and can return.<br/>
The `DEFINITION` node includes the definitions for the structures that are aggregate.<br/>
## Rules:
Any child node of either `DEFINITION` or `READ` will have to at least have the attributes `name` and `type` defined. The presence of the attribute `count` is optional but if it's not present faultreiber will assume that the count is one.<br/>
The presence of the attribute `isaggregate` signifies that the data structure is composed of other smaller parts. faultreiber will only read the children of a node that is the child of either the `DEFINITION` or `READ` node (unless a child node has the attribute `conditional` set). If a data structure requires more children, add a new node under `DEFINITION` and reference that node from its parent. In other words, an aggregate node can't itself have child nodes that are aggregate.<br/>
`count`, `size`, `type` and `condition` attributes can reference a child node of the `DEFINITION` node. To do that, you should use `self::TAG`.<br/>
The tag names of the nodes that are on the same level should be unique. The `name` attributes of the nodes on the same level need to be unique as well.<br/>
The order of the nodes that appear as children of the `DEFINITION` node, even when the child nodes are referencing each other, is unimportant to faultreiber.<br/>
Tags should follow the naming convention for naming XML nodes. The `name` attributes should follow the C identifier naming convention (if the value of the `name` attribute is not a valid C identifier, the generated code will not build).<br/>
The following values are valid values for the `type` attribute:<br/>
* int8
* uint8
* int16
* uint16
* int32
* uint32
* int64
* uint64
* int128
* uint128
* float
* double
* string
* FT::conditional
* self::TAG
A string node must have either a non-empty `size` attribute or a `delimiter` attribute. When `delimiter` is used, its value is the delimiter itself.<br/>
Strings read through a `delimiter` node will have their delimiter attached to the end of the string(null-terminated or otherwise). String reads that have a `size` attribute will be forcefully null-terminated even if the original string was not null-terminated.<br/>
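The two string behaviours can be illustrated on an in-memory buffer. This is a minimal sketch of the semantics described above, not the code faultreiber emits; `read_sized_string` and `read_delimited_string` are hypothetical helpers, and the trailing `'\0'` added after the delimiter is an assumption made so the result is a valid C string.<br/>
```C
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Size-based read: copy exactly `size` bytes and always append '\0'. */
char *read_sized_string(const unsigned char *src, size_t size) {
  char *out = malloc(size + 1);
  if (out == NULL) return NULL;
  memcpy(out, src, size);
  out[size] = '\0';
  return out;
}

/* Delimiter-based read: copy up to and including the first `delim` byte.
   Returns NULL if the delimiter is not found within `len` bytes. */
char *read_delimited_string(const unsigned char *src, size_t len,
                            unsigned char delim) {
  const unsigned char *hit = memchr(src, delim, len);
  if (hit == NULL) return NULL;
  size_t n = (size_t)(hit - src) + 1; /* keep the delimiter attached */
  char *out = malloc(n + 1);
  if (out == NULL) return NULL;
  memcpy(out, src, n);
  out[n] = '\0'; /* assumption: terminate so callers can use str* functions */
  return out;
}
```
The caller owns both returned buffers and must `free` them.<br/>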
Child nodes of the `READ` node that have the `unordered` attribute set are treated as unordered, meaning they can appear in the file sporadically. Such nodes must have a child node with the attribute `sign`. The value of the `sign` attribute is used to check for the presence of the parent node in the file.<br/>
The `unorderedbegin` and `unorderedend` attributes denote the beginning and end of an unordered section in the `READ` node. For every unordered section, only one node needs to define the begin and end attributes. All nodes in the section, including the nodes that define the `unorderedbegin` and `unorderedend` attributes, shall have the `unordered` attribute defined.<br/>
Any child of the `READ` node that is not inside an unordered block or doesn't have the `unordered` attribute set will be regarded as ordered.<br/>
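The role of `sign` can be pictured as a dispatch loop: peek at a marker, decide which unordered node is present, then read it. The sketch below assumes a made-up layout of `[sign][length][payload]` with one-byte sign and length fields and two hypothetical sign values. It only illustrates the idea and is not the code faultreiber generates.<br/>
```C
#include <stddef.h>
#include <stdint.h>

enum { SIGN_TYPE = 1, SIGN_CODE = 10 }; /* hypothetical sign values */

typedef struct {
  int saw_type;
  int saw_code;
} seen_t;

/* Walk sections laid out as [sign][length][payload...] in any order.
   Returns 0 on success, -1 on an unknown sign or truncated input. */
int scan_unordered(const unsigned char *p, size_t len, seen_t *seen) {
  size_t pos = 0;
  seen->saw_type = 0;
  seen->saw_code = 0;
  while (pos < len) {
    if (len - pos < 2) return -1; /* need sign and length bytes */
    uint8_t sign = p[pos];
    size_t body = p[pos + 1];
    if (len - pos - 2 < body) return -1; /* payload is cut short */
    switch (sign) {
      case SIGN_TYPE: seen->saw_type = 1; break;
      case SIGN_CODE: seen->saw_code = 1; break;
      default: return -1; /* no node declares this sign */
    }
    pos += 2 + body; /* skip to the next section */
  }
  return 0;
}
```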
Whether `int128` or `uint128` are defined depends on the C implementation you are using on your host. If 128-bit integers are not supported or you need to read in bigger integers, you can simply use a smaller int type and increase the `count` attribute accordingly.<br/>
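For instance, a 128-bit little-endian field could be described as a `uint64` with `count` set to 2. The sketch below shows what the client then works with: two 64-bit limbs, least significant first. `read_u64_le` and `read_u128_as_limbs` are hypothetical helpers written for this illustration, not part of the generated API.<br/>
```C
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 64-bit value from 8 bytes,
   independent of the host's byte order. */
uint64_t read_u64_le(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

/* Treat a 16-byte little-endian integer as two uint64 limbs (count = 2).
   limbs[0] is the low half, limbs[1] is the high half. */
void read_u128_as_limbs(const unsigned char *p, uint64_t limbs[2]) {
  for (size_t i = 0; i < 2; ++i) limbs[i] = read_u64_le(p + 8 * i);
}
```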
The `FT::conditional` tag for a type means that the actual content of the node will depend on a value. The attribute `condition` will provide what that condition is. The value for the condition should be provided as text for the different nodes that define what the actual contents should be.<br/>
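In C terms, a conditional node behaves much like a tagged union: a previously read value selects which layout follows. The sketch below shows that idea with a hypothetical one-byte `kind` field and two made-up layouts. It is an illustration under those assumptions, not the code faultreiber emits.<br/>
```C
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node whose payload layout depends on `kind`. */
typedef struct {
  uint8_t kind;
  union {
    uint8_t small; /* used when kind == 0 */
    uint32_t wide; /* used when kind == 1 */
  } as;
} cond_node_t;

/* Parse `kind` followed by the layout it selects.
   Returns bytes consumed, or 0 for an unknown kind or short input. */
size_t read_cond_node(const unsigned char *p, size_t len, cond_node_t *out) {
  if (len < 1) return 0;
  out->kind = p[0];
  switch (out->kind) {
    case 0:
      if (len < 2) return 0;
      out->as.small = p[1];
      return 2;
    case 1:
      if (len < 5) return 0;
      /* little-endian 32-bit value */
      out->as.wide = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
                     ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
      return 5;
    default:
      return 0;
  }
}
```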
The `size` attribute is currently only meaningful when the `type` attribute is set to `string`, in which case it denotes the size of the string.<br/>
## Options
```bash
  -h, --help            show this help message and exit
  --targetname TARGETNAME
                        main target name
  --outdir OUTDIR       path to output dir
  --structs STRUCTS     the structs json file
  --structsinclude STRUCTSINCLUDE
                        the path to the header that's going to be included by
                        structs.h before structure declarations.
  --xml XML             paht to the xml file
  --dbg                 debug
  --datetime            print date and time in autogen files
  --inline              inlines reader funcs
  --static              statics reader funcs
  --verbose             verbose
  --forcenullterm       terminate all strings with null even if they are not
                        originally null-terminated
  --strbuffersize STRBUFFERSIZE
                        the size of the buffer for string reads
  --strbuffgrowfactor STRBUFFGROWFACTOR
                        the factor by which the strbuffer will grow
  --voidbuffersize VOIDBUFFERSIZE
                        the size of the buffer for void* buffer
  --voidbuffgrowfactor VOIDBUFFGROWFACTOR
                        the factor by which the voidbuffer will grow
  --singlefile          the generated code will be put in a single file
  --singlefilename SINGLEFILENAME
                        name of the single file
  --name                will be used in generating some code identifiers
```
## limitations
Big-Endian reads are not supported.<br/>
Non-byte-sized raw reads are not supported.<br/>
## makefile
That would be on you but there is an example makefile in the `test` directory so you can take a look if you want.<br/>
You can also get generic ones from [here](https://github.com/bloodstalker/lazymakefiles). They're licensed under the Unlicense.<br/>
## TODO
All the items under limitations.<br/>
Figure out what the license of the generated code is.<br/>
## Projects
The list of the projects that use faultreiber:<br/>
* [bruiser](https://github.com/bloodstalker/mutator/tree/master/bruiser)<br/>
## License
`faultreiber` is provided under MIT. I'm assuming (I'm not a lawyer) the generated code is considered "derived work". If it is, then the generated code will also fall under MIT.<br/>
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fbloodstalker%2Ffaultreiber.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fbloodstalker%2Ffaultreiber?ref=badge_large)