Commit ac61c072 authored by Tomek Mrugalski

[5112] Several text corrections

parent bf57ac56
@@ -9,14 +9,14 @@
@section parserIntro Parser background
Kea's data format of choice is JSON (https://tools.ietf.org/html/rfc7159), which
is used in configuration files, in the command channel and also when
communicating between the DHCP servers and the DHCP-DDNS component. It is almost
certain that it will be used as the data format for any new features.
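For illustration, this is the kind of input all of these parsers deal with (a
made-up fragment, not a complete or valid Kea configuration):

@code
{
    "Dhcp6": {
        "renew-timer": 1000,
        "preferred-lifetime": 3000
    }
}
@endcode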
Historically, Kea used @ref isc::data::Element::fromJSON and @ref
isc::data::Element::fromJSONFile methods to parse received data that is expected
to be in JSON syntax. This in-house parser was developed back in the early BIND10
days. Its two main advantages were that it didn't have any external dependencies
and that it was already available in the source tree when the Kea project
started. On the other hand, it was very difficult to modify (several attempts to
@@ -49,9 +49,9 @@ and here: http://kea.isc.org/wiki/SimpleParser.
To solve the issue of phase 1 mentioned earlier, a new parser has been developed
that is based on flex and bison tools. The following text uses DHCPv6 as an
example, but the same principle applies to DHCPv4, and D2 and CA will likely
follow. The new parser consists of two core elements with a wrapper around them
(the following description is slightly oversimplified to convey the intent; a
more detailed description is available in the following sections):
-# Flex lexer (src/bin/dhcp6/dhcp6_lexer.ll), which is essentially a set of
regular expressions with C++ code that creates new tokens representing whatever
@@ -87,20 +87,23 @@ is available in the following sections):
(a token with a value of 100), RCURLY_BRACKET, RCURLY_BRACKET, END. (A minimal
lexer rule of this kind is sketched right after this list.)
-# Parser context. As there is some information that needs to be passed between
parser and lexer, @ref isc::dhcp::Parser6Context is a convenience wrapper
around those two bundled together. It also works as a nice encapsulation,
hiding all the flex/bison details underneath.
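To make the lexer's role concrete, here is a minimal sketch of what such lexer
rules look like. It is simplified, not copied from dhcp6_lexer.ll; the driver
member names and the make_RENEW_TIMER-style helpers mirror the naming pattern
that bison's C++ skeleton generates for declared tokens:

@code
\"renew-timer\" {
    /* A matched pattern returns the corresponding token to the parser. */
    return isc::dhcp::Dhcp6Parser::make_RENEW_TIMER(driver.loc_);
}

"{" {
    return isc::dhcp::Dhcp6Parser::make_LCURLY_BRACKET(driver.loc_);
}
@endcode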
@section parserBuild Building flex/bison code
The only input file used by flex is the .ll file. The only input file used by
bison is the .yy file. When making changes to the lexer or parser, only those
two files are edited. When processed, those two tools will generate a number of
.hh and .cc files. The major ones are named the same as their .ll and .yy
counterparts (e.g. dhcp6_lexer.cc, dhcp6_parser.cc and dhcp6_parser.h), but
there are a number of additional files created: location.hh, position.hh and
stack.hh. Those are internal bison headers that are needed for compilation.
To avoid requiring every user to have flex and bison installed, we chose to
generate the files and add them to the Kea repository. To generate those files,
do the following:
@code
./configure --enable-generate-parser
@@ -120,7 +123,9 @@ generated may be different and cause unnecessarily large diffs, may cause
coverity/cpp-check issues to appear and disappear, and cause general unhappiness.
To avoid those problems, we will introduce a requirement to generate flex/bison
files on one dedicated machine. This machine will likely be docs. Currently Ops
is working on installing the necessary versions of flex/bison required, but
for the time being we can use the versions installed in Francis' home directory
(export PATH=/home/fdupont/bin:$PATH).
Note: the above applies only to the code being merged on master. It is probably
ok to generate the files on your development branch with whatever version you
@@ -145,10 +150,10 @@ documented, but the docs for it may be a bit cryptic. When developing new
parsers, it's best to start by copying whatever we have for DHCPv6 and tweak as
needed.
The second addition is flex conditions. They're defined with %%x and they define
a state of the lexer. A good example of a state may be a comment. Once the lexer
detects a comment's beginning, it switches to a certain condition (by calling
BEGIN(COMMENT) for example) and the code then ignores whatever follows
(especially strings that look like valid tokens) until the comment is closed
(when it returns to the default condition by calling BEGIN(INITIAL)). This is
something that is not frequently used; the only use cases for it are the
aforementioned comments and file inclusions.
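For illustration, a minimal comment-eating condition could look like the
following sketch (the patterns here are made up, not copied from
dhcp6_lexer.ll):

@code
%x COMMENT

%%
"/*"           { BEGIN(COMMENT); }  /* comment starts, switch state */
<COMMENT>"*/"  { BEGIN(INITIAL); }  /* comment ends, resume normal scanning */
<COMMENT>.|\n  { /* swallow everything inside, even token-like strings */ }
@endcode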
@@ -157,7 +162,7 @@
Another addition is parser contexts. Let's assume we have a parser that uses an
"ip-address" regexp that would return the IP_ADDRESS token. Whenever we want to
allow "ip-address", the grammar allows the IP_ADDRESS token to appear. When the
lexer is called, it will match the regexp, generate the IP_ADDRESS token and
the parser will carry out its duty. This works fine as long as you have a very
specific grammar that defines everything. Sadly, that's not the case in DHCP as
we have hooks. Hook libraries can have parameters that are defined by third
@@ -193,7 +198,7 @@ in src/bin/dhcp6/dhcp6_parser.yy. Here's a simplified excerpt of it:

@code
dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;
// This defines all parameters that may appear in the Dhcp6 object.
// It can either contain a global_param (defined below) or a
// global_params list, followed by a comma followed by a global_param.
// Note this definition is recursive and can expand to a single
// instance of global_param or multiple instances separated by commas.
@@ -201,7 +206,7 @@ dhcp6_object: DHCP6 COLON LCURLY_BRACKET global_params RCURLY_BRACKET;
global_params: global_param
| global_params COMMA global_param
;
// These are the parameters that are allowed in the top-level for
// Dhcp6.
global_param: preferred_lifetime
@@ -222,9 +227,9 @@ global_param: preferred_lifetime
| server_id
| dhcp4o6_port
;
renew_timer: RENEW_TIMER COLON INTEGER;
// Many other definitions follow.
@endcode
@@ -244,7 +249,7 @@ rule.
The "leaf" rules that don't contain any other rules, must be defined by a
series of tokens. An example of such a rule is renew_timer above. It is defined
as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER.
as a series of 3 tokens: RENEW_TIMER, COLON and INTEGER.
Speaking of integers, it is worth noting that some tokens can have values. Those
values are defined using the %%token clause. For example, dhcp6_parser.yy has the
@@ -272,7 +277,7 @@ renew_timer with some extra code:
@code
renew_timer: RENEW_TIMER {
cout << "renew-timer token detected, so far so good" << endl;
} COLON {
cout << "colon detected!" << endl;
} INTEGER {
uint32_t timer = $3;
@endcode
@@ -298,11 +303,11 @@
@code
ncr_protocol: NCR_PROTOCOL {
ctx.enter(ctx.NCR_PROTOCOL); (1)
} COLON ncr_protocol_value {
ctx.stack_.back()->set("ncr-protocol", $4); (3)
ctx.leave(); (4)
};
ncr_protocol_value:
UDP { $$ = ElementPtr(new StringElement("UDP", ctx.loc2pos(@1))); }
| TCP { $$ = ElementPtr(new StringElement("TCP", ctx.loc2pos(@1))); } (2)
;
@endcode
@@ -358,8 +363,8 @@ The first line creates an instance of IntElement with a value of the token. The
second line adds it to the current map (current = the last on the stack). This
approach has a very nice property of being generic. This rule can be referenced
from global and subnet scope (and possibly other scopes as well) and the code
will add the IntElement object to whatever is last on the stack, be it global,
subnet or perhaps even something else (maybe one day we will allow preferred
lifetime to be defined on a per pool or per host basis?).
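The excerpt being described is not shown here, but a sketch of such a two-line
action, modeled on the preferred-lifetime handling in dhcp6_parser.yy (details
may differ slightly from the actual file), looks like this:

@code
preferred_lifetime: PREFERRED_LIFETIME COLON INTEGER {
    // First line: wrap the token's value in an IntElement.
    ElementPtr prf(new IntElement($3, ctx.loc2pos(@3)));
    // Second line: add it to whatever map is currently last on the stack.
    ctx.stack_.back()->set("preferred-lifetime", prf);
};
@endcode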
@section parserSubgrammar Parsing partial grammar
@@ -385,6 +390,9 @@ This trick is also implemented in the lexer. There's a flag called start_token_flag.
When initially set to true, it will cause the lexer to emit an artificial
token once, before parsing any input whatsoever.
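For reference, here is a condensed sketch of what the grammar side of this
trick can look like; the token and rule names follow the DHCPv6 parser's style,
but the set of alternatives is abbreviated here:

@code
// The artificial first token emitted by the lexer selects which part
// of the grammar will be used for the rest of the input.
start: TOPLEVEL_DHCP6 syntax_map    // a full configuration
     | SUB_SUBNET6 sub_subnet6     // just a single subnet6 structure
     ;
@endcode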
This optional feature can be skipped altogether if you don't plan to parse parts
of the configuration.
@section parserBisonExtend Extending grammar
Adding new parameters to existing parsers is very easy once you get hold of the
@@ -402,7 +410,7 @@ Here's the complete set of necessary changes.
@code
SUBNET_4O6_INTERFACE_ID "4o6-interface-id"
@endcode
This defines a token called SUBNET_4O6_INTERFACE_ID that, when it needs to
be printed, will be represented as "4o6-interface-id".
2. Tell the lexer how to recognize the new parameter:
@@ -439,7 +447,7 @@ Here's the complete set of necessary changes.
weird that happens to match our reserved keywords. Therefore we switch to the
no-keyword context. This tells the lexer to interpret everything as a string,
integer or float.
4. Finally, extend the existing subnet4_param that defines all allowed parameters
in Subnet4 scope to also cover our new parameter (the new line marked with *):
@code
......