The IPC protocol
================

While the cc-protocol.txt describes the low-level primitives, here we
describe how the whole IPC should work and how to use it.

Definitions
-----------

system::
  The system that moves data between the users and does bookkeeping.
  In our current implementation, it is the MsgQ daemon, which the
  users connect to and which routes the data.
user::
  Usually a process; generally an entity that wants to communicate
  with the other users.
session::
  A session is the interface by which the user communicates with the
  system. A single user may have multiple sessions; each session
  belongs to a single user.
message::
  A data blob sent by one user. The recipient might be the system
  itself, another session or a set of sessions (called a group, see
  below; it may be empty). A message is either a response or an
  original message (TODO: Better name?).
group::
  A named set of sessions. Conceptually, all the possible groups
  exist; there's no explicit creation or deletion of groups.
session id::
  Unique identifier of a session. It is not reused for the whole
  lifetime of the system. Historically called `lname` in the code.
undelivery signal::
  While sending an original message, a client may request an
  undelivery signal. If the recipient specification yields no
  sessions to deliver the message to, the system informs the user
  about the situation.
sequence number::
  Each message sent through the system carries a sequence number. The
  number should be unique per sender. It can be used to pair a
  response with the original message, since the response specifies the
  sequence number of the message it responds to. Even responses and
  messages not expecting an answer have a sequence number, but it is
  generally unused.
non-blocking operation::
  Operation that will complete without waiting for anything.
fast operation::
  Operation that may wait for another process, but only for a very
  short time. Generally, this includes communication between the user
  and the system, but not between two users. It can be expected to be
  fast enough to use inside an interactive session, but may be too
  heavy in the middle of query processing, for example. Every
  non-blocking operation is considered fast.

The session
-----------

The session interface allows for several operations interacting with
the system. In the code, it is represented by a class.

Possible operations include:

Opening a session::
  The session is created and connects to the system. This operation is
  fast. The session receives its session ID from the system.

Group management::
  A user may subscribe to (become a member of) a group, or unsubscribe
  from a group. These are fast operations.

Send::
  A user may send a message, addressed to the system or to other
  session(s). This operation is expected to be non-blocking
  (the current implementation relies on an assumption about how the OS
  handles sends, which may need to be revisited if it turns out to be
  false).

Receive synchronously::
  A user may wait for an incoming message in blocking mode. It is
  possible to specify the kind of message to wait for, either an
  original message or a response to a message. This interface has a
  timeout.

Receive asynchronously::
  Similar to the previous operation, but non-blocking: the call
  returns immediately. The user provides a callback that is invoked
  when the requested message arrives.

Terminate::
  A session may be terminated. No more messages are sent or received
  over it, and the session is automatically unsubscribed from all the
  groups. This operation is non-blocking. A session is terminated
  automatically if the user exits.

Assumptions
-----------

We assume reliability and order of delivery. Messages sent from user A
to B are all delivered unchanged in original order as long as B
exists.

All the above operations are expected to always succeed. If an error
is reported, it should be considered fatal and the user should
exit. If a user still wants to continue, the session must be
considered terminated and a new one must be created. Care must be
taken not to use any information obtained from the previous session,
since the state in other users and the system may have changed during
the reconnect.

Addressing
----------

Addressing happens in three ways:

By group name::
  The message is routed to all the sessions subscribed to this group.
  It is legal to address an empty group; such a message is then
  delivered to no sessions.
By session ID::
  The message is sent to the single session, if it is still alive.
By an alias::
  A session may have any number of aliases, which are well-known
  names. Only a single session may hold a given alias (but this is not
  yet enforced by the system). The message is delivered to the one
  session owning the alias, if any. Internally, aliases are implemented
  as groups with a single subscribed session, so it is the same as the first
  option on the protocol level, but semantically it is different.
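
The three addressing modes can be illustrated with a small in-memory
model. This is only a sketch, not the MsgQ implementation: the `Router`
class and all its method names are hypothetical, and it assumes that
session IDs and group names never collide.

```python
from collections import defaultdict


class Router:
    """Toy model of the three addressing modes (hypothetical, not MsgQ code)."""

    def __init__(self):
        self.sessions = set()           # live session IDs
        self.groups = defaultdict(set)  # group name -> subscribed session IDs

    def open_session(self, session_id):
        self.sessions.add(session_id)

    def subscribe(self, session_id, group):
        self.groups[group].add(session_id)

    def bind_alias(self, session_id, alias):
        # An alias is a group with at most one member. The real system
        # does not enforce this yet; this sketch does.
        if self.groups[alias]:
            raise ValueError("alias already held: " + alias)
        self.groups[alias].add(session_id)

    def terminate(self, session_id):
        # A terminated session leaves all its groups (and aliases).
        self.sessions.discard(session_id)
        for members in self.groups.values():
            members.discard(session_id)

    def recipients(self, address):
        """Return the set of sessions a message to `address` reaches."""
        if address in self.sessions:              # by session ID
            return {address}
        return set(self.groups.get(address, ()))  # by group name or alias
```

An empty result from `recipients` corresponds to the case where the
system would send the undelivery signal, if the sender requested it.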

The system
----------

The system performs these tasks:

 * Maintains the open sessions and allows creating new ones.
 * Keeps information about groups and which sessions are subscribed to
   which group.
 * Routes the messages between users.

Also, the system itself is a user of the system. It can be reached by
the alias `Msgq` and provides the following high-level services (see
below):

Notifications about sessions::
  When a session is opened to the system or when a session is
  terminated, a notification is sent to interested users. The
  notification contains the session ID of the session in question.
  The termination notification is probably more useful (if a user
  communicated with a given session before, it might be interested to
  learn that the session is no longer available); the opening
  notification is provided mostly for completeness.
Notifications about group subscriptions::
  When a session subscribes to a group or unsubscribes from a group, a
  notification is sent to interested users. The notification contains
  both the session ID of the session subscribing/unsubscribing and
  the name of the group. This includes notifications about aliases
  (since aliases are groups internally).
Commands to list sessions::
  There's a command to list session IDs of all currently opened sessions
  and a command to list session IDs of all sessions subscribed to a
  given group. Note that using these lists might need some care, as
  the information might be outdated at the time it is delivered to the
  user.

A user shows interest in notifications about sessions and group
subscriptions by subscribing to a group with a well-known name (as
with any notification).

Note that due to implementation details, the `Msgq` alias is not yet
available during the early stage of the bootstrap of the bind10
system. This means some very core services can't rely on the above
services of the system. The alias is guaranteed to be working before
the first non-core module is started.

Higher-level services
---------------------

While the system is able to send any kind of data, the payload sent by
users in bind10 is structured data encoded as JSON. The messages sent
are of three general types:

Command::
  A message sent to a single destination, with the undeliverable
  signal turned on and expecting an answer. This is a request
  to perform some operation on the recipient (it can have side effects
  or not). The command is identified by a name and it can have
  parameters. A command with the same name may behave differently (or
  have different parameters) on different receiving users.
Reply::
  An answer to the `Command`. It is sent directly to the session where
  the command originated, does not expect a further answer and the
  undeliverable notification is not set. It either confirms the
  command was run successfully and contains an optional result, or
  notifies the sender of failure to run the command. Success and
  failure differ only in the payload sent through the system, not in
  the way it is sent. The undeliverable signal is a failure
  reply sent by the system on behalf of the missing recipient.
Notification::
  A message sent to any number of destinations (e.g. sent to a group),
  not expecting an answer. It notifies other users about an event or
  change of state.
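
As a rough illustration, the three message types could be encoded as
follows. The JSON shapes follow the examples later in this document;
the function names are hypothetical and are not part of any bind10
library. Omitting the second list element when there are no parameters
or no result is an assumption based on the `ping` example and on the
result being described as optional.

```python
import json


def command(name, params=None):
    """Encode a command; parameters are omitted when there are none."""
    body = [name] if params is None else [name, params]
    return json.dumps({"command": body})


def reply(rcode, value=None):
    """Encode a reply; rcode 0 means success in the examples."""
    body = [rcode] if value is None else [rcode, value]
    return json.dumps({"reply": body})


def notification(name, params=None):
    """Encode a notification about an event or a change of state."""
    body = [name] if params is None else [name, params]
    return json.dumps({"notification": body})


def kind(payload):
    """Return which of the three message types a JSON payload is."""
    keys = set(json.loads(payload)) & {"command", "reply", "notification"}
    if len(keys) != 1:
        raise ValueError("not a command, reply or notification")
    return keys.pop()
```

Note that the undeliverable signal and the "expects an answer" flag
are properties of how the message is sent, not of this payload.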

Details of the higher-level
---------------------------

While there are libraries implementing the communication in a
convenient way, it is useful to know what happens inside.

The notifications are probably the simplest. Users interested in
receiving notifications of some family subscribe to the corresponding
group. Then, a user sends a message to the group. For example, if
users `receiver-A` and `receiver-B` want to receive notifications
about changes to zone data, they'd subscribe to the
`notifications/ZoneUpdates` group. Then, another user (let's say
`XfrIn`, with session ID `s12345`) would send something like:

  s12345 -> notifications/ZoneUpdates
  {"notification": ["zone-update", {
      "class": "IN",
      "origin": "example.org.",
      "serial": 123456
  }]}

Both receivers would receive the message and know that the
`example.org` zone is now at version 123456. Note that multiple users
may produce the same kind of notification. Also, a single group may be
used to send multiple notification names (but they should be related;
in our example, the `notifications/ZoneUpdates` group could be used for
`zone-update`, `zone-available` and `zone-unavailable` notifications
for a change in zone data, configuration of a new zone in the system and
removal of a zone from configuration).
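
On the receiving side, handling several notification names in one
group might look like the following sketch. The
`NotificationDispatcher` class is hypothetical; it only assumes the
payload shape shown in the example above and that the receiver knows
which group the message was addressed to.

```python
import json


class NotificationDispatcher:
    """Route received notifications to callbacks by (group, name)."""

    def __init__(self):
        self.handlers = {}  # (group, notification name) -> list of callbacks

    def on(self, group, name, callback):
        self.handlers.setdefault((group, name), []).append(callback)

    def deliver(self, group, payload):
        """Handle one received message; return how many callbacks ran."""
        body = json.loads(payload).get("notification")
        if not body:
            return 0  # not a notification
        name = body[0]
        params = body[1] if len(body) > 1 else {}
        callbacks = self.handlers.get((group, name), [])
        for callback in callbacks:
            callback(params)
        return len(callbacks)
```

A receiver interested only in `zone-update` would register a single
callback for that name and silently ignore the other names sent to the
same group.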

Sending a command to a single recipient is slightly more complex. The
sending user sends a message to the receiving one, addressed either by
session ID or by an alias (a group to which at most one session may be
subscribed). The message contains the name of the command and its
parameters. It is sent with the undeliverable signal turned on.
The user also starts a timer (with a reasonably long timeout). The
sender also subscribes to notifications about terminated sessions or
unsubscription from the alias group.

The receiving user gets the message, runs the command and sends a
response back, with the result. The response has the undeliverable
signal turned off and is marked as a response to the message
containing the command. The sending user receives the answer and pairs
it with the command.

There are several things that may go wrong.

* There might be an error on the receiving user (bad parameters, the
  operation failed, the recipient doesn't know a command of that
  name). The receiving side sends the response as before; the only
  difference is the content of the payload. The sending user is
  notified about it without delay.
* The recipient user doesn't exist (either the session ID is wrong or
  the session has already terminated, or the alias is empty). The
  system sends a failure response and the sending user knows
  immediately that the command failed.
* The recipient disconnects while processing the command (possibly
  crashes). The sender gets a notification about disconnection or
  unsubscription from the alias group and knows the answer won't come.
* The recipient ``blackholes'' the command. It receives it, but never
  answers. The sender's timer expires. As this is a serious
  programmer error in the recipient and should be rare, the sender
  should at least log an error to notify about the case.
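
The sender-side bookkeeping for these cases can be sketched as below.
The `PendingCommands` class and its method names are hypothetical.
Commands are keyed by sequence number, and the recipient is stored as
whatever address the command was sent to (a session ID or an alias),
so the same method handles both a disconnection notification and an
unsubscription from the alias group.

```python
class PendingCommands:
    """Track commands awaiting a reply (hypothetical sender-side helper)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # sequence number -> (recipient address, deadline)

    def sent(self, seq, recipient, now):
        """Record a command that was just sent and start its timer."""
        self.pending[seq] = (recipient, now + self.timeout)

    def on_reply(self, seq, rcode, value):
        """A reply arrived, either from the recipient or from the system
        when the command was undeliverable. Returns (outcome, value)."""
        if self.pending.pop(seq, None) is None:
            return ("unexpected", value)
        return ("ok" if rcode == 0 else "failed", value)

    def on_recipient_gone(self, address):
        """The recipient disconnected or dropped its alias: no answer
        will come. Returns the affected sequence numbers."""
        lost = [s for s, (r, _) in self.pending.items() if r == address]
        for seq in lost:
            del self.pending[seq]
        return lost

    def expire(self, now):
        """Return commands whose timer ran out (the blackhole case)."""
        late = [s for s, (_, d) in self.pending.items() if d <= now]
        for seq in late:
            del self.pending[seq]
        return late
```

The caller would log an error for every sequence number returned by
`expire`, as recommended above.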

One example would be asking the question of life, universe and
everything (all the examples assume the sending user is already
subscribed to the notifications):

  s12345 -> DeepThought
  {"command": ["question", {
      "what": ["Life", "Universe", "*"]
  }]}
  s23456 -> s12345
  {"reply": [0, 42]}

Deep Thought was addressed by its alias, but the answer is sent from
its session ID. The `0` in the reply means ``success''.

Another example might be asking for some data at a bureau and getting
an error:

  s12345 -> Bureau
  {"command": ["provide-information", {
      "about": "me",
      "topic": "taxes"
  }]}
  s23456 -> s12345
  {"reply": [1, "You need to fill in other form"]}

And, in this example, the sender is trying to reach a non-existent
session. The `msgq` here is not the alias `Msgq`, but a special
``phantom'' session ID that is not listed anywhere.

  s12345 -> s0
  {"command": ["ping"]}
  msgq -> s12345
  {"reply": [-1, "No such recipient"]}

Last, an example when the other user disconnects while processing the
command.

  s12345 -> s23456
  {"command": ["shutdown"]}
  msgq -> s12345
  {"notification": ["disconnected", {
    "lname": "s23456"
  }]}

The system does not support sending a command to multiple users
directly. It can be accomplished as follows:

* The sending user calls a command on the system to get the list of
  sessions in the given group. This is a command sent to an alias, so
  it can be done in the way described above.
* After receiving the list of session IDs, multiple copies of the
  command are sent by the sending user, one to each of the session
  IDs.
* Successes and failures are handled the same as above, since these
  are just single-recipient commands.

So, this would be an example with an unhelpful war council.

  s12345 -> Msgq
  {"command": ["get-subscriptions", {
      "group": "WarCouncil"
  }]}
  msgq -> s12345
  {"reply": [0, ["s1", "s2", "s3"]]}
  s12345 -> s1
  {"command": ["advice", {
      "topic": "Should we attack?"
  }]}
  s12345 -> s2
  {"command": ["advice", {
      "topic": "Should we attack?"
  }]}
  s12345 -> s3
  {"command": ["advice", {
      "topic": "Should we attack?"
  }]}
  s1 -> s12345
  {"reply": [0, true]}
  s2 -> s12345
  {"reply": [0, false]}
  s3 -> s12345
  {"reply": [1, "Advice feature not implemented"]}
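
The same fan-out can be written as a short helper. This is a sketch
under stated assumptions: `request(address, payload)` is a hypothetical
caller-supplied function that sends one single-recipient command and
returns the decoded reply as `[rcode, value]`. For simplicity it sends
the copies one after another, whereas the example above sends all of
them before collecting the answers.

```python
import json


def send_to_group(request, group, name, params):
    """Send a command to every session subscribed to `group`.

    Returns a dict mapping each session ID to its [rcode, value] reply.
    """
    # Step 1: ask the system which sessions are in the group.
    listing = json.dumps({"command": ["get-subscriptions", {"group": group}]})
    rcode, members = request("Msgq", listing)
    if rcode != 0:
        raise RuntimeError("could not list group: %r" % (members,))
    # Step 2: send one ordinary single-recipient command per session.
    payload = json.dumps({"command": [name, params]})
    return {session: request(session, payload) for session in members}
```

Because the list of sessions may already be outdated when it arrives,
some of the individual commands can still fail with an undeliverable
reply; each is handled like any other single-recipient command.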

Users
-----

While there's a lot of flexibility for the behaviour of a user, it
usually comes down to something like this (during the lifetime of the
user):

* The user starts up.
* Then it creates one or more sessions (there may be technical reasons
  to have more than one session, such as threads, but it is not
  required by the system).
* It subscribes to some groups to receive notifications in future.
* It binds to some aliases if it wants to be reachable by others by a
  nice name.
* It invokes some start-up commands (to get the configuration, for
  example).
* During the lifetime, it listens for notifications and answers
  commands. It also invokes remote commands and sends notifications
  about things that are happening.
* Eventually, the user terminates, closing all the sessions it had
  opened.

Known limitations
-----------------

It is meant mostly as a signalling protocol. Sending millions of
messages or messages of several tens of megabytes is probably a bad
idea. While there's no architectural limitation with regard to the
number of transferred messages, and the maximum size of a message is
4GB, the code is not optimized and it would probably be very slow.

We currently expect the system not to be under heavy load. Therefore,
we expect the system to keep up with users sending messages. The
libraries write in blocking mode, which is no problem if the
expectation holds, as the write buffers will generally be empty and
the write won't block. If it turns out not to be the case, we
might need to reconsider.