Commit 98a9bd4f authored by Marcin Siodelski

[master] Merge branch 'trac5675'

parents 988ee746 a01ff996
......@@ -137,7 +137,7 @@
</para>
</section>
<section>
<section xml:id="ha-server-states">
<title>Server States</title>
<para>The DHCP server operating within an HA setup runs a state machine
and the state of the server can be retrieved by its peers using the
......@@ -1047,6 +1047,189 @@
</para>
</section>
<section xml:id="ha-pause-state-machine">
<title>Pausing HA State Machine</title>
<para>The high availability state machine includes many different
states described in detail in <xref linkend="ha-server-states"/>.
The server enters each state when certain conditions are met, most
often taking into account the partner server's state. In some states
the server performs specific actions, e.g. synchronization of the
lease database in the <command>syncing</command> state or responding
to DHCP queries according to the configured mode of operation in the
<command>load-balancing</command> and <command>hot-standby</command>
states.
</para>
<para>
By default, transitions between the states are performed
automatically and the server administrator has no direct control
over when they take place; in most cases, the
administrator doesn't need such control. In some situations,
however, the administrator may want to "pause" the HA state
machine in a selected state to perform some additional administrative
actions before the server transitions to the next state.
</para>
<para>Consider a server failure which results in a loss of the entire
lease database. Typically, the server will rebuild its lease database
when it enters the <command>syncing</command> state by querying
the partner server for leases, but it is possible that the
partner was also experiencing a failure and lacks lease information.
In this case, it may be required to reconstruct lease databases on
both servers from some external source, e.g. a backup server. If the
lease database is going to be reconstructed via the RESTful API, the
servers should be started in the initial state, i.e. <command>waiting</command>,
and remain in this state while leases are being added. In
particular, the servers should not attempt to synchronize their lease
databases nor start serving DHCP clients.
</para>
<para>The HA hooks library provides configuration parameters and a
command to control when the HA state machine should be paused and
resumed. The following configuration will cause the HA state machine
to pause in the <command>waiting</command> state after server startup.
<screen>
"Dhcp4": {
...
"hooks-libraries": [
{
"library": "/usr/lib/hooks/libdhcp_lease_cmds.so",
"parameters": { }
},
{
"library": "/usr/lib/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [ {
"this-server-name": "server1",
"mode": "load-balancing",
"peers": [
{
"name": "server1",
"url": "http://192.168.56.33:8080/",
"role": "primary"
},
{
"name": "server2",
"url": "http://192.168.56.66:8080/",
"role": "secondary"
}
],
"state-machine": {
"states": [
{
"state": "waiting",
"pause": "once"
}
]
}
} ]
}
}
],
...
}
</screen>
</para>
<para>The <command>pause</command> parameter value <command>once</command>
denotes that the state machine should be paused upon the first transition
to the <command>waiting</command> state. Later transitions to this state
won't cause the state machine to pause. Two other supported values of the
<command>pause</command> parameter are: <command>always</command> and
<command>never</command>. The latter is the default value for each state,
which instructs the server to never pause the state machine.
</para>
<para>In order to "unpause" the state machine the <command>ha-continue</command>
command must be sent to the paused server. This command does not take
any arguments. See <xref linkend="ha-control-commands"/> for details
about commands specific to the HA hooks library.
</para>
<para>It is possible to configure the state machine to pause in more than
one state. Consider the following configuration.
<screen>
"Dhcp4": {
...
"hooks-libraries": [
{
"library": "/usr/lib/hooks/libdhcp_lease_cmds.so",
"parameters": { }
},
{
"library": "/usr/lib/hooks/libdhcp_ha.so",
"parameters": {
"high-availability": [ {
"this-server-name": "server1",
"mode": "load-balancing",
"peers": [
{
"name": "server1",
"url": "http://192.168.56.33:8080/",
"role": "primary"
},
{
"name": "server2",
"url": "http://192.168.56.66:8080/",
"role": "secondary"
}
],
"state-machine": {
"states": [
{
"state": "ready",
"pause": "always"
},
{
"state": "partner-down",
"pause": "once"
}
]
}
} ]
}
}
],
...
}
</screen>
</para>
<para>This configuration instructs the server to pause the state
machine every time it transitions to the <command>ready</command> state
and upon the first transition to the <command>partner-down</command>
state.</para>
<para>Refer to <xref linkend="ha-server-states"/> for a complete
list of server states. The state machine can be paused in any of the
supported states; however, pausing is not practical in the
<command>backup</command> and <command>terminated</command> states because
the server never transitions out of these states anyway.
</para>
<note><para>In the <command>syncing</command> state the server is paused
before it attempts to synchronize the lease database with its partner.
In order to pause the state machine after lease database synchronization,
use the <command>ready</command> state instead.
</para></note>
<note><para>The state of the HA state machine depends on the state of the
cooperating server. Therefore, it must be taken into account that
pausing the state machine of one server may affect the operation of the
partner server. For example, if the primary server is paused in the
<command>waiting</command> state, the partner server will also remain in
the <command>waiting</command> state until the state machine of the
primary server is resumed and that server transitions to the
<command>ready</command> state.</para></note>
</section>
<section xml:id="ha-ctrl-agent-config">
<title>Control Agent Configuration</title>
<para>The <xref linkend="kea-ctrl-agent"/> describes in detail the
......@@ -1198,6 +1381,19 @@
</para>
</section> <!-- ha-scopes-command -->
<section xml:id="ha-continue-command">
<title>ha-continue command</title>
<para>This command is used to resume the operation of the paused HA
state machine as described in <xref linkend="ha-pause-state-machine"/>.
It takes no arguments, so the command structure is as simple as:
<screen>
{
"command": "ha-continue"
}
</screen>
</para>
</section> <!-- ha-continue-command -->
</section> <!-- ha-control-commands -->
</section> <!-- end of high-availability-library -->
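The documentation hunk above describes three values of the pause parameter: once, always and never (the default). A minimal sketch of those semantics follows. It is an illustrative model only, not the Kea implementation: the names Pausing, parsePausing and PauseModel are hypothetical, and the real library keeps this state in its own configuration and state machine classes.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model of the documented "pause" values; not the Kea code.
enum class Pausing { Always, Never, Once };

// Converts the configuration string into the enum, rejecting unknown values.
Pausing parsePausing(const std::string& value) {
    if (value == "always") return Pausing::Always;
    if (value == "never") return Pausing::Never;
    if (value == "once") return Pausing::Once;
    throw std::invalid_argument("unsupported pause value: " + value);
}

class PauseModel {
public:
    // Records the pausing mode for a state. States that are never
    // configured behave as "never", matching the documented default.
    void configure(const std::string& state, const std::string& value) {
        modes_[state] = parsePausing(value);
    }

    // Called on each transition into a state. Returns true if the
    // machine pauses in that state.
    bool enter(const std::string& state) {
        auto it = modes_.find(state);
        const Pausing mode = (it == modes_.end()) ? Pausing::Never : it->second;
        // insert() reports whether this is the first transition to the state.
        const bool first_visit = visited_.insert(state).second;
        paused_ = (mode == Pausing::Always) ||
                  (mode == Pausing::Once && first_visit);
        return paused_;
    }

    // Models the effect of ha-continue: returns true only if the machine
    // was paused when the call was made.
    bool unpause() {
        const bool was_paused = paused_;
        paused_ = false;
        return was_paused;
    }

    bool isPaused() const { return paused_; }

private:
    std::map<std::string, Pausing> modes_;
    std::set<std::string> visited_;
    bool paused_ = false;
};
```

With "ready" configured as always and "partner-down" as once, this model pauses on every entry to ready but only on the first entry to partner-down, which is the behavior the second configuration example describes.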
......@@ -190,6 +190,19 @@ int scopes_command(CalloutHandle& handle) {
return (0);
}
/// @brief ha-continue command handler implementation.
int continue_command(CalloutHandle& handle) {
try {
impl->continueHandler(handle);
} catch (const std::exception& ex) {
LOG_ERROR(ha_logger, HA_CONTINUE_HANDLER_FAILED)
.arg(ex.what());
}
return (0);
}
/// @brief This function is called when the library is loaded.
///
/// @param handle library handle
......@@ -208,6 +221,7 @@ int load(LibraryHandle& handle) {
handle.registerCommandCallout("ha-heartbeat", heartbeat_command);
handle.registerCommandCallout("ha-sync", sync_command);
handle.registerCommandCallout("ha-scopes", scopes_command);
handle.registerCommandCallout("ha-continue", continue_command);
} catch (const std::exception& ex) {
LOG_ERROR(ha_logger, HA_CONFIGURATION_FAILED)
......
......@@ -390,5 +390,12 @@ HAImpl::scopesHandler(hooks::CalloutHandle& callout_handle) {
callout_handle.setArgument("response", response);
}
void
HAImpl::continueHandler(hooks::CalloutHandle& callout_handle) {
ConstElementPtr response = service_->processContinue();
callout_handle.setArgument("response", response);
}
} // end of namespace isc::ha
} // end of namespace isc
......@@ -137,6 +137,11 @@ public:
/// @param callout_handle Callout handle provided to the callout.
void scopesHandler(hooks::CalloutHandle& callout_handle);
/// @brief Implements handler for the ha-continue command.
///
/// @param callout_handle Callout handle provided to the callout.
void continueHandler(hooks::CalloutHandle& callout_handle);
protected:
/// @brief Holds parsed configuration.
......
......@@ -107,6 +107,11 @@ are administratively disabled and will not be issued in the HA state to
which the server has transitioned. The sole argument specifies the state
into which the server has transitioned.
% HA_CONTINUE_HANDLER_FAILED ha-continue command failed: %1
This error message is issued to indicate that the ha-continue command handler
failed while processing the command. The argument provides the reason for
failure.
% HA_DEINIT_OK unloading High Availability hooks library successful
This informational message indicates that the High Availability hooks library
has been unloaded successfully.
......
......@@ -123,6 +123,9 @@ HAService::backupStateHandler() {
if (doOnEntry()) {
query_filter_.serveNoScopes();
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
// There is nothing to do in that state. This server simply receives
......@@ -138,6 +141,9 @@ HAService::normalStateHandler() {
if (doOnEntry()) {
query_filter_.serveDefaultScopes();
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
scheduleHeartbeat();
......@@ -193,6 +199,9 @@ HAService::partnerDownStateHandler() {
query_filter_.serveDefaultScopes();
}
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
scheduleHeartbeat();
......@@ -238,6 +247,9 @@ HAService::readyStateHandler() {
if (doOnEntry()) {
query_filter_.serveNoScopes();
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
scheduleHeartbeat();
......@@ -300,6 +312,9 @@ HAService::syncingStateHandler() {
if (doOnEntry()) {
query_filter_.serveNoScopes();
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
if (isModelPaused()) {
......@@ -375,6 +390,9 @@ HAService::terminatedStateHandler() {
// In the terminated state we don't send heartbeat.
communication_state_->stopHeartbeat();
// Log if the state machine is paused.
conditionalLogPausedState();
LOG_ERROR(ha_logger, HA_TERMINATED);
}
......@@ -389,6 +407,9 @@ HAService::waitingStateHandler() {
if (doOnEntry()) {
query_filter_.serveNoScopes();
adjustNetworkState();
// Log if the state machine is paused.
conditionalLogPausedState();
}
// Only schedule the heartbeat for non-backup servers.
......@@ -511,14 +532,6 @@ HAService::verboseTransition(const unsigned state) {
.arg(new_state_name);
}
}
// Inform the administrator if the state machine is paused.
if (isModelPaused()) {
std::string state_name = stateToString(state);
boost::to_upper(state_name);
LOG_INFO(ha_logger, HA_STATE_MACHINE_PAUSED)
.arg(state_name);
}
}
bool
......@@ -531,6 +544,17 @@ HAService::unpause() {
return (false);
}
void
HAService::conditionalLogPausedState() const {
// Inform the administrator if the state machine is paused.
if (isModelPaused()) {
std::string state_name = stateToString(getCurrState());
boost::to_upper(state_name);
LOG_INFO(ha_logger, HA_STATE_MACHINE_PAUSED)
.arg(state_name);
}
}
void
HAService::serveDefaultScopes() {
query_filter_.serveDefaultScopes();
......@@ -1384,6 +1408,14 @@ HAService::processScopes(const std::vector<std::string>& scopes) {
return (createAnswer(CONTROL_RESULT_SUCCESS, "New HA scopes configured."));
}
data::ConstElementPtr
HAService::processContinue() {
if (unpause()) {
return (createAnswer(CONTROL_RESULT_SUCCESS, "HA state machine continues."));
}
return (createAnswer(CONTROL_RESULT_SUCCESS, "HA state machine is not paused."));
}
ConstElementPtr
HAService::verifyAsyncResponse(const HttpResponsePtr& response) {
// The response must cast to JSON type.
......
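The processContinue() hunk above reports success in both branches, so sending ha-continue to a server that is not paused is harmless, and only the status text tells the two outcomes apart. The standalone sketch below mirrors that branching so the behavior can be tried outside Kea. Answer and ContinueModel are hypothetical stand-ins for the JSON answer and for HAService, and the result code 0 is assumed here to correspond to CONTROL_RESULT_SUCCESS.

```cpp
#include <string>

// Hypothetical stand-in for the JSON answer that Kea builds with createAnswer().
struct Answer {
    int result;        // assumption: 0 corresponds to CONTROL_RESULT_SUCCESS
    std::string text;  // human-readable status text
};

// Minimal stand-in for the service; it only tracks the paused flag.
class ContinueModel {
public:
    explicit ContinueModel(bool paused) : paused_(paused) {}

    // Returns true if the machine was paused and has now been resumed.
    bool unpause() {
        if (paused_) {
            paused_ = false;
            return true;
        }
        return false;
    }

    // Same branching as processContinue() in the diff: both outcomes
    // report success and differ only in the text.
    Answer processContinue() {
        if (unpause()) {
            return {0, "HA state machine continues."};
        }
        return {0, "HA state machine is not paused."};
    }

    bool isPaused() const { return paused_; }

private:
    bool paused_;
};
```

A caller that needs to know whether the command actually resumed the state machine would have to compare the text field, since the result code is the same in both cases.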
......@@ -255,6 +255,16 @@ public:
/// machine was not paused when this method was invoked.
bool unpause();
protected:
/// @brief Logs if the server is paused in the current state.
///
/// This method is internally called by the state handlers upon
/// entry to a new state.
void conditionalLogPausedState() const;
public:
/// @brief Instructs the HA service to serve default scopes.
///
/// This method is mostly useful for unit testing. The scopes need to be
......@@ -610,6 +620,11 @@ public:
/// @return Pointer to the response to the ha-scopes command.
data::ConstElementPtr processScopes(const std::vector<std::string>& scopes);
/// @brief Processes ha-continue command and returns a response.
///
/// @return Pointer to the response to the ha-continue command.
data::ConstElementPtr processContinue();
protected:
/// @brief Checks if the response is valid or contains an error.
......
......@@ -509,5 +509,29 @@ TEST_F(HAImplTest, synchronizeHandler) {
}
// Tests ha-continue command handler.
TEST_F(HAImplTest, continueHandler) {
HAImpl ha_impl;
ASSERT_NO_THROW(ha_impl.configure(createValidJsonConfiguration()));
// Starting the service is required prior to running any callouts.
NetworkStatePtr network_state(new NetworkState(NetworkState::DHCPv4));
ASSERT_NO_THROW(ha_impl.startService(io_service_, network_state,
HAServerType::DHCPv4));
ConstElementPtr command = Element::fromJSON("{ \"command\": \"ha-continue\" }");
CalloutHandlePtr callout_handle = HooksManager::createCalloutHandle();
callout_handle->setArgument("command", command);
ASSERT_NO_THROW(ha_impl.continueHandler(*callout_handle));
ConstElementPtr response;
callout_handle->getArgument("response", response);
ASSERT_TRUE(response);
checkAnswer(response, CONTROL_RESULT_SUCCESS, "HA state machine is not paused.");
}
}
......@@ -2320,6 +2320,50 @@ TEST_F(HAServiceTest, processScopes) {
EXPECT_FALSE(service.query_filter_.amServingScope("server3"));
}
// This test verifies that the ha-continue command is processed successfully.
TEST_F(HAServiceTest, processContinue) {
HAConfigPtr config_storage = createValidConfiguration();
// State machine is to be paused in the waiting state.
ASSERT_NO_THROW(config_storage->getStateMachineConfig()->
getStateConfig(HA_WAITING_ST)->setPausing("always"));
TestHAService service(io_service_, network_state_, config_storage);
// Pause the state machine.
EXPECT_NO_THROW(service.transition(HA_WAITING_ST, HAService::NOP_EVT));
EXPECT_TRUE(service.isModelPaused());
// Process ha-continue command that should unpause the state machine.
ConstElementPtr rsp;
ASSERT_NO_THROW(rsp = service.processContinue());
// The server should have responded.
ASSERT_TRUE(rsp);
checkAnswer(rsp, CONTROL_RESULT_SUCCESS, "HA state machine continues.");
// State machine should have been unpaused as a result of processing the
// command.
EXPECT_FALSE(service.isModelPaused());
// Response should include no arguments.
EXPECT_FALSE(rsp->get("arguments"));
// Sending ha-continue command again which should return success but
// slightly different textual status.
ASSERT_NO_THROW(rsp = service.processContinue());
// The server should have responded.
ASSERT_TRUE(rsp);
checkAnswer(rsp, CONTROL_RESULT_SUCCESS, "HA state machine is not paused.");
// The state machine should not be paused.
EXPECT_FALSE(service.isModelPaused());
// Response should include no arguments.
EXPECT_FALSE(rsp->get("arguments"));
}
/// @brief HA partner to the server under test.
///
/// This is a wrapper class around @c HttpListener which simulates a
......