Feature Flags
Overview
In a mixed version cluster (e.g. some versions are 3.11.x and some are 3.12.x) during an upgrade, some nodes will support a different set of features, behave differently in certain scenarios, and otherwise not act exactly the same: they are different versions after all.
Feature flags are a mechanism that controls what features are considered to be enabled or available on all cluster nodes. If a feature flag is enabled, so is its associated feature (or behavior). If not then all nodes in the cluster will disable the feature (behavior).
The feature flag subsystem allows RabbitMQ nodes with different versions to determine if they are compatible and then communicate together, despite having different versions and thus potentially having different feature sets or implementation details.
This subsystem was introduced to allow for rolling upgrades of cluster members without shutting down the entire cluster.
Feature flags are not meant to be used as a form of cluster configuration. After a successful rolling upgrade, users should enable all feature flags.
All feature flags become mandatory (graduate) at some point. For example, RabbitMQ 3.12 requires feature flags introduced in the 3.11 series to be enabled prior to the upgrade, RabbitMQ 3.11 graduates all 3.8 flags, and so on.
Quick summary (TL;DR)
Feature Flag Ground Rules
- A feature flag can be enabled only if all nodes in the cluster support it
- A node can join or re-join a cluster only if:
  - it supports all the feature flags enabled in the cluster, and
  - every other cluster member supports all the feature flags enabled on that node
- Once enabled, a feature flag cannot be disabled
For example, RabbitMQ 3.13.x and 3.12.x nodes are compatible as long as no 3.13.x-specific feature flags are enabled.
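The first and third ground rules can be modelled in a few lines of Python. This is only an illustrative sketch: the `Node` and `Cluster` classes are hypothetical and are not part of RabbitMQ, and the flag names loosely mirror Example 1 in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a node: a name plus the flags its version knows about."""
    name: str
    supported: frozenset


@dataclass
class Cluster:
    """Hypothetical model of a cluster and its set of enabled feature flags."""
    nodes: list
    enabled: set = field(default_factory=set)

    def can_enable(self, flag: str) -> bool:
        # Rule 1: every node in the cluster must support the flag.
        return all(flag in node.supported for node in self.nodes)

    def enable(self, flag: str) -> None:
        if not self.can_enable(flag):
            raise ValueError(f"{flag} is not supported by all cluster nodes")
        # Rule 3: there is deliberately no disable() counterpart.
        self.enabled.add(flag)


# Node A knows both flags, node B only knows one of them.
node_a = Node("A", frozenset({"coffee_maker", "juicer_machine"}))
node_b = Node("B", frozenset({"coffee_maker"}))
cluster = Cluster([node_a, node_b])

cluster.enable("coffee_maker")               # supported by both nodes
print(cluster.can_enable("juicer_machine"))  # False: node B does not support it
```

The model has no way to remove a flag from `enabled`, which reflects the rule that an enabled feature flag cannot be disabled.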
Key CLI Tool Commands
- To list feature flags:
rabbitmqctl list_feature_flags
- To enable a feature flag (or all currently disabled flags):
rabbitmqctl enable_feature_flag <all | name>
It is also possible to list and enable feature flags from the Management plugin UI, in "Admin > Feature flags".
Examples
Example 1: Compatible Nodes
- If nodes A and B are not clustered, they can be clustered.
- If nodes A and B are clustered:
- "Coffee maker" can be enabled.
- "Juicer machine" cannot be enabled because it is unsupported by node B.
Example 2: Incompatible Nodes
- If nodes A and B are not clustered, they cannot be clustered because "Juicer machine" is unsupported on node B.
- If nodes A and B are clustered and "Juicer machine" was enabled while node B was stopped, node B cannot re-join the cluster on restart.
Feature Flags and RabbitMQ Versions
As covered earlier, the feature flags subsystem's primary goal is to allow upgrades regardless of the version of cluster members, to the extent possible.
Feature flags make it possible to safely perform a rolling upgrade to the next patch or minor release, except if it is stated otherwise in the release notes. Indeed, there are some changes which cannot be implemented as feature flags.
However, note that only upgrading from one minor to the next minor or major is supported. To upgrade from e.g. 3.9.16 to 3.12.3, it is necessary to upgrade to 3.9.29 first, then to the latest 3.10 patch release, then the latest 3.11 release, then 3.12.3. After certain steps in the upgrade process it will also be necessary to enable all stable feature flags available in that version. For example, 3.12.0 is a release that requires all feature flags to be enabled before a node can be upgraded to it.
The same restriction applies if there are one or more minor release branches between the minor version in use and the next major release. Skipping them might work (i.e. there could be no incompatible changes between major releases), but this scenario is unsupported by design for the following reasons:
- Skipping minor versions is not tested in CI.
- Non-sequential releases may or may not support the same set of feature flags. Feature flags that have been present for several minor branches can be marked as required, at which point their associated feature or behavior is implicitly enabled by default. The compatibility code is removed in the process, preventing clustering with older nodes. Remember that the purpose of feature flags is to allow upgrades; they are not a configuration mechanism.
There is no policy defining the life cycle of a feature flag in general. E.g. there is no guarantee that a feature flag will go from "stable" to "required" after N minor releases. Because new code builds on top of existing code, feature flags are marked as required and the compatibility code is removed whenever it is needed.
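The "one minor at a time" rule from this section can be illustrated with a small Python sketch. The `upgrade_path` function and the `known_series` list are hypothetical helpers written for this example: the list of release series is supplied by the caller, and the sketch neither picks patch releases nor enables feature flags between steps.

```python
def upgrade_path(current: str, target: str, series: list) -> list:
    """Return every release series to pass through, in order, from current to target.

    `series` is the ordered list of release series, oldest first. It is supplied
    by the caller; nothing here is looked up from RabbitMQ itself.
    """
    start, end = series.index(current), series.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    return series[start:end + 1]


known_series = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
print(upgrade_path("3.9", "3.12", known_series))
# ['3.9', '3.10', '3.11', '3.12']
```

This matches the 3.9.16 to 3.12.3 example above: the latest 3.9 patch first, then 3.10, then 3.11, then 3.12, enabling stable feature flags where a step requires it.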
How to List Supported Feature Flags
When a node starts for the first time, all stable feature flags are enabled by default. When a node is upgraded to a newer version of RabbitMQ, new feature flags are left disabled.
To list the feature flags, use rabbitmqctl list_feature_flags:
rabbitmqctl list_feature_flags
# => Listing feature flags ...
# => name state
# => empty_basic_get_metric enabled
# => implicit_default_bindings enabled
# => quorum_queue enabled
For improved table readability, switch to the pretty_table formatter:
rabbitmqctl -q --formatter pretty_table list_feature_flags \
name state provided_by desc doc_url
which would produce a table that looks like this:
┌───────────────────────────┬─────────┬───────────────────────────┬───────┬────────────┐
│ name │ state │ provided_by │ desc │ doc_url │
├───────────────────────────┼─────────┼───────────────────────────┼───────┼────────────┤
│ empty_basic_get_metric │ enabled │ rabbitmq_management_agent │ (...) │ │
├───────────────────────────┼─────────┼───────────────────────────┼───────┼────────────┤
│ implicit_default_bindings │ enabled │ rabbit │ (...) │ │
├───────────────────────────┼─────────┼───────────────────────────┼───────┼────────────┤
│ quorum_queue │ enabled │ rabbit │ (...) │ http://... │
└───────────────────────────┴─────────┴───────────────────────────┴───────┴────────────┘
As shown in the example above, the list_feature_flags command accepts a list of columns to display. The available columns are:
- name: the name of the feature flag.
- state: enabled or disabled if the feature flag is enabled or disabled, unsupported if one or more nodes in the cluster do not know this feature flag (and therefore it cannot be enabled).
- provided_by: the RabbitMQ component or plugin which provides the feature flag.
- desc: the description of the feature flag.
- doc_url: the URL to a webpage to learn more about the feature flag.
- stability: indicates if the feature flag is required, stable or experimental.
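When the listing has to be checked by a script, for example to spot flags that are still disabled after an upgrade, the two-column output can be parsed with very little code. The Python sketch below assumes whitespace-separated `name state` rows like the ones shown earlier; `parse_feature_flags` is a hypothetical helper, and the sample text is hand-written rather than captured from a real node.

```python
def parse_feature_flags(output: str) -> dict:
    """Map each feature flag name to its state from `name state` rows."""
    flags = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines, the "Listing feature flags ..." banner and the header row.
        if len(parts) != 2 or parts == ["name", "state"]:
            continue
        name, state = parts
        flags[name] = state
    return flags


sample = """Listing feature flags ...
name state
empty_basic_get_metric enabled
implicit_default_bindings disabled
quorum_queue enabled
"""

flags = parse_feature_flags(sample)
not_enabled = sorted(name for name, state in flags.items() if state != "enabled")
print(not_enabled)  # ['implicit_default_bindings']
```

Requesting more columns changes the row layout, so this parser only applies to the default `name` and `state` columns.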
How to Enable Feature Flags
After upgrading one node or the entire cluster, it will be possible to enable new feature flags. Note that it will be impossible to roll back the version or add a cluster member using the old version once new feature flags are enabled.
To enable a feature flag, use:
rabbitmqctl enable_feature_flag <name>
To enable all stable feature flags, use:
rabbitmqctl enable_feature_flag all
The rabbitmqctl enable_feature_flag all command enables stable feature flags only and not experimental ones.
The list_feature_flags command can be used again to verify the feature flags' states. Assuming all feature flags were disabled initially, here is the state after enabling the quorum_queue feature flag:
rabbitmqctl -q --formatter pretty_table list_feature_flags
┌───────────────────────────┬──────────┐
│ name │ state │
├───────────────────────────┼──────────┤
│ empty_basic_get_metric │ disabled │
├───────────────────────────┼──────────┤
│ implicit_default_bindings │ disabled │
├───────────────────────────┼──────────┤
│ quorum_queue │ enabled │
└───────────────────────────┴──────────┘
It is also possible to list and enable feature flags from the Management Plugin UI, in "Admin > Feature flags".
How to Disable Feature Flags
It is impossible to disable a feature flag once it is enabled.
How to Override the List of Feature Flags to Enable on Initial Startup
By default a new and unclustered node will start with all stable feature flags enabled, but this setting can be overridden.
Since feature flags cannot be disabled once enabled, this mechanism is only safe to use for enabling more flags. Providing a list of flags identical to the currently enabled list is also safe, of course.
This mechanism is only useful to allow a user to expand an existing cluster with a node running a newer version of RabbitMQ compared to the rest of the cluster. The compatibility with the new node is still verified and adding it to the cluster may still fail if it is incompatible.
There are two ways to do this:
- Using the RABBITMQ_FEATURE_FLAGS environment variable:
# enables all feature flags in 4.0.2 except for khepri_db
RABBITMQ_FEATURE_FLAGS="delete_ra_cluster_mqtt_node,virtual_host_metadata,stream_single_active_consumer,quorum_queue,classic_mirrored_queue_version,rabbit_mqtt_qos0_queue,implicit_default_bindings,empty_basic_get_metric,'rabbitmq_4.0.0',message_containers,user_limits,queue_master_locator,detailed_queues_endpoint,stream_sac_coordinator_unblock_group,stream_update_config_command,stream_queue,stream_filtering,rabbit_exchange_type_local_random,quorum_queue_non_voters,tracking_records_in_ets,direct_exchange_routing_v2,amqp_address_v1,transient_nonexcl_queues,message_containers_deaths_v2,classic_queue_mirroring,management_metrics_collection,maintenance_mode_status,listener_records_in_ets,feature_flags_v2,global_qos,classic_queue_type_delivery_support,mqtt_v5,ram_node_type,drop_unroutable_metric,restart_streams"
- Using the forced_feature_flags_on_init setting in advanced.config:
{rabbit, [
%% enables all feature flags in 4.0.2 except for khepri_db
{forced_feature_flags_on_init, [
maintenance_mode_status,
direct_exchange_routing_v2,
user_limits,
transient_nonexcl_queues,
amqp_address_v1,stream_filtering,
implicit_default_bindings,
quorum_queue_non_voters,
'rabbitmq_4.0.0',
tracking_records_in_ets,
delete_ra_cluster_mqtt_node,
classic_queue_type_delivery_support,
restart_streams,
message_containers_deaths_v2,
feature_flags_v2,empty_basic_get_metric,
classic_queue_mirroring,
rabbit_exchange_type_local_random,
detailed_queues_endpoint,
stream_queue,
classic_mirrored_queue_version,
quorum_queue,
management_metrics_collection,
message_containers,
ram_node_type,
stream_sac_coordinator_unblock_group,
drop_unroutable_metric,
stream_single_active_consumer,
virtual_host_metadata,
listener_records_in_ets,
stream_update_config_command,
global_qos,
queue_master_locator,
rabbit_mqtt_qos0_queue,mqtt_v5
]}
]},
%% ...
The environment variable has precedence over the configuration parameter.
Required feature flags are always enabled, regardless of this setting.
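The precedence described in this section can be summarised with a simplified Python model. It is a sketch under stated assumptions, not RabbitMQ's implementation: `parse_flag_list` and `flags_on_first_start` are hypothetical names, and the `stable` and `required` sets in the usage example are arbitrary illustrative values.

```python
def parse_flag_list(value: str) -> set:
    """Split a comma-separated value, dropping surrounding single quotes."""
    return {item.strip().strip("'") for item in value.split(",") if item.strip()}


def flags_on_first_start(env_value, config_value, stable, required):
    """Return the flags enabled when a new, unclustered node first starts.

    env_value:    RABBITMQ_FEATURE_FLAGS as a string, or None if unset
    config_value: forced_feature_flags_on_init as a list, or None if unset
    """
    if env_value is not None:          # the environment variable wins
        chosen = parse_flag_list(env_value)
    elif config_value is not None:     # then the advanced.config setting
        chosen = set(config_value)
    else:                              # default: all stable feature flags
        chosen = set(stable)
    return chosen | set(required)      # required flags are always enabled


# Illustrative values only.
stable = {"quorum_queue", "stream_queue", "rabbitmq_4.0.0"}
required = {"feature_flags_v2"}

result = flags_on_first_start(
    "quorum_queue,'rabbitmq_4.0.0'", ["stream_queue"], stable, required
)
print(sorted(result))
# ['feature_flags_v2', 'quorum_queue', 'rabbitmq_4.0.0']
```

In this model an environment variable set to the empty string still takes precedence and leaves only the required flags enabled, while an unset variable (`None`) falls through to the configuration file and then to the default.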
Feature Flag Maturation and Graduation Process
After their initial introduction into RabbitMQ, feature flags are optional, that is, they only serve the purpose of allowing for a safe rolling cluster upgrade.
Over time, however, features become more mature and future development of RabbitMQ assumes that a certain set of features is available and can be relied on by the users and developers alike. When that happens, feature flags graduate to core (required) features in the next minor feature release.
It is very important to enable all feature flags after performing a rolling cluster upgrade: in the future these flags will become mandatory, and proactively enabling them will allow for a smoother upgrade experience in the future.
List of Feature Flags
The feature flags listed below are provided by RabbitMQ core or one of the tier-1 plugins bundled with RabbitMQ.
Column Required shows the RabbitMQ version before which a feature flag MUST have been enabled. For example, if a feature flag is required in 3.12.0, this feature flag must be enabled in 3.11.x (or earlier) before upgrading to 3.12.x. Otherwise, if a RabbitMQ node is upgraded to 3.12.x while this feature flag is disabled, the RabbitMQ node will refuse to start in 3.12.x.
Column Stable shows the RabbitMQ version that introduced a feature flag. For example, if a feature flag is stable in 3.11.0, that feature flag SHOULD be enabled promptly after upgrading all nodes in a RabbitMQ cluster to version 3.11.x.
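The meaning of the Required column can be turned into a simple pre-upgrade check. The Python sketch below is illustrative: `version_tuple` and `blocking_flags` are hypothetical helpers, and the `required_in` mapping copies a few "Required" values from the tables in this guide.

```python
def version_tuple(version: str) -> tuple:
    """Turn '3.12.0' into (3, 12, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def blocking_flags(target: str, required_in: dict, enabled: set) -> list:
    """List flags that are required by `target` but not yet enabled."""
    return sorted(
        name
        for name, required_version in required_in.items()
        if version_tuple(required_version) <= version_tuple(target)
        and name not in enabled
    )


# A few "Required" values copied from the tables in this guide.
required_in = {
    "quorum_queue": "3.11.0",
    "stream_queue": "3.12.0",
    "feature_flags_v2": "3.12.0",
    "stream_filtering": "4.0.0",
}
enabled = {"quorum_queue", "stream_queue"}

print(blocking_flags("3.12.0", required_in, enabled))  # ['feature_flags_v2']
```

A non-empty result means the node would refuse to start on the target version until those flags are enabled on the current one.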
Core Feature Flags
The following feature flags are provided by RabbitMQ core.
Most feature flags listed below have very brief descriptions. This is because most feature flags only exist to avoid potentially unsafe operations in mixed-version clusters, correct a behavior that must be consistent across all cluster nodes, and so on.
khepri_db is one exception because of its scope.
Required | Stable | Feature flag name | Description |
---|---|---|---|
| | 4.0 | rabbitmq_4.0.0 | Enables multiple features and changes introduced in RabbitMQ 4.0. RabbitMQ 4.0 uses a single flag to control multiple features and changes. If you upgrade to RabbitMQ 4.0, it will be running in backwards-compatible mode until this feature flag is enabled. For example, new quorum queue features and the new AMQP-1.0 flow control mechanism will not be available. |
| | 4.0 | khepri_db | Enables Khepri, a Raft-based schema data store with vastly superior (namely more predictable) node and network failure recovery characteristics compared to Mnesia. Note: Khepri is fully supported (just like Mnesia) starting with RabbitMQ 4.0. This feature flag must be explicitly enabled (opt-in) due to its scope. Important: Due to extensive Khepri schema changes in RabbitMQ 4.0, 3.13.x clusters that have Khepri enabled won't be upgradeable in-place to 4.0. Such clusters should use Blue-Green deployment upgrade strategy. Make sure to first test Khepri with appropriate workloads in non-production environments before adopting it in production. |
| | 4.0 | rabbit_exchange_type_local_random | Used by the Local Random Exchange. |
| | 3.13.1 | quorum_queue_non_voters | Support for the non-voter quorum queue replica state |
| | 3.13.0 | message_containers | Enables a new AMQP 1.0-based message format used internally |
| | 3.13.0 | detailed_queues_endpoint | Introduces the |
4.0.0 | 3.13.0 | stream_filtering | Stream filtering support |
4.0.0 | 3.13.0 | stream_update_config_command | Removes |
4.0.0 | 3.12.0 | restart_streams | Support for restarting streams with optional preferred next leader argument. Used to implement stream leader rebalancing |
4.0.0 | 3.12.0 | stream_sac_coordinator_unblock_group | Bug fix to unblock a group of consumers in a super stream partition |
3.12.0 | 3.11.0 | direct_exchange_routing_v2 | v2 direct exchange routing implementation |
3.12.0 | 3.11.0 | feature_flags_v2 | Feature flags subsystem v2 |
3.12.0 | 3.11.0 | listener_records_in_ets | Store listener records in ETS instead of Mnesia |
3.12.0 | 3.11.0 | stream_single_active_consumer | Single active consumer for streams |
3.12.0 | 3.11.0 | tracking_records_in_ets | Store tracking records in ETS instead of Mnesia |
3.12.0 | 3.10.9 | classic_queue_type_delivery_support | Bug fix for classic queue deliveries using mixed versions |
3.12.0 | 3.9.0 | stream_queue | Support queues of type stream |
3.11.0 | 3.8.10 | user_limits | Configure connection and channel limits for a user |
3.11.0 | 3.8.8 | maintenance_mode_status | Maintenance mode status |
3.11.0 | 3.8.0 | implicit_default_bindings | Default bindings are now implicit, instead of being stored in the database |
3.11.0 | 3.8.0 | quorum_queue | Support queues of type quorum |
3.11.0 | 3.8.0 | virtual_host_metadata | Virtual host metadata (description, tags, etc.) |
rabbitmq_management_agent Feature Flags
The following feature flags are provided by the rabbitmq_management_agent plugin.
Required | Stable | Feature flag name | Description |
---|---|---|---|
3.12.0 | 3.8.10 | drop_unroutable_metric | Count unroutable publishes to be dropped in stats |
3.12.0 | 3.8.10 | empty_basic_get_metric | Count AMQP basic.get on empty queues in stats |
rabbitmq_mqtt Feature Flags
The following feature flags are provided by the rabbitmq_mqtt plugin.
Required | Stable | Feature flag name | Description |
---|---|---|---|
| | 3.13.0 | mqtt_v5 | Support MQTT 5.0 |
| | 3.12.0 | delete_ra_cluster_mqtt_node | Delete Ra cluster mqtt_node since MQTT client IDs are tracked locally |
| | 3.12.0 | rabbit_mqtt_qos0_queue | Support pseudo queue type for MQTT QoS 0 subscribers omitting a queue process |
How Do Feature Flags Work?
From an Operator Point of View
Node and Version Compatibility
There are two times when an operator has to consider feature flags:
- When extending an existing cluster by adding nodes using a different version of RabbitMQ (older or newer), the operator needs to pay attention to feature flags: they might prevent clustering.
- After upgrading a cluster, the operator should take a look at the new feature flags and perhaps enable them.
A node compares its own list of feature flags with remote nodes' list of feature flags to determine if it can join a cluster. The rules are defined as:
- All feature flags enabled locally must be supported remotely.
- All feature flags enabled remotely must be supported locally.
It is important to understand the difference between enabled and supported:
- A supported feature flag is one which is known by the node. It can be enabled or disabled, but its state is irrelevant at this point.
- An enabled feature flag is one which is activated and used by the node. Per the definition above, it is implicitly a supported feature flag.
If one of those two conditions is not verified, the node cannot join or re-join the cluster.
However, if it can join the cluster, the state of enabled feature flags is synchronized between nodes: if a feature flag is enabled on one node, it is enabled on all other nodes.
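The two compatibility rules and the synchronization step can be expressed with plain set operations. The following Python sketch is only a model of the rules stated above; `can_join` and `synchronized_state` are hypothetical names, and the flag sets are illustrative.

```python
def can_join(local_supported, local_enabled, remote_supported, remote_enabled):
    """Apply both compatibility rules; each argument is a set of flag names."""
    enabled_locally_ok = local_enabled <= remote_supported
    enabled_remotely_ok = remote_enabled <= local_supported
    return enabled_locally_ok and enabled_remotely_ok


def synchronized_state(local_enabled, remote_enabled):
    """After a successful join, a flag enabled on one side is enabled on both."""
    return local_enabled | remote_enabled


# A newer node knows one more flag than the existing cluster.
new_supported = {"quorum_queue", "stream_queue", "stream_filtering"}
new_enabled = {"quorum_queue"}
cluster_supported = {"quorum_queue", "stream_queue"}
cluster_enabled = {"quorum_queue", "stream_queue"}

print(can_join(new_supported, new_enabled, cluster_supported, cluster_enabled))
# True
print(sorted(synchronized_state(new_enabled, cluster_enabled)))
# ['quorum_queue', 'stream_queue']
print(can_join(new_supported, {"stream_filtering"}, cluster_supported, cluster_enabled))
# False: the cluster does not support a flag enabled on the new node
```

Note that only enabled flags matter for the check: the newer node supports `stream_filtering`, but it can still join as long as that flag is not enabled on it.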
Scope of the Feature Flags
The feature flags subsystem covers inter-node communication only. This means the following scenarios are not covered and may not work as initially expected.
Using rabbitmqctl on a remote node
Controlling a remote node with rabbitmqctl is only supported if the remote node is running the same version of RabbitMQ as rabbitmqctl.
If CLI tools from a different minor/major version of RabbitMQ are used on a remote node, they may fail to work as expected or even have unexpected side effects on the node.
Load-balancing Requests to the HTTP API
If a request sent to the HTTP API exposed by the Management plugin goes through a load balancer, including one from the management plugin UI, the API's behavior and its response may be different, depending on the version of the node which handled the request. The same applies if the domain name of the HTTP API resolves to multiple IP addresses.
This situation may happen during a rolling upgrade if the management UI is open in a browser with periodic automatic refresh.
For example, if the management UI was loaded from a RabbitMQ 3.11.x node but it then queries a RabbitMQ 3.12.x node, the JavaScript code running in the browser may fail with exceptions due to HTTP API changes.
What Happens When a Feature Flag is Enabled
When a feature flag is enabled with rabbitmqctl, here is what happens internally:
- RabbitMQ verifies if the feature flag is already enabled. If yes, it stops.
- It verifies if the feature flag is supported. If no, it stops.
- It marks the feature flag state as state_changing. This is an internal transitional state to inform consumers of this feature flag. Most of the time, it means that components depending on this particular feature flag will be blocked until the state changes to enabled or disabled.
- It enables all feature flags this one depends on. Therefore for each one of them, we go through this same procedure.
- It executes the migration function, if there is one. This function is responsible for preparing or converting various resources, such as changing the schema of a database.
- If all the steps above succeed, the feature flag state becomes enabled. Otherwise, it is reverted back to disabled.
As an operator, the most important part of this procedure to remember is that if the migration takes time, some components and thus some operations in RabbitMQ might be blocked during the migration.
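The procedure above can be pictured as a small state machine. The Python sketch below is a simplified model, not the actual Erlang implementation: the `enable` function, the dictionary layout and the flag names `base`, `derived` and `broken` are all hypothetical, and dependency cycles are not handled.

```python
def enable(flag, flags, supported):
    """Hypothetical model of the enable procedure.

    `flags` maps a flag name to a dict with the keys "state", "depends_on"
    and "migration_fun" (a callable returning True on success, or None).
    """
    props = flags[flag]
    if props["state"] == "enabled":            # step 1: already enabled
        return True
    if flag not in supported:                  # step 2: not supported
        return False
    props["state"] = "state_changing"          # step 3: transitional state
    # Step 4: enable every dependency through the same procedure.
    ok = all(enable(dep, flags, supported) for dep in props["depends_on"])
    migration = props["migration_fun"]
    if ok and migration is not None:           # step 5: run the migration
        ok = migration()
    # Step 6: commit on success, revert to disabled otherwise.
    props["state"] = "enabled" if ok else "disabled"
    return ok


flags = {
    "base": {"state": "disabled", "depends_on": [], "migration_fun": None},
    "derived": {"state": "disabled", "depends_on": ["base"],
                "migration_fun": lambda: True},
    "broken": {"state": "disabled", "depends_on": [],
               "migration_fun": lambda: False},
}
supported = set(flags)

print(enable("derived", flags, supported))  # True
print(flags["base"]["state"])               # enabled
print(enable("broken", flags, supported))   # False
print(flags["broken"]["state"])             # disabled
```

While a flag is in the `state_changing` state, code that depends on it would have to wait, which is why a slow migration function can block some operations.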
From a Developer Point of View
When working on a plugin or a RabbitMQ core contribution, feature flags should be used to make the new version of the code compatible with older versions of RabbitMQ.
When to Use a Feature Flag
It is the developer's responsibility to look at the list of existing and future (i.e. those added to the main branch) feature flags and see if the new code can be adapted to take advantage of them.
Here is an example. When developing a plugin which used to use the #amqqueue{} record defined in rabbit_common/include/rabbit.hrl, the plugin has to be adapted to use the new amqqueue API which hides the previous record (which is private now). However, there is no need to query feature flags for that: the plugin will be ABI-compatible (i.e. no need to recompile it) with RabbitMQ 3.8.0 and later. It should also be ABI-compatible with RabbitMQ 3.7.x once the amqqueue appears in that branch.
However if the plugin targets quorum queues introduced in RabbitMQ 3.8.0, it may have to query feature flags to determine what it can do. For instance, can it declare a quorum queue? Can it even expect the new fields added to amqqueue as part of the quorum queues implementation?
If the plugin carefully checks feature flags to avoid any incorrect expectations, it will be compatible with many versions of RabbitMQ: the user will not have to recompile anything or download another version-specific copy of the plugin.
When to Declare a Feature Flag
If a plugin or core broker change modifies one of the following aspects:
- record definitions
- replicated database schemas
- the format of Erlang messages passed between nodes
- modules and functions called from remote nodes
Then compatibility with older versions of RabbitMQ becomes a concern. This is where a new feature flag can help ensure a smoother upgrade experience.
The two most important parts of a feature flag are:
- the declaration as a module attribute
- the migration function
The declaration is a module attribute which looks like this:
-rabbit_feature_flag(
   {quorum_queue,
    #{desc          => "Support queues of type quorum",
      doc_url       => "https://www.rabbitmq.com/docs/quorum-queues",
      stability     => stable,
      migration_fun => {?MODULE, quorum_queue_migration}
     }}).
The migration function is a stateless function which looks like this:
quorum_queue_migration(FeatureName, _FeatureProps, enable) ->
    Tables = ?quorum_queue_tables,
    rabbit_table:wait(Tables),
    Fields = amqqueue:fields(amqqueue_v2),
    migrate_to_amqqueue_with_type(FeatureName, Tables, Fields);
quorum_queue_migration(_FeatureName, _FeatureProps, is_enabled) ->
    Tables = ?quorum_queue_tables,
    rabbit_table:wait(Tables),
    Fields = amqqueue:fields(amqqueue_v2),
    mnesia:table_info(rabbit_queue, attributes) =:= Fields andalso
    mnesia:table_info(rabbit_durable_queue, attributes) =:= Fields.
More implementation docs can be found in the rabbit_feature_flags module source code.
Erlang's edoc reference can be generated locally from a RabbitMQ repository clone or source archive:
gmake edoc
# => ... Ignore warnings and errors...
# Now open `doc/rabbit_feature_flags.html` in the browser.
How to Adapt and Run Testsuites with Mixed-Version Clusters
When a feature or behavior depends on a feature flag (either in the core broker or in a plugin), the associated testsuites must be adapted to take this feature flag into account. It means that before running the actual testcase, the setup code must verify if the feature flag is supported and either enable it if it is, or skip the testcase. This is the same for setup code running at the group or suite level.
There are helper functions in rabbitmq-ct-helpers to ease that check.
Here is an example, taken from the dynamic_qq_SUITE.erl testsuite in rabbitmq-server:
init_per_testcase(Testcase, Config) ->
    % (...)
    % 1.
    % The broker or cluster is started: we rely on this to query feature
    % flags.
    Config1 = rabbit_ct_helpers:run_steps(
                Config,
                rabbit_ct_broker_helpers:setup_steps() ++
                rabbit_ct_client_helpers:setup_steps()),
    % 2.
    % We try to enable the `quorum_queue` feature flag. The helper is
    % responsible for checking if the feature flag is supported and
    % enabling it.
    case rabbit_ct_broker_helpers:enable_feature_flag(Config1, quorum_queue) of
        ok ->
            % The feature flag is enabled at this point. The setup can
            % continue to play with `Config1` and the cluster.
            Config1;
        Skip ->
            % The feature flag is unavailable/unsupported. The setup
            % calls `end_per_testcase()` to stop the node/cluster and
            % skips the testcase.
            end_per_testcase(Testcase, Config1),
            Skip
    end.
It is possible to run testsuites locally in the context of a mixed-version cluster. If configured to do so, rabbitmq-ct-helpers will use a second version of RabbitMQ to start half of the nodes when starting a cluster:
- Node 1 will be on the primary copy (the one used to start the testsuite)
- Node 2 will be on the secondary copy (the one provided explicitly to rabbitmq-ct-helpers)
- Node 3 will be on the primary copy
- Node 4 will be on the secondary copy
- ...
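The alternation in the list above amounts to a parity check on the node number. As a minimal Python illustration (`copy_for_node` is a hypothetical helper, not a function from rabbitmq-ct-helpers):

```python
def copy_for_node(index: int) -> str:
    """Return which RabbitMQ copy node number `index` (1-based) runs on."""
    return "primary" if index % 2 == 1 else "secondary"


print([copy_for_node(i) for i in range(1, 5)])
# ['primary', 'secondary', 'primary', 'secondary']
```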
To run a testsuite in the context of a mixed-version cluster:
- Clone the rabbitmq-public-umbrella repository and check out the appropriate branch or tag. This will be the secondary Umbrella. In this example, the v3.12.x branch is used:
git clone https://github.com/rabbitmq/rabbitmq-server.git secondary-umbrella
cd secondary-umbrella
git checkout v3.12.x
make co
- Compile RabbitMQ or the plugin being tested in the secondary Umbrella. The rabbitmq-federation plugin is used as an example:
cd secondary-umbrella/deps/rabbitmq_federation
make dist
- Go to RabbitMQ or the same plugin in the primary copy:
cd /path/to/primary/rabbitmq_federation
- Run the testsuite. Here, two environment variables are specified to configure the "mixed-version cluster" mode:
SECONDARY_UMBRELLA=/path/to/secondary-umbrella \
RABBITMQ_FEATURE_FLAGS= \
make tests
The first environment variable, SECONDARY_UMBRELLA, tells rabbitmq-ct-helpers where to find the secondary Umbrella, as the name suggests. This is how the mixed-version cluster mode is enabled.
The second environment variable, RABBITMQ_FEATURE_FLAGS, is set to the empty string and tells RabbitMQ to start with all feature flags disabled: this is mandatory to have a newer node compatible with an older one.