Package org.apache.cassandra.net
Class MessagingService
- java.lang.Object
  - org.apache.cassandra.net.MessagingServiceMBeanImpl
    - org.apache.cassandra.net.MessagingService
- All Implemented Interfaces: MessageDelivery, MessagingServiceMBean
public class MessagingService extends MessagingServiceMBeanImpl implements MessageDelivery
MessagingService implements all internode communication, with the exception of SSTable streaming (for now). Specifically, it is responsible for dispatching outbound messages to other nodes and routing inbound messages to their appropriate IVerbHandler.
Using MessagingService: sending requests and responses
There are two ways to send a Message, and you should pick one depending on the desired behaviour:
1. To send a request that expects a response back, use the sendWithCallback(Message, InetAddressAndPort, RequestCallback) method. Once a success response is received, RequestCallback.onResponse(Message) will be invoked on the provided callback. In case of a failure response (see Verb.FAILURE_RSP), or if a response doesn't arrive within the verb's configured expiry time, RequestCallback.onFailure(InetAddressAndPort, RequestFailureReason) will be invoked instead.
2. To send a response back, or a message that expects no response, use the send(Message, InetAddressAndPort) method.
See also: Message.out(Verb, Object), Message.responseWith(Object), and Message.failureResponse(RequestFailureReason).
Using MessagingService: handling a request
As described in the previous section, to handle responses you only need to implement the RequestCallback interface, so long as your response verb handler is the default ResponseVerbHandler. There are two steps you need to perform to implement request handling:
1. Create an IVerbHandler to process incoming requests and responses for the new type (if applicable).
2. Add a new Verb to the enum for the new request type, and, if applicable, one for the response message.
MessagingService will now automatically invoke your handler whenever a Message with this verb arrives.
Architecture of MessagingService
QOS
Since our messaging protocol is TCP-based, and doesn't yet support interleaving messages with each other, we need a way to prevent head-of-line blocking from adversely affecting all messages, in particular large messages getting in the way of smaller ones. To achieve that (somewhat), we maintain three messaging connections to and from each peer:
- one for large messages, defined as being larger than OutboundConnections.LARGE_MESSAGE_THRESHOLD (65KiB by default)
- one for small messages, defined as smaller than that threshold
- and finally, a connection for urgent messages, which are usually small and/or important to arrive promptly, e.g. gossip-related ones
Wire format and framing
Small messages are grouped together into frames, and large messages are split over multiple frames. Framing provides application-level integrity protection to otherwise raw streams of data: we use CRC24 for frame headers and CRC32 for the entire payload. LZ4 is optionally used for compression. You can find the on-wire format description of individual messages in the comments for Message.Serializer, along with format evolution notes. For the list and descriptions of available frame decoders, see the FrameDecoder comments. The wire format is documented in the javadoc of FrameDecoder implementations: see FrameDecoderCrc and FrameDecoderLZ4 in particular.
Architecture of outbound messaging
OutboundConnection is the core class implementing outbound connection logic, with OutboundConnection.enqueue(Message) being its main entry point. The connections are initiated by OutboundConnectionInitiator. The Netty pipeline for outbound messaging connections generally consists of the following handlers:
[(optional) SslHandler] <- [FrameEncoder]
OutboundConnection handles the entire lifetime of a connection: from the very first handshake to any necessary reconnects. Message-delivery flow varies depending on the connection type. For ConnectionType.SMALL_MESSAGES and ConnectionType.URGENT_MESSAGES, Message serialization and delivery occur directly on the event loop. See OutboundConnection.EventLoopDelivery for details. For ConnectionType.LARGE_MESSAGES, to ensure that servicing large messages doesn't block timely service of other requests, message serialization is offloaded to a companion thread pool (SocketFactory.synchronousWorkExecutor). Most of the work will be performed by AsyncChannelOutputPlus. Please see OutboundConnection.LargeMessageDelivery for details.
To prevent fast clients, or slow nodes on the other end of the connection, from overwhelming a host with enqueued, unsent messages on heap, we impose strict limits on how much memory enqueued, undelivered messages can claim. Every individual connection gets an exclusive permit quota to use, 4MiB by default; every endpoint (group of large, small, and urgent connections) is capped by default at 128MiB of undelivered messages, and a global limit of 512MiB is imposed on all endpoints combined. On an attempt to OutboundConnection.enqueue(Message), the connection will attempt to allocate permits for message-size number of bytes from its exclusive quota; if successful, it will add the message to the queue; if unsuccessful, it will need to allocate the remainder from both endpoint and global reserves, and if it fails to do so, the message will be rejected, and its callbacks, if any, immediately expired. For a more detailed description please see the docs and comments of OutboundConnection.
Architecture of inbound messaging
InboundMessageHandler is the core class implementing inbound connection logic, paired with FrameDecoder. Inbound connections are initiated by InboundConnectionInitiator. The primary entry points to these classes are FrameDecoder.channelRead(ShareableBytes) and AbstractMessageHandler.process(FrameDecoder.Frame). The Netty pipeline for inbound messaging connections generally consists of the following handlers:
[(optional) SslHandler] -> [FrameDecoder] -> [InboundMessageHandler]
FrameDecoder is responsible for decoding incoming frames and work stashing; InboundMessageHandler then takes decoded frames from the decoder and processes the messages contained in them. The flow differs between small and large messages. Small ones are deserialized immediately, and only then scheduled for execution on the right thread pool for the Verb. Large messages, on the other hand, aren't deserialized until they are just about to be executed on the appropriate Stage.
Similarly to outbound handling, inbound messaging imposes strict memory utilisation limits on individual endpoints and on global aggregate consumption, and implements simple flow control, to prevent a single fast endpoint from overwhelming a host. Every individual connection gets an exclusive permit quota to use, 4MiB by default; every endpoint (group of large, small, and urgent connections) is capped by default at 128MiB of unprocessed messages, and a global limit of 512MiB is imposed on all endpoints combined. On arrival of a message header, the handler will attempt to allocate permits for message-size number of bytes from its exclusive quota; if successful, it will proceed to deserializing and processing the message. If unsuccessful, the handler will attempt to allocate the remainder from its endpoint and global reserves; if either allocation is unsuccessful, the handler will cease any further frame processing, and tell FrameDecoder to stop reading from the network; subsequently, it will put itself on a special AbstractMessageHandler.WaitQueue, to be reactivated once more permits become available. For a more detailed description please see the docs and comments of InboundMessageHandler and FrameDecoder.
Observability
MessagingService exposes diagnostic counters for both outbound and inbound directions: received and sent bytes and message counts, overload bytes and message counts, error bytes and error counts, and many more. See InternodeInboundMetrics and InternodeOutboundMetrics for JMX-exposed counters. We also provide the system_views.internode_inbound and system_views.internode_outbound virtual tables, implemented in InternodeInboundTable and InternodeOutboundTable respectively.
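The three-tier permit scheme described in the outbound and inbound architecture sections (exclusive per-connection quota first, then the remainder from both the endpoint and global reserves) can be illustrated with a small, self-contained sketch. This is not the actual Cassandra implementation: the PermitSketch and Limit names are hypothetical, and only the 4MiB, 128MiB and 512MiB defaults are taken from the text above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch only; PermitSketch and Limit are hypothetical names. */
final class PermitSketch {
    /** A byte budget that callers claim from and release back to. */
    static final class Limit {
        private final long capacity;
        private final AtomicLong used = new AtomicLong();

        Limit(long capacity) { this.capacity = capacity; }

        /** Claims as many of the requested bytes as fit; returns the number claimed. */
        long allocateUpTo(long bytes) {
            while (true) {
                long current = used.get();
                long granted = Math.min(bytes, capacity - current);
                if (granted <= 0) return 0;
                if (used.compareAndSet(current, current + granted)) return granted;
            }
        }

        /** Claims all of the requested bytes, or none of them. */
        boolean tryAllocate(long bytes) {
            while (true) {
                long current = used.get();
                if (current + bytes > capacity) return false;
                if (used.compareAndSet(current, current + bytes)) return true;
            }
        }

        void release(long bytes) { used.addAndGet(-bytes); }

        long used() { return used.get(); }
    }

    /**
     * Tries to reserve messageSize bytes: first from the connection's exclusive
     * quota, then the remainder from BOTH the endpoint and the global reserve.
     * On failure every partial claim is rolled back and false is returned.
     */
    static boolean tryReserve(long messageSize, Limit exclusive, Limit endpoint, Limit global) {
        long fromExclusive = exclusive.allocateUpTo(messageSize);
        long remainder = messageSize - fromExclusive;
        if (remainder == 0) return true;
        if (!endpoint.tryAllocate(remainder)) {
            exclusive.release(fromExclusive);
            return false;
        }
        if (!global.tryAllocate(remainder)) {
            endpoint.release(remainder);
            exclusive.release(fromExclusive);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long MiB = 1024L * 1024L;
        Limit exclusive = new Limit(4 * MiB);   // per-connection default from the text
        Limit endpoint = new Limit(128 * MiB);  // per-endpoint default from the text
        Limit global = new Limit(512 * MiB);    // global default from the text
        // A 6MiB message takes 4MiB from the exclusive quota and 2MiB from each reserve.
        System.out.println(tryReserve(6 * MiB, exclusive, endpoint, global));
    }
}
```

In this sketch a rejected reservation leaves all three budgets unchanged, which corresponds to the message being rejected on the outbound side or the handler pausing on the inbound side.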
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class MessagingService.FailureResponseException
static class MessagingService.Version
-
Field Summary
Fields
Modifier and Type    Field    Description
RequestCallbacks callbacks
static int current_version
InboundSink inboundSink
LatencySubscribers latencySubscribers
static int maximum_version
static int minimum_version
OutboundSink outboundSink
SocketFactory socketFactory
static int VERSION_30
    Deprecated. See CASSANDRA-18816
static int VERSION_3014
    Deprecated. See CASSANDRA-18816
static int VERSION_40
static int VERSION_50
-
Fields inherited from class org.apache.cassandra.net.MessagingServiceMBeanImpl
channelManagers, MBEAN_NAME, messageHandlers, metrics, versions
-
-
Method Summary
Modifier and Type    Method    Description
void closeOutbound(InetAddressAndPort to)
    Only to be invoked once we believe the endpoint will never be contacted again.
static int getVersionOrdinal(int version)
    This is an optimisation to speed up the translation of the serialization version to the MessagingService.Version enum ordinal.
static MessagingService instance()
void interruptOutbound(InetAddressAndPort to)
    Closes any current open channel/connection to the endpoint, but does not cause any message loss, and we will try to re-establish connections immediately.
void listen()
io.netty.util.concurrent.Future<java.lang.Void> maybeReconnectWithNewIp(InetAddressAndPort address, InetAddressAndPort preferredAddress)
    Reconnect to the peer using the given addr.
void removeInbound(InetAddressAndPort from)
    Only to be invoked once we believe the connections will never be used again.
<V> void respond(V response, Message<?> message)
    Send a message to a given endpoint.
void send(Message message, InetAddressAndPort to)
    Send a message to a given endpoint.
void send(Message message, InetAddressAndPort to, ConnectionType specifyConnection)
void sendWithCallback(Message message, InetAddressAndPort to, RequestCallback cb)
    Send a non-mutation message to a given endpoint.
void sendWithCallback(Message message, InetAddressAndPort to, RequestCallback cb, ConnectionType specifyConnection)
<REQ,RSP> Future<Message<RSP>> sendWithResult(Message<REQ> message, InetAddressAndPort to)
void sendWriteWithCallback(Message message, Replica to, AbstractWriteResponseHandler<?> handler)
    Send a mutation message or a Paxos Commit to a given endpoint.
void shutdown()
    Wait for callbacks and don't allow any more to be created (since they could require writing hints)
void shutdown(long timeout, java.util.concurrent.TimeUnit units, boolean shutdownGracefully, boolean shutdownExecutors)
void shutdownAbrubtly()
void waitUntilListening()
-
Methods inherited from class org.apache.cassandra.net.MessagingServiceMBeanImpl
getBackPressurePerHost, getDroppedMessages, getGossipMessageCompletedTasks, getGossipMessageCompletedTasksWithPort, getGossipMessageDroppedTasks, getGossipMessageDroppedTasksWithPort, getGossipMessagePendingTasks, getGossipMessagePendingTasksWithPort, getLargeMessageCompletedTasks, getLargeMessageCompletedTasksWithPort, getLargeMessageDroppedTasks, getLargeMessageDroppedTasksWithPort, getLargeMessagePendingTasks, getLargeMessagePendingTasksWithPort, getSmallMessageCompletedTasks, getSmallMessageCompletedTasksWithPort, getSmallMessageDroppedTasks, getSmallMessageDroppedTasksWithPort, getSmallMessagePendingTasks, getSmallMessagePendingTasksWithPort, getTimeoutsPerHost, getTimeoutsPerHostWithPort, getTotalTimeouts, getVersion, isBackPressureEnabled, reloadSslCertificates, setBackPressureEnabled
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.cassandra.net.MessageDelivery
respondWithFailure
-
-
-
-
Field Detail
-
VERSION_30
@Deprecated(since="5.0") public static final int VERSION_30
Deprecated. See CASSANDRA-18816
- See Also:
- Constant Field Values
-
VERSION_3014
@Deprecated(since="5.0") public static final int VERSION_3014
Deprecated. See CASSANDRA-18816
- See Also:
- Constant Field Values
-
VERSION_40
public static final int VERSION_40
- See Also:
- Constant Field Values
-
VERSION_50
public static final int VERSION_50
- See Also:
- Constant Field Values
-
minimum_version
public static final int minimum_version
- See Also:
- Constant Field Values
-
maximum_version
public static final int maximum_version
- See Also:
- Constant Field Values
-
current_version
public static final int current_version
-
socketFactory
public final SocketFactory socketFactory
-
latencySubscribers
public final LatencySubscribers latencySubscribers
-
callbacks
public final RequestCallbacks callbacks
-
inboundSink
public final InboundSink inboundSink
-
outboundSink
public final OutboundSink outboundSink
-
-
Method Detail
-
getVersionOrdinal
public static int getVersionOrdinal(int version)
This is an optimisation to speed up the translation of the serialization version to the MessagingService.Version enum ordinal.
- Parameters:
version - the serialization version
- Returns:
a MessagingService.Version ordinal value
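One way such a translation could be made fast is a precomputed lookup table indexed by the version number. The sketch below shows that idea under stated assumptions; it is not the actual implementation. The Version enum here is a hypothetical stand-in for MessagingService.Version, and the numeric version values are invented for illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and numeric values are hypothetical. */
final class VersionOrdinalSketch {
    /** Hypothetical stand-in for MessagingService.Version. */
    enum Version {
        V30(10), V3014(11), V40(12), V50(13);

        final int value; // serialization version number (assumed values)

        Version(int value) { this.value = value; }
    }

    private static final int MIN_VALUE;
    private static final int[] ORDINAL_BY_VALUE; // -1 marks an unknown version

    static {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Version v : Version.values()) {
            min = Math.min(min, v.value);
            max = Math.max(max, v.value);
        }
        MIN_VALUE = min;
        ORDINAL_BY_VALUE = new int[max - min + 1];
        Arrays.fill(ORDINAL_BY_VALUE, -1);
        for (Version v : Version.values())
            ORDINAL_BY_VALUE[v.value - min] = v.ordinal();
    }

    /** Constant-time array lookup instead of scanning Version.values() on every call. */
    static int getVersionOrdinal(int version) {
        int index = version - MIN_VALUE;
        if (index < 0 || index >= ORDINAL_BY_VALUE.length || ORDINAL_BY_VALUE[index] < 0)
            throw new IllegalArgumentException("Unknown serialization version: " + version);
        return ORDINAL_BY_VALUE[index];
    }

    public static void main(String[] args) {
        // Looks up the ordinal of the (assumed) version number 12.
        System.out.println(getVersionOrdinal(12));
    }
}
```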
-
instance
public static MessagingService instance()
-
sendWithResult
public <REQ,RSP> Future<Message<RSP>> sendWithResult(Message<REQ> message, InetAddressAndPort to)
- Specified by: sendWithResult in interface MessageDelivery
-
sendWithCallback
public void sendWithCallback(Message message, InetAddressAndPort to, RequestCallback cb)
Send a non-mutation message to a given endpoint. This method specifies a callback which is invoked with the actual response.
- Specified by: sendWithCallback in interface MessageDelivery
- Parameters:
message - message to be sent.
to - endpoint to which the message needs to be sent
cb - callback interface which is used to pass the responses or suggest that a timeout occurred to the invoker of the send().
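The request/callback contract can be modelled in a few lines of plain Java: a callback is registered per request and is completed exactly once, either by a success response, a failure response, or expiry. The sketch below is a simplified model, not Cassandra code. Callback, FailureReason, Recording and the String-based payloads and addresses are hypothetical stand-ins for RequestCallback, RequestFailureReason, Message and InetAddressAndPort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch only; every type here is a simplified, hypothetical stand-in. */
final class CallbackSketch {
    enum FailureReason { TIMEOUT, UNKNOWN }

    interface Callback {
        void onResponse(String payload);
        void onFailure(String from, FailureReason reason);
    }

    private final Map<Long, Callback> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Registers the callback under a fresh id; a real sender would then enqueue the message. */
    long sendWithCallback(String payload, String to, Callback cb) {
        long id = nextId.incrementAndGet();
        pending.put(id, cb);
        return id;
    }

    /** A success response arrived: complete the callback if it is still pending. */
    void onSuccessResponse(long id, String payload) {
        Callback cb = pending.remove(id);
        if (cb != null) cb.onResponse(payload);
    }

    /** A failure response arrived: fail the callback if it is still pending. */
    void onFailureResponse(long id, String from, FailureReason reason) {
        Callback cb = pending.remove(id);
        if (cb != null) cb.onFailure(from, reason);
    }

    /** No response within the expiry time: treated as a failure with reason TIMEOUT. */
    void expire(long id, String from) {
        onFailureResponse(id, from, FailureReason.TIMEOUT);
    }

    /** Demo helper that records which callback methods were invoked. */
    static final class Recording implements Callback {
        final List<String> events = new ArrayList<>();

        @Override public void onResponse(String payload) { events.add("response:" + payload); }

        @Override public void onFailure(String from, FailureReason reason) {
            events.add("failure:" + from + ":" + reason);
        }
    }

    public static void main(String[] args) {
        CallbackSketch ms = new CallbackSketch();
        Recording cb = new Recording();
        long answered = ms.sendWithCallback("ping", "10.0.0.2", cb);
        long lost = ms.sendWithCallback("ping", "10.0.0.3", cb);
        ms.onSuccessResponse(answered, "pong");
        ms.expire(lost, "10.0.0.3");
        ms.onSuccessResponse(lost, "late pong"); // ignored: the callback already expired
        System.out.println(cb.events);
    }
}
```

Removing the callback from the pending map before invoking it is what guarantees, in this model, that a late response after expiry does not fire the callback a second time.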
-
sendWithCallback
public void sendWithCallback(Message message, InetAddressAndPort to, RequestCallback cb, ConnectionType specifyConnection)
- Specified by: sendWithCallback in interface MessageDelivery
-
sendWriteWithCallback
public void sendWriteWithCallback(Message message, Replica to, AbstractWriteResponseHandler<?> handler)
Send a mutation message or a Paxos Commit to a given endpoint. This method specifies a callback which is invoked with the actual response. Also holds the message (only mutation messages) to determine if it needs to trigger a hint (uses StorageProxy for that).
- Parameters:
message - message to be sent.
to - endpoint to which the message needs to be sent
handler - callback interface which is used to pass the responses or suggest that a timeout occurred to the invoker of the send().
-
send
public void send(Message message, InetAddressAndPort to)
Send a message to a given endpoint. This method adheres to the fire and forget style messaging.
- Specified by: send in interface MessageDelivery
- Parameters:
message - messages to be sent.
to - endpoint to which the message needs to be sent
-
respond
public <V> void respond(V response, Message<?> message)
Send a message to a given endpoint. This method adheres to the fire and forget style messaging.
- Specified by: respond in interface MessageDelivery
- Parameters:
message - messages to be sent.
response -
-
send
public void send(Message message, InetAddressAndPort to, ConnectionType specifyConnection)
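How a message might be mapped to one of the three per-peer connections described in the class overview can be sketched as follows. This is a hedged illustration, not the actual selection logic: the ConnectionType enum is a reduced stand-in, the urgent flag is a hypothetical input, the threshold value is taken from the "65KiB by default" figure in the overview, and both the treatment of a message exactly at the threshold and the assumption that a non-null specifyConnection overrides the automatic choice are guesses.

```java
/** Illustrative sketch only; not the actual connection-selection code. */
final class ConnectionChoiceSketch {
    /** Reduced, hypothetical stand-in for ConnectionType. */
    enum ConnectionType { URGENT_MESSAGES, SMALL_MESSAGES, LARGE_MESSAGES }

    /** Assumed value, based on the "65KiB by default" figure in the overview. */
    static final int LARGE_MESSAGE_THRESHOLD = 65 * 1024;

    /**
     * Picks a connection for a message.
     *
     * @param serializedSize    size of the serialized message in bytes
     * @param urgent            hypothetical flag for messages that must arrive promptly, e.g. gossip
     * @param specifyConnection explicit choice, or null to choose automatically (assumed semantics)
     */
    static ConnectionType choose(int serializedSize, boolean urgent, ConnectionType specifyConnection) {
        if (specifyConnection != null)
            return specifyConnection;
        if (urgent)
            return ConnectionType.URGENT_MESSAGES;
        // Assumption: a message exactly at the threshold still counts as small.
        return serializedSize > LARGE_MESSAGE_THRESHOLD
               ? ConnectionType.LARGE_MESSAGES
               : ConnectionType.SMALL_MESSAGES;
    }

    public static void main(String[] args) {
        System.out.println(choose(512, false, null));
        System.out.println(choose(1024 * 1024, false, null));
        System.out.println(choose(512, true, null));
    }
}
```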
-
closeOutbound
public void closeOutbound(InetAddressAndPort to)
Only to be invoked once we believe the endpoint will never be contacted again. We close the connection after a five minute delay, to give asynchronous operations a chance to terminate
-
removeInbound
public void removeInbound(InetAddressAndPort from)
Only to be invoked once we believe the connections will never be used again.
-
interruptOutbound
public void interruptOutbound(InetAddressAndPort to)
Closes any current open channel/connection to the endpoint, but does not cause any message loss, and we will try to re-establish connections immediately
-
maybeReconnectWithNewIp
public io.netty.util.concurrent.Future<java.lang.Void> maybeReconnectWithNewIp(InetAddressAndPort address, InetAddressAndPort preferredAddress)
Reconnect to the peer using the given addr. Outstanding messages in each channel will be sent on the current channel. Typically this function is used for something like EC2 public IP addresses which need to be used for communication between EC2 regions.
- Parameters:
address - IP Address to identify the peer
preferredAddress - IP Address to use (and prefer) going forward for connecting to the peer
-
shutdown
public void shutdown()
Wait for callbacks and don't allow any more to be created (since they could require writing hints)
-
shutdown
public void shutdown(long timeout, java.util.concurrent.TimeUnit units, boolean shutdownGracefully, boolean shutdownExecutors)
-
shutdownAbrubtly
public void shutdownAbrubtly()
-
listen
public void listen()
-
waitUntilListening
public void waitUntilListening() throws java.lang.InterruptedException
- Throws:
java.lang.InterruptedException
-
-