Class ClusterState

java.lang.Object
org.elasticsearch.cluster.ClusterState
All Implemented Interfaces:
Diffable<ClusterState>, Writeable, ChunkedToXContent

public class ClusterState extends Object implements ChunkedToXContent, Diffable<ClusterState>
Represents the state of the cluster, held in memory on all nodes in the cluster with updates coordinated by the elected master.

Conceptually immutable. In practice it has a few components, such as RoutingNodes, which are pure functions of the immutable state but are expensive to compute, so they are built on demand.
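The on-demand construction of a derived view can be sketched as follows. This is a minimal, self-contained illustration and not the Elasticsearch implementation: LazyDerivedViewSketch, Assignment, ImmutableState and shardsByNode are hypothetical names, and the derived view is simply a grouping of an immutable list.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LazyDerivedViewSketch {
    /** Hypothetical shard assignment: a shard name and the node holding it. */
    record Assignment(String shard, String nodeId) {}

    /** Hypothetical immutable state with one derived view that is built on first use. */
    static final class ImmutableState {
        private final List<Assignment> assignments;
        // Derived data: a pure function of 'assignments', computed at most once.
        private volatile Map<String, List<String>> shardsByNode;
        int computations = 0; // illustration only: counts how often the view was built

        ImmutableState(List<Assignment> assignments) {
            this.assignments = List.copyOf(assignments);
        }

        Map<String, List<String>> shardsByNode() {
            Map<String, List<String>> view = shardsByNode;
            if (view == null) {
                synchronized (this) {
                    view = shardsByNode;
                    if (view == null) {
                        computations++;
                        view = Map.copyOf(
                            assignments.stream()
                                .collect(
                                    Collectors.groupingBy(
                                        Assignment::nodeId,
                                        Collectors.mapping(Assignment::shard, Collectors.toUnmodifiableList())
                                    )
                                )
                        );
                        shardsByNode = view;
                    }
                }
            }
            return view;
        }
    }
}
```

Because the view depends only on immutable data, caching it does not change the observable value of the state; callers that never ask for the view never pay for it.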

The Metadata portion is written to disk on each update so it persists across full-cluster restarts. The rest of this data is maintained only in-memory and resets back to its initial state on a full-cluster restart, but it is held on all nodes so it persists across master elections (and therefore is preserved in a rolling restart).

Updates are triggered by submitting tasks to the MasterService on the elected master. Typically a TransportMasterNodeAction routes a request to the master, where the task is submitted via a queue obtained with ClusterService.createTaskQueue(java.lang.String, org.elasticsearch.common.Priority, org.elasticsearch.cluster.ClusterStateTaskExecutor<T>). Each queue has an associated priority, and each submitted task has an associated timeout.

Tasks are processed in priority order, so a flood of higher-priority tasks can starve lower-priority ones. Therefore, avoid priorities other than Priority.NORMAL where possible. Tasks associated with client actions should typically have a timeout, or otherwise be sensitive to client cancellations, to avoid surprises caused by stale tasks executing long after they were submitted (clients themselves tend to time out). In contrast, internal tasks can reasonably have an infinite timeout, especially if a timeout would simply trigger a retry.
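The starvation risk can be illustrated with a plain priority queue. This sketch does not use the MasterService or ClusterService APIs: PriorityStarvationSketch, SketchPriority, Task and drain are hypothetical names, and the two-level priority enum is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class PriorityStarvationSketch {
    /** Hypothetical, simplified priority levels; declaration order is processing order. */
    enum SketchPriority { HIGH, NORMAL }

    /** Hypothetical task: 'seq' keeps submission order stable within one priority. */
    record Task(String name, SketchPriority priority, long seq) {}

    static final Comparator<Task> ORDER =
        Comparator.comparing(Task::priority).thenComparingLong(Task::seq);

    /**
     * Processes tasks one by one in priority order. After each processed task,
     * one more HIGH task arrives until 'lateHighTasks' of them have been added.
     * Returns the names of the tasks in the order they were processed.
     */
    static List<String> drain(List<Task> initial, int lateHighTasks) {
        PriorityQueue<Task> queue = new PriorityQueue<>(ORDER);
        queue.addAll(initial);
        long seq = initial.size();
        int added = 0;
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            processed.add(queue.poll().name());
            if (added < lateHighTasks) {
                added++;
                queue.add(new Task("late-high-" + added, SketchPriority.HIGH, seq++));
            }
        }
        return processed;
    }
}
```

In this model a NORMAL task submitted first still runs only after every HIGH task has been processed, including HIGH tasks that arrived later. With an unbounded stream of HIGH tasks it would never run, which is the reason for preferring a single priority where possible.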

Tasks that share the same ClusterStateTaskExecutor instance are processed as a batch. Each batch of tasks yields a new ClusterState which is published to the cluster by ClusterStatePublisher.publish(org.elasticsearch.cluster.ClusterStatePublicationEvent, org.elasticsearch.action.ActionListener<java.lang.Void>, org.elasticsearch.cluster.coordination.ClusterStatePublisher.AckListener). Publication usually works by sending a diff, computed via the Diffable interface, rather than the full state, although it will fall back to sending the full state if the receiving node is new or it has missed out on an intermediate state for some reason. States and diffs are published using the transport protocol, i.e. the Writeable interface and friends.
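The choice between sending a diff and falling back to the full state can be sketched as below. This is a simplified model and not the Elasticsearch publication code: DiffPublicationSketch, State, StateDiff and messageFor are hypothetical names, the state is reduced to a string map, and the assumption is that a receiver can only accept a diff if the last state it holds is exactly the state the diff was computed against.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DiffPublicationSketch {
    /** Hypothetical state: a UUID identifying this exact version, plus some data. */
    record State(String uuid, long version, Map<String, String> data) {}

    /** Hypothetical diff: only applies to the state whose UUID equals 'fromUuid'. */
    record StateDiff(String fromUuid, String toUuid, long toVersion,
                     Map<String, String> upserts, Set<String> deletes) {
        State apply(State previous) {
            if (!previous.uuid().equals(fromUuid)) {
                throw new IllegalStateException("diff does not apply to state " + previous.uuid());
            }
            Map<String, String> data = new HashMap<>(previous.data());
            data.keySet().removeAll(deletes);
            data.putAll(upserts);
            return new State(toUuid, toVersion, Map.copyOf(data));
        }
    }

    /** Computes the entries that changed or disappeared between two states. */
    static StateDiff diff(State previous, State next) {
        Map<String, String> upserts = new HashMap<>();
        next.data().forEach((key, value) -> {
            if (!value.equals(previous.data().get(key))) {
                upserts.put(key, value);
            }
        });
        Set<String> deletes = new HashSet<>(previous.data().keySet());
        deletes.removeAll(next.data().keySet());
        return new StateDiff(previous.uuid(), next.uuid(), next.version(), upserts, deletes);
    }

    /**
     * Returns what would be sent to one receiver: a StateDiff if the receiver
     * holds 'previous', otherwise the full 'next' state (new node, or a node
     * that missed an intermediate state).
     */
    static Object messageFor(String receiverLastUuid, State previous, State next) {
        return previous.uuid().equals(receiverLastUuid) ? diff(previous, next) : next;
    }
}
```

The UUID check in apply plays the role that stateUUID() is documented to play: it guards against applying a diff to the wrong previous state.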

Once committed, the new state is applied, which exposes it to the node via ClusterStateApplier and ClusterStateListener callbacks registered with the ClusterApplierService. The new state is also made available via ClusterService.state(). The appliers are notified (in no particular order) before ClusterService.state() is updated, and the listeners are notified (in no particular order) afterwards. Cluster state updates run in sequence, one-by-one, so they can be a performance bottleneck. See the JavaDocs on the linked classes and methods for more details.
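The ordering of appliers, the visible state, and listeners can be sketched as follows. This is a toy model rather than the ClusterApplierService: ApplyOrderSketch and its methods are hypothetical names, and a String stands in for the cluster state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ApplyOrderSketch {
    // What a state() accessor would return to the rest of the node.
    private volatile String state = "initial";
    private final List<Consumer<String>> appliers = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    String state() {
        return state;
    }

    void addApplier(Consumer<String> applier) {
        appliers.add(applier);
    }

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Applies one committed state: appliers first, then the visible state, then listeners. */
    void apply(String newState) {
        appliers.forEach(applier -> applier.accept(newState));   // state() still returns the old state
        state = newState;
        listeners.forEach(listener -> listener.accept(newState)); // state() now returns the new state
    }
}
```

A callback that needs the new state should therefore use the value passed to it rather than re-reading the accessor, since an applier that re-reads the accessor still observes the previous state.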

Cluster state updates can be used to trigger various actions via a ClusterStateListener rather than using a timer.

Implements ChunkedToXContent to be exposed in REST APIs (e.g. GET _cluster/state and POST _cluster/reroute) and to be indexed by monitoring, mostly just for diagnostics purposes. The XContent representation does not need to be 100% faithful since we never reconstruct a cluster state from its XContent representation, but the more faithful it is the more useful it is for diagnostics. Note that the XContent representation of the Metadata portion does have to be faithful (in Metadata.XContentContext.GATEWAY context) since this is how it persists across full cluster restarts.

Security-sensitive data such as passwords or private keys should not be stored in the cluster state, since the contents of the cluster state are exposed in various APIs.

  • Field Details

    • EMPTY_STATE

      public static final ClusterState EMPTY_STATE
    • UNKNOWN_UUID

      public static final String UNKNOWN_UUID
    • UNKNOWN_VERSION

      public static final long UNKNOWN_VERSION
    • VERSION_INTRODUCING_TRANSPORT_VERSIONS

      public static final Version VERSION_INTRODUCING_TRANSPORT_VERSIONS
  • Constructor Details

  • Method Details

    • term

      public long term()
    • version

      public long version()
    • getVersion

      public long getVersion()
    • stateUUID

      public String stateUUID()
      This stateUUID is automatically generated for each version of the cluster state. It is used to make sure that we are applying diffs to the right previous state.
    • nodes

      public DiscoveryNodes nodes()
    • getNodes

      public DiscoveryNodes getNodes()
    • nodesIfRecovered

      public DiscoveryNodes nodesIfRecovered()
      Returns the set of nodes that should be exposed to things like REST handlers that behave differently depending on the nodes in the cluster and their versions. Specifically, if the cluster has properly formed then this is the nodes in the last-applied cluster state, but if the cluster has not properly formed then no nodes are returned.
      Returns:
      the nodes in the cluster if the cluster has properly formed, otherwise an empty set of nodes.
    • clusterRecovered

      public boolean clusterRecovered()
    • compatibilityVersions

      public Map<String,CompatibilityVersions> compatibilityVersions()
    • hasMixedSystemIndexVersions

      public boolean hasMixedSystemIndexVersions()
    • getMinTransportVersion

      public TransportVersion getMinTransportVersion()
      Returns:
      the minimum TransportVersion that will be used for all future intra-cluster node-to-node communications. This value only ever increases, so if cs.getMinTransportVersion().onOrAfter(v) is true once then it will remain true in the future.

      There are some subtle exceptions:

      • The "only ever increases" property is handled by the master node using the in-memory (ephemeral) part of the ClusterState only, so in theory a full restart of a mixed-version cluster may lose that state and allow some nodes to see this value decrease. For this to happen in practice requires some fairly unlucky timing during the initial master election. We tell users not to do this: if something breaks during a rolling upgrade then they should upgrade all remaining nodes to continue. But we do not enforce it.
      • The "used for all node-to-node communications" property does not hold in a disordered upgrade (an upgrade to a semantically-newer but chronologically-older version), because for each connection between such nodes we will use TransportVersion.bestKnownVersion() to pick a transport version which is known by both endpoints. We also tell users not to do disordered upgrades, but do not enforce it.

      Note also that node-to-node communications which are not intra-cluster (i.e. they are not between nodes in the same cluster) may sometimes use an earlier TransportVersion than this value. This includes remote-cluster communication, and communication with nodes that are just starting up or otherwise are attempting to join this cluster.
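
      A cluster-wide minimum of this kind is typically used to gate a new wire format: the format is enabled only once the minimum is on or after the version that introduced it. The sketch below models that idea with hypothetical names (MinTransportVersionSketch, SketchVersion, clusterMin, canUseFormat); it does not use the real TransportVersion class, and it assumes the minimum is simply the lowest version among the current nodes.

```java
import java.util.Collection;
import java.util.Comparator;

final class MinTransportVersionSketch {
    /** Hypothetical stand-in for a transport version: an integer id that grows over time. */
    record SketchVersion(int id) {
        boolean onOrAfter(SketchVersion other) {
            return id >= other.id;
        }
    }

    /** Assumed definition of the cluster-wide minimum: the lowest version among the nodes. */
    static SketchVersion clusterMin(Collection<SketchVersion> nodeVersions) {
        return nodeVersions.stream()
            .min(Comparator.comparingInt(SketchVersion::id))
            .orElseThrow();
    }

    /** A format introduced in 'introducedIn' is usable once every node understands it. */
    static boolean canUseFormat(SketchVersion clusterMin, SketchVersion introducedIn) {
        return clusterMin.onOrAfter(introducedIn);
    }
}
```

      If the minimum never decreases, a gate of this shape that has opened once stays open, subject to the exceptions listed above.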

    • getMinSystemIndexMappingVersions

      public Map<String,SystemIndexDescriptor.MappingsVersion> getMinSystemIndexMappingVersions()
    • clusterFeatures

      public ClusterFeatures clusterFeatures()
    • metadata

      public Metadata metadata()
    • getMetadata

      public Metadata getMetadata()
    • coordinationMetadata

      public CoordinationMetadata coordinationMetadata()
    • globalRoutingTable

      public GlobalRoutingTable globalRoutingTable()
    • routingTable

      public RoutingTable routingTable(ProjectId projectId)
    • routingTable

      @Deprecated(forRemoval=true) public RoutingTable routingTable()
      Deprecated, for removal: This API element is subject to removal in a future version.
    • getRoutingTable

      @Deprecated(forRemoval=true) public RoutingTable getRoutingTable()
      Deprecated, for removal: This API element is subject to removal in a future version.
    • blocks

      public ClusterBlocks blocks()
    • getBlocks

      public ClusterBlocks getBlocks()
    • customs

      public Map<String,ClusterState.Custom> customs()
    • getCustoms

      public Map<String,ClusterState.Custom> getCustoms()
    • custom

      public <T extends ClusterState.Custom> T custom(String type)
    • custom

      public <T extends ClusterState.Custom> T custom(String type, T defaultValue)
    • getClusterName

      public ClusterName getClusterName()
    • getLastAcceptedConfiguration

      public CoordinationMetadata.VotingConfiguration getLastAcceptedConfiguration()
    • getLastCommittedConfiguration

      public CoordinationMetadata.VotingConfiguration getLastCommittedConfiguration()
    • getVotingConfigExclusions

      public Set<CoordinationMetadata.VotingConfigExclusion> getVotingConfigExclusions()
    • getRoutingNodes

      public RoutingNodes getRoutingNodes()
      Returns the routing nodes view of the routing table, building it on demand.
    • mutableRoutingNodes

      public RoutingNodes mutableRoutingNodes()
      Returns a fresh mutable copy of the routing nodes view.
    • initializeAsync

      public void initializeAsync(Executor executor)
      Initializes the lazily computed data structures of this instance in the background, using the given executor.
      Parameters:
      executor - executor to run initialization tasks on
    • projectState

      @Deprecated(forRemoval=true) public ProjectState projectState()
      Deprecated, for removal: This API element is subject to removal in a future version.
    • projectState

      public ProjectState projectState(ProjectId projectId)
    • forEachProject

      public <E extends Exception> void forEachProject(CheckedConsumer<ProjectState,E> action) throws E
      Throws:
      E
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • supersedes

      public boolean supersedes(ClusterState other)
      A cluster state supersedes another state if they are from the same master and the version of this state is higher than that of the other state.

      In essence, this means that all the changes from the other cluster state are also reflected by the current one.
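
      The rule can be sketched as a small predicate. The names SupersedesSketch and SketchState are hypothetical, and the sketch assumes that "from the same master" means the two states carry equal, non-null master node ids.

```java
final class SupersedesSketch {
    /** Hypothetical reduced state: the id of the master that produced it, and its version. */
    record SketchState(String masterNodeId, long version) {
        boolean supersedes(SketchState other) {
            return masterNodeId != null
                && masterNodeId.equals(other.masterNodeId)
                && version > other.version;
        }
    }
}
```

      Under this assumption, states from different masters are never ordered by this check, regardless of their version numbers.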

    • toXContentChunked

      public Iterator<? extends ToXContent> toXContentChunked(ToXContent.Params outerParams)
      Description copied from interface: ChunkedToXContent
      Create an iterator of ToXContent chunks for a REST response. Each chunk is serialized with the same XContentBuilder and ToXContent.Params, which is also the same as the ToXContent.Params passed as the params argument. For best results, all chunks should be O(1) size. The last chunk in the iterator must always yield at least one byte of output. See also ChunkedToXContentHelper for some handy utilities.

      Note that chunked response bodies cannot send deprecation warning headers once transmission has started, so implementations must check for deprecated feature use before returning.

      Specified by:
      toXContentChunked in interface ChunkedToXContent
      Returns:
      iterator over chunks of ToXContent
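
      The chunking contract described above can be sketched without the XContent classes. In this illustration ChunkedOutputSketch, Chunk, chunks and render are hypothetical names, a StringBuilder stands in for the shared builder, and no JSON escaping is performed.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ChunkedOutputSketch {
    /** Hypothetical chunk: writes one small piece of output to the shared builder. */
    interface Chunk {
        void writeTo(StringBuilder out);
    }

    /** Yields an opening chunk, one small chunk per name, and a closing chunk. */
    static Iterator<Chunk> chunks(List<String> names) {
        return new Iterator<>() {
            private int position = -1; // -1 is the opening chunk, names.size() the closing chunk

            @Override
            public boolean hasNext() {
                return position <= names.size();
            }

            @Override
            public Chunk next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int p = position++;
                if (p == -1) {
                    return out -> out.append("[");
                }
                if (p == names.size()) {
                    return out -> out.append("]"); // last chunk always emits at least one byte
                }
                return out -> out.append(p > 0 ? "," : "").append('"').append(names.get(p)).append('"');
            }
        };
    }

    /** Serializes every chunk with the same builder, as a response writer would. */
    static String render(Iterator<Chunk> chunks) {
        StringBuilder out = new StringBuilder();
        while (chunks.hasNext()) {
            chunks.next().writeTo(out);
        }
        return out.toString();
    }
}
```

      Each chunk here has a size that does not depend on the number of names, which is the property the description asks for, and chunks are created only as the iterator is advanced.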
    • builder

      public static ClusterState.Builder builder(ClusterName clusterName)
    • builder

      public static ClusterState.Builder builder(ClusterState state)
    • copyAndUpdate

      public ClusterState copyAndUpdate(Consumer<ClusterState.Builder> updater)
    • copyAndUpdateMetadata

      public ClusterState copyAndUpdateMetadata(Consumer<Metadata.Builder> updater)
    • copyAndUpdateProject

      public ClusterState copyAndUpdateProject(ProjectId projectId, Consumer<ProjectMetadata.Builder> updater)
    • diff

      public Diff<ClusterState> diff(ClusterState previousState)
      Description copied from interface: Diffable
      Returns serializable object representing differences between this and previousState
      Specified by:
      diff in interface Diffable<ClusterState>
    • readDiffFrom

      public static Diff<ClusterState> readDiffFrom(StreamInput in, DiscoveryNode localNode) throws IOException
      Throws:
      IOException
    • readFrom

      public static ClusterState readFrom(StreamInput in, DiscoveryNode localNode) throws IOException
      Throws:
      IOException
    • writeTo

      public void writeTo(StreamOutput out) throws IOException
      Description copied from interface: Writeable
      Write this into the StreamOutput.
      Specified by:
      writeTo in interface Writeable
      Throws:
      IOException