Package org.apache.cassandra.service
Class ActiveRepairService
- java.lang.Object
-
- org.apache.cassandra.service.ActiveRepairService
-
- All Implemented Interfaces:
IEndpointStateChangeSubscriber, IFailureDetectionEventListener, ActiveRepairServiceMBean
public class ActiveRepairService extends java.lang.Object implements IEndpointStateChangeSubscriber, IFailureDetectionEventListener, ActiveRepairServiceMBean
ActiveRepairService is the starting point for manual "active" repairs. Each user-triggered repair corresponds to one or more repair sessions, one for each token range to repair. One repair session might repair multiple column families. For each of those column families, the repair session requests merkle trees from each replica of the range being repaired, diffs those trees upon receiving them, schedules the streaming of the parts to repair (based on the tree diffs), and waits for all of those operations to complete. See RepairSession for more details. A repair session is created through submitRepairSession, which returns a future on the completion of that session.
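The tree-diff step described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Cassandra's merkle tree implementation: the class MerkleDiffSketch, its methods, and the fixed power-of-two bucket layout are all hypothetical, and it hashes in-memory strings with SHA-256 rather than real partitions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a Cassandra class.
public class MerkleDiffSketch {

    // Hash a byte array with SHA-256.
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build a complete binary hash tree stored as an array: node i has
    // children 2i+1 and 2i+2, and the leaves fill the last buckets.length slots.
    // Assumption: buckets.length is a power of two.
    public static byte[][] build(String[] buckets) {
        int n = buckets.length;
        byte[][] tree = new byte[2 * n - 1][];
        for (int i = 0; i < n; i++) {
            tree[n - 1 + i] = sha256(buckets[i].getBytes(StandardCharsets.UTF_8));
        }
        for (int i = n - 2; i >= 0; i--) {
            byte[] left = tree[2 * i + 1];
            byte[] right = tree[2 * i + 2];
            byte[] both = Arrays.copyOf(left, left.length + right.length);
            System.arraycopy(right, 0, both, left.length, right.length);
            tree[i] = sha256(both);
        }
        return tree;
    }

    // Return the indexes of the buckets whose hashes differ between two trees
    // of the same shape, descending only into subtrees whose hashes mismatch.
    public static List<Integer> diff(byte[][] a, byte[][] b) {
        List<Integer> out = new ArrayList<>();
        diff(a, b, 0, out);
        return out;
    }

    private static void diff(byte[][] a, byte[][] b, int node, List<Integer> out) {
        if (Arrays.equals(a[node], b[node])) {
            return; // identical subtree: nothing to stream
        }
        int firstLeaf = (a.length + 1) / 2 - 1;
        if (node >= firstLeaf) {
            out.add(node - firstLeaf); // mismatching leaf: this bucket needs repair
            return;
        }
        diff(a, b, 2 * node + 1, out);
        diff(a, b, 2 * node + 2, out);
    }

    public static void main(String[] args) {
        String[] replicaA = {"k1=a", "k2=b", "k3=c", "k4=d"};
        String[] replicaB = {"k1=a", "k2=b", "k3=STALE", "k4=d"};
        // Only bucket 2 differs, so only that part would be streamed.
        System.out.println(diff(build(replicaA), build(replicaB))); // prints [2]
    }
}
```

The point of the tree shape is that matching subtrees are skipped with a single hash comparison, so the amount of data scheduled for streaming depends on how much differs rather than on the size of the range.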
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class
ActiveRepairService.ConsistentSessions
static class
ActiveRepairService.ParentRepairSession
We keep a ParentRepairSession around for the duration of the entire repair; for example, on a 256-token vnode cluster with rf=3 we would have 768 RepairSessions but only one ParentRepairSession.
static class
ActiveRepairService.ParentRepairStatus
static class
ActiveRepairService.RepairCommandExecutorHandle
-
Field Summary
Fields
Modifier and Type Field Description
ActiveRepairService.ConsistentSessions
consistent
SharedContext
ctx
static TimeUUID
NO_PENDING_REPAIR
ExecutorPlus
snapshotExecutor
static long
UNREPAIRED_SSTABLE
-
Fields inherited from interface org.apache.cassandra.service.ActiveRepairServiceMBean
MBEAN_NAME
-
-
Constructor Summary
Constructors
Constructor Description
ActiveRepairService()
ActiveRepairService(SharedContext ctx)
-
Method Summary
Modifier and Type Method Description
void
abort(java.util.function.Predicate<ActiveRepairService.ParentRepairSession> predicate, java.lang.String message)
Remove any parent repair sessions matching predicate.
void
beforeChange(InetAddressAndPort endpoint, EndpointState currentState, ApplicationState newStateKey, VersionedValue newValue)
void
cleanUp(TimeUUID parentRepairSession, java.util.Set<InetAddressAndPort> endpoints)
Send Verb.CLEANUP_MSG to the given endpoints.
java.util.List<javax.management.openmbean.CompositeData>
cleanupPending(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString, boolean force)
void
clearLocalRepairState()
void
convict(InetAddressAndPort ep, double phi)
Something has happened to a remote node - if that node is a coordinator, we mark the parent repair session id as failed.
CoordinatorState
coordinator(TimeUUID id)
java.util.Collection<CoordinatorState>
coordinators()
void
failSession(java.lang.String session, boolean force)
int
getConcurrentMerkleTreeRequests()
EndpointsForRange
getNeighbors(java.lang.String keyspaceName, java.lang.Iterable<Range<Token>> keyspaceLocalRanges, Range<Token> toRepair, java.util.Collection<java.lang.String> dataCenters, java.util.Collection<java.lang.String> hosts)
Return all of the neighbors with whom we share the provided range.
ActiveRepairService.ParentRepairSession
getParentRepairSession(TimeUUID parentSessionId)
We assume when calling this method that a parent session for the provided identifier exists, and that session is still in progress.
int
getPaxosRepairParallelism()
java.util.List<javax.management.openmbean.CompositeData>
getPendingStats(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString)
int
getRepairPendingCompactionRejectThreshold()
int
getRepairSessionSpaceInMebibytes()
Deprecated. See CASSANDRA-17668
int
getRepairSessionSpaceInMegabytes()
Deprecated. See CASSANDRA-15234
int
getRepairSessionSpaceInMiB()
java.util.List<javax.management.openmbean.CompositeData>
getRepairStats(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString)
Pair<ActiveRepairService.ParentRepairStatus,java.util.List<java.lang.String>>
getRepairStatus(java.lang.Integer cmd)
java.util.List<java.util.Map<java.lang.String,java.lang.String>>
getSessions(boolean all, java.lang.String rangesStr)
boolean
getUseOffheapMerkleTrees()
void
handleMessage(Message<? extends RepairMessage> message)
static ActiveRepairService
instance()
void
onAlive(InetAddressAndPort endpoint, EndpointState state)
void
onChange(InetAddressAndPort endpoint, ApplicationState state, VersionedValue value)
void
onDead(InetAddressAndPort endpoint, EndpointState state)
void
onJoin(InetAddressAndPort endpoint, EndpointState epState)
Used to inform interested parties about a change in the state of the specified endpoint.
void
onRemove(InetAddressAndPort endpoint)
void
onRestart(InetAddressAndPort endpoint, EndpointState state)
Called whenever a node is restarted.
int
parentRepairSessionCount()
int
parentRepairSessionsCount()
Each ongoing repair (incremental and non-incremental) is represented by an ActiveRepairService.ParentRepairSession entry in the ActiveRepairService cache.
ParticipateState
participate(TimeUUID id)
java.util.Collection<ParticipateState>
participates()
Future<?>
prepareForRepair(TimeUUID parentRepairSession, InetAddressAndPort coordinator, java.util.Set<InetAddressAndPort> endpoints, RepairOption options, boolean isForcedRepair, java.util.List<ColumnFamilyStore> columnFamilyStores)
void
recordRepairStatus(int cmd, ActiveRepairService.ParentRepairStatus parentRepairStatus, java.util.List<java.lang.String> messages)
void
register(CoordinatorState state)
boolean
register(ParticipateState state)
void
registerParentRepairSession(TimeUUID parentRepairSession, InetAddressAndPort coordinator, java.util.List<ColumnFamilyStore> columnFamilyStores, java.util.Collection<Range<Token>> ranges, boolean isIncremental, long repairedAt, boolean isGlobal, PreviewKind previewKind)
ActiveRepairService.ParentRepairSession
removeParentRepairSession(TimeUUID parentSessionId)
Called when the repair session is done: either it failed or anticompaction has completed.
static ExecutorPlus
repairCommandExecutor()
Future<?>
repairPaxosForTopologyChange(java.lang.String ksName, java.util.Collection<Range<Token>> ranges, java.lang.String reason)
int
sessionCount()
void
setConcurrentMerkleTreeRequests(int value)
void
setPaxosRepairParallelism(int v)
void
setRepairPendingCompactionRejectThreshold(int value)
void
setRepairSessionSpaceInMebibytes(int sizeInMebibytes)
Deprecated. See CASSANDRA-17668
void
setRepairSessionSpaceInMegabytes(int sizeInMegabytes)
Deprecated. See CASSANDRA-15234
void
setRepairSessionSpaceInMiB(int sizeInMebibytes)
void
setUseOffheapMerkleTrees(boolean value)
void
shutdownNowAndWait(long timeout, java.util.concurrent.TimeUnit unit)
void
start()
void
stop()
RepairSession
submitRepairSession(TimeUUID parentRepairSession, CommonRange range, java.lang.String keyspace, RepairParallelism parallelismDegree, boolean isIncremental, boolean pullRepair, PreviewKind previewKind, boolean optimiseStreams, boolean repairPaxos, boolean paxosOnly, ExecutorPlus executor, Scheduler validationScheduler, java.lang.String... cfnames)
Requests repairs for the given keyspace and column families.
void
terminateSessions()
ValidationState
validation(java.util.UUID id)
java.util.Collection<ValidationState>
validations()
boolean
verifyCompactionsPendingThreshold(TimeUUID parentRepairSession, PreviewKind previewKind)
-
-
-
Field Detail
-
consistent
public final ActiveRepairService.ConsistentSessions consistent
-
UNREPAIRED_SSTABLE
public static final long UNREPAIRED_SSTABLE
- See Also:
- Constant Field Values
-
NO_PENDING_REPAIR
public static final TimeUUID NO_PENDING_REPAIR
-
ctx
public final SharedContext ctx
-
snapshotExecutor
public final ExecutorPlus snapshotExecutor
-
-
Constructor Detail
-
ActiveRepairService
public ActiveRepairService()
-
ActiveRepairService
public ActiveRepairService(SharedContext ctx)
-
-
Method Detail
-
instance
public static ActiveRepairService instance()
-
repairCommandExecutor
public static ExecutorPlus repairCommandExecutor()
-
start
public void start()
-
clearLocalRepairState
public void clearLocalRepairState()
-
stop
public void stop()
-
getSessions
public java.util.List<java.util.Map<java.lang.String,java.lang.String>> getSessions(boolean all, java.lang.String rangesStr)
- Specified by:
getSessions
in interface ActiveRepairServiceMBean
-
failSession
public void failSession(java.lang.String session, boolean force)
- Specified by:
failSession
in interface ActiveRepairServiceMBean
-
setRepairSessionSpaceInMegabytes
@Deprecated(since="4.1") public void setRepairSessionSpaceInMegabytes(int sizeInMegabytes)
Deprecated. See CASSANDRA-15234
- Specified by:
setRepairSessionSpaceInMegabytes
in interface ActiveRepairServiceMBean
-
getRepairSessionSpaceInMegabytes
@Deprecated(since="4.1") public int getRepairSessionSpaceInMegabytes()
Deprecated. See CASSANDRA-15234
- Specified by:
getRepairSessionSpaceInMegabytes
in interface ActiveRepairServiceMBean
-
setRepairSessionSpaceInMebibytes
@Deprecated(since="4.1") public void setRepairSessionSpaceInMebibytes(int sizeInMebibytes)
Deprecated. See CASSANDRA-17668
- Specified by:
setRepairSessionSpaceInMebibytes
in interface ActiveRepairServiceMBean
-
getRepairSessionSpaceInMebibytes
@Deprecated(since="4.1") public int getRepairSessionSpaceInMebibytes()
Deprecated. See CASSANDRA-17668
- Specified by:
getRepairSessionSpaceInMebibytes
in interface ActiveRepairServiceMBean
-
setRepairSessionSpaceInMiB
public void setRepairSessionSpaceInMiB(int sizeInMebibytes)
- Specified by:
setRepairSessionSpaceInMiB
in interface ActiveRepairServiceMBean
-
getRepairSessionSpaceInMiB
public int getRepairSessionSpaceInMiB()
- Specified by:
getRepairSessionSpaceInMiB
in interface ActiveRepairServiceMBean
-
getConcurrentMerkleTreeRequests
public int getConcurrentMerkleTreeRequests()
- Specified by:
getConcurrentMerkleTreeRequests
in interface ActiveRepairServiceMBean
-
setConcurrentMerkleTreeRequests
public void setConcurrentMerkleTreeRequests(int value)
- Specified by:
setConcurrentMerkleTreeRequests
in interface ActiveRepairServiceMBean
-
getRepairStats
public java.util.List<javax.management.openmbean.CompositeData> getRepairStats(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString)
- Specified by:
getRepairStats
in interface ActiveRepairServiceMBean
-
getPendingStats
public java.util.List<javax.management.openmbean.CompositeData> getPendingStats(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString)
- Specified by:
getPendingStats
in interface ActiveRepairServiceMBean
-
cleanupPending
public java.util.List<javax.management.openmbean.CompositeData> cleanupPending(java.util.List<java.lang.String> schemaArgs, java.lang.String rangeString, boolean force)
- Specified by:
cleanupPending
in interface ActiveRepairServiceMBean
-
parentRepairSessionsCount
public int parentRepairSessionsCount()
Description copied from interface: ActiveRepairServiceMBean
Each ongoing repair (incremental and non-incremental) is represented by an ActiveRepairService.ParentRepairSession entry in the ActiveRepairService cache. Returns the current number of ongoing repairs (the current number of cached entries).
- Specified by:
parentRepairSessionsCount
in interface ActiveRepairServiceMBean
- Returns:
- current size of the internal cache holding
ActiveRepairService.ParentRepairSession
instances
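Because parentRepairSessionsCount is declared on the MBean interface, it is reachable as a JMX operation. The following is a minimal sketch of that mechanism using only the JDK: RepairCounterMBean, RepairCounter, and the object name example.repair:type=RepairCounter are hypothetical stand-ins registered in the local platform MBean server. Against a real node, the object name would come from ActiveRepairServiceMBean.MBEAN_NAME and the call would go through a remote JMX connection instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

// Hypothetical illustration only; not a Cassandra class.
public class JmxCountSketch {

    // Stand-in management interface exposing a single operation.
    public interface RepairCounterMBean {
        int parentRepairSessionsCount();
    }

    // Stand-in implementation that reports a fixed number of sessions.
    public static class RepairCounter implements RepairCounterMBean {
        private final int count;

        public RepairCounter(int count) {
            this.count = count;
        }

        @Override
        public int parentRepairSessionsCount() {
            return count;
        }
    }

    // Register the stand-in MBean, invoke the operation by name, and clean up.
    public static int readCount(int simulatedCount) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.repair:type=RepairCounter");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(
                new StandardMBean(new RepairCounter(simulatedCount), RepairCounterMBean.class), name);
            try {
                // The method has no get/is prefix, so JMX treats it as an operation.
                Object result = server.invoke(
                    name, "parentRepairSessionsCount", new Object[0], new String[0]);
                return (Integer) result;
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readCount(3));
    }
}
```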
-
submitRepairSession
public RepairSession submitRepairSession(TimeUUID parentRepairSession, CommonRange range, java.lang.String keyspace, RepairParallelism parallelismDegree, boolean isIncremental, boolean pullRepair, PreviewKind previewKind, boolean optimiseStreams, boolean repairPaxos, boolean paxosOnly, ExecutorPlus executor, Scheduler validationScheduler, java.lang.String... cfnames)
Requests repairs for the given keyspace and column families.
- Returns:
- Future for asynchronous call or null if there is no need to repair
-
getUseOffheapMerkleTrees
public boolean getUseOffheapMerkleTrees()
- Specified by:
getUseOffheapMerkleTrees
in interface ActiveRepairServiceMBean
-
setUseOffheapMerkleTrees
public void setUseOffheapMerkleTrees(boolean value)
- Specified by:
setUseOffheapMerkleTrees
in interface ActiveRepairServiceMBean
-
terminateSessions
public void terminateSessions()
-
recordRepairStatus
public void recordRepairStatus(int cmd, ActiveRepairService.ParentRepairStatus parentRepairStatus, java.util.List<java.lang.String> messages)
-
getRepairStatus
public Pair<ActiveRepairService.ParentRepairStatus,java.util.List<java.lang.String>> getRepairStatus(java.lang.Integer cmd)
-
getNeighbors
public EndpointsForRange getNeighbors(java.lang.String keyspaceName, java.lang.Iterable<Range<Token>> keyspaceLocalRanges, Range<Token> toRepair, java.util.Collection<java.lang.String> dataCenters, java.util.Collection<java.lang.String> hosts)
Return all of the neighbors with whom we share the provided range.
- Parameters:
keyspaceName - keyspace to repair
keyspaceLocalRanges - local ranges for the given keyspaceName
toRepair - token range to repair
dataCenters - the data centers to involve in the repair
- Returns:
- neighbors with whom we share the provided range
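The idea of "neighbors for a range, restricted to some data centers" can be sketched as a simple filter. This is an illustration only: NeighborsSketch and Replica are hypothetical types, replicas are identified by plain strings, and treating an empty data center collection as "no restriction" is an assumption of this sketch rather than documented behavior of getNeighbors.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these types exist in Cassandra.
public class NeighborsSketch {

    // A replica reduced to its address and the data center it lives in.
    public static final class Replica {
        public final String address;
        public final String dataCenter;

        public Replica(String address, String dataCenter) {
            this.address = address;
            this.dataCenter = dataCenter;
        }
    }

    // Return the addresses of all replicas of a range other than the local node,
    // optionally restricted to the given data centers.
    public static Set<String> neighbors(List<Replica> replicasForRange,
                                        String self,
                                        Collection<String> dataCenters) {
        Set<String> result = new TreeSet<>();
        for (Replica replica : replicasForRange) {
            if (replica.address.equals(self)) {
                continue; // the local node is not its own neighbor
            }
            // Assumption: null or empty means "all data centers".
            boolean restricted = dataCenters != null && !dataCenters.isEmpty();
            if (restricted && !dataCenters.contains(replica.dataCenter)) {
                continue;
            }
            result.add(replica.address);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("10.0.0.1", "dc1"),
            new Replica("10.0.0.2", "dc1"),
            new Replica("10.0.1.1", "dc2"));
        System.out.println(neighbors(replicas, "10.0.0.1", List.of("dc1"))); // prints [10.0.0.2]
    }
}
```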
-
verifyCompactionsPendingThreshold
public boolean verifyCompactionsPendingThreshold(TimeUUID parentRepairSession, PreviewKind previewKind)
-
prepareForRepair
public Future<?> prepareForRepair(TimeUUID parentRepairSession, InetAddressAndPort coordinator, java.util.Set<InetAddressAndPort> endpoints, RepairOption options, boolean isForcedRepair, java.util.List<ColumnFamilyStore> columnFamilyStores)
-
cleanUp
public void cleanUp(TimeUUID parentRepairSession, java.util.Set<InetAddressAndPort> endpoints)
Send Verb.CLEANUP_MSG to the given endpoints. This results in removing the parent session object from each endpoint's cache. This method does not throw an exception in case of a messaging failure.
-
registerParentRepairSession
public void registerParentRepairSession(TimeUUID parentRepairSession, InetAddressAndPort coordinator, java.util.List<ColumnFamilyStore> columnFamilyStores, java.util.Collection<Range<Token>> ranges, boolean isIncremental, long repairedAt, boolean isGlobal, PreviewKind previewKind)
-
getParentRepairSession
public ActiveRepairService.ParentRepairSession getParentRepairSession(TimeUUID parentSessionId) throws NoSuchRepairSessionException
We assume when calling this method that a parent session for the provided identifier exists and that the session is still in progress. When it doesn't, that should mean either abort(Predicate, String) or failRepair(TimeUUID, String) has removed it.
- Parameters:
parentSessionId - an identifier for an active parent repair session
- Returns:
- the ActiveRepairService.ParentRepairSession associated with the provided identifier
- Throws:
NoSuchRepairSessionException
- if the provided identifier does not map to an active parent session
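The contract above (return the session if it is still registered, otherwise throw a checked exception) can be sketched with a plain map. This is not the Cassandra implementation: SessionLookupSketch and NoSuchSessionException are hypothetical stand-ins, java.util.UUID is used in place of TimeUUID, and a String stands in for the session object.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not a Cassandra class.
public class SessionLookupSketch {

    // Stand-in for a checked "no such session" exception.
    public static class NoSuchSessionException extends Exception {
        public NoSuchSessionException(UUID id) {
            super("Parent repair session " + id + " is not active");
        }
    }

    // Active sessions keyed by identifier; the String value stands in for the session.
    private final Map<UUID, String> sessions = new ConcurrentHashMap<>();

    public void register(UUID id, String description) {
        sessions.put(id, description);
    }

    // Returns the removed session, or null if none was registered.
    public String remove(UUID id) {
        return sessions.remove(id);
    }

    // Callers assume the session exists; a missing entry means it was
    // removed in the meantime, which is surfaced as a checked exception.
    public String get(UUID id) throws NoSuchSessionException {
        String session = sessions.get(id);
        if (session == null) {
            throw new NoSuchSessionException(id);
        }
        return session;
    }

    public static void main(String[] args) {
        SessionLookupSketch registry = new SessionLookupSketch();
        UUID id = UUID.randomUUID();
        registry.register(id, "repair of ks1");
        registry.remove(id);
        try {
            registry.get(id);
        } catch (NoSuchSessionException e) {
            System.out.println("lookup failed as expected: " + e.getMessage());
        }
    }
}
```

Using a checked exception forces every caller to decide what to do when a session disappears mid-repair, instead of dereferencing a null result later.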
-
removeParentRepairSession
public ActiveRepairService.ParentRepairSession removeParentRepairSession(TimeUUID parentSessionId)
Called when the repair session is done: either it failed or anticompaction has completed. Clears out any snapshots created by this repair.
- Parameters:
parentSessionId - an identifier for an active parent repair session
- Returns:
- the ActiveRepairService.ParentRepairSession associated with the provided identifier
- See Also:
CassandraTableRepairManager.snapshot(String, Collection, boolean)
-
handleMessage
public void handleMessage(Message<? extends RepairMessage> message)
-
onJoin
public void onJoin(InetAddressAndPort endpoint, EndpointState epState)
Description copied from interface: IEndpointStateChangeSubscriber
Used to inform interested parties about a change in the state of the specified endpoint.
- Specified by:
onJoin
in interface IEndpointStateChangeSubscriber
- Parameters:
endpoint - endpoint for which the state change occurred.
epState - state that actually changed for the above endpoint.
-
beforeChange
public void beforeChange(InetAddressAndPort endpoint, EndpointState currentState, ApplicationState newStateKey, VersionedValue newValue)
- Specified by:
beforeChange
in interface IEndpointStateChangeSubscriber
-
onChange
public void onChange(InetAddressAndPort endpoint, ApplicationState state, VersionedValue value)
- Specified by:
onChange
in interface IEndpointStateChangeSubscriber
-
onAlive
public void onAlive(InetAddressAndPort endpoint, EndpointState state)
- Specified by:
onAlive
in interface IEndpointStateChangeSubscriber
-
onDead
public void onDead(InetAddressAndPort endpoint, EndpointState state)
- Specified by:
onDead
in interface IEndpointStateChangeSubscriber
-
onRemove
public void onRemove(InetAddressAndPort endpoint)
- Specified by:
onRemove
in interface IEndpointStateChangeSubscriber
-
onRestart
public void onRestart(InetAddressAndPort endpoint, EndpointState state)
Description copied from interface: IEndpointStateChangeSubscriber
Called whenever a node is restarted. Note that there is no guarantee when that happens that the node was previously marked down. It will have been only if state.isAlive() == false, as state is from before the restarted node is marked up.
- Specified by:
onRestart
in interface IEndpointStateChangeSubscriber
-
convict
public void convict(InetAddressAndPort ep, double phi)
Something has happened to a remote node - if that node is a coordinator, we mark the parent repair session id as failed. The fail marker is kept in the map for 24h so that, if the coordinator does not agree that the repair failed, we can still fail the entire repair session.
- Specified by:
convict
in interface IFailureDetectionEventListener
- Parameters:
ep - endpoint to be convicted
phi - the value of phi with which ep was convicted
-
getRepairPendingCompactionRejectThreshold
public int getRepairPendingCompactionRejectThreshold()
- Specified by:
getRepairPendingCompactionRejectThreshold
in interface ActiveRepairServiceMBean
-
setRepairPendingCompactionRejectThreshold
public void setRepairPendingCompactionRejectThreshold(int value)
- Specified by:
setRepairPendingCompactionRejectThreshold
in interface ActiveRepairServiceMBean
-
abort
public void abort(java.util.function.Predicate<ActiveRepairService.ParentRepairSession> predicate, java.lang.String message)
Remove any parent repair sessions matching predicate
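Predicate-based removal of this kind can be sketched as follows. This is a minimal illustration, not the Cassandra code: AbortSketch and Session are hypothetical, endpoints are plain strings, java.util.UUID replaces TimeUUID, and wiring the convict callback to abort is an assumption made here to show one plausible use of a predicate (matching sessions by their coordinator).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical illustration only; not a Cassandra class.
public class AbortSketch {

    // Minimal stand-in for a parent repair session.
    public static final class Session {
        public final UUID id;
        public final String coordinator;

        public Session(UUID id, String coordinator) {
            this.id = id;
            this.coordinator = coordinator;
        }
    }

    private final Map<UUID, Session> sessions = new ConcurrentHashMap<>();

    public void register(Session session) {
        sessions.put(session.id, session);
    }

    public int count() {
        return sessions.size();
    }

    // Remove every session matching the predicate and return the removed ids.
    // Iterating a ConcurrentHashMap while removing entries is safe.
    public List<UUID> abort(Predicate<Session> predicate, String message) {
        List<UUID> removed = new ArrayList<>();
        for (Session session : sessions.values()) {
            if (predicate.test(session) && sessions.remove(session.id, session)) {
                removed.add(session.id);
            }
        }
        if (!removed.isEmpty()) {
            System.out.println(message + ": " + removed);
        }
        return removed;
    }

    // Assumed usage: when an endpoint is convicted, abort the sessions it coordinates.
    public List<UUID> convict(String endpoint) {
        return abort(s -> s.coordinator.equals(endpoint),
                     "Removing sessions coordinated by " + endpoint);
    }

    public static void main(String[] args) {
        AbortSketch service = new AbortSketch();
        service.register(new Session(UUID.randomUUID(), "10.0.0.1"));
        service.register(new Session(UUID.randomUUID(), "10.0.0.2"));
        service.convict("10.0.0.1");
        System.out.println("remaining sessions: " + service.count()); // prints remaining sessions: 1
    }
}
```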
-
parentRepairSessionCount
public int parentRepairSessionCount()
-
sessionCount
public int sessionCount()
-
repairPaxosForTopologyChange
public Future<?> repairPaxosForTopologyChange(java.lang.String ksName, java.util.Collection<Range<Token>> ranges, java.lang.String reason)
-
getPaxosRepairParallelism
public int getPaxosRepairParallelism()
- Specified by:
getPaxosRepairParallelism
in interface ActiveRepairServiceMBean
-
setPaxosRepairParallelism
public void setPaxosRepairParallelism(int v)
- Specified by:
setPaxosRepairParallelism
in interface ActiveRepairServiceMBean
-
shutdownNowAndWait
public void shutdownNowAndWait(long timeout, java.util.concurrent.TimeUnit unit) throws java.lang.InterruptedException, java.util.concurrent.TimeoutException
- Throws:
java.lang.InterruptedException
java.util.concurrent.TimeoutException
-
coordinators
public java.util.Collection<CoordinatorState> coordinators()
-
coordinator
public CoordinatorState coordinator(TimeUUID id)
-
register
public void register(CoordinatorState state)
-
register
public boolean register(ParticipateState state)
-
participates
public java.util.Collection<ParticipateState> participates()
-
participate
public ParticipateState participate(TimeUUID id)
-
validations
public java.util.Collection<ValidationState> validations()
-
validation
public ValidationState validation(java.util.UUID id)
-
-