A Hub[A] is a concurrent data structure that allows multiple publishers to publish A values and multiple subscribers to poll A values, with the guarantee that all subscribers will receive all values published to the hub while they are subscribed.
A MutableConcurrentQueue interface to use under the hood in ZIO.
The implementation at minimum:
This is declared as an abstract class since invokevirtual is slightly cheaper than invokeinterface.
This is a specialized implementation of MutableConcurrentQueue of capacity 1. Since capacity 1 queues are used by default under the hood in Streams as an intermediate resource, they should be very cheap to create and throw away. Hence this queue is optimized (unlike RingBuffer*) for a very small footprint, while still being plenty fast.
An instance takes only 24 bytes, plus at least 8 bytes for the long adder (so 32+ bytes in total), which is 15x less than the smallest RingBuffer.
zio.internal.OneElementConcurrentQueue object internals:
 OFFSET  SIZE                                          TYPE DESCRIPTION
      0     4                                               (object header)
      4     4                                               (object header)
      8     4                                               (object header)
     12     4                                           int OneElementConcurrentQueue.capacity
     16     4   java.util.concurrent.atomic.AtomicReference OneElementConcurrentQueue.ref
     20     4         java.util.concurrent.atomic.LongAdder OneElementConcurrentQueue.deqAdder
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
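The layout above suggests how little state such a queue needs. The following is a minimal sketch of one way a capacity-1 queue could be built from those fields; only the field names capacity, ref and deqAdder come from the dump, while the class name, method signatures and logic are assumptions made for illustration, not the ZIO source.

```scala
import java.util.concurrent.atomic.{AtomicReference, LongAdder}

// Hypothetical sketch: a capacity-1 queue backed by a single AtomicReference.
// A null reference means "empty", so null elements are rejected.
final class OneElementQueueSketch[A <: AnyRef] {
  private val ref      = new AtomicReference[A]()
  private val deqAdder = new LongAdder()

  val capacity: Int = 1

  // Succeeds only if the slot is currently empty: a single CAS, no retry loop.
  def offer(a: A): Boolean = {
    require(a ne null, "null is reserved as the empty marker")
    ref.compareAndSet(null.asInstanceOf[A], a)
  }

  // Atomically takes the element, or returns `default` if the slot is empty.
  def poll(default: A): A = {
    val a = ref.getAndSet(null.asInstanceOf[A])
    if (a eq null) default
    else {
      deqAdder.increment()
      a
    }
  }

  def isEmpty: Boolean    = ref.get() eq null
  def dequeuedCount: Long = deqAdder.sum()
}
```

With one reference and one counter there is no array and no head/tail bookkeeping, which is consistent with the small footprint described above.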
A lock-free array-based bounded queue. It is thread-safe and can be used in a multiple-producer/multiple-consumer (MPMC) setting.
A simple array-based queue of size N uses an array buf of size N as the underlying storage. There are 2 pointers, head and tail. An element is enqueued into buf at position tail % N and dequeued from head % N. Each time an enqueue happens tail is incremented; similarly, when a dequeue happens head is incremented.
Since the pointers wrap around the array as they get incremented, such a data structure is also called a circular buffer or a ring buffer.
Because the queue is bounded, enqueue and dequeue may fail, which is captured in the semantics of the offer and poll methods.
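The simple queue described above can be sketched as follows. This is a single-threaded illustration only, with no concurrency control; the class name and the Option-returning poll are assumptions chosen for brevity rather than the signatures used in ZIO.

```scala
// Single-threaded sketch of a simple array-based ring buffer of size n.
final class SimpleRingBuffer[A](n: Int) {
  require(n > 0, "capacity must be positive")

  private val buf  = new Array[Any](n)
  private var head = 0L // total number of dequeues so far
  private var tail = 0L // total number of enqueues so far

  // Fails (returns false) when the queue is full.
  def offer(a: A): Boolean =
    if (tail - head == n) false
    else {
      buf((tail % n).toInt) = a
      tail += 1
      true
    }

  // Fails (returns None) when the queue is empty.
  def poll(): Option[A] =
    if (tail == head) None
    else {
      val idx = (head % n).toInt
      val a   = buf(idx).asInstanceOf[A]
      buf(idx) = null // drop the reference so the element can be collected
      head += 1
      Some(a)
    }
}
```

Note that head and tail only ever grow; the wrap-around comes entirely from the % n when indexing into buf.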
Using offer as an example, the algorithm can be broken down roughly into three steps:
Steps 1 and 2 are usually done in a loop to accommodate the possibility of failure due to a race. Depending on how these steps are implemented, the resulting queue will have different characteristics. For instance, the more sub-steps there are between reserve and publish in step 2, the higher the chance that one thread will delay other threads by being descheduled.
The queue uses a buf array to store elements. It uses a seq array to store longs which serve as:
buf via volatile write (can even be relaxed to ordered store).
See comments in the offer/poll methods for more details on seq.
The benefit of using seq + head/tail counters is that there are no allocations during enqueue/dequeue and very little overhead. The downside is that it doubles (on 64bit) or triples (compressed OOPs) the amount of memory needed for a queue.
Concurrent enqueues and concurrent dequeues are possible. However, there is no helping, so threads can delay other threads, and thus the queue doesn't provide the full set of lock-free guarantees. In practice this is usually not a problem, since the benefits are simplicity, zero GC pressure and speed.
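To make the role of seq more concrete, here is a compact sketch in the style of D. Vyukov's bounded MPMC queue. It is not the ZIO implementation: the class name is hypothetical, AtomicLong and AtomicLongArray stand in for the padded fields and field updaters, and the capacity is assumed to be at least 2 because the sequence encoding used here cannot distinguish "full" from "free" with a single slot.

```scala
import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}

// Sketch of a seq-based bounded MPMC queue (no padding, simplified names).
final class SeqRingBufferSketch[A <: AnyRef](capacity: Int) {
  require(capacity >= 2, "this encoding needs at least 2 slots")

  private val buf  = new Array[AnyRef](capacity)
  private val seq  = new AtomicLongArray(capacity)
  private val head = new AtomicLong(0L)
  private val tail = new AtomicLong(0L)

  // Slot i starts out ready for the enqueue that holds position i.
  (0 until capacity).foreach(i => seq.set(i, i.toLong))

  @annotation.tailrec
  def offer(a: A): Boolean = {
    val t   = tail.get()
    val idx = (t % capacity).toInt
    val s   = seq.get(idx)
    if (s == t) {
      // The slot is free for position t: reserve it with a CAS on tail.
      if (tail.compareAndSet(t, t + 1)) {
        buf(idx) = a
        seq.set(idx, t + 1) // volatile write publishes the element
        true
      } else offer(a) // lost the race to another producer, retry
    } else if (s < t) false // slot not yet dequeued: the queue is full
    else offer(a)           // tail moved on in the meantime, retry
  }

  @annotation.tailrec
  def poll(default: A): A = {
    val h   = head.get()
    val idx = (h % capacity).toInt
    val s   = seq.get(idx)
    if (s == h + 1) {
      // An element for position h has been published: reserve it.
      if (head.compareAndSet(h, h + 1)) {
        val a = buf(idx).asInstanceOf[A]
        buf(idx) = null
        seq.set(idx, h + capacity) // slot is ready for the next lap
        a
      } else poll(default)
    } else if (s < h + 1) default // nothing published yet: the queue is empty
    else poll(default)
  }
}
```

A producer that wins the CAS but is descheduled before the seq write leaves the slot unpublished, so consumers see an empty queue until it resumes. That is the "no helping" trade-off mentioned above.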
There are 2 implementations of a RingBuffer:
RingBufferArb, which supports queues with arbitrary capacity;
RingBufferPow2, which supports queues with only power of 2 capacities.
The reason is that head % N and tail % N are rather cheap when they can be done as a simple mask (N is a power of 2), and pretty expensive when they involve an idiv instruction. The difference is especially pronounced in tight loops (see RoundtripBenchmark).
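The mask trick can be illustrated with a small sketch. The object and method names below are hypothetical; the point is only that for a power of 2 capacity and a non-negative position, the remainder equals the low bits of the position.

```scala
// Index computation for a ring buffer of capacity n (illustrative names).
object RingIndex {
  def isPow2(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  // General case: needs an integer remainder, i.e. a division instruction.
  def modIndex(pos: Long, n: Int): Int = (pos % n).toInt

  // Power of 2 case: the remainder is just the low bits of pos.
  def maskIndex(pos: Long, n: Int): Int = {
    require(isPow2(n), "the mask trick only works for power of 2 capacities")
    (pos & (n - 1)).toInt
  }
}
```

For example, with n = 8 the mask is 7 (binary 111), so position 13 (binary 1101) maps to index 5 (binary 101), the same result as 13 % 8.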
To ensure good performance, reads and writes to the head and tail fields need to be independent, e.g. they shouldn't fall on the same (or an adjacent) cache line. We can make those counters regular volatile long fields and space them out, but we still need a way to do CAS on them. The only way to do this besides Unsafe is to use AtomicLongFieldUpdater, which is exactly what we have here.
See zio.internal.MutableQueueFieldsPadding for more details on padding and the object's memory layout.
The design is heavily inspired by such libraries as
https://github.com/LMAX-Exchange/disruptor and
https://github.com/JCTools/JCTools which is based off D. Vyukov's design
http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
Compared to JCTools this implementation doesn't rely on sun.misc.Unsafe, so it is arguably more portable, and should be easier to read. It's also very extensively commented, including reasoning, assumptions, and hacks.
There is an alternative design described in the paper A Portable Lock-Free Bounded Queue by Pirkelbauer et al. It provides full lock-free guarantees, which generally means that one out of many contending threads is guaranteed to make progress in a finite number of steps. The design thus is not susceptible to threads delaying other threads. However the helping scheme is rather involved and cannot be implemented without allocations (at least I couldn't come up with a way yet). This translates into worse performance on average, and better performance in some very specific situations.
This can be used whenever an arbitrary number of unique keys needs to be generated as this will just use memory location for equality.
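As a minimal illustration of keys that compare by memory location, the sketch below uses a hypothetical field-less class that does not override equals or hashCode, so two keys are equal only when they are the same object. The names are made up for this example.

```scala
// Hypothetical key type: no fields, default reference-based equality.
final class KeySketch

object KeySketchDemo {
  // Every `new KeySketch` is a distinct key, so the set keeps all n of them.
  def distinctKeys(n: Int): Set[KeySketch] =
    List.fill(n)(new KeySketch).toSet
}
```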
A WeakConcurrentBag stores a collection of values, each wrapped in a WeakReference. The structure is optimized for addition, and will achieve zero allocations in the happy path (aside from the allocation of the WeakReference, which is unavoidable). To remove a value from the bag, it is sufficient to clear the corresponding weak reference, at which point the weak reference will be removed from the bag during the next garbage collection.
Garbage collection happens regularly during the add operation. Assuming a uniform distribution of hash codes of values added to the bag, the chance of garbage collection occurring during an add operation is 1/n, where n is the capacity of the table backing the bag.
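The behaviour described above can be approximated with a much simpler structure. In the sketch below, a ConcurrentLinkedQueue of weak references stands in for the backing table, add returns the WeakReference so the caller can clear it, and a purge is triggered when the value's hash code falls on slot 0 out of n. All of these choices, including the class and method names, are assumptions made for illustration and not the ZIO implementation.

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentLinkedQueue

// Simplified sketch of a bag of weakly referenced values.
final class WeakBagSketch[A <: AnyRef](n: Int) {
  require(n > 0, "n must be positive")

  private val refs = new ConcurrentLinkedQueue[WeakReference[A]]()

  // Adds a value and returns its weak reference. With uniformly distributed
  // hash codes, roughly one add in n also triggers a purge.
  def add(value: A): WeakReference[A] = {
    val ref = new WeakReference[A](value)
    refs.add(ref)
    if (Math.floorMod(value.hashCode, n) == 0) gc()
    ref
  }

  // Drops every entry whose referent was cleared or collected.
  def gc(): Unit = {
    refs.removeIf(r => r.get() eq null)
    ()
  }

  def size: Int = refs.size()
}
```

Clearing a returned reference with ref.clear() is enough to mark the value as removed; the entry itself disappears the next time gc() runs.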
(Since version 2.0.0) use Executor
(Since version 2.0.0) use RuntimeConfig
A List data type that tries to avoid allocating by special-casing the singleton list and preventing pattern matching.
Returns an effect that models success with the specified value.
Returns an STM effect that succeeds with the specified value.
(Since version 2.0.0) use Executor