package impls
Type Members
- class OneElementConcurrentQueue[A] extends MutableConcurrentQueue[A] with Serializable
This is a specialized implementation of MutableConcurrentQueue of capacity 1. Since queues of capacity 1 are used by default under the hood in Streams as intermediate resources, they should be very cheap to create and throw away. Hence this queue is optimized (unlike RingBuffer*) for a very small footprint, while still being plenty fast.
Allocating an instance takes only 24 bytes, plus 8+ bytes for the LongAdder (so 32+ bytes in total), which is 15x less than the smallest RingBuffer.
zio.internal.impls.OneElementConcurrentQueue object internals:
 OFFSET  SIZE  TYPE                                         DESCRIPTION
      0     4                                               (object header)
      4     4                                               (object header)
      8     4                                               (object header)
     12     4  int                                          OneElementConcurrentQueue.capacity
     16     4  java.util.concurrent.atomic.AtomicReference  OneElementConcurrentQueue.ref
     20     4  java.util.concurrent.atomic.LongAdder        OneElementConcurrentQueue.deqAdder
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
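The sketch below shows how a capacity-1 queue can work with so few fields. It is a minimal illustration under stated assumptions and is not ZIO's source. `OneSlotQueue` is a hypothetical name. It keeps a single AtomicReference that is either null (empty) or holds the element, plus one LongAdder that counts dequeues, matching the two reference fields in the layout above. Offer is a single CAS. Poll retries its CAS only if another consumer wins the race.

```scala
import java.util.concurrent.atomic.{AtomicReference, LongAdder}

// Hypothetical OneSlotQueue: a sketch of a capacity-1 concurrent queue,
// not ZIO's actual OneElementConcurrentQueue.
final class OneSlotQueue[A] {
  // Holds the single element, or null when the queue is empty.
  // Elements are stored as AnyRef, so primitives get boxed.
  private val ref = new AtomicReference[AnyRef](null)
  // Counts successful dequeues; LongAdder keeps contended increments cheap.
  private val deqAdder = new LongAdder

  val capacity: Int = 1

  // Succeeds only if the slot is empty; the CAS lets exactly one racing producer win.
  // Assumes `a` is non-null, because null is the "empty" marker.
  def offer(a: A): Boolean =
    ref.compareAndSet(null, a.asInstanceOf[AnyRef])

  // Returns the element, or `default` if the queue is empty.
  def poll(default: A): A = {
    var result: AnyRef = null
    var done = false
    while (!done) {
      val el = ref.get()
      if (el == null) done = true               // empty: give up
      else if (ref.compareAndSet(el, null)) {   // we won the race for this element
        deqAdder.increment()
        result = el
        done = true
      }
      // otherwise another consumer took it first: re-read and retry
    }
    if (result == null) default else result.asInstanceOf[A]
  }

  def size: Int = if (ref.get() == null) 0 else 1
  def isEmpty: Boolean = size == 0
  def isFull: Boolean = size == 1

  // Both counters are only approximate snapshots under concurrency.
  def dequeuedCount: Long = deqAdder.sum()
  def enqueuedCount: Long = dequeuedCount + size
}
```

The sketch stores no enqueue counter. It derives one from the dequeue count and the current size, which is one way to keep the object down to a single adder.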
- abstract class RingBuffer[A] extends MutableQueueFieldsPadding[A] with Serializable
A lock-free, array-based bounded queue. It is thread-safe and can be used in a multiple-producer/multiple-consumer (MPMC) setting.
Main concepts
A simple array-based queue of size N uses an array buf of size N as its underlying storage. There are 2 pointers, head and tail. An element is enqueued into buf at position tail % N and dequeued from position head % N. Each time an enqueue happens, tail is incremented; similarly, when a dequeue happens, head is incremented. Since the pointers wrap around the array as they are incremented, such a data structure is also called a circular buffer or a ring buffer.
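The head/tail scheme described above can be sketched single-threaded as follows. `SimpleRingBuffer` is a hypothetical name, and the class is deliberately not thread-safe. It only illustrates the index arithmetic and the full/empty checks that the concurrent versions must preserve. head and tail are kept as ever-growing Long counters, and the slot is their value modulo the capacity.

```scala
// Hypothetical SimpleRingBuffer: a single-threaded illustration only, NOT thread-safe.
final class SimpleRingBuffer[A](val capacity: Int) {
  require(capacity > 0, "capacity must be positive")

  private val buf  = new Array[Any](capacity)
  private var head = 0L // total number of dequeues so far
  private var tail = 0L // total number of enqueues so far

  def size: Int = (tail - head).toInt

  // Enqueue at tail % N; fails when the buffer is full.
  def offer(a: A): Boolean =
    if (size == capacity) false
    else {
      buf((tail % capacity).toInt) = a
      tail += 1
      true
    }

  // Dequeue from head % N; returns `default` when the buffer is empty.
  def poll(default: A): A =
    if (head == tail) default
    else {
      val idx = (head % capacity).toInt
      val a   = buf(idx).asInstanceOf[A]
      buf(idx) = null // drop the reference so it can be garbage collected
      head += 1
      a
    }
}
```

With capacity 2, a third offer fails until a poll frees a slot. The next offer then lands at index 0 again, which is the wrap-around that gives the structure its name.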
Because the queue is bounded, enqueue and dequeue may fail, which is captured in the semantics of the offer and poll methods.

Using offer as an example, the algorithm can be broken down roughly into three steps:
1. Find a place to insert an element.
2. Reserve this place, put the element there, and make it visible to other threads (store and publish).
3. If there was no place in step 1, return false; otherwise return true.
Steps 1 and 2 are usually done in a loop to accommodate the possibility of failure due to a race. Depending on how these steps are implemented, the resulting queue will have different characteristics. For instance, the more sub-steps there are between reserve and publish in step 2, the higher the chance that one thread will delay other threads due to being descheduled.
Notes on the design
The queue uses a buf array to store elements. It uses a seq array to store longs, which serve as:
1. an indicator to producer/consumer threads of whether the slot is ready for enqueue/dequeue,
2. an indicator of whether the queue is empty/full,
3. a mechanism to publish changes to buf via a volatile write (which can even be relaxed to an ordered store).
See the comments in the offer/poll methods for more details on seq.

The benefit of using seq + head/tail counters is that there are no allocations during enqueue/dequeue and very little overhead. The downside is that it doubles (on 64-bit) or triples (with compressed OOPs) the amount of memory needed for the queue.

Concurrent enqueues and concurrent dequeues are possible. However, there is no helping, so threads can delay other threads, and thus the queue doesn't provide the full set of lock-free guarantees. In practice this is usually not a problem, since the benefits are simplicity, zero GC pressure, and speed.
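The sketch below makes the role of seq concrete. It follows D. Vyukov's bounded MPMC queue, the design credited further down. It is not ZIO's RingBuffer, and it makes several simplifying assumptions. `VyukovQueue` is a hypothetical name. head and tail are plain AtomicLongs, without padding or AtomicLongFieldUpdater. Elements and sequence numbers live in AtomicReferenceArray/AtomicLongArray. Every write is volatile rather than an ordered store. The capacity must be a power of 2 and at least 2, so the slot index can be computed with a mask. The comments map the code onto steps 1 to 3 of offer.

```scala
import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray, AtomicReferenceArray}

// Hypothetical VyukovQueue: a sketch of D. Vyukov's bounded MPMC queue, not ZIO's RingBuffer.
// Simplifications: plain AtomicLong head/tail (no padding, no AtomicLongFieldUpdater),
// volatile writes everywhere, and a power-of-2 capacity of at least 2.
final class VyukovQueue[A](val capacity: Int) {
  require(capacity >= 2 && (capacity & (capacity - 1)) == 0,
    "capacity must be a power of 2 and at least 2")

  private val mask = capacity - 1L                      // pos & mask == pos % capacity
  private val buf  = new AtomicReferenceArray[AnyRef](capacity)
  // For a thread at position pos, slot (pos & mask) is
  //   ready for enqueue when seq == pos,
  //   ready for dequeue when seq == pos + 1.
  // A consumer frees the slot for the next lap by setting seq to pos + capacity.
  private val seq  = new AtomicLongArray(capacity)
  private val head = new AtomicLong(0L)                 // next position to dequeue
  private val tail = new AtomicLong(0L)                 // next position to enqueue

  for (i <- 0 until capacity) seq.set(i, i.toLong)

  def offer(a: A): Boolean = {
    while (true) {
      val pos  = tail.get()
      val idx  = (pos & mask).toInt                     // step 1: find the slot
      val diff = seq.get(idx) - pos
      if (diff == 0) {
        if (tail.compareAndSet(pos, pos + 1)) {         // step 2: reserve it...
          buf.set(idx, a.asInstanceOf[AnyRef])          // ...store the element...
          seq.set(idx, pos + 1)                         // ...and publish it
          return true                                   // step 3: success
        }
      } else if (diff < 0) {
        return false                                    // step 3: slot not yet consumed, queue is full
      }
      // diff > 0 or lost the CAS race: another producer moved tail, so retry
    }
    false // unreachable, the loop only exits via return
  }

  def poll(default: A): A = {
    while (true) {
      val pos  = head.get()
      val idx  = (pos & mask).toInt
      val diff = seq.get(idx) - (pos + 1)
      if (diff == 0) {
        if (head.compareAndSet(pos, pos + 1)) {
          val el = buf.get(idx)
          buf.set(idx, null)                            // drop the reference for the GC
          seq.set(idx, pos + capacity)                  // hand the slot to the next lap's producer
          return el.asInstanceOf[A]
        }
      } else if (diff < 0) {
        return default                                  // nothing published here yet: empty
      }
      // diff > 0 or lost the CAS race: another consumer moved head, so retry
    }
    default // unreachable
  }

  // Approximate under concurrency: head and tail are read at different times.
  def size: Int = math.max(0L, math.min(capacity.toLong, tail.get() - head.get())).toInt
}
```

The capacity check in this sketch matters. With a single slot, the "filled at position p" state and the "free for position p + 1" state would both be encoded as seq == p + 1, so the scheme cannot tell them apart.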
There are 2 implementations of a RingBuffer:
1. RingBufferArb, which supports queues with arbitrary capacity;
2. RingBufferPow2, which supports only queues whose capacity is a power of 2.

The reason is that head % N and tail % N are cheap when they can be computed as a simple mask (N is a power of 2), and fairly expensive when they involve an idiv instruction. The difference is especially pronounced in tight loops (see RoundtripBenchmark).

To ensure good performance, reads/writes to the head and tail fields need to be independent, i.e. the two fields shouldn't fall on the same (or adjacent) cache line. We can make those counters regular volatile long fields and space them out, but we still need a way to do CAS on them. The only way to do this, apart from Unsafe, is to use AtomicLongFieldUpdater, which is exactly what we have here.

- See also zio.internal.impls.padding.MutableQueueFieldsPadding for more details on padding and the object's memory layout.

The design is heavily inspired by such libraries as https://github.com/LMAX-Exchange/disruptor and https://github.com/JCTools/JCTools, which is based on D. Vyukov's design http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue. Compared to JCTools, this implementation doesn't rely on sun.misc.Unsafe, so it is arguably more portable and should be easier to read. It's also very extensively commented, including reasoning, assumptions, and hacks.

Alternative designs
There is an alternative design described in the paper "A Portable Lock-Free Bounded Queue" by Pirkelbauer et al. It provides full lock-free guarantees, which generally means that one out of many contending threads is guaranteed to make progress in a finite number of steps. The design is thus not susceptible to threads delaying other threads. However, the helping scheme is rather involved and cannot be implemented without allocations (at least I couldn't come up with a way yet). This translates into worse performance on average, and better performance in some very specific situations.
- class RingBufferArb[A] extends RingBuffer[A]
- class RingBufferPow2[A] extends RingBuffer[A]
Value Members
- object RingBuffer extends Serializable
- object RingBufferArb extends Serializable
- object RingBufferPow2 extends Serializable