trait HasLlamaCppModelProperties extends AnyRef
Contains settable model parameters for the AutoGGUFModel.
- Self Type
- HasLlamaCppModelProperties with ParamsAndFeaturesWritable with HasProtectedParams
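Example
A minimal usage sketch (assuming this trait is mixed into AutoGGUFModel from com.johnsnowlabs.nlp.annotators.seq2seq and that a default pretrained model is available; both are assumptions, not guarantees of this page). Later examples on this page reuse the resulting model value.

  import com.johnsnowlabs.nlp.annotators.seq2seq.AutoGGUFModel

  // Illustrative configuration: the setter names come from the member list
  // below, the chosen values are examples only.
  val model = AutoGGUFModel
    .pretrained()                 // assumed default pretrained loader
    .setInputCols("document")
    .setOutputCol("completions")
    .setNCtx(4096)                // prompt context size
    .setNGpuLayers(99)            // offload as many layers as fit into VRAM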
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val chatTemplate: Param[String]
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
- val defragmentationThreshold: FloatParam
- val disableLog: BooleanParam
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- val flashAttention: BooleanParam
- def getChatTemplate: String
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def getDefragmentationThreshold: Float
- def getDisableLog: Boolean
- def getFlashAttention: Boolean
- def getLogVerbosity: Int
- def getMainGpu: Int
- def getMetadata: String
Get the metadata for the model
- def getMetadataMap: Map[String, Map[String, String]]
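For example, a sketch of reading the metadata, reusing the model value from the example at the top of this page:

  // getMetadata returns the raw metadata string; getMetadataMap parses it
  // into nested key/value maps.
  val raw: String = model.getMetadata
  val parsed: Map[String, Map[String, String]] = model.getMetadataMap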
- def getModelDraft: String
- def getModelParameters: ModelParameters
- Attributes
- protected
- def getNBatch: Int
- def getNCtx: Int
- def getNDraft: Int
- def getNGpuLayers: Int
- def getNGpuLayersDraft: Int
- def getNThreads: Int
- def getNThreadsBatch: Int
- def getNUbatch: Int
- def getNoKvOffload: Boolean
- def getNuma: String
- def getRopeFreqBase: Float
- def getRopeFreqScale: Float
- def getRopeScalingType: String
- def getSplitMode: String
- def getSystemPrompt: String
- def getUseMlock: Boolean
- def getUseMmap: Boolean
- def getYarnAttnFactor: Float
- def getYarnBetaFast: Float
- def getYarnBetaSlow: Float
- def getYarnExtFactor: Float
- def getYarnOrigCtx: Int
- val gpuSplitMode: Param[String]
Set how to split the model across GPUs
- NONE: No GPU split
- LAYER: Split the model across GPUs by layer
- ROW: Split the model across GPUs by rows
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val logVerbosity: IntParam
- val logger: Logger
- Attributes
- protected
- val mainGpu: IntParam
- val metadata: (HasLlamaCppModelProperties.this)#ProtectedParam[String]
- val modelDraft: Param[String]
- val nBatch: IntParam
- val nCtx: IntParam
- val nDraft: IntParam
- val nGpuLayers: IntParam
- val nGpuLayersDraft: IntParam
- val nThreads: IntParam
- val nThreadsBatch: IntParam
- val nUbatch: IntParam
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val noKvOffload: BooleanParam
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- val numaStrategy: Param[String]
Set optimization strategies that help on some NUMA systems (if available)
Available Strategies:
- DISABLED: No NUMA optimizations
- DISTRIBUTE: Spread execution evenly over all nodes
- ISOLATE: Only spawn threads on CPUs on the node that execution started on
- NUMA_CTL: Use the CPU map provided by numactl
- MIRROR: Mirrors the model across NUMA nodes
- val ropeFreqBase: FloatParam
- val ropeFreqScale: FloatParam
- val ropeScalingType: Param[String]
Set the RoPE frequency scaling method, defaults to linear unless specified by the model.
- NONE: Don't use any scaling
- LINEAR: Linear scaling
- YARN: YaRN RoPE scaling
- def setChatTemplate(chatTemplate: String): HasLlamaCppModelProperties.this
The chat template to use
- def setDefragmentationThreshold(defragThold: Float): HasLlamaCppModelProperties.this
Set the KV cache defragmentation threshold
- def setDisableLog(disableLog: Boolean): HasLlamaCppModelProperties.this
- def setFlashAttention(flashAttention: Boolean): HasLlamaCppModelProperties.this
Whether to enable Flash Attention
- def setGpuSplitMode(splitMode: String): HasLlamaCppModelProperties.this
Set how to split the model across GPUs (see the sketch below)
- NONE: No GPU split
- LAYER: Split the model across GPUs by layer
- ROW: Split the model across GPUs by rows
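A minimal multi-GPU sketch, reusing the model value from the example at the top of this page (values are illustrative):

  // Split the model by layer across the available GPUs, keeping scratch
  // buffers and small tensors on GPU 0 (see setMainGpu below).
  model
    .setGpuSplitMode("LAYER")
    .setMainGpu(0)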
- def setLogVerbosity(logVerbosity: Int): HasLlamaCppModelProperties.this
Set the verbosity threshold. Messages with a higher verbosity will be ignored.
Values map to the following:
- GGML_LOG_LEVEL_NONE = 0
- GGML_LOG_LEVEL_DEBUG = 1
- GGML_LOG_LEVEL_INFO = 2
- GGML_LOG_LEVEL_WARN = 3
- GGML_LOG_LEVEL_ERROR = 4
- GGML_LOG_LEVEL_CONT = 5 (continue previous log)
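For example, per the mapping above (a sketch reusing the model value from the top of this page):

  // Ignore any message with a verbosity above GGML_LOG_LEVEL_INFO (2).
  model.setLogVerbosity(2)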
- def setMainGpu(mainGpu: Int): HasLlamaCppModelProperties.this
Set the GPU that is used for scratch and small tensors
- def setMetadata(metadata: String): HasLlamaCppModelProperties.this
Set the metadata for the model
- def setModelDraft(modelDraft: String): HasLlamaCppModelProperties.this
Set the draft model for speculative decoding
- def setNBatch(nBatch: Int): HasLlamaCppModelProperties.this
Set the logical batch size for prompt processing (must be >=32 to use BLAS)
- def setNCtx(nCtx: Int): HasLlamaCppModelProperties.this
Set the size of the prompt context
- def setNDraft(nDraft: Int): HasLlamaCppModelProperties.this
Set the number of tokens to draft for speculative decoding
- def setNGpuLayers(nGpuLayers: Int): HasLlamaCppModelProperties.this
Set the number of layers to store in VRAM (-1: use default)
- def setNGpuLayersDraft(nGpuLayersDraft: Int): HasLlamaCppModelProperties.this
Set the number of layers to store in VRAM for the draft model (-1: use default)
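A hedged sketch combining the three draft-model setters for speculative decoding (the path and values are illustrative only):

  // A small draft model proposes tokens that the main model then verifies.
  model
    .setModelDraft("/models/draft.Q4_K_M.gguf") // illustrative local path
    .setNDraft(8)                               // tokens to draft per step
    .setNGpuLayersDraft(-1)                     // -1: use default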
- def setNThreads(nThreads: Int): HasLlamaCppModelProperties.this
Set the number of threads to use during generation
- def setNThreadsBatch(nThreadsBatch: Int): HasLlamaCppModelProperties.this
Set the number of threads to use during batch and prompt processing
- def setNUbatch(nUbatch: Int): HasLlamaCppModelProperties.this
Set the physical batch size for prompt processing (must be >=32 to use BLAS)
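An illustrative throughput-tuning sketch (values are examples; both batch sizes must be >=32 to use BLAS, per the setter docs above):

  model
    .setNCtx(8192)        // prompt context size
    .setNBatch(512)       // logical batch size for prompt processing
    .setNUbatch(512)      // physical batch size for prompt processing
    .setNThreads(8)       // threads used during generation
    .setNThreadsBatch(8)  // threads used during batch and prompt processing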
- def setNoKvOffload(noKvOffload: Boolean): HasLlamaCppModelProperties.this
Whether to disable KV offload
- def setNumaStrategy(numa: String): HasLlamaCppModelProperties.this
Set optimization strategies that help on some NUMA systems (if available)
Available Strategies:
- DISABLED: No NUMA optimizations
- DISTRIBUTE: Spread execution evenly over all nodes
- ISOLATE: Only spawn threads on CPUs on the node that execution started on
- NUMA_CTL: Use the CPU map provided by numactl
- MIRROR: Mirrors the model across NUMA nodes
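For example (an illustrative choice; whether it helps depends on the host's NUMA topology):

  model.setNumaStrategy("DISTRIBUTE")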
- def setRopeFreqBase(ropeFreqBase: Float): HasLlamaCppModelProperties.this
Set the RoPE base frequency, used by NTK-aware scaling
- def setRopeFreqScale(ropeFreqScale: Float): HasLlamaCppModelProperties.this
Set the RoPE frequency scaling factor, expands context by a factor of 1/N
- def setRopeScalingType(ropeScalingType: String): HasLlamaCppModelProperties.this
Set the RoPE frequency scaling method, defaults to linear unless specified by the model.
- NONE: Don't use any scaling
- LINEAR: Linear scaling
- YARN: YaRN RoPE scaling
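An illustrative RoPE context-extension sketch combining the setters above (values are examples only):

  model
    .setRopeScalingType("YARN")
    .setRopeFreqBase(10000.0f) // NTK-aware base frequency (illustrative)
    .setRopeFreqScale(0.5f)    // expands context by a factor of 1/0.5 = 2x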
- def setSystemPrompt(systemPrompt: String): HasLlamaCppModelProperties.this
Set a system prompt to use
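For example, a sketch pairing the system prompt with a chat template (the template identifier is hypothetical; see setChatTemplate above):

  model
    .setSystemPrompt("You are a concise assistant.")
    .setChatTemplate("chatml") // hypothetical template name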
- def setUseMlock(useMlock: Boolean): HasLlamaCppModelProperties.this
Whether to force the system to keep the model in RAM rather than swapping or compressing it
- def setUseMmap(useMmap: Boolean): HasLlamaCppModelProperties.this
Whether to memory-map the model (faster load, but may increase pageouts if not using mlock)
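An illustrative memory strategy combining the two flags above:

  // Memory-map the model for a fast load, and pin it in RAM so the OS
  // does not swap or compress it.
  model
    .setUseMmap(true)
    .setUseMlock(true)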
- def setYarnAttnFactor(yarnAttnFactor: Float): HasLlamaCppModelProperties.this
Set the YaRN scale sqrt(t) or attention magnitude
- def setYarnBetaFast(yarnBetaFast: Float): HasLlamaCppModelProperties.this
Set the YaRN low correction dim or beta
- def setYarnBetaSlow(yarnBetaSlow: Float): HasLlamaCppModelProperties.this
Set the YaRN high correction dim or alpha
- def setYarnExtFactor(yarnExtFactor: Float): HasLlamaCppModelProperties.this
Set the YaRN extrapolation mix factor
- def setYarnOrigCtx(yarnOrigCtx: Int): HasLlamaCppModelProperties.this
Set the YaRN original context size of the model
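A hedged sketch of the YaRN knobs above; the values shown mirror common llama.cpp defaults, which is an assumption:

  model
    .setYarnOrigCtx(4096)    // original training context (illustrative)
    .setYarnExtFactor(1.0f)  // extrapolation mix factor
    .setYarnAttnFactor(1.0f) // scale sqrt(t) / attention magnitude
    .setYarnBetaFast(32.0f)  // low correction dim or beta
    .setYarnBetaSlow(1.0f)   // high correction dim or alpha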
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- val systemPrompt: Param[String]
- def toString(): String
- Definition Classes
- AnyRef → Any
- val useMlock: BooleanParam
- val useMmap: BooleanParam
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- val yarnAttnFactor: FloatParam
- val yarnBetaFast: FloatParam
- val yarnBetaSlow: FloatParam
- val yarnExtFactor: FloatParam
- val yarnOrigCtx: IntParam
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
Parameters
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.