Interface InferenceService

All Superinterfaces:
AutoCloseable, Closeable

public interface InferenceService extends Closeable
  • Method Details

    • init

      default void init(Client client)
    • name

      String name()
    • parseRequestConfig

      void parseRequestConfig(String modelId, TaskType taskType, Map<String,Object> config, ActionListener<Model> parsedModelListener)
      Parse model configuration from the config map of a request and return the parsed Model. This requires that both the secrets and service settings be contained in the service_settings field. This function modifies the config map: fields are removed from the map as they are read.

      If the map contains an unrecognized configuration option, an ElasticsearchStatusException is thrown.

      Parameters:
      modelId - Model Id
      taskType - The model task type
      config - Configuration options including the secrets
      parsedModelListener - A listener which will handle the resulting model or failure
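      A minimal sketch of an implementation following the remove-as-read contract above. The MyServiceModel class and the api_key and url settings are hypothetical; only the shape of the pattern is shown.

        @Override
        public void parseRequestConfig(
            String modelId,
            TaskType taskType,
            Map<String, Object> config,
            ActionListener<Model> parsedModelListener
        ) {
            try {
                // In a request, secrets and service settings arrive together
                // under the service_settings field.
                @SuppressWarnings("unchecked")
                Map<String, Object> serviceSettings = (Map<String, Object>) config.remove("service_settings");
                String apiKey = (String) serviceSettings.remove("api_key"); // hypothetical setting
                String url = (String) serviceSettings.remove("url");        // hypothetical setting

                // Any key still present was not recognized: reject the request.
                if (serviceSettings.isEmpty() == false) {
                    throw new ElasticsearchStatusException(
                        "Unrecognized settings {}",
                        RestStatus.BAD_REQUEST,
                        serviceSettings.keySet()
                    );
                }
                parsedModelListener.onResponse(new MyServiceModel(modelId, taskType, url, apiKey));
            } catch (Exception e) {
                parsedModelListener.onFailure(e);
            }
        }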
    • parsePersistedConfigWithSecrets

      Model parsePersistedConfigWithSecrets(String modelId, TaskType taskType, Map<String,Object> config, Map<String,Object> secrets)
      Parse model configuration from a config map read from persisted storage and return the parsed Model. This requires that the secrets and service settings be in two separate maps. This function modifies the config map: fields are removed from the map as they are read. If the map contains unrecognized configuration options, no error is thrown.
      Parameters:
      modelId - Model Id
      taskType - The model task type
      config - Configuration options
      secrets - Sensitive configuration options (e.g. api key)
      Returns:
      The parsed Model
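      In contrast to request parsing, a sketch of the persisted variant: secrets arrive in their own map, and leftover keys are tolerated. MyServiceModel and the field names are again hypothetical.

        @Override
        public Model parsePersistedConfigWithSecrets(
            String modelId,
            TaskType taskType,
            Map<String, Object> config,
            Map<String, Object> secrets
        ) {
            @SuppressWarnings("unchecked")
            Map<String, Object> serviceSettings = (Map<String, Object>) config.remove("service_settings");
            String url = (String) serviceSettings.remove("url"); // hypothetical setting
            String apiKey = (String) secrets.remove("api_key");  // hypothetical secret
            // No leftover-key check: unrecognized persisted options are ignored,
            // so configurations written by a newer version can still be read.
            return new MyServiceModel(modelId, taskType, url, apiKey);
        }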
    • parsePersistedConfig

      Model parsePersistedConfig(String modelId, TaskType taskType, Map<String,Object> config)
      Parse model configuration from a config map read from persisted storage and return the parsed Model. This function modifies the config map: fields are removed from the map as they are read. If the map contains unrecognized configuration options, no error is thrown.
      Parameters:
      modelId - Model Id
      taskType - The model task type
      config - Configuration options
      Returns:
      The parsed Model
    • getConfiguration

      InferenceServiceConfiguration getConfiguration()
    • hideFromConfigurationApi

      default Boolean hideFromConfigurationApi()
      Whether this service should be hidden from the configuration API. Should be used for services that are not yet ready to be used.
    • supportedTaskTypes

      EnumSet<TaskType> supportedTaskTypes()
      The task types supported by the service
      Returns:
      The set of supported task types.
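      For example, a service supporting text embedding and reranking might advertise (the particular task types are an assumption for illustration):

        @Override
        public EnumSet<TaskType> supportedTaskTypes() {
            return EnumSet.of(TaskType.TEXT_EMBEDDING, TaskType.RERANK);
        }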
    • infer

      void infer(Model model, @Nullable String query, List<String> input, boolean stream, Map<String,Object> taskSettings, InputType inputType, TimeValue timeout, ActionListener<InferenceServiceResults> listener)
      Perform inference on the model.
      Parameters:
      model - The model
      query - Inference query, mainly for re-ranking
      input - Inference input
      stream - Stream inference results
      taskSettings - Settings in the request to override the model's defaults
      inputType - The input type, e.g. search or ingest
      timeout - The timeout for the request
      listener - Inference result listener
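      A sketch of a non-streaming call from the caller's side; the service and model instances are assumed to have been obtained elsewhere:

        service.infer(
            model,
            null,                            // no query: not a rerank task
            List.of("a sentence to embed"),  // inference input
            false,                           // do not stream results
            Map.of(),                        // no task setting overrides
            InputType.INGEST,                // input produced by an ingest flow
            TimeValue.timeValueSeconds(30),
            ActionListener.wrap(
                results -> System.out.println("results: " + results),
                e -> System.err.println("inference failed: " + e)
            )
        );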
    • chunkedInfer

      void chunkedInfer(Model model, @Nullable String query, List<String> input, Map<String,Object> taskSettings, InputType inputType, ChunkingOptions chunkingOptions, TimeValue timeout, ActionListener<List<ChunkedInferenceServiceResults>> listener)
      Chunk long text according to chunkingOptions or the model defaults if chunkingOptions contains unset values.
      Parameters:
      model - The model
      query - Inference query, mainly for re-ranking
      input - Inference input
      taskSettings - Settings in the request to override the model's defaults
      inputType - The input type, e.g. search or ingest
      chunkingOptions - The window and span options to apply
      timeout - The timeout for the request
      listener - Chunked Inference result listener
    • start

      void start(Model model, TimeValue timeout, ActionListener<Boolean> listener)
      Start or prepare the model for use.
      Parameters:
      model - The model
      timeout - Start timeout
      listener - The listener
    • stop

      default void stop(UnparsedModel unparsedModel, ActionListener<Boolean> listener)
      Stop the model deployment. The default action does nothing except acknowledge the request (true).
      Parameters:
      unparsedModel - The unparsed model configuration
      listener - The listener
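      A default consistent with that description simply acknowledges the request:

        default void stop(UnparsedModel unparsedModel, ActionListener<Boolean> listener) {
            listener.onResponse(true); // acknowledge without stopping anything
        }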
    • checkModelConfig

      default void checkModelConfig(Model model, ActionListener<Model> listener)
      Optionally test the new model configuration in the inference service. This function should be called when the model is first created; the default action is to do nothing.
      Parameters:
      model - The new model
      listener - The listener
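      One plausible reading of the do-nothing default is to hand the model back without validation (a sketch, not necessarily the actual default body):

        default void checkModelConfig(Model model, ActionListener<Model> listener) {
            listener.onResponse(model); // no validation performed
        }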
    • updateModelWithEmbeddingDetails

      default Model updateModelWithEmbeddingDetails(Model model, int embeddingSize)
      Update a text embedding model's dimensions based on a provided embedding size and set the default similarity if required. The default behaviour is to just return the model.
      Parameters:
      model - The original model without updated embedding details
      embeddingSize - The embedding size to update the model with
      Returns:
      The model with updated embedding details
    • updateModelWithChatCompletionDetails

      default Model updateModelWithChatCompletionDetails(Model model)
      Update a chat completion model's max tokens if required. The default behaviour is to just return the model.
      Parameters:
      model - The original model without updated chat completion details
      Returns:
      The model with updated chat completion details
    • getMinimalSupportedVersion

      TransportVersion getMinimalSupportedVersion()
      Defines the version required across all clusters to use this service
      Returns:
      TransportVersion specifying the version
    • supportedStreamingTasks

      default Set<TaskType> supportedStreamingTasks()
      The set of task types for which this service supports using the streaming API.
      Returns:
      set of supported task types. Defaults to empty.
    • canStream

      default boolean canStream(TaskType taskType)
      Checks the task type against the set of supported streaming tasks returned by supportedStreamingTasks().
      Parameters:
      taskType - the task type to check
      Returns:
      true if the taskType is supported
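      A default matching that description would be:

        default boolean canStream(TaskType taskType) {
            return supportedStreamingTasks().contains(taskType);
        }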
    • defaultConfigIds

      default List<InferenceService.DefaultConfigId> defaultConfigIds()
      Get the Ids and task types of any default configurations provided by this service
      Returns:
      The default configuration Ids and task types
    • defaultConfigs

      default void defaultConfigs(ActionListener<List<Model>> defaultsListener)
      Call the listener with the default model configurations defined by the service
      Parameters:
      defaultsListener - The listener
    • updateModelsWithDynamicFields

      default void updateModelsWithDynamicFields(List<Model> model, ActionListener<List<Model>> listener)