CLI Configuration
InstructLab's configuration is read from the $XDG_CONFIG_DIR/instructlab/config.yaml
file.
The configuration is handled and validated by a Pydantic schema.
- pydantic model instructlab.configuration.Config
Configuration for the InstructLab CLI. Config options are defined by the respective subclasses and are loaded into a single 'Config' object here. Instantiation of this object should be done via 'get_default_config()'. Note that values here can be overridden by a user's 'config.yaml' or by command-line overrides in some cases.
- Fields:
- field metadata: _metadata [Optional]
Metadata pertaining to the specifics of the system to which the configuration is meant to be applied.
- field version: str = '1.0.0'
Configuration file structure version.
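For orientation, a sketch of the overall config.yaml layout follows. The top-level keys mirror the fields above; the empty mappings are placeholders, not valid section contents, and representative contents are sketched in the per-section examples later on this page:

    # $XDG_CONFIG_DIR/instructlab/config.yaml -- skeleton only
    version: "1.0.0"   # configuration file structure version
    general: {}        # top-level options for all commands
    chat: {}           # 'ilab model chat' settings
    generate: {}       # 'ilab data generate' settings
    serve: {}          # 'ilab model serve' settings
    train: {}          # 'ilab model train' settings
    evaluate: {}       # 'ilab model evaluate' settings
    metadata: {}       # system metadata (CPU/GPU details)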
General
- pydantic model instructlab.configuration._general
Class describing various top-level configuration options for all commands.
- field debug_level: int = 0
Debug level for logging.
- Validated by:
after_debug_level
- field log_format: Annotated[str, Strict(strict=True)] = '%(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s'
Log format. See https://docs.python.org/3/library/logging.html#logrecord-attributes
- Constraints:
strict = True
- Validated by:
after_debug_level
validate_log_format
- field log_level: Annotated[str, Strict(strict=True)] = 'INFO'
Log level for logging.
- Constraints:
strict = True
- Validated by:
after_debug_level
validate_log_level
- field use_legacy_tmpl: bool = False
Use the legacy IBM Granite chat template (the default uses the Granite 3.0 Instruct template).
- Validated by:
after_debug_level
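As a concrete illustration, a general section spelled out in config.yaml might look like this; all values shown are the documented defaults:

    general:
      log_level: INFO        # standard Python logging level name
      debug_level: 0
      log_format: "%(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s"
      use_legacy_tmpl: false # false selects the Granite 3.0 Instruct template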
Metadata
- pydantic model instructlab.configuration._metadata
- Fields:
- field cpu_info: str | None = None
Manufacturer, family, and SKU of the system CPU, ex: Apple M3 Max
- field gpu_count: int | None = None
Number of GPUs on the system, ex: 8
- field gpu_family: str | None = None
Family of the system GPU, ex: H100
- field gpu_manufacturer: str | None = None
Manufacturer of the system GPU, ex: Nvidia
- field gpu_sku: list[str] | None = None
Specific SKU-related information about the given GPU, ex: PCIe, NVL
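A sketch of the metadata section, reusing the example values from the field descriptions above; the hardware shown is illustrative:

    metadata:
      cpu_info: Apple M3 Max   # manufacturer, family, and SKU of the CPU
      gpu_manufacturer: Nvidia
      gpu_family: H100
      gpu_count: 8
      gpu_sku: [PCIe, NVL]     # SKU details for the GPU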
ilab model chat
- pydantic model instructlab.configuration._chat
Class describing configuration of the 'chat' sub-command.
- Fields:
- field context: str = 'default'
Predefined setting or environment that influences the behavior and responses of the chat assistant. Each context is associated with a specific prompt that guides the assistant on how to respond to user inputs. Available contexts: default, cli_helper.
- field logs_dir: str [Optional]
Directory where chat logs are stored.
- field max_tokens: int | None = None
The maximum number of tokens that can be generated in the chat completion. Be aware that larger values use more memory.
- field model: str [Optional]
Model to be used for chatting.
- field session: str | None = None
Filepath of a dialog session file.
- field temperature: float = 1.0
Controls the randomness of the model's responses. Lower values make the output more deterministic, while higher values produce more random results.
- field vi_mode: bool = False
Enable vim keybindings for chat.
- field visible_overflow: bool = True
Renders vertical overflow if enabled; otherwise displays an ellipsis.
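A hedged example of the chat section; the model name is illustrative, omitted fields (such as logs_dir) keep their defaults, and the remaining values are the documented defaults:

    chat:
      model: instructlab/granite-7b-lab   # illustrative model name
      context: default                    # or cli_helper
      session: null                       # optional path to a saved dialog session
      max_tokens: null                    # larger values use more memory
      temperature: 1.0
      vi_mode: false
      visible_overflow: true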
ilab model evaluate
- pydantic model instructlab.configuration._evaluate
Class describing configuration of the 'evaluate' sub-command.
- Fields:
- field base_branch: str | None = None
Base taxonomy branch.
- field base_model: str = 'instructlab/granite-7b-lab'
Base model to compare with 'model' for mt_bench_branch and mmlu_branch.
- field branch: str | None = None
Taxonomy branch containing custom skills/knowledge that should be used for evaluation runs.
- field gpus: int | None = None
Number of GPUs to use for running evaluation.
- field mmlu: _mmlu [Optional]
MMLU benchmarking settings.
- field mmlu_branch: _mmlubranch [Optional]
Settings to run MMLU against a branch of taxonomy containing custom skills/knowledge used for training.
- field model: str | None = None
Model to be evaluated.
- field mt_bench: _mtbench [Optional]
Multi-turn benchmarking settings for skills.
- field mt_bench_branch: _mtbenchbranch [Optional]
Settings to run MT-Bench against a branch of taxonomy containing custom skills/knowledge used for training.
- pydantic model instructlab.configuration._mmlu
Class describing configuration of the MMLU evaluation benchmark.
- field batch_size: str | int = 'auto'
Batch size for evaluation. Valid values are a positive integer or 'auto' to select the largest batch size that will fit in memory.
- field few_shots: int = 5
Number of question-answer pairs provided in the context preceding the question used for evaluation.
- pydantic model instructlab.configuration._mmlubranch
Class describing configuration of the MMLUBranch evaluation benchmark.
- Fields:
- field tasks_dir: str [Optional]
Directory where custom MMLU tasks are stored.
- pydantic model instructlab.configuration._mtbench
Class describing configuration of the MTBench evaluation benchmark.
- field judge_model: str [Optional]
Judge model for mt_bench and mt_bench_branch.
- field max_workers: str | int = 'auto'
Number of workers to use for evaluation with mt_bench or mt_bench_branch. Must be a positive integer or 'auto'.
- field output_dir: str [Optional]
Directory where evaluation results are stored.
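Putting the evaluate model and its nested benchmark models together, a sketch might look like the following; the directory and model paths are illustrative placeholders, not documented defaults:

    evaluate:
      model: null                       # model to be evaluated
      base_model: instructlab/granite-7b-lab
      branch: null                      # taxonomy branch with custom skills/knowledge
      base_branch: null
      gpus: null
      mmlu:
        few_shots: 5
        batch_size: auto                # positive integer or 'auto'
      mmlu_branch:
        tasks_dir: /path/to/mmlu/tasks  # illustrative
      mt_bench:
        judge_model: /path/to/judge-model  # illustrative
        output_dir: /path/to/eval/results  # illustrative
        max_workers: auto
      mt_bench_branch:
        taxonomy_path: /path/to/taxonomy   # illustrative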
ilab data generate
- pydantic model instructlab.configuration._generate
Class describing configuration of the 'generate' sub-command.
- Fields:
- field chunk_word_count: Annotated[int, Gt(gt=0)] = 1000
Maximum number of words per chunk.
- Constraints:
gt = 0
- field max_num_tokens: int | None = 4096
The maximum number of tokens for the model to generate during knowledge generation. A lower number yields less data but a faster SDG run. Lowering this value is recommended on consumer hardware.
- field model: Annotated[str, Strict(strict=True)] [Optional]
Teacher model that will be used to synthetically generate training data.
- Constraints:
strict = True
- field num_cpus: Annotated[int, Gt(gt=0)] = 10
Number of CPU cores to use for generation.
- Constraints:
gt = 0
- field output_dir: Annotated[str, Strict(strict=True)] [Optional]
Directory where generated datasets are stored.
- Constraints:
strict = True
- field pipeline: str | None = 'full'
Data generation pipeline to use. Available: 'simple', 'full', or a valid path to a directory of pipeline workflow YAML files. Note that 'full' requires a larger teacher model, Mixtral-8x7B.
- field sdg_scale_factor: Annotated[int, Gt(gt=0)] | None = 30
The total number of instructions to be generated.
- field taxonomy_base: Annotated[str, Strict(strict=True)] = 'origin/main'
Branch of the taxonomy used to calculate the diff against.
- Constraints:
strict = True
- field taxonomy_path: Annotated[str, Strict(strict=True)] [Optional]
Directory where taxonomy is stored and accessed from.
- Constraints:
strict = True
- field teacher: _serve [Optional]
Teacher configuration.
- num_instructions: int | None
Data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.
- msg
The deprecation message to be emitted.
- wrapped_property
The property instance if the deprecated field is a computed field, or None.
- field_name
The name of the field being deprecated.
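A sketch of the generate section using the documented defaults; the teacher model name and paths are illustrative, and the nested teacher block (a _serve configuration, see the serve section below) is elided:

    generate:
      pipeline: full          # 'simple', 'full', or a path to pipeline YAML files
      model: mistralai/Mixtral-8x7B-Instruct-v0.1  # illustrative teacher model
      taxonomy_path: /path/to/taxonomy             # illustrative
      taxonomy_base: origin/main
      num_cpus: 10
      chunk_word_count: 1000
      sdg_scale_factor: 30
      max_num_tokens: 4096    # lower this on consumer hardware for faster SDG runs
      output_dir: /path/to/datasets                # illustrative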
ilab model serve
- pydantic model instructlab.configuration._serve
Class describing configuration of the 'serve' sub-command.
- Fields:
- field backend: str | None = None
Serving backend to use to host the model.
- Constraints:
pattern = vllm|llama-cpp
- field chat_template: str | None = None
Chat template to supply to the model. Possible values: 'auto' (default), 'tokenizer', or a path to a jinja2 file.
- field llama_cpp: _serve_llama_cpp [Optional]
llama-cpp serving settings.
- field model_path: Annotated[str, Strict(strict=True)] [Optional]
Directory where the model to be served is stored.
- Constraints:
strict = True
- field server: _serve_server = _serve_server(host='127.0.0.1', port=8000)
Server configuration including host and port.
- field vllm: _serve_vllm [Optional]
vLLM serving settings.
- api_base()
Returns the server API URL, based on the configured host and port.
- pydantic model instructlab.configuration._serve_llama_cpp
Class describing configuration of the llama-cpp serving backend.
- field gpu_layers: int = -1
Number of model layers to offload to GPU. -1 means all layers.
- field llm_family: str = ''
Large language model family, ex: granite, mixtral.
- field max_ctx_size: Annotated[int, Gt(gt=0)] = 4096
Maximum number of tokens that can be processed by the model.
- Constraints:
gt = 0
- pydantic model instructlab.configuration._serve_vllm
Class describing configuration of the vLLM serving backend.
- Fields:
- field gpus: int | None = None
Number of GPUs to use.
- field llm_family: str = ''
Large language model family, ex: granite, mixtral.
- field max_startup_attempts: int | None = 120
Maximum number of attempts to start the vLLM server.
- field vllm_args: list[str] | None [Optional]
vLLM-specific arguments. All settings can be passed as a list of strings, see: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
- pydantic model instructlab.configuration._serve_server
Class describing configuration of the server serving backend.
- Fields:
- field host: Annotated[str, Strict(strict=True)] = '127.0.0.1'
Host to serve on.
- Constraints:
strict = True
- field port: Annotated[int, Strict(strict=True)] = 8000
Port to serve on.
- Constraints:
strict = True
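Combining the serve model with its nested vllm, llama_cpp, and server models, a hedged sketch follows; the model path is illustrative, only the backend named in 'backend' is used at a time, and the remaining values are the documented defaults:

    serve:
      model_path: /path/to/model        # illustrative
      backend: vllm                     # vllm or llama-cpp (default: null)
      chat_template: auto               # 'auto', 'tokenizer', or a jinja2 file path
      server:
        host: 127.0.0.1
        port: 8000
      vllm:
        llm_family: ''                  # ex: granite, mixtral
        gpus: null
        max_startup_attempts: 120
        vllm_args: ["--dtype", "auto"]  # passed through to vLLM
      llama_cpp:
        gpu_layers: -1                  # -1 offloads all layers
        max_ctx_size: 4096
        llm_family: ''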
ilab model train
- pydantic model instructlab.configuration._train
Class describing configuration of the 'train' sub-command.
- field additional_args: dict[str, Any] [Optional]
Additional arguments to pass to the training script. These arguments are passed as key-value pairs to the training script.
- field checkpoint_at_epoch: bool = True
Save a checkpoint at the end of each epoch.
- field ckpt_output_dir: str [Optional]
Directory where periodic training checkpoints are stored.
- field data_output_dir: str [Optional]
Directory where the processed training data is stored (post filtering/tokenization/masking).
- field data_path: str [Optional]
For the training library (primary training method), this specifies the path to the dataset file. For legacy training (macOS/Linux), this specifies the path to the directory.
- field deepspeed_cpu_offload_optimizer: bool = False
Allow CPU offload for the DeepSpeed optimizer.
- field device: str = 'cpu'
PyTorch device to use. Use 'cpu' for 'simple' and 'full' training on Linux. Use 'mps' for 'full' training on macOS Metal Performance Shaders. Use 'cuda' for Nvidia CUDA / AMD ROCm GPUs. Use 'hpu' for Intel Gaudi GPUs.
- Constraints:
pattern = cpu|mps|cuda|hpu
- field disable_flash_attn: bool | None = False
Whether or not to disable the use of flash attention during training. This is useful when using older GPUs.
- field distributed_backend: DistributedBackend = DistributedBackend.FSDP
Pick a distributed training backend framework for GPU-accelerated full fine-tuning.
- field effective_batch_size: int = 64
The number of samples in a batch that the model should see before its parameters are updated.
- field fsdp_cpu_offload_optimizer: bool = False
Allow CPU offload for the FSDP optimizer.
- field is_padding_free: bool = False
Boolean to indicate if the model being trained is a padding-free transformer model such as Granite.
- field lora_quantize_dtype: str | None = 'nf4'
The data type for quantization in LoRA training. Valid options are 'None' and 'nf4'.
- field lora_rank: int | None = 0
Rank of the low-rank matrices to be used during training.
- field max_batch_len: int = 5000
Maximum tokens per GPU for each batch that will be handled in a single step. If running into out-of-memory errors, this value can be lowered, but not below max_seq_len.
- field max_seq_len: int = 4096
Maximum sequence length to be included in the training set. Samples exceeding this length will be dropped.
- field model_path: str = 'instructlab/granite-7b-lab'
Directory where the model to be trained is stored.
- field nproc_per_node: int = 1
Number of GPUs to use for training. This value is not supported in legacy training or on macOS.
- field num_epochs: int = 10
Number of epochs to run training for.
- field phased_base_dir: str | None [Optional]
Base directory for organization of end-to-end intermediate outputs.
- field phased_mt_bench_judge: str | None [Optional]
Judge model path for phased MT-Bench evaluation.
- field phased_phase1_effective_batch_size: int | None = 128
Phased phase1 effective batch size.
- field phased_phase1_learning_rate: float = 2e-05
Learning rate for phase1 knowledge training.
- Constraints:
ge = 0
- field phased_phase1_num_epochs: int | None = 7
Number of epochs to run training for during phase1 (the experimentally optimal number is 7).
- Constraints:
gt = 0
- field phased_phase1_samples_per_save: int = 0
Number of samples the model should see before saving a checkpoint during phase1. Disabled when set to 0.
- Constraints:
ge = 0
- field phased_phase2_effective_batch_size: int | None = 3840
Phased phase2 effective batch size.
- field phased_phase2_learning_rate: float = 6e-06
Learning rate for phase2 skills training.
- Constraints:
ge = 0
- field phased_phase2_num_epochs: int | None = 10
Number of epochs to run training for during phase2.
- Constraints:
gt = 0
- field phased_phase2_samples_per_save: int = 0
Number of samples the model should see before saving a checkpoint during phase2. Disabled when set to 0.
- Constraints:
ge = 0
- field pipeline: str = 'full'
Training pipeline to use. Simple is for systems with limited resources, full is for more capable consumer systems (64 GB of RAM), and accelerated is for systems with a dedicated GPU.
- Constraints:
pattern = simple|full|accelerated
- field save_samples: int = 250000
Number of samples the model should see before saving a checkpoint.
- field training_journal: str | None = None
Optional path to a YAML file that tracks the progress of multiphase training.
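Finally, a hedged sketch of the train section using the documented defaults; the data and checkpoint paths are illustrative, and the phased_* fields are omitted so their defaults apply:

    train:
      pipeline: full                    # simple | full | accelerated
      device: cpu                       # cpu | mps | cuda | hpu
      model_path: instructlab/granite-7b-lab
      data_path: /path/to/dataset.jsonl      # illustrative; a directory for legacy training
      ckpt_output_dir: /path/to/checkpoints  # illustrative
      data_output_dir: /path/to/processed    # illustrative
      max_seq_len: 4096
      max_batch_len: 5000
      num_epochs: 10
      effective_batch_size: 64
      save_samples: 250000
      checkpoint_at_epoch: true
      distributed_backend: fsdp         # fsdp or deepspeed
      nproc_per_node: 1                 # GPUs for accelerated training
      lora_rank: 0                      # 0 disables LoRA
      lora_quantize_dtype: nf4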