CLI Configuration
InstructLab's configuration is read from the $XDG_CONFIG_DIR/instructlab/config.yaml
file.
The configuration is handled and validated by a Pydantic schema.
- pydantic model instructlab.configuration.Config
Configuration for the InstructLab CLI. Config options are defined by the respective subclasses and are loaded into a single 'Config' object here. Instantiation of this object should be done via 'get_default_config()'. Note that values here can be overridden by a user's 'config.yaml' or by command-line overrides in some cases.
- Fields:
- field metadata: _metadata [Optional]
Metadata pertaining to the specifics of the system to which the configuration is meant to be applied.
- field version: str = '1.0.0'
Configuration file structure version.
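For orientation, a sketch of the overall config.yaml layout follows. The top-level keys mirror the fields above; the empty mappings are placeholders, not valid section contents, and representative contents are sketched in the per-section examples later on this page:

    # $XDG_CONFIG_DIR/instructlab/config.yaml -- skeleton only
    version: "1.0.0"   # configuration file structure version
    general: {}        # top-level options for all commands
    chat: {}           # 'ilab model chat' settings
    generate: {}       # 'ilab data generate' settings
    serve: {}          # 'ilab model serve' settings
    train: {}          # 'ilab model train' settings
    evaluate: {}       # 'ilab model evaluate' settings
    metadata: {}       # system metadata (CPU/GPU details)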
General
- pydantic model instructlab.configuration._general
Class describing various top-level configuration options for all commands.
- field debug_level: int = 0
Debug level for logging.
- Validated by:
after_debug_level
- field log_format: Annotated[str, Strict(strict=True)] = '%(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s'
Log format. See https://docs.python.org/3/library/logging.html#logrecord-attributes
- Constraints:
strict = True
- Validated by:
after_debug_level
validate_log_format
- field log_level: Annotated[str, Strict(strict=True)] = 'INFO'
Log level for logging.
- Constraints:
strict = True
- Validated by:
after_debug_level
validate_log_level
- field use_legacy_tmpl: bool = False
Use the legacy IBM Granite chat template (the default uses the Granite 3.0 Instruct template).
- Validated by:
after_debug_level
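As a concrete illustration, a general section spelled out in config.yaml might look like this; all values shown are the documented defaults:

    general:
      log_level: INFO        # standard Python logging level name
      debug_level: 0
      log_format: "%(levelname)s %(asctime)s %(name)s:%(lineno)d: %(message)s"
      use_legacy_tmpl: false # false selects the Granite 3.0 Instruct template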
Metadata
- pydantic model instructlab.configuration._metadata
- Fields:
- field cpu_info: str | None = None
Manufacturer, family, and SKU of the system CPU, ex: Apple M3 Max
- field gpu_count: int | None = None
Number of GPUs on the system, ex: 8
- field gpu_family: str | None = None
Family of the system GPU, ex: H100
- field gpu_manufacturer: str | None = None
Manufacturer of the system GPU, ex: Nvidia
- field gpu_sku: list[str] | None = None
Specific SKU-related information about the given GPU, ex: PCIe, NVL
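A sketch of the metadata section, reusing the example values from the field descriptions above; the hardware shown is illustrative:

    metadata:
      cpu_info: Apple M3 Max   # manufacturer, family, and SKU of the CPU
      gpu_manufacturer: Nvidia
      gpu_family: H100
      gpu_count: 8
      gpu_sku: [PCIe, NVL]     # SKU details for the GPU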
ilab model chat
- pydantic model instructlab.configuration._chat
Class describing configuration of the 'chat' sub-command.
- Fields:
- field context: str = 'default'
Predefined setting or environment that influences the behavior and responses of the chat assistant. Each context is associated with a specific prompt that guides the assistant on how to respond to user inputs. Available contexts: default, cli_helper.
- field logs_dir: str [Optional]
Directory where chat logs are stored.
- field max_tokens: int | None = None
The maximum number of tokens that can be generated in the chat completion. Be aware that larger values use more memory.
- field model: str [Optional]
Model to be used for chatting.
- field session: str | None = None
Filepath of a dialog session file.
- field temperature: float = 1.0
Controls the randomness of the model's responses. Lower values make the output more deterministic, while higher values produce more random results.
- field vi_mode: bool = False
Enable vim keybindings for chat.
- field visible_overflow: bool = True
Renders vertical overflow if enabled; otherwise displays an ellipsis.
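A hedged example of the chat section; the model name is illustrative, omitted fields (such as logs_dir) keep their defaults, and the remaining values are the documented defaults:

    chat:
      model: instructlab/granite-7b-lab   # illustrative model name
      context: default                    # or cli_helper
      session: null                       # optional path to a saved dialog session
      max_tokens: null                    # larger values use more memory
      temperature: 1.0
      vi_mode: false
      visible_overflow: true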
ilab model evaluate
- pydantic model instructlab.configuration._evaluate
Class describing configuration of the 'evaluate' sub-command.
- Fields:
- field base_branch: str | None = None
Base taxonomy branch.
- field base_model: str = 'instructlab/granite-7b-lab'
Base model to compare with 'model' for mt_bench_branch and mmlu_branch.
- field branch: str | None = None
Taxonomy branch containing custom skills/knowledge that should be used for evaluation runs.
- field gpus: int | None = None
Number of GPUs to use for running evaluation.
- field mmlu: _mmlu [Optional]
MMLU benchmarking settings.
- field mmlu_branch: _mmlubranch [Optional]
Settings to run MMLU against a branch of taxonomy containing custom skills/knowledge used for training.
- field model: str | None = None
Model to be evaluated.
- field mt_bench: _mtbench [Optional]
Multi-turn benchmarking settings for skills.
- field mt_bench_branch: _mtbenchbranch [Optional]
Settings to run MT-Bench against a branch of taxonomy containing custom skills/knowledge used for training.
- pydantic model instructlab.configuration._mmlu
Class describing configuration of the MMLU evaluation benchmark.
- field batch_size: str | int = 'auto'
Batch size for evaluation. Valid values are a positive integer or 'auto' to select the largest batch size that will fit in memory.
- field few_shots: int = 5
Number of question-answer pairs provided in the context preceding the question used for evaluation.
- pydantic model instructlab.configuration._mmlubranch
Class describing configuration of the MMLUBranch evaluation benchmark.
- Fields:
- field tasks_dir: str [Optional]
Directory where custom MMLU tasks are stored.
- pydantic model instructlab.configuration._mtbench
Class describing configuration of the MTBench evaluation benchmark.
- field judge_model: str [Optional]
Judge model for mt_bench and mt_bench_branch.
- field max_workers: str | int = 'auto'
Number of workers to use for evaluation with mt_bench or mt_bench_branch. Must be a positive integer or 'auto'.
- field output_dir: str [Optional]
Directory where evaluation results are stored.
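Putting the evaluate model and its nested benchmark models together, a sketch might look like the following; the directory and model paths are illustrative placeholders, not documented defaults:

    evaluate:
      model: null                       # model to be evaluated
      base_model: instructlab/granite-7b-lab
      branch: null                      # taxonomy branch with custom skills/knowledge
      base_branch: null
      gpus: null
      mmlu:
        few_shots: 5
        batch_size: auto                # positive integer or 'auto'
      mmlu_branch:
        tasks_dir: /path/to/mmlu/tasks  # illustrative
      mt_bench:
        judge_model: /path/to/judge-model  # illustrative
        output_dir: /path/to/eval/results  # illustrative
        max_workers: auto
      mt_bench_branch:
        taxonomy_path: /path/to/taxonomy   # illustrative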
ilab data generate
- pydantic model instructlab.configuration._generate
Class describing configuration of the 'generate' sub-command.
- Fields:
- field chunk_word_count: Annotated[int, Gt(gt=0)] = 1000
Maximum number of words per chunk.
- Constraints:
gt = 0
- field max_num_tokens: int | None = 4096
The maximum number of tokens for the model to generate during knowledge generation. A lower number yields less data but a faster SDG run. Lowering this value is recommended on consumer hardware.
- field model: Annotated[str, Strict(strict=True)] [Optional]
Teacher model that will be used to synthetically generate training data.
- Constraints:
strict = True
- field num_cpus: Annotated[int, Gt(gt=0)] = 10
Number of CPU cores to use for generation.
- Constraints:
gt = 0
- field output_dir: Annotated[str, Strict(strict=True)] [Optional]
Directory where generated datasets are stored.
- Constraints:
strict = True
- field pipeline: str | None = 'full'
Data generation pipeline to use. Available: 'simple', 'full', or a valid path to a directory of pipeline workflow YAML files. Note that 'full' requires a larger teacher model, Mixtral-8x7B.
- field sdg_scale_factor: Annotated[int, Gt(gt=0)] | None = 30
The total number of instructions to be generated.
- field taxonomy_base: Annotated[str, Strict(strict=True)] = 'origin/main'
Branch of the taxonomy used to calculate the diff against.
- Constraints:
strict = True
- field taxonomy_path: Annotated[str, Strict(strict=True)] [Optional]
Directory where taxonomy is stored and accessed from.
- Constraints:
strict = True
- field teacher: _serve [Optional]
Teacher configuration.
- num_instructions: int | None
Data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.
- msg
The deprecation message to be emitted.
- wrapped_property
The property instance if the deprecated field is a computed field, or None.
- field_name
The name of the field being deprecated.
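A sketch of the generate section using the documented defaults; the teacher model name and paths are illustrative, and the nested teacher block (a _serve configuration, see the serve section below) is elided:

    generate:
      pipeline: full          # 'simple', 'full', or a path to pipeline YAML files
      model: mistralai/Mixtral-8x7B-Instruct-v0.1  # illustrative teacher model
      taxonomy_path: /path/to/taxonomy             # illustrative
      taxonomy_base: origin/main
      num_cpus: 10
      chunk_word_count: 1000
      sdg_scale_factor: 30
      max_num_tokens: 4096    # lower this on consumer hardware for faster SDG runs
      output_dir: /path/to/datasets                # illustrative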
ilab model serve
- pydantic model instructlab.configuration._serve
Class describing configuration of the 'serve' sub-command.
- Fields:
- field backend: str | None = None
Serving backend to use to host the model.
- Constraints:
pattern = vllm|llama-cpp
- field chat_template: str | None = None
Chat template to supply to the model. Possible values: 'auto' (default), 'tokenizer', or a path to a jinja2 file.
- field llama_cpp: _serve_llama_cpp [Optional]
llama-cpp serving settings.
- field model_path: Annotated[str, Strict(strict=True)] [Optional]
Directory where the model to be served is stored.
- Constraints:
strict = True
- field server: _serve_server = _serve_server(host='127.0.0.1', port=8000)
Server configuration including host and port.
- field vllm: _serve_vllm [Optional]
vLLM serving settings.
- api_base()
Returns the server API URL, based on the configured host and port.
- pydantic model instructlab.configuration._serve_llama_cpp
Class describing configuration of the llama-cpp serving backend.
- field gpu_layers: int = -1
Number of model layers to offload to GPU. -1 means all layers.
- field llm_family: str = ''
Large language model family, ex: granite, mixtral.
- field max_ctx_size: Annotated[int, Gt(gt=0)] = 4096
Maximum number of tokens that can be processed by the model.
- Constraints:
gt = 0
- pydantic model instructlab.configuration._serve_vllm
Class describing configuration of the vLLM serving backend.
- Fields:
- field gpus: int | None = None
Number of GPUs to use.
- field llm_family: str = ''
Large language model family, ex: granite, mixtral.
- field max_startup_attempts: int | None = 120
Maximum number of attempts to start the vLLM server.
- field vllm_args: list[str] | None [Optional]
vLLM-specific arguments. All settings can be passed as a list of strings, see: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
- pydantic model instructlab.configuration._serve_server
Class describing configuration of the server serving backend.
- Fields:
- field host: Annotated[str, Strict(strict=True)] = '127.0.0.1'
Host to serve on.
- Constraints:
strict = True
- field port: Annotated[int, Strict(strict=True)] = 8000
Port to serve on.
- Constraints:
strict = True
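Combining the serve model with its nested vllm, llama_cpp, and server models, a hedged sketch follows; the model path is illustrative, only the backend named in 'backend' is used at a time, and the remaining values are the documented defaults:

    serve:
      model_path: /path/to/model        # illustrative
      backend: vllm                     # vllm or llama-cpp (default: null)
      chat_template: auto               # 'auto', 'tokenizer', or a jinja2 file path
      server:
        host: 127.0.0.1
        port: 8000
      vllm:
        llm_family: ''                  # ex: granite, mixtral
        gpus: null
        max_startup_attempts: 120
        vllm_args: ["--dtype", "auto"]  # passed through to vLLM
      llama_cpp:
        gpu_layers: -1                  # -1 offloads all layers
        max_ctx_size: 4096
        llm_family: ''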
ilab model train
- pydantic model instructlab.configuration._train
Class describing configuration of the 'train' sub-command.
- field additional_args: dict[str, Any] [Optional]
Additional arguments to pass to the training script. These arguments are passed as key-value pairs to the training script.
- field checkpoint_at_epoch: bool = True
Save a checkpoint at the end of each epoch.
- field ckpt_output_dir: str [Optional]
Directory where periodic training checkpoints are stored.
- field data_output_dir: str [Optional]
Directory where the processed training data is stored (post filtering/tokenization/masking).
- field data_path: str [Optional]
For the training library (primary training method), this specifies the path to the dataset file. For legacy training (macOS/Linux), this specifies the path to the directory.
- field deepspeed_cpu_offload_optimizer: bool = False
Allow CPU offload for the DeepSpeed optimizer.
- field device: str = 'cpu'
PyTorch device to use. Use 'cpu' for 'simple' and 'full' training on Linux. Use 'mps' for 'full' training on macOS Metal Performance Shaders. Use 'cuda' for Nvidia CUDA / AMD ROCm GPUs. Use 'hpu' for Intel Gaudi GPUs.
- Constraints:
pattern = cpu|mps|cuda|hpu
- field disable_flash_attn: bool | None = False
Whether or not to disable the use of flash attention during training. This is useful when using older GPUs.
- field distributed_backend: DistributedBackend = DistributedBackend.FSDP
Pick a distributed training backend framework for GPU-accelerated full fine-tuning.
- field effective_batch_size: int = 64
The number of samples in a batch that the model should see before its parameters are updated.
- field fsdp_cpu_offload_optimizer: bool = False
Allow CPU offload for the FSDP optimizer.
- field is_padding_free: bool = False
Boolean to indicate if the model being trained is a padding-free transformer model such as Granite.
- field lora_quantize_dtype: str | None = 'nf4'
The data type for quantization in LoRA training. Valid options are 'None' and 'nf4'.
- field lora_rank: int | None = 0
Rank of the low-rank matrices to be used during training.
- field max_batch_len: int = 5000
Maximum tokens per GPU for each batch that will be handled in a single step. If running into out-of-memory errors, this value can be lowered, but not below max_seq_len.
- field max_seq_len: int = 4096
Maximum sequence length to be included in the training set. Samples exceeding this length will be dropped.
- field model_path: str = 'instructlab/granite-7b-lab'
Directory where the model to be trained is stored.
- field nproc_per_node: int = 1
Number of GPUs to use for training. This value is not supported in legacy training or on macOS.
- field num_epochs: int = 10
Number of epochs to run training for.
- field phased_base_dir: str | None [Optional]
Base directory for organization of end-to-end intermediate outputs.
- field phased_mt_bench_judge: str | None [Optional]
Judge model path for phased MT-Bench evaluation.
- field phased_phase1_effective_batch_size: int | None = 128
Phased phase1 effective batch size.
- field phased_phase1_learning_rate: float = 2e-05
Learning rate for phase1 knowledge training.
- Constraints:
ge = 0
- field phased_phase1_num_epochs: int | None = 7
Number of epochs to run training for during phase1 (the experimentally optimal number is 7).
- Constraints:
gt = 0
- field phased_phase1_samples_per_save: int = 0
Number of samples the model should see before saving a checkpoint during phase1. Disabled when set to 0.
- Constraints:
ge = 0
- field phased_phase2_effective_batch_size: int | None = 3840
Phased phase2 effective batch size.
- field phased_phase2_learning_rate: float = 6e-06
Learning rate for phase2 skills training.
- Constraints:
ge = 0
- field phased_phase2_num_epochs: int | None = 10
Number of epochs to run training for during phase2.
- Constraints:
gt = 0
- field phased_phase2_samples_per_save: int = 0
Number of samples the model should see before saving a checkpoint during phase2. Disabled when set to 0.
- Constraints:
ge = 0
- field pipeline: str = 'full'
Training pipeline to use. Simple is for systems with limited resources, full is for more capable consumer systems (64 GB of RAM), and accelerated is for systems with a dedicated GPU.
- Constraints:
pattern = simple|full|accelerated
- field save_samples: int = 250000
Number of samples the model should see before saving a checkpoint.
- field training_journal: str | None = None
Optional path to a YAML file that tracks the progress of multiphase training.
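Finally, a hedged sketch of the train section using the documented defaults; the data and checkpoint paths are illustrative, and the phased_* fields are omitted so their defaults apply:

    train:
      pipeline: full                    # simple | full | accelerated
      device: cpu                       # cpu | mps | cuda | hpu
      model_path: instructlab/granite-7b-lab
      data_path: /path/to/dataset.jsonl      # illustrative; a directory for legacy training
      ckpt_output_dir: /path/to/checkpoints  # illustrative
      data_output_dir: /path/to/processed    # illustrative
      max_seq_len: 4096
      max_batch_len: 5000
      num_epochs: 10
      effective_batch_size: 64
      save_samples: 250000
      checkpoint_at_epoch: true
      distributed_backend: fsdp         # fsdp or deepspeed
      nproc_per_node: 1                 # GPUs for accelerated training
      lora_rank: 0                      # 0 disables LoRA
      lora_quantize_dtype: nf4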