API Reference
Core Classes
- class outformer.core.jsonformer.Jsonformer(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)[source]
Bases: object
A class that generates structured JSON outputs from language models.
Only generates content values, not structural elements
Follows the provided JSON schema
Builds the JSON object incrementally
Uses a token processor to stop generation at the appropriate time
This ensures that the output is always a valid JSON object conforming to the specified schema.
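The division of labour described above can be illustrated with a minimal sketch. It is not the library's implementation: `fill_schema`, `stub_model` and the one-element array rule are assumptions made for illustration, and the language model is replaced by a stub callback. The code writes all structure itself and asks the callback only for leaf values, so the result is valid JSON by construction.

```python
import json
from typing import Any, Callable, Dict


def fill_schema(
    schema: Dict[str, Any],
    generate_value: Callable[[str, str], Any],
    path: str = "$",
) -> Any:
    """Walk the schema; emit structure here, delegate leaf values to the callback."""
    kind = schema["type"]
    if kind == "object":
        # Keys come from the schema, never from the model.
        return {
            key: fill_schema(sub, generate_value, f"{path}.{key}")
            for key, sub in schema["properties"].items()
        }
    if kind == "array":
        # Assumption for this sketch: always produce exactly one element.
        return [fill_schema(schema["items"], generate_value, f"{path}[0]")]
    # Leaf value: the only place where a model would be consulted.
    return generate_value(kind, path)


def stub_model(kind: str, path: str) -> Any:
    """Hypothetical stand-in for a language model."""
    return {"string": "example", "number": 1.5, "boolean": True}[kind]


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

result = fill_schema(schema, stub_model)
serialized = json.dumps(result)
```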
- __init__(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)[source]
Initialize a Jsonformer instance.
- Parameters:
model (PreTrainedModel) – The model to use for generation
tokenizer (PreTrainedTokenizer) – The tokenizer to use for generation
debug (bool) – Whether to print debug information
max_array_length (int) – The maximum number of elements in an array
max_tokens_number (int) – The maximum number of tokens in a number
max_tokens_string (int) – The maximum number of tokens in a string
temperature (float) – The temperature to use for generation
generation_marker (str) – The marker used to track the current generation position in the JSON
max_attempts (int) – The maximum number of attempts for value generation (currently used in number generation)
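A minimal construction sketch, assuming a Hugging Face causal language model. The defaults are copied from the signature above; `build_jsonformer` and `JSONFORMER_DEFAULTS` are hypothetical names, and loading through `AutoModelForCausalLM` and `AutoTokenizer` is an assumption about how the model and tokenizer are obtained. Note that every option after `tokenizer` is keyword-only.

```python
from typing import Any, Dict

# Defaults copied from the documented __init__ signature.
JSONFORMER_DEFAULTS: Dict[str, Any] = {
    "debug": False,
    "max_array_length": 10,
    "max_tokens_number": 6,
    "max_tokens_string": 10,
    "temperature": 0.7,
    "generation_marker": "|GENERATION|",
    "max_attempts": 3,
}


def build_jsonformer(model_name: str, **overrides: Any):
    """Hypothetical helper: load a model and wrap it in a Jsonformer."""
    unknown = set(overrides) - set(JSONFORMER_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")

    # Imported lazily so the sketch can be read without the packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from outformer.core.jsonformer import Jsonformer

    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Options must be passed by keyword because of the `*` in the signature.
    settings = {**JSONFORMER_DEFAULTS, **overrides}
    return Jsonformer(model, tokenizer, **settings)
```

A call such as `build_jsonformer(model_name, max_tokens_string=20)` would keep every other default unchanged.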
- generate(schema, prompt, *, debug=None, max_array_length=None, max_tokens_number=None, max_tokens_string=None, temperature=None, max_attempts=None)[source]
Generate a JSON object according to the schema and prompt.
- Parameters:
schema (Dict[str, Any]) – The schema defining the JSON structure
prompt (str) – The prompt guiding the generation
debug (Optional[bool]) – Whether to enable debug mode
max_array_length (Optional[int]) – The maximum length of arrays to generate
max_tokens_number (Optional[int]) – The maximum number of tokens to generate for numbers
max_tokens_string (Optional[int]) – The maximum number of tokens to generate for strings
temperature (Optional[float]) – The temperature for the generation
max_attempts (Optional[int]) – The maximum number of attempts for value generation (currently used in number generation)
- Returns:
The generated JSON object conforming to the schema
- Return type:
Dict[str, Any]
- Raises:
ValueError – If schema is invalid or prompt is empty
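A short usage sketch for `generate`, assuming the schema follows the usual JSON Schema layout with `type`, `properties` and `items`. The schema contents, the `generate_person` helper and the chosen override values are illustrative, not taken from the library. The keyword overrides apply to this call only; that `None` falls back to the value given at construction is an assumption based on the `Optional` parameter types.

```python
from typing import Any, Dict

# Illustrative schema; field names are made up for this example.
person_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "hobbies": {"type": "array", "items": {"type": "string"}},
    },
}


def generate_person(former: Any, prompt: str) -> Dict[str, Any]:
    """Call generate() with per-call overrides; `former` is a Jsonformer instance."""
    return former.generate(
        person_schema,
        prompt,
        max_array_length=3,    # at most three hobbies for this call
        max_tokens_string=20,  # allow longer strings than the default of 10
        temperature=0.2,       # less random than the default of 0.7
    )
```

Because `generate` raises `ValueError` for an invalid schema or an empty prompt, callers that accept user-supplied prompts may want to wrap the call in `try`/`except ValueError`.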
Token Processors
- class outformer.core.token_processors.StringStoppingCriteria(tokenizer, prompt_length)[source]
Bases: StoppingCriteria
Stops string generation when a closing quote is encountered.
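The role of `prompt_length` can be sketched in plain Python. This is an illustration under assumptions, not the class's code: `should_stop_string`, the toy vocabulary and `toy_decode` are hypothetical, and the real class works on tensors through the tokenizer. The point is that only tokens generated after the prompt are inspected, so quotes inside the prompt do not trigger a stop.

```python
from typing import Callable, Dict, List


def should_stop_string(
    token_ids: List[int],
    prompt_length: int,
    decode: Callable[[List[int]], str],
) -> bool:
    """Return True once the text generated after the prompt contains a quote."""
    if len(token_ids) <= prompt_length:
        return False  # nothing generated yet
    generated = decode(token_ids[prompt_length:])
    return '"' in generated


# Toy vocabulary standing in for a real tokenizer (assumption).
VOCAB: Dict[int, str] = {0: '{"name": "', 1: "Al", 2: "ice", 3: '"'}


def toy_decode(ids: List[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


# Token 0 is the prompt and already contains quotes; it is skipped.
still_open = should_stop_string([0, 1, 2], 1, toy_decode)
closed = should_stop_string([0, 1, 2, 3], 1, toy_decode)
```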
- class outformer.core.token_processors.NumberStoppingCriteria(tokenizer, prompt_length, precision=3)[source]
Bases: StoppingCriteria
Stops number generation when a complete number has been generated. A number is considered complete when:
It contains more than one decimal point (invalid, so stop)
It has a decimal point and has exceeded the specified precision
A non-digit character like space or newline is found after digits
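The three completion rules above can be written as a small predicate over the decoded text. This is a sketch of the rules as stated, not the class's actual code: `number_is_complete` is a hypothetical name, and reading "exceeded the specified precision" as "more fractional digits than `precision`" is an assumption.

```python
def number_is_complete(text: str, precision: int = 3) -> bool:
    """Apply the three documented stopping rules to the text generated so far."""
    # Rule 1: more than one decimal point is invalid, so stop.
    if text.count(".") > 1:
        return True

    # Rule 2: a decimal point with more fractional digits than `precision`.
    if "." in text:
        fractional = text.strip().split(".")[1]
        if len(fractional) > precision:
            return True

    # Rule 3: a space or newline following at least one digit.
    if len(text) > 1 and any(c.isdigit() for c in text) and text[-1] in (" ", "\n"):
        return True

    return False
```

With the default `precision=3`, `"3.141"` is still incomplete while `"3.1415"` triggers a stop.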
- class outformer.core.token_processors.OutputNumbersTokens(tokenizer)[source]
Bases: LogitsProcessor
Restricts token generation to only those that can be part of a valid number.
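Restricting generation to number tokens amounts to masking every other score to negative infinity. The sketch below shows the idea on plain lists with a toy vocabulary. The names `is_number_token` and `mask_to_number_tokens` are hypothetical, and the exact membership rule (digits plus at most one decimal point, surrounding whitespace ignored, empty tokens rejected) is an assumption rather than the library's documented behaviour.

```python
import math
from typing import Dict, List


def is_number_token(token_text: str) -> bool:
    """Assumed rule: only digits and at most one decimal point."""
    stripped = token_text.strip()
    if stripped == "":
        return False
    return all(c in "0123456789." for c in stripped) and stripped.count(".") <= 1


def mask_to_number_tokens(scores: List[float], vocab: Dict[int, str]) -> List[float]:
    """Keep scores of number tokens; set all others to -inf so they cannot be sampled."""
    return [
        score if is_number_token(vocab[token_id]) else -math.inf
        for token_id, score in enumerate(scores)
    ]


# Toy vocabulary standing in for a tokenizer (assumption).
toy_vocab = {0: "1", 1: "a", 2: ".", 3: "42", 4: "1.2.3", 5: " 7"}
masked = mask_to_number_tokens([0.1] * 6, toy_vocab)
```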
- class outformer.core.token_processors.OutputCommaAndBracketTokens(tokenizer)[source]
Bases: LogitsProcessor
LogitsProcessor that constrains generation to only comma and closing bracket tokens.
This processor is used in array generation to decide whether to:
1. Continue the array (when a comma is generated)
2. End the array (when a closing bracket is generated)
It ensures that the model can only choose between these two structural elements, preventing any other tokens from being generated at array element boundaries.
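The continue-or-end decision can be sketched as a mask followed by an argmax. This is an illustration only: `mask_to_comma_or_bracket`, `array_continues` and the toy vocabulary are hypothetical, and greedy selection is an assumption (the library may sample instead).

```python
import math
from typing import Dict, List

ALLOWED = {",", "]"}


def mask_to_comma_or_bracket(scores: List[float], vocab: Dict[int, str]) -> List[float]:
    """Set every score to -inf except those of comma and closing-bracket tokens."""
    return [
        score if vocab[token_id].strip() in ALLOWED else -math.inf
        for token_id, score in enumerate(scores)
    ]


def array_continues(scores: List[float], vocab: Dict[int, str]) -> bool:
    """True if the best remaining token is a comma, False if it is a bracket."""
    masked = mask_to_comma_or_bracket(scores, vocab)
    best = max(range(len(masked)), key=masked.__getitem__)
    return vocab[best].strip() == ","


# "hello" has the highest raw score but is masked out, so only "," and "]" compete.
toy_vocab = {0: "hello", 1: ",", 2: "]"}
ends_here = array_continues([9.0, 1.0, 2.0], toy_vocab)
keeps_going = array_continues([9.0, 3.0, 2.0], toy_vocab)
```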
Formatters
- outformer.formatters.highlight.highlight_values(values, color='magenta', on_color=None, attrs=None)[source]
Recursively prints a JSON object with highlighted values.
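The idea of recursive highlighting can be sketched without the library. The code below is not `highlight_values` itself: `render_highlighted` is a hypothetical function that returns a string instead of printing, supports only a foreground colour, and uses raw ANSI escape codes as an assumption about how colouring might be done. Structure (braces, brackets, keys) is left plain; only leaf values are coloured.

```python
import json
from typing import Any

# A few ANSI foreground colours (assumption: the terminal supports them).
ANSI = {"magenta": "\033[35m", "cyan": "\033[36m", "green": "\033[32m"}
RESET = "\033[0m"


def render_highlighted(value: Any, color: str = "magenta", indent: int = 0) -> str:
    """Return pretty-printed JSON text with leaf values wrapped in colour codes."""
    pad = "  " * indent
    if isinstance(value, dict):
        if not value:
            return "{}"
        inner = ",\n".join(
            f"{pad}  {json.dumps(k)}: {render_highlighted(v, color, indent + 1)}"
            for k, v in value.items()
        )
        return "{\n" + inner + "\n" + pad + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        inner = ",\n".join(
            f"{pad}  {render_highlighted(v, color, indent + 1)}" for v in value
        )
        return "[\n" + inner + "\n" + pad + "]"
    # Leaf value: the only part that gets coloured.
    return f"{ANSI[color]}{json.dumps(value)}{RESET}"


data = {"name": "Alice", "tags": ["a", "b"], "n": 1}
rendered = render_highlighted(data)
```

Calling `print(rendered)` in a colour-capable terminal shows the values in magenta against uncoloured keys and punctuation.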