API Reference

Core Classes

class outformer.core.jsonformer.Jsonformer(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)

Bases: object

Generates structured JSON output from a language model. The class:

Generates only content values, not structural elements
Follows the provided JSON schema
Builds the JSON object incrementally
Uses a token processor to stop generation at the appropriate time

Because the structure comes from the schema rather than from the model, the output is always a valid JSON object conforming to the specified schema.
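The incremental, schema-driven approach described above can be illustrated with a small sketch. This is not the library's implementation: `fill_schema` and `stub_leaf` are hypothetical names, the schema layout (`type`, `properties`, `items`) is assumed to follow JSON Schema conventions, and a stub stands in for the language model.

```python
from typing import Any, Callable, Dict

Schema = Dict[str, Any]


def fill_schema(schema: Schema, generate_leaf: Callable[[str, str], Any], path: str = "$") -> Any:
    """Walk the schema and ask `generate_leaf` only for leaf values.

    Braces, keys, and brackets come from the schema walk itself, so the
    generator never has to produce structural elements.
    """
    kind = schema["type"]
    if kind == "object":
        return {
            name: fill_schema(sub, generate_leaf, f"{path}.{name}")
            for name, sub in schema["properties"].items()
        }
    if kind == "array":
        # Simplification for this sketch: always emit exactly one element.
        return [fill_schema(schema["items"], generate_leaf, f"{path}[0]")]
    return generate_leaf(kind, path)


def stub_leaf(kind: str, path: str) -> Any:
    """Hypothetical stand-in for the model: a fixed value per leaf type."""
    return {"string": "example", "number": 1.0, "boolean": True}[kind]


example_schema: Schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "scores": {"type": "array", "items": {"type": "number"}},
    },
}

result = fill_schema(example_schema, stub_leaf)
```

In this sketch the result can only ever have the shape the schema dictates, which is the property the class description relies on.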

__init__(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)

Initialize a Jsonformer instance.

Parameters:

model (PreTrainedModel) – The model to use for generation
tokenizer (PreTrainedTokenizer) – The tokenizer to use for generation
debug (bool) – Whether to print debug information
max_array_length (int) – The maximum number of elements in an array
max_tokens_number (int) – The maximum number of tokens in a number
max_tokens_string (int) – The maximum number of tokens in a string
temperature (float) – The temperature to use for generation
generation_marker (str) – The marker used to track the current generation position in the JSON
max_attempts (int) – The maximum number of attempts for value generation (currently used in number generation)

generate(schema, prompt, *, debug=None, max_array_length=None, max_tokens_number=None, max_tokens_string=None, temperature=None, max_attempts=None)

Generate a JSON object according to the schema and prompt.

Parameters:

schema (Dict[str, Any]) – The schema defining the JSON structure
prompt (str) – The prompt guiding the generation
debug (Optional[bool]) – Whether to enable debug mode
max_array_length (Optional[int]) – The maximum length of arrays to generate
max_tokens_number (Optional[int]) – The maximum number of tokens to generate for numbers
max_tokens_string (Optional[int]) – The maximum number of tokens to generate for strings
temperature (Optional[float]) – The temperature for the generation
max_attempts (Optional[int]) – The maximum number of attempts for value generation (currently used in number generation)

Returns:

The generated JSON object conforming to the schema

Return type:

Dict[str, Any]

Raises:

ValueError – If schema is invalid or prompt is empty
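A possible usage sketch, based only on the signatures documented above. The schema contents, the `generate_person` helper, and the choice of `AutoModelForCausalLM` / `AutoTokenizer` for loading are assumptions made for illustration. The schema format is assumed to follow JSON Schema conventions, and the model name is left to the caller.

```python
from typing import Any, Dict

# Hypothetical schema, assumed to use JSON Schema style keys.
PERSON_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "hobbies": {"type": "array", "items": {"type": "string"}},
    },
}


def generate_person(model_name: str, prompt: str) -> Dict[str, Any]:
    """Load a causal LM and generate one object matching PERSON_SCHEMA."""
    # Imported lazily so the module can be inspected without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from outformer.core.jsonformer import Jsonformer

    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Instance-level defaults are set here...
    former = Jsonformer(model, tokenizer, max_tokens_string=20)

    # ...and the per-call keyword arguments of generate() are passed explicitly.
    return former.generate(
        PERSON_SCHEMA,
        prompt,
        temperature=0.2,
        max_array_length=3,
    )
```

Since `generate` raises ValueError for an invalid schema or an empty prompt, callers that build prompts dynamically may want to guard the call accordingly.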

Token Processors

class outformer.core.token_processors.StringStoppingCriteria(tokenizer, prompt_length)

Bases: StoppingCriteria

Stops string generation when a closing quote is encountered.

__init__(tokenizer, prompt_length)

Parameters:

tokenizer (PreTrainedTokenizer) – The tokenizer to use.
prompt_length (int) – The length of the prompt.
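The stopping rule can be sketched without a model by operating on already-decoded token text. The function names below are hypothetical and the token stream is simulated. The real class works on token IDs through the tokenizer, which this sketch does not attempt to reproduce.

```python
from typing import Iterable, List


def should_stop_string(token_text: str) -> bool:
    """Stop as soon as a newly generated token contains a closing quote."""
    return '"' in token_text


def collect_string(token_stream: Iterable[str]) -> str:
    """Consume tokens until the stop rule fires, then return the string body."""
    pieces: List[str] = []
    for token_text in token_stream:
        pieces.append(token_text)
        if should_stop_string(token_text):
            break
    # Keep only what precedes the closing quote.
    return "".join(pieces).split('"', 1)[0]


value = collect_string(["Al", "ice", '",', ' "age"'])
```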

class outformer.core.token_processors.NumberStoppingCriteria(tokenizer, prompt_length, precision=3)

Bases: StoppingCriteria

Stops number generation once the number is complete. A number is considered complete when any of the following holds:

It contains more than one decimal point (invalid, so stop)
It has a decimal point and has exceeded the specified precision
A non-digit character such as a space or newline follows the digits

__init__(tokenizer, prompt_length, precision=3)

Parameters:

tokenizer (PreTrainedTokenizer) – The tokenizer to use.
prompt_length (int) – The length of the prompt.
precision (int) – The maximum number of digits allowed after the decimal point before generation stops.
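The three completion rules listed for NumberStoppingCriteria can be written as a plain function over the decoded text. This is a minimal sketch under the assumption that the check runs on the text generated after the prompt. `should_stop_number` is a hypothetical name, and only space and newline are treated as terminating characters here.

```python
def should_stop_number(decoded: str, precision: int = 3) -> bool:
    """Return True when the partially generated number should be cut off."""
    # Rule 1: more than one decimal point is invalid, so stop.
    if decoded.count(".") > 1:
        return True

    # Rule 2: a decimal point is present and the fractional part is too long.
    if "." in decoded:
        fractional = decoded.strip().split(".", 1)[1]
        if len(fractional) > precision:
            return True

    # Rule 3: some digits were produced and the text now ends in whitespace.
    if any(ch.isdigit() for ch in decoded) and decoded[-1] in (" ", "\n"):
        return True

    return False
```

With the default precision of 3, `"3.141"` keeps generating while `"3.1415"` stops, since the fourth fractional digit exceeds the limit.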

class outformer.core.token_processors.OutputNumbersTokens(tokenizer)

Bases: LogitsProcessor

Restricts token generation to only those that can be part of a valid number.

__init__(tokenizer)

Parameters:

tokenizer (PreTrainedTokenizer) – The tokenizer to use.
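Restricting generation to number tokens amounts to masking the logits of every other vocabulary entry. The sketch below shows the idea on a toy vocabulary with plain Python lists. The function names and the exact rule for what counts as a numeric token are assumptions, not the library's code.

```python
import math
from typing import List


def number_token_mask(vocab: List[str]) -> List[bool]:
    """Mark tokens made only of digits and at most one decimal point."""
    mask = []
    for token in vocab:
        text = token.strip()
        is_numeric = (
            text != ""
            and all(ch.isdigit() or ch == "." for ch in text)
            and text.count(".") <= 1
        )
        mask.append(is_numeric)
    return mask


def mask_logits(logits: List[float], allowed: List[bool]) -> List[float]:
    """Set disallowed scores to -inf so they can never be sampled."""
    return [score if ok else -math.inf for score, ok in zip(logits, allowed)]


toy_vocab = ["12", ".", "a", " 7", "1.2.3", ","]
toy_mask = number_token_mask(toy_vocab)
toy_scores = mask_logits([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], toy_mask)
```

Computing the mask once per tokenizer and reusing it on every step keeps the per-token cost low, which is presumably why the class takes only the tokenizer as a constructor argument.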

class outformer.core.token_processors.OutputCommaAndBracketTokens(tokenizer)

Bases: LogitsProcessor

LogitsProcessor that constrains generation to only comma and closing bracket tokens.

This processor is used in array generation to decide whether to:

1. Continue the array (when a comma is generated)
2. End the array (when a closing bracket is generated)

It ensures that the model can only choose between these two structural elements, preventing any other tokens from being generated at array element boundaries.

__init__(tokenizer)

Parameters:

tokenizer (PreTrainedTokenizer) – The tokenizer to use.
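The continue-or-end decision at an array element boundary can be sketched as a two-way choice over masked scores. Everything here is illustrative: `choose_boundary` and `build_array` are hypothetical helpers, scores are given as a plain dict keyed by token text, and the element generator is a stub.

```python
from typing import Any, Callable, Dict, List


def choose_boundary(scores: Dict[str, float]) -> str:
    """Pick ',' or ']' after discarding every other candidate token."""
    allowed = {tok: s for tok, s in scores.items() if tok.strip() in (",", "]")}
    if not allowed:
        return "]"  # nothing usable: end the array
    return max(allowed, key=allowed.get).strip()


def build_array(
    next_element: Callable[[], Any],
    boundary_scores: Callable[[List[Any]], Dict[str, float]],
    max_array_length: int = 10,
) -> List[Any]:
    """Append elements until ']' is chosen or the length cap is reached."""
    items: List[Any] = []
    while len(items) < max_array_length:
        items.append(next_element())
        if choose_boundary(boundary_scores(items)) == "]":
            break
    return items


counter = iter(range(100))
three_items = build_array(
    lambda: next(counter),
    # Stub scores: favor ']' once three elements exist.
    lambda items: {"the": 9.0, ",": 1.0, "]": 2.0 if len(items) >= 3 else 0.0},
)
```

Note that the highest-scoring token overall (`"the"` in the stub) never wins, because it is filtered out before the comparison.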

Formatters

outformer.formatters.highlight.highlight_values(values, color='magenta', on_color=None, attrs=None)

Recursively prints a JSON object with highlighted values.

Parameters:

values (Union[dict, list, str]) – The JSON object to print
color (Union[str, tuple[int, int, int], None]) – The color to use for the highlighted values
on_color (Union[str, tuple[int, int, int], None]) – The color to use for the background of the highlighted values
attrs (Union[Iterable[str], None]) – Additional attributes to use for the highlighted values

Return type:

None
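To make the idea of recursive value highlighting concrete, here is a rough standalone sketch. It is not the library's implementation: `render_highlighted` is a hypothetical function that returns a string instead of printing, and it uses a raw ANSI escape code for magenta rather than whatever coloring mechanism `highlight_values` relies on.

```python
import json
from typing import Any

MAGENTA = "\033[35m"  # ANSI escape for magenta foreground
RESET = "\033[0m"


def render_highlighted(value: Any, indent: int = 0) -> str:
    """Render JSON-like data, coloring leaf values but not keys or brackets."""
    pad = "  " * indent
    if isinstance(value, dict):
        if not value:
            return "{}"
        lines = [
            f"{pad}  {json.dumps(key)}: {render_highlighted(sub, indent + 1)}"
            for key, sub in value.items()
        ]
        return "{\n" + ",\n".join(lines) + "\n" + pad + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        lines = [f"{pad}  {render_highlighted(item, indent + 1)}" for item in value]
        return "[\n" + ",\n".join(lines) + "\n" + pad + "]"
    # Leaf value: wrap its JSON form in color codes.
    return f"{MAGENTA}{json.dumps(value)}{RESET}"


sample = {"name": "Ada", "tags": ["x"]}
rendered = render_highlighted(sample)
```

Highlighting only the leaves makes it easy to see at a glance which parts of the output came from the model and which came from the schema.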