API Reference

Core Classes

class outformer.core.jsonformer.Jsonformer(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)

Bases: object

Generates structured JSON output from a language model. The class:

  1. Only generates content values, not structural elements

  2. Follows the provided JSON schema

  3. Builds the JSON object incrementally

  4. Uses a token processor to stop generation at the appropriate time

Because the model generates only content values and never structural elements, the output is always a valid JSON object conforming to the specified schema.
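The incremental approach described above can be sketched without a model. This is a minimal illustration, not outformer's implementation: `fill_schema` and `fake_leaf` are hypothetical names, the schema layout (`type`, `properties`, `items`) is assumed to follow the usual JSON Schema subset, and the fixed array length of two stands in for the model's decision to continue or stop.

```python
import json
from typing import Any, Callable, Dict


def fill_schema(
    schema: Dict[str, Any],
    generate_leaf: Callable[[str], Any],
    max_array_length: int = 10,
) -> Any:
    """Walk the schema; the code writes the structure, the callback fills values."""
    kind = schema["type"]
    if kind == "object":
        # Keys and nesting come from the schema, never from the generator.
        return {
            key: fill_schema(sub, generate_leaf, max_array_length)
            for key, sub in schema["properties"].items()
        }
    if kind == "array":
        # Assumption: a real generator would ask the model whether to continue.
        # The sketch always emits two elements, capped by max_array_length.
        count = min(2, max_array_length)
        return [
            fill_schema(schema["items"], generate_leaf, max_array_length)
            for _ in range(count)
        ]
    # Leaf value: the only place where generated content enters the result.
    return generate_leaf(kind)


def fake_leaf(kind: str) -> Any:
    """Hypothetical stand-in for a model call; returns a fixed value per type."""
    return {"string": "example", "number": 1.5, "boolean": True}[kind]


example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "scores": {"type": "array", "items": {"type": "number"}},
    },
}

result = fill_schema(example_schema, fake_leaf)
print(json.dumps(result))
```

Since the generator is only ever asked for leaf values, nothing it returns can unbalance a brace or drop a key, which is the property the class description relies on.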

__init__(model, tokenizer, *, debug=False, max_array_length=10, max_tokens_number=6, max_tokens_string=10, temperature=0.7, generation_marker='|GENERATION|', max_attempts=3)

Initialize a Jsonformer instance.

Parameters:
  • model (PreTrainedModel) – The model to use for generation

  • tokenizer (PreTrainedTokenizer) – The tokenizer to use for generation

  • debug (bool) – Whether to print debug information

  • max_array_length (int) – The maximum number of elements in an array

  • max_tokens_number (int) – The maximum number of tokens in a number

  • max_tokens_string (int) – The maximum number of tokens in a string

  • temperature (float) – The temperature to use for generation

  • generation_marker (str) – The marker used to track the current generation position in the JSON

  • max_attempts (int) – The maximum number of attempts for value generation (currently used in number generation)

generate(schema, prompt, *, debug=None, max_array_length=None, max_tokens_number=None, max_tokens_string=None, temperature=None, max_attempts=None)

Generate a JSON object according to the schema and prompt.

Parameters:
  • schema (Dict[str, Any]) – The schema defining the JSON structure

  • prompt (str) – The prompt guiding the generation

  • debug (Optional[bool]) – Whether to enable debug mode

  • max_array_length (Optional[int]) – The maximum length of arrays to generate

  • max_tokens_number (Optional[int]) – The maximum number of tokens to generate for numbers

  • max_tokens_string (Optional[int]) – The maximum number of tokens to generate for strings

  • temperature (Optional[float]) – The temperature for the generation

  • max_attempts (Optional[int]) – The maximum number of attempts for value generation (currently used in number generation)

Returns:

The generated JSON object conforming to the schema

Return type:

Dict[str, Any]

Raises:

ValueError – If schema is invalid or prompt is empty
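A usage sketch for the constructor and `generate` might look like the following. The import path and keyword names are taken from the signatures above; the schema layout, the prompt text, and the `model_name` argument are assumptions, and the third-party imports are deferred into the function so the snippet can be loaded without `transformers` or `outformer` installed.

```python
from typing import Any, Dict

# Assumed schema layout: a JSON-Schema-like dict with "type", "properties", "items".
schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}


def run_example(model_name: str) -> Dict[str, Any]:
    """Load a causal LM and generate one object for `schema`.

    `model_name` is a placeholder for any Hugging Face causal-LM checkpoint.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from outformer.core.jsonformer import Jsonformer

    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Instance-level settings, using keyword names from the __init__ signature.
    former = Jsonformer(model, tokenizer, max_tokens_string=20, temperature=0.5)

    # Keyword arguments to generate() are per-call settings; here the array
    # length is limited to three elements for this call only.
    return former.generate(
        schema,
        "Generate a profile for a software engineer:",
        max_array_length=3,
    )
```

Per the Raises entry above, an invalid schema or an empty prompt raises `ValueError`, so callers that accept user-supplied prompts may want to catch it around `generate`.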

Token Processors

class outformer.core.token_processors.StringStoppingCriteria(tokenizer, prompt_length)

Bases: StoppingCriteria

Stops string generation when a closing quote is encountered.

__init__(tokenizer, prompt_length)

Parameters:
  • tokenizer (PreTrainedTokenizer) – The tokenizer to use.

  • prompt_length (int) – The length of the prompt.
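The stop-on-closing-quote rule can be illustrated in plain Python. This is a sketch under assumptions, not the class's code: `string_should_stop` and `generate_string` are hypothetical helpers, tokens are represented as already-decoded strings, and only generated tokens are passed in, which sidesteps the role `prompt_length` presumably plays in skipping the prompt.

```python
from typing import Iterable, List


def string_should_stop(generated_tokens: List[str]) -> bool:
    """Return True once the newest decoded token contains a double quote."""
    if not generated_tokens:
        return False
    return '"' in generated_tokens[-1]


def generate_string(token_stream: Iterable[str], max_tokens_string: int = 10) -> str:
    """Consume tokens until the closing quote or the token budget is reached."""
    pieces: List[str] = []
    for token in token_stream:
        if len(pieces) >= max_tokens_string:
            break  # token budget exhausted before a closing quote appeared
        pieces.append(token)
        if string_should_stop(pieces):
            break
    # Keep only the text before the closing quote.
    return "".join(pieces).split('"')[0]


print(generate_string(["Ada", " Love", 'lace"', ", next"]))
```

The token budget matters as a second exit: a model that never emits a quote would otherwise run until its own length limit.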

class outformer.core.token_processors.NumberStoppingCriteria(tokenizer, prompt_length, precision=3)

Bases: StoppingCriteria

Stops number generation when a complete number has been generated. A number is considered complete when:

  1. It contains more than one decimal point (invalid, so stop)

  2. It has a decimal point and has exceeded the specified precision

  3. A non-digit character like space or newline is found after digits

__init__(tokenizer, prompt_length, precision=3)

Parameters:
  • tokenizer (PreTrainedTokenizer) – The tokenizer to use.

  • prompt_length (int) – The length of the prompt.

  • precision (int) – The maximum number of digits allowed after the decimal point before generation stops.
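The three completion rules listed for this class can be written as a small predicate over the decoded text. This is a sketch rather than the class's code: `number_is_complete` is a hypothetical name, and it assumes that "exceeded the specified precision" means more fractional digits than `precision`.

```python
def number_is_complete(decoded: str, precision: int = 3) -> bool:
    """Apply the three stopping rules to the text generated so far."""
    # Rule 1: more than one decimal point is invalid, so stop.
    if decoded.count(".") > 1:
        return True

    # Rule 2: a decimal point is present and the fractional part
    # is longer than the allowed precision.
    if "." in decoded:
        fractional = decoded.strip().split(".")[1]
        if len(fractional) > precision:
            return True

    # Rule 3: digits have been seen and the text now ends in a
    # space or newline.
    has_digit = any(ch.isdigit() for ch in decoded)
    if len(decoded) > 1 and has_digit and decoded[-1] in (" ", "\n"):
        return True

    return False


for sample in ["42", "42 ", "3.14", "3.14159", "1.2.3"]:
    print(repr(sample), number_is_complete(sample))
```

Rule 1 stops on malformed text rather than on a finished number, so a caller still has to validate the result; that is presumably where `max_attempts` on `Jsonformer` comes in for number generation.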

class outformer.core.token_processors.OutputNumbersTokens(tokenizer)

Bases: LogitsProcessor

Restricts token generation to only those that can be part of a valid number.

__init__(tokenizer)

Parameters:
  • tokenizer (PreTrainedTokenizer) – The tokenizer to use.
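Restricting generation to number tokens usually amounts to masking the score of every other vocabulary entry. The sketch below shows the idea with plain lists instead of tensors; `build_number_mask` and `apply_mask` are hypothetical names, and the rule for what counts as a number token (digits plus at most one decimal point) is an assumption rather than the processor's exact test.

```python
import math
from typing import List


def build_number_mask(vocab: List[str]) -> List[bool]:
    """Mark the tokens that consist only of digits and at most one '.'."""
    mask = []
    for token in vocab:
        text = token.strip()
        allowed = (
            text != ""
            and all(ch.isdigit() or ch == "." for ch in text)
            and text.count(".") <= 1
        )
        mask.append(allowed)
    return mask


def apply_mask(scores: List[float], mask: List[bool]) -> List[float]:
    """Set disallowed scores to -inf so sampling can never pick them."""
    return [s if keep else -math.inf for s, keep in zip(scores, mask)]


vocab = ["1", "23", ".", "4.5", "a", "1a", "", "1.2.3", " 7"]
mask = build_number_mask(vocab)
masked_scores = apply_mask([0.1] * len(vocab), mask)
print(mask)
```

The mask depends only on the vocabulary, so it can be computed once and reused at every generation step; that is a likely reason the processor takes the tokenizer at construction time.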

class outformer.core.token_processors.OutputCommaAndBracketTokens(tokenizer)

Bases: LogitsProcessor

LogitsProcessor that constrains generation to only comma and closing bracket tokens.

This processor is used in array generation to decide whether to:

  1. Continue the array (when a comma is generated)

  2. End the array (when a closing bracket is generated)

It ensures that the model can only choose between these two structural elements, preventing any other tokens from being generated at array element boundaries.

__init__(tokenizer)

Parameters:
  • tokenizer (PreTrainedTokenizer) – The tokenizer to use.
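The continue-or-end decision at an array element boundary can be sketched as picking the better-scoring of the two permitted tokens. The helper name `next_array_action` and the use of plain lists for the vocabulary and scores are assumptions made for illustration; greedy selection is used here for simplicity, whereas the library may sample.

```python
import math
from typing import List


def next_array_action(scores: List[float], vocab: List[str]) -> str:
    """Consider only ',' and ']' and report which one scores higher."""
    best_token = None
    best_score = -math.inf
    for token, score in zip(vocab, scores):
        text = token.strip()
        # Every token other than a comma or closing bracket is ignored,
        # which has the same effect as masking its score to -inf.
        if text in (",", "]") and score > best_score:
            best_token, best_score = text, score
    if best_token is None:
        raise ValueError("vocabulary contains neither ',' nor ']'")
    return "continue" if best_token == "," else "end"


vocab = ["hello", ",", "]", "1"]
print(next_array_action([9.0, 1.0, 0.5, 3.0], vocab))
print(next_array_action([9.0, 0.2, 0.5, 3.0], vocab))
```

In both calls the highest raw score belongs to "hello", yet it never affects the outcome, which is the point of constraining the choice to the two structural tokens.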

Formatters

outformer.formatters.highlight.highlight_values(values, color='magenta', on_color=None, attrs=None)

Recursively prints a JSON object with highlighted values.

Parameters:
  • values (Union[dict, list, str]) – The JSON object to print

  • color (Union[str, tuple[int, int, int], None]) – The color to use for the highlighted values

  • on_color (Union[str, tuple[int, int, int], None]) – The color to use for the background of the highlighted values

  • attrs (Union[Iterable[str], None]) – Additional attributes to use for the highlighted values

Return type:

None
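To show what recursive value highlighting involves, here is a standalone sketch that wraps only leaf values in an ANSI color code. It is not the library function: `render_highlighted` is a hypothetical name, it returns a string instead of printing, and it hard-codes magenta (ANSI code 35) instead of accepting `color`, `on_color`, and `attrs`.

```python
import json
from typing import Any

MAGENTA = "\033[35m"  # ANSI escape for magenta foreground
RESET = "\033[0m"     # ANSI escape that restores the default style


def render_highlighted(value: Any, indent: int = 0) -> str:
    """Render JSON-like data, coloring leaf values and leaving keys plain."""
    pad = "  " * indent
    if isinstance(value, dict):
        items = list(value.items())
        lines = ["{"]
        for i, (key, sub) in enumerate(items):
            comma = "," if i < len(items) - 1 else ""
            rendered = render_highlighted(sub, indent + 1)
            lines.append(f'{pad}  "{key}": {rendered}{comma}')
        lines.append(pad + "}")
        return "\n".join(lines)
    if isinstance(value, list):
        inner = ", ".join(render_highlighted(item, indent + 1) for item in value)
        return f"[{inner}]"
    # Leaf: this is the part a model would have generated, so it gets the color.
    return f"{MAGENTA}{json.dumps(value)}{RESET}"


output = render_highlighted({"name": "Ada", "tags": ["x"]})
print(output)
```

Coloring only the leaves makes it easy to see at a glance which parts of the object came from the model and which came from the schema.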