v0.0.124

September 18, 2024 · One min read

Maintainer of Predacons

Added auto-quantize for generation models, reducing the maximum memory requirement fourfold.
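As a rough sanity check on the fourfold figure, here is a minimal back-of-the-envelope sketch. It assumes the unquantized baseline stores weights in 16 bits and counts weight storage only (no activations, KV cache, or quantization metadata). The helper `weight_memory_gib` and the 7B parameter count are hypothetical and used purely for illustration; they are not part of Predacons.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Hypothetical model size, chosen only for illustration.
num_params = 7_000_000_000

fp16_gib = weight_memory_gib(num_params, 16)  # assumed 16-bit baseline
int4_gib = weight_memory_gib(num_params, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
print(f"reduction:      {fp16_gib / int4_gib:.0f}x")
```

Real-world savings will likely be somewhat smaller, since quantized checkpoints carry scaling factors and some layers are usually kept in higher precision.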

Example

import predacons

# Load the model with auto_quantize enabled
# (model_name is the name or path of the model to load)
model = predacons.load_model(model_name, auto_quantize="4bit")
tokenizer = predacons.load_tokenizer(model_name)

# Generate a response
sequence = "Explain the concept of acceleration in physics."
output, tokenizer = predacons.generate(
    model=model,
    sequence=sequence,
    max_length=500,
    tokenizer=tokenizer,
    trust_remote_code=True,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

What's Changed

added auto quantize to all the model methods by @shouryashashank in https://github.com/Predacons/predacons/pull/37

Full Changelog: https://github.com/Predacons/predacons/compare/v0.0.123...v0.0.124

GitHub release page: https://github.com/Predacons/predacons/releases/tag/v0.0.124
