Skip to main content

v0.0.124

· One min read
Shourya Shashank
Maintainer of Predacons
  • Added auto quantize for generation models reducing the max memory requirement by 4 folds

example

# load model with auto_quantize
model = predacons.load_model(model_name,auto_quantize="4bit")
<!-- truncate -->
tokenizer = predacons.load_tokenizer(model_name)

# generate response
sequence = "Explain the concept of acceleration in physics."
output,tokenizer =predacons.generate(model = model,
sequence = sequence,
max_length = 500,
tokenizer = tokenizer,
trust_remote_code = True)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

What's Changed

Full Changelog: https://github.com/Predacons/predacons/compare/v0.0.123...v0.0.124


Github Release Page: https://github.com/Predacons/predacons/releases/tag/v0.0.124