v0.0.124
- Added auto quantize for generation models, reducing the maximum memory requirement fourfold.
Example:
import predacons

# model_name must be set beforehand to the model you want to load
# load model with auto_quantize
model = predacons.load_model(model_name, auto_quantize="4bit")
<!-- truncate -->
tokenizer = predacons.load_tokenizer(model_name)
# generate response
sequence = "Explain the concept of acceleration in physics."
output, tokenizer = predacons.generate(
    model=model,
    sequence=sequence,
    max_length=500,
    tokenizer=tokenizer,
    trust_remote_code=True,
)
# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
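For a rough sense of where a fourfold saving could come from, here is a minimal back-of-the-envelope sketch. It assumes the unquantized baseline stores weights at 16 bits per parameter and counts weight storage only (no activations, KV cache, or quantization overhead). The helper `weight_memory_bytes` and the 7-billion-parameter count are hypothetical illustrations, not part of predacons.

```python
def weight_memory_bytes(num_params: int, bits_per_param: int) -> float:
    """Approximate bytes needed to store model weights only (hypothetical helper)."""
    return num_params * bits_per_param / 8


# Illustrative parameter count, not tied to any specific model
num_params = 7_000_000_000

baseline = weight_memory_bytes(num_params, 16)  # assumed 16-bit baseline
quantized = weight_memory_bytes(num_params, 4)  # 4-bit weights

gib = 2**30
print(f"16-bit weights: {baseline / gib:.2f} GiB")
print(f"4-bit weights:  {quantized / gib:.2f} GiB")
print(f"Reduction:      {baseline / quantized:.1f}x")
```

Actual savings depend on the baseline precision and on runtime overhead, so treat this as an estimate of the upper bound for weight storage rather than a measured result.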
What's Changed
- added auto quantize to all the model methods by @shouryashashank in https://github.com/Predacons/predacons/pull/37
Full Changelog: https://github.com/Predacons/predacons/compare/v0.0.123...v0.0.124
GitHub Release Page: https://github.com/Predacons/predacons/releases/tag/v0.0.124