
Llama 3.1: The output text is truncated #1153

Open
Gumichocopengin8 opened this issue Jul 28, 2024 · 3 comments

Comments


Gumichocopengin8 commented Jul 28, 2024

Describe the bug

Found a similar issue with Llama 2 #717, but this one is for Llama 3.1.
The output text is cut off, so the entire result cannot be seen.
Is there a way to extend the maximum length of the output text? What is the default maximum length?

Minimal reproducible example

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B"

pipeline = transformers.pipeline(
  "text-generation",
  model=model_id,
  model_kwargs={"torch_dtype": torch.bfloat16},
  device="cpu",
)

pipeline("Hey how are you doing today?")

Output

[{'generated_text': 'Hey how are you doing today? I’m doing good. I’m just here to talk about'}]

Runtime Environment

  • Model: meta-llama/Meta-Llama-3.1-8B
  • Using via huggingface?: yes
  • OS: Mac with Apple Silicon
  • GPU VRAM: N/A (used CPU)
  • Number of GPUs: N/A (used CPU)
  • GPU Make: N/A (used CPU)


@Gumichocopengin8 Gumichocopengin8 changed the title Llama 3.1: The output text is cut off Llama 3.1: The output text is truncated Jul 28, 2024
@lmntrx-sys

While there are many possible environment issues that I cannot diagnose remotely, the most likely cause is the model configuration. The maximum token limit set for generation can lead to truncation: if the max_gen_length parameter is set to a low value, the output is cut off after reaching that limit. The source code sets this parameter to 64 by default.
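
With the transformers pipeline from the report above, one way to allow longer output is to pass max_new_tokens when calling the pipeline. This is a minimal sketch, assuming the Hugging Face transformers text-generation pipeline; the value 256 is an arbitrary example, not a recommended setting.

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B"

pipeline = transformers.pipeline(
  "text-generation",
  model=model_id,
  model_kwargs={"torch_dtype": torch.bfloat16},
  device="cpu",
)

# max_new_tokens raises the generation cap, so the reply is not cut off
# after the small default number of tokens (256 here is an arbitrary choice).
result = pipeline("Hey how are you doing today?", max_new_tokens=256)
print(result[0]["generated_text"])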

@lmntrx-sys

Usage of a CPU may also be a reason for the truncated output. Running a large language model on a CPU can be resource-intensive; if the system runs out of memory or CPU resources, it might truncate the output to prevent crashes or excessive lag.


irtiq7 commented Sep 21, 2024

Anyone managed to solve this?
