[GRPO] Try returning hidden statex for GRPO by Datta0 · Pull Request #5142 · unslothai/unsloth

Datta0 · 2026-04-23T05:01:48Z

unslothai/unsloth-zoo#602 fixed an important issue where we failed with a shape mismatch when some models returned logits. For GRPO we generally expect our wrappers to return hidden states (for most models, at least), and lm_head is then applied chunk-wise to avoid materialising the full logits, which are much larger than the hidden states (around 4K vs 256K, for example).
This PR is an effort to make more models return hidden states for efficiency reasons. It is orthogonal to the PR mentioned above :)
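
To illustrate why returning hidden states helps, here is a minimal pure-Python sketch of chunk-wise projection. It is not the Unsloth implementation: the function names, the toy shapes, and `chunk_size` are all hypothetical, and a real version would operate on tensors. The point is only that logits for one chunk of positions exist at a time, instead of a full sequence-by-vocabulary matrix.

```python
import math

def matvec(weight, hidden):
    """Project one hidden vector (length d) to vocab logits; weight is V x d."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

def log_softmax_at(logits, index):
    """Log-probability of `index` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_sum_exp

def chunked_token_logprobs(hidden_states, lm_head_weight, target_ids, chunk_size=2):
    """Per-token log-probs, projecting only `chunk_size` positions at a time."""
    logprobs = []
    for start in range(0, len(hidden_states), chunk_size):
        chunk = hidden_states[start:start + chunk_size]
        # Only this chunk's logits are alive here, never the full matrix.
        chunk_logits = [matvec(lm_head_weight, h) for h in chunk]
        for offset, logits in enumerate(chunk_logits):
            logprobs.append(log_softmax_at(logits, target_ids[start + offset]))
    return logprobs

# Toy data (hypothetical): 3 positions, hidden size 2, vocabulary size 3.
toy_hidden = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
toy_weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_targets = [0, 2, 1]
per_token = chunked_token_logprobs(toy_hidden, toy_weight, toy_targets, chunk_size=2)
```

The chunk size trades peak memory against the number of projection calls; the result does not depend on it.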

Ref: unslothai/unsloth-zoo#609

gemini-code-assist

Code Review

This pull request implements a fallback mechanism for GRPO to retrieve hidden states from models that do not natively support it by wrapping the forward method. Feedback suggests optimizing performance by pre-computing function signatures during installation and using them to robustly handle positional arguments. It was also recommended to log exceptions during argument binding to aid debugging.

gemini-code-assist · 2026-04-23T05:08:26Z

+
+def _get_num_logits_to_keep(kwargs):
+    num_logits_to_keep = kwargs.get("num_logits_to_keep", 0) or 0
+    logits_to_keep = kwargs.get("logits_to_keep", 0) or 0


The _get_num_logits_to_keep function currently only checks kwargs. If logits_to_keep or num_logits_to_keep are passed as positional arguments, they will be missed. It's better to use the model's signature to bind the arguments and extract the values robustly. Per repository rules, ensure exceptions are logged rather than silently ignored.

Suggested change

-def _get_num_logits_to_keep(kwargs):
-    num_logits_to_keep = kwargs.get("num_logits_to_keep", 0) or 0
-    logits_to_keep = kwargs.get("logits_to_keep", 0) or 0
+def _get_num_logits_to_keep(sig, args, kwargs):
+    try:
+        bound = sig.bind_partial(*args, **kwargs)
+        return max(bound.arguments.get("num_logits_to_keep", 0) or 0,
+                   bound.arguments.get("logits_to_keep", 0) or 0)
+    except Exception as e:
+        import logging
+        logging.debug(f"Error binding signature: {e}")
+        return max(kwargs.get("num_logits_to_keep", 0) or 0,
+                   kwargs.get("logits_to_keep", 0) or 0)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.
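
As a small illustration of the positional-argument concern, the sketch below compares a kwargs-only lookup with a signature-based one. `toy_forward` and both helper names are hypothetical stand-ins, not code from this PR. It also shows one case where binding fails (a keyword the signature does not accept), which is when the logged fallback to plain `kwargs` is used.

```python
import inspect
import logging

def toy_forward(input_ids, attention_mask=None, logits_to_keep=0):
    """Hypothetical forward; only its signature matters for this example."""
    return input_ids

sig = inspect.signature(toy_forward)

def from_kwargs_only(kwargs):
    """Kwargs-only lookup: misses values that were passed positionally."""
    return max(kwargs.get("num_logits_to_keep", 0) or 0,
               kwargs.get("logits_to_keep", 0) or 0)

def from_signature(sig, args, kwargs):
    """Bind args and kwargs to parameter names, falling back to kwargs on failure."""
    try:
        bound = sig.bind_partial(*args, **kwargs)
        return max(bound.arguments.get("num_logits_to_keep", 0) or 0,
                   bound.arguments.get("logits_to_keep", 0) or 0)
    except TypeError as e:
        # bind_partial raises TypeError, e.g. for an unexpected keyword.
        logging.debug("Error binding signature: %s", e)
        return from_kwargs_only(kwargs)

positional_call = ([1, 2, 3], None, 5)   # logits_to_keep passed positionally
missed = from_kwargs_only({})
found = from_signature(sig, positional_call, {})
fallback = from_signature(sig, (), {"num_logits_to_keep": 3})
```

Here `missed` is 0 while `found` is 5, and `fallback` is 3 because `toy_forward` has no `num_logits_to_keep` parameter, so binding fails and the kwargs path is taken.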

gemini-code-assist · 2026-04-23T05:08:26Z

+            )
+            return outputs
+
+        hidden_states = hidden_states[-1]


Use the pre-computed signature and original arguments to correctly extract num_logits_to_keep, accounting for positional arguments.

Suggested change

        hidden_states = hidden_states[-1]
        num_logits_to_keep = _get_num_logits_to_keep(sig, args, kwargs)
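
For context, a rough sketch of what happens around these two lines: take the last layer's hidden states and keep only the trailing positions. This uses nested lists instead of tensors, the function name is hypothetical, and it assumes the convention that a value of 0 means "keep every position"; the actual wrapper in this PR may differ.

```python
def select_last_hidden(hidden_states_per_layer, num_logits_to_keep=0):
    """Return the final layer's states, trimmed to the last N positions.

    hidden_states_per_layer: tuple of layers, each shaped [batch][seq][dim].
    num_logits_to_keep: 0 is assumed to mean "keep all positions".
    """
    last_layer = hidden_states_per_layer[-1]
    if num_logits_to_keep == 0:
        return last_layer
    return [sequence[-num_logits_to_keep:] for sequence in last_layer]

# Toy data (hypothetical): 2 layers, batch of 1, 3 positions, hidden size 2.
layer0 = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
layer1 = [[[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]]
all_positions = select_last_hidden((layer0, layer1))
last_two = select_last_hidden((layer0, layer1), num_logits_to_keep=2)
```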

gemini-code-assist · 2026-04-23T05:08:26Z

+            return True
+    return False
+
+


To improve performance, _drop_forward_kwargs_consumed_positionally should accept a pre-computed signature instead of calling inspect.signature() on every forward pass. This avoids redundant introspection overhead during training.

Suggested change

def _drop_forward_kwargs_consumed_positionally(sig, args, kwargs):

References

To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.

gemini-code-assist · 2026-04-23T05:08:26Z

+    if len(args) == 0 or len(kwargs) == 0:
+        return kwargs
+
+    consumed_names = []


Use the passed signature instead of re-computing it.

Suggested change

    consumed_names = []
    for parameter in sig.parameters.values():
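
The full body of the helper is not shown in this thread, so the following is only a guess at its shape, assuming its job is to remove keyword arguments whose parameters were already filled positionally (which would otherwise raise "got multiple values for argument"). The function and toy names are hypothetical.

```python
import inspect

def drop_kwargs_consumed_positionally(sig, args, kwargs):
    """Drop kwargs whose parameter slot is already taken by a positional arg."""
    if len(args) == 0 or len(kwargs) == 0:
        return kwargs

    consumed_names = []
    remaining = len(args)
    for parameter in sig.parameters.values():
        if remaining == 0:
            break
        if parameter.kind is inspect.Parameter.POSITIONAL_ONLY:
            remaining -= 1          # uses a slot, but cannot clash by keyword
        elif parameter.kind is inspect.Parameter.POSITIONAL_OR_KEYWORD:
            consumed_names.append(parameter.name)
            remaining -= 1
        else:
            break                   # *args, keyword-only, or **kwargs reached

    return {k: v for k, v in kwargs.items() if k not in consumed_names}

def toy_forward(input_ids, attention_mask=None, labels=None, **extra):
    """Hypothetical forward used only for its signature."""
    return input_ids

toy_sig = inspect.signature(toy_forward)   # computed once, reused per call
cleaned = drop_kwargs_consumed_positionally(
    toy_sig, ([1, 2], [1, 1]), {"attention_mask": [0, 0], "labels": [2, 3]}
)
```

In this toy call `attention_mask` is supplied both positionally and by keyword, so the keyword copy is dropped and only `labels` remains.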

gemini-code-assist · 2026-04-23T05:08:26Z

+        return False
+


Pre-compute the signature once during installation to avoid the overhead of inspect.signature() on every forward pass.

Suggested change

-        return False
+    original_forward = target_model.forward
+    sig = inspect.signature(original_forward)
+    model_name = type(target_model).__name__

References

To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.
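
A minimal sketch of the install-time pattern being suggested, with a hypothetical `ToyModel` and wrapper name: the signature is captured once in the closure, and each forward call only binds arguments against it.

```python
import inspect

class ToyModel:
    """Stand-in for a real model; only the forward signature matters here."""
    def forward(self, input_ids, logits_to_keep=0):
        return {"input_ids": input_ids, "logits_to_keep": logits_to_keep}

def install_forward_wrapper(target_model):
    """Wrap target_model.forward, introspecting its signature exactly once."""
    original_forward = target_model.forward
    sig = inspect.signature(original_forward)   # done at install time only

    def wrapped_forward(*args, **kwargs):
        # Per-call work is a cheap bind against the cached signature.
        bound = sig.bind_partial(*args, **kwargs)
        wrapped_forward.last_logits_to_keep = (
            bound.arguments.get("logits_to_keep", 0) or 0
        )
        return original_forward(*args, **kwargs)

    wrapped_forward.last_logits_to_keep = None
    target_model.forward = wrapped_forward
    return target_model

model = install_forward_wrapper(ToyModel())
output = model.forward([1, 2, 3], 2)   # logits_to_keep passed positionally
```

Because `original_forward` is a bound method, `inspect.signature` already omits `self`, so the caller's `args` line up with the parameter names directly.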

gemini-code-assist · 2026-04-23T05:08:26Z

+            return original_forward(*args, **kwargs)
+
+        forward_kwargs = _drop_forward_kwargs_consumed_positionally(


Pass the pre-computed signature to the helper function.

Suggested change

            return original_forward(*args, **kwargs)

        forward_kwargs = _drop_forward_kwargs_consumed_positionally(
            sig, args, kwargs
        )

Datta0 mentioned this pull request Apr 23, 2026

[GRPO] Try returning hidden statex for GRPO unslothai/unsloth-zoo#609

Draft

gemini-code-assist Bot reviewed Apr 23, 2026

try hidden state return for GRPO

6be5ab1

Datta0 force-pushed the grpo-hidden-fallback branch from 520de3b to 6be5ab1 Compare April 27, 2026 06:22

[pre-commit.ci] auto fixes from pre-commit.com hooks

f03a341

for more information, see https://pre-commit.ci

Datta0 wants to merge 2 commits into unslothai:main from Datta0:grpo-hidden-fallback
