Nano Banana Pro
Agent skill for nano-banana-pro
1) For the environment interaction that generates episodes, keep that code unchanged. Assume it produces episodes labeled ep_0, ep_1, ep_2, ..., each with a different length.
Translate to English:
2) [...] is_first sequence like True, False, False, ..., True for each rollout.
3) [...] is_first flags. Each chunk is what we feed to the GPU.
4) When is_first[b, t] == True, reset the model state h for that sequence at time t. In all other cases, carry h forward to the next timestep. When a batch ends, retain h, because the next batch continues the same sequences in time.
5) [...] is_first == True at different timesteps. To keep the computation vectorized, we may need some tricks here. If you are unsure how to vectorize this, let me know.
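One possible way to vectorize the reset rule described above is to use the is_first flags as a per-timestep mask over the batch dimension. The sketch below is only an illustration in NumPy: `rnn_step`, `run_chunk`, the weight matrices, and the zero initial state `h_init` are all assumptions made for the example, not part of the original code. The time loop stays, but every sequence in the batch is handled at once, and the state returned at the end of a chunk is what gets passed into the next chunk.

```python
import numpy as np


def rnn_step(h, x, W_h, W_x):
    """Toy recurrent cell (hypothetical stand-in for the real model)."""
    return np.tanh(h @ W_h + x @ W_x)


def run_chunk(h, xs, is_first, W_h, W_x, h_init=None):
    """Run one chunk of shape [B, T, D] through the cell.

    h:        [B, H] state carried over from the previous chunk
    xs:       [B, T, D] inputs
    is_first: [B, T] bool, True where a new episode starts
    h_init:   [H] state used after a reset (assumed to be zeros here)
    Returns (outputs [B, T, H], final state [B, H]).
    """
    if h_init is None:
        h_init = np.zeros(h.shape[-1], dtype=h.dtype)
    outputs = []
    for t in range(xs.shape[1]):
        # Reset only the rows whose episode starts at time t;
        # the mask [B, 1] broadcasts against the state [B, H].
        h = np.where(is_first[:, t, None], h_init, h)
        h = rnn_step(h, xs[:, t], W_h, W_x)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


# Usage: two consecutive chunks from the same rollouts.
rng = np.random.default_rng(0)
B, T, D, H = 2, 6, 3, 4
W_h = rng.normal(size=(H, H)) * 0.5
W_x = rng.normal(size=(D, H)) * 0.5
xs = rng.normal(size=(B, T, D))
is_first = np.zeros((B, T), dtype=bool)
is_first[:, 0] = True   # every rollout starts with a new episode
is_first[0, 2] = True   # sequence 0 starts a new episode at t=2
is_first[1, 4] = True   # sequence 1 starts a new episode at t=4

h = np.zeros((B, H))
out_a, h = run_chunk(h, xs[:, :3], is_first[:, :3], W_h, W_x)
out_b, h = run_chunk(h, xs[:, 3:], is_first[:, 3:], W_h, W_x)  # h is retained
```

Because the reset is a masked select rather than a Python branch per sequence, resets at different timesteps in different rows cost nothing extra. The same masking idea should carry over to a GPU framework, but that port is not shown here.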