Nano Banana Pro
Agent skill for nano-banana-pro
We're starting a new Vulkan 1.4 project, which will eventually render sprites, but first let's get the classic triangle up on the screen.
For this project, this is the recommended list of crates.
- vulkanalia provides the Vulkan API bindings
- vulkanalia_vma for efficient buffer and image memory management (designed specifically for vulkanalia)
- winit handles window creation and surface management
- glam for MVP matrices (if needed) and sprite transformations with SIMD performance
- image to load sprite textures from files
- shaderc for compiling GLSL shaders to SPIR-V at runtime
- anyhow for simplified error handling throughout your Vulkan code
- bytemuck for safely converting your vertex structs to bytes for GPU upload
What's Working:
Features Implemented:
Major Issues Encountered
Rust Edition Compatibility (2024 vs 2021)
- edition = "2024" which caused massive unsafe block warnings
- unsafe blocks even within unsafe fn
- edition = "2021" for smoother development
winit Version Incompatibility
- winit = "0.30" failed due to API changes
- ApplicationHandler trait and ActiveEventLoop don't exist in 0.28
- WindowEvent::RedrawRequested doesn't exist in 0.28
- winit = "0.28" and rewrote event handling
vulkanalia Feature Configuration
- "window" and "libloading"
- LibloadingLoader wasn't available without the libloading feature
- vulkanalia = { version = "0.21", features = ["window", "libloading"] }
vulkanalia-vma Version Mismatch
- vulkanalia-vma = "0.21" to match vulkanalia version
- vulkanalia-vma = "0.2" (though we didn't end up using it)
Raw Window Handle Version Conflicts
- raw-window-handle crate in dependency tree
Device Initialization Problem
- Device::null() which doesn't exist
- std::mem::zeroed() which caused undefined behavior warnings
Vulkan API Version Confusion
- api_version(1, 4, 0) but validation layers complained about GPU-AV requiring 1.1+
- api_version(1, 0, 0) for broader compatibility
Validation Layer Integration
Unsafe Block Management
Clippy Warnings
Dynamic State Configuration
- VK_DYNAMIC_STATE_VIEWPORT and VK_DYNAMIC_STATE_SCISSOR
Semaphore Synchronization
Performance Optimizations
- ONE_TIME_SUBMIT
- [0.0, 0.0, 0.0, 0.0] clear and proper flags
Unchanged vulkanalia v0.21.0 (available: v0.28.0)
Unchanged winit v0.28.7 (available: v0.30.11)
Please upgrade vulkanalia in this project to latest. This may require a newer version of winit with a compatible version of raw-window-handle, which may also require fixes to breaking changes. I'm not concerned if we're on the latest version of winit -- just whatever is compatible with the latest vulkanalia.
Feel free to upgrade other crates as well, but perhaps not all at once.
Successfully upgraded all compatible dependencies:
This is a Vulkan triangle rendering app using Vulkanalia which I'd like to modernize. Vulkan 1.4 should be available, which I'd like to require. To start, let's also make sure we're using:
- VkViewport, vkCmdSetViewport
- VkRect2D, vkCmdSetScissor
Vulkan 1.4 Support
- vk::make_version(1, 4, 0) in the instance creation
Fixed Validation Layer Errors
The major issue was with semaphore synchronization. I implemented a proper solution:
- image_available_semaphores and render_finished_semaphores for each swapchain image (typically 3) instead of per frame in flight (2)
This is a Vulkan 1.4 app using Vulkanalia. I think the next extension to adopt as part of our modernization effort is:
- VK_KHR_synchronization2
Changes Made:
- vk::KHR_SYNCHRONIZATION2_EXTENSION.name to the device extensions list
- VkPhysicalDeviceSynchronization2Features with synchronization2 set to true
- vk::SubmitInfo with vk::SubmitInfo2 using:
  - vk::SemaphoreSubmitInfo for more granular semaphore control
  - vk::CommandBufferSubmitInfo for command buffer submission
  - vk::PipelineStageFlags2 for enhanced pipeline stage specification
- queue_submit2() instead of queue_submit()
- VkPipelineStageFlags2
When a KHR extension becomes core:
The next extension to adopt as part of our modernization effort is: VK_KHR_dynamic_rendering
Major Changes:
- VkPhysicalDeviceDynamicRenderingFeatures with dynamicRendering enabled
- render_pass field from AppData
- framebuffers field from AppData
- create_render_pass() function
- create_framebuffers() function
- VkPipelineRenderingCreateInfo instead of render pass
- cmd_begin_render_pass() → cmd_begin_rendering()
- cmd_end_render_pass() → cmd_end_rendering()
- VkRenderingInfo with VkRenderingAttachmentInfo
- UNDEFINED → COLOR_ATTACHMENT_OPTIMAL before rendering
- COLOR_ATTACHMENT_OPTIMAL → PRESENT_SRC_KHR after rendering
Do we need feature enablement for VK_KHR_maintenance6?
Maintenance Extension Status in Vulkan 1.4:
| Extension | Core Status | Feature Structure | Requires Enablement |
|---|---|---|---|
| maintenance4 | Core in 1.3+ | | YES |
| maintenance5 | Core in 1.4 | | YES |
| maintenance6 | Core in 1.4 | None | NO |
maintenance4 Features (now enabled):
- LocalSizeId shader support
Let's proceed to replace the triangle with a textured quad using ./ferris.png loaded via the image crate. Use modern Vulkan 1.4 features such as push constants and push descriptors where appropriate.
Textured Quad Rendering:
- cmd_push_descriptor_set() instead of traditional descriptor sets
The textured quad we have right now needs to become a Sprite. We want to set a logical resolution, such as 640x360, and have pixel-perfect scaling (clamp) to the window size. Use letterboxing/pillarboxing to maintain the aspect ratio. Add the glam crate so we can use the types it defines.
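The letterbox/pillarbox request above can be sketched with plain integer arithmetic. This is only an illustration, not the project's code: it assumes "pixel-perfect" means a whole-number scale factor, and the `Letterbox` struct and `letterbox` function are hypothetical names. It also assumes the logical size is non-zero.

```rust
/// Placement of the logical area inside the window, in physical pixels.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Letterbox {
    pub scale: u32,
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Picks the largest whole-number scale that fits the window and centers
/// the scaled area, leaving bars on whichever axis has space left over.
/// Assumes both logical dimensions are non-zero.
pub fn letterbox(logical: (u32, u32), window: (u32, u32)) -> Letterbox {
    let (lw, lh) = logical;
    let (ww, wh) = window;
    // Integer division floors, so the scaled area never exceeds the window.
    // Clamp to 1 so a window smaller than the logical size still draws.
    let scale = (ww / lw).min(wh / lh).max(1);
    let width = lw * scale;
    let height = lh * scale;
    Letterbox {
        scale,
        // saturating_sub avoids underflow when the window is too small.
        x: ww.saturating_sub(width) / 2,
        y: wh.saturating_sub(height) / 2,
        width,
        height,
    }
}
```

The resulting offset and size could then feed the dynamic viewport and scissor; a 640x360 logical area in a 1920x1200 window would get a scale of 3 with 60-pixel bars above and below.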
The sprite system is now:
Let's fill in a background color in the logical area so that the letterbox/pillarbox is more obvious. Let's define the window size we start with in constants as well.
- INITIAL_WINDOW_WIDTH: 1920 pixels
- INITIAL_WINDOW_HEIGHT: 1080 pixels
- [0.3, 0.5, 0.7, 1.0]) for the entire screen
I made some changes to the window size, logical size, and initial sprite position. Please leave those alone for now. We need to enable a blend mode. The PNG has an alpha channel and should render with a transparent background, but it is currently rendering with a black background.
Issue Identified:
- blend_enable(false)
- ONE and ZERO (no blending)
- blend_enable(true)
- SRC_ALPHA - uses the sprite's alpha channel
- ONE_MINUS_SRC_ALPHA - uses the inverse of sprite's alpha
- ADD - standard alpha blending formula
Nice. Next we want to render a variable number of Sprites based on a command line argument, e.g.
cargo run 100
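A minimal sketch of reading the sprite count from the command line, using only the standard library. The function name `parse_sprite_count` is hypothetical, and falling back to 1 on a missing or invalid argument is an assumption based on the default described in this conversation.

```rust
/// Parses the first positional argument as a sprite count.
/// Falls back to 1 when the argument is missing or not a number.
pub fn parse_sprite_count<I: IntoIterator<Item = String>>(args: I) -> usize {
    args.into_iter()
        .nth(1) // index 0 is the program name
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(1)
}
```

In the application this would typically be called as `parse_sprite_count(std::env::args())`.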
New Features:
Command Line Argument Support
- cargo run - renders 1 sprite (default)
- cargo run 10 - renders 10 sprites
- cargo run 100 - renders 100 sprites
Dynamic Buffer Management
Now for the fun part. We want the sprites to start at random positions and bounce around, while handling collisions with the screen edges. This Go code is an example of the movement logic I'd like to implement: https://raw.githubusercontent.com/sedyh/ebitengine-bunny-mark/refs/heads/master/bench/bunny.go
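A rough sketch of this kind of movement logic: velocity is applied each step, gravity accelerates the sprite downward, and the edges reflect it. It is not a port of the linked Go code. The `Sprite` struct and `step` function are hypothetical, and the constant values are taken from the startup log shown later in this conversation.

```rust
/// Minimal sprite state (hypothetical; the real struct may differ).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sprite {
    pub x: f32,
    pub y: f32,
    pub vx: f32,
    pub vy: f32,
}

pub const GRAVITY: f32 = 0.5; // downward acceleration per step
pub const BOUNCE_DAMPING: f32 = 0.9; // fraction of speed kept after a floor hit

/// Advances one sprite by one step inside the box [0, max_x] x [0, max_y].
/// Y grows downward, so max_y is the floor.
pub fn step(s: &mut Sprite, max_x: f32, max_y: f32) {
    s.x += s.vx;
    s.y += s.vy;
    s.vy += GRAVITY;

    // Left and right edges: clamp the position and reverse direction.
    if s.x < 0.0 {
        s.x = 0.0;
        s.vx = -s.vx;
    } else if s.x > max_x {
        s.x = max_x;
        s.vx = -s.vx;
    }

    // Floor: bounce upward and lose some energy. Ceiling: stop rising.
    if s.y > max_y {
        s.y = max_y;
        s.vy = -s.vy * BOUNCE_DAMPING;
    } else if s.y < 0.0 {
        s.y = 0.0;
        s.vy = 0.0;
    }
}
```

Here `max_x` and `max_y` would be the logical resolution minus the sprite size, so the sprite stays fully on screen.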
Physics Constants (easily tweakable):
- GRAVITY - Downward acceleration strength
- BOUNCE_DAMPING - Energy loss on ground impact
I think they jump higher in the Go version. It's probably related to the size of our sprite and the resolution I chose, but I'd like to keep those the same. Should we make the damping less?
Physics Improvements:
I like it! The next thing is some basic metrics. We don't have font rendering like the Go version, so logging to the console will have to suffice. https://raw.githubusercontent.com/sedyh/ebitengine-bunny-mark/refs/heads/master/bench/metrics.go Also see this DX11 version that is similar to what we're building, but with different movement. https://raw.githubusercontent.com/nathany/ferris-mark-dx/refs/heads/master/src/main.rs
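A once-per-second console metric can be sketched as a small accumulator. This is an assumption about the shape of such a tracker, not the code from either linked project: `Metrics`, `Report`, and `record_frame` are hypothetical names, and the frame time reported here is simply the average over the interval.

```rust
use std::time::Duration;

/// Accumulates frames until at least one second has passed.
#[derive(Debug, Default)]
pub struct Metrics {
    frames: u32,
    elapsed: Duration,
}

/// Values for one console line.
#[derive(Debug, PartialEq)]
pub struct Report {
    pub fps: f64,
    pub frame_time_ms: f64,
    pub sprites_per_sec: f64,
}

impl Metrics {
    /// Records one frame. Returns a report and resets the counters once
    /// the accumulated time reaches one second; otherwise returns None.
    pub fn record_frame(&mut self, frame_time: Duration, sprites: usize) -> Option<Report> {
        self.frames += 1;
        self.elapsed += frame_time;
        if self.elapsed < Duration::from_secs(1) {
            return None;
        }
        let fps = f64::from(self.frames) / self.elapsed.as_secs_f64();
        let report = Report {
            fps,
            frame_time_ms: 1000.0 / fps, // average over the interval
            sprites_per_sec: fps * sprites as f64,
        };
        *self = Self::default();
        Some(report)
    }
}
```

In the render loop, `frame_time` would come from the difference between two `std::time::Instant::now()` readings, and a returned report would be printed with `println!`.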
Metrics Features Implemented:
Real-time Performance Tracking
System Information
Technical Implementation:
- Instant
My window scaling in Windows 11 is set to 150% on a 4K monitor. That would explain why the initial window is larger than 1/4 of my 4K screen, and why the resolution logged is like this:
=== Ferris Mark VK - Vulkan Sprite Benchmark ===
Rendering 1000 sprites
Logical resolution: 1920x1080
Initial window size: 1920x1080
Physics: gravity=0.5, bounce_damping=0.9
Performance metrics will be logged every second...
GPU: AMD Radeon RX 9070 XT
Vulkan API: 1.4.308
FPS: 133.4 | Frame time: 5.66ms | Sprites: 1000 | Sprites/sec: 133377 | Resolution: 2880x1620
Could we make it so the initial window size is 1920x1080 like requested, even if Windows 11 scaling is enabled?
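The logged 2880x1620 is consistent with a 1920x1080 logical size multiplied by the 150% scale factor. The arithmetic can be sketched as below; both helper names are hypothetical. With winit, one common approach is to request the window size as a `PhysicalSize` rather than a `LogicalSize`, though whether that matches the fix applied here is an assumption.

```rust
/// Physical size produced when a logical size is scaled by the OS factor.
pub fn logical_to_physical(logical: (u32, u32), scale_factor: f64) -> (u32, u32) {
    let scale = |v: u32| (f64::from(v) * scale_factor).round() as u32;
    (scale(logical.0), scale(logical.1))
}

/// Logical size to request so the physical size comes out as `physical`.
pub fn physical_to_logical(physical: (u32, u32), scale_factor: f64) -> (f64, f64) {
    (
        f64::from(physical.0) / scale_factor,
        f64::from(physical.1) / scale_factor,
    )
}
```

At 150% scaling, a 1920x1080 physical window corresponds to a 1280x720 logical size.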
DPI Scaling Fix Summary
I've successfully resolved the Windows DPI scaling issue! Here's what was changed:
- LogicalSize gets multiplied by the DPI scale factor
Debug build:
FPS: 165.0 | Frame time: 6.00ms | Sprites: 1000 | Sprites/sec: 164991 | Resolution: 1920x1080
FPS: 146.4 | Frame time: 6.76ms | Sprites: 1000 | Sprites/sec: 146408 | Resolution: 1920x1080
FPS: 165.0 | Frame time: 6.00ms | Sprites: 1000 | Sprites/sec: 165009 | Resolution: 1920x1080
Now we have our sprite rendering working, so it's time to optimize! I found an article about rendering many sprites in Vulkan. What do you think of it? Is it all applicable to a modern Vulkan 1.4 implementation, or are there some improvements that could be made? Also, we're only rendering one sprite texture right now and no text, so we could keep things a little simpler. Validation layers for performance and best practices are also now enabled. Let's make a plan to make this run faster!
Optimization Plan
Here's a step-by-step plan to dramatically improve performance:
Goal: Remove the expensive vertex buffer mapping/updating
- gl_VertexIndex technique
Goal: GPU-driven sprite rendering
Goal: Squeeze out maximum performance
Goal: Prepare for more complex scenarios
Please proceed with phase 1. I'll do some research on the subsequent phases before we tackle those.
Issues Encountered & Solutions:
- VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT flag
- [[f32; 4]; 4] instead of Mat4 for POD types
Please correct all clippy warnings and have a look at the best practices mentioned in the validation layers (which I just re-enabled).
I'm not sure how far I want to go down the GPU-driven command, because I'm concerned about how difficult it may be to debug. Maybe we can take another look at the plan.
Validation Layer Best Practices Issues:
Current State - Phase 1 Complete
Goal: Address validation layer warnings while keeping debugging simple
Goal: Optimize the CPU side while keeping GPU side simple
Goal: Better batching without losing debuggability
Instead of full GPU-driven rendering, consider:
- I'm not seeing a significant difference with batching. I'd like to disable vsync by default and have a --vsync command line flag to enable it.
- I'd like our optimizations to focus on consistent frame pacing. Which optimizations could benefit that most?
- Outside of optimization, I have no current plans to extend this particular application with multiple sprite textures, etc. We're using Vulkan 1.4, but my goal is to take advantage of the new features that make Vulkan easier to use. So let's keep it simple.
FPS: 1816.2 | Frame time: 0.51ms | Sprites: 1000 | Sprites/sec: 1816220 | Resolution: 1920x1080
The key optimizations for consistent frame pacing are:
Issue: Currently making small allocations each frame
Solution: Pre-allocate or use memory pools
Issue: Currently using map_memory every frame
Solution: Use persistent mapped memory or double-buffered approach
Issue: Resetting command buffers each frame
Solution: Pre-record static parts
What We've Successfully Implemented:
VSync Control:
--vsync flag (defaults to disabled)
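One way the flag could map onto a present mode is sketched below. The enum is a stand-in for Vulkan's present modes rather than the vulkanalia type, and the preference order (immediate, then mailbox) is an assumption. The fallback relies on FIFO being the one mode the Vulkan specification requires every surface to support.

```rust
/// Stand-in for the Vulkan present modes relevant here (not the real type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PresentMode {
    Fifo,      // waits for vertical blank (vsync)
    Mailbox,   // low latency, no tearing
    Immediate, // no waiting, may tear
}

/// Returns true when `--vsync` appears among the arguments.
pub fn vsync_requested<I: IntoIterator<Item = String>>(args: I) -> bool {
    args.into_iter().skip(1).any(|a| a == "--vsync")
}

/// Picks a present mode from those the surface reports as supported.
pub fn choose_present_mode(vsync: bool, supported: &[PresentMode]) -> PresentMode {
    if !vsync {
        // Assumed preference order for uncapped benchmarking.
        for preferred in [PresentMode::Immediate, PresentMode::Mailbox] {
            if supported.contains(&preferred) {
                return preferred;
            }
        }
    }
    // FIFO is always available, so it is a safe fallback.
    PresentMode::Fifo
}
```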
Modern Vulkan 1.4 Features (Simple & Effective):
Frame Pacing Optimizations:
Let's apply the glam improvement for SIMD.
FPS: 82.9 | Frame time: 11.99ms | Sprites: 50000 | Sprites/sec: 4146880 | Resolution: 1920x1080
SIMD Optimization Results Summary
You mentioned some sorting and culling improvements? How much impact could that have? Would it cull sprites that are fully occluded by other sprites, taking into account alpha transparency? Or is it more just a viewport thing (we don't currently have sprites outside of the viewport)?
Your Specific Constraints:
What Actually Helped:
Alright. That was a good optimization session! Please summarize what we did, what we chose not to do (and why), and what we might yet do. Thanks.
Optimization Session Summary
- gl_VertexIndex
- vkCmdDraw(6, sprite_count, 0, 0) call instead of multiple draw calls
- --vsync flag (defaults to disabled for benchmarking)
- map_memory/unmap_memory calls every frame
- f32 components with glam::Vec2 for vectorized operations
What We Might Yet Do
Vulkan Memory Allocator (VMA)
Frame Rate Limiting
- --target-fps N flag for testing consistency
SIMD Physics Further Optimization
Excellent. Thanks! I'm seeing a lot of different shader files. Do we still need them all? Beyond the validation layer differences in config.toml, do we have other differences between release/debug? Do we need to?
Build Configuration Optimized
- cargo run)
- cargo run --release)
- strip = true
- abort for smaller binaries
I think that the VALIDATION_ENABLED logging may be a little inaccurate? When it says "Validation layers: Disabled (Release)", are we sure the validation layers are disabled? Or are they still enabled because I have them enabled in vkconfig?
Validation Layer Reality Check
Please add constants for sprite_width and sprite_height.
Please use the fastrand crate instead of generating our own pseudo-random values.
I wonder if we should entirely remove the VALIDATION_ENABLED and VALIDATION_LAYER stuff from our code and config.toml, and let that be completely controlled by vkconfig? I think we can also remove enabled_layer_names(&layers) in a few places if it's always going to be empty?
Is there any code that we're no longer using? Or any other small cleanups we could do?
cargo clippy -- -W clippy::all -W clippy::pedantic -W clippy::nursery
Code Cleanup Changes Made:
Simplified log_gpu_info function:
- Result<()> return type since it never fails
- Ok(()) return and ? call
Removed unused variables:
- viewport and scissor variables in create_pipeline
Fixed underscore-prefixed variable:
- _ prefix from instance parameter in create_command_pool since it's actually used
Improved type conversions:
- u64::from() instead of as u64 cast for lossless conversion
This may be a tricky one that will take some thought. Please make a list of functions that have unsafe blocks spanning the entire function and also determine which calls (e.g. to vulkanalia) need to be wrapped in unsafe blocks. From that, make a plan for which unsafe blocks could be narrowed or split such that more lines of code are considered safe.
Let's proceed with phase 1. Keep readability in mind. It may be better to have a few lines in an unsafe block that don't need to be if the code is more concise. But any large blocks of safe code should be outside of unsafe blocks if possible. Given the magnitude of these changes, it may be a good idea to tackle 1 function first and ensure everything compiles and works before moving on.
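The idea of narrowing an unsafe block can be illustrated without any Vulkan code. In this sketch, which is not taken from the project, `get_unchecked` stands in for a call that genuinely requires `unsafe`, and everything around it stays in safe code. The function name `sum_first` is hypothetical.

```rust
/// Sums the first `n` values of a slice, or all of them if `n` is larger.
pub fn sum_first(values: &[u32], n: usize) -> u32 {
    // Safe: plain arithmetic stays outside any unsafe block.
    let count = n.min(values.len());
    let mut total = 0;
    for i in 0..count {
        // SAFETY: i < count and count <= values.len(), so i is in bounds.
        // Only the one operation that needs `unsafe` is wrapped.
        total += unsafe { *values.get_unchecked(i) };
    }
    // Safe again: returning the result needs no special treatment.
    total
}
```

The same shape applies to a function that builds several create-info structs in safe code and wraps only the final API call.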
Well done. Proceed through each of the other 4 candidates you identified, but test the build after completing each function before continuing onto the next.
Feel free to remove the // SAFE: and // UNSAFE: comments that you added during this process.
Nice work. Are there any other functions that would benefit from a refactor of the unsafe blocks? Go for it with log_gpu_info(). Please also add your SAFE/UNSAFE comments to App::create and create_instance. Then we can both think about those two a little more.
Please proceed. It may be worth considering whether some of the safe lines can be moved. For example, does AppData need to come after loader/entry for it to work, or could it come before the unsafe block entirely? If that sort of consolidation is possible, we could have fewer unsafe lines without increasing the number of unsafe blocks (which would make the code less readable).
Total Impact Across All Refactoring:
Key Patterns Established:
Please go through all the code and add concise comments suitable for someone still learning Vulkan. If you identify any areas where the code itself can be made more readable and concise, please let me know.
2. Builder Pattern Consistency: Some areas could use more consistent builder patterns for better readability, particularly in swapchain creation where we mix direct assignment with builders.
3. Error Handling Simplification: The .map_err(|e| anyhow!("{}", e)) pattern could be simplified with custom error types or wrapper functions.
4. Magic Numbers: A few remaining magic numbers could become named constants:
- 4 (bytes per pixel) → RGBA_BYTES_PER_PIXEL
- 60.0 (physics scaling) → TARGET_FPS
5. Resource Management: The cleanup code in destroy() could potentially be reorganized for clearer resource lifetime management.
6. Swapchain Recreation: The swapchain recreation logic is scattered and could be consolidated for better maintainability.
We are currently working on some light refactoring to make this code more readable and maintainable.
- The .map_err(|e| anyhow!("{}", e)) pattern could be simplified with custom error types or wrapper functions.
- A few remaining magic numbers could become named constants:
  - 4 (bytes per pixel) → RGBA_BYTES_PER_PIXEL
  - 60.0 (physics scaling) → TARGET_FPS
...
Let's take a look at swapchain creation and recreation next.
- Some areas could use more consistent builder patterns for better readability, particularly in swapchain creation where we mix direct assignment with builders.
- The swapchain recreation logic is scattered and could be consolidated for better maintainability.
Should we fix this validation warning from NVIDIA?
Validation Performance Warning: [ BestPractices-NVIDIA-ClearColor-NotCompressed ] | MessageID = 0x916108d1 vkCmdBeginRendering(): pRenderingInfo->pColorAttachments[0] [NVIDIA] Clearing image with format VK_FORMAT_B8G8R8A8_SRGB without a 1.0f or 0.0f clear color. The clear will not get compressed in the GPU, harming performance. This can be fixed using a clear color of VkClearColorValue{0.0f, 0.0f, 0.0f, 0.0f}, or VkClearColorValue{1.0f, 1.0f, 1.0f, 1.0f}. Alternatively, use VK_FORMAT_R8G8B8A8_UNORM, VK_FORMAT_B8G8R8A8_UNORM, VK_FORMAT_A8B8G8R8_UNORM_PACK32, VK_FORMAT_A2R10G10B10_UNORM_PACK32, VK_FORMAT_A2B10G10R10_UNORM_PACK32, VK_FORMAT_R16G16B16A16_UNORM, VK_FORMAT_R16G16B16A16_SNORM, VK_FORMAT_R16G16B16A16_UINT, VK_FORMAT_R16G16B16A16_SINT, VK_FORMAT_R16G16B16A16_SFLOAT, VK_FORMAT_R32G32B32A32_SFLOAT, or VK_FORMAT_B10G11R11_UFLOAT_PACK32. Objects: 1 [0] VkCommandBuffer 0x21915374020
Issue Fixed: NVIDIA validation warning about clear color compression
Change Made:
- [0.0, 0.0, 0.0, 1.0] to [0.0, 0.0, 0.0, 0.0] in the record_command_buffer function
- VK_FORMAT_B8G8R8A8_SRGB swapchain format
We can run our benchmark with a set number of sprites and see the FPS and other metrics every second. But I'd like a benchmark script (for Git Bash on Windows, or maybe a build.rs).
- Run for a set number of frames, and take the average, but excluding the first 100 frames to allow for it to "warm up". We don't currently have a command line option to terminate after a set number of frames (e.g. 10000 frames)
- Then increase the number of sprites and run it all again. The script that I'm basing this idea on is here: https://raw.githubusercontent.com/unitoftime/experiments/refs/heads/master/gophermark/batch/test.sh
- From that data we should be able to generate a nice report using the averages from each of the runs.
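The averaging step described in this list can be sketched as follows. The function name `average_fps` and the choice to collect per-frame times in seconds are assumptions, and the warmup length is the 100 frames mentioned above.

```rust
/// Number of initial frames excluded from the average.
pub const WARMUP_FRAMES: usize = 100;

/// Average FPS over the frames recorded after the warmup period.
/// Returns None when no frames remain or the measured time is zero.
pub fn average_fps(frame_times_secs: &[f64]) -> Option<f64> {
    // `get` returns None if fewer than WARMUP_FRAMES frames were recorded.
    let measured = frame_times_secs.get(WARMUP_FRAMES..)?;
    if measured.is_empty() {
        return None;
    }
    let total: f64 = measured.iter().sum();
    if total <= 0.0 {
        return None;
    }
    // Frames divided by elapsed time, which weights slow frames correctly.
    Some(measured.len() as f64 / total)
}
```

Dividing the frame count by the total time avoids the bias that comes from averaging per-frame FPS values directly.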
- --frames N to exit after N frames
- benchmark_results.txt
- BENCHMARK_RESULT: X sprites, Y FPS, Z sprites/sec
A new version of Vulkanalia just came out, so I was able to switch the prelude from v1_3 to v1_4 to match with us using Vulkan 1.4. Are there any cleanups we can do based on this new prelude?
"Many modern applications use a hybrid approach - push descriptors for dynamic per-draw data (like transform matrices, material parameters) and traditional descriptor sets for longer-lived resources (like texture atlases, global uniform buffers)." Is that what you're doing here?
Streamlined Feature Enabling:
sampler_anisotropy, dynamic_rendering, synchronization2)
- We actually don't need sampler_anisotropy for 2D sprites, since they are rendered parallel to the screen.
- We should decide what our minimum Vulkan version is based not only on VK_KHR_dynamic_rendering and VK_KHR_synchronization2, but everything else we're using.
- If we do want a comprehensive fall back, this is the breakdown for those extensions.
Vulkan 1.3:
- VK_KHR_dynamic_rendering: Core (just enable the feature)
- VK_KHR_synchronization2: Extension (add to extensions list + enable feature)
Vulkan 1.4:
- VK_KHR_dynamic_rendering: Core (just enable the feature)
- VK_KHR_synchronization2: Core (just enable the feature)
Version compatibility:
VK_KHR_dynamic_rendering:
- Available as extension from Vulkan 1.1+ (requires VK_KHR_get_physical_device_properties2)
- Core in 1.3+
VK_KHR_synchronization2:
- Available as extension from Vulkan 1.0+
- Core in 1.4+
Successfully Completed Changes
Proper Vulkan Version Requirements:
- VK_KHR_dynamic_rendering extension support)
- VK_KHR_dynamic_rendering becomes core)
- VK_KHR_synchronization2 also becomes core)
Smart Extension Management:
- dynamic_rendering, adds VK_KHR_synchronization2 extension
- VK_KHR_dynamic_rendering and VK_KHR_synchronization2 as extensions
Please take a look at this extension and proposal.
https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_unified_image_layouts.html
https://github.com/KhronosGroup/Vulkan-Docs/tree/main/proposals/VK_KHR_unified_image_layouts.adoc
Does Vulkanalia expose the new fields for VK_KHR_unified_image_layouts? How much would it improve our code and how would it impact our minimum requirements?
Final Assessment: VK_KHR_unified_image_layouts
CONFIRMED: Available in Vulkanalia 0.29.0!
- vk::PhysicalDeviceUnifiedImageLayoutsFeaturesKHR and related enums compile successfully
Current Driver Support (as of late 2024/early 2025):
Current Image Layout Usage: Your code uses several image layout transitions:
- UNDEFINED → COLOR_ATTACHMENT_OPTIMAL → PRESENT_SRC_KHR
- UNDEFINED → TRANSFER_DST_OPTIMAL → SHADER_READ_ONLY_OPTIMAL
- SHADER_READ_ONLY_OPTIMAL
Reduced Code Complexity:
- transition_image_layout function for non-initialization cases
- GENERAL)
"Could eliminate the entire transition_image_layout function for non-initialization cases" Since we would still need it for initialization cases, how much of a code reduction would that be?
Lines that could be eliminated:
- TRANSFER_DST_OPTIMAL → SHADER_READ_ONLY_OPTIMAL
- transition_image_layout (7 lines): The second case
Can we make it so the sprites don't spawn until I press the spacebar? And make that controlled by a const boolean that I can toggle if I want them to spawn immediately instead.
- SPAWN_SPRITES_IMMEDIATELY = true)
- SPAWN_SPRITES_IMMEDIATELY = false)
We now have the vulkanalia-vma crate available for use. Please replace our manual memory allocation with VMA. Documentation is available at: https://docs.rs/vulkanalia-vma/latest/vulkanalia_vma/ This is a challenging upgrade, so we should make a plan first and determine if we can break it down into smaller steps.
VMA Migration Complete!
Awesome! Please run the app to see the best practices suggestions from AMD and NVIDIA, several around memory allocation. Now that we are using VMA, would some of these changes be easier? What would the magnitude of the code change be? Would it be more difficult to understand afterwards? What might the benefits be?
Our application is already doing several things well with VMA:
- MAPPED flag)
- AutoPreferDevice for GPU-only, Auto for CPU-to-GPU)
The remaining warnings are unrelated to memory allocation:
Please review our extensions and features against what's available in core Vulkan versions to make sure everything is correct.
Please go through all the code and add concise comments suitable for someone still learning Vulkan. Since the order of deallocation is important for proper cleanup, we should comment on that too (on struct fields, etc.) as appropriate.
I think we should give it a try. We can do this work on a branch that we can test out. I am running Windows with a non-beta AMD driver, and I can also test on another Windows computer with an NVIDIA driver.
"Nearly all GPU vendors are ready to support this extension on current-generation hardware" "In practice these image layouts typically map to just a handful of actual physical layouts or compression formats."
From what I've heard, some people have already tried simplifying to use VK_IMAGE_LAYOUT_GENERAL even without the extension.
// Add RenderDoc API integration
#[cfg(debug_assertions)]
use renderdoc::RenderDoc;

#[cfg(debug_assertions)]
fn capture_frame(&self) {
    if let Ok(mut rd) = RenderDoc::new() {
        rd.start_frame_capture(std::ptr::null(), std::ptr::null());
        // Your rendering code here
        rd.end_frame_capture(std::ptr::null(), std::ptr::null());
    }
}