CUDA Operations
PyTorch’s CUDA operations are essential for efficient GPU programming, enabling fine-grained control over GPU resources and execution flow. They help optimize memory usage, manage parallel execution streams, and coordinate CPU-GPU operations.
Syntax
Memory Management
torch.cuda.memory_allocated()
- Returns the number of bytes of GPU memory currently occupied by tensors on the current device
- Essential for monitoring GPU memory usage
torch.cuda.empty_cache()
- Releases unused memory cached by PyTorch's allocator so other GPU applications can use it
- Can help avoid out-of-memory errors when memory is fragmented (see the sketch below)
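A minimal sketch of the memory-management calls, assuming a CUDA-capable GPU is available; exact byte counts vary with the allocator's block size:

import torch

if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated()} bytes")
    x = torch.randn(1024, 1024, device='cuda')  # ~4 MB of float32 values
    print(f"After allocation: {torch.cuda.memory_allocated()} bytes")
    del x                     # drop the only reference to the tensor
    torch.cuda.empty_cache()  # return the cached block to the GPU driver
    print(f"After cleanup: {torch.cuda.memory_allocated()} bytes")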
Stream Management
torch.cuda.Stream()
- Creates a new CUDA stream, an independent queue of GPU operations that can run concurrently with work in other streams (see the sketch below)
- Parameters: device (optional), priority (optional; negative values indicate higher priority)
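A short sketch of queuing work on a dedicated stream, assuming a CUDA device is present; the priority value here is only illustrative:

import torch

if torch.cuda.is_available():
    a = torch.randn(512, 512, device='cuda')
    stream = torch.cuda.Stream(priority=-1)  # negative = higher priority
    with torch.cuda.stream(stream):          # queue the matmul on this stream
        b = torch.matmul(a, a)
    torch.cuda.synchronize()                 # wait before using the result
    print(b.sum())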
Device Control
torch.cuda.device_count()
- Returns the number of available GPUs
- Used for multi-GPU setups
torch.cuda.current_device()
- Returns the index of the currently selected device
- Useful for device-specific operations (see the sketch below)
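A sketch that enumerates the visible GPUs and temporarily switches the current device; torch.cuda.device() is used here as a context manager:

import torch

for idx in range(torch.cuda.device_count()):
    print(f"GPU {idx}: {torch.cuda.get_device_name(idx)}")

if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    with torch.cuda.device(0):  # make device 0 current inside this block
        y = torch.ones(3, device='cuda')
    print(y.device)  # cuda:0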
Synchronization
torch.cuda.synchronize()
- Blocks until all queued operations on the current device have completed
- Essential for timing and debugging, since CUDA kernel launches are asynchronous (see the timing sketch below)
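Because a kernel launch returns before the GPU finishes, timing code must synchronize before reading the clock; a minimal sketch, assuming a CUDA device:

import time
import torch

if torch.cuda.is_available():
    x = torch.randn(2048, 2048, device='cuda')
    torch.cuda.synchronize()  # make sure setup work is done before timing
    start = time.perf_counter()
    y = torch.matmul(x, x)    # launch returns immediately
    torch.cuda.synchronize()  # block until the matmul actually finishes
    print(f"Elapsed: {time.perf_counter() - start:.4f} s")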
Example
The following example demonstrates essential CUDA operations for GPU memory monitoring, stream creation, device management, and synchronization:
import torch

# Memory monitoring
print(f"Initial memory: {torch.cuda.memory_allocated()}")

# Create tensor on GPU
x = torch.randn(1000, 1000, device='cuda')
print(f"After allocation: {torch.cuda.memory_allocated()}")

# Stream management
stream = torch.cuda.Stream()
with torch.cuda.stream(stream):
    y = torch.matmul(x, x)

# Device information
print(f"GPU count: {torch.cuda.device_count()}")
print(f"Current device: {torch.cuda.current_device()}")

# Synchronization and cleanup
torch.cuda.synchronize()
torch.cuda.empty_cache()
The output of the above code will resemble the following (exact values depend on the hardware):
Initial memory: 0
After allocation: 4000256
GPU count: 1
Current device: 0