AI Model Providers
Cerebras
Lightning-fast inference powered by the world's largest AI chip. Run Llama 3.1 models at over 2000 tokens per second for instant code assistance.
Overview
Cerebras Inference delivers the fastest AI inference available through their custom Wafer-Scale Engine chips. When integrated with Autohand, you get:
- Inference speeds over 2000 tokens per second
- Near-instant responses for code completion
- Access to Llama 3.1 8B and 70B models
- OpenAI-compatible API for easy integration
- Competitive pricing for high-speed inference
Speed advantage: Cerebras inference is 10-20x faster than typical cloud providers. This makes it ideal for interactive coding sessions where latency matters.
Setup
Get started with Cerebras inference.
Get your API key
- Go to cloud.cerebras.ai and create an account
- Navigate to API Keys in your dashboard
- Create a new API key and copy it
Configure Autohand
# Set environment variable
export CEREBRAS_API_KEY="csk-xxxxxxxxxxxxxxxxxxxx"
# Or configure via CLI
autohand config set cerebras.apiKey "csk-xxxxxxxxxxxxxxxxxxxx"
Verify your configuration:
# Start with Cerebras provider
autohand --provider cerebras --model llama3.1-70b
# Test with a prompt
autohand --prompt "Hello, which model are you?"
CLI configuration
Configure Cerebras in your ~/.autohand/config.json:
{
"provider": "cerebras",
"cerebras": {
"apiKey": "${CEREBRAS_API_KEY}",
"model": "llama3.1-70b",
"maxTokens": 8192,
"temperature": 0.7
}
}
Configuration options
| Option | Description | Default |
|---|---|---|
apiKey | Your Cerebras API key | - |
model | Model to use | llama3.1-70b |
maxTokens | Maximum tokens in response | 8192 |
temperature | Response randomness (0-1.5) | 0.7 |
baseUrl | API endpoint | https://api.cerebras.ai/v1 |
Available models
Cerebras offers optimized Llama models on their infrastructure.
| Model | Context | Speed | Best for |
|---|---|---|---|
llama3.1-8b | 128K | 2100+ tok/s | Fast tasks, simple queries |
llama3.1-70b | 128K | 2000+ tok/s | Complex reasoning, coding |
Switch models
# Set default model
autohand config set model "llama3.1-70b"
# Use during a session
/model cerebras/llama3.1-8b
Best practices
- Use for interactive work: Cerebras excels at interactive sessions where response time matters.
- Choose 70B for coding: The larger model handles complex code better.
- Use 8B for simple tasks: Quick questions and simple completions.
- Monitor usage: Check your Cerebras dashboard for usage and costs.
Recommended configuration
{
"cerebras": {
"model": "llama3.1-70b",
"temperature": 0.3,
"maxTokens": 4096
}
}
Resources
Troubleshooting
Common issues
| Issue | Solution |
|---|---|
| Invalid API key | Verify key at cloud.cerebras.ai |
| Rate limited | Wait or upgrade your plan |
| Connection timeout | Check your network or try again |
Debug mode
# Enable verbose logging
AUTOHAND_DEBUG=true autohand --provider cerebras