Cerebras Integration - Autohand Docs

Overview

Cerebras Inference delivers the fastest AI inference available through their custom Wafer-Scale Engine chips. When integrated with Autohand, you get:

Inference speeds over 2000 tokens per second
Near-instant responses for code completion
Access to Llama 3.1 8B and 70B models
OpenAI-compatible API for easy integration
Competitive pricing for high-speed inference

Speed advantage: Cerebras inference is 10-20x faster than typical cloud providers. This makes it ideal for interactive coding sessions where latency matters.

Setup

Get started with Cerebras inference.

Get your API key

Go to cloud.cerebras.ai and create an account
Navigate to API Keys in your dashboard
Create a new API key and copy it

Configure Autohand

# Set environment variable
export CEREBRAS_API_KEY="csk-xxxxxxxxxxxxxxxxxxxx"

# Or configure via CLI
autohand config set cerebras.apiKey "csk-xxxxxxxxxxxxxxxxxxxx"

Verify your configuration:

# Start with Cerebras provider
autohand --provider cerebras --model llama3.1-70b

# Test with a prompt
autohand --prompt "Hello, which model are you?"

CLI configuration

Configure Cerebras in your ~/.autohand/config.json:

{
  "provider": "cerebras",
  "cerebras": {
    "apiKey": "${CEREBRAS_API_KEY}",
    "model": "llama3.1-70b",
    "maxTokens": 8192,
    "temperature": 0.7
  }
}

Configuration options

Option	Description	Default
`apiKey`	Your Cerebras API key	-
`model`	Model to use	`llama3.1-70b`
`maxTokens`	Maximum tokens in response	`8192`
`temperature`	Response randomness (0-1.5)	`0.7`
`baseUrl`	API endpoint	`https://api.cerebras.ai/v1`

Available models

Cerebras offers optimized Llama models on their infrastructure.

Model	Context	Speed	Best for
`llama3.1-8b`	128K	2100+ tok/s	Fast tasks, simple queries
`llama3.1-70b`	128K	2000+ tok/s	Complex reasoning, coding

Switch models

# Set default model
autohand config set model "llama3.1-70b"

# Use during a session
/model cerebras/llama3.1-8b

Best practices

Use for interactive work: Cerebras excels at interactive sessions where response time matters.
Choose 70B for coding: The larger model handles complex code better.
Use 8B for simple tasks: Quick questions and simple completions.
Monitor usage: Check your Cerebras dashboard for usage and costs.

Recommended configuration

{
  "cerebras": {
    "model": "llama3.1-70b",
    "temperature": 0.3,
    "maxTokens": 4096
  }
}

Resources

Troubleshooting

Common issues

Issue	Solution
Invalid API key	Verify key at cloud.cerebras.ai
Rate limited	Wait or upgrade your plan
Connection timeout	Check your network or try again

Debug mode

# Enable verbose logging
AUTOHAND_DEBUG=true autohand --provider cerebras