Overview

Cerebras Inference delivers the fastest AI inference available through their custom Wafer-Scale Engine chips. When integrated with Autohand, you get:

  • Inference speeds over 2000 tokens per second
  • Near-instant responses for code completion
  • Access to Llama 3.1 8B and 70B models
  • OpenAI-compatible API for easy integration
  • Competitive pricing for high-speed inference

Speed advantage: Cerebras inference is 10-20x faster than typical cloud providers. This makes it ideal for interactive coding sessions where latency matters.

Setup

Get started with Cerebras inference.

Get your API key

  1. Go to cloud.cerebras.ai and create an account
  2. Navigate to API Keys in your dashboard
  3. Create a new API key and copy it

Configure Autohand

# Set environment variable
export CEREBRAS_API_KEY="csk-xxxxxxxxxxxxxxxxxxxx"

# Or configure via CLI
autohand config set cerebras.apiKey "csk-xxxxxxxxxxxxxxxxxxxx"

Verify your configuration:

# Start with Cerebras provider
autohand --provider cerebras --model llama3.1-70b

# Test with a prompt
autohand --prompt "Hello, which model are you?"

CLI configuration

Configure Cerebras in your ~/.autohand/config.json:

{
  "provider": "cerebras",
  "cerebras": {
    "apiKey": "${CEREBRAS_API_KEY}",
    "model": "llama3.1-70b",
    "maxTokens": 8192,
    "temperature": 0.7
  }
}

Configuration options

OptionDescriptionDefault
apiKeyYour Cerebras API key-
modelModel to usellama3.1-70b
maxTokensMaximum tokens in response8192
temperatureResponse randomness (0-1.5)0.7
baseUrlAPI endpointhttps://api.cerebras.ai/v1

Available models

Cerebras offers optimized Llama models on their infrastructure.

ModelContextSpeedBest for
llama3.1-8b128K2100+ tok/sFast tasks, simple queries
llama3.1-70b128K2000+ tok/sComplex reasoning, coding

Switch models

# Set default model
autohand config set model "llama3.1-70b"

# Use during a session
/model cerebras/llama3.1-8b

Best practices

  • Use for interactive work: Cerebras excels at interactive sessions where response time matters.
  • Choose 70B for coding: The larger model handles complex code better.
  • Use 8B for simple tasks: Quick questions and simple completions.
  • Monitor usage: Check your Cerebras dashboard for usage and costs.

Recommended configuration

{
  "cerebras": {
    "model": "llama3.1-70b",
    "temperature": 0.3,
    "maxTokens": 4096
  }
}

Resources

Troubleshooting

Common issues

IssueSolution
Invalid API keyVerify key at cloud.cerebras.ai
Rate limitedWait or upgrade your plan
Connection timeoutCheck your network or try again

Debug mode

# Enable verbose logging
AUTOHAND_DEBUG=true autohand --provider cerebras