---
title: "Cerebras Integration"
source: https://docs.autohand.ai/integrations/cerebras
---

# Cerebras

Lightning-fast inference powered by the world's largest AI chip. Run Llama 3.1 models at over 2000 tokens per second for instant code assistance.

## Overview

Cerebras Inference delivers the fastest AI inference available through their custom Wafer-Scale Engine chips. When integrated with Autohand, you get:

-   Inference speeds over 2000 tokens per second
-   Near-instant responses for code completion
-   Access to Llama 3.1 8B and 70B models
-   OpenAI-compatible API for easy integration
-   Competitive pricing for high-speed inference

**Speed advantage:** Cerebras inference is 10-20x faster than typical cloud providers. This makes it ideal for interactive coding sessions where latency matters.

## Setup

Get started with Cerebras inference.

### Get your API key

1.  Go to [cloud.cerebras.ai](https://cloud.cerebras.ai) and create an account
2.  Navigate to **API Keys** in your dashboard
3.  Create a new API key and copy it

### Configure Autohand

``` bash
# Set environment variable
export CEREBRAS_API_KEY="csk-xxxxxxxxxxxxxxxxxxxx"

# Or configure via CLI
autohand config set cerebras.apiKey "csk-xxxxxxxxxxxxxxxxxxxx"
```

Verify your configuration:

``` bash
# Start with Cerebras provider
autohand --provider cerebras --model llama3.1-70b

# Test with a prompt
autohand --prompt "Hello, which model are you?"
```

## CLI configuration

Configure Cerebras in your `~/.autohand/config.json`:

``` json
{
  "provider": "cerebras",
  "cerebras": {
    "apiKey": "${CEREBRAS_API_KEY}",
    "model": "llama3.1-70b",
    "maxTokens": 8192,
    "temperature": 0.7
  }
}
```

### Configuration options

| Option | Description | Default |
|---|---|---|
| apiKey | Your Cerebras API key | - |
| model | Model to use | llama3.1-70b |
| maxTokens | Maximum tokens in response | 8192 |
| temperature | Response randomness (0-1.5) | 0.7 |
| baseUrl | API endpoint | https://api.cerebras.ai/v1 |

## Available models

Cerebras offers optimized Llama models on their infrastructure.

| Model | Context | Speed | Best for |
|---|---|---|---|
| llama3.1-8b | 128K | 2100+ tok/s | Fast tasks, simple queries |
| llama3.1-70b | 128K | 2000+ tok/s | Complex reasoning, coding |

### Switch models

``` bash
# Set default model
autohand config set model "llama3.1-70b"

# Use during a session
/model cerebras/llama3.1-8b
```

## Best practices

-   **Use for interactive work**: Cerebras excels at interactive sessions where response time matters.
-   **Choose 70B for coding**: The larger model handles complex code better.
-   **Use 8B for simple tasks**: Quick questions and simple completions.
-   **Monitor usage**: Check your Cerebras dashboard for usage and costs.

### Recommended configuration

``` json
{
  "cerebras": {
    "model": "llama3.1-70b",
    "temperature": 0.3,
    "maxTokens": 4096
  }
}
```

## Resources

-   [Cerebras Website](https://cerebras.ai)
-   [API Documentation](https://inference-docs.cerebras.ai)
-   [Cloud Console](https://cloud.cerebras.ai)
-   [API Reference](https://inference-docs.cerebras.ai/api-reference)
-   [Inference Pricing](https://cerebras.ai/inference)

## Troubleshooting

### Common issues

| Issue | Solution |
|---|---|
| Invalid API key | Verify key at cloud.cerebras.ai |
| Rate limited | Wait or upgrade your plan |
| Connection timeout | Check your network or try again |

### Debug mode

``` bash
# Enable verbose logging
AUTOHAND_DEBUG=true autohand --provider cerebras
```