
# Datacake AI Nodes

### Overview

The Datacake AI nodes let you analyze IoT sensor data, generate reports, and extract insights using OpenAI models. Typical uses:

* 📊 Automated data analysis and anomaly detection
* 📝 Report generation (Markdown, HTML, PDF-ready)
* 💡 Predictive maintenance insights
* 📈 Time series analysis and trend detection
* 🔍 Root cause analysis
* 💻 Python code execution for calculations and visualizations

### Configuration

#### Datacake AI Config

Configuration node for storing OpenAI API credentials securely.

**Settings**

* **Name** - Optional descriptive name for this configuration
* **OpenAI API Key** - Your OpenAI API key (starts with `sk-...`)

**How to Get Your API Key**

1. Visit [OpenAI Platform](https://platform.openai.com/api-keys)
2. Sign in or create an account
3. Click "Create new secret key"
4. Copy the key (starts with `sk-...`)
5. Store it securely in the Datacake AI Config node

⚠️ **Important:** Keep your API key secure. Never share it or commit it to version control.

***

### Datacake AI Node

Execute AI-powered analysis and report generation using OpenAI's models.

#### Configuration

**Model Selection**

Choose the AI model based on your needs:

| Model          | Speed   | Capability | Cost (per 1M tokens)  | Best For                           |
| -------------- | ------- | ---------- | --------------------- | ---------------------------------- |
| **gpt-5**      | Slower  | Highest    | $1.25 in / $10.00 out | Complex analysis, research         |
| **gpt-5-mini** | Medium  | High       | $0.25 in / $2.00 out  | **Recommended for most use cases** |
| **gpt-5-nano** | Fastest | Good       | $0.05 in / $0.40 out  | Simple tasks, quick responses      |

💡 The UI shows real-time cost estimates based on your max token setting.

**Prompt Configuration**

**Prompt Source:**

* **From msg.prompt** - Dynamic prompts from incoming messages (flexible)
* **From Configuration** - Static prompt configured in the node (reusable)

**Include msg.payload:**

* When enabled, automatically includes incoming data in the prompt
* Perfect for CSV, JSON, or text data analysis
* Data type is automatically detected and formatted

**Tools**

**Code Interpreter:**

* ✅ Enable for any data analysis tasks
* Executes Python code for calculations, statistics, and visualizations
* Can read CSV/JSON data, perform complex calculations, generate charts
* Essential for accurate numerical analysis

**Web Search:**

* 🌐 Enable for real-time information retrieval
* Access up-to-date information from the web
* Useful for market research, trend analysis, fact-checking
* Configure search context size (low/medium/high)

**Advanced Options**

**Max Output Tokens:**

* Controls maximum response length
* Higher values = longer responses but higher cost
* Default: 16,000 tokens (\~12,000 words)
* Typical reports use 2,000-8,000 tokens

#### Input Properties

| Property                 | Type   | Required | Description                                                              |
| ------------------------ | ------ | -------- | ------------------------------------------------------------------------ |
| `msg.prompt`             | string | No\*     | The AI prompt/instruction (\*required when using "From msg.prompt" mode) |
| `msg.payload`            | any    | No       | Data to analyze (CSV, JSON, text) when "Include msg.payload" is enabled  |
| `msg.previousResponseId` | string | No       | Previous response ID for conversational context                          |
| `msg.model`              | string | No       | Override the configured model (e.g., "gpt-5-mini")                       |
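
As a minimal sketch of how these properties fit together, the function below builds an outgoing message the way a Function node placed before the Datacake AI node might. The helper name `buildAiRequest`, the prompt text, and the sample CSV are illustrative assumptions; only the `msg.*` property names come from the table above.

```javascript
// Hypothetical helper: builds a message for the Datacake AI node.
// Inside a Node-RED Function node, the same logic would act on the incoming `msg`.
function buildAiRequest(msg, csv, options = {}) {
  if (typeof csv !== "string" || csv.trim() === "") {
    throw new Error("Expected non-empty CSV text");
  }
  msg.payload = csv; // used when "Include msg.payload" is enabled
  msg.prompt = "Summarize this sensor data and list any anomalies."; // "From msg.prompt" mode
  if (options.model) {
    msg.model = options.model; // optional override of the configured model
  }
  if (options.previousResponseId) {
    msg.previousResponseId = options.previousResponseId; // optional conversational context
  }
  return msg;
}

const sampleCsv = "timestamp,temperature\n2025-02-15T14:00:00Z,22.3";
const request = buildAiRequest({}, sampleCsv, { model: "gpt-5-nano" });
console.log(request.model);
```

Optional properties are only set when a value is supplied, so the node falls back to its configured model and starts a fresh conversation by default.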

#### Output Properties

| Property         | Type   | Description                                          |
| ---------------- | ------ | ---------------------------------------------------- |
| `msg.payload`    | string | Clean AI response text (user-friendly, ready to use) |
| `msg.responseId` | string | Response ID for follow-up questions with context     |
| `msg.openai`     | object | Detailed metadata about the request                  |

**Output Metadata (`msg.openai`)**

```javascript
{
  model: "gpt-5-mini",                    // Model used
  promptSource: "msg",                     // Where prompt came from
  toolsUsed: ["code_interpreter"],         // Tools enabled
  hadContext: false,                       // Whether context was used
  responseId: "resp_abc123...",            // Response ID
  usage: {                                 // Token usage
    input_tokens: 1234,
    cached_tokens: 0,
    output_tokens: 567,
    total_tokens: 1801
  },
  cost: {                                  // Cost breakdown in USD
    inputTokens: 1234,
    cachedTokens: 0,
    outputTokens: 567,
    totalTokens: 1801,
    inputCost: 0.000308,                   // $0.000308
    cachedCost: 0.000000,
    outputCost: 0.001134,                  // $0.001134
    totalCost: 0.001442,                   // $0.001442
    currency: "USD"
  },
  fullResponse: { /* Complete raw API response */ }
}
```
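
One way to use this metadata is to reduce it to a single log line after each request. The sketch below assumes the field names shown above; the helper name `summarizeUsage` and the log format are hypothetical.

```javascript
// Hypothetical helper: turns msg.openai into a one-line log entry.
function summarizeUsage(msg) {
  const meta = msg.openai || {};
  const usage = meta.usage || {};
  const cost = meta.cost || {};
  const total = typeof cost.totalCost === "number" ? cost.totalCost.toFixed(6) : "n/a";
  return `${meta.model || "unknown"}: ${usage.input_tokens ?? 0} in / ` +
    `${usage.output_tokens ?? 0} out, ${total} ${cost.currency || "USD"}`;
}

const exampleMsg = {
  openai: {
    model: "gpt-5-mini",
    usage: { input_tokens: 1234, cached_tokens: 0, output_tokens: 567, total_tokens: 1801 },
    cost: { totalCost: 0.001442, currency: "USD" }
  }
};
console.log(summarizeUsage(exampleMsg));
// gpt-5-mini: 1234 in / 567 out, 0.001442 USD
```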

#### Status Display

The node shows real-time status in the editor:

* **🟢 Success ($0.00144)** - Request completed with cost
* **🔵 Processing...** - Request in progress
* **🔴 Error** - Request failed with error message

***

### Use Cases & Examples

#### 1. IoT Sensor Data Analysis

Analyze CSV sensor data and identify anomalies, trends, and patterns.

**Flow:**

```
[File Read: sensor-data.csv]
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Code Interpreter: ✅ Enabled
  Include msg.payload: ✅ Yes
  Prompt: "Analyze this IoT sensor data and create a report including:
          1. Data overview and statistics
          2. Temperature and humidity trends
          3. Anomaly detection
          4. Correlations between sensors
          5. Recommendations"
  ↓
[Debug/File Write]
```

**Example Output:**

```
# IoT Sensor Data Analysis Report

## Executive Summary
- Analyzed 10,000 data points from 15 sensors over 7 days
- Average temperature: 22.3°C (range: 18.5°C to 28.1°C)
- Average humidity: 55.2% (range: 42% to 78%)

## Key Findings
1. **Temperature Spike Detected**: Sensor #7 showed abnormal reading of 35°C on 2025-02-15
2. **Strong Correlation**: Temperature and humidity show inverse correlation (r=-0.82)
3. **Daily Pattern**: Temperature peaks around 2 PM daily

## Recommendations
1. Investigate Sensor #7 for potential malfunction
2. Consider HVAC optimization based on temperature patterns
3. Monitor humidity levels in zones with readings >70%
```

***

#### 2. Automated Markdown Report Generation

Generate professional reports with actual calculations from data.

**Flow:**

```
[Datacake GraphQL History]
  ↓
[Function: Prepare data]
  msg.payload = msg.payload.data.TEMPERATURE; // Time series data
  msg.prompt = `Create a professional markdown report analyzing temperature data.
                Include:
                - Statistical summary (mean, median, std dev)
                - Trend analysis with real calculations
                - Anomaly detection
                - Hourly/daily patterns
                
                Output ONLY the markdown report (no preamble).`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Code Interpreter: ✅ Enabled
  Include msg.payload: ✅ Yes
  Prompt Source: From msg.prompt
  ↓
[File Write: temperature-report.md]
```

**Tips for Report Generation:**

* Always enable Code Interpreter for accurate calculations
* Request "Output ONLY the markdown" to get clean results
* Use specific format requirements (tables, sections, etc.)
* Request chart or graph descriptions when you need visualizations

***

#### 3. Conversational Data Analysis

Have a multi-turn conversation about your data with context.

**First Message:**

```
[Function: Initial prompt]
  msg.payload = deviceHistoryCSV;
  msg.prompt = "Analyze this device performance data and identify the top 3 issues";
  return msg;
  ↓
[Datacake AI]
  ↓
[Function: Store response ID]
  flow.set('lastResponseId', msg.responseId);
  return msg;
  ↓
[Debug: View initial analysis]
```

**Follow-up Message:**

```
[Function: Follow-up question]
  msg.prompt = "For issue #1, what are your detailed recommendations and cost estimates?";
  msg.previousResponseId = flow.get('lastResponseId');
  return msg;
  ↓
[Datacake AI]
  ↓
[Debug: View detailed recommendations]
```

**Benefits:**

* Maintain context across multiple questions
* Drill down into specific findings
* Build on previous analysis
* More efficient token usage (context is cached)

***

#### 4. Predictive Maintenance Analysis

Analyze device trends and predict potential failures.

**Flow:**

```
[Datacake GraphQL History]
  Device: Industrial Pump
  Time Range: Last 30 days
  Fields: TEMPERATURE, VIBRATION, PRESSURE
  ↓
[Function: Prepare maintenance prompt]
  msg.prompt = `Analyze this industrial pump data for predictive maintenance:
                1. Identify degradation patterns
                2. Detect anomalies in temperature, vibration, and pressure
                3. Predict potential failure points
                4. Calculate remaining useful life estimate
                5. Recommend maintenance schedule
                
                Provide actionable insights with confidence levels.`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Code Interpreter: ✅ Enabled
  ↓
[Function: Parse recommendations]
  ↓
[Switch: Check urgency level]
  ↓
[Email/SMS Alert if urgent]
```

***

#### 5. Fleet Health Summary with AI Insights

Combine fleet data with AI analysis for executive reports.

**Flow:**

```
[Datacake GraphQL Fleet Health]
  ↓
[Function: Prepare fleet summary]
  msg.fleetData = msg.payload; // Store fleet data
  msg.prompt = `Analyze this IoT fleet health data and create an executive summary:
                
                ${JSON.stringify(msg.payload, null, 2)}
                
                Include:
                1. Overall fleet health assessment (grade A-F)
                2. Key concerns and their severity
                3. Devices requiring immediate attention
                4. 30-day trend analysis
                5. Cost impact of identified issues
                6. Priority action items
                
                Keep it concise for executives (max 500 words).`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  ↓
[Email to Management]
```

***

#### 6. Energy Consumption Optimization

Analyze consumption patterns and suggest optimizations.

**Flow:**

```
[Datacake GraphQL Consumption]
  Include Monthly Breakdown: ✅ Yes
  ↓
[Function: Prepare optimization prompt]
  msg.prompt = `Analyze this energy consumption data and provide optimization recommendations:
                
                1. Identify consumption patterns and anomalies
                2. Compare actual vs expected consumption
                3. Detect potential waste or inefficiencies
                4. Calculate potential savings
                5. Recommend specific actions with ROI estimates
                
                Focus on actionable insights.`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Code Interpreter: ✅ Enabled
  ↓
[Function: Extract savings opportunities]
  ↓
[Dashboard + Email Report]
```

***

#### 7. Real-time Market Research

Combine IoT data with web research for competitive analysis.

**Flow:**

```
[Datacake GraphQL Product Stats]
  ↓
[Function: Prepare research prompt]
  msg.prompt = `Our IoT device fleet shows these statistics:
                ${JSON.stringify(msg.payload)}
                
                Research and compare:
                1. Latest IoT industry benchmarks for 2025
                2. Our performance vs competitors
                3. Emerging trends that could affect our devices
                4. Recommended improvements based on market leaders
                
                Provide sources for key claims.`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Web Search: ✅ Enabled
  Search Context: High
  ↓
[Debug/Report]
```

***

#### 8. Anomaly Detection with Root Cause Analysis

Detect anomalies and automatically investigate root causes.

**Flow:**

```
[Datacake GraphQL Device]
  ↓
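[Function: Detect anomaly]
  const threshold = 30; // Example threshold in °C; adjust for your device
  if (msg.payload.TEMPERATURE > threshold) {
    // Anomaly detected - keep the reading and threshold for the later prompt
    msg.deviceId = msg.payload.id;
    msg.currentTemp = msg.payload.TEMPERATURE;
    msg.threshold = threshold;
    return msg;
  }
  return null; // No anomaly - stop the flow here
  ↓
[Datacake GraphQL History]
  Time Range: Last 24 hours
  Fields: All sensor fields
  ↓
[Function: Root cause prompt]
  // Assumes the History node passes msg.currentTemp and msg.threshold through unchanged
  msg.prompt = `ANOMALY DETECTED: Temperature reading of ${msg.currentTemp}°C (threshold: ${msg.threshold}°C)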
                
                Historical data:
                ${JSON.stringify(msg.payload.data)}
                
                Perform root cause analysis:
                1. When did the anomaly start?
                2. What other parameters changed around the same time?
                3. What are the most likely causes?
                4. What should be checked first?
                5. Is this a sensor issue or real environmental change?`;
  return msg;
  ↓
[Datacake AI]
  Model: gpt-5-mini
  Code Interpreter: ✅ Enabled
  ↓
[Email Alert with Analysis]
```

***

### Advanced Features

#### Conversational Context

Use `msg.previousResponseId` to maintain context across multiple AI interactions:

```javascript
// First message
msg.prompt = "Analyze this data";
msg.payload = sensorData;

// AI responds with msg.responseId; store it after the Datacake AI node
flow.set('lastResponseId', msg.responseId);

// Follow-up message (maintains context)
msg.prompt = "Now create a summary table";
msg.previousResponseId = flow.get('lastResponseId');
// AI remembers the previous analysis
```

**Benefits:**

* More efficient (context tokens are cached and cheaper)
* AI understands references to previous analysis
* Can build complex analysis step-by-step
* Natural conversation flow

#### Dynamic Model Selection

Override the configured model based on task complexity:

```javascript
// Use fast model for simple tasks
if (msg.taskType === 'simple') {
  msg.model = 'gpt-5-nano';
} else if (msg.taskType === 'complex') {
  msg.model = 'gpt-5';
} else {
  msg.model = 'gpt-5-mini'; // default
}
```

#### Error Handling

Always include error handling for AI nodes:

```
[Datacake AI]
  ↓
[Catch Node]
  ↓
[Function: Log error and notify]
  node.error(msg.error);
  msg.payload = {
    error: true,
    message: "AI analysis failed",
    details: msg.error
  };
  return msg;
```

***

### Cost Management

#### Understanding Costs

The node automatically calculates costs for every request:

**Token Types:**

* **Input tokens:** Your prompt + data
* **Cached tokens:** Reused context (90% discount)
* **Output tokens:** AI response

**Cost Example:**

```javascript
{
  inputTokens: 1234,      // 1,234 tokens @ $0.25/1M
  cachedTokens: 500,      // 500 tokens @ $0.025/1M
  outputTokens: 2000,     // 2,000 tokens @ $2.00/1M
  totalCost: 0.004321     // $0.00432 total (input + cached + output)
}
```

#### Cost Optimization Tips

1. **Choose the right model:**
   * Use gpt-5-nano for simple tasks ($0.40/1M output)
   * Use gpt-5-mini for most tasks ($2.00/1M output)
   * Reserve gpt-5 for complex analysis ($10.00/1M output)
2. **Optimize prompts:**
   * Be specific but concise
   * Avoid sending unnecessary data
   * Use conversational context to reduce repeated information
3. **Control output length:**
   * Set appropriate Max Output Tokens
   * Request concise outputs when appropriate
   * Use bullet points instead of paragraphs
4. **Use context efficiently:**
   * Pass `previousResponseId` for follow-up questions
   * Cached tokens cost 90% less
   * Build on previous analysis instead of repeating
5. **Monitor usage:**
   * Check `msg.openai.cost` after each request
   * Log costs to track spending over time
   * Set up alerts for high-cost requests
   * View usage at [OpenAI Platform](https://platform.openai.com/usage)
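
The monitoring tips above could be sketched as a small tracker that keeps a running total and flags expensive requests. The helper name `trackCost`, the context key `aiCostTotal`, and the added `msg.costAlert` and `msg.costRunningTotal` properties are assumptions for illustration; a `Map` stands in for Node-RED flow context (`flow.get` / `flow.set`) so the sketch runs on its own.

```javascript
// Hypothetical cost tracker; `store` stands in for Node-RED flow context.
function trackCost(msg, store, alertThresholdUsd) {
  const cost = (msg.openai && msg.openai.cost && msg.openai.cost.totalCost) || 0;
  const runningTotal = (store.get("aiCostTotal") || 0) + cost;
  store.set("aiCostTotal", runningTotal);
  msg.costAlert = cost > alertThresholdUsd; // flags a single expensive request
  msg.costRunningTotal = runningTotal;      // cumulative spend since the store was created
  return msg;
}

const store = new Map();
const tracked = trackCost({ openai: { cost: { totalCost: 0.05 } } }, store, 0.02);
console.log(tracked.costAlert, tracked.costRunningTotal);
```

A Switch node on `msg.costAlert` could then route flagged messages to a notification.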

#### Cost Estimation

The node shows real-time cost estimates in the configuration UI based on your Max Output Tokens setting.

**Typical Costs:**

* Simple analysis: $0.001 - $0.005
* Medium report: $0.005 - $0.020
* Complex analysis: $0.020 - $0.100
* With web search: +$0.005 - $0.020

***

### Best Practices

#### Prompts

1. **Be specific:** "Analyze temperature data and identify anomalies above 30°C" vs "Analyze data"
2. **Request structure:** "Provide output as JSON with fields: issue, severity, recommendation"
3. **Set expectations:** "Keep response under 200 words" or "Provide detailed analysis"
4. **Include context:** Explain what the data represents and what you need
5. **Request calculations:** "Calculate actual statistics" when Code Interpreter is enabled
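
When a prompt requests JSON output, the response in `msg.payload` still arrives as text and may include surrounding prose. The sketch below shows one tolerant way to extract the object; the helper name `parseJsonResponse` and the sample reply are hypothetical, and the approach only handles a single top-level JSON object.

```javascript
// Hypothetical parser: extracts the first top-level JSON object from a text reply.
function parseJsonResponse(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "No JSON object found", raw: text };
  }
  try {
    return { ok: true, data: JSON.parse(text.slice(start, end + 1)) };
  } catch (err) {
    return { ok: false, error: err.message, raw: text }; // keep raw text for debugging
  }
}

const reply = 'Here is the result:\n' +
  '{"issue":"Sensor drift","severity":"high","recommendation":"Recalibrate"}';
const parsed = parseJsonResponse(reply);
console.log(parsed.ok, parsed.data.severity);
// true high
```

Returning an `ok` flag instead of throwing lets a downstream Switch node route malformed responses to fallback logic.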

#### Data Handling

1. **Format data properly:** CSV or JSON for structured data
2. **Limit data size:** Max \~100KB for optimal performance
3. **Preprocess when needed:** Filter/aggregate large datasets before sending
4. **Include headers:** CSV headers or JSON keys are important for understanding
5. **Add metadata:** Include timestamps, units, device names
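
As an illustration of preprocessing before sending, the sketch below averages raw readings per hour, emits CSV with a header row that names the unit, and checks the result against the ~100KB guideline. The reading shape (`time`, `value`), the column names, and the helper names are assumptions, not part of the node's API.

```javascript
// Hypothetical preprocessing: average readings per UTC hour.
function aggregateHourly(readings) {
  const buckets = new Map();
  for (const { time, value } of readings) {
    const hour = new Date(time).toISOString().slice(0, 13) + ":00Z"; // e.g. 2025-02-15T14:00Z
    const bucket = buckets.get(hour) || { sum: 0, count: 0 };
    bucket.sum += value;
    bucket.count += 1;
    buckets.set(hour, bucket);
  }
  return [...buckets].map(([hour, b]) => ({ hour, mean: b.sum / b.count, samples: b.count }));
}

// Serialize with a header row so the model knows what each column means.
function toCsv(rows) {
  const lines = rows.map((r) => `${r.hour},${r.mean.toFixed(2)},${r.samples}`);
  return ["hour,mean_temperature_c,samples", ...lines].join("\n");
}

const readings = [
  { time: "2025-02-15T14:05:00Z", value: 22.0 },
  { time: "2025-02-15T14:35:00Z", value: 23.0 },
  { time: "2025-02-15T15:10:00Z", value: 24.5 }
];
const csv = toCsv(aggregateHourly(readings));
const withinLimit = Buffer.byteLength(csv, "utf8") <= 100 * 1024; // ~100KB guideline
console.log(csv);
console.log(withinLimit);
```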

#### Performance

1. **Typical response times:**
   * Simple queries: 5-15 seconds
   * With Code Interpreter: 10-30 seconds
   * With Web Search: 15-60 seconds
   * Complex analysis: 30-90 seconds
2. **Timeout:** 5 minutes (300 seconds)
3. **Optimization tips:**
   * Use simpler models for faster responses
   * Reduce output token limit for quicker replies
   * Avoid web search when not needed

#### Error Handling

1. **Always use Catch nodes** for AI nodes
2. **Check for rate limits** and implement backoff
3. **Validate inputs** before sending to AI
4. **Log errors with context** for debugging
5. **Provide fallback logic** for critical flows
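
A rough sketch of the backoff idea: a Function node after the Catch node could compute an exponentially growing delay and cap the number of retries. The helper name `planRetry`, the `msg.retryCount` property, and the default values are assumptions; setting `msg.delay` assumes a downstream Delay node configured to accept a per-message delay.

```javascript
// Hypothetical retry planner: exponential backoff with a retry cap.
// Returns the message with a delay in milliseconds, or null to stop retrying.
function planRetry(msg, { baseMs = 2000, maxRetries = 4 } = {}) {
  const attempt = (msg.retryCount || 0) + 1;
  if (attempt > maxRetries) {
    return null; // give up; route to fallback logic instead
  }
  msg.retryCount = attempt;
  msg.delay = baseMs * 2 ** (attempt - 1); // 2s, 4s, 8s, 16s with the defaults
  return msg;
}

const first = planRetry({ prompt: "Analyze this data" });
const third = planRetry({ prompt: "Analyze this data", retryCount: 2 });
const exhausted = planRetry({ retryCount: 4 });
console.log(first.delay, third.delay, exhausted);
// 2000 8000 null
```

Returning `null` mirrors how a Function node drops a message, which ends the retry loop once the cap is reached.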

***

### Troubleshooting

#### Common Issues

**"Missing API key" error**

* Ensure the Datacake AI Config node has a valid OpenAI API key
* Verify the key starts with `sk-`
* Check that the key is still valid on the OpenAI Platform

**"Rate limit exceeded" error**

* You've exceeded OpenAI API rate limits
* Wait and retry
* Consider upgrading your OpenAI plan
* Implement request throttling in your flows

**"Quota exceeded" error**

* Your OpenAI account has insufficient credits
* Add credits at [OpenAI Platform](https://platform.openai.com/account/billing)
* Check your usage limits

**Request timeout**

* The query is too complex or the dataset is too large
* Try a smaller dataset
* Reduce Max Output Tokens
* Simplify the prompt
* Use a faster model (gpt-5-nano)

**Unexpected response format**

* Check `msg.openai.fullResponse` for debugging
* AI response format can vary
* Request specific format in prompt
* Add response format examples in prompt

**Cost higher than expected**

* Check `msg.openai.cost` to see actual costs
* Large data inputs increase input tokens
* Long responses increase output tokens
* Web search adds cost
* Code execution adds compute time

***

### Resources

* **OpenAI API Documentation:** [platform.openai.com/docs](https://platform.openai.com/docs)
* **Responses API Guide:** [platform.openai.com/docs/guides/responses](https://platform.openai.com/docs/guides/responses)
* **Pricing:** [openai.com/api/pricing](https://openai.com/api/pricing/)
* **Usage Dashboard:** [platform.openai.com/usage](https://platform.openai.com/usage)
* **OpenAI Status:** [status.openai.com](https://status.openai.com)

***

### Pricing Reference

| Model      | Input (per 1M tokens) | Cached Input (per 1M tokens) | Output (per 1M tokens) |
| ---------- | --------------------- | ---------------------------- | ---------------------- |
| gpt-5      | $1.25                 | $0.125                       | $10.00                 |
| gpt-5-mini | $0.25                 | $0.025                       | $2.00                  |
| gpt-5-nano | $0.05                 | $0.005                       | $0.40                  |

**Note:** Cached tokens (from context reuse) cost 90% less than regular input tokens.
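
To make the arithmetic concrete, the sketch below estimates a request's cost from the prices in the table. The function name `estimateCost` is hypothetical, and it assumes cached tokens are counted separately from regular input tokens; the node's own calculation may differ.

```javascript
// Prices in USD per 1M tokens, copied from the table above.
const PRICING = {
  "gpt-5": { input: 1.25, cached: 0.125, output: 10.0 },
  "gpt-5-mini": { input: 0.25, cached: 0.025, output: 2.0 },
  "gpt-5-nano": { input: 0.05, cached: 0.005, output: 0.4 }
};

// Hypothetical estimator; assumes cached tokens are not included in inputTokens.
function estimateCost(model, { inputTokens = 0, cachedTokens = 0, outputTokens = 0 }) {
  const price = PRICING[model];
  if (!price) {
    throw new Error(`Unknown model: ${model}`);
  }
  const perMillion =
    inputTokens * price.input +
    cachedTokens * price.cached +
    outputTokens * price.output;
  return perMillion / 1e6; // convert from "per 1M tokens" to USD
}

const estimate = estimateCost("gpt-5-mini", { inputTokens: 1234, outputTokens: 567 });
console.log(estimate); // roughly 0.00144 USD
```

Passing only `outputTokens` set to the Max Output Tokens value gives an upper bound on the output cost of a single request.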

***

### Example Prompts

#### Data Analysis

```
Analyze the provided sensor data and create a comprehensive report including:
1. Statistical overview (mean, median, std dev, min, max)
2. Time series trends and patterns
3. Anomaly detection with severity ratings
4. Correlation analysis between parameters
5. Key insights and findings
6. Actionable recommendations

Execute Python code to calculate real statistics.
```

#### Report Generation

```
Create a professional markdown report from this IoT data:

## Required Sections:
- Executive Summary (3-5 bullet points)
- Data Quality Assessment
- Key Metrics and Statistics (in a table)
- Trend Analysis with visualization descriptions
- Anomalies and Alerts
- Recommendations (prioritized)
- Conclusion

Output ONLY the markdown report with no explanations.
Use actual calculated values, not examples.
```

#### Predictive Maintenance

```
Analyze this industrial equipment sensor data for predictive maintenance:

1. Identify normal operating ranges for each parameter
2. Detect deviations and degradation patterns
3. Assess failure risk (Low/Medium/High)
4. Estimate remaining useful life
5. Recommend maintenance timing and type
6. Prioritize actions by urgency and impact

Provide confidence levels for each prediction.
```

#### Optimization Recommendations

```
Analyze this energy consumption data and provide optimization recommendations:

1. Identify consumption patterns (daily, weekly, seasonal)
2. Detect inefficiencies and waste
3. Compare to industry benchmarks (use web search if enabled)
4. Calculate potential savings for each recommendation
5. Estimate ROI and payback period
6. Prioritize actions by impact and feasibility

Focus on actionable, specific recommendations.
```

