anthropic/claude-opus-4-5-20251101@thinking=50000

Pass Rate: 76.8%
Tasks Passed: 43/56
Runs: 3
pass@1: 75.0%
pass@3: 76.8%
Consistency: 96.4%
Temperature: 0.1
Thinking: 50,000
Tokens: 553,648
Cost: $9.26
1st: 105, 2nd: 21, Failed: 13, 43/56 passed

Known Shortcomings (8)

Sorted by occurrence count (most frequent first)

# Concept AL Concept Count Affected Tasks
1 page-extension-with-table-extension page-extension-and-table-extension 1 CG-AL-E006

Description: The model failed to generate any code at all ('Generated code not found'). The task required creating both a table extension (to add the new fields 'Preferred Contact Method', 'Customer Notes', and 'VIP Customer' to the Customer table) and a page extension (to add those fields to the Customer Card page's General group). The tests reference these fields both on the Customer record and on the TestPage 'Customer Card', meaning the model needed to produce a tableextension object extending table 18 (Customer) with the three new fields, plus a pageextension object (ID 70000) extending page 21 (Customer Card) adding those fields to the General group. The model produced no output, resulting in compilation errors because the fields don't exist on either the table or the page.

Correct Pattern:
tableextension 70000 "Customer Table Extension" extends Customer
{
    fields
    {
        field(50000; "Preferred Contact Method"; Option)
        {
            Caption = 'Preferred Contact Method';
            OptionMembers = Email,Phone,Mail,SMS;
            OptionCaption = 'Email,Phone,Mail,SMS';
        }
        field(50001; "Customer Notes"; Text[500])
        {
            Caption = 'Customer Notes';
        }
        field(50002; "VIP Customer"; Boolean)
        {
            Caption = 'VIP Customer';
        }
    }
}

pageextension 70000 "Customer Card Extension" extends "Customer Card"
{
    layout
    {
        addlast(General)
        {
            field("Preferred Contact Method"; Rec."Preferred Contact Method")
            {
                ApplicationArea = All;
                Caption = 'Preferred Contact Method';
            }
            field("Customer Notes"; Rec."Customer Notes")
            {
                ApplicationArea = All;
                Caption = 'Customer Notes';
            }
            field("VIP Customer"; Rec."VIP Customer")
            {
                ApplicationArea = All;
                Caption = 'VIP Customer';
            }
        }
    }
}
Incorrect Pattern:
// Generated code not found

Error Codes: AL0132

2 reserved-keyword-as-parameter-name al-reserved-keywords 1 CG-AL-H014

Description: The model used 'Key' as a parameter name in the SafeGetText procedure. In AL, 'key' is a reserved keyword and cannot be used as an unquoted identifier, which is why compilation fails with AL0105 at line 52: the compiler parses 'key' as a keyword rather than an identifier. The task definition itself specifies the parameter name 'Key', so the model needed to recognize the conflict and either escape the name with double quotes or choose a non-reserved alternative (e.g., 'KeyName', 'PropertyKey').

Correct Pattern:
procedure SafeGetText(Obj: JsonObject; "Key": Text; DefaultValue: Text): Text
// or use a non-reserved name such as KeyName:
procedure SafeGetText(Obj: JsonObject; KeyName: Text; DefaultValue: Text): Text
Incorrect Pattern:
procedure SafeGetText(Obj: JsonObject; Key: Text; DefaultValue: Text): Text

Error Codes: AL0105

3 dictionary-iteration-syntax dictionary-foreach-keys 1 CG-AL-H020

Description: The generated code uses 'Key' incorrectly on lines 59 and 102. The model likely wrote a pattern such as 'foreach Key in Dict' or used 'Key' as an undeclared or reserved identifier, causing AL0519 ('Key' is not valid value in this context) and subsequent syntax errors. In AL, a Dictionary is not iterated directly: declare a variable for the key, call Dict.Keys() to get a List of the keys, and iterate over that list with foreach.

Correct Pattern:
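// ProcessKeys and the type of Dict are assumptions for illustration; match them to the task's dictionary
procedure ProcessKeys(Dict: Dictionary of [Text, Text])
var
    KeyList: List of [Text];
    KeyValue: Text;
begin
    KeyList := Dict.Keys();
    foreach KeyValue in KeyList do begin
        // process KeyValue, e.g. read its entry with Dict.Get(KeyValue)
    end;
end;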
Incorrect Pattern:
Key (used as invalid identifier/keyword at lines 59 and 102)

Error Codes: AL0519

4 empty-or-malformed-code-generation interface-definition 1 CG-AL-M009

Description: The model failed to generate any valid AL code. The compilation errors (AL0198 at line 1:1 expecting application object keywords, unexpected backtick characters at line 5, improperly terminated text literals) indicate the model either produced empty output, markdown-wrapped output (backticks), or completely malformed content instead of proper AL interface and codeunit definitions. The task required creating an interface 'Shipping Provider' and an implementing codeunit 'Standard Shipping Provider' (ID 70004), plus the test file references a mock codeunit 'CG-AL-M009 Mock Shipping' that also needed to be generated. The model produced no usable AL code at all.

Correct Pattern:
interface "Shipping Provider"
{
    procedure CalculateShippingCost(Weight: Decimal; FromCountry: Text; ToCountry: Text): Decimal;
    procedure EstimateDeliveryTime(FromCountry: Text; ToCountry: Text; ServiceType: Text): Integer;
    procedure CreateShipment(OrderNumber: Text; FromAddress: Text; ToAddress: Text; Weight: Decimal): Text[50];
    procedure TrackShipment(TrackingNumber: Text): Text[100];
    procedure ValidateAddress(Street: Text; City: Text; State: Text; ZipCode: Text; Country: Text): Boolean;
}

codeunit 70004 "Standard Shipping Provider" implements "Shipping Provider"
{
    // ... implementation of all interface procedures
}

codeunit 70005 "CG-AL-M009 Mock Shipping" implements "Shipping Provider"
{
    // ... mock implementation for tests
}
Incorrect Pattern:
// Generated code not found (appears to contain markdown backticks and no valid AL objects)

Error Codes: AL0198

5 temporary-table-parameter-handling temporary-table 1 CG-AL-H003

Description: The test TestHighInventoryDiscount fails because the generated code does not populate the TempResult temporary record with the items that match the criteria. The test finds an Item with Inventory >= 100 and Unit Price > 0, calls ProcessItemsWithDiscount with MinDiscount=15, and expects that item to appear in TempResult. The assertion 'High inventory item should be in results' fails, so the generated procedure gets at least one of the following wrong: looping through items, calculating the discount from the inventory thresholds, inserting records into the temporary table parameter, or assigning line numbers/keys. The generated code was not captured ('Generated code not found' in the prompt), but the app compiled and published successfully on attempt 2, so this is a logic error rather than a syntax error. It points to a model knowledge gap in implementing temporary table processing with correct item filtering and discount calculation in AL.

Correct Pattern:
procedure ProcessItemsWithDiscount(var TempResult: Record "CG Discount Result" temporary; MinDiscount: Decimal)
var
    Item: Record Item;
    LineNo: Integer;
    DiscountPct: Decimal;
begin
    TempResult.Reset();
    TempResult.DeleteAll();
    LineNo := 0;

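    Item.SetFilter("Unit Price", '>0');
    // Inventory is a FlowField on the standard Item table, so it must be calculated before it is read
    Item.SetAutoCalcFields(Inventory);
    if Item.FindSet() then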
        repeat
            if Item.Inventory >= 100 then
                DiscountPct := 15
            else if Item.Inventory >= 50 then
                DiscountPct := 10
            else if Item.Inventory >= 10 then
                DiscountPct := 5
            else
                DiscountPct := 0;

            if DiscountPct >= MinDiscount then begin
                LineNo += 1;
                TempResult.Init();
                TempResult."Line No." := LineNo;
                TempResult."Item No." := Item."No.";
                TempResult."Original Price" := Item."Unit Price";
                TempResult."Discount Percent" := DiscountPct;
                TempResult."Final Price" := Round(Item."Unit Price" * (1 - DiscountPct / 100), 0.01);
                TempResult.Insert();
            end;
        until Item.Next() = 0;
end;
Incorrect Pattern:
// Generated code compiled but was not captured in the failure report. The codeunit published successfully but TestHighInventoryDiscount failed with: Assert.IsTrue failed. High inventory item should be in results

6 multiline-string-literals al-string-syntax 1 CG-AL-E050

Description: The model failed to generate valid AL code. The compilation errors indicate the model likely used backtick-based multiline strings (like JavaScript template literals) or other non-AL syntax for multiline strings. AL does not support backtick-delimited strings. In AL, multiline text can be achieved through string concatenation with newline characters (e.g., using carriage return/line feed characters), or in newer AL versions (runtime 12.0+) through verbatim string literals. The model produced code with unexpected '`' characters (AL0183 errors) and improperly terminated text literals (AL0360), indicating it confused AL string syntax with another language's multiline string syntax.

Correct Pattern:
In AL, multiline strings should use string concatenation with CR/LF characters, e.g.:

procedure GetSqlQuery(): Text
var
    CrLf: Text[2];
    Result: Text;
begin
    CrLf[1] := 13;
    CrLf[2] := 10;
    Result := 'SELECT CustomerNo, Name, Balance' + CrLf + 'FROM Customer' + CrLf + 'WHERE Active = true' + CrLf + 'ORDER BY Name';
    exit(Result);
end;

For GetEmailBody, use StrSubstNo or Replace to insert the customer name parameter into the template.
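
The StrSubstNo approach mentioned above could look roughly like the following. This is a minimal sketch, not the task's reference solution: the parameter name CustomerName and the template wording are assumptions, since the task text for GetEmailBody is not reproduced in this report.

```al
procedure GetEmailBody(CustomerName: Text): Text
var
    CrLf: Text[2];
    Template: Text;
begin
    // Build a CR/LF pair, as in GetSqlQuery
    CrLf[1] := 13;
    CrLf[2] := 10;

    // Hypothetical template text; %1 is the StrSubstNo placeholder for the customer name
    Template := 'Dear %1,' + CrLf + CrLf + 'Thank you for your order.' + CrLf + 'Best regards';
    exit(StrSubstNo(Template, CustomerName));
end;
```

Both the line breaks and the placeholder use only ordinary single-quoted text literals, so no backtick or other non-AL string syntax is needed.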
Incorrect Pattern:
Code contained backtick characters (```) and improperly terminated text literals, suggesting template literal syntax from JavaScript or markdown code fences were included in the output

Error Codes: AL0183

7 complex-report-with-helper-codeunit-structure report-definition-and-codeunit-syntax 1 CG-AL-M007

Description: The model generated AL code for the report and mock calculator codeunit but produced a syntax error at line 436, indicating malformed code structure. The error 'end expected' and 'Syntax error' at the same position suggest the model failed to properly close begin/end blocks, procedure definitions, or object definitions in the generated code. This is a complex task requiring both a Report 70001 'Sales Performance Analysis' and a Codeunit 'CG-AL-M007 Mock Calculator' with multiple procedures (Initialize, AddSalesLine, GetRunningTotalByCustomer, GetRunningTotalByRegion, CalculateAverageOrderValue, GetCustomerRank, GetTopProduct, GetProductSalesQuantity, CalculateYoYComparison, CalculateOrderFrequency, GetTotalSales, GetCustomerCount). The model failed to produce syntactically valid AL code, likely mismatching begin/end blocks or improperly structuring the codeunit with temporary table-based data storage and dictionary-like accumulation patterns.

Correct Pattern:
The generated code should include:
1) Report 70001 'Sales Performance Analysis' with proper dataitem hierarchy (Customer > Sales Header > Sales Line), request page with date filters, and proper trigger implementations.
2) Codeunit 'CG-AL-M007 Mock Calculator' with proper temporary record storage or List/Dictionary variables to track sales lines, with all required procedures properly structured with matching begin/end blocks.
All objects must have properly closed blocks ending with the closing curly brace '}'.
Incorrect Pattern:
// Code at line 436 has syntax error - likely unclosed begin/end block or misplaced keyword in the generated report or codeunit definition

Error Codes: AL0104

8 http-client-error-handling-in-tests httpclient-usage 1 CG-AL-M005

Description: The model generated code that actually attempts HTTP requests with HttpClient in SendPaymentRequest and HandlePaymentWebhook, but does not handle the case where no valid URL is configured. The error 'Failed to create the URL for the request. Please, make sure to set a valid URL.' indicates that the implementation calls HttpClient.Send() with an empty or invalid URL. AL has no try-catch statement, so in a test/sandbox environment the codeunit should either guard the HTTP call (a [TryFunction] wrapper or a check of the call's Boolean return value) or return false when the URL is not configured instead of raising an unhandled error. The model did not implement this error handling around the HttpClient operations, which is essential for testability without actual external services.

Correct Pattern:
procedure SendPaymentRequest(OrderId: Text; Amount: Decimal; Currency: Text; var ResponseJson: JsonObject): Boolean
var
    Client: HttpClient;
    RequestMessage: HttpRequestMessage;
    ResponseMessage: HttpResponseMessage;
begin
    // BaseUrl is assumed to be a codeunit-level Text variable holding the configured endpoint
    if BaseUrl = '' then
        exit(false);

    // Validate inputs and build the request (method, URI, content) here

    // When the return value of Send is checked, a failed call returns false instead of raising an error;
    // alternatively, wrap the call in a [TryFunction]
    if not Client.Send(RequestMessage, ResponseMessage) then
        exit(false);

    exit(true);
end;
Incorrect Pattern:
// Line 61 of SendPaymentRequest attempts HttpClient.Send() with an unconfigured/empty URL, causing runtime error
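
The [TryFunction] alternative mentioned in the description can be sketched as follows. This is an illustration under stated assumptions rather than the expected solution: the codeunit ID and name, the SendSafely and TrySend procedure names, and the use of POST are all hypothetical.

```al
codeunit 50100 "Safe Http Sketch"
{
    // Hypothetical helper: returns false instead of raising a runtime error
    procedure SendSafely(Url: Text; var ResponseMessage: HttpResponseMessage): Boolean
    var
        Client: HttpClient;
        RequestMessage: HttpRequestMessage;
    begin
        // Guard against an unconfigured endpoint before touching HttpClient
        if Url = '' then
            exit(false);

        RequestMessage.Method := 'POST';
        // Checking the return value keeps a malformed URL from raising an error
        if not RequestMessage.SetRequestUri(Url) then
            exit(false);

        if not TrySend(Client, RequestMessage, ResponseMessage) then
            exit(false);

        exit(ResponseMessage.IsSuccessStatusCode());
    end;

    [TryFunction]
    local procedure TrySend(var Client: HttpClient; var RequestMessage: HttpRequestMessage; var ResponseMessage: HttpResponseMessage)
    begin
        // The return value of Send is deliberately not consumed, so a failure raises
        // an error that the try function turns into a false result for the caller
        Client.Send(RequestMessage, ResponseMessage);
    end;
}
```

With either guard in place, a test that runs without a configured URL gets a false return value from the procedure rather than an unhandled runtime error.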