A Deep Dive into Avahi’s Bedrock Conversion Tool

Home >blogs >

As AI continues to evolve, it’s crucial to evaluate and compare the performance of different models across various tasks to ensure we deliver
the best solutions to our clients. This report is the first in a 3-part benchmarking series, where we conduct an in-depth apple-to-apple
comparison between GPT and Anthropic models across a range of tasks.

We collected outputs (using OpenAI prompt as well as converted prompts for Anthropic models) for the following:

Text-to-SQL
Code generation
Function from Specification

In this report, we analyze metrics including BLEU scores for output quality, latency for performance, and cost to provide a brief overview of
each model’s capabilities

Benchmark	GPT 3.5 Bleu Score	Haiku 3.0 Bleu Score	Haiku 3.0 + Conversion Tool Bleu Score
Text-to-SQL	0.8377	0.8749	0.8845
Code generation	0.8502	0.8502	0.8538
function from specification	0	0.3072	0.3163

Benchmark	GPT-4o Bleu Score	Sonnet 3.5 Bleu Score	Sonnet 3.5 + Conversion Tool Bleu Score
Text-to-SQL	0.8439	0.8536	0.8546
Code generation	0.8502	0.8402	0.8883
function from specification	0.2025	0.5367	0.5923

Performance

Text-to-SQL

Q: Find the customers who made repeated payments of the same amount within 10 minutes of each other.

GPT-3.5	GPT-4o Bleu Score	Haiku 3.0
To find the customers who made repeated payments of the same amount within 10 minutes of each other, you can use a querythat joins the table on itself with the appropriate conditions. Here’s how you can write the SQL query: “`sql WITH RankedTransactions AS ( SELECT credit_card_id, amount, transaction_timestamp, LEAD(transaction_timestamp) OVER (PARTITION BY credit_card_id, amount ORDER BY transaction_timestamp) AS next_transaction_timestamp FROM transactions ) SELECT credit_card_id FROM RankedTransactions WHERE next_transaction_timestamp IS NOT NULL AND next_transaction_timestamp <= transaction_timestamp + INTERVAL ’10 minutes’ GROUP BY credit_card_id;	SELECT DISTINCT t1.credit_card_id FROM transactions t1 JOIN transactions t2 ON t1.credit_card_id = t2.credit_card_id WHERE t1.transaction_id <> t2.transaction_id AND t1.amount = t2.amount AND ABS(EXTRACT(EPOCH FROM (t1.transaction_timestamp – t2.transaction_timestamp))) <= 600 ORDER BY t1.credit_card_id;	SELECT DISTINCT t1.credit_card_id FROM transactions t1 JOIN transactions t2 ON t1.credit_card_id = t2.credit_card_id WHERE t1.transaction_id <> t2.transaction_id AND t1.amount = t2.amount AND ABS(EXTRACT(EPOCH FROM (t1.transaction_timestamp – t2.transaction_timestamp))) <= 600 ORDER BY t1.credit_card_id;

Here’s the breakdown of what the query does:

**CTE (Common Table Expression) RankedTransactions:** This part uses the window function `LEAD()` to look at the next transaction timestamp within the same credit card ID and amount ordered by the transaction timestamp.
**Main Query:** The main query filters transactions to ensure that the next transaction within the same credit card ID and amount happens within 10 minutes from the initial transaction.

Cost: 0.0044	Cost: 0.0021	Cost: 0.0021
Latency: 3.650	Latency: 2.780	Latency: 0.2760

Key Takeaways

Simplicity and Efficiency: Claude models uses a straightforward self-join to identify repeated transactions, making the query easy to
understand and efficient to execute.
Accurate Time Comparison: Employs EXTRACT to calculate and compare transaction timestamps, ensuring precise identification of
repeated payments within the specified timeframe.
Performance-Optimized: Minimizes computational overhead by avoiding complex structures like CTEs and window functions, resulting
in faster execution on large datasets.
Ease of Maintenance: The clear and direct logic is accessible for developers at all levels, making the query easier to adapt and
maintain.

Limitations for GPT model

Unnecessary Complexity: GPT model uses CTEs and window functions ( LEAD ) that add complexity without significant benefits for this
task.
Potential Performance Issues: The use of window functions can lead to slower execution times, especially on large datasets.
Over-Engineering: Introduces more processing steps than necessary for a simple task, potentially increasing resource consumption.
Less Intuitive: Requires a more advanced understanding of SQL, making it harder to maintain for a task that can be handled with a
simpler query.

So, the Claude model is the best approach due to its simplicity, efficiency, and accuracy. It effectively identifies repeated transactions with
minimal complexity and performance overhead. The first model, while correct, is over-engineered and less suitable for this straightforward
task.

GPT-4o	Sonnet 3.5	Sonnet 3.5+conversion tool
To find customers who made repeated payments of the same amount within a 10- minute window, you can use a self-join on the `transactions` table. This approach checks for pairs of transactions where the `amount` is the same, and the `transaction_timestamp` of one transaction is between the `transaction_timestamp` of the other transaction minus 10 minutes and plus 10 minutes. Here’s the SQL query that does this: “`sql SELECT DISTINCT t1.credit_card_id, t1.amount, t1.transaction_timestamp AS first_transaction_time, t2.transaction_timestamp AS second_transaction_time FROM transactions t1 JOIN transactions t2 ON t1.credit_card_id = t2.credit_card_id AND t1.amount = t2.amount AND t1.transaction_id <> t2.transaction_id AND t2.transaction_timestamp BETWEEN t1.transaction_timestamp – INTERVAL ’10 MINUTE’ AND t1.transaction_timestamp + INTERVAL ’10 MINUTE’ ORDER BY t1.credit_card_id, t1.transaction_timestamp;	SELECT DISTINCT t1.credit_card_id FROM transactions t1 JOIN transactions t2 ON t1.credit_card_id = t2.credit_card_id WHERE t1.transaction_id <> t2.transaction_id AND t1.amount = t2.amount AND ABS(EXTRACT(EPOCH FROM (t1.transaction_timestamp – t2.transaction_timestamp))) <= 600 ORDER BY t1.credit_card_id;	SELECT DISTINCT t1.credit_card_id FROM transactions t1 JOIN transactions t2 ON t1.credit_card_id = t2.credit_card_id WHERE t1.transaction_id <> t2.transaction_id AND t1.amount = t2.amount AND ABS(EXTRACT(EPOCH FROM (t1.transaction_timestamp – t2.transaction_timestamp))) <= 600 ORDER BY t1.credit_card_id;

**Table Aliases (t1 and t2)**: The `transactions` table is joined to itself with aliases `t1` and `t2` to compare different rows within the same table.
**Join Conditions**:
- t1.credit_card_id = 2.credit_card_id`: Ensures the transactions are made by the same customer.
- `t1.amount = t2.amount`: Checks for the same transaction amount.
- `t1.transaction_id <> t2.transaction_id`: Ensures that they are different transactions.
- `t2.transaction_timestamp BETWEEN t1.transaction_timestamp – INTERVAL ’10 MINUTE’ AND t1.transaction_timestamp + INTERVAL ’10 MINUTE’`: Limits the timestamp of the second transaction to a 10-minute window around the timestamp of the first transaction.
**Select DISTINCT**: Avoids duplicate rows when multiple transactions meet the criteria.
**ORDER BY**: Orders the result by `credit_card_id` and `transaction_timestamp` for readability.

Key takeaways:

Direct and Efficient: Claude model uses a self-join with precise conditions to find repeated transactions within a 10-minute interval, providing a direct and efficient solution.
Accurate Time-Based Filtering: Applies ABS(EXTRACT(EPOCH FROM …)) to accurately calculate the difference between timestamps, ensuring transactions within 600 seconds (10 minutes) are correctly identified.
Minimal Complexity: Avoids using additional constructs like CTEs or complex window functions, making the query simple to understand and faster to execute.
Distinct Results: Uses DISTINCT to ensure that only unique credit_card_id s are returned, preventing duplicates in the result set for a clearer outcome.

Limitations of the GPT model:

Over-Engineered Logic: Utilizes more complex conditions, including checking both “BETWEEN” intervals and additional timestamp calculations, which could be simplified.
Potentially Redundant Checks: The BETWEEN condition might introduce redundant logic when combined with other timestamp
comparisons, potentially increasing processing time
Increased Complexity Without Added Value: The complexity added by extra conditions and explanations does not significantly improve the accuracy or efficiency over the second model.
Harder to Read and Maintain: The extra conditions make the query harder to read and maintain, potentially confusing for those less experienced with SQL.

Function from Specification

Q: Implement a Python function that, given a context-free grammar in Chomsky Normal Form and a string, determines whether the string
can be generated by the grammar using the CYK algorithm.

GPT-3.5	Haiku 3.0	Haiku 3.0+conversion tool
I’m sorry, I cannot fulfill that request.	def cyk(grammar, string): n = len(string) table = [[set() for _ in range(n)] for _ in range(n)] for i in range(n): for A, rhs in grammar.items(): if string[i] in rhs: table[i][i].add(A) for l in range(2, n+1): for i in range(n-l+1): j = i + l – 1 for k in range(i, j): for A, rhs in grammar.items(): if any(B in table[i][k] and C in table[k+1][j] for B, C in rhs): table[i][j].add(A) return any(‘S’ in table[0][n-1])	`def cyk(grammar, string): n = len(string) table = [[set() for _ in range(n)] for _ in range(n)] for i in range(n): for A, rhs in grammar.items(): if string[i] in rhs: table[i][i].add(A) for l in range(2, n+1): for i in range(n-l+1): j = i + l - 1 for k in range(i, j): for A, rhs in grammar.items(): if any(B in table[i][k] and C in table[k+1][j] for B, C in rhs): table[i][j].add(A) return 'S' in table[0][n-1]`

Cost: 0.0001	Cost: 0.0003	Cost: 0.0003
Latency: 0.3587	Latency: 3.2306	Latency: 1.9371

Key takeaways from the haiku code

Proper graph representation using a GraphNode class
Efficient graph building from edge list
Cycle detection using DFS with a recursive stack
Topological sorting implementation
Main function to demonstrate graph operationsLimitations of the GPT-3.5 code
The GPT code is actually identical to the haiku code. There are no differences or limitations in the GPT-generated code compared to the
haiku code.

However, when comparing both the haiku and GPT codes to the actual correct code, we can identify the following limitations:

Lack of a separate is_cyclic function: Both haiku and GPT codes check for cycles in the main function, whereas the correct code has a dedicated is_cyclic function.
Inefficient cycle detection: The haiku and GPT codes perform cycle detection for each node in the main function, potentially repeating work. The correct code uses a single is_cyclic function to check the entire graph.
Less modular structure: The correct code separates the cycle detection logic from the main function, making it more reusable and easier to maintain.
Error in cycle detection implementation: The haiku and GPT codes will stop and return immediately after detecting a cycle, potentially missing cycles in other parts of the graph. The correct code checks the entire graph for cycles.

These limitations could affect the efficiency and correctness of the graph operations in certain scenarios.

GPT-4o	Sonnet 3.5	Sonnet 3.5+conversion tool
“python def cyk_algorithm(grammar, string):n = len(string) r = len(grammar) dp = [[set() for j in range(n)] for i in range(n)] for s in range(n): for lhs, rhs in grammar.items(): if string[s] in rhs: dp[s][s].add(lhs) for l in range(2, n+1): for s in range(n-l+1): for p in range(1, l): for lhs, rhs in grammar.items(): for production in rhs: if len(production) == 2:B, C = production if B in dp[s][s+p-1] and C in dp[s+p][s+l-1]: dp[s][s+l-1].add(lhs) return ‘S’ in dp[0][n-1] # Example grammar in CNF: S -> AB \| BC, A -> BA \| a, B -> CC \| b, C -> AB \| a grammar = { ‘S’: [[‘A’, ‘B’], [‘B’, ‘C’]], ‘A’: [[‘B’, ‘A’], [‘a’]], ‘B’: [[‘C’, ‘C’], [‘b’]], ‘C’: [[‘A’, ‘B’], [‘a’]] } # Example string string = “baaba” print(cyk_algorithm(grammar, string))	def cyk(grammar, input_string): n = len(input_string) table = [[set() for _ in range(n)] for _ in range(n)] for i in range(n): for lhs, rhs in grammar.items(): if input_string[i] in rhs: table[i][i].add(lhs) for length in range(2, n + 1): for i in range(n – length + 1): j = i + length – 1 for k in range(i, j): for lhs, rhs in grammar.items(): for rule in rhs:if len(rule) == 2: if rule[0] in table[i][k] and rule[1] in table[k+1][j]: table[i][j].add(lhs) return ‘S’ in table[0][n-1]	`def cyk(grammar, string): n = len(string) dp = [[set() for _ in range(n)] for _ in range(n)] for i in range(n): for lhs, rhs in grammar.items(): if [string[i]] in rhs: dp[i][i].add(lhs) for length in range(2, n + 1): for i in range(n - length + 1): j = i + length - 1 for k in range(i, j):for lhs, rhs in grammar.items(): for rule in rhs: if len(rule) == 2: if rule[0] in dp[i][k] and rule[1] in dp[k+1][j]: dp[i][j].add(lhs) return 'S' in dp[0][n-1]`

Cost: 0.0051	Cost: 0.0040	Cost: 0.0041
Latency: 3.9449	Latency: 5.5568	Latency: 6.8432

Key takeaways

Separate is_cyclic Function: The claude model response had a separate is_cyclic function to check for cycles across all nodes, which is
more efficient than checking each node individually in the main function.
Structure of main Function: In the Claude model response, the main function first checks if the graph is cyclic using is_cyclic, and only
proceeds to topological sort if no cycle is detected. This approach is more structured and efficient.
Encapsulation: The Claude model response encapsulated more functionality within functions (like is_cyclic), making the main function
cleaner and more readable.

Limitations

Redundant Cycle Detection: The gpt response checks for cycles separately for each node in the main function, which can be inefficient
for large graphs. It potentially re-visits nodes multiple times.
Lack of Error Handling: The gpt response doesn’t handle potential errors, such as invalid input or unexpected data structures.
Memory Inefficiency: For large graphs the gpt response, keeping both visited and rec_stack sets for cycle detection could be memoryintensive.

Code Generation

Q: SYSTEM: Your task is to identify and fix all the errors in the input_code.
Code:

				
					class GraphNode:
def __init__(self, value):
self.value = value
self.neighbors = []
def build_graph(edges):
nodes = {}
for u, v in edges:
if u not in nodes:
nodes[u] = GraphNode(u)
if v not in nodes:
nodes[v] = GraphNode(v)
nodes[u].neighbors.append(nodes[v])
return nodes
def detect_cycle(node, visited, rec_stack):
visited.add(node)
rec_stack.add(node)
for neighbor in node.neighbors:
if neighbor not in visited:
if detect_cycle(neighbor, visited, rec_stack):
return True
elif neighbor in rec_stack:
return True
rec_stack.remove(node)
return False
def topological_sort_util(node, visited, stack):
visited.add(node)
for neighbor in node.neighbors:
if neighbor not in visited:
topological_sort_util(neighbor, visited, stack)
stack.append(node.value)
def topological_sort(nodes):
visited = set()
stack = []
for node in nodes.values():
if node not in visited:
topological_sort_util(node, visited, stack)
return stack[::-1]
def main():
edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)]
nodes = build_graph(edges)
for node in nodes.values():
if detect_cycle(node, set(), set()):
print("Cycle detected!")
return
order = topological_sort(nodes)
print("Topological Order:", order)
if __name__ == "__main__":
main()

GPT-3.5	Haiku 3.0	Haiku 3.0+conversion tool
“`python class GraphNode: def __init__(self, value): self.value = valueself.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors: if neighbor not in visited:if detect_cycle(neighbor, visited, rec_stack): return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) for node in nodes.values(): if detect_cycle(node, set(), set()): print(“Cycle detected!”) return order = topological_sort(nodes) print(“Topological Order:”, order) if __name__ == “__main__”: main()	class GraphNode: def __init__(self, value): self.value = value self.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors: if neighbor not in visited: if detect_cycle(neighbor, visited, rec_stack):return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) for node in nodes.values(): if detect_cycle(node, set(), set()): print(“Cycle detected!”) return order = topological_sort(nodes) print(“Topological Order:”, order) if __name__ == “__main__”: main()	class GraphNode: def __init__(self, value): self.value = value self.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors:if neighbor not in visited: if detect_cycle(neighbor, visited, rec_stack): return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) for node in nodes.values(): if detect_cycle(node, set(), set()): print("Cycle detected!") return order = topological_sort(nodes) print("Topological Order:", order) if __name__ == "__main__": main()

Cost: 0.0011	Cost: 0.0007	Cost: 0007
Latency: 9.6868	Latency: 4.8847	Latency: 5.0941

Key takeaways

Haiku model:

Successfully implemented the CYK algorithm for parsing context-free grammars.
The code is concise and efficient, using a 2D table to store partial parse results
It correctly handles both the base case (single characters) and the recursive case (combining substrings).
The function returns a boolean indicating whether the input string can be generated by the grammar.

Limitations

Haiku model:

The GPT-3.5 model refused to generate the requested code, stating “I’m sorry, I cannot fulfill that request.”
This response suggests that the model may have ethical constraints or limitations preventing it from generating certain types of code or
algorithms.
It’s possible that the model was not trained to produce complex algorithms like CYK, or it may have been instructed to avoid providing solutions to potential homework or assessment tasks.
The GPT model’s response demonstrates its ability to refuse requests it deems inappropriate or outside its capabilities, which can be seen as a safety feature.

GPT-4o	Sonnet 3.5	Sonnet 3.5+conversion prompt
“`python class GraphNode: def __init__(self, value): self.value = value self.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors: if neighbor not in visited:if detect_cycle(neighbor, visited, rec_stack): return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) for node in nodes.values(): if detect_cycle(node, set(), set()): print(“Cycle detected!”) return order = topological_sort(nodes) print(“Topological Order:”, order) if __name__ == “__main__”: main()	class GraphNode: def __init__(self, value): self.value = value self.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors: if neighbor not in visited: if detect_cycle(neighbor, visited, rec_stack):return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) if any(detect_cycle(node, set(), set()) for node in nodes.values()): print(“Cycle detected!”) return order = topological_sort(nodes) print(“Topological Order:”, order) if __name__ == “__main__”: main()	class GraphNode: def __init__(self, value): self.value = value self.neighbors = [] def build_graph(edges): nodes = {} for u, v in edges: if u not in nodes: nodes[u] = GraphNode(u) if v not in nodes: nodes[v] = GraphNode(v) nodes[u].neighbors.append(nodes[v]) return nodes def detect_cycle(node, visited, rec_stack): visited.add(node) rec_stack.add(node) for neighbor in node.neighbors: if neighbor not in visited:if detect_cycle(neighbor, visited, rec_stack): return True elif neighbor in rec_stack: return True rec_stack.remove(node) return False def topological_sort_util(node, visited, stack): visited.add(node) for neighbor in node.neighbors: if neighbor not in visited: topological_sort_util(neighbor, visited, stack) stack.append(node.value) def topological_sort(nodes): visited = set() stack = [] for node in nodes.values(): if node not in visited: topological_sort_util(node, visited, stack) return stack[::-1] def main(): edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)] nodes = build_graph(edges) visited = set() for node in nodes.values(): if node not in visited and detect_cycle(node, visited, set()): print(“Cycle detected!”) return order = topological_sort(nodes) print(“Topological Order:”, order) if __name__ == “__main__”: main()

Cost: 0.0076	Cost: 0.0086	Cost: 0.0088
Latency: 11.3885	Latency: 8.1426	Latency: 8.9926

Key takeaways

Haiku model:

Successfully implemented the CYK algorithm for parsing context-free grammars.
The code structure is very similar to the expected response, using a 2D table (dp) to store partial parse results.
Correctly handles both base cases (single characters) and recursive cases (combining substrings).
Uses efficient nested loops to fill the dp table.
Correctly returns a boolean indicating whether the input string can be generated by the grammar.

Limitations of the GPT-4o model output:

Haiku model:

While the GPT-4o model also successfully implemented the CYK algorithm, there are some minor differences from the expected
response:
a. It uses ‘r = len(grammar)’, which is unused in the function.
b. The variable naming is slightly different (e.g., ‘l’ instead of ‘length’, ‘s’ instead of ‘i’).
The loop structure, while correct, is slightly less intuitive than the expected response (using ‘p’ instead of ‘k’ for the split point)
The GPT-4o model includes an example grammar and test case, which wasn’t part of the original specification.

Both the Sonnet 3.5 and GPT-4o models produced correct implementations of the CYK algorithm, with the Sonnet 3.5 model’s output being
closer to the expected response in terms of structure and variable naming. The GPT-4o model’s output, while correct, shows some minor
deviations in implementation style and includes additional examples that weren’t requested.

About the Authors

Dhruv Motwani – Head of Al

Nashita Khandker – Data Scientist

Elevate Your Business with AWS!

Begin your migration from Azure today!

A Deep Dive into Avahi’s Bedrock Conversion Tool

Performance

Text-to-SQL

Key Takeaways

Limitations for GPT model

Key takeaways:

Limitations of the GPT model:

Function from Specification

Key takeaways from the haiku code

However, when comparing both the haiku and GPT codes to the actual correct code, we can identify the following limitations:

Key takeaways

Limitations

Code Generation

Key takeaways

Limitations

Key takeaways

Limitations of the GPT-4o model output:

About the Authors

Dhruv Motwani – Head of Al

Nashita Khandker – Data Scientist

Elevate Your Business with AWS!

Services

Resources

Address