Designing a Deterministic, Low Latency Decision Engine with C++

A high performance, deterministic decision tree engine built in C++ that evaluates rules via gRPC with low latency, hot reloadable trees, and production ready observability.

Piyush Choudhari
9 min read
AIML
decision-trees
blog
c++
grpc

Descision Tree Engines: What are they?

decision tree illustration

Let's start with an example Let's say you have a certain amount of income and a bank wants to assign a score to your account. It's pretty simple:

if income>50000:
    score=100
else:
    score=50

We can just use a few control statments to assign the score.

Now the bank wants to take into account how much loan you have taken into your score:

if income>50000:
    if loan<400_000:
        score=100
    elif loan<1_000_000:
        score=80
    else:
        score=50
else:
    score=50

Now the number of control statements increased and pretty as the scoring mechanism gets complex, the instructions will too. Soon, it will be unmaintainable. Another way to implement these decision control flows are Decision Trees.

Decision tree engines, when viewed purely in terms of rules and workflows, refer to systems that use a tree-like structure of decision rules to model decisions, classifications, or predictions. These are essentially rule-based engines where the knowledge is explicitly represented as a hierarchy of if-then-else conditions.

When you run a decision tree engine on new data, the process is always the same:

  1. Start at the root node.
  2. Evaluate the condition (rule) at the current node.
  3. Go left or right based on the answer.
  4. Repeat until you reach a leaf.
  5. Return the leaf’s value.

Planning

Goal

I planned to implement a decision tree engine specifically for risk profiling/fraud detection flow. Goal was to create a high performance and scalable engine.

Input: JSON decision trees

Output: Next node, actions and updated context

Things to keep in mind

  • Stateless per request
  • Deterministic: same input -> same output
  • Explainable: return the decision path
  • gRPC API and embeddable C++ library

Architecture

At a high level, a caller sends:

  • tree_version: which tree to use
  • current_node: where in the tree evaluation starts
  • user_input: optional raw input (used for choice nodes)
  • context: flat key-value map of facts about the transaction, user, device, etc.

The Engine:

  1. Looks up the JSON tree definition for tree_version.
  2. Finds the node with id current_node.
  3. Evaluates that node according to its type.
  4. Returns:
    • next_node
    • actions to run (for action nodes)
    • updated_context
    • confidence and the decision_path.

The Implementation: Specifically The Fraud Detection Use Case

Decision Trees

  • Tree definition:
    • JSON file
    • Versioned via "version" (e.g. "fraud_detection_v1")
    • Contains an array of "nodes"; each node has an "id" and "type".
  • Loading:
    • Trees are loaded either from disk (TREE_DIRECTORY) or via the LoadTree gRPC API.
    • Validation happens on load:
      • No cycles
      • All referenced nodes exist
      • Conditions are syntactically valid
      • Defaults or terminal ends are present

Node Types

decision tree illustration

  • Message Node
    • Purpose: Display static or templated text.
    • Behavior: Always transitions to next.
  • Choice Node
    • Purpose: Branch based on explicit user selection.
    • Behavior: Matches user_input against declared choices; if none match, uses default.
  • Condition Node
    • Purpose: Branch based on evaluating conditions over context.
    • Behavior:
      • Evaluates conditions top-to-bottom.
      • First condition whose "if" expression evaluates to true wins.
      • If none match, uses default.
  • Action Node
    • Purpose: Emit a list of actions for the caller to execute (e.g. BLOCK_CARD).
    • Behavior: Returns actions plus next; caller performs side effects.
  • Terminal Node
    • Purpose: End of flow.
    • Behavior: No further traversal; returns a final resolution (e.g. FRAUD_CONFIRMED).

Single-Step Evaluation

decision tree illustration

The engine is step-based: each Evaluate call processes exactly one node.

  • Input:
    • tree_version
    • current_node
    • user_input (optional)
    • context
  • Output:
    • next_node
    • actions
    • updated_context
    • confidence
    • decision_path (includes the evaluated node)

The caller is responsible for:

  • Determining whether to call Evaluate again with current_node = next_node.
  • Persisting or propagating updated_context.
  • Executing any actions.

Traversal Semantics

  • Exactly one node is evaluated per step.
  • Exactly one next_node is chosen (or a terminal node ends the flow).
  • No parallel or branching executions; the flow is linear when viewed step by step.
  • Maximum traversal depth is enforced to prevent infinite loops.

gRPC Service

  • Service: decision_engine.DecisionEngineService
  • Key RPCs:
    • LoadTree: Load a JSON tree into memory.
    • Evaluate: Evaluate a single step of a tree.
    • HealthCheck: Simple liveness check.

The protobuf request for Evaluate includes:

  • tree_version
  • current_node
  • user_input
  • context (map of ContextValue)

The response contains:

  • next_node
  • actions
  • updated_context
  • confidence
  • decision_path

Embedded Library

The same engine can be used directly from C++ by linking against the core library:

  • Load trees from JSON strings.
  • Call Engine::evaluate with a request struct.
  • Inspect the DecisionResult (next node, actions, context, decision path).

This is useful for embedding the engine in other C++ services or for local testing without gRPC.

Build, Run, and Test

  • From repo root:
cd engine
mkdir build && cd build
cmake ..
make -j"$(nproc)"
  • C++ tests:
cd engine
./build_test.sh

Tests cover:

  • Tree loading and validation

  • Condition evaluation

  • Choice matching

  • Action emission

  • Tree reloading

  • gRPC server

cd engine
mkdir -p build && cd build
cmake ..
make -j"$(nproc)"
 
export GRPC_SERVER_ADDRESS=0.0.0.0:50051
export TREE_DIRECTORY=../trees
 
./decision_engine_server

You can then call the server using:

  • grpcurl (for ad-hoc tests)
  • Python client (test_grpc_client.py)
  • Other services via gRPC stubs.

Fraud Detection Tree (fraud_detection_v1):

The Tree

The Use Case:

  • Tree version: fraud_detection_v1
  • Goal: Model a realistic, enterprise-grade fraud detection and account security workflow.
  • Characteristics:
    • 70+ nodes
    • Multiple risk levels and branching paths
    • Rich set of actions for fraud operations and user security
    • Terminal Outcomes:
      • TRANSACTION_APPROVED
      • TRANSACTION_DENIED
      • FRAUD_CONFIRMED
      • FRAUD_PREVENTED
      • USER_CANCELLED
      • VERIFICATION_FAILED
      • CARD_REPLACED
      • INVESTIGATION_PENDING
      • MONITORING_ENABLED
      • ESCALATED_TO_FRAUD_TEAM
      • CONNECTED_TO_AGENT
      • ACCOUNT_RESTORED
      • PENDING_ACCOUNT_CLOSURE
      • VERIFICATION_PENDING

Each terminal node encodes a clear outcome that a downstream system can act on.

Fraud Detection Flow: End-to-End Scenarios

1. Scenario: Critical Risk, Large Transaction

  • Context example:
{
  "transaction_amount": 15000,
  "velocity_score": 0.9,
  "risk_score": 0.95
}
  • Typical behavior:
    • Initial condition node classifies the transaction as critical risk.
    • Engine routes to a branch that:
      • Immediately blocks the transaction.
      • Freezes the account.
      • Creates a fraud case.
    • Flow terminates with a high-severity resolution, such as FRAUD_CONFIRMED or FRAUD_PREVENTED.

2. Scenario: Device Mismatch and Location Anomaly

  • Context example:
{
  "device_fingerprint_mismatch": true,
  "location_anomaly": true,
  "risk_score": 0.75
}
  • Typical behavior:
    • Conditions detect mismatched device and unexpected location.
    • Engine routes to a branch requiring strong verification:
      • Transaction may be temporarily blocked.
      • 2FA or OTP is required.
    • Depending on verification outcome, the tree continues toward approval or a fraud terminal outcome.

3. Scenario: Moderate Risk, Needs Verification

  • Context example:
{
  "risk_score": 0.5,
  "behavior_consistent": false
}
  • Typical behavior:
    • Conditions classify the transaction as moderate or elevated risk.
    • Engine routes to a user verification path:
      • Confirmation prompts or security questions.
      • Possible temporary monitoring mode.
    • Terminal resolutions may include MONITORING_ENABLED or VERIFICATION_PENDING.

4 Scenario: Suspicious Merchant

  • Context example:
{
  "unusual_merchant_category": true,
  "merchant_fraud_reports": 10
}
  • Typical behavior:
    • Merchant-related conditions trigger a risky merchant path.
    • Engine may:
      • Block the transaction.
      • Flag or block the merchant.
      • Trigger refund flows.
    • Terminal outcome may be TRANSACTION_DENIED or FRAUD_PREVENTED, with actions like merchant blocking and refund initiation.

5. Scenario: Card Testing Pattern

  • Context example:
{
  "multiple_declined_attempts": 5,
  "velocity_score": 0.8
}
  • Typical behavior:
    • Tree detects repeated small attempts and high velocity.
    • Engine routes to a card testing branch that:
      • Blocks or replaces the card.
      • Alerts the fraud team.
    • Terminal outcome might be CARD_REPLACED or ESCALATED_TO_FRAUD_TEAM.

Running and Testing the Fraud Detection Flow

1. Loading the Fraud Tree via gRPC

With the server running:

cd /home/capybara/code/sde/risk-profile-engine/engine
 
grpcurl -plaintext -import-path src -proto src/decision_engine.proto \
  -d '{"tree_version": "fraud_detection_v1", "json_content": "'"$(cat trees/fraud_detection_v1.json | tr -d '\n' | sed 's/\"/\\\"/g')"'"}' \
  localhost:50051 decision_engine.DecisionEngineService/LoadTree

This loads the full fraud detection tree into the engine under tree_version = "fraud_detection_v1".

2. Evaluating Fraud Scenarios

  • Critical risk path example:
grpcurl -plaintext -import-path src -proto src/decision_engine.proto \
  -d '{
    "tree_version": "fraud_detection_v1",
    "current_node": "start",
    "context": {
      "transaction_amount": {"int_value": 15000},
      "velocity_score": {"double_value": 0.9},
      "risk_score": {"double_value": 0.95}
    }
  }' \
  localhost:50051 decision_engine.DecisionEngineService/Evaluate

Inspect the response to see:

  • next_node (for example, a high-severity fraud path node)
  • actions (block, freeze, create case)
  • decision_path (which fraud rules fired)

You can run similar requests for the other scenarios (location anomaly, moderate risk, suspicious merchant, card testing) by adjusting the context payload.

Results

Integration testing results on a complex fraud detection tree:

  • Total Nodes: 70+
  • Node Types: 5 (message, choice, condition, action, terminal)
  • Action Types: 35+ different actions
  • Terminal States: 14
  • Max Branching Factor: 4 choices
  • Average Path Length: 5-8 nodes
==========================================
Fraud Detection Workflow - Complex Tree Test
==========================================
 
Loading fraud_detection_v1 tree...
{
  "success": true
}
 
==========================================
Scenario Tests
==========================================
 
1. LOW RISK - Normal Transaction
   Context: risk_score=0.2
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}
 
2. MODERATE RISK - Needs Behavior Check
   Context: risk_score=0.55
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}
 
3. ELEVATED RISK - Manual Review Required
   Context: risk_score=0.75
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}
 
4. HIGH RISK - Device & Location Anomaly
   Context: device_fingerprint_mismatch=true, location_anomaly=true
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}
 
5. CRITICAL RISK - Large Transaction with High Velocity
   Context: transaction_amount=15000, velocity_score=0.9
---
{
  "nextNode": "initial_risk_assessment",
  "confidence": 1,
  "decisionPath": [
    "start"
  ]
}
 
6. CARD TESTING PATTERN DETECTED
   Context: multiple_declined_attempts=5
---
{
  "nextNode": "card_testing_suspected",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}
 
7. SUSPICIOUS MERCHANT
   Context: unusual_merchant_category=true
---
{
  "nextNode": "investigate_merchant",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}
 
8. TIME-BASED ANOMALY
   Context: transaction_time_anomaly=true
---
{
  "nextNode": "time_based_verification",
  "confidence": 1,
  "decisionPath": [
    "analyze_transaction_pattern"
  ]
}
 
9. NEW ACCOUNT + CRITICAL RISK
   Context: transaction_amount=20000, velocity_score=0.95, account_age_days=15
---
{
  "nextNode": "new_account_protocol",
  "confidence": 1,
  "decisionPath": [
    "check_account_history"
  ]
}
 
10. REPEAT OFFENDER
    Context: previous_fraud_cases=2
---
{
  "nextNode": "repeat_offender_protocol",
  "confidence": 1,
  "decisionPath": [
    "check_account_history"
  ]
}
 
==========================================
✓ All fraud detection scenarios tested!
==========================================

Conclusion

Implementation: Github

Overall it was a pretty fun implementation, aside from all the build errors lol. It is interesting how complex tree and flows are managed to deliver near instant analysis on data points.