Data Schema & Codebook: NGO SRM ROI Calculator

Version: 1.1
Status: Draft
Date: 2025-10-17
Estimated Reading Time: 25-30 minutes

Introduction
Entity Overview
Entity Schemas
Taxonomies and Codebooks
Validation Rules
CSV/XLSX Import Templates
Data Provenance Requirements
Version Control and Updates
Error Handling and Recovery
References

1. Introduction

1.1 Purpose

This Data Schema & Codebook provides authoritative documentation for all data entities, fields, validation rules, and accepted taxonomies used in the NGO Security Risk Management (SRM) ROI Calculator.

Target audience: NGO practitioners preparing data for import, data analysts validating datasets, pilot facilitators supporting NGO data preparation, and software developers implementing or auditing the calculator.

1.2 Scope

This document covers:

Complete field-by-field specifications for all eight core entities
Data types, units of measurement, required/optional status, and allowed ranges
Accepted values for categorical fields (incident types, cost categories, qualitative criteria)
Validation rules with exact constraints and error messages
Example records aligned with the synthetic dataset
Cross-references to CSV/XLSX import templates

This document does NOT cover:

Calculation methodologies (see Methods Note)
Data collection procedures (see Pilot Pack & Data Readiness Guide)
Software implementation details (see source code documentation)

1.3 Document Conventions

Field Specifications:

Field Name: Technical name used in CSV/XLSX templates and API
Data Type: string, number, decimal, integer, boolean, enum, object, array
Unit: Measurement unit (e.g., USD, years, percentage)
Required/Optional: Whether field must be provided
Allowed Range: Valid values (e.g., 0.00-1.00, ≥0, 1-10)
Validation Rule: Specific constraint with error message
Example Value: Sample data from synthetic dataset

Notation:

≥0 means “greater than or equal to zero”
0.00-1.00 means “decimal value from 0 to 1 inclusive”
[value1, value2, ...] means “one of the listed values”

2. Entity Overview

2.1 Core Entities

The calculator uses eight core data entities supporting both simplified and advanced workflows:

Entity	Description	Cardinality	Purpose
Incident	Security incident type with frequency and cost	0 to N per Scenario	Quantify baseline risk (advanced mode only)
HistoricalCostBaseline	Historical incident cost data	0 or 1 per Scenario	Simplified baseline EAL calculation
ScenarioPreset	Evidence-based baseline estimates	0 or 1 per Scenario	Fallback baseline when historical data unavailable
ReductionRateSelection	Expected risk reduction from investment	1 per Scenario	Define intervention effectiveness
Cost	Security risk management investment cost	1 to N per Scenario	Calculate NPV of SRM investment
Assumptions	Temporal parameters	1 per Scenario	Provide time horizon (discounting fixed at 0%)
QualitativeModel	Qualitative benefits model	0 or 1 per Scenario	Value non-financial outcomes (access, continuity, acceptance, wellbeing)
Scenario	Complete risk/cost analysis scenario	1 to N per User	Bundle inputs for ROI calculation

2.2 Entity Relationships

Scenario (1)
  ├── Baseline Mode Selection (1)
  │   ├── HistoricalCostBaseline (0..1) [if historical]
  │   ├── ScenarioPreset (0..1) [if scenario]
  │   └── Incidents (0..N) [if advanced]
  ├── ReductionRateSelection (1)
  ├── Costs (1..N)
  ├── Assumptions (1)
  └── QualitativeModel (0..1)

Interpretation:

Each Scenario selects one baseline mode (historical, scenario, or advanced)
HistoricalCostBaseline provides simplified input for organizations with cost data
ScenarioPreset provides evidence-based estimates when historical data unavailable
Incidents provide granular modeling for advanced users
ReductionRateSelection defines expected risk reduction from investments
Multiple Scenarios can be created for comparison (baseline vs. intervention scenarios)

2.3 Data Flow

Input: User provides baseline data (historical costs, scenario preset, or incidents), reduction rate selection, costs, assumptions, and optionally qualitative model

Validation: Each field validated against schema constraints (see Section 5)

Calculation: The calculator processes validated data to produce results using the selected baseline mode (historical, scenario preset, or advanced ARO × SLE)

Output: ROI (%), NPV (USD), EAL (USD), payback period (years), and breakdown details

3. Entity Schemas

3.1 Incident Entity

Purpose: Represents a security incident type with expected frequency (ARO) and financial impact (SLE).

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`incidentType`	string	-	Required	Min 1 character	“Incident type is required”	“Data Breach (Phishing)”
`aro`	decimal	frequency per year	Required	≥0	“ARO must be at least 0”	0.30
`sle`	number	USD	Required	≥0	“SLE must be non-negative”	45000
`notes`	string	-	Optional	-	-	“Loss of beneficiary data, regulatory fines”
`source`	string	-	Optional	-	-	“Historical incident reports”

Field Descriptions

incidentType (string, required)

Description: Name or category of the security incident
Accepted Values: See Section 4.1 for taxonomy
Examples: “Data Breach (Phishing)”, “Vehicle Theft”, “Staff Kidnapping”, “Office Burglary”
Guidance: Use specific, descriptive names to differentiate incident types

aro (decimal, required)

Description: Annualized Rate of Occurrence – expected number of times the incident occurs per year (0 or greater; values above 1 indicate multiple occurrences within the same year)
Interpretation:
- 0.00 = Incident will not occur (risk eliminated)
- 0.25 = Expected once every 4 years
- 0.50 = Expected once every 2 years
- 1.00 = Expected once per year
- 12.00 = Expected monthly (12 times per year)
Data Source: Historical incident count ÷ Number of years observed
Example: 3 vehicle thefts over 10 years → ARO = 0.30
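The ARO derivation is a single division. A minimal TypeScript sketch follows; `deriveAro` is a hypothetical helper name used for illustration only (it is not part of the calculator's documented API), and the guard clauses are an assumption that mirrors the ≥0 constraint in the schema table.

```typescript
// Hypothetical helper (not part of the calculator's documented API):
// derives ARO from a historical incident count and an observation window.
function deriveAro(incidentCount: number, yearsObserved: number): number {
  if (!Number.isFinite(incidentCount) || incidentCount < 0) {
    throw new RangeError("Incident count must be a non-negative number");
  }
  if (!Number.isFinite(yearsObserved) || yearsObserved <= 0) {
    throw new RangeError("Years observed must be greater than 0");
  }
  // ARO = incident count ÷ years observed; values above 1 mean several occurrences per year.
  return incidentCount / yearsObserved;
}

// 3 vehicle thefts over 10 years gives an ARO of 0.30.
console.log(deriveAro(3, 10).toFixed(2)); // "0.30"
```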

sle (number, required)

Description: Single Loss Expectancy - expected cost in USD if the incident occurs once
Components:
- Direct costs: Property damage, theft, medical expenses, ransom
- Indirect costs: Downtime, productivity loss, legal fees, reputation damage
Data Source: Historical average cost per incident, or industry benchmarks for rare events
Example: Average data breach cost = $45,000 (IT recovery + regulatory fines + reputation damage)

notes (string, optional)

Description: Free-text field for additional context, assumptions, or data sources
Example: “Loss of beneficiary data, regulatory fines, reputation damage”

source (string, optional)

Description: Origin of ARO/SLE estimates (e.g., “Historical incident reports”, “Industry benchmarks”, “Expert judgment”)
Purpose: Data provenance for audit trail

Example Records

{
  "incidentType": "Data Breach (Phishing)",
  "aro": 0.30,
  "sle": 45000,
  "notes": "Loss of beneficiary data, regulatory fines, reputation damage",
  "source": "Historical incident reports"
}

{
  "incidentType": "Vehicle Theft",
  "aro": 0.15,
  "sle": 35000,
  "notes": "Loss of field vehicle and equipment",
  "source": "Regional security data"
}

{
  "incidentType": "Staff Kidnapping",
  "aro": 0.05,
  "sle": 250000,
  "notes": "Ransom, evacuation, medical, legal costs",
  "source": "Industry benchmarks"
}
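In advanced mode each incident type contributes ARO × SLE to the annual loss. The sketch below assumes the advanced baseline EAL is the sum of ARO × SLE across all listed incident types (the standard annualized loss expectancy approach; confirm against the Methods Note) and applies it to the three example records above. The interface mirrors the schema table; `advancedBaselineEal` is a hypothetical name.

```typescript
// Mirrors the Incident schema in Section 3.1.
interface IncidentRecord {
  incidentType: string;
  aro: number; // expected occurrences per year, ≥ 0
  sle: number; // USD per occurrence, ≥ 0
  notes?: string;
  source?: string;
}

// Hypothetical helper (assumption): baseline EAL = Σ (aro × sle) over all incidents.
function advancedBaselineEal(incidents: IncidentRecord[]): number {
  return incidents.reduce((total, incident) => {
    if (incident.aro < 0 || incident.sle < 0) {
      throw new RangeError(`Invalid ARO/SLE for "${incident.incidentType}"`);
    }
    return total + incident.aro * incident.sle;
  }, 0);
}

const exampleIncidents: IncidentRecord[] = [
  { incidentType: "Data Breach (Phishing)", aro: 0.3, sle: 45000 },
  { incidentType: "Vehicle Theft", aro: 0.15, sle: 35000 },
  { incidentType: "Staff Kidnapping", aro: 0.05, sle: 250000 },
];

// 13,500 + 5,250 + 12,500 USD per year
console.log(Math.round(advancedBaselineEal(exampleIncidents))); // 31250
```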

3.2 Historical Cost Baseline Entity

Purpose: Simplified input for organizations with historical incident cost data. Replaces incident-level ARO × SLE modeling with actual historical spend. Mirrors the HistoricalCostInput TypeScript interface (year1Cost, year2Cost?, year3Cost?).

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`year1Cost`	number	USD	Required	≥0	“Most recent year cost is required”	50000
`year2Cost`	number	USD	Optional	≥0	“Cost must be non-negative”	45000
`year3Cost`	number	USD	Optional	≥0	“Cost must be non-negative”	55000

Field Descriptions

year1Cost (number, required)

Description: Total incident costs for the most recent year
Components: Direct costs (evacuation, medical, equipment replacement) + Indirect costs (program delays, reputational response)
Data Source: Financial records, incident reports, insurance claims
Example: 50000 (represents $50,000 in incident costs for 2024)

year2Cost (number, optional)

Description: Total incident costs for the previous year
Purpose: Improves baseline accuracy through multi-year averaging
Guidance: Include if available; single-year data is acceptable
Example: 45000 (represents $45,000 in incident costs for 2023)
Validation: Must be ≥0 when provided; leave blank (null) if data unavailable

year3Cost (number, optional)

Description: Total incident costs for two years ago
Purpose: Further improves baseline accuracy and handles anomalous years
Guidance: Include if available; 2-year average is acceptable
Example: 55000 (represents $55,000 in incident costs for 2022)
Validation: Must be ≥0 when provided; leave blank (null) if data unavailable

Data quality guidance

Prefer three complete years; two is acceptable when only two exist; single-year baselines should be flagged for follow-up.
If only partial-year data exist, annualise them (e.g., a single quarter's cost × 4) and document the assumption in the scenario notes.
These constraints align with the helper calculateHistoricalBaselineEAL and the methodological guidance in Methods Note §3.1.1.

Calculation Logic

The baseline Expected Annual Loss (EAL) is calculated as:

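EAL_baseline = (sum of the provided values among year1Cost, year2Cost, year3Cost) / N

Where N is the number of years with valid (non-null) data (1-3). Missing years are excluded from both the sum and N, so a single-year baseline equals year1Cost.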
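The averaging rule can be sketched in TypeScript as follows. The interface follows the HistoricalCostInput shape described in this section; `averageHistoricalEal` is a hypothetical stand-in written for illustration and is not the calculator's `calculateHistoricalBaselineEAL` helper, whose exact behaviour should be checked in source. The three calls reproduce the example records below.

```typescript
// Shape described in Section 3.2 (year2Cost and year3Cost may be omitted or null).
interface HistoricalCosts {
  year1Cost: number;
  year2Cost?: number | null;
  year3Cost?: number | null;
}

// Hypothetical helper: averages only the years that were actually provided.
function averageHistoricalEal(costs: HistoricalCosts): number {
  const provided = [costs.year1Cost, costs.year2Cost, costs.year3Cost].filter(
    (value): value is number => value !== null && value !== undefined,
  );
  if (provided.length === 0) {
    throw new RangeError("Most recent year cost is required");
  }
  for (const value of provided) {
    if (!Number.isFinite(value) || value < 0) {
      throw new RangeError("Cost must be non-negative");
    }
  }
  // N = number of provided years (1-3).
  const total = provided.reduce((sum, value) => sum + value, 0);
  return total / provided.length;
}

console.log(averageHistoricalEal({ year1Cost: 50000, year2Cost: 45000, year3Cost: 55000 })); // 50000
console.log(averageHistoricalEal({ year1Cost: 75000, year2Cost: null, year3Cost: null })); // 75000
console.log(averageHistoricalEal({ year1Cost: 120000, year2Cost: 95000, year3Cost: null })); // 107500
```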

Example Records

{
  "year1Cost": 50000,
  "year2Cost": 45000,
  "year3Cost": 55000
}

{
  "year1Cost": 75000,
  "year2Cost": null,
  "year3Cost": null
}

{
  "year1Cost": 120000,
  "year2Cost": 95000,
  "year3Cost": null
}

3.3 Scenario Preset Entity

Purpose: Fallback option for organizations without historical incident cost data. Provides evidence-based baseline EAL estimates by operational context. Mirrors the ScenarioSelection interface (presetId, customBaseline?).

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`presetId`	string	-	Required	[low-risk, medium-risk, high-risk]	“Preset ID is required”	“medium-risk”
`customBaseline`	number	USD	Optional	≥0	“Custom baseline must be non-negative”	90000

Field Descriptions

presetId (string, required)

Description: Identifier for the operational context preset
Accepted Values:
- low-risk: Low-Risk Stable Environment ($15,000 baseline EAL)
- medium-risk: Medium-Risk Conflict-Affected Context ($75,000 baseline EAL)
- high-risk: High-Risk Active Conflict Environment ($200,000 baseline EAL)
Data Source: GISF member survey results, Humanitarian Outcomes datasets, expert elicitation
Guidance: Select preset that best matches operational context

customBaseline (number, optional)

Description: Override the preset baseline EAL with organization-specific estimate
Purpose: Allows fine-tuning when preset doesn’t match local conditions
Guidance: Use when organization has partial cost data or context differs significantly
Example: 90000 (overrides the medium-risk preset of $75,000 with $90,000)
Validation: Must be ≥0 when provided. Leave blank when adopting preset defaults.

Example Records

{
  "presetId": "medium-risk",
  "customBaseline": null
}

{
  "presetId": "high-risk",
  "customBaseline": 250000
}

{
  "presetId": "low-risk",
  "customBaseline": null
}
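A minimal sketch of how a preset selection resolves to a baseline EAL, assuming a provided `customBaseline` simply replaces the preset default as described above. `PRESET_BASELINE_EAL` and `resolvePresetBaseline` are hypothetical names; the calculator's own preset data live in SCENARIO_PRESETS (Section 4.4), which this sketch does not reproduce beyond the three default values.

```typescript
type PresetId = "low-risk" | "medium-risk" | "high-risk";

interface PresetSelection {
  presetId: PresetId;
  customBaseline?: number | null;
}

// Hypothetical copy of the default baseline EAL values (USD) listed in this section.
const PRESET_BASELINE_EAL: Record<PresetId, number> = {
  "low-risk": 15000,
  "medium-risk": 75000,
  "high-risk": 200000,
};

// Hypothetical helper: a provided customBaseline overrides the preset default.
function resolvePresetBaseline(selection: PresetSelection): number {
  const { presetId, customBaseline } = selection;
  if (!Object.prototype.hasOwnProperty.call(PRESET_BASELINE_EAL, presetId)) {
    throw new RangeError("Preset ID is required");
  }
  if (customBaseline !== null && customBaseline !== undefined) {
    if (!Number.isFinite(customBaseline) || customBaseline < 0) {
      throw new RangeError("Custom baseline must be non-negative");
    }
    return customBaseline;
  }
  return PRESET_BASELINE_EAL[presetId];
}

console.log(resolvePresetBaseline({ presetId: "medium-risk", customBaseline: null })); // 75000
console.log(resolvePresetBaseline({ presetId: "high-risk", customBaseline: 250000 })); // 250000
console.log(resolvePresetBaseline({ presetId: "low-risk" })); // 15000
```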

3.4 Reduction Rate Selection Entity

Purpose: Captures the expected risk reduction from security investments. Uses evidence-based ranges by intervention type. Mirrors the ReductionRateSelection interface (interventionType, estimateType, customRate?) and associated enums InterventionType and ReductionEstimate.

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`interventionType`	string	-	Required	[physical, training, technology, comprehensive]	“Intervention type is required”	“physical”
`estimateType`	string	-	Required	[conservative, moderate, optimistic]	“Estimate type is required”	“moderate”
`customRate`	decimal	-	Optional	0.00-0.95	“Custom rate must be between 0% and 95%”	0.35

Field Descriptions

interventionType (string, required)

Description: Category of security investment being evaluated
Accepted Values:
- physical: Physical Security (guarding, barriers, safe rooms)
- training: Training & Capacity Building (HEAT, protocols)
- technology: Technology & Monitoring (GPS, communications, monitoring)
- comprehensive: Comprehensive SRM Program (layered approach)
Data Source: GISF research, Humanitarian Outcomes datasets, practitioner interviews

estimateType (string, required)

Description: Confidence level for the reduction estimate
Accepted Values:
- conservative: Lower-bound estimate, minimal assumptions
- moderate: Mid-range estimate based on typical outcomes (recommended)
- optimistic: Upper-bound estimate, ideal conditions
Guidance: Choose based on implementation confidence and contextual fit

customRate (decimal, optional)

Description: Override evidence-based reduction rate with custom percentage
Range: 0.00 (0%) to 0.95 (95%)
Purpose: Allows fine-tuning when evidence ranges don’t match specific context
Guidance: Document rationale for custom rates in assumption log
Example: 0.35 (35% reduction)

Evidence-Based Reduction Rates

Intervention Type	Conservative	Moderate	Optimistic
Physical Security	20%	30%	50%
Training & Protocols	15%	25%	40%
Technology & Monitoring	25%	40%	60%
Comprehensive Programme	35%	50%	70%

Example Records

{
  "interventionType": "physical",
  "estimateType": "moderate",
  "customRate": null
}

{
  "interventionType": "comprehensive",
  "estimateType": "optimistic",
  "customRate": null
}

{
  "interventionType": "training",
  "estimateType": "conservative",
  "customRate": 0.18
}
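The lookup from (interventionType, estimateType) to a rate, with `customRate` taking precedence when supplied, can be sketched as below. `RATE_TABLE` is a hypothetical hand-copied version of the evidence table above (the calculator's own table is REDUCTION_RATE_TABLE, Section 4.5), and `resolveReductionRate` is likewise a hypothetical name. The three calls reproduce the example records.

```typescript
type Intervention = "physical" | "training" | "technology" | "comprehensive";
type Estimate = "conservative" | "moderate" | "optimistic";

interface ReductionChoice {
  interventionType: Intervention;
  estimateType: Estimate;
  customRate?: number | null;
}

// Hypothetical copy of the evidence-based rates in this section, as decimals.
const RATE_TABLE: Record<Intervention, Record<Estimate, number>> = {
  physical: { conservative: 0.2, moderate: 0.3, optimistic: 0.5 },
  training: { conservative: 0.15, moderate: 0.25, optimistic: 0.4 },
  technology: { conservative: 0.25, moderate: 0.4, optimistic: 0.6 },
  comprehensive: { conservative: 0.35, moderate: 0.5, optimistic: 0.7 },
};

// Hypothetical helper: a valid customRate (0 to 0.95) overrides the table value.
function resolveReductionRate(choice: ReductionChoice): number {
  const { customRate } = choice;
  if (customRate !== null && customRate !== undefined) {
    if (!Number.isFinite(customRate) || customRate < 0 || customRate > 0.95) {
      throw new RangeError("Custom rate must be between 0% and 95%");
    }
    return customRate;
  }
  return RATE_TABLE[choice.interventionType][choice.estimateType];
}

console.log(resolveReductionRate({ interventionType: "physical", estimateType: "moderate", customRate: null })); // 0.3
console.log(resolveReductionRate({ interventionType: "comprehensive", estimateType: "optimistic" })); // 0.7
console.log(resolveReductionRate({ interventionType: "training", estimateType: "conservative", customRate: 0.18 })); // 0.18
```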

3.5 Cost Entity

Purpose: Represents a security risk management investment cost incurred in a specific period.

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`category`	string	-	Required	Min 1 character	“Category is required”	“Security Training (Staff)”
`amount`	number	USD	Required	≥0	“Amount must be non-negative”	8000
`period`	integer	year	Required	≥1	“Period must be at least 1”	1
`capexOpex`	enum	-	Optional	[capex, opex]	“Must be ‘capex’ or ‘opex’”	“opex”

Field Descriptions

category (string, required)

Description: Name or category of the cost item
Accepted Values: See Section 4.2 for taxonomy
Examples: “Security Training (Staff)”, “Physical Security Upgrades”, “Cybersecurity Tools”, “Security Personnel”
Guidance: Use descriptive names that reflect the nature of the investment

amount (number, required)

Description: Cost amount in USD
Precision: Whole dollars (no cents required, but decimals accepted)
Example: 8000 (represents $8,000)

period (integer, required)

Description: Time period when the cost is incurred
Values: 1 = Year 1, 2 = Year 2, 3 = Year 3, etc.
Constraint: Must be ≥1 and ≤ timeHorizonYears (from the Assumptions entity)
Example: 1 (cost incurred in first year)

capexOpex (enum, optional)

Description: Classification of cost as Capital Expenditure (CAPEX) or Operating Expenditure (OPEX)
Accepted Values:
- capex: Capital expenditure (one-time asset purchases: vehicles, equipment, facility upgrades)
- opex: Operating expenditure (recurring operational costs: salaries, training, maintenance)
Purpose: Financial reporting and budgeting (does not affect ROI calculation)
Default: If not provided, cost is assumed to be OPEX

Example Records

{
  "category": "Security Training (Staff)",
  "amount": 8000,
  "period": 1,
  "capexOpex": "opex"
}

{
  "category": "Physical Security Upgrades",
  "amount": 25000,
  "period": 1,
  "capexOpex": "capex"
}

{
  "category": "Annual Security Personnel",
  "amount": 32000,
  "period": 2,
  "capexOpex": "opex"
}
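For budgeting, cost rows are typically grouped by period and by CAPEX/OPEX. The sketch below uses the three example records above and applies the documented default (missing `capexOpex` is treated as OPEX). `summariseCosts` is a hypothetical helper; because discounting is fixed at 0%, the present value of these costs is simply their undiscounted total.

```typescript
type CostClass = "capex" | "opex";

// Mirrors the Cost schema in Section 3.5.
interface CostItem {
  category: string;
  amount: number; // USD
  period: number; // 1 = Year 1, 2 = Year 2, ...
  capexOpex?: CostClass;
}

// Hypothetical helper: totals costs per period, per CAPEX/OPEX class, and overall.
function summariseCosts(costs: CostItem[]) {
  const byPeriod = new Map<number, number>();
  const byClass: Record<CostClass, number> = { capex: 0, opex: 0 };
  let total = 0;
  for (const cost of costs) {
    const costClass: CostClass = cost.capexOpex ?? "opex"; // documented default
    byPeriod.set(cost.period, (byPeriod.get(cost.period) ?? 0) + cost.amount);
    byClass[costClass] += cost.amount;
    total += cost.amount;
  }
  // At a 0% discount rate the present value of costs equals this undiscounted total.
  return { byPeriod, byClass, total };
}

const exampleCosts: CostItem[] = [
  { category: "Security Training (Staff)", amount: 8000, period: 1, capexOpex: "opex" },
  { category: "Physical Security Upgrades", amount: 25000, period: 1, capexOpex: "capex" },
  { category: "Annual Security Personnel", amount: 32000, period: 2, capexOpex: "opex" },
];

const summary = summariseCosts(exampleCosts);
console.log(summary.total); // 65000
console.log(summary.byPeriod.get(1)); // 33000
```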

3.6 Assumptions Entity

Purpose: Provides the temporal parameters (time horizon) for NPV and ROI calculations.

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`timeHorizonYears`	integer	years	Required	1-10	“Time horizon must be between 1 and 10 years”	3

Note: Discounting is fixed at 0% by design for transparency. See Methods Note Sections 3.2 and 6.3.

Field Descriptions

timeHorizonYears (integer, required)

Description: Number of years over which costs and benefits are evaluated
Recommended Range: 3 to 5 years for most NGO security investments
Guidance: Match to asset lifespan (e.g., 5 years for vehicles, 10 years for facilities)
Example: 3 (three-year analysis period)

Example Record

{
  "timeHorizonYears": 3
}

3.6.1 Organization Context Fields (Optional)

These fields provide organisational scale context that supports helper features (e.g., SLE severity presets) and enriches qualitative evidence notes. They are never required for validation.

Field	Data Type	Unit	Allowed Range	Purpose	Example Value
`annualOperatingBudget`	number	USD	≥0 and ≤1,000,000,000	Anchors SLE severity presets to budget size and supports benchmarking	1200000
`annualSecurityBudget`	number	USD	≥0 and ≤100,000,000	Provides current SRM spend for comparison in reports	80000
`staffCount`	integer	people (FTE)	1–100,000	Helps interpret qualitative statements (e.g., wellbeing reach)	45
`programmeDeliveryDays`	integer	days/year	1–365	Supplies operational tempo context for continuity narratives	240

3.7 QualitativeModel Entity

Purpose: Captures qualitative impact evidence for the four SRM dimensions (Access, Continuity, Acceptance, Wellbeing). The calculator aggregates these scores into a Qualitative Impact Index (QII).

Schema Overview

Component	Description	Required
`dimensions`	Object keyed by AE, OC, CA, SW containing weights, scores, and supporting notes	Required
`notes`	Free-text notes shown in exports/results (e.g., caveats, next steps)	Optional

3.7.1 Dimensions Object

Each dimension (AE, OC, CA, SW) shares the same structure:

Field	Data Type	Required	Description
`weight`	decimal	Required	Relative importance (normalised so AE+OC+CA+SW = 1.0; tolerance ±0.001)
`score`	integer	Required	0–5 score auto-calculated from regression and checklist selections (see Methods Note Section 4.4)
`regression`	boolean	Optional	When `true`, forces the score to 0; improvements must then be omitted
`improvements`	array[string]	Conditionally required	When regression is false/undefined, supply 0–4 checklist statement IDs (see Section 5.2 for scoring logic)
`evidenceNote`	string	Optional	Short narrative capturing “what changed” and data sources

Import note: Use the identifiers below when populating the improvements array. Provide an empty array ([]) to represent “no statements ticked” when regression is false and no improvements were recorded.

Dimension	Identifier	Checklist statement
AE	`ae-regained-priority-sites`	We regained access to at least one high-priority location that had been closed.
AE	`ae-sustained-access`	The regained access has been sustained for six months or more without emergency waivers.
AE	`ae-routine-coverage`	Most critical locations are now reachable on routine rotas with predictable approvals.
AE	`ae-additional-beneficiaries`	We reached additional communities or beneficiaries because of the improved access.
OC	`oc-fewer-suspensions`	We had fewer security-driven suspensions or closures than last year.
OC	`oc-on-time-resumption`	After incidents, teams resumed work within the planned timelines.
OC	`oc-contingency-held`	Contingency plans or backup sites kept essential services running during disruptions.
OC	`oc-deliverables-met`	We met donor/programme deliverables despite incidents, without accumulating large backlogs.
CA	`ca-complaints-drop`	Community complaints or incident reports linked to acceptance issues dropped noticeably.
CA	`ca-regular-forum`	We have a regular forum or liaison mechanism that community representatives attend.
CA	`ca-issues-resolved`	Community leaders helped resolve at least one security or access issue before it escalated.
CA	`ca-positive-feedback`	Recent feedback (surveys, partner letters, third-party reviews) points to strong support or trust.
SW	`sw-turnover-drop`	Security-related turnover, sick leave, or exceptional R&R requests decreased.
SW	`sw-uptake-support`	Uptake of wellbeing resources (counselling, peer support, check-ins) increased.
SW	`sw-regular-checkins`	Supervisors run routine wellbeing check-ins after incidents or high-stress periods.
SW	`sw-positive-survey`	Staff surveys or pulse checks report feeling safer and better supported.
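The per-dimension score follows the rule stated in Section 5.2: regression gives 0, otherwise the number of ticked statements plus 1 (so 0 ticks gives 1 and 4 ticks gives 5). A minimal sketch of that rule, using a hypothetical `deriveDimensionScore` helper; treating an empty `improvements` array as "cleared" when regression is true is an assumption of this sketch.

```typescript
// One dimension (AE, OC, CA, or SW) of the QualitativeModel.
interface DimensionInput {
  weight: number;
  regression?: boolean;
  improvements?: string[]; // checklist statement IDs such as "ae-sustained-access"
}

// Hypothetical helper implementing the scoring rule from Section 5.2.
function deriveDimensionScore(input: DimensionInput): number {
  if (input.regression === true) {
    // Assumption: an empty array counts as "improvements omitted".
    if (input.improvements && input.improvements.length > 0) {
      throw new Error("Regression selected: score must be 0 and improvement statements cleared");
    }
    return 0;
  }
  if (!input.improvements) {
    throw new Error("Select the statements that apply (or mark the regression checkbox).");
  }
  if (input.improvements.length > 4) {
    throw new Error("Select up to four statements");
  }
  return input.improvements.length + 1;
}

console.log(deriveDimensionScore({ weight: 0.4, improvements: ["ae-regained-priority-sites", "ae-sustained-access"] })); // 3
console.log(deriveDimensionScore({ weight: 0.3, regression: true })); // 0
console.log(deriveDimensionScore({ weight: 0.2, improvements: [] })); // 1
```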

Qualitative Factors

AE – Access to Environment: Ability to operate in insecure areas
OC – Operational Continuity: Reduced disruptions and faster recovery
CA – Community Acceptance: Strength and reliability of community relationships
SW – Staff Wellbeing: Wellbeing, stress, and retention of staff

3.8 Scenario Entity

Purpose: Bundles all inputs for a complete risk analysis scenario. Supports both simplified historical cost baseline and advanced incident-level modeling. Aligns with the master Scenario schema used in validation and exports.

Schema

Field Name	Data Type	Unit	Required	Allowed Range	Validation Rule	Example Value
`name`	string	-	Required	Min 1 character	“Scenario name is required”	“Historical Cost Baseline”
`baselineMode`	string	-	Required	[historical, scenario, advanced]	“Baseline mode is required”	“historical”
`historicalCosts`	object	-	Optional	-	Required if baselineMode is “historical”	HistoricalCostInput object
`scenarioSelection`	object	-	Optional	-	Required if baselineMode is “scenario”	ScenarioSelection object
`reductionSelection`	object	-	Required	-	Must conform to ReductionRateSelection schema	ReductionRateSelection object
`incidents`	array	-	Optional	≥0 incidents	Required if baselineMode is “advanced”	Array of incident objects
`costs`	array	-	Required	≥0 cost items	Baseline scenarios may leave this empty; interventions should list all relevant costs	Array of cost objects
`assumptions`	object	-	Required	-	Must conform to Assumptions schema	Object with timeHorizonYears
`qualitative`	object	-	Optional	-	Must conform to QualitativeModel schema if provided	Object with weights, checklist selections, evidence notes

Field Descriptions

name (string, required)

Description: Descriptive name for the scenario
Examples:
- “Historical Cost Baseline” (simplified approach)
- “Medium-Risk Scenario Preset” (fallback approach)
- “Advanced ARO × SLE Model” (granular approach)
Guidance: Use names that clearly identify the baseline methodology

baselineMode (string, required)

Description: Method for calculating baseline Expected Annual Loss
Accepted Values:
- historical: Use historical incident costs (recommended)
- scenario: Use evidence-based preset estimates
- advanced: Use incident-level ARO × SLE modeling
Guidance: Choose based on data availability and analysis requirements

historicalCosts (object, optional)

Description: HistoricalCostInput object (see Section 3.2)
Required: When baselineMode is “historical”
Purpose: Provides actual incident cost data for baseline EAL calculation
Guidance: Include 1-3 years of data; more years improve accuracy

scenarioSelection (object, optional)

Description: ScenarioSelection object (see Section 3.3)
Required: When baselineMode is “scenario”
Purpose: Provides evidence-based baseline EAL when historical data unavailable
Guidance: Select preset that best matches operational context

reductionSelection (object, required)

Description: ReductionRateSelection object (see Section 3.4)
Purpose: Defines expected risk reduction from security investment
Guidance: Choose intervention type and confidence level based on planned measures

incidents (array, optional)

Description: Array of Incident objects (see Section 3.1)
Required: When baselineMode is “advanced”
Purpose: Provides incident-level modeling for granular analysis
Guidance: Include all material incident types with reliable ARO/SLE estimates

costs (array, required)

Description: Array of Cost objects (see Section 3.5)
Minimum: 0 cost items (baseline scenarios have no SRM spend)
Purpose: Captures security investment costs for NPV calculation
Guidance: Include initial investment plus ongoing operational costs

assumptions (object, required)

Description: Single Assumptions object (see Section 3.6)
Required fields: timeHorizonYears
Purpose: Provides temporal parameters for financial calculations

qualitative (object, optional)

Description: Single QualitativeModel object (see Section 3.7)
Purpose: Captures non-financial benefits and outcomes
Guidance: Include for comprehensive ROI analysis including qualitative impacts
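The conditional requirements above (which sub-object must be present for each baseline mode) can be sketched as a switch on `baselineMode`. Everything here is hypothetical: the `checkBaselineInputs` name, the trimmed-down draft type, and the error messages for missing `historicalCosts`/`scenarioSelection`, which are placeholders because this codebook does not specify them. The advanced-mode message is taken from Section 5.1.

```typescript
type BaselineModeCode = "historical" | "scenario" | "advanced";

// Only the fields needed for this check; other Scenario fields are omitted.
interface ScenarioBaselineDraft {
  name: string;
  baselineMode: BaselineModeCode;
  historicalCosts?: object | null;
  scenarioSelection?: object | null;
  incidents?: unknown[];
}

// Hypothetical check: returns error messages (an empty array when valid).
function checkBaselineInputs(scenario: ScenarioBaselineDraft): string[] {
  const errors: string[] = [];
  if (scenario.name.length < 1) {
    errors.push("Scenario name is required");
  }
  switch (scenario.baselineMode) {
    case "historical":
      if (!scenario.historicalCosts) {
        errors.push('historicalCosts is required when baselineMode is "historical"'); // placeholder
      }
      break;
    case "scenario":
      if (!scenario.scenarioSelection) {
        errors.push('scenarioSelection is required when baselineMode is "scenario"'); // placeholder
      }
      break;
    case "advanced":
      if (!scenario.incidents || scenario.incidents.length === 0) {
        errors.push("At least one incident is required");
      }
      break;
    default:
      // Reached only if an unrecognised mode bypasses the type system (e.g. raw CSV input).
      errors.push("Baseline mode is required");
  }
  return errors;
}

console.log(checkBaselineInputs({ name: "Advanced ARO × SLE Model", baselineMode: "advanced", incidents: [] }));
```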

4. Taxonomies and Codebooks

4.1 Incident Type Taxonomy

Purpose: Standardized categories for security incidents to ensure consistency across organizations and enable benchmarking.

Accepted Values

Category	Subcategories	Examples
Physical Security	Theft, burglary, assault, kidnapping, armed attack, vehicle theft/accident, facility damage	“Vehicle Theft”, “Office Burglary”, “Armed Attack on Facility”
Cybersecurity	Phishing, ransomware, data breach, denial-of-service, malware, unauthorized access	“Data Breach (Phishing)”, “Ransomware Attack”, “Targeted Cyber Attack”
Health & Medical	Disease outbreak, medical emergency, pandemic, evacuation due to health crisis	“Disease Outbreak”, “Medical Emergency Evacuation”
Operational	Suspension of operations, relocation, project cancellation, access denial	“Operational Suspension”, “Access Denial to Program Area”
Political/Social	Civil unrest, protest, government restriction, expulsion, visa denial	“Civil Unrest Impact”, “Expulsion from Operating Area”

Taxonomy Crosswalk

Common NGO terminology mapped to calculator categories:

NGO Term	Calculator Category
“Break-in”	Office Burglary
“Carjacking”	Vehicle Theft
“Hacking incident”	Data Breach
“Crypto virus”	Ransomware Attack
“Abduction”	Staff Kidnapping
“Security incident causing evacuation”	Armed Attack on Facility
“Program shutdown due to threats”	Operational Suspension

Guidance:

Use specific, descriptive names (e.g., “Data Breach (Phishing)” rather than just “Breach”)
Combine incident mechanism and impact for clarity (e.g., “Ransomware Attack” not “Cyberattack”)
Document rare or organization-specific incidents in notes field

4.2 Cost Category Taxonomy

Purpose: Standardized categories for security risk management costs to support budgeting and financial reporting.

Accepted Values

Category	Type	Examples
Personnel	OPEX	“Security Personnel”, “Security Manager”, “Security Focal Point (part-time)”
Training	OPEX	“Security Training (Staff)”, “Security Awareness Workshop”, “First Aid Training”
Assessments	OPEX	“Security Assessment”, “Security Audit”, “Risk Assessment Consultancy”
Physical Security	CAPEX	“Physical Security Upgrades”, “Perimeter Fencing”, “CCTV Installation”, “Access Control System”
Cybersecurity	CAPEX/OPEX	“Cybersecurity Tools”, “Firewall Software”, “Antivirus Licenses”, “Security Operations Center (SOC)”
Vehicles	CAPEX	“Armored Vehicles”, “Security Escort Vehicles”, “GPS Tracking System”
Equipment	CAPEX	“Communications Equipment”, “Satellite Phones”, “Two-way Radios”
Insurance	OPEX	“Kidnap & Ransom Insurance”, “Security Insurance Premium”
Maintenance	OPEX	“Annual Security Maintenance”, “Equipment Servicing”, “Software License Renewal”

CAPEX vs. OPEX Guidance

Capital Expenditure (CAPEX):

One-time asset purchases with multi-year lifespans
Examples: Vehicles, facility upgrades, equipment, software licenses (multi-year)

Operating Expenditure (OPEX):

Recurring operational costs, typically annual
Examples: Salaries, training, maintenance, annual licenses, insurance

Note: Classification does not affect ROI calculation but supports financial reporting.

4.3 Qualitative Criteria Taxonomy

Purpose: Define the four standard qualitative factors for consistent evaluation across organizations.

Criteria Definitions

Criterion	Code	Definition	Example Improvements
Access to Environment	AE	Ability to operate in insecure or restricted areas	Restored access to conflict zones; reduced relocation frequency; expanded operational footprint
Operational Continuity	OC	Reduced disruptions and suspensions of programs	Fewer evacuations; reduced downtime; maintained program timelines despite security events
Community Acceptance	CA	Improved local perceptions and relationships	Enhanced community trust; improved local staff retention; better acceptance by local authorities
Staff Wellbeing	SW	Reduced stress, improved morale, retention (optional dimension)	Lower staff turnover; improved mental health indicators; increased willingness to accept field assignments

Guidance:

Organizations may customize weights to reflect strategic priorities
Weights must sum to 1.0 (e.g., AE=0.40, OC=0.30, CA=0.30 when SW is omitted, or AE=0.40, OC=0.30, CA=0.20, SW=0.10 if access is top priority)
Scores should reflect incremental improvement over baseline, not absolute performance

4.4 Scenario Preset Catalogue

Scenario presets provide evidence-based baseline EAL estimates used when historical cost data are unavailable. Each preset contains the following fields:

Field	Data Type	Description	Example
`id`	string	Unique identifier referenced by `presetId` in data entry	`medium-risk`
`name`	string	Human-readable label shown in the UI	“Medium-Risk Conflict-Affected Context”
`description`	string	Summary of the operational environment	“Mixed stability with sporadic violence, remote programming, and reliance on local partners.”
`baselineEAL`	number (USD)	Default Expected Annual Loss for the scenario	75,000
`range.min`	number (USD)	Lower bound of typical EAL range	45,000
`range.max`	number (USD)	Upper bound of typical EAL range	105,000
`characteristics[]`	string[]	Evidence-backed contextual markers	[“Recurring access negotiations required”, …]
`source`	string	Citation or data provenance	“GISF & Humanitarian Outcomes triangulated benchmarks (n=37 organisations)”

Current presets:

Preset ID	Name	Baseline EAL	Range (USD)	Key Characteristics	Evidence Source
`low-risk`	Low-Risk Stable Environment	$15,000	$10k–$25k	ACLED violence index lower quartile; strong state security presence; capital/peri-urban operations	GISF incident dataset 2020–2024 (n=23 organisations)
`medium-risk`	Medium-Risk Conflict-Affected Context	$75,000	$45k–$105k	Recurring access negotiations; fragile infrastructure; reliance on partners/mobile teams	GISF & Humanitarian Outcomes triangulated benchmarks (n=37 organisations)
`high-risk`	High-Risk Active Conflict Environment	$200,000	$120k–$280k	Armed actor presence; regular evacuations or hibernation; remote management with high turnover risk	GISF pilot partner interviews and AWSD case studies (n=18 organisations)

These presets are stored in SCENARIO_PRESETS (see apps/web/src/roi-calculator/constants/scenario-presets.ts) and can be overridden via the customBaseline field documented in Section 3.3.

4.5 Reduction Rate Catalogue

Reduction rates combine intervention classification (interventionType) with estimate confidence (estimateType). The table below mirrors REDUCTION_RATE_TABLE in source code and should be cited when selecting defaults.

Intervention Type (`interventionType`)	Conservative (`estimateType = conservative`)	Moderate (`estimateType = moderate`)	Optimistic (`estimateType = optimistic`)	Evidence Highlights
`physical` – Physical security (guarding, barriers, safe rooms)	20%	30%	50%	Humanitarian Outcomes incident review; GISF Field Security Handbook
`training` – Training & protocols (HEAT, staff induction, SOP rollouts)	15%	25%	40%	GISF 2024 member survey (n=32); ALNAP HEAT effectiveness briefs
`technology` – Technology & monitoring (tracking, communications, analytics)	25%	40%	60%	Humanitarian Outcomes 2023 AWSD trend analysis; NGO cybersecurity audits
`comprehensive` – Layered SRM programme (governance + training + tech)	35%	50%	70%	FAIR-adapted risk models; practitioner interviews during GISF pilots

Custom reduction rates (customRate) must remain between 0 and 0.95 and include a documented rationale in the scenario assumption log.

5. Validation Rules

5.1 Field-Level Validation

Entity	Field	Rule	Error Message
Incident	incidentType	Min length 1	“Incident type is required”
Incident	aro	aro ≥ 0	“ARO must be at least 0”
Incident	sle	sle ≥ 0	“SLE must be non-negative”
Cost	category	Min length 1	“Category is required”
Cost	amount	amount ≥ 0	“Amount must be non-negative”
Cost	period	Integer, period ≥ 1	“Period must be at least 1”
Cost	capexOpex	In [capex, opex] if provided	“Must be ‘capex’ or ‘opex’”
Assumptions	timeHorizonYears	Integer, 1 ≤ years ≤ 10	“Time horizon must be between 1 and 10 years”
QualitativeModel	dimensions.*.weight	AE+OC+CA+SW = 1.0 (±0.001)	“Weights must sum to 1”
QualitativeModel	dimensions.*.score	Integer, 0 ≤ score ≤ 5	“Score must be between 0 and 5”
QualitativeModel	dimensions.*.improvements	Array length ≤ 4	“Select up to four statements”
QualitativeModel	dimensions.*.indicator.baselineValue/reportingValue/changeValue	Numeric if provided	“Indicator values must be numeric”
Scenario	name	Min length 1	“Scenario name is required”
Scenario	incidents	Array length ≥ 1	“At least one incident is required”
Scenario	costs	Array length ≥ 0	No error (informational only): baseline scenarios may omit cost entries; intervention scenarios should list all SRM investments
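The production rules are implemented as Zod schemas (src/shared/validation/schemas.ts). The dependency-free sketch below reproduces the Incident and Cost rows of the table with the same error messages; the `FieldError` type and the function names are hypothetical. Writing checks as `!(x >= 0)` rather than `x < 0` also rejects `NaN`, which a naive comparison would let through.

```typescript
// Illustrative shapes; the production schemas are Zod objects.
interface Incident { incidentType: string; aro: number; sle: number; notes?: string; source?: string; }
interface Cost { category: string; amount: number; period: number; capexOpex?: string; }

interface FieldError { field: string; message: string; }

function validateIncident(i: Incident): FieldError[] {
  const errors: FieldError[] = [];
  if (i.incidentType.length < 1) errors.push({ field: "incidentType", message: "Incident type is required" });
  if (!(i.aro >= 0)) errors.push({ field: "aro", message: "ARO must be at least 0" });
  if (!(i.sle >= 0)) errors.push({ field: "sle", message: "SLE must be non-negative" });
  return errors;
}

function validateCost(c: Cost): FieldError[] {
  const errors: FieldError[] = [];
  if (c.category.length < 1) errors.push({ field: "category", message: "Category is required" });
  if (!(c.amount >= 0)) errors.push({ field: "amount", message: "Amount must be non-negative" });
  // Period must be an integer and at least 1.
  if (!Number.isInteger(c.period) || c.period < 1) {
    errors.push({ field: "period", message: "Period must be at least 1" });
  }
  // capexOpex is optional, but if present it must be exactly "capex" or "opex".
  if (c.capexOpex !== undefined && c.capexOpex !== "capex" && c.capexOpex !== "opex") {
    errors.push({ field: "capexOpex", message: "Must be 'capex' or 'opex'" });
  }
  return errors;
}
```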

5.2 Cross-Field Validation

Rule	Description	Error Message
Cost period ≤ Time horizon	Each cost period must not exceed `timeHorizonYears`	“Cost period X exceeds time horizon Y years”
Qualitative weights sum to 1	`dim.AE.weight + dim.OC.weight + dim.CA.weight + dim.SW.weight = 1.0` (±0.001)	“Weights must sum to 1.0; current sum: X”
Qualitative improvements required	If regression is false/undefined, provide an improvements array (0–4 IDs)	“Select the statements that apply (or mark the regression checkbox).”
Qualitative regression logic	If `regression = true`, score must be 0 and improvements omitted	“Regression selected: score must be 0 and improvement statements cleared”
Qualitative checklist mapping	If regression is false, score must equal `tickCount + 1` (0 ticks = score 1)	“Score does not match number of statements selected”
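A sketch of the cross-field checks, assuming each dimension object exposes `weight`, `score`, an optional `regression` flag and an optional `improvements` array of statement IDs. Treating an empty `improvements` array as "cleared" under regression, prefixing messages with the dimension key, and reporting the weight sum to three decimals are assumptions; the interfaces and function names are hypothetical.

```typescript
// Hypothetical shapes; field names follow Sections 5.1 and 5.2.
type DimensionKey = "AE" | "OC" | "CA" | "SW";

interface Dimension {
  weight: number;
  score: number;
  regression?: boolean;
  improvements?: string[]; // statement IDs, 0 to 4 entries
}

const DIMENSION_KEYS: DimensionKey[] = ["AE", "OC", "CA", "SW"];
const WEIGHT_TOLERANCE = 0.001;

// Every cost period must fit inside the analysis horizon.
function checkCostPeriods(periods: number[], timeHorizonYears: number): string[] {
  return periods
    .filter((p) => p > timeHorizonYears)
    .map((p) => `Cost period ${p} exceeds time horizon ${timeHorizonYears} years`);
}

function validateQualitative(dimensions: Record<DimensionKey, Dimension>): string[] {
  const errors: string[] = [];

  // Weights must sum to 1.0 within the stated tolerance.
  const sum = DIMENSION_KEYS.reduce((acc, k) => acc + dimensions[k].weight, 0);
  if (Math.abs(sum - 1) > WEIGHT_TOLERANCE) {
    errors.push(`Weights must sum to 1.0; current sum: ${sum.toFixed(3)}`);
  }

  for (const k of DIMENSION_KEYS) {
    const d = dimensions[k];
    if (d.regression === true) {
      // Regression: score 0 and no statements (an empty array counts as cleared).
      if (d.score !== 0 || (d.improvements?.length ?? 0) > 0) {
        errors.push(`${k}: Regression selected: score must be 0 and improvement statements cleared`);
      }
      continue;
    }
    if (d.improvements === undefined) {
      errors.push(`${k}: Select the statements that apply (or mark the regression checkbox).`);
    } else if (d.improvements.length > 4) {
      errors.push(`${k}: Select up to four statements`);
    } else if (d.score !== d.improvements.length + 1) {
      // Checklist mapping: 0 ticks = score 1, 4 ticks = score 5.
      errors.push(`${k}: Score does not match number of statements selected`);
    }
  }
  return errors;
}
```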

5.3 Business Logic Validation

Rule	Description	Warning/Error
ARO > 0.8	Very high ARO may indicate chronic issue rather than discrete risk	Warning: “ARO > 0.8 for incident type; confirm the frequency estimate and consider budgeting it as an operational cost if extremely high.”
SLE > $1,000,000	Exceptionally high single loss expectancy	Warning: “SLE > $1M for incident type; verify cost estimate”
Zero incidents	Scenario has no incidents (EAL = 0)	Warning: “No incidents defined; ROI may be negative or undefined”
Time horizon < 3 years	Short analysis period for capital investments	Warning: “Time horizon < 3 years may undervalue long-term investments”

Note: Warnings do not prevent calculation but alert users to review assumptions.
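The warnings can be collected in a separate pass that never blocks calculation. This sketch assumes the incident type name replaces the "incident type" placeholder in each message; thresholds are copied from the table, and `businessLogicWarnings` is a hypothetical name.

```typescript
interface IncidentInput { incidentType: string; aro: number; sle: number; }

// Non-blocking checks from Section 5.3; returns messages for the user to review.
function businessLogicWarnings(incidents: IncidentInput[], timeHorizonYears: number): string[] {
  const warnings: string[] = [];
  if (incidents.length === 0) {
    warnings.push("No incidents defined; ROI may be negative or undefined");
  }
  for (const i of incidents) {
    if (i.aro > 0.8) {
      warnings.push(
        `ARO > 0.8 for ${i.incidentType}; confirm the frequency estimate and consider budgeting it as an operational cost if extremely high.`,
      );
    }
    if (i.sle > 1000000) {
      warnings.push(`SLE > $1M for ${i.incidentType}; verify cost estimate`);
    }
  }
  if (timeHorizonYears < 3) {
    warnings.push("Time horizon < 3 years may undervalue long-term investments");
  }
  return warnings;
}
```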

6. CSV/XLSX Import Templates

6.1 Incidents Template

File: public/templates/incidents-template.csv

Format:

incidentType,aro,sle,notes,source
Data Breach (Phishing),0.3,45000,Loss of beneficiary data,Historical incidents
Vehicle Theft,0.15,35000,Loss of field vehicle,Regional security data
Office Burglary,0.25,12000,Theft of equipment,Historical incidents

Field Order:

incidentType (string, required)
aro (decimal, required)
sle (number, required)
notes (string, optional - can be empty)
source (string, optional - can be empty)

CSV Rules:

Header row is required
Decimal separator: period (.) not comma
No thousands separators in numbers (use 45000 not 45,000)
Text fields with commas must be quoted (e.g., "Loss of vehicle, equipment")
Empty optional fields: leave blank (e.g., Vehicle Theft,0.15,35000,, leaves both notes and source empty)
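The quoting and number-format rules above can be applied with a small line splitter and a strict numeric parser. This is a minimal sketch, not the calculator's parser: it assumes one record per line (no line breaks inside quoted fields) and treats a doubled quote inside a quoted field as a literal quote, following common CSV convention. `splitCsvLine` and `parseStrictNumber` are hypothetical names. Negative numbers are accepted at this stage so that the field-level rules in Section 5.1 can report the more specific error (e.g., "ARO must be at least 0").

```typescript
// Splits one CSV line into fields, honouring double-quoted fields that contain commas.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field (empty if the line ends with a comma)
  return fields;
}

// Accepts plain decimals with a period separator only; "45,000" and blanks return undefined.
function parseStrictNumber(raw: string): number | undefined {
  const trimmed = raw.trim();
  return /^-?\d+(\.\d+)?$/.test(trimmed) ? Number(trimmed) : undefined;
}
```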

6.2 Costs Template

File: public/templates/costs-template.csv

Format:

category,amount,period,capexOpex
Security Training (Staff),8000,1,opex
Physical Security Upgrades,25000,1,capex
Cybersecurity Tools,12000,1,capex
Annual Security Personnel,30000,1,opex

Field Order:

category (string, required)
amount (number, required)
period (integer, required)
capexOpex (enum [capex, opex], optional)

CSV Rules:

Same formatting rules as incidents template
period must be a positive integer (1, 2, 3, etc.)
capexOpex must be lowercase: capex or opex
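A sketch of converting one costs-template row into typed values, reusing the Section 5.1 error messages. It assumes cells arrive keyed by header name; `parseCostRow` and `isCapexOpex` are hypothetical names, and rejecting "1.0" as a period is a stricter choice than the rules strictly require. Returning a message instead of throwing lets an importer collect every row's error in one pass.

```typescript
type CapexOpex = "capex" | "opex";

interface ParsedCost {
  category: string;
  amount: number;
  period: number;
  capexOpex?: CapexOpex;
}

// Case-sensitive: "CAPEX" or "Opex" are rejected, as the template requires lowercase.
function isCapexOpex(value: string): value is CapexOpex {
  return value === "capex" || value === "opex";
}

// Returns the parsed record, or the first error message found.
function parseCostRow(cells: Record<string, string>): ParsedCost | string {
  const category = (cells["category"] ?? "").trim();
  if (category.length === 0) return "Category is required";

  const amountRaw = (cells["amount"] ?? "").trim();
  const amount = Number(amountRaw);
  // Number("") is 0, so blanks are checked explicitly; Number("45,000") is NaN.
  if (amountRaw === "" || !Number.isFinite(amount) || amount < 0) return "Amount must be non-negative";

  const periodRaw = (cells["period"] ?? "").trim();
  // Digits only: "1.5", "1.0" and "-1" are rejected.
  if (!/^\d+$/.test(periodRaw) || Number(periodRaw) < 1) return "Period must be at least 1";
  const period = Number(periodRaw);

  const capexOpexRaw = (cells["capexOpex"] ?? "").trim();
  if (capexOpexRaw === "") return { category, amount, period };
  if (!isCapexOpex(capexOpexRaw)) return "Must be 'capex' or 'opex'";
  return { category, amount, period, capexOpex: capexOpexRaw };
}
```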

6.3 Template-Schema Alignment

Validation:

CSV column headers must match schema field names exactly (case-sensitive)
Column order can vary (parser uses header names, not position)
Extra columns are ignored (allows for organization-specific metadata)
Missing required columns trigger validation error
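Header matching by name can be sketched as below, using the incidents template as the example. The column lists come from Section 6.1; `checkHeader` and `rowToRecord` are hypothetical helper names. Because each value is looked up by its header's position, column order in the file does not matter, and unknown columns are reported but dropped.

```typescript
// Column lists for the incidents template (Section 6.1).
const INCIDENT_REQUIRED: string[] = ["incidentType", "aro", "sle"];
const INCIDENT_KNOWN: string[] = [...INCIDENT_REQUIRED, "notes", "source"];

interface HeaderCheck {
  missing: string[]; // required columns absent from the header: import must stop
  ignored: string[]; // extra columns, e.g. organisation-specific metadata
}

// Exact, case-sensitive comparison: "IncidentType" does not satisfy "incidentType".
function checkHeader(header: string[], required: string[], known: string[]): HeaderCheck {
  const present = new Set(header);
  return {
    missing: required.filter((name) => !present.has(name)),
    ignored: header.filter((name) => !known.includes(name)),
  };
}

// Builds a record keyed by column name, keeping only known columns.
function rowToRecord(header: string[], cells: string[], known: string[]): Record<string, string> {
  const record: Record<string, string> = {};
  header.forEach((name, index) => {
    if (known.includes(name)) record[name] = cells[index] ?? "";
  });
  return record;
}
```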

Import Process:

Upload CSV/XLSX file
Parser validates header row (required columns present)
Parser validates each data row against schema (Section 5.1)
Parser displays validation errors with row numbers
User corrects errors and re-uploads
Valid data imported into calculator

7. Data Provenance Requirements

7.1 Minimum Data Quality Standards

Incident Data:

Recommended Historical Window: ≥12 months (preferably 24-36 months)
Data Source: Historical incident reports, security logs, insurance claims
ARO Calculation: Number of incidents / Number of years observed
SLE Calculation: Total cost of incidents / Number of incidents (average)
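A worked example of the two formulas above for a single incident type, assuming the cost of each observed incident has already been totalled. The EAL value uses the conventional EAL = ARO × SLE product and is included only for orientation; the authoritative calculation is described in the Methods Note. `estimateAroSle` is a hypothetical helper.

```typescript
interface AroSle { aro: number; sle: number; eal: number; }

// costs: total cost of each observed incident of one type; yearsObserved: length of the window.
function estimateAroSle(costs: number[], yearsObserved: number): AroSle {
  if (!(yearsObserved > 0)) throw new Error("yearsObserved must be positive");
  const count = costs.length;
  const totalCost = costs.reduce((sum, c) => sum + c, 0);
  const aro = count / yearsObserved;               // incidents per year
  const sle = count === 0 ? 0 : totalCost / count; // average cost per incident
  return { aro, sle, eal: aro * sle };             // assumed EAL = ARO * SLE
}
```

Two incidents costing $30,000 and $40,000 over a 4-year window give ARO = 2 / 4 = 0.5 and SLE = $35,000, so `estimateAroSle([30000, 40000], 4)` returns `{ aro: 0.5, sle: 35000, eal: 17500 }`.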

Cost Data:

Recommended Coverage: Initial investment + at least 2 years of ongoing costs
Data Source: Budget documents, procurement records, HR salary data
Precision: Whole dollars acceptable (no cents required)

Qualitative Scores:

Data Source: Organizational consensus (security committee, staff surveys, expert judgment)
Documentation: Document rationale for each score in pilot notes or scenario descriptions

7.2 Source Documentation

Recommended Practice:

Populate source field for each Incident (e.g., “Historical incident reports 2020-2022”)
Document cost data sources in scenario notes (e.g., “2023 Budget, Line Item 4.2.1”)
Maintain audit trail linking calculator inputs to source documents

Audit Trail:

Calculator exports include all input data with source fields
Version control tracks changes to scenarios over time
Pilot Pack provides templates for data source documentation

7.3 Data Update Frequency

Recommended Review Cycle:

Quarterly: Update incident data if new incidents occur
Annually: Refresh ARO/SLE estimates based on rolling 3-year historical window
Ad-hoc: Update immediately if major context shifts (e.g., conflict escalation, pandemic)

8. Version Control and Updates

8.1 Schema Versioning

Current Schema Version: 1.0 (2025-10-10)

Versioning Policy:

Major version (X.0): Breaking changes (field removed, validation rule tightened)
Minor version (X.Y): Additive changes (new optional field, new taxonomy value)
Patch version (X.Y.Z): Clarifications, typo fixes, no schema impact

8.2 Backward Compatibility

Policy:

Minor and patch updates maintain backward compatibility with previous versions
Major updates provide migration scripts and a grace period for data updates
Calculator validates schema version on import and alerts users to version mismatches
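A sketch of the version check described above, assuming versions are written as `X.Y` or `X.Y.Z`. The `file-newer` outcome (a file produced under a later minor or patch release than the running calculator) is an assumption about how such a mismatch might be surfaced; the policy above does not specify it. All names are hypothetical.

```typescript
interface SchemaVersion { major: number; minor: number; patch: number; }

// Accepts "X.Y" or "X.Y.Z"; a missing patch component is read as 0.
function parseSchemaVersion(raw: string): SchemaVersion {
  const match = /^(\d+)\.(\d+)(?:\.(\d+))?$/.exec(raw.trim());
  if (match === null) throw new Error(`Unrecognised schema version: ${raw}`);
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3] ?? "0") };
}

type Compatibility = "match" | "compatible" | "file-newer" | "migration-required";

function checkSchemaCompatibility(fileVersion: string, calculatorVersion: string): Compatibility {
  const file = parseSchemaVersion(fileVersion);
  const calc = parseSchemaVersion(calculatorVersion);
  // Different major version: breaking change, so the data needs migration.
  if (file.major !== calc.major) return "migration-required";
  if (file.minor === calc.minor && file.patch === calc.patch) return "match";
  // Same major: older minor/patch data stays backward compatible.
  const fileIsNewer = file.minor > calc.minor || (file.minor === calc.minor && file.patch > calc.patch);
  return fileIsNewer ? "file-newer" : "compatible";
}
```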

8.3 Taxonomy Evolution

Adding New Values:

New incident types, cost categories, or qualitative criteria can be added without schema changes
Organizations may use custom values not in standard taxonomy (flexibility by design)

Deprecating Values:

Deprecated taxonomy values remain valid for existing data but discouraged for new scenarios
Deprecation notices appear in validation warnings (e.g., “This category is deprecated; use alternative category”)
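Deprecation can be handled as a lookup that produces a warning rather than an error, so existing data still validates. The `deprecationWarning` helper is hypothetical, and the calculator's actual deprecated values are not listed in this codebook; the map below uses placeholder names only.

```typescript
// Hypothetical helper: maps a deprecated value to its recommended replacement.
// Returns a warning string (never an error) so existing scenarios keep validating.
function deprecationWarning(value: string, deprecated: ReadonlyMap<string, string>): string | undefined {
  const replacement = deprecated.get(value);
  if (replacement === undefined) return undefined;
  return `This category is deprecated; use ${replacement}`;
}

// Placeholder names only; not real taxonomy entries.
const exampleDeprecated: ReadonlyMap<string, string> = new Map([["Old Category", "New Category"]]);
```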

9. Error Handling and Recovery

9.1 Common Import Errors

Error	Cause	Solution
“ARO must be at least 0”	ARO value < 0	Verify ARO is entered as a non-negative decimal (e.g., 0.30 not -0.30). If the incident happens multiple times per year, use a value greater than 1 (e.g., 12 for monthly).
“Period must be at least 1”	Period value is 0 or negative	Change period to 1, 2, 3, etc. (positive integers)
“Weights must sum to 1”	Qualitative weights don’t sum to 1.0	Adjust weights: e.g., AE=0.30, OC=0.25, CA=0.25, SW=0.20
“At least one incident is required”	Empty incidents array	Add at least one incident to the scenario
“Category is required”	Cost category field is blank	Provide a category name for each cost item

9.2 Data Validation Workflow

Step-by-Step Recovery:

Import Attempt: User uploads CSV/XLSX file
Validation: Calculator checks all validation rules (Section 5)
Error Report: Display errors with row/column numbers and error messages
User Correction: User edits CSV/XLSX file to fix errors
Re-import: User uploads corrected file
Success: Data imported and ready for calculation

Error Report Example:

Validation Errors Found (3 errors):

Row 2, Field 'aro': ARO must be at least 0 (found: -0.2)
Row 4, Field 'category': Category is required (empty field)
Row 7, Field 'period': Cost period 5 exceeds time horizon 3 years

Please correct these errors and re-upload the file.
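The report format shown in the example can be produced from a list of collected errors. This sketch assumes each error records its row, field, message and an optional detail string; `ImportError` and `formatErrorReport` are hypothetical names.

```typescript
interface ImportError {
  row: number;
  field: string;
  message: string;
  detail?: string; // e.g. "found: -0.2" or "empty field"
}

// Builds the plain-text error report shown to the user after a failed import.
function formatErrorReport(errors: ImportError[]): string {
  if (errors.length === 0) return "No validation errors found.";
  const noun = errors.length === 1 ? "error" : "errors";
  const lines = errors.map((e) => {
    const suffix = e.detail === undefined ? "" : ` (${e.detail})`;
    return `Row ${e.row}, Field '${e.field}': ${e.message}${suffix}`;
  });
  return [
    `Validation Errors Found (${errors.length} ${noun}):`,
    "",
    ...lines,
    "",
    "Please correct these errors and re-upload the file.",
  ].join("\n");
}
```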

9.3 Troubleshooting Guide

Issue: “My CSV file won’t import”

Check: File encoding (UTF-8 required)
Check: Delimiter (comma , required, not semicolon ;)
Check: Header row matches schema field names (case-sensitive)

Issue: “Calculator shows ‘No payback within time horizon’”

Explanation: Annual benefits (avoided losses, i.e., the reduction in EAL) don't recover costs within the analysis period
Solution: Extend the time horizon, or recognize that qualitative benefits may justify the investment

Issue: “ROI is negative”

Explanation: Total costs exceed total benefits under current assumptions
Solution: Review incident assumptions, time horizon, social value proxies, and qualitative scores; consider non-financial justifications

10. References

10.1 Related Documentation

Methods Note
specs/001-develop-a-user/methods-note.md
[Comprehensive methodological framework for all calculation formulas]

Pilot Pack & Data Readiness Guide
specs/001-develop-a-user/pilot-pack.md
[Step-by-step guidance for NGO data preparation and pilot facilitation]

Validation Report
specs/002-close-rfq-driven/validation/validation-report.md
[Verification of formula-implementation alignment and edge case testing]

10.2 Implementation References

Zod Validation Schemas
src/shared/validation/schemas.ts
[TypeScript/Zod implementation of all validation rules]

Calculation Service
src/roi-calculator/services/calculation-service.ts
[Calculation implementation using validated data]

CSV Templates
public/templates/incidents-template.csv
public/templates/costs-template.csv
[Import templates with example data]

10.3 Standards

ISO 31000:2018 - Risk Management Guidelines
[Risk assessment and quantification principles]

Risk Quantification References (EAL)
FAIR Institute (2020). Factor Analysis of Information Risk (FAIR): Introduction to Risk Quantification (EAL formula reference; full framework not implemented).
Available: https://www.fairinstitute.org/

Document Control

Version: 1.1
Status: Draft
Date: 2025-10-17
Next Review: 2026-01-10 (quarterly review cycle)

Change Log:

Date	Version	Changes	Author
2025-10-17	1.1	Documented optional organisation context fields, removed retired qualitative aggregation metadata, and aligned qualitative schema with required dimensions.	Project Team
2025-10-10	1.0	Initial release - comprehensive data schema and codebook	Shayan Seyedi

Approval Sign-Off:

✅ Technical Lead - Schema-implementation alignment verified
⏳ Product Owner - RFQ Stage 2 requirements fulfilled (pending review)
⏳ GISF Stakeholder - Data preparation guidance confirmed (pending review)

End of Data Schema & Codebook

For calculation methodologies, see the Methods Note. For data preparation procedures, see the Pilot Pack & Data Readiness Guide.
