
Stages & jobs

Build
- Build oclapi2
Test
- Check formatting
- Run tests
Deploy for testing
Requires a user to start manually
- Deploy for testing
Release
Requires a user to start manually
- Release

Build result summary

Details

Completed: 24 Jul 2025, 3:12:44 PM – 5 months ago
Queue duration: < 1 second
Duration: 128 minutes
Labels: None

Revisions

OCL API 2: a55e3ce3b031f90c5fe14b07d43e3f6b6cff6ba9
OCL CI: 6544e7ed40ead274ba8b0995efe09bb9edfd8319

Fixed in

#182 (Manual run from the stage: Deploy for testing by Sunny Aggarwal.)

No failed test found. A possible compilation error occurred.

Responsible

Michaël Bontyes Automatically assigned

Code commits

OCL API 2
Author		Commit	Message	Commit date
	Michaël Bontyes	a55e3ce3b031f90c5fe14b07d43e3f6b6cff6ba9	Merge branch 'dev' of https://github.com/OpenConceptLab/oclapi2 into dev	24 Jul 2025
	Michaël Bontyes	135e4d3b007e650c2ca60fd5dfae29a1a181fdb3 m	ES score normalization	16 Jul 2025

## PR: Normalize and Standardize Elasticsearch Search Scores

### Problem / What Was Missing

- Inconsistent Scoring: Elasticsearch’s raw `_score` is not normalized and varies widely between queries, indices, and even similar queries. This caused:
  - Difficulty comparing scores across different queries or result sets.
  - Unstable thresholds for “high confidence” or “best match.”
  - Confusing or misleading confidence displays for users.
- Downstream Usage Issues:
  - Confidence buckets and thresholds were based on the raw `_score`, which is not absolute.
  - The API and UI only exposed the raw score, not a normalized or percentage-based value.

---

### What Was Implemented

#### 1. Score Normalization

- Min-Max Normalization: For each search result set, the code now computes the minimum and maximum `_score` values. Each result’s score is normalized to a 0–1 range:

```
normalized_score = (raw_score - min_score) / (max_score - min_score)
```

- Handles the edge case where all scores are the same, in which `max_score - min_score` is zero and the formula above would divide by zero.

#### 2. API/Serializer Enhancements

- Expose Both Scores: The API now returns both the raw `_score` and the normalized score (`search_score` and `search_normalized_score`).
- Confidence Calculation: The `search_confidence` field is now based on the normalized score, providing a consistent percentage (e.g., “87.5%”) regardless of the raw score range.

#### 3. Downstream Logic Updates

- Thresholds and Buckets: All logic for “high confidence,” “very high match,” and bucketing now uses the normalized score, so thresholds are stable (e.g., 0.8 always means “within the top 20% of the result set’s score range”).
- Legacy Fallback: If a normalized score is not available, the code falls back to the old raw score logic.

#### 4. Documentation in Code

- Comments and Structure: The code is now clear about which score is being used and why, making it easier for future maintainers to understand the normalization process.

---

### Why This Makes Scores More Consistent

- Stable Range: All scores are now in a 0–1 range, so thresholds and confidence levels are meaningful and comparable across queries.
- User-Friendly Confidence: Users and downstream consumers can interpret confidence as a percentage, not an arbitrary number.
- Easier Tuning: Product and engineering teams can set thresholds (e.g., “show only results with confidence > 70%”) without worrying about the quirks of Elasticsearch’s raw scoring.
- Future-Proof: If the underlying Elasticsearch configuration changes, the normalization ensures the API and UI remain stable.

---

### Summary Table

| Field | Before (raw) | After (normalized) |
|---------------------------|--------------|--------------------|
| `search_score` | Raw float | Raw float |
| `search_normalized_score` | N/A | 0–1 float |
| `search_confidence` | % of max raw | % of normalized |
| Thresholds/Buckets | Raw-based | Normalized-based |

---

In summary: This PR makes search scoring more robust, interpretable, and consistent for all users and downstream systems.
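
The min-max normalization described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function names `normalize_scores` and `format_confidence` are hypothetical, and returning 1.0 for every hit when all raw scores are equal is an assumption, since the commit message says the edge case is handled but not how.

```python
from typing import List


def normalize_scores(raw_scores: List[float]) -> List[float]:
    """Min-max normalize raw `_score` values from one result set to the 0-1 range."""
    if not raw_scores:
        return []
    min_score = min(raw_scores)
    max_score = max(raw_scores)
    score_range = max_score - min_score
    if score_range == 0:
        # Assumption: with identical scores the formula would divide by zero,
        # so every hit is treated as a top match here.
        return [1.0 for _ in raw_scores]
    return [(score - min_score) / score_range for score in raw_scores]


def format_confidence(normalized_score: float) -> str:
    """Render a normalized score as a percentage string with one decimal place."""
    return f"{normalized_score * 100:.1f}%"


# Example: three hits from a single (hypothetical) search result set.
normalized = normalize_scores([2.0, 9.0, 16.0])
labels = [format_confidence(score) for score in normalized]
```

Because the minimum and maximum are taken per result set, a normalized score only ranks a hit relative to the other hits of the same query; the lowest-scoring hit always maps to 0 and the highest to 1.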

Jira issues

Issue		Description	Status