Ingestion re-implement on updated Elastic.Ingest.Elasticsearch #2755

Open
Mpdreamz wants to merge 16 commits into main from feature/ingest-rearch

Mpdreamz commented Feb 22, 2026

Summary

Migrate Elasticsearch indexing to source-generated mappings via Elastic.Mapping and the IncrementalSyncOrchestrator from Elastic.Ingest.Elasticsearch, replacing ~420 lines of hand-rolled ingest channel code. Introduces a clear separation between build type ({type}) and environment ({env}) in all index naming.

Key changes

  • Source-generated mapping context: DocumentationMappingConfig.cs declares index structure, field mappings, and analysis settings using [Entity<T>] attributes. The source generator produces a typed CreateContext(type:, env:) factory, eliminating manual index name construction.
  • IncrementalSyncOrchestrator replaces the two manually managed ElasticsearchLexicalIngestChannel / ElasticsearchSemanticIngestChannel classes. Dual-index writes, alias rotation, and hash-based change detection are now handled by the library.
  • Centralized endpoint resolution: ElasticsearchEndpointFactory resolves Elasticsearch URL, credentials, and environment from user secrets and env vars in one place, shared by all services.
  • Correct template parameter semantics — callers now pass the build type (assembler, isolated, codex) as {type}, while {env} auto-resolves from the ENVIRONMENT env var. Previously {type} was incorrectly receiving the environment name.
  • Jina v5 dense embeddings added alongside existing ELSER sparse embeddings on the semantic index variant.
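The alias rotation mentioned above can be pictured as one atomic request to Elasticsearch's `_aliases` API. The sketch below only builds that request body. The helper name `alias_rotation_body` and the assumption that a rotation is a single remove-plus-add pair are illustrative; this is not a description of how Elastic.Ingest.Elasticsearch implements it.

```python
import json
from typing import Optional


def alias_rotation_body(alias: str, old_index: Optional[str], new_index: str) -> dict:
    """Build an `_aliases` request body that points `alias` at `new_index`.

    Hypothetical helper for illustration. Both actions travel in one request,
    so readers never observe the alias pointing at zero or two indices.
    """
    actions = []
    if old_index is not None:
        # Detach the alias from the previous backing index, if there was one.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly written backing index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_rotation_body(
        alias="docs-assembler.lexical-edge-latest",
        old_index=None,
        new_index="docs-assembler.lexical-edge-2025.10.23.120521",
    )
    print(json.dumps(body, indent=2))
```

On a first run there is no previous index, so passing `old_index=None` yields a body with only the `add` action.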

Elasticsearch resource naming changes

All resources now follow a structured docs-{type}.{variant}-{env} convention:

| Resource | Old name (e.g. env=edge) | New name (e.g. type=assembler, env=edge) |
|---|---|---|
| Lexical backing index | lexical-docs-edge-2025.10.23.120521 | docs-assembler.lexical-edge-2025.10.23.120521 |
| Lexical write alias | lexical-docs-edge-latest | docs-assembler.lexical-edge-latest |
| Lexical read alias | lexical-docs-edge | docs-assembler.lexical-edge-latest |
| Lexical index template | lexical-docs-edge-template | docs-assembler.lexical-edge-template |
| Semantic backing index | semantic-docs-edge-2025.10.23.120521 | docs-assembler.semantic-edge-2025.10.23.120521 |
| Semantic write alias | semantic-docs-edge-latest | docs-assembler.semantic-edge-latest |
| Semantic read alias | semantic-docs-edge | docs-assembler.semantic-edge-latest |
| Semantic index template | semantic-docs-edge-template | docs-assembler.semantic-edge-template |
| Synonym set | docs-edge | docs-assembler |
| Query ruleset | docs-ruleset-edge | docs-ruleset-assembler |

The old {variant}-docs-{env} pattern placed the variant (lexical/semantic) first and embedded only the environment. The new docs-{type}.{variant}-{env} pattern groups all docs indices under a common docs-* prefix, encodes the build type, and uses a dot-separated structure for ILM/SLM grouping.
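As a rough illustration of the new convention, the sketch below composes index names from a build type, variant, and environment. The function name `resource_names` is hypothetical, and the timestamp format (`yyyy.MM.dd.HHmmss`) is inferred from the example values in the table rather than taken from the library. In the PR itself these names come from the source-generated CreateContext(type:, env:) factory.

```python
from datetime import datetime


def resource_names(build_type: str, variant: str, env: str, batch: datetime) -> dict:
    """Compose names following docs-{type}.{variant}-{env} (illustrative only)."""
    base = f"docs-{build_type}.{variant}-{env}"
    # Assumed timestamp layout, inferred from "2025.10.23.120521" in the table.
    stamp = batch.strftime("%Y.%m.%d.%H%M%S")
    return {
        "backing_index": f"{base}-{stamp}",
        "write_alias": f"{base}-latest",
        "index_template": f"{base}-template",
    }


if __name__ == "__main__":
    names = resource_names("assembler", "semantic", "edge", datetime(2025, 10, 23, 12, 5, 21))
    for kind, name in names.items():
        print(f"{kind}: {name}")
```

Because every name starts with `docs-{type}.`, a single wildcard such as `docs-assembler.*` matches both variants of one build type.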

Library version bumps

Elastic.Ingest.Elasticsearch 0.17.1 → 0.28.0 and Elastic.Mapping (new) 0.28.0:

  • IncrementalSyncOrchestrator<T> — manages dual-index (primary + secondary) writes with coordinated alias rotation, replacing two bespoke channel wrappers and manual multiplex/reindex strategy logic.
  • Source-generated [ElasticsearchMappingContext] — generates typed mapping builders, field configurators, and CreateContext() from [Entity<T>] attributes, eliminating runtime reflection and hand-built JSON mapping strings.
  • HashedBulkUpdate — content-hash-based deduplication built into the channel, replacing manual hash computation for change detection.
  • BootstrapMethod / PreBootstrapTask — declarative bootstrap lifecycle hooks replace imperative init sequences for synonyms, query rules, and enrichment policies.
  • ConfigureAnalysis / IndexSettings on ElasticsearchTypeContext — analysis and settings are composed at context creation rather than injected via callback overrides.
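To make the idea of content-hash-based change detection concrete, here is a minimal sketch. It is not the HashedBulkUpdate implementation: the choice of SHA-256, the canonical JSON serialization, and the names `content_hash` and `plan_updates` are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List


def content_hash(doc: dict) -> str:
    """Hash a document's content in a key-order-independent way (assumed scheme)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_updates(incoming: Dict[str, dict], stored_hashes: Dict[str, str]) -> List[str]:
    """Return ids of documents that are new or whose content hash changed."""
    return [
        doc_id
        for doc_id, doc in incoming.items()
        if stored_hashes.get(doc_id) != content_hash(doc)
    ]


if __name__ == "__main__":
    stored = {"/intro": content_hash({"title": "Intro", "body": "hello"})}
    incoming = {
        "/intro": {"body": "hello", "title": "Intro"},  # same content, different key order
        "/setup": {"title": "Setup", "body": "install"},  # not indexed yet
    }
    print(plan_updates(incoming, stored))
```

Only documents returned by `plan_updates` would need a full write; unchanged ones can be skipped or merely re-stamped with the current batch date.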

Net effect: deleted ElasticsearchIngestChannel.cs (161 lines) and ElasticsearchIngestChannel.Mapping.cs (260 lines), and simplified the exporter constructor and lifecycle significantly.

Test plan

  • dotnet build passes (verified)
  • Integration tests pass against local Elasticsearch (Aspire)
  • Verify index names resolve correctly: assembler in "edge" env → docs-assembler.semantic-edge-latest
  • Verify synonym set created as docs-assembler (not docs-edge)
  • Verify search API resolves to docs-assembler.semantic-* read alias
  • Verify old DOCUMENTATION_ELASTIC_INDEX env var (e.g. lexical-docs-edge-2025.10.23.120521) correctly parses environment as edge
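The last check can be approximated with a small parser for the old {variant}-docs-{env}-{timestamp} format. This is a sketch under assumptions: the regular expression and the function name `parse_legacy_environment` are hypothetical, and the timestamp shape is inferred from the example value above.

```python
import re
from typing import Optional

# Old format: {variant}-docs-{env}-{timestamp}, e.g. lexical-docs-edge-2025.10.23.120521
_LEGACY_INDEX = re.compile(
    r"^(?:lexical|semantic)-docs-(?P<env>.+?)-\d{4}\.\d{2}\.\d{2}\.\d{6}$"
)


def parse_legacy_environment(index_name: str) -> Optional[str]:
    """Extract the environment from an old-style index name, or None if it does not match."""
    match = _LEGACY_INDEX.match(index_name)
    return match.group("env") if match else None


if __name__ == "__main__":
    print(parse_legacy_environment("lexical-docs-edge-2025.10.23.120521"))
    print(parse_legacy_environment("docs-assembler.lexical-edge-latest"))
```

New-style names deliberately do not match, so a caller can fall through to other sources of the environment name.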

…mappings

Replace manual channel orchestration with IncrementalSyncOrchestrator<T> and
source-generated ElasticsearchTypeContext from Elastic.Mapping 0.4.0. Add field
type attributes ([Keyword], [Text], [Object], etc.) directly on DocumentationDocument
to drive the mapping source generator, replacing verbose manual JSON mappings.

- Update Elastic.Ingest.Elasticsearch 0.17.1 → 0.19.0, add Elastic.Mapping 0.4.0
- Add mapping attributes to DocumentationDocument and IndexedProduct
- Create DocumentationMappingConfig.cs with two Entity variants (lexical/semantic)
- Rewrite ElasticsearchMarkdownExporter to use orchestrator for dual-index mode
- Delete ElasticsearchIngestChannel.cs and ElasticsearchIngestChannel.Mapping.cs
- Remove unused ReindexAsync from ElasticsearchOperations
- Update SearchBootstrapFixture to use IngestChannel with semantic type context

Replaces `ElasticsearchOptions` with `DocumentationEndpoints` as the single source of truth for
Elasticsearch configuration across all API apps, MCP server, and integration tests.

- Adds `IndexName` property to `ElasticsearchEndpoint` with a field-backed getter defaulting to
  `{IndexNamePrefix}-dev-latest`.
- Creates `ElasticsearchEndpointFactory` in `ServiceDefaults` to centralize user-secrets and
  environment variable reading, eliminating the duplicated `72f50f33` secrets ID pattern.
- Registers `DocumentationEndpoints` as a singleton in `AddDocumentationServiceDefaults`.
- Updates `ElasticsearchClientAccessor` to accept `DocumentationEndpoints` instead of
  `ElasticsearchOptions`, supporting both API key and basic authentication.
- Updates all gateway consumers (`NavigationSearchGateway`, `FullSearchGateway`,
  `DocumentGateway`, `ElasticsearchAskAiMessageFeedbackGateway`) to use endpoint properties.
- Simplifies all three integration test files (`SearchRelevanceTests`,
  `McpToolsIntegrationTestsBase`, `SearchBootstrapFixture`) to use `ElasticsearchEndpointFactory`
  and `ElasticsearchTransportFactory`, removing manual config construction.
- Deletes `ElasticsearchOptions.cs` and removes `Microsoft.Extensions.Configuration.UserSecrets`
  from the Search project.

Move mapping context (DocumentationMappingContext, LexicalConfig, SemanticConfig,
DocumentationAnalysisFactory) from Elastic.Markdown to Elastic.Documentation so
both indexing and search derive index names from the same source. Add ContentHash
helper to avoid Elastic.Ingest.Elasticsearch dependency in Elastic.Documentation.

Remove IndexName from ElasticsearchEndpoint, add Namespace to DocumentationEndpoints.
ElasticsearchEndpointFactory resolves namespace from DOCUMENTATION_ELASTIC_INDEX env
var (backward compat), DOTNET_ENVIRONMENT, ENVIRONMENT, or falls back to "dev".
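The resolution order described above might look roughly like the following. The function name `resolve_namespace` is hypothetical, how a namespace is derived from the DOCUMENTATION_ELASTIC_INDEX value is left to a caller-supplied function, and falling through to the next source when that derivation fails is an assumption.

```python
from typing import Callable, Mapping, Optional


def resolve_namespace(
    env_vars: Mapping[str, str],
    namespace_from_index: Callable[[str], Optional[str]],
) -> str:
    """Resolve the namespace using the precedence listed in the commit message."""
    # 1. Backward compatibility: derive it from the old index-name variable.
    legacy = env_vars.get("DOCUMENTATION_ELASTIC_INDEX")
    if legacy:
        parsed = namespace_from_index(legacy)
        if parsed:
            return parsed  # assumption: an unparseable value falls through
    # 2. and 3. Environment variables, in order of precedence.
    for key in ("DOTNET_ENVIRONMENT", "ENVIRONMENT"):
        value = env_vars.get(key)
        if value:
            return value
    # 4. Final fallback.
    return "dev"


if __name__ == "__main__":
    import os

    # Stand-in derivation for the demo only: never recognizes the legacy value.
    print(resolve_namespace(os.environ, lambda _name: None))
```

Taking the variables as a mapping rather than reading `os.environ` directly keeps the precedence easy to unit test.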

ElasticsearchClientAccessor derives SearchIndex and RulesetName from namespace
instead of parsing the old IndexName string. Remove ExtractRulesetName and all
hardcoded "semantic-docs-dev-latest" assignments from tests and config files.

Enable IndexPatternUseBatchDate now that Elastic.Mapping supports it,
and pass batchTimestamp to IngestChannelOptions in the lexical-only path
so the channel uses the exporter's timestamp for index name computation.

…meter

Simplify DocumentationTooling endpoint resolution by delegating to
ElasticsearchEndpointFactory. Add missing skipOpenApi parameter to
IsolatedIndexService.Index call.

The lexical-only code path manually reimplemented drain, delete-stale,
refresh, and alias logic that the orchestrator handles automatically.
Remove the flag end-to-end: CLI parameters, configuration, exporter
branching, and CLI documentation.
Mpdreamz self-assigned this Feb 22, 2026
Mpdreamz requested review from a team and reakaleek February 22, 2026 17:41
Mpdreamz changed the title from "feature/ingest rearch" to "Ingestion re-implement on updated Elastic.Ingest.Elasticsearch" Feb 22, 2026

Add .jina-embeddings-v5-text-small inference on 6 fields (title, abstract,
ai_rag_optimized_summary, ai_questions, ai_use_cases, stripped_body) to
enable hybrid sparse+dense retrieval. Rename InferenceId to ElserInferenceId
for clarity.

Use source-generated IStaticMappingResolver delegates for auto-stamping
BatchIndexDate and LastUpdated instead of manual assignment. Replace
DocumentationAnalysisFactory.CreateContext with direct context
customization via WithIndexName() and record-with expressions. Pass
IndexSettings for default_pipeline conditionally at runtime.

…nment

Rename indexNamespace to buildType throughout the exporter pipeline so
callers pass the build type (assembler, isolated, codex) instead of the
environment name. Search services now hardcode "assembler" as the type
since they always target assembler indices.

ResolveNamespace renamed to ResolveEnvironment and updated to parse the
old production index format ({variant}-docs-{env}-{timestamp}) to
extract the environment name.

… to simplify index naming logic. Update Elasticsearch dependencies to version 0.28.0.