Flinkflow Project Backlog (TODO)
This file is the Single Source of Truth for the Flinkflow project roadmap and task list. It tracks concrete implementations organized by Milestone.
✅ Foundation (Achieved)
Core capabilities already implemented and validated.
⚙️ Core Engine
- YAML Multi-Runtime: Parsing, Flowlet resolution, and Flink DAG generation.
- Polyglot Runtime: In-memory compilation (Janino/Java) and execution (GraalVM/Python/Camel) of dynamic snippets.
- Processor Matrix: Support for
process,filter,flatMap,keyBy,reduce,window,sideOutput, andjoin. - Flowlet System: Parameterized reusable components with K8s CRD discovery.
- Basic Connectors: Kafka (Apache/Confluent), File/S3, DataGen, JDBC, HTTP Sinks.
- Deployment Options: Local script, Docker, and K8s (Manual, Operator, Native, and Pipeline CR).
- Flink 2.2 Migration: Full support for Flink 2.2.0 and the new
OpenContextlifecycle. - Flink State V2: Asynchronous, non-blocking managed state for AI Agents.
🐫 Low-Code & Polyglot
- Python Snippets: Support for inline Python syntax across core steps (GraalVM).
- Camel Expressions: Integration of Camel Simple and JsonPath for declarative logic.
- Camel YAML DSL: Support for complex route fragments and EIPs (Choice, Splitter, etc.) within processors.
- Groovy Integration: High-performance JVM scripting for complex object math.
- XSLT 3.0 DataMapper: Saxon-HE integration for structural transformations.
🛡️ Validation & Observability
- YAML Schema Enforcement: OpenAPI v3 validation for Pipeline and Flowlet CRDs.
- Graph Validation: Detect disconnected DAGs or missing sinks at submission time.
- Advanced Metrics: Native Prometheus/Grafana export with pre-built templates.
- Test Coverage: Comprehensive unit/lifecycle tests for all polyglot runtimes (Java, Python, Camel). Line coverage ≥ 80%.
🚀 Milestone 1 — v1.0: Stable Release
Refine the developer and operator experience for standard deployments.
🔐 Safety & Secrets
- Kubernetes Secrets: Resolve
secret:name/keyin Flowlet/Pipeline properties. - Validation Feedback: Improved error messages when YAML/Parameter validation fails.
[!NOTE] Secret Masking & Encryption: To maintain the simplicity of the Apache 2.0 core, advanced field-level encryption and log masking are prioritized for the Flinkflow Enterprise Edition distribution.
[!NOTE] Helm Chart — deferred to v2.0. The current
kubectl-apply workflow (CRDs + RBAC + ConfigMap) is sufficient forv0.9-xandv1.0early adopters. A Helm chart adds significant maintenance overhead (values schema, templating, upgrade hooks, chart tests) that is not justified until the installation surface stabilises. See Milestone 3 for the planned chart scope.
🌐 Milestone 2 — v1.5: Ecosystem & Lifecycle
Broaden connectivity and manage the full pipeline lifecycle.
🔌 New Connectors (Flowlet Catalog)
- Sources: Debezium (CDC), MongoDB Change Streams, MQTT, Pulsar, HTTP Polling, AWS Kinesis.
- Sinks: ClickHouse, Pinecone (Vector), ElasticSearch, OpenSearch.
- Universal Wrap: Re-implement all core connectors as standard templates in the Flowlet catalog.
📦 Management & State
- Flowlet Versioning: Support for semver pinning in pipeline definitions (e.g.,
version: "1.2.0").
🐫 Advanced Camel Features
- Side Output Support: Ability to emit to multiple streams directly from Camel logic.
- Join Function Support: Bridging Flink's
ProcessJoinFunctionto the Camel YAML DSL (v2). - Flink State Access: Exposing Flink's managed state (ValueState, ListState) as Camel headers.
🎨 Milestone 3 — v2.0: Platform & Enterprise
Self-service platform with agentic developer tools and governance.
🛡️ Governance & Compliance
- Secret Auditing: Implementation of job-level audit logs for secret resolution.
- Quota Management: Per-namespace resource limits (parallelism, max jobs).
🛠️ Developer Tooling & Extensibility
- Helm Chart: Official Helm chart packaging all Flinkflow infrastructure components (Operator CRDs, RBAC, Dashboard, Flowlet Catalog). Enables single-command install via
helm install flinkflow ./chartand GitOps-compatible upgrades via ArgoCD/Flux. Deferred from v1.0 — kubectl-based manifests are sufficient until the install surface stabilises. - Plugin SDK: Java/SPI interface for third-party Sources, Sinks, and operations.
🤖 Intelligent Features & AI (Flink Agents)
- Agentic Bridge: Declarative integration with LLMs via LangChain4j (
type: agentstep). - Agent Step Type: Native
type: agentwith system prompt, memory toggle, and tool discovery. - Flowlets-as-Tools: Automatic registration of Flinkflow Flowlets as agent-invokable tools.
- Multi-Provider Routing: Auto-detect provider from model name — OpenAI (
gpt-*), Google AI Studio (gemini-*), Vertex AI (provider: vertex), Anthropic (claude-*), and Ollama (ollama:*orllama/mistral/phi). - Stateful Agent Memory: Multi-turn conversation history stored as Flink
ValueStateper key. - GeminiDirectChatModel: Custom REST client targeting
v1betaGemini endpoint, bypassing LangChain4jv1betalimitations.
🏢 Flinkflow Enterprise Edition & Platform
Proprietary features and managed services for organizational scale.
🛡️ Governance & Security
- Enterprise RBAC: OpenID Connect (OIDC) and SSO integration for the Flinkflow Dashboard.
- Namespace Isolation: Multi-tenant security boundaries and resource quotas.
- Secret Encryption: End-to-end masking and field-level encryption for PII compliance.
- HashiCorp Vault: Native
vault:path/keyengine for enterprise secret management.
🔌 Proprietary High-Perf Connectors
- Cloud Data Warehouses: Optimized, high-throughput sinks for Snowflake, BigQuery, and Redshift.
- Table Formats: Commercial support for Delta Lake, Iceberg, and Hudi with managed compaction.
- Legacy Systems: Specialized connectors for SAP, Salesforce (CDC), and Mainframe (CDC).
📊 Advanced Observability & Developer Tools
- Agentic Development Tool: Agent-based development tool for pipeline authoring and logic injection, integrated directly into any modern IDE (VS Code, Cursor, IntelliJ).
- Flowlet Marketplace: UI for browsing and installing community/commercial Flowlets.
- Live Log Streaming: WebSocket-based container log tailing in the dashboard.
- Log Filtering: Dashboard controls for log levels and keyword highlighting.
- Job Control UI: Buttons to Cancel or Stop jobs from the monitoring interface.
- SLA Monitoring: Service Level Agreement (SLA/SLO) tracking and latency forecasting.
🤖 Agentic AI & Machine Learning
- Validation Assist: AI-powered explanations and auto-fixes for pipeline configuration errors.
- In-stream ML: Dedicated
model-inferencestep for high-performance real-time AI (CPU/GPU-accelerated). - Hot-Reloading: Live update of transformation logic and agent prompts without stopping the Flink job.
- Agentic Audit: Full prompt/response debugging and token cost attribution for AI Agents.
- Air-Gapped Gallery: Hardened container images with pre-loaded LLM models for offline AI.
Last updated: April 30, 2026