Flinkflow — Product Vision & Design

Version: 1.0 (April 2026)
Status: Living Document — defines the strategic direction of the platform.
Maintainer: Talweg Authors


1. Vision & Mission

Flinkflow is a low-code, YAML-driven pipeline framework built on Apache Flink. Its mission is to make stateful, distributed stream processing accessible to any developer — not just those with deep Flink expertise — by abstracting the Flink API surface behind a composable, declarative DSL.

"Define once in YAML. Run everywhere — local, Docker, Kubernetes."

Flinkflow draws direct inspiration from Apache Camel / Camel K (declarative routing primitives) and Kamelets (reusable, parameterised connectors), reimagined natively for the Flink stream processing engine and Kubernetes-first deployment.


2. Design Principles

  • YAML First: Every pipeline, connector, and transformation is expressible in YAML, with zero Java boilerplate required for the common case (see the sketch below).
  • Progressively Escapable: Inline Java (Janino), Python (GraalVM), and Camel fragments provide escape hatches for custom logic.
  • Kubernetes Native: The Flowlet catalog, pipeline definitions, and operational concerns live as Kubernetes CRDs.
  • Pluggable by Design: Sources, sinks, and operations are discrete, independently composable units.
  • Zero Lock-in: Flinkflow generates and submits standard Flink DataStream DAGs; there is no proprietary runtime.
  • Agentic Reasoners: Autonomous AI agents (e.g. GPT-4) can be dropped into any stream to perform non-deterministic reasoning.
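
To make the first principle concrete, here is a minimal, hypothetical sketch of a declarative pipeline. The apiVersion group and every field name (source, steps, sink, flowlet, parameters) are illustrative assumptions, not the authoritative Flinkflow schema.

    # Hypothetical pipeline sketch; field names and the apiVersion group are illustrative assumptions.
    apiVersion: flinkflow.io/v1alpha1
    kind: Pipeline
    metadata:
      name: order-enrichment
    spec:
      source:
        flowlet: kafka-source            # a reusable connector from the Flowlet catalog
        parameters:
          topic: orders
      steps:
        - flowlet: mask-pii              # a composable operation, also drawn from the catalog
      sink:
        flowlet: kafka-sink
        parameters:
          topic: orders-enriched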

3. Product Roadmap

The execution of the Flinkflow vision is tracked via our unified project backlog.

Strategic Milestones

  • Milestone 1 (v1.0) — Production Ready: Focus on security (Vault/Secrets), operational safety (validation), and log observability.
  • Milestone 2 (v1.5) — Ecosystem Expansion: Expanding the Flowlet Catalog (Iceberg, Snowflake, CDC).
  • Milestone 3 (v2.0) — The Platform: Delivering the Visual IDE, Declarative Agentic Bridge, and enterprise governance features.

👉 View the detailed task-level roadmap here: Project Backlog (TODO)


4. Key Design Decisions (ADRs)

ADR-001: String-typed Stream Data Model

The Flink DataStream<String> is the universal wire type. Any serialisation format (JSON, CSV, Avro text, XML) can be represented as a String and transformed via Java snippets, Camel logic, or XSLT.
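
As a hedged illustration of this decision, the fragment below shows a step that takes a CSV line and returns a JSON string, both carried as the same opaque String payload. The inline-java key and the payload binding are assumptions made for this example, not the documented DSL.

    # Hypothetical fragment: every record travels through the DAG as a plain String (ADR-001).
    steps:
      - inline-java: |
          // 'payload' is an assumed binding for the current record's String value.
          // CSV line in, JSON string out: any text format fits the String wire type.
          String[] cols = payload.split(",");
          return "{\"id\": \"" + cols[0] + "\", \"amount\": " + cols[1] + "}";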

ADR-002: Polyglot Dynamic Code Execution

Inline Java (Janino) and Python (GraalVM) code snippets in YAML are compiled/interpreted and executed at runtime. This maintains Flinkflow's "Single JAR" deployment model.
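
A hedged sketch of how the two escape hatches might sit side by side in a pipeline; the inline-java / inline-python keys and the payload and output bindings are assumptions rather than the documented DSL.

    # Hypothetical polyglot steps; step keys and variable bindings are illustrative assumptions.
    steps:
      - inline-java: |
          // Compiled by Janino when the pipeline is submitted; plain JDK classes only.
          return payload.trim().toLowerCase();
      - inline-python: |
          # Interpreted in-process by GraalVM, so the pipeline still ships as a single JAR.
          output = payload.replace("error", "ERR")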

ADR-003: Flowlets as Kubernetes CRDs

Reusable components (Flowlets) are defined as kind: Flowlet Kubernetes Custom Resources. This makes the catalog natively versioned and GitOps-compatible.
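
A hedged example of what a Flowlet Custom Resource could look like. Only kind: Flowlet is taken from this ADR; the apiVersion group and the spec layout are illustrative assumptions.

    # Hypothetical Flowlet CR; only 'kind: Flowlet' comes from ADR-003,
    # the apiVersion group and spec layout are illustrative assumptions.
    apiVersion: flinkflow.io/v1alpha1
    kind: Flowlet
    metadata:
      name: kafka-source
      labels:
        catalog-version: "1.0.0"         # versioned and promoted through GitOps like any other resource
    spec:
      type: source
      parameters:
        - name: topic
          required: true
        - name: bootstrap-servers
          required: true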

ADR-004: Flink 2.2 / Java 17 Runtime Baseline

The target runtime is Apache Flink 2.2 on Java 17 (the recommended production runtime; Java 21 is experimental in this release). Flink 2.2 (released December 2025) was chosen for the following reasons:

  • AI/ML Native: Introduces ML_PREDICT for LLM inference and VECTOR_SEARCH for real-time vector similarity search directly in SQL/Table API, aligning with Flinkflow's Agentic Bridge (ADR-005).
  • Improved Serialization: Dedicated serializers for Map, List, and Set; Kryo upgraded to 5.6; RocksDB upgraded to 8.10.0 for improved I/O on stateful pipelines.
  • Cleaner API Surface: Scala API fully removed; legacy DataSet API deprecated — aligns with Flinkflow's Java-first DataStream model (ADR-001).
  • Java 11 Dropped: Java 11 support removed upstream, reinforcing our Java 17 baseline commitment.

ADR-005: The Agentic Bridge

Autonomous AI agents are first-class YAML citizens. Flinkflow bridges flink-agents concepts to our DSL, allowing agents to use Flowlets as "Tools" and Flink State as "Memory".
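
A hedged sketch of how an agent could be declared inside a stream, with a catalog Flowlet exposed as a "Tool" and keyed Flink state acting as "Memory". Every field name here is an illustrative assumption, not the documented DSL.

    # Hypothetical agent step; all field names are assumptions for illustration.
    steps:
      - agent:
          model: gpt-4                   # the reasoning model dropped into the stream
          prompt: "Classify the incoming order and decide whether to escalate."
          tools:
            - flowlet: customer-lookup   # a catalog Flowlet exposed to the agent as a "Tool"
          memory:
            state: keyed                 # backed by Flink keyed state, acting as per-entity "Memory"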


5. Non-Functional Targets

  • Throughput: Pipeline overhead ≤ 5% vs. a native Flink DAG for equivalent logic
  • Startup Latency: Pipeline submission within 10 seconds of JVM start
  • Container Size: Docker image ≤ 1.5 GB

Last updated: April 2026