- Distributed event streaming platform
A Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the others take over its work to ensure continuous operation without data loss
Core Capabilities
- High throughput
- Scalable
- Permanent storage
- High availability
- Batching
- Message/event order
- Connect
Data format → Serialization + de-serialization
Kafka message: key-value
key: can be int, string, or something more complex
Use key and secret to access Kafka cluster
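A minimal sketch of serialization and de-serialization of a key-value event, assuming an illustrative JSON format (real deployments often use Avro, Protobuf, or plain strings; Kafka itself only sees bytes):

```python
import json

def serialize(key, value):
    """Turn a key-value event into bytes, as a Kafka client would
    before sending it over the wire. JSON here is illustrative."""
    return key.encode("utf-8"), json.dumps(value).encode("utf-8")

def deserialize(key_bytes, value_bytes):
    """Reverse the serialization on the consumer side."""
    return key_bytes.decode("utf-8"), json.loads(value_bytes.decode("utf-8"))

key_b, val_b = serialize("user-42", {"action": "login", "ts": 1700000000})
assert deserialize(key_b, val_b) == ("user-42", {"action": "login", "ts": 1700000000})
```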
Topic
- Named container like a table in a database
- Stores similar events
- Events are immutable
- Events are durable and stored on disks
- Retention period is configurable
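For example, retention can be changed per topic via the standard `kafka-configs.sh` tool (topic name and broker address below are illustrative):

```shell
# Set a 7-day retention period on an existing topic
# (retention.ms is a standard topic-level config)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name orders \
  --add-config retention.ms=604800000
```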
Partition
- Splits a topic into pieces that can be stored on different brokers
- The partition key is fed to a hash function to choose a partition
- Events with the same partition key always land in the same partition, which preserves their order
Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.
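A sketch of how a key maps to a partition. Kafka's default partitioner uses murmur2; md5 is used here only as a stand-in to show that hashing makes the mapping deterministic, so per-key order is preserved:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition number. Kafka's default
    partitioner uses murmur2; md5 here is an illustrative stand-in."""
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return digest % num_partitions

# The same key always hashes to the same partition,
# which is what guarantees per-key ordering.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
assert 0 <= partition_for(b"user-7", 6) < 6
```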
Broker
- A computer, instance, or container running the Kafka process
- Manages partitions
- Manages replication of partitions
- Handles writes and reads
Replication
- Copies of data for fault tolerance
- Copies are called follower replicas
- One leader and N-1 followers
- The main partition is the leader replica
- Writes and reads go to the leader; followers replicate from it
A common production setting is a replication factor of 3
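For example, a topic can be created with that replication factor using the standard `kafka-topics.sh` tool (this needs at least 3 brokers; names and addresses are illustrative):

```shell
# Create a topic whose partitions each have 1 leader + 2 follower replicas
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic payments \
  --partitions 6 \
  --replication-factor 3
```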
Producer
- Client application
- Puts messages into topics
- Connection pooling
- Network buffering
- Partitioning (decides which partition, and therefore which broker, each message goes to)
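The batching and buffering behavior above is configurable. A sketch of producer settings in the librdkafka key style used by the confluent-kafka Python client (values are examples, not recommendations):

```python
# Illustrative producer configuration; the broker address is a placeholder.
producer_config = {
    "bootstrap.servers": "broker1:9092",  # cluster entry point
    "linger.ms": 10,                      # wait up to 10 ms to batch messages
    "batch.num.messages": 1000,           # max messages per batch
    "queue.buffering.max.kbytes": 32768,  # client-side network buffer size
    "acks": "all",                        # wait for all in-sync replicas
}

assert "linger.ms" in producer_config
```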
Consumer
- Client application
- Read messages from topics
- Connection pooling
- Network protocol
- Horizontally and elastically scalable
- Maintains ordering within partitions at scale
- Consuming does not destroy messages; multiple consumers can read from the same topic
- Rebalances partitions between consumers in a consumer group
- Preserves per-partition message order within a consumer group
Consumer groups
- A list of consumers
- The partitions of all the topics are divided among the consumers in the group
- More consumers in a group reduce bottlenecks
- Each partition of a topic is consumed by exactly one consumer in the group
- A larger number of partitions allows adding more consumers to process data
- Better throughput as it scales
- One application → one consumer group
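The division of partitions among group members can be sketched as a round-robin assignment (Kafka's actual assignors, such as range or cooperative-sticky, differ in detail, but the invariant is the same: each partition goes to exactly one consumer in the group):

```python
def assign_partitions(consumers, num_partitions):
    """Round-robin sketch of how a consumer group divides a topic's
    partitions; each partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 6 partitions over 3 consumers: each consumer owns 2 partitions.
result = assign_partitions(["c1", "c2", "c3"], 6)
assert result == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
# Consumers beyond the partition count would sit idle, which is why
# more partitions allow more parallelism.
```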
Cluster and Topic
Service account
- Define how we access Confluent Cloud
- Usually tied to ACLs
- Similar to IAM User
ACL (Access Control List)
- Defines permissions (authorization) on Kafka clusters, e.g. read or write to a specific topic
- Tied to API keys and a service account
- The service account is the principal
- Similar to IAM Policy or IoT Policy
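For example, an ACL granting read access can be added with Kafka's standard `kafka-acls.sh` tool (principal, topic, and broker address are illustrative; Confluent Cloud manages ACLs through its own CLI/UI):

```shell
# Allow the principal User:admin to read from the topic "orders"
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:admin \
  --operation Read \
  --topic orders
```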
Principal
- An entity authenticated by the authorizer
- Different security protocols have different principal formats
- Ex: User:admin, Group:developers, or User:CN=quickstart.confluent.io,OU=TEST,O=Sales,L=PaloAlto,ST=Ca,C=US
SSL
- Subject name
- Ex: CN=quickstart.confluent.io,OU=TEST,O=Sales,L=PaloAlto,ST=Ca,C=US
SASL/GSSAPI
- A Kerberos principal, typically of the form primary/host@REALM
SASL/PLAIN or SCRAM
- A simple text string
- Ex: admin
API Key
- Used to control access to Confluent Cloud components and resources
- Similar to IoT Certificate
- Bound to a service account (required) and managed clusters (optional)
- Service account is the owner of the API Key
Authentication
SASL
- Simple Authentication and Security Layer
SASL/PLAIN
- Still uses TLS to encrypt the connection
- Uses a username and password
Bootstrap server
- A host:port endpoint that clients contact first to fetch metadata about topics, partitions, and brokers
- Used by producers or consumers to talk to Kafka clusters
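Putting the pieces together, a sketch of client settings for SASL/PLAIN over TLS against Confluent Cloud, in the librdkafka key style used by the confluent-kafka Python client (the endpoint, key, and secret are placeholders):

```python
# Illustrative Confluent Cloud client configuration.
client_config = {
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",  # TLS still encrypts the connection
    "sasl.mechanisms": "PLAIN",       # username/password mechanism
    "sasl.username": "<API_KEY>",     # the API key acts as the username
    "sasl.password": "<API_SECRET>",  # the API secret acts as the password
}

assert client_config["security.protocol"] == "SASL_SSL"
```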