Data processing


SQS

General Info

  • Message queueing service
  • SQS Queue = buffer between the app components that receive data & those that process data
  • Features
    • Ensures delivery of each message at least once
    • Supports multiple readers/writers on the same queue
  • Limitations
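    • Messages can be delivered more than once, so design your system to be IDEMPOTENT
    • Standard queues do not guarantee FIFO ordering (use FIFO queues when order matters)
    • Encryption at rest requires server-side encryption (SSE-SQS or SSE-KMS) on the queue; SSE-SQS is enabled by default on newly created queues. For end-to-end protection, encrypt data client-side before pushing it to SQS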
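A message can reach a consumer more than once, so the consumer has to tolerate duplicates. A minimal sketch of an idempotent consumer, assuming a single consumer and an in-memory record of processed message IDs (a real system would keep this record in a durable store shared by all consumers). `IdempotentConsumer` and its methods are hypothetical names, not part of any AWS SDK:

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Hypothetical helper: skips messages whose ID it has already processed.

    The 'seen' set lives in memory here; in production it would live in a
    durable store shared by every consumer of the queue.
    """

    def __init__(self, handle: Callable[[str], None]) -> None:
        self._handle = handle
        self._seen: Set[str] = set()

    def process(self, message_id: str, body: str) -> bool:
        """Return True if the body was handled, False if it was a duplicate."""
        if message_id in self._seen:
            return False  # duplicate delivery: do nothing
        self._handle(body)
        self._seen.add(message_id)  # record only after successful handling
        return True


if __name__ == "__main__":
    results = []
    consumer = IdempotentConsumer(results.append)
    consumer.process("m-1", "charge order 42")
    consumer.process("m-1", "charge order 42")  # redelivered copy is ignored
    print(results)  # ['charge order 42']
```

A redelivered message keeps its message ID, but a producer that sends the same payload twice creates two IDs; key on a business identifier instead if that case matters.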
Message Lifecycle
  • Component 1 (C1) sends message A (MA) to a queue
  • When component 2 (C2) is ready to process a message, it retrieves it from the queue
  • While MA is being processed, it remains in the queue but is not returned to other receive requests for the duration of the visibility timeout
  • C2 deletes MA from the queue once it is done processing it
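The lifecycle above can be modelled with a small in-memory queue. This is a toy sketch under stated assumptions, not the AWS API: `ToyQueue` and its methods are hypothetical, time is passed in explicitly as `now` (seconds) so the behaviour is deterministic, and receipt handles are random strings.

```python
import itertools
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class _Message:
    message_id: str
    body: str
    visible_at: float = 0.0               # earliest time the message can be received
    receipt_handle: Optional[str] = None  # handle from the most recent receive


class ToyQueue:
    """Hypothetical in-memory model of the SQS message lifecycle."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: List[_Message] = []
        self._ids = itertools.count(1)

    def send(self, body: str) -> str:
        """C1 sends a message; it is stored until a consumer deletes it."""
        msg = _Message(message_id=f"msg-{next(self._ids)}", body=body)
        self._messages.append(msg)
        return msg.message_id

    def receive(self, now: float) -> Optional[Tuple[str, str]]:
        """Return (receipt_handle, body) for the first visible message, or None."""
        for msg in self._messages:
            if msg.visible_at <= now:
                # The message stays in the queue but is hidden for the visibility timeout.
                msg.visible_at = now + self.visibility_timeout
                msg.receipt_handle = uuid.uuid4().hex  # new handle on every receive
                return msg.receipt_handle, msg.body
        return None

    def delete(self, receipt_handle: str) -> None:
        """C2 is done: remove the message identified by its latest receipt handle."""
        self._messages = [m for m in self._messages if m.receipt_handle != receipt_handle]

    def __len__(self) -> int:
        return len(self._messages)


if __name__ == "__main__":
    q = ToyQueue(visibility_timeout=30)
    q.send("MA")                     # C1 sends MA
    handle, body = q.receive(now=0)  # C2 receives MA; it is now hidden
    print(q.receive(now=10))         # None: still within the visibility timeout
    q.delete(handle)                 # C2 deletes MA using the receipt handle
    print(len(q))                    # 0
```

If C2 crashes instead of deleting, a receive after the timeout (here `now=31`) returns MA again with a new receipt handle, which is why consumers must be idempotent.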

Message Identifiers

  • QUEUE URL: includes the QUEUE NAME, which you provide at creation time and which must be unique within the scope of all your queues
  • MESSAGE ID
    • Assigned by SQS to each message
    • Max length = 100 chars
    • To delete a message you need to specify the RECEIPT HANDLE, not the ID
  • RECEIPT HANDLE
    • Received each time you receive a message from a queue
    • Associated with the act of RECEIVING the message, not with the message itself
    • Each receive returns a different handle; delete with the most recently received one
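With the AWS SDK for Python (boto3), each message returned by `receive_message` carries its `ReceiptHandle`, and that handle (not `MessageId`) is what `delete_message` expects. A minimal sketch, assuming `sqs` is a `boto3.client("sqs")` and `handler` is your own (hypothetical) processing function; retries and error handling are omitted:

```python
from typing import Any, Callable


def process_batch(sqs: Any, queue_url: str, handler: Callable[[str], None]) -> int:
    """Receive up to 10 messages, handle each, then delete it by receipt handle.

    `sqs` is expected to be a boto3 SQS client, e.g. boto3.client("sqs").
    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
    )
    deleted = 0
    for message in response.get("Messages", []):
        handler(message["Body"])
        # Delete with the ReceiptHandle from THIS receive, not the MessageId.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        deleted += 1
    return deleted


# Usage (requires AWS credentials; the queue URL is hypothetical):
#   import boto3
#   sqs = boto3.client("sqs")
#   process_batch(sqs, "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue", print)
```

If `handler` raises, the message is never deleted and becomes visible again once its visibility timeout expires.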



Message Group ID
  • Messages within a single group are processed in a FIFO fashion
  • A failing message in a group will block the whole group
  • Example: use the same group ID (e.g. the issue number) for messages that must stay in order; use different group IDs if ordering is not important
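The group semantics can be sketched with a toy in-memory FIFO queue. This is a hypothetical model, not the AWS API: `ToyFifoQueue` is an illustrative name, and `release` stands in for a failed message becoming visible again after its visibility timeout.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple


class ToyFifoQueue:
    """Hypothetical model of FIFO message-group semantics.

    Messages in one group are handed out strictly in order, and while one
    message of a group is in flight no other message from that group is
    returned. Other groups are unaffected.
    """

    def __init__(self) -> None:
        self._groups: Dict[str, Deque[str]] = {}  # insertion-ordered (Python 3.7+)
        self._in_flight: Set[str] = set()

    def send(self, group_id: str, body: str) -> None:
        self._groups.setdefault(group_id, deque()).append(body)

    def receive(self) -> Optional[Tuple[str, str]]:
        """Return (group_id, body) of the head of the first unblocked group."""
        for group_id, bodies in self._groups.items():
            if bodies and group_id not in self._in_flight:
                self._in_flight.add(group_id)
                return group_id, bodies[0]
        return None

    def delete(self, group_id: str) -> None:
        """Acknowledge the in-flight head of `group_id`, unblocking the group."""
        self._groups[group_id].popleft()
        self._in_flight.discard(group_id)

    def release(self, group_id: str) -> None:
        """Processing failed: the same head becomes receivable again."""
        self._in_flight.discard(group_id)


if __name__ == "__main__":
    q = ToyFifoQueue()
    q.send("issue-1", "issue-1 created")
    q.send("issue-1", "issue-1 updated")
    q.send("issue-2", "issue-2 created")
    print(q.receive())  # ('issue-1', 'issue-1 created')
    print(q.receive())  # ('issue-2', 'issue-2 created'): issue-1 is blocked
    print(q.receive())  # None: both groups have a message in flight
```

A message that keeps failing in `issue-1` keeps being released and re-received, so "issue-1 updated" never gets out: that is the group-blocking behaviour described above.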
Message Deduplication ID
  • Example: one per <issue, user> pair
  • If the producer detects a failed SendMessage action, it can retry sending as many times as necessary, using the same message deduplication ID
  • Assuming that the producer receives at least one acknowledgement before the deduplication interval expires, multiple retries neither affect the ordering of messages nor introduce duplicates
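Deduplication can be sketched with a toy queue that remembers when each deduplication ID was first accepted. Assumptions: the 5-minute deduplication interval of SQS FIFO queues; content-based deduplication modelled as a SHA-256 hex digest of the body (SQS uses a SHA-256 hash of the body, but the exact encoding is not assumed here). `ToyDedupQueue` and `content_based_dedup_id` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional

DEDUP_INTERVAL_SECONDS = 5 * 60  # SQS FIFO deduplication interval


def content_based_dedup_id(body: str) -> str:
    """Toy content-based dedup ID: SHA-256 of the body (attributes ignored)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ToyDedupQueue:
    """Hypothetical model: a send whose dedup ID was accepted within the last
    5 minutes still 'succeeds' for the producer, but no second copy is stored."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self._first_seen: Dict[str, float] = {}

    def send(self, body: str, now: float, dedup_id: Optional[str] = None) -> bool:
        """Return True if a new message was enqueued, False if deduplicated."""
        key = dedup_id if dedup_id is not None else content_based_dedup_id(body)
        first = self._first_seen.get(key)
        if first is not None and now - first < DEDUP_INTERVAL_SECONDS:
            return False  # retry within the interval: accepted, not delivered again
        self._first_seen[key] = now
        self.messages.append(body)
        return True


if __name__ == "__main__":
    q = ToyDedupQueue()
    print(q.send("assign issue 7", now=0, dedup_id="issue-7:user-3"))    # True
    print(q.send("assign issue 7", now=12, dedup_id="issue-7:user-3"))   # False: retry
    print(q.send("assign issue 7", now=400, dedup_id="issue-7:user-3"))  # True: interval over
```

This is why a producer can retry a failed SendMessage freely, as long as the retries reuse the same deduplication ID and happen inside the interval.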
Delay Queue
  • Delay queues let you postpone the delivery of new messages to a queue for a number of seconds (0 to 900, i.e. up to 15 minutes)
    • Similar to visibility timeouts because both make messages unavailable to consumers for a specific period of time
    • The difference is that, for delay queues, a message is hidden when it is first added to the queue
    • Whereas for visibility timeouts a message is hidden only after it is consumed from the queue
  • To set a delay on individual messages rather than on an entire queue, use message timers: SQS then uses the message timer's DelaySeconds value instead of the delay queue's DelaySeconds value
  • Message timers let you specify an initial invisibility period for a message added to a queue (FIFO queues do not support timers on individual messages)
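The precedence between a queue's delay and a message timer fits in a few lines. A sketch under stated assumptions: delays are whole seconds, `effective_delay` is a hypothetical helper, and the FIFO check mirrors the rule that FIFO queues only accept a queue-level delay.

```python
from typing import Optional

MAX_DELAY_SECONDS = 900  # 15 minutes: the SQS maximum for delay queues and message timers


def effective_delay(queue_delay: int, message_timer: Optional[int] = None,
                    fifo: bool = False) -> int:
    """Seconds a newly sent message stays hidden before consumers can receive it."""
    if message_timer is not None and fifo:
        raise ValueError("FIFO queues only support a queue-level delay")
    # A per-message timer (DelaySeconds on SendMessage) overrides the queue's DelaySeconds.
    delay = queue_delay if message_timer is None else message_timer
    if not 0 <= delay <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS} seconds")
    return delay


if __name__ == "__main__":
    print(effective_delay(60))                    # 60: queue default applies
    print(effective_delay(60, message_timer=5))   # 5: message timer wins
```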
Dead-Letter Queue
  • Queues that other queues (source queues) can target for messages that can't be processed (consumed) successfully
  • The redrive policy specifies the source queue, the dead-letter queue, and the conditions under which SQS moves messages from the former to the latter: the consumer of the source queue has failed to process a message a specified number of times (maxReceiveCount)
  • When the ReceiveCount for a message exceeds the maxReceiveCount for a queue, SQS moves the message to a dead-letter queue (with its original message ID)
  • Configure an alarm for any messages delivered to a dead-letter queue
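On the source queue, the redrive policy is the `RedrivePolicy` queue attribute: a JSON string holding `deadLetterTargetArn` and `maxReceiveCount`. A sketch that builds the `Attributes` dict you could pass to boto3's `set_queue_attributes`, plus a toy version of the move rule; the helper names and the ARN are hypothetical.

```python
import json


def redrive_policy_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the Attributes dict for SetQueueAttributes on the SOURCE queue.

    RedrivePolicy is passed to SQS as a JSON string; AWS examples give
    maxReceiveCount as a string too.
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    return {"RedrivePolicy": json.dumps(policy)}


def moves_to_dlq(receive_count: int, max_receive_count: int) -> bool:
    """Toy check of the rule: move once ReceiveCount exceeds maxReceiveCount."""
    return receive_count > max_receive_count


# Usage (requires AWS credentials; URL and ARN are hypothetical):
#   import boto3
#   sqs = boto3.client("sqs")
#   sqs.set_queue_attributes(
#       QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders",
#       Attributes=redrive_policy_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 5),
#   )
```

With `maxReceiveCount = 5`, a message can be received (and fail) 5 times; the next time its ReceiveCount would pass 5, SQS moves it to the dead-letter queue.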



Visibility Timeout
  • A period of time during which SQS prevents other consumers from receiving and processing the message
  • When a Lambda function consumes the queue, set the source queue's visibility timeout to at least 6 times the function's timeout, so the function has time to process each batch of records
  • The extra time allows Lambda to retry if your function execution is throttled while your function is processing a previous batch
  • Max visibility timeout = 12 h (default = 30 s)
  • ReceiveMessage can return up to 10 messages in a single call (MaxNumberOfMessages)
  • With a Lambda trigger, change the event source mapping's batch size to control how many messages each invocation receives
  • With an SQS FIFO queue as a Lambda trigger, total concurrency is equal to or less than the number of unique MessageGroupIds in the queue
  • When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible again
  • SQS FIFO does not guarantee exactly-once delivery when used as a Lambda trigger; if exactly-once processing matters in your serverless application, make your function idempotent
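The 6× guideline is simple arithmetic; a small helper makes the bounds explicit. A sketch assuming timeouts in whole seconds; `min_visibility_timeout_for_lambda` is a hypothetical name. The result would be set as the queue's `VisibilityTimeout` attribute (a string, in seconds).

```python
MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 12 hours, in seconds
DEFAULT_VISIBILITY_TIMEOUT = 30        # seconds, used when none is configured


def min_visibility_timeout_for_lambda(function_timeout_s: int) -> int:
    """Smallest visibility timeout satisfying the 'at least 6x' guideline."""
    if function_timeout_s <= 0:
        raise ValueError("function timeout must be positive")
    timeout = 6 * function_timeout_s
    if timeout > MAX_VISIBILITY_TIMEOUT:
        raise ValueError("6x the function timeout exceeds the 12 h maximum")
    return timeout


if __name__ == "__main__":
    print(min_visibility_timeout_for_lambda(30))   # 180
    print(min_visibility_timeout_for_lambda(900))  # 5400 (Lambda's 15 min maximum)
```

Even the Lambda maximum timeout of 900 s yields 5,400 s, well under the 12 h cap, so in practice the check only guards against misconfiguration.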
Long Polling
  • By default (short polling), ReceiveMessage queries only a subset of the SQS servers & returns IMMEDIATELY, with or without a message
  • Long polling: pass WaitTimeSeconds (1 to 20 s) to ReceiveMessage, or set the queue's ReceiveMessageWaitTimeSeconds attribute
  • If no message is in the queue → the call waits up to WaitTimeSeconds and returns as soon as a message appears (or returns empty once the wait time expires)
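The difference between the two polling modes has a close analogy in Python's standard `queue.Queue`: `get_nowait()` returns at once, while `get(timeout=...)` blocks until an item arrives or the timeout expires. This is an analogy only (it does not model SQS sampling a subset of servers); `short_poll` and `long_poll` are hypothetical names.

```python
import queue
import threading
from typing import Optional


def short_poll(q: "queue.Queue[str]") -> Optional[str]:
    """Analogy for short polling: check once and return immediately."""
    try:
        return q.get_nowait()
    except queue.Empty:
        return None


def long_poll(q: "queue.Queue[str]", wait_time_seconds: float) -> Optional[str]:
    """Analogy for long polling: return as soon as a message arrives,
    or None once wait_time_seconds have elapsed (SQS caps this at 20 s)."""
    if not 0 < wait_time_seconds <= 20:
        raise ValueError("wait time must be > 0 and <= 20 seconds")
    try:
        return q.get(timeout=wait_time_seconds)
    except queue.Empty:
        return None


if __name__ == "__main__":
    q: "queue.Queue[str]" = queue.Queue()
    print(short_poll(q))  # None: nothing there yet, returns at once
    threading.Timer(0.5, q.put, args=("hello",)).start()  # message arrives after 0.5 s
    print(long_poll(q, wait_time_seconds=5))  # hello, after about 0.5 s rather than 5 s
```

Long polling trades a held-open request for fewer empty responses, which reduces both cost and wasted ReceiveMessage calls.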


SNS
  • Web service for sending & managing notifications (mobile push & messaging)
  • PUB/SUB Paradigm
    • Notifications are pushed to clients, eliminating the need to poll periodically
  • Allows for parallel async processing
    • FANOUT = SNS message sent to a topic & then replicated & pushed to multiple SQS queues/HTTP endpoints/email addresses


  • PUBLISHERS: communicate with SUBSCRIBERS ASYNC by sending a message to a TOPIC
  • TOPIC
    • Communication channel that contains a list of SUBSCRIBERS & the methods used to communicate with them
    • When a message is sent to a TOPIC, it is automatically forwarded to each SUBSCRIBER of that topic using the communication METHOD configured for that SUBSCRIBER
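Fan-out through a topic can be sketched with an in-memory stand-in. This is a hypothetical toy, not the SNS API: `ToyTopic` is an illustrative name, a subscriber is any callable standing in for an SQS queue, HTTP endpoint or email address, and delivery here is sequential, whereas SNS delivers asynchronously.

```python
from typing import Callable, List

Subscriber = Callable[[str], None]


class ToyTopic:
    """Hypothetical in-memory pub/sub topic.

    Publishing pushes the message to every subscriber, so subscribers
    never poll.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Subscriber] = []

    def subscribe(self, deliver: Subscriber) -> None:
        self._subscribers.append(deliver)

    def publish(self, message: str) -> int:
        """Fan the message out to all subscribers; return how many got it."""
        for deliver in self._subscribers:
            deliver(message)
        return len(self._subscribers)


if __name__ == "__main__":
    orders = ToyTopic("order-events")
    billing_queue: List[str] = []
    shipping_queue: List[str] = []
    orders.subscribe(billing_queue.append)   # stands in for an SQS subscription
    orders.subscribe(shipping_queue.append)  # a second, independent consumer
    orders.publish("order 42 placed")
    print(billing_queue, shipping_queue)  # ['order 42 placed'] ['order 42 placed']
```

Each subscribed queue gets its own copy, so billing and shipping can process the same event in parallel and at their own pace.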


Kinesis
  • Streaming data platform consisting of 3 services: Kinesis Data Firehose, Kinesis Data Streams & Kinesis Data Analytics

Use Cases

  • DATA INGESTION: ensure data is accepted reliably & successfully stored in AWS
  • REALTIME PROCESSING of massive data streams: act on knowledge gleaned from a big data stream right away
  • NOT for BATCH jobs: not appropriate for batch jobs (ETL)


  • KINESIS DATA FIREHOSE (STORE): load massive volumes of STREAMING data into AWS
    • Receives stream data & stores it in S3/Redshift/Elasticsearch (Amazon OpenSearch Service)
    • Just create a Delivery Stream & configure the destination for the data
  • KINESIS DATA STREAMS (PROCESS): collect and process large streams of data records in realtime
    • Can create a Kinesis Streams app that processes data as it moves through the stream
    • Scales by distributing incoming data across shards
    • Processing is executed on consumers, which read data from the shards & run the Kinesis Streams app
  • KINESIS DATA ANALYTICS (ANALYZE): analyze streaming data in realtime with SQL
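Kinesis Data Streams distributes records across shards by hashing each record's partition key with MD5 into a 128-bit integer and picking the shard whose hash key range contains it. A toy sketch, assuming the key is UTF-8 encoded and the hash key space is split evenly across shards (as for a freshly created stream; after resharding the ranges can be uneven, and exact boundaries may differ slightly). `shard_for_partition_key` is a hypothetical helper, not part of the Kinesis API.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_partition_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range contains the key's MD5 value.

    Assumes the 128-bit hash key space is split into `shard_count` equal,
    contiguous ranges.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be >= 1")
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    range_size = HASH_KEY_SPACE // shard_count
    # Clamp so any remainder at the top of the key space falls into the last shard.
    return min(hash_key // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-1"):
        # "user-1" maps to the same shard both times.
        print(key, shard_for_partition_key(key, shard_count=4))
```

Records with the same partition key always land on the same shard, which is what preserves per-key ordering; a skewed key distribution concentrates load on a few "hot" shards.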