14 Comments

Basically, including the checksum in the HTTP trailer header offers one benefit: a client-side integrity check. Say you are uploading a file in chunks: a 16 MB file with a 1 MB chunk size means 16 chunks are uploaded. While uploading the last chunk, the client includes the checksum it calculated as an HTTP trailer header. On the S3 server side, it checks for the trailer header, knows that the last chunk has been received, calculates the checksum for the uploaded 16 MB file, and compares it with the checksum sent in the HTTP trailer header. On a mismatch, it responds with error code 419 (checksum failed) and also includes the trailer header in the HTTP response headers. On a successful match, it still includes the trailer with the server-generated checksum in the response headers, so that the client can check that it received the same checksum back and complete the client-side integrity check.

"Each server in the middle has to scan data twice without HTTP trailer" - I think this statement is a bit confusing and misleading. The checksum is generated on the client side (either by the SDK or by custom code) and again at the S3 end, so there is no need for a middleman to scan each object, whether or not the trailer header is present.

In another scenario, the client does not add a checksum as an HTTP trailer header, but the server still calculates the checksum and returns it in the 200 response. The client can then look for the trailer header in the response and validate the checksum. If it differs, the client knows something was corrupted, issues a delete request, and retries the upload. With a trailer header sent from the client side, you avoid that additional delete request.
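
To make the flow in these two comments concrete, here is a minimal local sketch in Python. It makes no network calls: `client_upload` and `server_receive` are hypothetical stand-ins for the SDK and for S3, and the choice of CRC32 with a base64-encoded `x-amz-checksum-crc32` value is an assumption for illustration, not a description of what S3 does internally.

```python
import base64
import zlib

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as in the 16 MB example above


def iter_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield successive fixed-size chunks of data."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def encode_crc32(crc: int) -> str:
    """Base64-encode a CRC32 value as 4 big-endian bytes."""
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


def client_upload(data: bytes):
    """Hypothetical client: stream chunks, then emit the checksum as a trailer."""
    crc = 0
    sent = []
    for chunk in iter_chunks(data):
        crc = zlib.crc32(chunk, crc)  # running checksum, one pass over the data
        sent.append(chunk)
    trailer = {"x-amz-checksum-crc32": encode_crc32(crc)}
    return sent, trailer


def server_receive(chunks, trailer):
    """Hypothetical server: recompute the checksum and compare with the trailer.

    Returns (ok, computed). With no trailer from the client, the upload is
    accepted and the client must compare `computed` itself.
    """
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    computed = encode_crc32(crc)
    claimed = trailer.get("x-amz-checksum-crc32")
    ok = claimed is None or claimed == computed
    return ok, computed
```

When the trailer is present, a corrupted chunk is rejected by the server at upload time. When it is absent (`trailer = {}`), the server accepts the data and the client has to compare the returned checksum itself, then delete and retry on a mismatch, which is the extra round trip described above.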


That is in line with my thinking, and I agree! Thanks, Sathish.
