12 Comments
Mar 6 · edited Mar 6 · Liked by Neo Kim

Great walkthrough. I really like the visuals and the way you structure your sentences to have breaks between them.

Just reading between the lines, the way they do data redundancy seems costly. I know they say it's a cheap solution, but it would be great if you have any insights into how much it costs to achieve such high durability this way.
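To make the cost question concrete: the usual argument for why erasure coding is "cheap" compared to plain replication is storage overhead. The shard counts below are illustrative assumptions for a back-of-the-envelope comparison, not S3's actual parameters:

```python
# Toy comparison of storage overhead: full replication vs. erasure coding.
# k = data shards, m = parity shards; any k of (k + m) shards can
# reconstruct the object. The values are illustrative, not S3's real ones.

def replication_overhead(copies: int) -> float:
    # Bytes stored per byte of user data with N full copies.
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    # Bytes stored per byte of user data with k-of-(k+m) erasure coding.
    return (k + m) / k

print(replication_overhead(3))   # 3.0x raw storage for 3 full copies
print(erasure_overhead(10, 4))   # 1.4x raw storage, tolerating 4 lost shards
```

So a 10+4 scheme survives four simultaneous shard losses at less than half the raw-storage cost of triple replication, which is presumably what "cheap" refers to.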

Mar 5 · Liked by Neo Kim

"Besides they use the HTTP trailer to send checksum.

Because it allows sending extra data at the end of chunked data.

Thus avoiding the need to scan data twice and check data integrity at scale."

What do we mean by "avoiding the need to scan data twice"? Why would we need to scan the data twice to calculate the checksum? I don't quite get it. Can someone explain, please?
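My understanding of the quoted point, with a minimal sketch: HTTP headers are sent before the body, so putting a checksum in a header forces one pass over the data to compute it and a second pass to send it. A trailer is sent after the last chunk, so the checksum can be computed during the single upload pass. The function and trailer name below are illustrative, not S3's actual wire format:

```python
import hashlib

def one_pass_upload(chunks):
    """Compute a checksum while streaming chunks, then emit it as a trailer.

    With a header-based checksum, the sender would have to read all the data
    once just to hash it, then read it again to transmit it. A trailer lets
    both happen in the same pass. (Hypothetical helper; illustrative only.)
    """
    h = hashlib.sha256()
    sent = []
    for chunk in chunks:        # the upload pass is also the hashing pass
        h.update(chunk)
        sent.append(chunk)      # stand-in for writing the chunk to the wire
    trailer = {"checksum-sha256": h.hexdigest()}  # sent after the last chunk
    return b"".join(sent), trailer
```

The server verifies the body against the trailer on arrival, so neither side ever reads the data twice.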

Mar 15 · Liked by Neo Kim

Is the diagram "Re-Replicating Shards of a Failed Hard Disk at Scale" intended to show how shards from a failed disk get replicated to other disks?

I don't understand how that's possible. I assume a failed disk is completely unreadable by the time it fails. I assume this diagram is not related to the data sector discussion below.

So if the disk is already unreadable, how are the red and green dotted shards going to be replicated to 2 other disks?

And if the disk is readable after failure, why care about breaking an object into shards anyway?
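One possible resolution of this confusion (my assumption about the general scheme, not a claim about S3's exact design): the failed disk is never read. Each shard's data is recoverable from replicas or erasure-coded shards that already live on other, healthy disks, and sharding keeps each rebuild small and parallelizable. A toy XOR-parity sketch of rebuilding a lost shard from surviving ones:

```python
# Toy XOR-parity erasure coding (real systems use stronger codes such as
# Reed-Solomon; this only illustrates the idea). Three shards sit on three
# separate disks; any one shard can be rebuilt from the other two.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

shard_a = b"\x01\x02\x03"          # on disk 1
shard_b = b"\x0f\x0f\x0f"          # on disk 2
parity  = xor(shard_a, shard_b)    # on disk 3

# Disk 1 fails and is completely unreadable. Rebuild its shard onto a
# fresh disk using only the healthy disks 2 and 3:
rebuilt_a = xor(parity, shard_b)
assert rebuilt_a == shard_a
```

Under this model, breaking objects into shards matters precisely because a whole-disk failure then costs only many small rebuilds spread across the fleet, rather than one giant copy from a single surviving replica.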


Amazing walkthrough, NK! Was super cool to learn more about the internals of S3.

Also, thanks for the High Growth Engineer shout-out! I appreciate it.
