Distributed Systems: Let’s start from the basics

  1. How to store the data?
  2. How to retrieve the data?
  1. Writes should not slow down because multiple indexes have to be maintained.
  2. Indexes should be kept in memory so that reads are fast. Otherwise, there is no point in having an index! Okay, don’t take this statement too seriously :). But keep this concept in mind.
  1. When building indexes, one detail to take care of is how fast you can rebuild the index.
  1. We append the JSON blob to the file (let’s say at offset 100).
  2. Create an in-memory index entry <42, 100>, so that on retrieval we can ‘random seek’ to byte 100 and read the value for key 42.
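The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the class name `AppendOnlyStore`, the newline-delimited JSON record layout, and the `rebuild_index` helper are all assumptions made for the example. Offsets here start at 0 for an empty file rather than at 100.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store: an append-only file plus an in-memory offset index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # create the file if it does not exist yet

    def put(self, key, value):
        # Assumed record layout: one JSON object per line.
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # the record starts at the current end of the file
            f.write(record)
        self.index[key] = offset  # e.g. <42, 100> in the text above
        return offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # the 'random seek' straight to the record
            return json.loads(f.readline())["value"]

    def rebuild_index(self):
        """Rebuild the index by scanning the whole file from the start."""
        self.index = {}
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                self.index[json.loads(line)["key"]] = offset
                offset += len(line)


store = AppendOnlyStore(os.path.join(tempfile.mkdtemp(), "data.log"))
store.put(42, {"name": "alice"})
store.put(7, {"name": "bob"})
store.put(42, {"name": "carol"})  # a newer record shadows the older one
print(store.get(42))
```

Note that `rebuild_index` has to read every record, so its cost grows with the size of the file. That is why the speed of rebuilding the index is worth thinking about.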
  1. Distribute the data across multiple files, each stored on a different machine.
  2. Delete/compact the data that is no longer needed.
  3. Since the data is no longer in a single file, we need to maintain a hashtable that tells us which file to search when serving a read request for a particular key.
  4. Handling deletes: using Bloom filters. I will elaborate on this later.
  5. Crash recovery: save the index to disk so that it can be recovered quickly.
  6. Partial writes: a write may reach the disk successfully but not the hashtable. Such records have to be discarded eventually.
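Points 1 to 3 can be illustrated with a small sketch. Everything named here is hypothetical: `SegmentedStore`, the `segment-N.log` file names, and the round-robin placement are assumptions for the example, and local files stand in for files on different machines. The hashtable maps each key to a (file number, byte offset) pair, and `compact` drops every record the hashtable no longer points to.

```python
import json
import os
import tempfile


class SegmentedStore:
    """Toy store that spreads records over several files.

    Assumption: each local file stands in for a file on a different machine.
    """

    def __init__(self, directory, num_files=3):
        self.paths = [os.path.join(directory, f"segment-{i}.log") for i in range(num_files)]
        for path in self.paths:
            open(path, "ab").close()
        self.index = {}  # key -> (file number, byte offset)
        self.next_file = 0

    def put(self, key, value):
        file_no = self.next_file  # round-robin placement, an arbitrary choice here
        self.next_file = (self.next_file + 1) % len(self.paths)
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        with open(self.paths[file_no], "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(record)
        # The hashtable is updated only after the write to the file has returned.
        self.index[key] = (file_no, offset)

    def get(self, key):
        location = self.index.get(key)
        if location is None:
            return None
        file_no, offset = location
        with open(self.paths[file_no], "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["value"]

    def compact(self):
        """Rewrite every file, keeping only the records the index still points to."""
        for file_no, path in enumerate(self.paths):
            live = sorted((off, key) for key, (fn, off) in self.index.items() if fn == file_no)
            new_locations = {}
            tmp_path = path + ".compact"
            with open(path, "rb") as src, open(tmp_path, "wb") as dst:
                for old_offset, key in live:
                    src.seek(old_offset)
                    line = src.readline()
                    new_locations[key] = (file_no, dst.tell())
                    dst.write(line)
            os.replace(tmp_path, path)  # swap in the compacted file
            self.index.update(new_locations)


store = SegmentedStore(tempfile.mkdtemp())
for i in range(5):
    store.put(42, f"v{i}")  # five versions of the same key, only the last is live
store.put(7, "other")
size_before = sum(os.path.getsize(p) for p in store.paths)
store.compact()
size_after = sum(os.path.getsize(p) for p in store.paths)
print(size_before, size_after, store.get(42))
```

Because a record that was written to a file but never reached the hashtable is simply not "live", this style of compaction also discards such partial writes. A real system would additionally need to handle crashes in the middle of compaction, which this sketch ignores.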




Pavan Kumar Reddy
