Open Source


Pointers

My open projects can be found at

Vector datastores are crucial for developing RAG applications with LLMs.

This repository features code and reference architectures for vector search and RAG applications with MongoDB.

I have experimented with various embedding models and LLMs (OpenAI, Mistral, Llama).
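To illustrate the retrieval step that a vector datastore performs in a RAG application, here is a minimal sketch in plain Python. It is not code from the repository: the three-dimensional embeddings, the `DOCS` list, and the function names `top_k` and `build_rag_prompt` are all hypothetical, and a real vector store would replace the brute-force scan with an indexed search over embeddings produced by an actual model.

```python
import math

# Hypothetical documents with hand-made 3-dimensional embeddings.
# Real embeddings come from an embedding model and have hundreds of dimensions.
DOCS = [
    {"text": "Kafka is a distributed event streaming platform.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "HBase is a distributed NoSQL database.", "embedding": [0.1, 0.9, 0.1]},
    {"text": "Spark runs distributed data processing jobs.", "embedding": [0.7, 0.2, 0.6]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Brute-force search: score every document, return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, d["embedding"]), d["text"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_rag_prompt(question, passages):
    """Place the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# The query vector stands in for the embedding of the user's question.
hits = top_k([1.0, 0.0, 0.1], DOCS, k=2)
print(build_rag_prompt("What is Kafka?", [text for _, text in hits]))
```

The retrieved passages are passed to the LLM as context, which is why the quality of the vector search directly affects the quality of the generated answer.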

I created these Dockerized stacks to make it easier to develop with and run these tools:

Kafka in Docker - run a mini Kafka cluster on a single machine
Spark in Docker - run a mini Spark cluster
Training sandbox Docker - has Spark, Kafka, TensorFlow, ML stack, DL stack, and Anaconda all pre-installed and configured to work together seamlessly
BigDL Docker - run the Intel BigDL framework

Hadoop is very particular about the DNS records of servers in the cluster. DNS record mismatches can cause runtime errors.

My Hadoop DNS checker utility verifies the DNS records of cluster machines.
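As an illustration of the kind of check involved, the sketch below resolves a hostname to an IP address and then resolves that address back to a name, reporting a problem if the two do not agree. This is not the utility's actual code: the function name `check_host` and its injectable `forward` and `reverse` parameters are hypothetical, chosen here so the logic can be exercised without a live DNS server.

```python
import socket


def check_host(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for one hostname (an empty list means it looks OK).

    forward: callable mapping hostname -> IP string
    reverse: callable mapping IP string -> (hostname, aliases, addresses)
    """
    try:
        ip = forward(hostname)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return [f"{hostname}: forward lookup failed ({err})"]

    try:
        reverse_name = reverse(ip)[0]
    except OSError as err:  # socket.herror is a subclass of OSError
        return [f"{hostname}: reverse lookup of {ip} failed ({err})"]

    # Compare case-insensitively and ignore a trailing dot on fully qualified names.
    if reverse_name.lower().rstrip(".") != hostname.lower().rstrip("."):
        return [f"{hostname}: resolves to {ip}, which maps back to {reverse_name}"]
    return []


# Usage against real DNS (hostnames here are placeholders):
#   for host in ["node1.example.com", "node2.example.com"]:
#       for problem in check_host(host):
#           print(problem)
```

Running such a check on every node before starting the cluster surfaces forward and reverse mismatches early, rather than as runtime errors later.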

Spark Job Server allows running Spark jobs with low latency.

Contributed a performance patch and documentation patches to HBase - a distributed NoSQL database

HBASE-4440 : A write benchmark writes a large number of records. When a region splits, writes are paused until the region has been split and migrated to another server, and this delay negatively affects the benchmark. My patch adds an option to pre-split the table, so the writes can be performed in parallel across multiple regions / servers.
HBASE-5555 - documentation and scripts to verify DNS records of HBase machines.
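The pre-splitting idea behind HBASE-4440 can be sketched as follows. This is not the patch itself: it assumes zero-padded numeric row keys, and the helper names `split_keys` and `region_for` are hypothetical. It computes evenly spaced split points for a table and shows that sequential writes then land evenly across the resulting regions, instead of all going to a single region that must split mid-benchmark.

```python
import bisect
from collections import Counter


def split_keys(num_regions, key_space, width):
    """Return num_regions - 1 evenly spaced, zero-padded split points.

    Assumes row keys are integers in [0, key_space), zero-padded to `width`
    characters so that string order matches numeric order.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    return [str(key_space * i // num_regions).zfill(width) for i in range(1, num_regions)]


def region_for(row_key, splits):
    """Index of the region holding row_key; each region covers [start, end)."""
    return bisect.bisect_right(splits, row_key)


# Four regions over keys "000".."999" need three split points.
splits = split_keys(num_regions=4, key_space=1000, width=3)
print(splits)

# Simulate the benchmark's writes and count how many land in each region.
writes_per_region = Counter(region_for(str(k).zfill(3), splits) for k in range(1000))
print(dict(writes_per_region))
```

With the split points supplied when the table is created, the write load is spread across servers from the first record onward.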