Deep Cloud

Deep learning models are computationally expensive and can take several days to train, depending on the size of the dataset. To reduce training time, practitioners increasingly use GPUs and distributed computing with MapReduce. In this post I will show how to combine these two processing paradigms. Once the learning algorithm is implemented with MapReduce, the model can be run on the Elastic MapReduce (EMR) platform provided by Amazon Web Services (AWS). The code is available on GitHub.

  1. I will use mrjob's MapReduce implementation in Python to implement a simple neural network.
  2. Each mapper (an individual machine) is equipped with a GPU and uses Theano/TensorFlow for GPU-parallel computation.
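
To make the map/reduce split concrete, here is a minimal sketch in plain Python. It is not the code from the repository and it does not use the mrjob API or a GPU: the names `mapper`, `reducer`, `train` and `predict` are hypothetical, and a single sigmoid neuron stands in for the neural network. Each mapper computes the gradient over its own shard of the data, and the reducer sums those partial gradients before a single weight update.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mapper(shard, weights):
    """Map step: summed log-loss gradient over one shard, plus the shard size."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        err = pred - label
        for i, x in enumerate(features):
            grad[i] += err * x
    return grad, len(shard)


def reducer(a, b):
    """Reduce step: combine two partial (gradient, count) pairs."""
    grad_a, n_a = a
    grad_b, n_b = b
    return [ga + gb for ga, gb in zip(grad_a, grad_b)], n_a + n_b


def train(shards, n_features, lr=1.0, epochs=500):
    """One map/reduce round per epoch, followed by a gradient-descent update."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        # On a cluster each mapper call would run on a separate machine.
        partials = [mapper(shard, weights) for shard in shards]
        grad, n = reduce(reducer, partials)
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def predict(weights, features):
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy data (an assumption for illustration): logical OR, with a constant
# bias feature in the first position, split across two "machines".
shards = [
    [((1.0, 0.0, 0.0), 0), ((1.0, 0.0, 1.0), 1)],
    [((1.0, 1.0, 0.0), 1), ((1.0, 1.0, 1.0), 1)],
]
weights = train(shards, n_features=3)
predictions = [predict(weights, x) for shard in shards for x, _ in shard]
```

Because the gradient of a summed loss is the sum of the per-example gradients, the reduced result matches what a single machine would compute over the full dataset, which is what makes this kind of training a natural fit for MapReduce.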

Simple Markov Model for correcting Transliterated Queries

Mixing languages lets people express themselves and communicate beyond the constraints of a single language. A common recent name for one such mix is Hinglish. Mixed scripts appear on social media such as WhatsApp and Facebook, on e-commerce sites such as Amazon and Flipkart, and on web blogs like this one. Some companies use these scripts in advertising taglines, such as "khushiyon ki home delivery" (Domino's) or "yeh dil maange more…aha!" (Pepsi).



So far most of us have known this usage as Hinglish, but linguistics has a specific term for sentences written in these mixed scripts: transliteration, as described in Query Expansion for Multi-script Information Retrieval.

