Data processing service
Anaconda Python 3, Apache Spark 2.4.5, and Java 8
Features & Challenges:
This data processing service pulls a large set of product data from the server, processes it into the format required by an eCommerce website, and uploads it back. The volume of product data grows every minute. We first processed the data with plain Python, but processing millions of records took a long time, so we moved the processing to Apache Spark. Spark can process millions of records within a second, so we get results quickly instead of waiting for an hour. The service is triggered by Chronos once every hour and updates the server without any human action.
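The pull, process, and upload flow described above can be sketched roughly as follows. This is only an illustration, not the service's actual code: the field names (sku, name, price), the line-delimited JSON input, and the function names to_storefront_format and run_job are all assumptions made for the example.

```python
from typing import Any, Dict


def to_storefront_format(record: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize one raw product record (field names are hypothetical)."""
    return {
        # Trim and upper-case the product code.
        "sku": str(record["sku"]).strip().upper(),
        # Collapse repeated whitespace in the product name.
        "title": " ".join(str(record.get("name", "")).split()),
        # Store the price as a number rounded to two decimals.
        "price": round(float(record.get("price", 0.0)), 2),
    }


def run_job(input_path: str, output_path: str) -> None:
    """Apply the same transformation to every record in parallel with Spark."""
    # Imported lazily so the pure function above works without Spark installed.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("product-data-processing").getOrCreate()
    try:
        # Assumed input: one JSON object per line.
        raw = spark.read.json(input_path)
        # Spark distributes the per-record work across its executors.
        formatted = raw.rdd.map(
            lambda row: Row(**to_storefront_format(row.asDict()))
        )
        spark.createDataFrame(formatted).write.mode("overwrite").json(output_path)
    finally:
        spark.stop()
```

In a setup like this, the hourly scheduler job would only need to call run_job with the input and output locations. Keeping the per-record logic in a plain function also makes it possible to check the formatting rules without starting a Spark cluster.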