A null classifier that always predicts the majority class achieves 78% accuracy; this is our baseline. After experimenting with different loss functions, network architectures, and class weights, we achieved a best test accuracy of around 88% and a macro F1 score of 87% (giving equal weight to both classes). The high F1 score indicates that recall and precision are well balanced despite only 22% of the data coming from the negative class.
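As a minimal sketch of the two ingredients mentioned above, one can weight the loss toward the rare class and score with a macro F1. The specific weights and the toy tensors below are illustrative assumptions, not our exact configuration:

```python
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

# Hypothetical class weights, inverse to class frequency (22% negative,
# 78% positive); the weights we actually tuned may differ.
class_weights = torch.tensor([1.0 / 0.22, 1.0 / 0.78])  # index 0 = negative
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Toy logits/labels just to show the calls; real values come from the model.
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 1])
loss = criterion(logits, labels)

# Macro F1 averages the per-class F1 scores, giving both classes equal weight.
preds = logits.argmax(dim=1)
print(f1_score(labels.numpy(), preds.numpy(), average="macro"))
```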
The dataset contains around 142 million reviews, with a maximum review length of over 2,000 words. All data preprocessing is carried out on MapReduce, after which each review is a fixed-length sequence of integers. We experimented with multiple storage formats and ultimately managed the large files with HDF5.
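A sketch of the storage step, assuming h5py and a hypothetical padded length of 2,000 tokens (the real encoding happens upstream in the MapReduce jobs):

```python
import numpy as np
import h5py

MAX_LEN = 2000  # assumed fixed length; reviews are padded/truncated to it

# Toy batch of integer-encoded reviews standing in for MapReduce output.
reviews = np.zeros((3, MAX_LEN), dtype=np.int32)
reviews[0, :3] = [12, 7, 431]  # e.g. token ids for a short review

# Chunked, compressed HDF5 lets training code slice mini-batches out of a
# file far larger than memory without loading the whole dataset.
with h5py.File("reviews.h5", "w") as f:
    f.create_dataset("tokens", data=reviews,
                     chunks=(1, MAX_LEN), compression="gzip")

with h5py.File("reviews.h5", "r") as f:
    batch = f["tokens"][0:2]  # reads only the requested rows from disk
```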
We deployed the RNN model across multiple GPUs and ran experiments with varying numbers of nodes and batch sizes. Through parallelization, we reduced the training runtime from 18 hours on a single p2.xlarge to 2.5 hours on two g3.16xlarge instances. We also implemented a dynamic load balancer that, at the start of each epoch, distributes batches of differing sizes to GPUs based on their measured performance.
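A simplified sketch of the load-balancing idea (the function name and throughput numbers below are illustrative, not our exact implementation): at the start of each epoch, per-GPU batch sizes are set in proportion to the throughput each GPU achieved in the previous epoch.

```python
def rebalance(batch_total, throughputs):
    """Split a global batch across GPUs in proportion to each GPU's
    measured samples/sec from the previous epoch (illustrative only)."""
    total = sum(throughputs)
    sizes = [max(1, round(batch_total * t / total)) for t in throughputs]
    # Assign any rounding remainder to the fastest GPU.
    sizes[throughputs.index(max(throughputs))] += batch_total - sum(sizes)
    return sizes

# e.g. 512 samples split across 4 GPUs of unequal speed
print(rebalance(512, [220.0, 180.0, 180.0, 100.0]))  # -> [165, 136, 136, 75]
```

Rebalancing only at epoch boundaries keeps the scheduling overhead negligible while still adapting to persistent speed differences between devices.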