With the growing interest in computational models of visual attention, saliency prediction has become an important research topic in computer vision. Over the past years, many different successful saliency models have been proposed especially for image saliency prediction. However, these models generally do not consider the dynamic nature of the scenes, and hence, they work better on static images. To date, there has been relatively little work on dynamic saliency that deals with predicting where humans look at videos. In addition, previous studies showed that how the feature integration is carried out is very crucial for more accurate results. Yet, many dynamic saliency models follow a similar simple design and extract separate spatial and temporal saliency maps which are then integrated together to obtain the final saliency map. In this paper, we present a comparative study for different feature integration strategies in dynamic saliency estimation. We employ a number of low and high-level visual features such as static saliency, motion, faces, humans and text, some of which have not been previously used in dynamic saliency estimation. In order to explore the strength of feature integration strategies, we investigate four learning-based (SVM, Gradient Boosting, NNLS, Random Forest) and two transformation based (Mean, Max) fusion methods, resulting in six new dynamic saliency models. Our experimental analysis on two different dynamic saliency benchmark datasets reveal that our models achieve better performance than the individual features. In addition, our learning-based models outperform the state-of-the-art dynamic saliency models.