Recognizing group activities from still images is a challenging problem, since images lack the motion and temporal information that make it easier to differentiate foreground from background. Nevertheless, images present rich spatial content that can be effectively leveraged for better feature representation and recognition. In this paper, we propose a two-stream convolutional neural network approach for group activity recognition. Our approach uses person segment mask images to guide the feature learning process. The method is capable of inferring group relations without the need for bottom-up approaches or low-level annotations. To this end, we utilize three ways of fusing RGB and person segment mask feature maps. Experimental results demonstrate that person mask guidance complements RGB feature learning, and that the resulting approach outperforms previous methods by a large margin.
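
To make the idea of fusing RGB and person segment mask feature maps concrete, the following is a minimal sketch in plain Python. The abstract does not state which three fusion schemes are used, so the three shown here (element-wise sum, element-wise product, and channel concatenation) are assumptions chosen because they are common in two-stream networks, not the confirmed design. The function names, the nested-list layout `[channels][height][width]`, and the toy values are all hypothetical.

```python
# Hypothetical illustration only: feature maps are nested lists shaped
# [channels][height][width]. The three fusion schemes are generic
# assumptions, not the specific schemes used in the paper.

def _elementwise(a, b, op):
    """Apply a binary op position-wise to two feature maps of equal shape."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same number of channels")
    return [
        [[op(x, y) for x, y in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]

def fuse_sum(rgb, mask):
    """Assumed scheme 1: add the two streams position by position."""
    return _elementwise(rgb, mask, lambda x, y: x + y)

def fuse_product(rgb, mask):
    """Assumed scheme 2: multiply, so mask features gate the RGB features."""
    return _elementwise(rgb, mask, lambda x, y: x * y)

def fuse_concat(rgb, mask):
    """Assumed scheme 3: stack the channels of both streams."""
    return rgb + mask  # list concatenation along the channel axis

# Toy single-channel 2x2 feature maps (made-up values).
rgb_features = [[[1.0, 2.0], [3.0, 4.0]]]
mask_features = [[[0.0, 1.0], [1.0, 0.0]]]

summed = fuse_sum(rgb_features, mask_features)
gated = fuse_product(rgb_features, mask_features)
stacked = fuse_concat(rgb_features, mask_features)
```

In this sketch, the sum and product keep the channel count unchanged, while concatenation doubles it and leaves it to later layers to learn how the two streams interact.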