Sign recognition is a challenging problem because signs vary widely across signers and the input information spans multiple modalities. The domain also inherits the challenges of action classification in computer vision, such as variations in illumination and background. In this work, we propose a Siamese Neural Network (SNN) architecture that extracts features from the RGB and depth streams of a sign frame in parallel. We use a pretrained model for the SNN without fine-tuning it on our training data. We then apply global feature pooling to the depth and color features that the SNN generates and feed the concatenation of the selected features to a recurrent neural network (RNN) to discriminate between the signs. We trained our model parameters on the Montalbano dataset and achieved 93.19% test accuracy with ResNet-50 and 91.61% with VGG-16 as the network model.
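The data flow described above (shared feature extractor on both streams, global pooling, concatenation, recurrent classification) can be illustrated with a minimal, standard-library-only sketch. It is a toy stand-in, not the trained model: the names `shared_extractor`, `global_avg_pool`, `rnn_step`, `classify_sign`, `make_params`, and `toy_clip` are hypothetical, the frozen pretrained CNN is replaced by a per-channel scaling with ReLU, global pooling is assumed to be average pooling, the RNN is assumed to be a plain Elman cell, and all sizes and weights are arbitrary placeholders.

```python
import math
import random


def shared_extractor(frame, cnn_weights):
    """Toy stand-in for the frozen, pretrained CNN (assumption).

    Maps an H x W grid of scalars to one feature map per weight.
    The same weights serve both streams, which is the Siamese part.
    """
    return [[[max(0.0, w * px) for px in row] for row in frame]
            for w in cnn_weights]


def global_avg_pool(feature_maps):
    """Collapse each H x W feature map to its mean (assumed pooling type)."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def rnn_step(x, h, W_x, W_h):
    """One Elman RNN update: h' = tanh(W_x x + W_h h) (assumed cell type)."""
    return [math.tanh(sum(a * b for a, b in zip(W_x[j], x)) +
                      sum(a * b for a, b in zip(W_h[j], h)))
            for j in range(len(h))]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classify_sign(rgb_frames, depth_frames, params):
    """Run the per-frame two-stream pipeline and return class probabilities."""
    h = [0.0] * len(params["W_h"])
    for rgb, depth in zip(rgb_frames, depth_frames):
        f_rgb = global_avg_pool(shared_extractor(rgb, params["cnn"]))
        f_depth = global_avg_pool(shared_extractor(depth, params["cnn"]))
        # List concatenation joins the color and depth feature vectors.
        h = rnn_step(f_rgb + f_depth, h, params["W_x"], params["W_h"])
    logits = [sum(a * b for a, b in zip(row, h)) for row in params["W_out"]]
    return softmax(logits)


def make_params(channels=4, hidden=8, num_classes=5, seed=0):
    """Random placeholder weights; every size here is arbitrary."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "cnn": [rng.uniform(-1.0, 1.0) for _ in range(channels)],
        "W_x": matrix(hidden, 2 * channels),
        "W_h": matrix(hidden, hidden),
        "W_out": matrix(num_classes, hidden),
    }


def toy_clip(num_frames, size, rng):
    """Random single-channel frames standing in for a real RGB or depth clip."""
    return [[[rng.random() for _ in range(size)] for _ in range(size)]
            for _ in range(num_frames)]


rng = random.Random(1)
params = make_params()
probs = classify_sign(toy_clip(3, 6, rng), toy_clip(3, 6, rng), params)
print("predicted class index:", probs.index(max(probs)))
```

In this sketch only `W_x`, `W_h`, and `W_out` would be trained, which mirrors the statement that the pretrained extractor is used without fine-tuning; the sketch itself performs no training.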