Region covariance is a robust feature descriptor that combines even the simplest image features, such as intensity and gradient, into a well-performing descriptor for image regions. Beyond its robustness, it requires many identical, computationally heavy operations on different parts of the input data, which makes it a good candidate for parallel execution. In this manuscript, we present a real-time parallel implementation of region covariance which, to the best of our knowledge, is the first in the literature. We compared it against existing implementations and found that it runs 6 times faster than a vectorized, parallel CPU implementation, which provides the speedup necessary for real-time processing. Additionally, we improved the existing integral image calculation method on CUDA, reducing its memory usage by 50% and achieving the fastest computation speed among existing solutions, and we improved the covariance matrix comparison by using a distance metric that is lightweight to compute and easy to implement.