Forward and back substitution algorithms are widely used for solving linear systems of equations after performing LU decomposition on the coefficient matrix. They are also essential in the implementation of high performance preconditioners which improve the convergence properties of the various iterative methods. In this paper, we describe an efficient approach to implementing forward and back substitution algorithms on a GPU and provide the implementation details of these algorithms on a Modified Incomplete Cholesky Preconditioner for the Conjugate Gradient (CG) algorithm. The resulting forward and back substitution algorithms are then used on a Modified Incomplete Cholesky Preconditioned Conjugate Gradient method to solve the sparse, symmetric, positive definite and linear systems of equations arising from the discretization of three dimensional finite difference ground-water flow models. By utilizing multiple threads, the proposed method yields speedups up to 60 times on GeForce GTX 280 compared to CPU implementation and up to 4.8 times speedup compared to cuSPARSE library function optimized for GPU by NVIDIA.