Rivest-Shamir-Adleman (RSA) is one of the most widely used algorithms in public-key cryptography. Software implementations of RSA have a very slow ciphering rate, so dedicated hardware is the only reasonable solution in applications where performance is the key factor. To speed up modular multiplication and squaring, bit-level systolic arrays are combined with Montgomery's modular multiplication algorithm to form the core of the modular exponentiation operation. During modular exponentiation, the systolic squaring array operates in parallel with the systolic multiplication array. The novel contribution of this paper is a systolic array cell design with performance improved by up to 20%, with the cells arranged in a single-row organization. The final RSA design is configurable and supports both encryption and decryption. A 1024-bit RSA design is implemented for the Xilinx Virtex FPGA and a 0.7 μm ASIC process. (C) 2004 Elsevier B.V. All rights reserved.
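
The arithmetic behind the design can be illustrated with a minimal software sketch. It is not the hardware described here: the systolic cells and the single-row organization are not modeled, and the function names `mont_mul` and `mont_exp` are hypothetical. The sketch assumes a radix-2 (bit-serial) form of Montgomery multiplication and a right-to-left binary exponentiation, chosen because in that method the squaring and the multiplication of each step do not depend on each other and could therefore run in parallel.

```python
def mont_mul(a, b, n, k):
    """Bit-serial Montgomery product: a * b * 2^(-k) mod n.

    Assumes n is odd, n < 2^k, and 0 <= a, b < n.
    """
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1
        # q is chosen so that s + a_i*b + q*n is even and can be halved exactly.
        q = (s + a_i * b) & 1
        s = (s + a_i * b + q * n) >> 1
    # The loop keeps s < 2n, so one conditional subtraction is enough.
    return s - n if s >= n else s


def mont_exp(m, e, n):
    """Compute m^e mod n (n odd, 0 <= m < n) using Montgomery products."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)          # R^2 mod n, with R = 2^k
    s_bar = mont_mul(m, r2, n, k)  # m in Montgomery form (m * R mod n)
    c_bar = mont_mul(1, r2, n, k)  # 1 in Montgomery form (R mod n)
    while e:
        # Both products below read the same old s_bar, so they are
        # independent; hardware could evaluate them at the same time.
        if e & 1:
            c_bar = mont_mul(c_bar, s_bar, n, k)  # multiplication
        s_bar = mont_mul(s_bar, s_bar, n, k)      # squaring
        e >>= 1
    return mont_mul(c_bar, 1, n, k)  # convert back from Montgomery form


if __name__ == "__main__":
    # Toy key for illustration only: n = 61 * 53, e * d = 1 mod 3120.
    n, e, d = 3233, 17, 2753
    c = mont_exp(65, e, n)
    print(c == pow(65, e, n), mont_exp(c, d, n) == 65)
```

The same `mont_exp` routine serves encryption and decryption, since the two differ only in the exponent supplied.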