We consider a wireless sensor network powered by an energy beacon, where sensors use the harvested energy to send their measurements to the sink. The aim of the system is to estimate an unknown signal over the area of interest as accurately as possible. We investigate optimal energy beamforming at the energy beacon and optimal transmit power allocation at the sensors under non-linear energy harvesting models. We use a deep reinforcement learning (RL) approach based on multi-layer neural networks. We illustrate that RL can approach the optimum performance without explicitly forming a system model, but that it suffers from slow convergence. We also quantify the importance of the number of antennas at the energy beamformer and the number of sensors.