Implicit scene representations have recently shown promising results for photo-realistic 3D reconstruction and view synthesis from a set of calibrated views. However, their practical application faces several challenges, including unknown camera poses, boundary ambiguity, and observation noise. This paper proposes a novel online scene representation method that simultaneously learns to represent the target scene and estimates camera poses from an RGB-D stream. An implicit scene representation function built on scale-encoded cascaded grids is proposed to represent scenes online from incremental observations. This implicit function is optimized in a reparameterized domain designed to give the scene well-defined boundaries. The cascaded grids are progressively distilled in this reparameterized domain to improve their representational capacity and geometric accuracy. A radiance field deblurring module based on a physical imaging model is further proposed to recover photo-realistic reconstructions under camera motion blur, the dominant component of the observation noise. The proposed method produces sharp, photo-realistic representations of scenes captured under various shooting conditions without requiring known camera poses. Experiments on multiple datasets demonstrate the effectiveness of the proposed method in improving view synthesis and camera tracking for online scene representation.
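On the reparameterized domain: the abstract does not specify the mapping, but a common way to give an unbounded scene a defined boundary is a scene contraction in the style of mip-NeRF 360; the formulation below is an illustrative sketch under that assumption, not necessarily the paper's exact parameterization:
$$
\operatorname{contract}(\mathbf{x}) =
\begin{cases}
\mathbf{x}, & \lVert\mathbf{x}\rVert \le 1,\\[4pt]
\left(2 - \dfrac{1}{\lVert\mathbf{x}\rVert}\right)\dfrac{\mathbf{x}}{\lVert\mathbf{x}\rVert}, & \lVert\mathbf{x}\rVert > 1,
\end{cases}
$$
which maps all of $\mathbb{R}^3$ into a ball of radius 2, so grid-based representations such as the cascaded grids only need to cover a bounded domain.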
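On the physical imaging model behind the deblurring module: a standard formulation (used, e.g., by Deblur-NeRF-style methods) treats a motion-blurred pixel as the exposure-time average of sharp renderings along the camera trajectory; the discrete form below is a hedged sketch in which the exposure time $\tau$, the number of virtual poses $M$, and the poses $T_1,\dots,T_M$ are illustrative assumptions:
$$
\mathbf{B}(\mathbf{p}) \;=\; \frac{1}{\tau}\int_{0}^{\tau} \mathbf{C}\big(\mathbf{p};\,T(t)\big)\,dt \;\approx\; \frac{1}{M}\sum_{i=1}^{M} \mathbf{C}\big(\mathbf{p};\,T_i\big),
$$
where $T(t)$ is the camera pose at time $t$ within the exposure and $\mathbf{C}(\mathbf{p};T)$ is the sharp color rendered from the radiance field at pixel $\mathbf{p}$ under pose $T$. In such formulations, fitting $\mathbf{B}$ to the observed blurry frames lets a sharp radiance field be recovered from motion-blurred input.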