Due to the widespread use of 3D models in video games and virtual environments, there is growing interest in 3D scene generation, scene understanding, and 3D model retrieval. In this paper, we introduce a data-driven approach to 3D scene generation from a Maximum Entropy (MaxEnt) model selection perspective. Under this model selection criterion, new scenes are sampled by matching a set of contextual constraints extracted from training and synthesized scenes. Starting from a set of random synthesized configurations of objects in 3D, the MaxEnt distribution is iteratively sampled (using Metropolis sampling) and updated until the constraints of the training and synthesized scenes match, indicating that plausible synthesized 3D scenes have been generated. To illustrate the proposed methodology, we use 3D training desk scenes, each composed of the same seven predefined objects arranged with different positions, scales, and orientations. After applying the MaxEnt framework, the synthesized scenes show that the proposed strategy can generate scenes reasonably similar to the training examples without any human supervision during sampling. We note, however, that the approach is not limited to desk scene generation as described here and can be extended to any 3D scene generation problem.
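
The sample-and-update loop summarized above can be illustrated with a minimal toy sketch. It is not the implementation used in this paper: it assumes a one-dimensional "desk" of unit length holding two objects, two hypothetical features (the position of the first object and the distance between the two objects), a model of the form p(x) proportional to exp(lam . f(x)), and a plain gradient update of the weights. All names (TRAIN_SCENES, features, metropolis_sweep, fit_maxent) and all parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical training scenes: (position of object 1, position of object 2)
# on a desk of unit length. These values are made up for illustration.
TRAIN_SCENES = [(0.25, 0.45), (0.30, 0.50), (0.35, 0.50),
                (0.30, 0.55), (0.20, 0.40), (0.40, 0.60)]


def features(scene):
    """Assumed contextual features: position of object 1 and object distance."""
    a, b = scene
    return (a, abs(a - b))


def mean_features(scenes):
    """Average feature vector over a collection of scenes."""
    n = len(scenes)
    feats = [features(s) for s in scenes]
    return [sum(f[k] for f in feats) / n for k in range(2)]


def log_weight(scene, lam):
    """Unnormalized log-probability of the MaxEnt model; -inf off the desk."""
    a, b = scene
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        return float("-inf")
    f = features(scene)
    return lam[0] * f[0] + lam[1] * f[1]


def metropolis_sweep(scenes, lam, rng, step=0.2, n_steps=10):
    """Apply a few Metropolis moves to every synthesized scene."""
    updated = []
    for scene in scenes:
        for _ in range(n_steps):
            proposal = (scene[0] + rng.gauss(0.0, step),
                        scene[1] + rng.gauss(0.0, step))
            log_ratio = log_weight(proposal, lam) - log_weight(scene, lam)
            # Accept with probability min(1, exp(log_ratio)).
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                scene = proposal
        updated.append(scene)
    return updated


def fit_maxent(train, n_synth=200, n_iters=80, lr=5.0, seed=0):
    """Alternate sampling and weight updates until feature means roughly match."""
    rng = random.Random(seed)
    target = mean_features(train)
    # Start from random synthesized configurations.
    synth = [(rng.random(), rng.random()) for _ in range(n_synth)]
    lam = [0.0, 0.0]
    for _ in range(n_iters):
        synth = metropolis_sweep(synth, lam, rng)
        current = mean_features(synth)
        # Move each weight along (training mean - synthesized mean).
        lam = [l + lr * (t - c) for l, t, c in zip(lam, target, current)]
    return lam, synth


if __name__ == "__main__":
    lam, synth = fit_maxent(TRAIN_SCENES)
    print("training feature means:   ", mean_features(TRAIN_SCENES))
    print("synthesized feature means:", mean_features(synth))
    print("learned weights:          ", lam)
```

In this sketch the synthesized scenes only reproduce the statistics captured by the chosen features. Richer constraints (relative orientation, scale, or the ordering of objects, for example) would need additional feature functions.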