The objective of this project was to determine whether the diagnosis of Alzheimer’s disease (AD) from EMR data alone (without relying on diagnostic imaging) could be significantly improved by applying clinical domain knowledge to data preprocessing and positive patient cohort selection, rather than by setting naive filters. Data were extracted from a repository of heterogeneous ambulatory EMR data collected from primary care medical offices across the United States. Selected Clinically Relevant Positive (SCRP) datasets were used as inputs to a recurrent deep neural network (RNN) model to predict whether a given patient may develop AD. The RNN model, which used data relevant to AD, performed significantly better when learning from the SCRP dataset than when learning from naively selected datasets. Integrating qualitative medical knowledge for dataset selection with deep learning techniques provided a mechanism for significant improvement of AD prediction.