Differential deep learning analysis (DDLA) was proposed at TCHES 2019 as a side-channel attack (SCA) that applies deep learning techniques in non-profiled scenarios. In DDLA, the adversary sets the LSB or MSB of the intermediate value in the encryption process, computed for each key candidate, as the ground-truth label and trains a deep neural network (DNN) with power traces as input. The adversary observes metrics such as loss and accuracy during DNN training and estimates that the key corresponding to the best-fitting DNN is correct. One disadvantage of DDLA is the heavy computation time for training the DNN models, because the number of required models equals the number of key candidates, which is 256 per byte in the case of AES; therefore, 4,096 DNNs are required to reveal a 16-byte key. Furthermore, the DNN models must be retrained whenever the adversary changes the ground-truth label function from LSB to another label such as MSB or Hamming weight (HW). We propose a new deep-learning-based SCA in a non-profiled scenario to solve these problems. Our core idea is to extract features of the leakage waveforms using a DNN. The adversary reveals the correct keys by conducting cluster analysis on the feature vectors that the DNN extracts from the power traces. We name this method CA-SCA (cluster-analysis-based side-channel attack); it has the advantage that only one DNN needs to be trained to reveal all key bytes. In addition, once the DNN is trained, multiple label functions can be tested without the additional cost of training further DNNs. We provide four case studies of attacks against AES, covering two software implementations and two hardware implementations. For the software implementations, we present a method that trains the DNN efficiently using a concatenated dataset. For the hardware implementations, we introduce multitask learning to exploit the Hamming distance leakage model.
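As a concrete illustration of the labeling step underlying both DDLA and CA-SCA, the sketch below computes LSB, MSB, or HW labels from the first-round AES S-box output for a given key-candidate byte. The function names and the brute-force S-box construction are our own illustration, not code from the paper.

```python
# Illustrative sketch (not the paper's code): ground-truth labels for a
# non-profiled attack on the first AES round, where the targeted
# intermediate value is SBOX[plaintext_byte ^ key_candidate].

def _xtime(a):
    """Multiply by x in GF(2^8) with the AES reduction polynomial."""
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else (a << 1)

def _gf_mul(a, b):
    """Carry-less multiplication in GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = _xtime(a)
        b >>= 1
    return p

def _sbox_entry(x):
    """AES S-box: multiplicative inverse followed by the affine map."""
    inv = 0 if x == 0 else next(y for y in range(256) if _gf_mul(x, y) == 1)
    s = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8))
               ^ (0x63 >> i)) & 1
        s |= bit << i
    return s

SBOX = [_sbox_entry(x) for x in range(256)]

# Candidate label functions on the intermediate value v.
LABEL_FNS = {
    "LSB": lambda v: v & 1,
    "MSB": lambda v: (v >> 7) & 1,
    "HW": lambda v: bin(v).count("1"),
}

def labels(plaintext_bytes, key_candidate, label_name="LSB"):
    """Ground-truth labels for one key-candidate byte (0..255)."""
    fn = LABEL_FNS[label_name]
    return [fn(SBOX[p ^ key_candidate]) for p in plaintext_bytes]
```

In DDLA, one DNN per key candidate would be trained against such labels; in CA-SCA, a single DNN extracts feature vectors once, and label functions like these are only applied afterwards during cluster analysis, so switching from LSB to HW requires no retraining.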
The results show that, owing to the efficient learning enabled by these methods, the proposed method requires fewer waveforms than DDLA to reveal all key bytes. Comparing the computation time to process the same number of waveforms, the proposed method requires only about 1/75 and 1/25 of the time when attacking the software and hardware implementations, respectively, due to the significant reduction in the number of models to be trained.