Web Log Data Reduction Technology based on Parallel Processing in Distributed File System


  • Jeong-Joon Kim*


The wavelet, one of the known summary construction techniques, has been applied to feature extraction for multimedia
data. The wavelet histogram is a summary technique that grafts the wavelet onto the histogram, a typical
summary technique used in database query optimization and approximate query processing.
The wavelet histogram, which combines the merits of the wavelet and the histogram, can generate a lossless optimal data
summary of the original data. In existing studies, more than one MapReduce job was needed to construct the local
wavelet histogram of the partial data stored in each node. In addition, it took a long time to construct the global
wavelet histogram, which is the combination of all distributed local wavelet histograms. Because the error
bound for data reconstructed from the wavelet histogram was not considered, the error of the reconstructed
data could not be controlled beforehand. In this thesis, we developed a wavelet histogram construction system
that can construct a wavelet histogram quickly with a single MapReduce job. Since the error bound can
be set before the construction of the wavelet histogram, we can keep the error of data reconstructed from the wavelet
histogram within the error bound. Finally, the efficiency of our wavelet histogram construction system was
demonstrated by comparing it with other systems.
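
As an illustration of the idea summarized above, the following is a minimal sketch and not the system described in this thesis. It assumes an orthonormal Haar wavelet, a key domain whose size is a power of two, and an error bound measured as the L2 norm of the reconstruction error; all function names are hypothetical. Because the Haar transform is linear, the local coefficients computed on each partition can be summed into global coefficients in a single map and reduce pass, and because the transform is orthonormal, the L2 error of the reconstruction equals the energy of the dropped coefficients, so the bound can be fixed before construction.

```python
import math


def local_histogram(keys, domain_size):
    """Map-side step: count key frequencies in one partition."""
    counts = [0] * domain_size
    for key in keys:
        counts[key] += 1
    return counts


def haar_transform(values):
    """Orthonormal Haar transform; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    root2 = math.sqrt(2)
    while length > 1:
        half = length // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:length] = avg + diff
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Inverse of haar_transform."""
    out = list(coeffs)
    root2 = math.sqrt(2)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / root2)
            merged.append((a - d) / root2)
        out[:2 * length] = merged
        length *= 2
    return out


def merge_local_coefficients(local_coeffs):
    """Reduce-side step: sum local coefficient vectors (valid by linearity)."""
    return [sum(column) for column in zip(*local_coeffs)]


def threshold_with_bound(coeffs, error_bound):
    """Zero the smallest coefficients while the L2 error stays within error_bound."""
    budget = error_bound ** 2
    kept = list(coeffs)
    dropped_energy = 0.0
    for idx in sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i])):
        energy = coeffs[idx] ** 2
        if dropped_energy + energy > budget:
            break
        dropped_energy += energy
        kept[idx] = 0.0
    return kept


if __name__ == "__main__":
    # Three hypothetical partitions of log keys drawn from the domain {0, 1, 2, 3}.
    partitions = [[0, 1, 1, 3], [2, 2, 3, 3, 3], [0, 3]]
    local = [haar_transform(local_histogram(p, 4)) for p in partitions]
    summary = threshold_with_bound(merge_local_coefficients(local), error_bound=1.0)
    print(inverse_haar(summary))
```

In this sketch the number of retained coefficients is a consequence of the chosen bound rather than an input, which is one simple way to make the reconstruction error controllable in advance.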