2017 ImageNet Classification with Deep Convolutional Neural Networks

Slide 1: ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, University of Toronto, kriz@cs.utoronto.ca
Ilya Sutskever, University of Toronto, ilya@cs.utoronto.ca
Geoffrey E. Hinton, University of Toronto, hinton@cs.utoronto.ca
Ali Albawi, Karrar Alkaabi
Cited by 12013
Advances in Neural Information Processing Systems 25 (NIPS 2012)

Slide 2: Abstract
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Slide 3: 1 - Abstract
To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers, we employed a recently developed regularization method called “dropout” that proved to be very effective.

Slide 4: 2 - Introduction
Current approaches to object recognition make essential use of machine learning methods. Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB [16], Caltech-101/256 [8, 9], and CIFAR-10/100 [12]). But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets.

Slide 5: 2 - Introduction
The new larger datasets include LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories. To learn about thousands of objects from millions of images, we need a model with a large learning capacity, such as a CNN. Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply at large scale to high-resolution images. For this reason, we use GPUs.

Slide 6: 3 - The Architecture

Slide 7: 3 - The Architecture
The architecture of our network is summarized in Figure 2. It contains eight learned layers: five convolutional and three fully-connected. Below, we describe some of the novel or unusual features of our network’s architecture. Sections 3.1-3.4 are sorted according to our estimation of their importance, with the most important first.

Slide 8: 3 - The Architecture

Slide 9: 3.1 - ReLU Nonlinearity
The standard way to model a neuron’s output f as a function of its input x is with $f(x) = \tanh(x)$ or $f(x) = (1 + e^{-x})^{-1}$. Deep convolutional neural networks with ReLUs, which compute $f(x) = \max(0, x)$, train several times faster than their equivalents with tanh units.

Slide 10: 3.1 - ReLU Nonlinearity
Figure 1: A four-layer convolutional neural network with ReLUs (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons (dashed line). The learning rates for each network were chosen independently to make training as fast as possible. No regularization of any kind was employed. The magnitude of the effect demonstrated here varies with network architecture, but networks with ReLUs consistently learn several times faster than equivalents with saturating neurons.
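
One way to see why saturating units slow learning is to compare derivatives. The following is a minimal sketch, not the authors' code: it assumes the usual definition of a ReLU as max(0, x), and the function names are hypothetical. The tanh gradient shrinks toward zero as the input grows, while the ReLU gradient stays at 1 for any positive input.

```python
import math

def tanh_grad(x: float) -> float:
    """Derivative of tanh(x): 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs through, clamps the rest to 0."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Derivative of max(0, x): 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

# Compare the two gradients at increasingly large positive inputs.
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.6f}, relu'={relu_grad(x):.1f}")
```

Gradient descent scales each weight update by these derivatives, so a unit whose derivative is close to zero barely learns.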

Slide 11: 3.2 - Training on Multiple GPUs
A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. Therefore we spread the net across two GPUs. Current GPUs are particularly well-suited to cross-GPU parallelization, as they are able to read from and write to one another’s memory directly, without going through host machine memory.
The parallelization scheme that we employ essentially puts half of the kernels (or neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers. This means that, for example, the kernels of layer 3 take input from all kernel maps in layer 2. However, kernels in layer 4 take input only from those kernel maps in layer 3 which reside on the same GPU.

Slide 12: 3.3 - Local Response Normalization
ReLUs have the desirable property that they do not require input normalization to prevent them from saturating. If at least some training examples produce a positive input to a ReLU, learning will happen in that neuron. However, we still find that the following local normalization scheme aids generalization. Denoting by $a^i_{x,y}$ the activity of a neuron computed by applying kernel i at position (x, y) and then applying the ReLU nonlinearity, the response-normalized activity $b^i_{x,y}$ is given by the expression

$$b^i_{x,y} = a^i_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big(a^j_{x,y}\big)^2 \Big)^{\beta}$$

where the sum runs over n adjacent kernel maps at the same spatial position, N is the total number of kernels in the layer, and k, n, α, and β are hyperparameters.
Response normalization reduces our top-1 and top-5 error rates by 1.4% and 1.2%, respectively. We also verified the effectiveness of this scheme on the CIFAR-10 dataset: a four-layer CNN achieved a 13% test error rate without normalization and 11% with normalization.
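
The normalization can be sketched for a single spatial position as follows. This is an illustrative sketch, not the authors' GPU implementation. The default constants (k = 2, n = 5, alpha = 1e-4, beta = 0.75) are the values reported in the original paper rather than on these slides, and rounding n/2 down to an integer is an assumption.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize activities across adjacent kernel maps at one spatial position.

    `a` holds N non-negative activities, one per kernel map, all taken
    at the same (x, y) location.
    """
    num_maps = len(a)
    half = n // 2  # assumption: n/2 is rounded down
    b = []
    for i in range(num_maps):
        lo = max(0, i - half)
        hi = min(num_maps - 1, i + half)
        # Sum of squared activities over the neighbouring kernel maps.
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        b.append(a[i] / (k + alpha * sq_sum) ** beta)
    return b

# A strongly active neighbour damps the response of the first map.
print(local_response_norm([1.0, 0.0]))
print(local_response_norm([1.0, 100.0]))
```

The second call yields a smaller first value than the first call, which is the competition between neighbouring kernel maps that the scheme is meant to create.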

Slide 13: 3.4 - Overlapping Pooling
Pooling layers in CNNs summarize the outputs of neighboring groups of neurons in the same kernel map. Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap (e.g., [17, 11, 4]).
To be more precise, a pooling layer can be thought of as consisting of a grid of pooling units spaced s pixels apart, each summarizing a neighborhood of size z x z centered at the location of the pooling unit. If we set s = z, we obtain traditional local pooling as commonly employed in CNNs. If we set s < z, we obtain overlapping pooling.
This is what we use throughout our network, with s = 2 and z = 3. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, as compared with the non-overlapping scheme s = 2, z = 2, which produces output of equivalent dimensions.

Slide 14: 3.4 - Overlapping Pooling
We generally observe during training that models with overlapping pooling find it slightly more difficult to overfit.
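
The difference between the two pooling schemes can be sketched with a small max-pooling routine parameterized by window size z and stride s. This is a minimal illustration with hypothetical function names. It assumes no padding, and the 55-element row is only an example length.

```python
def max_pool_1d(row, z, s):
    """Max-pool a 1D list with window size z and stride s (no padding)."""
    return [max(row[i:i + z]) for i in range(0, len(row) - z + 1, s)]

def max_pool_2d(grid, z, s):
    """Apply z x z max pooling with stride s to a 2D list of numbers."""
    rows = [max_pool_1d(r, z, s) for r in grid]               # pool along the width
    cols = [max_pool_1d(list(c), z, s) for c in zip(*rows)]   # pool along the height
    return [list(r) for r in zip(*cols)]                      # transpose back

row = [1, 5, 2, 4, 3]
print(max_pool_1d(row, z=3, s=2))  # overlapping windows: [1, 5, 2] and [2, 4, 3]
print(max_pool_1d(row, z=2, s=2))  # non-overlapping windows: [1, 5] and [2, 4]

# Both schemes give the same output length on a 55-element row.
print(len(max_pool_1d(list(range(55)), 3, 2)), len(max_pool_1d(list(range(55)), 2, 2)))
```

With s = 2 and z = 3, adjacent windows share one column of inputs, yet the output has the same size as with s = 2 and z = 2.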

Slide 15: 3.4 - Overlapping Pooling

Slide 16: 3.5 - Overall Architecture
Now we are ready to describe the overall architecture of our CNN. The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.
The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully-connected layers are connected to all neurons in the previous layer.

Slide 17: 3.5 - Overall Architecture
The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer.

Slide 18: 3.5 - Overall Architecture
The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5x5x48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3x3x256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth convolutional layer has 384 kernels of size 3x3x192, and the fifth convolutional layer has 256 kernels of size 3x3x192. The fully-connected layers have 4096 neurons each.
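
The kernel sizes above can be checked against the 60 million parameters quoted in the abstract with a short tally. This is a rough sketch under stated assumptions: the first layer's shape (96 kernels of size 11x11x3) and the 6x6x256 input to the first fully-connected layer come from the original paper, not from these slides, and the last fully-connected layer is taken to have 1000 outputs, one per class.

```python
# (name, number of kernels, kernel height, kernel width, kernel depth)
CONV_LAYERS = [
    ("conv1", 96, 11, 11, 3),    # assumed from the original paper, not on the slides
    ("conv2", 256, 5, 5, 48),
    ("conv3", 384, 3, 3, 256),
    ("conv4", 384, 3, 3, 192),
    ("conv5", 256, 3, 3, 192),
]
# (name, number of inputs, number of outputs)
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),  # 6x6x256 input size is assumed from the paper
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # assumption: one output per class
]

def count_parameters():
    """Return (weights, biases) summed over all eight learned layers."""
    weights = sum(k * h * w * d for _, k, h, w, d in CONV_LAYERS)
    weights += sum(n_in * n_out for _, n_in, n_out in FC_LAYERS)
    biases = sum(layer[1] for layer in CONV_LAYERS)
    biases += sum(layer[2] for layer in FC_LAYERS)
    return weights, biases

weights, biases = count_parameters()
print(weights, biases, weights + biases)
```

Under these assumptions the total comes to roughly 61 million, in line with the figure in the abstract. The depths of 48 and 192 in layers 2, 4, and 5 are half of the previous layer's kernel count because those kernels see only the maps on their own GPU.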

Slide 19: 3.5 - Overall Architecture

Slide 20: 4 - Reducing Overfitting
We describe the two primary ways in which we combat overfitting.
4.1 - Data Augmentation
We employ two distinct forms of data augmentation. The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 x 224 patches (and their horizontal reflections) from the 256 x 256 images and training our network on these extracted patches.

Slide 21: 4.1 - Data Augmentation
Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks.
[Figure: 224x224 patches extracted from a 256x256 image, together with a horizontal flip of a 224x224 patch.]
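
The first form of augmentation can be sketched as a random crop followed by an optional mirror. This is a minimal illustration, not the authors' pipeline: the function name, the representation of an image as a list of pixel rows, and the 50% flip probability are all assumptions.

```python
import random

def random_crop_flip(image, crop=224, rng=random):
    """Return a random crop x crop patch of `image`, mirrored half the time.

    `image` is a list of rows; each row is a list of pixels.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop + 1)    # random vertical offset
    left = rng.randrange(width - crop + 1)    # random horizontal offset
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:                    # assumed flip probability
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch

# Each "pixel" records its own coordinates so the crop is easy to inspect.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_flip(image, rng=random.Random(0))
print(len(patch), len(patch[0]), patch[0][0])
```

Because the offsets and the flip are redrawn on every call, the network rarely sees exactly the same patch twice.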

Slide 22: 4.1 - Data Augmentation
The second form of data augmentation consists of altering the intensities of the RGB channels in training images.

Slide 23: 4.2 - Dropout
The recently introduced technique called “dropout” [10] consists of setting to zero the output of each hidden neuron with probability 0.5. So every time an input is presented, the neural network samples a different architecture. We use dropout in the first two fully-connected layers of Figure 2. Without dropout, our network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge.
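
Dropout on a layer's outputs can be sketched in a few lines. This is an illustrative sketch with hypothetical function names. The test-time rule of keeping every neuron and multiplying its output by 0.5 is taken from the original paper and is not stated on these slides.

```python
import random

def dropout_train(activations, p_drop=0.5, rng=random):
    """Zero each hidden activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """Keep every neuron at test time, scaling outputs by the keep probability."""
    return [a * (1.0 - p_drop) for a in activations]

hidden = [1.0] * 10
print(dropout_train(hidden, rng=random.Random(0)))  # a random subset is zeroed
print(dropout_test(hidden))                         # every value is halved
```

Each training call draws a fresh mask, which is what it means for the network to sample a different architecture for every input.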

Slide 24: 4.2 - Dropout

Slide 25: 5 - Details of learning
We trained our models using stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. We found that this small amount of weight decay was important for the model to learn. In other words, weight decay here is not merely a regularizer: it reduces the model’s training error.
We initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01. We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1. We initialized the neuron biases in the remaining layers with the constant 0. This initialization accelerates the early stages of learning by providing the ReLUs with positive inputs.
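
The update for a single weight can be sketched as below. This follows the update rule given in the original paper, v <- 0.9 v - 0.0005 lr w - lr g followed by w <- w + v, which is not reproduced on these slides. The function name is hypothetical, and `grad` stands for the gradient averaged over one batch.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One momentum update for a single weight; returns the new (w, v).

    `grad` is the batch-averaged gradient of the loss with respect to w.
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    return w + v_new, v_new

w, v = 1.0, 0.0
for step in range(3):
    # A constant gradient of 2.0 is used purely for illustration.
    w, v = sgd_step(w, v, grad=2.0)
    print(step, round(w, 6), round(v, 6))
```

The weight decay term pulls each weight toward zero in proportion to its current value and the learning rate, independently of the data gradient.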

Slide 26: 5 - Details of learning
We used an equal learning rate for all layers, which we adjusted manually throughout training. The heuristic which we followed was to divide the learning rate by 10 when the validation error rate stopped improving with the current learning rate. The learning rate was initialized at 0.01 and reduced three times prior to termination. We trained the network for roughly 90 cycles through the training set of 1.2 million images, which took five to six days on two NVIDIA GTX 580 3GB GPUs.
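
The learning-rate heuristic can be approximated in code. The authors adjusted the rate by hand, so the automatic rule below is only an assumption about how "stopped improving" might be detected: it cuts the rate as soon as one epoch fails to beat the best validation error so far. The function name and the example error values are hypothetical.

```python
def learning_rates(val_errors, initial_lr=0.01, factor=10.0):
    """Return the learning rate used in each epoch.

    The rate is divided by `factor` whenever an epoch's validation error
    fails to improve on the best value seen so far (an assumed criterion).
    """
    lr, best = initial_lr, float("inf")
    rates = []
    for err in val_errors:
        rates.append(lr)
        if err < best:
            best = err
        else:
            lr /= factor
    return rates

# Made-up validation errors, for illustration only.
print(learning_rates([0.50, 0.40, 0.40, 0.35, 0.36]))
```

Starting from 0.01, three such reductions would bring the rate down to 0.00001.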

Slide 27: 6 - The Dataset
ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images.
On ImageNet, it is customary to report two error rates: top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model.
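
Both error rates are instances of a top-k error: the fraction of examples whose correct label is not among the k classes the model scores highest. A minimal sketch follows, with a hypothetical function name and made-up scores for three classes.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)

# Made-up per-class scores for three examples.
scores = [[0.1, 0.6, 0.3],
          [0.5, 0.2, 0.3],
          [0.2, 0.3, 0.5]]
labels = [1, 2, 0]
print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

Setting k = 1 and k = 5 on the 1000-class outputs gives the two rates reported for the network.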

Slide 28: 6.1 - Qualitative Evaluations

Slide 29: 7 - Results

Slide 30: 7 - Results

Slide 31: Thank you. Any questions?
