Learning weights in perceptrons, Computer Engineering


We will look in detail at the learning method for weights in multi-layer networks in the next chapter. The following description of learning in perceptrons will help explain what is going on in the multi-layer case. We are in a machine learning setup, so we can expect the task to be to learn a target function which categorises examples into categories, given a set of training examples supplied with their correct categorisations. A little thought is needed in order to choose the right way of presenting the examples as input to a set of input units, but, given the simple nature of a perceptron, there is not much choice for the rest of the architecture.

In order to produce a perceptron able to perform our categorisation task, we have to use the examples to train the weights between the input units and the output unit, and to train the threshold. To simplify the routine, we think of the threshold as a special weight coming from a special input node that always outputs 1. Thus, we think of our perceptron like this:

Then, we say that the output from the perceptron is +1 if the weighted sum from all the input units (including the special one) is greater than zero, and -1 otherwise. We see that the weight w0 effectively plays the role of the threshold value. Thinking of the network in this way means we can train w0 in the same way as we train all the other weights.
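The output rule above can be sketched in a few lines of code. This is an illustrative sketch, not code from the text: the weight values are made up, and the threshold is folded in as the special weight w0 attached to a constant input of 1.

```python
# Perceptron output with the threshold treated as a special weight w0
# on a constant input of 1. Weight values here are illustrative only.

def perceptron_output(weights, inputs):
    """Return +1 if the weighted sum (including w0 * 1) exceeds zero, else -1."""
    total = weights[0]  # w0 * 1: the special "threshold" weight
    for w, x in zip(weights[1:], inputs):
        total += w * x
    return 1 if total > 0 else -1

weights = [-0.5, 0.6, 0.2]                  # w0 (threshold weight), w1, w2
print(perceptron_output(weights, [1, 1]))   # 0.6 + 0.2 - 0.5 = 0.3 > 0, so +1
print(perceptron_output(weights, [0, 1]))   # 0.2 - 0.5 = -0.3 <= 0, so -1
```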

The weights are first assigned randomly, and training examples are used one after another to tweak the weights in the network. All the examples in the training set are used, and the whole process (using all the examples again) is iterated until all examples are correctly categorised by the network. The tweaking is known as the perceptron training rule, and works as follows. If a training example, E, is correctly categorised by the network, no tweaking is carried out. If E is misclassified, then each weight is tweaked by adding on a small value, Δ. Suppose we are adjusting weight wi, which lies between the i-th input unit, xi, and the output unit. Given that the network should have calculated the target value t(E) for example E, but in fact calculated the observed value o(E), Δ is calculated as:

Δ = η (t(E) - o(E)) xi

Note that η is a fixed positive constant called the learning rate. Ignoring η for the moment, we see that the value Δ that we add to our weight wi is calculated by multiplying the input value xi by t(E) - o(E). The quantity t(E) - o(E) will be either -2 or +2, because perceptrons output only -1 or +1, and t(E) cannot equal o(E), otherwise we would not be doing any tweaking. We can therefore think of t(E) - o(E) as a movement in a particular numerical direction, i.e., positive or negative. This direction is such that, if the overall sum, S, was too low to get over the threshold and produce the correct categorisation, then the contribution to S from wi * xi will be increased. Conversely, if S is too high, the contribution from wi * xi is reduced. Because t(E) - o(E) is multiplied by xi, if xi is large in magnitude (positive or negative), the change to the weight will be greater. To get a better feel for why this direction of correction works, it is a good idea to do some simple calculations by hand.
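One application of the training rule can be sketched as follows. This is a hedged sketch with made-up weight values; the function name and the default η = 0.1 are illustrative choices, not from the text.

```python
# One application of the perceptron training rule
# Delta = eta * (t(E) - o(E)) * x_i, applied to every weight
# whenever the example is misclassified. Values are illustrative.

def train_on_example(weights, inputs, target, eta=0.1):
    """Apply the perceptron training rule once; x0 = 1 carries the threshold."""
    xs = [1] + list(inputs)                 # prepend the constant input
    total = sum(w * x for w, x in zip(weights, xs))
    observed = 1 if total > 0 else -1
    if observed != target:                  # tweak only on a mistake
        weights = [w + eta * (target - observed) * x
                   for w, x in zip(weights, xs)]
    return weights

w = [0.0, 0.5, -0.5]
w = train_on_example(w, [1, 1], target=1)   # sum = 0, output -1, target +1
print(w)  # each weight moved up by eta * 2 * x_i
```

Because t(E) - o(E) = 2 here, every weight with a positive input moves upwards, pushing the sum towards the correct side of the threshold.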

 


η simply controls how far the correction should go at a time, and is usually set to a fairly low value, e.g., 0.1. The weight learning problem can be seen as finding the global minimum error, calculated as the proportion of mis-categorised training examples, over a space where all the weight values can vary. Therefore, it is possible to move too far in a direction and improve one particular weight to the detriment of the overall sum: while the sum may work for the training example being looked at, it may no longer be a good value for categorising all the examples correctly. For this reason, η restricts the amount of movement possible. If a large movement is genuinely required for a weight, this will happen over a series of iterations through the example set. Sometimes, η is set to decay as the number of such iterations through the whole set of training examples increases, so that the weights move more slowly towards the global minimum in order not to overshoot in one direction. This kind of gradient descent is at the heart of the learning algorithm for multi-layered networks, as discussed in the next chapter.
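The whole procedure described above, random initial weights, repeated passes over the training set, stopping once every example is categorised correctly, can be sketched as below. The logical AND task and the epoch cap are illustrative choices, not from the text; AND is linearly separable, so the perceptron convergence theorem guarantees the loop terminates.

```python
# Sketch of full perceptron training: iterate over the training set
# until no example is misclassified. Task (logical AND), learning
# rate, and epoch cap are illustrative assumptions.
import random

def output(weights, xs):
    return 1 if sum(w * x for w, x in zip(weights, xs)) > 0 else -1

def train(examples, eta=0.1, max_epochs=1000):
    n = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            xs = [1] + list(inputs)         # x0 = 1 carries the threshold
            o = output(weights, xs)
            if o != target:
                mistakes += 1
                weights = [w + eta * (target - o) * x
                           for w, x in zip(weights, xs)]
        if mistakes == 0:                   # every example correct: stop
            break
    return weights

# Logical AND: output +1 only when both inputs are 1.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w = train(data)
print([output(w, [1] + list(x)) for x, _ in data])  # -> [-1, -1, -1, 1]
```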

Perceptrons with step functions have limited abilities when it comes to the range of concepts that can be learned, as discussed in a later section. One way to improve matters is to replace the threshold function with a linear unit, so that the network outputs a real value, rather than just -1 or +1. This enables us to use another rule, called the delta rule, which is also based on gradient descent. We do not look at this rule here, because the backpropagation learning method for multi-layer networks is similar.

