- Newsgroups: sci.math
- Path: sparky!uunet!munnari.oz.au!cs.mu.OZ.AU!psm
- From: psm@mullian.ee.mu.OZ.AU (Phil Malin)
- Subject: Taking derivative w.r.t. a matrix...
- Message-ID: <psm.727605195@murlibobo>
- Sender: news@cs.mu.OZ.AU
- Organization: Computer Science, University of Melbourne, Australia
- Date: Thu, 21 Jan 1993 08:33:15 GMT
- Lines: 55
-
- Hi all.
-
- Previously someone posted a question about taking the derivative of
- a scalar function of a matrix w.r.t. the matrix, more precisely
-
- if
- f=x^{T}Ax (using LaTeX convention)
- then
- df/dA=xx^{T}
-
- which is more apparent if we consider the problem in the following form:
-
- f=sum_{i,j} a_{ij}x_{i}x_{j}
- so
- @f/@a_{ij}=x_{i}x_{j}
-
- where @ is the partial derivative.
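-
- As a quick numerical sanity check (a sketch, not a proof), the Python below
- compares a central-difference estimate of @f/@a_{ij} with x_{i}x_{j}. The
- names quad_form and numeric_grad and the values of x and A are illustrative
- assumptions only.

```python
# Finite-difference check that @f/@a_ij = x_i x_j for f = x^T A x.
# The values of x and A are arbitrary illustrative numbers.

def quad_form(A, x):
    """Return f = sum_{i,j} a_ij x_i x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def numeric_grad(A, x, eps=1e-6):
    """Central-difference estimate of @f/@a_ij for every (i, j)."""
    n = len(x)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            up = [row[:] for row in A]   # copy of A with a_ij nudged up
            dn = [row[:] for row in A]   # copy of A with a_ij nudged down
            up[i][j] += eps
            dn[i][j] -= eps
            grad[i][j] = (quad_form(up, x) - quad_form(dn, x)) / (2 * eps)
    return grad

x = [1.0, -2.0, 3.0]
A = [[2.0, 0.5, -1.0],
     [0.0, 1.5, 4.0],
     [3.0, -2.0, 0.5]]

numeric = numeric_grad(A, x)
analytic = [[xi * xj for xj in x] for xi in x]  # outer product x x^T
max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(3) for j in range(3))
print(max_err < 1e-6)
```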
-
- Even more previously I posted a question along the same lines but using
- tensors of rank two (or even higher) rather than matrices, so I would be
- taking the derivative of a scalar function w.r.t. a tensor of order
- greater than one. Basically I was told (in a nice way) that it did not
- make sense :-) Being an engineer, I find the above formulation intuitive,
- but I was wondering whether it is formally correct.
- To get to the heart of the matter, I'll briefly explain what I'm on about:
-
- I have a function
-
- y^{i}=f(z^{j})
- z^{j}=w^{j}_{k}x^{k}+b^{j}
-
- which represents a simplified neuron. I could use matrices and summations
- like everyone else but I think it looks nicer if I use this shorthand
- notation by defining the matrix as a (1,1) tensor, etc. The problem arises
- when I need to compute the partial derivative of y^{i} w.r.t. the tensor
- w^{i}_{j}.
- Again, intuitively (and consistent with everyone else's results) I obtain
-
- @y^{i}/@w^{p}_{q}=f'^{i}_{p}()x^{q}
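-
- The index pattern above can at least be checked numerically. The sketch
- below assumes an element-wise f = tanh (my choice for illustration, so that
- f'^{i}_{p} = delta^{i}_{p}(1 - tanh^2(z^{p}))) and compares f'^{i}_{p}x^{q}
- with a central-difference estimate. All names and values are hypothetical.

```python
import math

def forward(W, b, x):
    """Return (y, z) with z^j = w^j_k x^k + b^j and y^i = tanh(z^i)."""
    z = [sum(W[j][k] * x[k] for k in range(len(x))) + b[j]
         for j in range(len(b))]
    return [math.tanh(zj) for zj in z], z

def analytic_deriv(W, b, x):
    """D[i][p][q] = f'^i_p x^q, where f'^i_p is diagonal for tanh."""
    _, z = forward(W, b, x)
    m, n = len(b), len(x)
    return [[[(1.0 - math.tanh(z[p]) ** 2) * x[q] if i == p else 0.0
              for q in range(n)]
             for p in range(m)]
            for i in range(m)]

def numeric_deriv(W, b, x, eps=1e-6):
    """Central-difference estimate of @y^i/@w^p_q, stored as D[i][p][q]."""
    m, n = len(b), len(x)
    D = [[[0.0] * n for _ in range(m)] for _ in range(m)]
    for p in range(m):
        for q in range(n):
            up = [row[:] for row in W]
            dn = [row[:] for row in W]
            up[p][q] += eps
            dn[p][q] -= eps
            y_up, _ = forward(up, b, x)
            y_dn, _ = forward(dn, b, x)
            for i in range(m):
                D[i][p][q] = (y_up[i] - y_dn[i]) / (2 * eps)
    return D

W = [[0.2, -0.4, 0.1],
     [0.5, 0.3, -0.2]]
b = [0.1, -0.3]
x = [1.0, 2.0, -1.5]

A_d = analytic_deriv(W, b, x)
N_d = numeric_deriv(W, b, x)
max_err = max(abs(A_d[i][p][q] - N_d[i][p][q])
              for i in range(2) for p in range(2) for q in range(3))
print(max_err < 1e-6)
```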
-
- My question is - is this notation formally correct? It seems obvious but
- that's not a good enough reason to accept it.
-
- I might also add that sometimes this summation convention is a bit annoying.
- Normally @y^{i}/@z^{j}=0 when i!=j (each y^{i} depends only on its own
- z^{i}), but if I write f'^{i}_{i} I get a summation which I don't want.
- Is there any way around this? Maybe
- the bottom line is not to use the tensor formulation. Maybe (and this
- is the most probable reason) my formulation is incorrect to begin with.
- But I like to think that there is an elegant and formally correct expression
- of this problem.
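-
- To make the summation issue concrete, here is a small sketch with a made-up
- diagonal f'^{i}_{j}. It shows the difference between the implied sum over a
- repeated index and the separate diagonal entries I actually want. The
- numbers in jac are hypothetical.

```python
# Hypothetical 3x3 Jacobian f'^i_j of an element-wise activation:
# only the diagonal entries are non-zero.
jac = [[0.9, 0.0, 0.0],
       [0.0, 0.5, 0.0],
       [0.0, 0.0, 0.1]]

# Summation convention: the repeated index in f'^i_i is summed (the trace).
trace = sum(jac[i][i] for i in range(3))

# What is wanted instead: the diagonal entries kept separate (no sum on i).
diagonal = [jac[i][i] for i in range(3)]

print(trace)
print(diagonal)
```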
-
- Any help (and thoughts) would be appreciated.
-
- Phil Malin.
- psm@mullian.ee.mu.oz.au
-
-