|
Convolution is an operation that takes two functions as input and produces a single function as output (much like addition or multiplication of functions). The method of combining these functions is defined as
$$ (f * g)(x) = \int_{\Omega} f(y)\, g(x - y)\, dy $$
where $x$ and $y$ both range over all of $ \Omega $. Here I will try to present convolution first as a very convenient way of solving certain problems and then, abstracting from that, as a mathematical operation important in its own right.
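To make the definition a little more concrete, here is a minimal numerical sketch. It assumes $ \Omega $ is the real line and approximates the integral by a midpoint Riemann sum; the names `box` and `convolve_at`, the integration range, and the step count are illustrative choices of mine, not part of the definition.

```python
def box(t):
    """Indicator of the unit interval [0, 1): a simple test function."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def convolve_at(f, g, x, lo=-5.0, hi=5.0, n=10000):
    """Approximate (f * g)(x) = integral of f(y) g(x - y) dy by a midpoint Riemann sum."""
    dy = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * dy  # midpoint of the k-th subinterval
        total += f(y) * g(x - y)
    return total * dy


# Convolving two unit boxes gives a triangle supported on [0, 2] that peaks at x = 1.
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(x, round(convolve_at(box, box, x), 3))
```

The range `[lo, hi]` only has to cover the region where both functions are nonzero; outside it the integrand vanishes anyway.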
Propagation
We will start our investigation by looking at some waves. Imagine you're standing on the edge of a pond, and you throw a small stone in. The stone will cause a ripple to travel outwards across the surface:
What if the stone were heavier? Intuitively, we would expect a bigger ripple:
Or what about a lighter stone?
All three of the ripple patterns above share the same general shape and differ only in their magnitude. How do we go about expressing this observation mathematically? First, let's abstract a bit: if we look at the impacting stone as our input, and the resulting ripple as our output, we then have a process to model:
stone → ripple
We can express the dependence of the ripple's size on the weight of the stone by saying that the output scales linearly with the input: if the weight of the stone is multiplied by some factor c, the strength of the ripple is multiplied by the same factor.
c · stone → c · ripple
This property lets us see that there is a "special" stone we could study: that of unit weight. If we know the ripple caused by this stone, we can find the ripple caused by any stone (lighter or heavier) just by scaling by the proper amount. If we think of this stone as appropriately "small" (technically, a point), then mathematically we can model it by an impulse. The wave caused by this unit-weight pebble is called the impulse response, a relationship we can throw into symbols as follows:
$$ \delta \longrightarrow h $$
Here $ \delta $ is the impulse and $ h $ symbolizes the ripple. So far, just by knowing $ h $, we are able to find the output that results from an input of any magnitude. What else can we do with it? Well, a lot, it turns out. For one, so far we have been dropping our stones at the (mathematical) origin, but we intuitively expect the shape of our ripple to look the same no matter where we drop the rock. We can express this by saying that if we translate the impulse, we translate the response:
$$ \delta(r - r_0) \longrightarrow h(r - r_0) $$
Here $r$ denotes position and $r_0$ the direction and distance of the shift: whichever way the impulse moves, the response moves the same way.
What if we drop two stones in at once, but at different locations?
The wave caused by two stones is just the sum of the waves produced by each stone. In math-speak, with $r_1$ and $r_2$ the two points of impact:
$$ \delta(r - r_1) + \delta(r - r_2) \longrightarrow h(r - r_1) + h(r - r_2) $$
Together with the scaling property from above, this lets us say our ripples are linear. To clear up notation from here on out, I'm going to replace the arrow with a function, $w$, so that instead of saying
$$ \delta \longrightarrow h $$
we will say
$$ w(\delta) = h $$
Let's say now that we have a handful of small pebbles of all different weights. We throw them into the water and they spread out, impacting at all different locations. How are we to model the complex ripple that will result? Well, upon impact, the surface just experiences a sum of stones:
$$ \text{impact} = \text{stone}_1 + \text{stone}_2 + \cdots + \text{stone}_n $$
And each stone can be viewed as a scaled and translated version of our unit stone ($ \delta $). Writing $c_i$ for the weight of the $i$th stone and $r_i$ for its point of impact, this lets us say
$$ \text{impact} = \sum_i c_i\, \delta(r - r_i) $$
Now, what does the resulting wave look like? We can just apply our "wave operator" $w$ to the impact:
$$ \text{wave} = w(\text{impact}) $$
which we can phrase more mathematically as
$$ \text{wave} = w\Big( \sum_i c_i\, \delta(r - r_i) \Big) $$
We are now in a position to use our previous knowledge of the linearity of our "wave operator" to write this more explicitly: we know the wave of a sum is the sum of waves, and the wave of a scalar multiple is the scaled wave of the impulse, so we can go ahead and express this symbolically:
$$ \text{wave} = \sum_i c_i\, w\big( \delta(r - r_i) \big) $$
Now we have expressed the resultant wave as just a scaled sum of various "unit waves", which we already have an expression for: $h$. Assuming we know the form of this function $h$ (the impulse response), we know the full form of our solution for the handful of pebbles:
$$ \text{wave} = \sum_i c_i\, h(r - r_i) $$
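The handful-of-pebbles recipe (scale the unit ripple by each stone's weight, shift it to the point of impact, add everything up) can be sketched in a few lines. This is only an illustration under stated assumptions: the pond is reduced to a one-dimensional cross-section, the impulse response `h` is a made-up damped ripple rather than a real solution of the wave problem, and the pebble weights and positions are invented.

```python
import math


def h(r):
    """Hypothetical impulse response: a damped ripple centred on the origin."""
    return math.exp(-abs(r)) * math.cos(4.0 * r)


def wave(pebbles, r):
    """Superpose scaled, shifted copies of h: the sum of c_i * h(r - r_i)."""
    return sum(c * h(r - r_i) for c, r_i in pebbles)


# Each pebble is (weight c_i, impact position r_i); the values are made up.
pebbles = [(1.0, -2.0), (0.5, 0.0), (2.0, 3.0)]
profile = [wave(pebbles, 0.5 * k) for k in range(-10, 11)]
print([round(v, 3) for v in profile])
```

Nothing in `wave` depends on the particular shape of `h`; swapping in a different impulse response changes the ripple but not the bookkeeping.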
What have we learned here? Well, assuming we know what wave propagates from a unit impulse at the origin, we are able to figure out what the wave looks like for a collection of various impulses all over the place. Given a difficult problem (what does the water's surface look like when you throw a handful of pebbles in), we were able to reduce it to a much simpler problem (what happens when you apply a unit impulse at the origin), and from the answer to that question construct the answer to our original problem.
In fact, we can significantly expand on this observation, but it will be easier if we do it at a slightly higher level of abstraction. What we have really been doing above is solving the problem of
$$ w(\text{input}) = \text{output} $$
where our evolution function $w$ is both linear and translation-invariant (meaning we can break the input up as a sum of simpler parts if we'd like, and no matter where an impulse occurs the response looks the same). But what if our input isn't a simple sum of impulses? What if the input is a continuous function? Such a case could easily arise in our wave analogy if, instead of dropping a pointlike pebble, we dropped in an extended object, like a cardboard box or a rock. If we would like to use the line of reasoning above, we need some way to break this complex input into impulses:
If we can manage to do this, then we can solve our problem simply: we just need to apply $w$ to each impulse individually, and sum the results!
To do this, however, we are going to switch to a different physical analogy (don't worry, we will come back and solve the wave problem in a bit!). The reason for the switch is just to simplify the problem at hand: with waves we have to worry about the time dependence of our response as it travels through space, which doesn't affect the nature of the problem at all. After all, off in abstract-land all we are doing is dealing with problems of the form
$$ w(\text{input}) = \text{output} $$
Instead of water waves, let's say we are going to take a picture. The setup is something like this: an object to be photographed, the camera lens system, and a piece of film.
The goal of our camera is to take light from the original object and deposit it on the film, so we can think of our camera as a function from object to film. If we call the object to be photographed the Source, the function to the image the Camera, and the film the Photograph, we can reduce this to a simpler schematic diagram.
If we shine a bright light at the camera, we expect a bright image; if we shine a dim light, we expect a dim image. If we shine two lights, we expect two images. So, we can say that our camera function is linear, much like the waves. We can symbolize this relationship
If our camera is in focus, we expect to see on the image exactly what was around in the real world. That is, we expect a 1-1 correspondence between the source plane and the photograph.
What if our camera is out of focus? We know from experience that this means what appears to be a single point in the source (think of a star) is a smudged blob on our photo. Thus we no longer have a 1-1 correspondence: single source points spread their influence to multiple image points:
To make things easier to keep track of, we will let $h(x,y)$ stand for the blurry image of a star that was originally centered on the source screen (i.e. at $(0,0)$), after our crappy camera imaged it. Since this star is rather point-like, we will treat it as an impulse and, writing $C$ for the camera function, say
$$ C(\delta(x, y)) = h(x, y) $$
or, more succinctly,
$$ C(\delta) = h $$
In exact analogy with our wave example, let's say now we take our camera and image the night sky. The source plane (the cosmos) contains a multitude of stars, each of a different brightness. What will our image look like? Well, just knowing how a unit impulse at the origin is imaged, we can figure it out! Viewing each star as a scaled and translated unit impulse, our image will be a collection of scaled and translated blurs.
We can decompose the night sky into a sum of these modified impulses, and then distribute our imaging function over the sum, blurring each star individually and adding the results.
Symbolically, say
$$ \text{Source}(x, y) = \sum_i m_i\, \delta(x - x_i, y - y_i) $$
where $m_i$ is the brightness of each star and $(x_i, y_i)$ is its spatial location.
If we want to determine what our photograph will look like, we can say
$$ \text{Photograph} = C(\text{Source}) = \sum_i m_i\, C\big( \delta(x - x_i, y - y_i) \big) $$
And again, we already know how $C$ affects a unit impulse: it turns it into $h$! This gives us our solution:
$$ \text{Photograph}(x, y) = \sum_i m_i\, h(x - x_i, y - y_i) $$
So far we have come up with two seemingly different problems, both of which have a simple solution if we first understand how an "impulse" is propagated through them. Now let's take a picture of a portion of the sky with even more stars in it: for example, this small globular cluster which orbits our home galaxy:
We can apply the exact same reasoning as above here: each star is like an impulse, so to see how each star will look on our final picture, we simply take that impulse, propagate it through the camera until it becomes our response function $h$, and then add all of them back together. The resulting blurry photo looks like this:
It's time to make sure the mathematical formalism of this process really makes sense: at the location of each impulse (star), we apply our "blur" function, which models our camera's lack of focus. This blur function spreads the light out from the original point source to a more extended region. If our star is located on the source plane at point $(x_i, y_i)$, we need to make sure that our final image of the star is centered on that location as well. We defined our function $h(x,y)$ to be the blur caused by a unit impulse at the origin, so we need to slide this response to occur at our new point $(x_i, y_i)$ instead. This is what the term
$$ h(x - x_i, y - y_i) $$
says. It's just a translated version of our blurred-out impulse, originating from $(x_i, y_i)$ instead of $(0,0)$. Now if the star's original brightness was not simply 1 but some other amount, we must scale the brightness of our blur accordingly. Thus the image of a star of brightness $m_i$ located at $(x_i, y_i)$ will be
$$ m_i\, h(x - x_i, y - y_i) $$
So far, this has just been a more careful repetition of what we have done above. What we are going to do now seems to be a simple change of symbols, but will actually give us some profound insight into convolution itself. The number $m_i$ represents the brightness of our $i$th star, which is located at the point $(x_i, y_i)$ on the source plane. What this is really saying, then, is that the brightness of the source plane at location $(x_i, y_i)$ is given by $m_i$. That is, we can define a function on the source plane which gives us the brightness at each point. Let's call this function $m(u,v)$. Given a point $(u,v)$ of our source plane (the night sky), $m$ tells us what the brightness is there. Of course, in our star examples so far most points of the plane have a brightness of zero (black sky). But as we saw in the globular cluster, if we try to image a part of the sky with lots and lots of stars, some of them will seem to touch and form extended areas of brightness.
In fact, in the limit, let's say we find a patch of the night sky where there are stars everywhere. There are no dark points; every line of sight ends in a star. Our brightness function $m$ would be nonzero everywhere, and every single point of the sky would have an impulse (star) located at it. Can we use what we have learned so far to figure out what our photograph of this area would look like? Sure we can! The only difference now is that we have a continuum of stars, instead of discrete points. But at each location $(u,v)$ in the night sky, we have an impulse with brightness $m(u,v)$. We can represent this star by this brightness multiplied by an impulse function shifted to be located at $(u,v)$:
$$ m(u, v)\, \delta(x - u, y - v) $$
This single point source doesn't propagate through our camera unchanged, however; it becomes blurred. The blur is still centered on the original location of the star and scaled for brightness by $m(u,v)$, so we can say the image of this particular star is
$$ m(u, v)\, h(x - u, y - v) $$
Just like in our discrete cases above, we now just need to sum up the images of each star to get the final image. This sum needs to be taken over each $(u,v)$ which has nonzero brightness, which in the present case is all of them! We will need to perform a continuous sum, then, over each position $(u,v)$:
$$ \text{Photograph}(x, y) = \iint m(u, v)\, h(x - u, y - v)\, du\, dv $$
This last equation just says the image of our continuous field of stars is the blurred image of each star individually, all added back together. However, this also happens to be a realization of the formula for convolution presented at the top of this post! The resulting image of our star field is just the convolution of our brightness field with the blurred image of a unit star at the origin!
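On a pixel grid the continuous sum becomes a double sum, and the claim that "blur each star and add" is the same thing as "evaluate the convolution formula" can be checked directly. The sketch below is an assumption-laden toy: the 3x3 blur weights and the tiny sky are invented, and the output is padded by the kernel size minus one, so the star at pixel `(u, v)` ends up centred at `(u + 1, v + 1)`.

```python
def blur_scatter(image, kernel):
    """Treat every pixel as an impulse and add its scaled, shifted blur to the output."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for u in range(H):
        for v in range(W):
            m = image[u][v]  # brightness of the "star" at (u, v)
            for i in range(kh):
                for j in range(kw):
                    out[u + i][v + j] += m * kernel[i][j]
    return out


def blur_gather(image, kernel):
    """Evaluate out[x][y] = sum over (u, v) of m[u][v] * h[x - u][y - v] directly."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for x in range(H + kh - 1):
        for y in range(W + kw - 1):
            for u in range(H):
                for v in range(W):
                    i, j = x - u, y - v
                    if 0 <= i < kh and 0 <= j < kw:
                        out[x][y] += image[u][v] * kernel[i][j]
    return out


# A 3x3 blur that sums to 1 (made-up weights) and a tiny "sky" with two stars.
kernel = [[0.0625, 0.125, 0.0625],
          [0.125, 0.25, 0.125],
          [0.0625, 0.125, 0.0625]]
sky = [[0.0, 0.0, 0.0, 0.0],
       [0.0, 4.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

photo = blur_scatter(sky, kernel)
for row in photo:
    print([round(p, 3) for p in row])
```

Both functions compute the same array; the first follows the physical story (each star spreads out), the second follows the formula (each photo pixel collects light from every star).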
How about the image of a lizard? Well, we were able to treat a continuous field of stars successfully by modeling it via a continuous "brightness" field. What if we just view our lizard as a continuous field of brightness/color? With the stars it was intuitive to say that each point was in reality an "impulse," which was the crux of our argument. Does the same thing carry over here? Is the image of a lizard really nothing more than a weighted collection of impulses? Let's think for a second about a television set displaying our lizard on its screen. If you get really close to this image, you'll notice that it's actually made out of teeny tiny pixels; it's nothing more than a collection of tiny impulses! If we shrink these pixels to zero size (as an ideal impulse is), then it seems we can really say that a lizard's image is nothing more than a collection of infinitely small pixels, or impulses!
Since we already know how our camera responds to a unit impulse, all we have to do to model the photo of our lizard with the blurry camera is to take this response, scale it by the brightness at each location, and add them all up.
(It's probably time for a new camera!)
The blurry image is then just the convolution of our original image of the lizard (the brightness field) and the response of the camera to a single impulse.
Alright then, back to the water waves. What if we threw a larger object into the pond? Instead of having a discrete set of impulses to start the water vibrating, we have a continuous field of applied pressure.
(PRESSURE FIELD APPLIED TO WATER INITIALLY)
Just like with the image of the lizard, we can look at this continuous field as simply being a bunch of scaled impulses smushed up next to each other. To find the wave caused by this, we just need to find the wave caused by a single impulse, translate it, scale it, and add them all up.
(FINAL WAVE PICTURE)
Again, we recognize this as a convolution of the wave caused by a single impulse and the pressure field of the extended object:
These two examples give us a bit of a feel for what convolution is. If we know how a certain system reacts to a simple impulse, we can figure out how that system reacts to anything, by first breaking the input down into impulses, sending each through individually, and adding the responses up at the end. When we add them all up, we have to make sure to shift and scale each response appropriately (so that it matches up with the input), and taking care of this turns our sum into a convolution integral. Convolution is just the continuous analog of the problem-solving strategy "break it down into small parts, solve those, and put 'em back together at the end".
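Here is the same strategy in discrete form, as a small sketch rather than anything definitive. The `system` function is a hypothetical stand-in for any linear, translation-invariant black box (I picked a decaying echo); we probe it once with a unit impulse, and from then on predict its output for other inputs using nothing but `convolve`.

```python
def convolve(f, g):
    """Discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for j, gj in enumerate(g):
            out[k + j] += fk * gj
    return out


def system(signal):
    """A stand-in linear, translation-invariant system: a decaying echo.

    Each output sample is the current input plus half of the previous input
    plus a quarter of the one before that.
    """
    padded = [0.0, 0.0] + list(signal) + [0.0, 0.0]
    return [padded[n + 2] + 0.5 * padded[n + 1] + 0.25 * padded[n]
            for n in range(len(signal) + 2)]


# Step 1: probe the system with a unit impulse to measure its impulse response.
h = system([1.0])

# Step 2: predict the response to any other input by convolving it with h.
signal = [3.0, 0.0, -1.0, 2.0]
predicted = convolve(signal, h)
actual = system(signal)
print(h)
print(predicted)
print(actual)
```

The last two printed lists agree, which is the whole point: once `h` is known, the inside of `system` never has to be looked at again.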
Convolution seems to be a quite common operation throughout mathematics though; so let's see if we can find other places where it arises to further broaden our intuition.
Consider for a moment a metal rod which we have heated in some way. If we let it sit out, it will obviously cool down, but how do we express this quantitatively? Via a partial differential equation known aptly as the "heat equation", given the initial temperature of our rod (as a function of position), we can solve for its temperature at all future times.
For a general initial temperature distribution this may be a hard thing to do. So let's see if we can stick to the reasoning that's proved fruitful so far, and consider the effect one "impulse" of heat has on our rod. The mathematical model for this is
The solution to this problem is called the "fundamental solution", and can be visualized as follows:
In this image, the red represents "hot" and the blue "cold". At t=0, we apply an impulse of heat to our rod (think of touching a soldering iron to it), and as time progresses that heat spreads out and evens out, as we would expect it to. We can alternatively choose to plot the heat distribution in a "standard" graph:
Where the vertical dimension gives the temperature at position x.
Solving the heat equation analytically for this particular initial condition, we can arrive at a closed form of this fundamental solution (here we will take the constant to be unity)
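As a quick numerical sanity check, the sketch below assumes the heat equation is written as $u_t = u_{xx}$ (the constant set to unity), in which case the fundamental solution is the Gaussian $\frac{1}{\sqrt{4 \pi t}} e^{-x^2 / 4t}$. The function names and the integration grid are my own illustrative choices.

```python
import math


def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx: a Gaussian with variance 2t."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def area(t, lo=-50.0, hi=50.0, n=20000):
    """Midpoint-rule estimate of the total heat in the rod at time t."""
    dx = (hi - lo) / n
    return sum(heat_kernel(lo + (k + 0.5) * dx, t) for k in range(n)) * dx


for t in (0.01, 0.1, 1.0, 10.0):
    # The peak drops and the profile widens, while the area stays close to 1.
    print(t, round(heat_kernel(0.0, t), 4), round(area(t), 6))
```

The constant area is the conservation of heat: the impulse spreads out but none of it is lost.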
The most important part however is not the form of this solution, but rather the form of the problem: the transformation of our initial impulse into a different function:
If we had decided to place the soldering iron at a different location than the origin, we would expect the same heat distribution to result, and so this process is translation invariant. Had we placed two soldering irons, we would expect the result to be the sum of two dispersing heat waves, and if we had placed a hotter iron, we would expect a hotter rod; the heat equation is linear and translation invariant.
In abstract-land, this problem is identical to both of the above. We have some sort of correspondence between an impulse and a function, and we can write this correspondence in a linear, translation-invariant manner. Thus, we expect that if we heat the bar via some continuous distribution instead of an impulse, we can find the final solution to our problem by convolving the initial temperature with the impulse response. As an example, let's say we heat the bar to the right of the origin and cool it to the left, so that the initial temperature distribution looks something like this:
Or, in symbols:
To solve for the temperature distribution caused by this initial condition, we will view $f$ as being composed of a bunch of mini impulses, right next to each other. Feeding each impulse through the heat equation will give us a scaled and translated version of our fundamental solution, and adding them all back together will give us the answer we seek. This is, of course, just the convolution of the initial condition and the fundamental solution:
Since our initial condition is only nonzero over the finite range (-3,3), we can re-write this integral as
Performing this integral, we get
Which is much easier to understand pictorially:
We can see qualitatively that the temperature distribution tends to "smooth out" as time progresses, much as we would expect. This example provides some good mathematical justification for the use of convolution: in addition to it being intuitively simpler to break a problem into impulses and then just re-combine them, it's hard to see how we could even come up with an analytic expression as complicated as the solution here if we had not first solved for the fundamental solution and then used convolution formally to compute the desired answer.
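The calculation can be replayed numerically. The sketch below rests on assumptions the text does not pin down: the heat equation is taken as $u_t = u_{xx}$, and the initial temperature is assumed to be $+1$ on $(0, 3)$ and $-1$ on $(-3, 0)$ (only the range $(-3, 3)$ comes from the text; the amplitudes are my guess). Under those assumptions the convolution integral can also be done by hand in terms of the error function, which gives something to compare against.

```python
import math


def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx (constant taken to be 1)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def f(x):
    """Assumed initial temperature: +1 on (0, 3), -1 on (-3, 0), 0 elsewhere."""
    if 0.0 < x < 3.0:
        return 1.0
    if -3.0 < x < 0.0:
        return -1.0
    return 0.0


def u_numeric(x, t, n=6000):
    """Convolve f with the kernel: midpoint sum of f(y) * heat_kernel(x - y, t) over (-3, 3)."""
    dy = 6.0 / n
    total = 0.0
    for k in range(n):
        y = -3.0 + (k + 0.5) * dy
        total += f(y) * heat_kernel(x - y, t)
    return total * dy


def u_exact(x, t):
    """The same integral done by hand, expressed with the error function."""
    s = 2.0 * math.sqrt(t)
    return math.erf(x / s) - 0.5 * math.erf((x - 3.0) / s) - 0.5 * math.erf((x + 3.0) / s)


for t in (0.1, 1.0, 5.0):
    print(t, [round(u_numeric(x, t), 4) for x in (-3.0, -1.5, 0.0, 1.5, 3.0)])
```

The printed profiles flatten as `t` grows, and the temperature at the origin stays at zero because the assumed initial condition is antisymmetric.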
There's one more type of problem where convolution is common in its solution:
How do we mathematically formalize the concept of an impulse? We want this impulse to have a few specific properties
- Be of unit "size"
- Point-like
- Centered at the origin
To satisfy the first property, we must first understand what we mean by the "size" of a function. For our purposes we will say that size is equivalent to the area under that function; in symbols,
$$ \int_{-\infty}^{\infty} \delta(x)\, dx = 1 $$
By being point-like and centered at the origin, we mean that our impulse is pretty much zero at all other points. This fits well with our intuitive examples of impulses, the small pebbles and starlight. How could we go about constructing some function like this? For a starting place, look at the normal curve:
By construction this curve has a total integral of 1, and it's roughly concentrated about the origin. What can we do to make it more "point-like"? The following animation shows a sequence of normal curves (each of total integral 1) with increasingly small standard deviations.
As the standard deviation decreases, the curve becomes more and more centered at the origin, which is precisely what we want. It appears that if we shrink the standard deviation towards zero, the curve will collapse onto the origin into one single, infinitely thin spike.
Since all of the normal curves limiting towards this spike had area 1, we will say this spike has "area" 1 as well: it is the "unit" spike. This guy will serve us perfectly as a unit impulse.
However, this isn't the only way to define our impulse. What if instead of squishing normal curves towards the origin, we squished a unit square? As its base gets smaller and smaller, it must grow taller and taller (to preserve the unit area inside of it) and again in the limit we will end up with a spike-like object centered on the origin, which we can say has unit area. In fact, there are a bunch of different curves we could use in a limiting procedure like this, and our final notion of impulse is independent of that choice.
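One way to see that the choice of curve doesn't matter is to integrate a narrowing spike against some smooth test function and watch the result approach the value of that function at the origin. The sketch below does this for a normal curve and for a unit-area rectangle; the test function (`math.cos`), the widths, and the integration grid are arbitrary choices made for illustration.

```python
import math


def gaussian(x, sigma):
    """Normal curve with standard deviation sigma; total area 1."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def box(x, width):
    """Rectangle of the given width and height 1/width; total area 1."""
    return 1.0 / width if abs(x) < width / 2.0 else 0.0


def integrate(func, lo=-10.0, hi=10.0, n=100000):
    """Midpoint-rule integral of func over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * dx) for k in range(n)) * dx


test = math.cos  # any smooth function will do; cos(0) = 1

for eps in (1.0, 0.1, 0.01):
    g = integrate(lambda x: gaussian(x, eps) * test(x))
    b = integrate(lambda x: box(x, eps) * test(x))
    # Both numbers approach test(0) = 1 as the spike narrows.
    print(eps, round(g, 5), round(b, 5))
```

In the limit, either family "picks out" the value of the test function at the origin, which is the property that makes the impulse useful inside a convolution.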
|