First of all, I must apologise for the fact that I haven't been updating my blog as much as I'd intended to over the past few weeks. As often happens, real life has gotten in the way, including (for example) my graduation. I can now officially carry the post-nominal letters BScAdv(Hons), although anyone concerned for my ever-inflating ego will no doubt be glad to hear that I have very little intention of doing so.
In terms of the blog, I'm currently working on a post on Fourier analysis and the following post will likely be on the physics of mechanical flight, by popular request. At the current rate I'm going that might not be for a while though!
In physics news, since my last news post some rumours have arisen that the BICEP2 results may not be as sound as initially claimed. Basically, Adam Falkowski has claimed that the BICEP2 collaboration have miscalculated the galactic foreground radiation by misinterpreting an image on a Planck collaboration slide, and their primordial polarisations can mostly be accounted for due to this error (a claim that the BICEP2 team strongly denies). A sceptical take on the rumour is provided by Sesh Nadathur, who argues that the issue has been blown far out of proportion. It will be interesting to see how things unfold in the coming months!
Thursday, May 29, 2014
Monday, April 21, 2014
Why Heisenberg uncertainty is not that weird
Whenever quantum mechanics (QM) is brought up in a popular context, whether scientific or pseudo-scientific, its 'weirdness' is almost always mentioned, and the Heisenberg uncertainty principle$^1$ is the usual go-to example of that weirdness (although in pseudo-scientific contexts it is almost invariably misrepresented).
So what is Heisenberg uncertainty? Simply put, it is a restriction on the accuracy of simultaneous measurements of 'observables' (measurable quantities). The prototypical example is position and momentum; Heisenberg uncertainty states that the position and momentum of a particle$^2$ cannot be known simultaneously to arbitrary precision. The mathematical statement of the position-momentum uncertainty principle is
\begin{equation}
\Delta x\Delta p\geq\frac{\hbar}{2}
\end{equation}
where $\hbar$ is the reduced Planck constant and $\Delta x$, $\Delta p$ represent (in some sense)$^3$ the uncertainty in $x$ and $p$ respectively. Effectively, the better you know position, the less well you know momentum, and vice versa. This is not a specifically experimental limitation, but a fundamental theoretical one. This sort of restriction, on first viewing, indeed seems very strange and certainly counter-intuitive. I will attempt to convince you that it is not only not strange, but in fact to be expected.
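For readers who like to see things numerically, here is a minimal sketch (in Python with NumPy, setting $\hbar=1$; the function name and grid parameters are my own choices for illustration) that computes the uncertainty product for Gaussian wave-packets of various widths. A Gaussian actually saturates the bound, so the product comes out at $1/2$ regardless of the width:

```python
import numpy as np

def uncertainty_product(sigma, n=4096, L=80.0):
    """Compute (Delta x)(Delta p) for a Gaussian wave-packet, with hbar = 1."""
    x = np.linspace(-L / 2, L / 2, n, endpoint=False)
    dx = x[1] - x[0]
    psi = np.exp(-x**2 / (4 * sigma**2))             # Gaussian wave-function of width sigma
    psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)      # normalise per the Born rule

    p = 2 * np.pi * np.fft.fftfreq(n, d=dx)          # momentum grid (p = hbar*k, hbar = 1)
    dp = p[1] - p[0]                                 # grid spacing, 2*pi/(n*dx)
    prob_p = np.abs(np.fft.fft(psi))**2              # momentum-space probability (unnormalised)
    prob_p /= np.sum(prob_p) * dp                    # normalise numerically

    sig_x = np.sqrt(np.sum(x**2 * np.abs(psi)**2) * dx)  # <x> = 0 by symmetry
    sig_p = np.sqrt(np.sum(p**2 * prob_p) * dp)          # <p> = 0 likewise
    return sig_x * sig_p

for sigma in (0.5, 1.0, 2.0):
    print(sigma, uncertainty_product(sigma))         # prints ~0.5 in every case
```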
What is important to remember about QM is that wave mechanics is a central theme. Particles are represented by wave-functions, which are complex solutions to the Schrödinger equation,$^4$ and this wave-nature contributes to a good deal of the quantum weirdness we are familiar with (an example is shown in a recent blog post of mine which relies on superposition and destructive interference of photon waves). With this in mind, let's take a look at some wave-functions.
For illustrative purposes, we will work in one spatial dimension and free space (zero potential everywhere) and only consider time-independent wave-functions. The simplest example of such a wave-function is the plane wave, which takes the form
\begin{equation}
\psi(x)=Ae^{ikx}\equiv Ae^{ipx/\hbar}.
\end{equation}
Here $A$ is the complex-valued amplitude. The amplitude in this case is not important, because any wave-function must be normalisable (as the probability distribution function, which must integrate to 1 over all space to preserve conservation of probability, is given by $|\psi(x)|^2$, known as the Born rule) and so $A$ will need to be scaled anyway. For those who are unfamiliar with complex exponential form, the waviness is more explicit in the less compact form $\exp{(ikx)}\equiv\cos{(kx)}+i\sin{(kx)}$. The symbol $k$ is the wave-number (or wave-vector in higher dimensions), and this quantity appears naturally in most of the mathematics I'm presenting in this post. For this reason I will include the version of the equations with $k$ alongside the version with the more physically immediate momentum $p$ (the conversion is simply $p=\hbar k$).
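As a quick numerical sanity check of these claims (Euler's formula, and the flat probability density a plane wave yields under the Born rule), the following few lines of Python (assuming NumPy; the grid and the value of $k$ are arbitrary) verify them directly:

```python
import numpy as np

x = np.linspace(-10, 10, 1001)
k = 2.0
plane_wave = np.exp(1j * k * x)
# Euler's formula: exp(ikx) = cos(kx) + i*sin(kx)
assert np.allclose(plane_wave, np.cos(k * x) + 1j * np.sin(k * x))
# Born rule applied to a plane wave: |exp(ikx)|^2 = 1 everywhere
assert np.allclose(np.abs(plane_wave)**2, 1.0)
```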
Because the Schrödinger equation is linear, sums of solutions are themselves solutions (this property is known as the superposition principle). That means we can have wave-functions of the form
\begin{equation}
\psi(x)=\sum_{m=0}^{n}A_me^{ip_mx/\hbar}
\end{equation}
for any arbitrary $n$ (finite or infinite). Here the overall scaling of the $A$ values is still unimportant because of normalisation, but their relative magnitudes matter, as these determine the relative probability weightings according to the Born rule. However, since each term has wavelength $\lambda_m=2\pi/k_m\equiv2\pi\hbar/p_m$, we can see that a discrete sum like this captures only a countable set of the possible modes. In free space all modes are permissible, and so we can take the continuum limit (let the sum turn into an integral):
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{\psi}(k) e^{ikx}\mathrm{d}k\equiv\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\tilde{\psi}(p) e^{ipx/\hbar}\mathrm{d}p.
\end{equation}
Because we have moved away from integer indexing, the discrete set of amplitudes $A_m$ is replaced by the continuous function that is suggestively denoted $\tilde{\psi}$. The function ranges over $p$ because we are integrating over all possible modes/wavelengths/momenta; in a sense $p$ takes over the role of index in the integral from $m$ in the summation. The factor of $1/\sqrt{2\pi}$ is a matter of convention and the factor of $1/\sqrt{\hbar}$ comes from the change of $k$ to $p$.
There is more to $\tilde{\psi}$ than meets the eye. Not only is it the amplitude function for the integral, but it's actually the wave-function itself, except not in physical space like $\psi$ but in momentum space.$^5$ In the context of QM this is known as the momentum space representation of the wave-function, but more broadly the mathematical construct is known as the Fourier transform,$^6$ and Fourier transforms occur very frequently in all manner of physical theories involving waves, be they QM, acoustics, optics, crystallography, signal analysis and so on.
So how do we determine the form of $\tilde{\psi}$? As it turns out, perhaps unsurprisingly, the Fourier transform is invertible, and so we find that
\begin{equation}
\tilde{\psi}(p)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\psi(x) e^{-ipx/\hbar}\mathrm{d}x.
\end{equation}
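The invertibility itself is easy to demonstrate numerically. Using NumPy's FFT pair as a discrete stand-in for equations (4) and (5) (a sketch only; the wave-packet chosen below is arbitrary), transforming to momentum space and back recovers the original wave-function:

```python
import numpy as np

x = np.linspace(-20, 20, 2048, endpoint=False)
psi = np.exp(-x**2 / 2) * np.exp(2j * x)   # an arbitrary normalisable wave-packet
phi = np.fft.fft(psi)                      # 'forward' transform into momentum space
psi_back = np.fft.ifft(phi)                # inverse transform back to position space
print(np.allclose(psi, psi_back))          # True: the round trip is lossless
```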
All well and good, but how do we make sense of it? Well, let's consider a limiting case. We can select out a single mode by using a Dirac delta such that $\tilde{\psi}(p)=\delta(p-p_0)$. The Dirac delta is zero everywhere except at $p=p_0$, where it is (loosely speaking) infinite; what matters is that the area under the Dirac delta is always normalised to $1$. Inserting the delta into equation (4) yields$^7$
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\delta(p-p_0)e^{ipx/\hbar}\mathrm{d}p=\frac{e^{ip_0x/\hbar}}{\sqrt{2\pi\hbar}},
\end{equation}
which is, up to a numerical factor, a complex exponential in $x$, or equivalently, a flat (complex-valued) wave across all space (applying the Born rule to a plane wave yields a probability distribution of $|e^{iz}|^2=\text{const.}$). So when the momentum is maximally well-defined (put into a single mode) the position is maximally poorly-defined (the wave-function is spread evenly across all space). By the invertibility of the Fourier transform, we can expect the converse to hold as well (a maximally defined position, i.e., a Dirac delta in $x$, will result in a maximally poorly-defined momentum, i.e., an even wave across all momentum space). This relationship lies at the heart of the Heisenberg uncertainty principle.
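A discrete analogue of this limiting case is easy to check (a sketch, assuming NumPy; the grid size and spike position are arbitrary): a wave-function concentrated entirely at a single grid point has a perfectly flat probability distribution over the momentum modes.

```python
import numpy as np

n = 1024
psi = np.zeros(n, dtype=complex)
psi[137] = 1.0                          # a delta-like spike at one position
phi = np.fft.fft(psi)                   # momentum-space amplitudes
print(np.allclose(np.abs(phi), 1.0))    # True: every mode is equally weighted
```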
As hinted at the start of the post, position and momentum are not the only observables which obey Heisenberg uncertainty. The next most common pairing$^8$ is energy and time,$^9$ although uncertainty relationships can be generated more generally by taking the derivative of the (classical) action (and quantising). For example, momentum is the derivative of the action with respect to position, energy is the derivative of the action with respect to time, and so on.
Hopefully I have demonstrated that Heisenberg uncertainty is not so strange as might have first appeared. Rather than an arbitrary restriction on how accurately we can know certain measurable quantities, it is in fact a basic and unavoidable feature of any linear wave theory, of which quantum mechanics is only one example, albeit of a more fundamental and therefore perhaps more intuitively challenging sort than most.
Notes
$1$. The Heisenberg uncertainty principle is so ubiquitous in quantum physics that it is frequently referred to simply as 'the uncertainty principle'. Out of habit, however, I tend to use the less common 'Heisenberg uncertainty' to differentiate it from other, admittedly much less common uncertainty relations. As far as I know, any of these uses are considered acceptable.
$2$. I hesitate to use the word 'particle', as it is important when talking about these quantum concepts to be very clear about what one means. A better term might have been 'quantum' or 'wave-particle', as the wave-nature of QM is central to the discussion of Heisenberg uncertainty. However, despite the slightly misleading connotations, 'particle' is by far the most commonly used term and so I will use it also.
$3$. We could, for example, take the standard deviation in $x$ and $p$, $\sigma_x$ and $\sigma_p$, as these will be represented by continuous distributions.
$4$. In this post I will use the Schrödinger representation of QM as this makes explicit the wave-nature of the wave-function. However, there are many, many representations of QM, each with their own advantages (and disadvantages) when it comes to analysing real systems, but importantly they are all exactly equivalent and so this wave-nature I have been emphasising is intrinsic to all of them. In that sense I could just as well have chosen any representation and this blog post would otherwise have been identical, although perhaps not as easy to understand.
$5$. If you don't know what momentum space is, the most important thing to understand is that there are mathematical spaces other than the space(time) we are familiar with. Let's consider the flat 3-dimensional "position" space (3-space) of ordinary life, with $x$-, $y$- and $z$-directions. Suppose we have an object at the coordinates $x=1$, $y=0$ and $z=0$. This can be represented by a vector in the 3-space going to the point $(1,0,0)$, thus describing the position. Now suppose the 3-space we are looking at is part of a 4-space that includes time, except we are going to set the time to some instant and freeze it there.
Let's say at that instant the object at $(1,0,0)$ is travelling with a momentum of $0$ in the $x$-direction, $1$ in the $y$-direction and $0$ in the $z$-direction (in arbitrary units). We could then construct a "momentum" 3-space (or 4-space including time) with directions $p_x$, $p_y$ and $p_z$ and at that instant of time the vector corresponding to the object would be at the coordinates $(0,1,0)$. So for any $n$-dimensional position space it's easy to see there is a corresponding $n$-dimensional momentum space. In fact, we can define a $2n$-dimensional space known as the phase space by combining the position and momentum spaces, and that space will describe all possible states of a physical system.
$6$. I want to stress that strictly speaking the momentum representation of the wave-function is not the Fourier transform of the position representation. The Fourier transform dual to position is the wave-vector (or wave-number in 1-D) and not the momentum, although given one is a scalar multiple of the other I feel like we can be a little loose in our communication in this one respect.
$7$. In evaluating equation (6) we have made use of the so-called sifting property of the Dirac delta, where $\int_{-\infty}^{\infty}\delta(x-a)f(x)\mathrm{d}x=f(a)$. This property is analogous to the Kronecker delta for sums but used instead of integrals and is arguably its most useful feature.
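The sifting property can be checked numerically by standing in for the delta with a very narrow normalised Gaussian (the delta can be viewed as the limit of such functions; the values of $a$ and the test function below are arbitrary choices of mine):

```python
import numpy as np

x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
a, eps = 1.3, 1e-3
# A narrow normalised Gaussian approximating delta(x - a)
delta = np.exp(-(x - a)**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))
print(np.sum(delta * np.cos(x)) * dx)   # ~0.2675, i.e. cos(1.3)
print(np.cos(a))                        # the sifted value f(a)
```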
$8$. These pairs of variables are typically referred to in physics as 'conjugate variables', although in this context we can also call them Fourier transform duals. It is important to remember that we would be less inclined to refer to them as such if we were working in, for example, the Heisenberg representation of QM where the Heisenberg uncertainty principle arises more directly out of the non-commutativity of Hermitian operator matrices. This is only because the Fourier transforms are implicit in that representation; they are still there in some sense due to the equivalence of representations of QM as discussed in Note 4, but are not nearly as obvious.
$9$. The energy-time Heisenberg uncertainty principle is given mathematically as $\Delta E\Delta t\geq\hbar/2$. This is analogous to position-momentum uncertainty in the sense that the Fourier dual of position is wave-number $k=p/\hbar$ and not momentum directly; the Fourier dual of time is technically angular frequency $\omega=E/\hbar$ and not energy directly. As in the position-momentum case, however, the difference is only a scalar factor of $\hbar$ and so we can speak reasonably loosely with some impunity.
Thursday, April 10, 2014
News (2014/04/10)
Two items of news to report this week (well, one and a half at least). The half-piece of news is that I'm working on a new blog post which will hopefully be posted some time next week (although it might take until the week after).
The real news is that the LHCb experiment at the LHC has confirmed the existence of Z(4430), a so-called "exotic hadron". Hadrons are composite particles made of quarks (and held together by gluons). According to the quark model, hadrons can only form in one of two ways: as a quark-antiquark pair (known as a meson) or as a quark triplet (known as a baryon).
The most common hadrons in the universe are protons and neutrons, which form atomic nuclei. Protons consist of 2 up quarks and 1 down quark ("uud") while neutrons consist of 1 up quark and 2 down quarks ("udd"). The names and details of the quark types, which are known as "flavours", are not something I will go into here, as the topic is interesting enough to deserve its own post (although a proper explanation for the layperson would need a little more room than I have here, I think).
The quark model is a simple one though and does not describe all of the dynamics permitted by quantum chromodynamics ("QCD", the part of the Standard Model that describes strong interactions). This leaves open the door for exotic hadrons which are not mesons or baryons. Z(4430) is one such exotic hadron.
It was first 'discovered' in 2007 (although 5-sigma confirmation didn't come until 2008) and has now been observed at the LHCb experiment with 13.9-sigma significance. This means the chance of the observation being a statistical fluke is about $1$ in $1.579\times10^{43}$ (a very, very, very large number indeed). It is believed to be a tetraquark made up of 1 charm quark, 1 charm antiquark, 1 down quark and 1 up antiquark ($c\bar{c}d\bar{u}$).
While perhaps not as exciting as, for example, the BICEP2 result recently, this confirmation is still a very interesting result and will hopefully spur on further developments in the search for exotic hadrons.
Thursday, April 3, 2014
Learning to learn
It has happened a few times over the past couple of years that someone will come up to me with some piece of mathematics that they're having trouble with and ask me if I can help them with it. I'll have a look over it and say something along the lines of "I'm not familiar with this specifically, but I think I should be able to pick it up pretty easily". The usual response to this involves some degree of indignation.
Of course I understand why. Nobody likes to feel stupid or belittled, and this is certainly not my intention, but I can see how someone might feel that way. If I were in that position it would probably be my first response as well. But with that said, I don't think it's an entirely reasonable one.
I'm not trying to show off how clever I am. They were the ones to come to me asking for my help (and I try my best to help when I can). Presumably they came to me because they expected I had some degree of expertise, but I get the impression the sort of expertise they were expecting was that of having been taught the subject previously. Sometimes this is the case of course, but mathematics is such a broad topic that you will rarely have seen everything at a sub-postgraduate level no matter how educated you are.
The point of having gone to university and graduated with a degree (even a bachelor's) specialising in mathematics or a mathematics-heavy field like physics is not to simply learn as much of a field as you can and leave it there. The point is to equip yourself with the tools to teach yourself new things, and I hope to some extent I have managed to do that. I've taught myself things I never saw at university and so I know my limitations. Some concepts and techniques I'm reasonably familiar with and others I'm not; some things take more time and more effort to learn than others. It just happens that through study and experience I've managed to achieve the minimal level of competence to teach myself what they're having trouble with.
So my exasperation does not come from a place of ignorance. I have struggled with maths in the past; I know what it's like. The fact remains, however, that my degree would be worth very little if I hadn't picked up the ability to learn high school- or early undergraduate-level maths with relative ease. If someone's problem left me just as nonplussed as they were, then I should probably be seeking partial reimbursement from my university. I certainly hope they don't take it as a personal slight; I have no doubt that if they had studied what I had then they would have just as little trouble with their present difficulty as I might (and, I would hazard, quite possibly even less).
Sunday, March 30, 2014
A very dangerous factory
Suppose a new type of bomb is invented whose detonation device is so incredibly sensitive that if it comes into contact with a single particle it will explode. Putting aside the impracticality of such a weapon (and the obvious factory OH&S issues), the producer wishes to maintain quality control as, with anything, some bombs will be faulty and not have detonation devices attached. The question immediately arises: Is it possible to have some ensemble of bombs which we can guarantee contains no faulty weapons?
This question is known as the Elitzur-Vaidman bomb-testing problem, and although one can arrive after reasonably little thought at the fairly obvious answer that no such ensemble is possible (as any direct observation using light or matter will detonate any working bombs), in actual fact such an ensemble is possible! How can this be the case? The short answer is: quantum effects. The long answer? Read on!
The solution to this problem involves the use of a Mach-Zehnder interferometer (Fig. 1) with a single-photon source. To see how, let's consider the case of the interferometer without any bomb in place. We then have
\begin{align}\label{eq:MZ}
\left|s\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}\left|c\right\rangle + \frac{i}{\sqrt{2}}\left|d\right\rangle\right) + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right) \nonumber \\
&= \frac{i}{2}\left|c\right\rangle + \frac{-1}{2}\left|d\right\rangle + \frac{i}{2}\left|c\right\rangle + \frac{1}{2}\left|d\right\rangle \nonumber \\
&= i\left|c\right\rangle,
\end{align}
where $\left|a\right\rangle$ represents the quantum state in the $a$-branch of the interferometer (as labelled in Fig. 1) and $i$ is the imaginary unit.$^{1}$ What the above calculation shows$^{2}$ is that (somewhat surprisingly) despite the branching at the second beam-splitter, destructive interference along $d$ and constructive interference along $c$ cause the photon to always be detected at $C$ and never at $D$ (for this alignment).
Now let's consider the same Mach-Zehnder interferometer but with a bomb placed such that the detector will be along the $u$-branch (as shown in Fig. 2). In this case we have
\begin{align}\label{eq:bomb}
\left|s\right\rangle\left|B_0\right\rangle &\rightarrow \frac{i}{\sqrt{2}}\left|u\right\rangle\left|B_0\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left|v\right\rangle\left|B_0\right\rangle \nonumber \\
&\rightarrow \frac{i}{\sqrt{2}}\left|X\right\rangle + \frac{1}{\sqrt{2}}\left(\frac{i}{\sqrt{2}}\left|c\right\rangle + \frac{1}{\sqrt{2}}\left|d\right\rangle\right)\left|B_0\right\rangle \nonumber \\
&= \frac{i}{\sqrt{2}}\left|X\right\rangle +\frac{i}{2}\left|c\right\rangle\left|B_0\right\rangle + \frac{1}{2}\left|d\right\rangle\left|B_0\right\rangle,
\end{align}
where $\left|B_0\right\rangle$ is the 'primed' or unexploded bomb, $\left|X\right\rangle$ represents the state where the bomb has been detonated$^3$ and $\left|a\right\rangle\left|b\right\rangle\equiv\left|a\right\rangle\otimes\left|b\right\rangle$. Note that for the purposes of this thought experiment we are assuming the detonator is a perfect detector, i.e., the photon wave cannot travel down $u$ without being absorbed.
As is clear from equation \ref{eq:bomb}, the inclusion of the detonator destroys the constructive/destructive interference that caused the simplification in equation \ref{eq:MZ}. Therefore, in the detonator case, rather than having every photon detected at $C$, we have the photon detected at $C$ with a probability of $1/4$, detected at $D$ with a probability of $1/4$ and the bomb detonated with a probability of $1/2$.$^4$
This is what makes it possible to assemble a set of functional bombs without detonating them—if a photon is detected by $D$ then the bomb must have a detonator attached and so we can set it aside knowing it works. If a photon is detected by $C$ then the functionality is indeterminate as we expect a detection at $C$ with non-zero probability in both detonator and no-detonator cases, but this is not a problem as we can simply emit another photon and re-run the test.
Note that while the probabilities above can be derived (in a fairly straightforward manner) from classical principles, we cannot apply a classical interpretation here as the quantum nature of the experiment is indispensable. In the classical (many-photon) run it is possible to both detonate a bomb and make a detection at $D$; this is precluded in the quantum case as the single photon cannot be absorbed by multiple objects. Furthermore, it is the wave-nature of the photon that permits the destructive interference at $D$ in the no-detonator case, which is what allows 'detection by $D$' to signify the presence of the detonator and thus constitute a successful 'interaction-free' measurement.
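The protocol is simple enough to simulate. Below is a toy Monte Carlo sketch in Python (my own construction, using only the 1/2, 1/4, 1/4 probabilities derived above; the function name and photon cap are arbitrary). Repeating the test on every 'C' outcome, a live bomb is eventually certified with probability $1/4+1/16+1/64+\dots=1/3$, with the remaining two thirds exploding:

```python
import random

def test_bomb(live, max_photons=50):
    """Send photons one at a time until the bomb explodes or is certified."""
    for _ in range(max_photons):
        if live:
            r = random.random()
            if r < 0.5:
                return 'boom'               # photon absorbed in the u-branch
            outcome = 'D' if r < 0.75 else 'C'
        else:
            outcome = 'C'                   # no detonator: interference sends every photon to C
        if outcome == 'D':
            return 'certified'              # interaction-free detection of a live bomb
        # outcome 'C' is indeterminate, so we emit another photon
    return 'inconclusive'                   # many C's in a row: almost certainly a dud

random.seed(0)
results = [test_bomb(live=True) for _ in range(100000)]
print(results.count('certified') / len(results))   # ~1/3
print(results.count('boom') / len(results))        # ~2/3
```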
If you're unconvinced of this argument because it is based on a purely theoretical consideration, consider that this thought experiment has (equivalently) been carried out in the real world (admittedly using an ordinary detector rather than a bomb) and in fact was first done about a year after this problem was first published. I can't speak to the practical applications, if any exist, but I love this problem regardless for the simple fact that the solution challenges your intuition but can be understood using reasonably straightforward quantum mechanical principles.
Notes
$1$. The inclusion of $i$ in these equations might seem unusual or arbitrary, so I will provide a derivation here that shows where it comes from.
Consider a beam-splitter as shown in Fig. 3. This system can be represented by the matrix equation $\left|\psi_3,\psi_4\right\rangle = \hat{B}\left|\psi_1,\psi_2\right\rangle$, or explicitly,
\begin{equation}\label{eq:BSM}
\begin{pmatrix}
\psi_3 \\ \psi_4
\end{pmatrix}
=
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
\begin{pmatrix}
\psi_1 \\ \psi_2
\end{pmatrix},
\end{equation}
where $T$ and $R$ are the transmission and reflection coefficients respectively. In the experiment we assume an ideal, lossless beam-splitter which demands that the beam-splitter matrix be unitary, i.e., $\hat{B}^{\dagger}\hat{B}=\hat{\mathbb{I}}$, or,
\begin{equation}\label{eq:unitary}
\begin{pmatrix}
T^{\ast} & R^{\ast} \\ R^{\ast} & T^{\ast}
\end{pmatrix}
\begin{pmatrix}
T & R \\ R & T
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 \\ 0 & 1
\end{pmatrix}.
\end{equation}
Equation \ref{eq:unitary} immediately implies the following relations:
\begin{equation}
|T|^2+|R|^2=1,
\end{equation}
\begin{equation}\label{eq:0}
T^{\ast}R+R^{\ast}T=0.
\end{equation}
As $T$ and $R$ are complex numbers, we can represent them in polar form as $T=|T|e^{i\theta_T}$ and $R=|R|e^{i\theta_R}$. For simplicity we choose $\theta_T=0$ and thus $T=|T|\implies T^{\ast}=T$ and so equation \ref{eq:0} becomes
\begin{align}\label{eq:0new}
T|R|e^{i\theta_R}+|R|e^{-i\theta_R}T&=0 \nonumber \\
2T|R|\cos{\left(\theta_R\right)}&=0
\end{align}
where we have made use of the identity $\cos{(\alpha)}=e^{i\alpha}/2+e^{-i\alpha}/2$. Equation \ref{eq:0new} is satisfied by $\theta_R=n\pi+\pi/2, n\in\mathbb{Z}$, but we will choose $n=0\implies\theta_R=\pi/2$ for simplicity, which in turn gives $R=|R|e^{i\pi/2}=i|R|$.
Finally, as the beam-splitter is 50:50 (50% transmission, 50% reflection) we demand $|T|=|R|=1/\sqrt{2}$ and so the beam-splitter matrix is given by
\begin{equation}\label{eq:B}
\hat{B}=\frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & i \\ i & 1
\end{pmatrix}.
\end{equation}
It should be clear that equation \ref{eq:B} is not a unique representation of $\hat{B}$; another choice of $\theta_T$ and/or $\theta_R$ would yield a different (unitary) matrix that would make no difference to the calculations shown in equations \ref{eq:MZ} and \ref{eq:bomb} (I leave proof of this as an exercise for the interested reader). With that said, the reason I like this representation is that it allows $i$ to function as a label for the states that result from a beam-splitter reflection, making it easier to write down interferometer equations directly from the diagram and keep track of where each term comes from. This is, of course, purely a matter of personal preference.
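For completeness, a quick numerical check of this footnote (a sketch, assuming NumPy; the port labelling is mine): $\hat{B}$ as given in equation \ref{eq:B} is indeed unitary, and passing a single-photon state through two such splitters in series leaves all of the probability in one output port with amplitude $i$, reproducing the $i\left|c\right\rangle$ of equation \ref{eq:MZ}.

```python
import numpy as np

B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)    # the 50:50 beam-splitter matrix derived above
print(np.allclose(B.conj().T @ B, np.eye(2)))    # True: B is unitary

s = np.array([1.0, 0.0])                          # photon enters a single input port
out = B @ (B @ s)                                 # two 50:50 beam-splitters in series
print(out, np.abs(out)**2)                        # [0, 1j] and [0, 1]: all probability in one port
```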
$2$. This equation is an example of quantum superposition in action. For example, the first line says that the photon exists in a superposition of the $\left|u\right\rangle$ and $\left|v\right\rangle$ states where the states are equally weighted (as we are assuming normalisation). Superposition is a fundamental aspect of quantum mechanics that follows from the linearity of the Schrödinger equation (linear combinations of solutions will themselves be solutions). In this case, the beam-splitter splits the photon probability wave along the two channels and so in some sense the photon travels along both branches, although no measurement can be made which will detect the photon in both channels at once—this is not a consequence of experimental limitations but is a restriction that is fundamental to quantum theory. The question of why this is the case is a deep and ongoing one, and I encourage the interested reader to investigate the literature on the philosophy (and especially interpretations) of quantum mechanics.
$3$. I have gone to some pains in this post to avoid using the term "wavefunction collapse" at any point, although for clarity I will say that in the Copenhagen interpretation, the case of the photon interacting with the detonator (or any of the detectors for that matter) is an example of wavefunction collapse.
$4$. So long as the beam-splitters are both 50:50, as we have assumed throughout this blog post. Naturally, other types of beam-splitters will yield different results, and in fact using a more sophisticated apparatus will permit a much better detection level (in theory, the detection fraction can be brought arbitrarily close to 1, although I cannot speak to the practicality of such an apparatus).
Thursday, March 20, 2014
News (2014/03/20)
I'm trying to post on my blog much more often this year than I used to, in fact as close to every week as I can manage. Unfortunately, the post I'm working on at the moment isn't nearly ready for publication, so this week I'm instead going to make a little news post, the first item of which will be the thing that I just told you (about the new blog post coming soon)!
The next item of news is not particularly new; last Friday (the 14th of March) was Pi Day. Pi Day is of course silly for a whole bunch of reasons (the main one being it's based on the completely nonsensical American dating system) so I've decided to introduce readers who may not be familiar with it to the concept of tau (τ). Tau has been proposed as an alternative to pi, and while I am not especially partisan on the matter I have to say I am somewhat sympathetic. Here is the case for tau laid out in the Tau Manifesto and for the sake of fairness a counterargument in the Pi Manifesto.
Finally, the real news comes in the form of the results of the recent BICEP2 measurements of B-mode polarisation in the CMB, easily the biggest news in physics since the Higgs was announced in 2012 and a major breakthrough for early-universe cosmologists. I will be able to do an explanatory post about the news if there's enough demand for one, but otherwise a lot of good explanations can be found around the place ranging from the somewhat simplistic to the slightly more technical. This is a very exciting time for fundamental physics and I expect to see some very interesting papers published in the next couple of years based on insights from this new data.
And while it isn't technically news, if you haven't heard of them already, I strongly urge you to check out Brady Haran's science channels, especially Sixty Symbols (physics) and Numberphile (mathematics); I've subscribed to most of them on YouTube and they are absolutely fantastic.
That's all for this quick post, hopefully I'll have a considerably more in-depth number ready for next week! See you then!
Thursday, March 6, 2014
Playing with infinite series
Sequences, series and summation notation
In mathematical parlance, the term 'sequence' carries a similar meaning to the one it has in everyday speech; it refers to an ordered list of numbers that usually follows a rule. For example, the sequence $1, 2, 3, 4, 5...$ (onwards without end) is given by adding 1 to the previous number in the sequence, beginning at 1. The definition of the term 'series' is less obvious; one description is that a series is the sum of a sequence, or in other words, to obtain a series one adds up all the terms of a sequence. For a finite series (a series with finitely many terms, or alternatively a series with a last term) the mathematics is typically fairly straightforward, so we will focus instead on infinite series. Before we do so, I will make a brief digression to discuss notation.
For reasons I am sympathetic to, few people enjoy reading about mathematical notation, and it is difficult to write about it in an interesting way. However, while it is possible to write a series as, for example, $1+2+3+4+5+...$, this soon becomes cumbersome: it is very limiting for finite series with a large number of terms, and for series whose rule or pattern is not obvious from the first few terms. For these reasons, mathematicians have developed a notation for series which I will adopt in the remainder of this post for instructional purposes (side-by-side with the long-hand version for clarity).
The notation is referred to as summation notation or sigma notation and uses a large capital sigma $\Sigma$ to denote the summation. The rule is written to the right of the sigma in terms of an index of summation; the lowest value of the index is written under the sigma and the highest value above it (or '$\infty$' for an infinite series with no final term).$^1$
A simple example of sigma notation in action is the finite series
\begin{align} \label{eq:square5}
\sum_{n=1}^{5}n^2 & = 1^2+2^2+3^2+4^2+5^2 \nonumber \\
& = 1+4+9+16+25,
\end{align}
which in this case is equal to $55$. Here, the index of summation $n$ appears prominently in the rule as the number that is being squared. Another example, slightly trickier this time, is the infinite series
\begin{align} \label{eq:Zeno}
\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n
& = \left(\frac{1}{2}\right)^1+\left(\frac{1}{2}\right)^2+\left(\frac{1}{2}\right)^3+\left(\frac{1}{2}\right)^4+... \nonumber \\ & = \frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+...
\end{align}
where the index of summation is this time the power that $1/2$ is raised to in each term of the sum.$^2$
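Sigma notation also translates very directly into code, which makes examples like these easy to check. Here is a throwaway Python sketch of my own (purely illustrative, not part of the mathematics) verifying both sums above:

```python
# First example: the finite series 1^2 + 2^2 + 3^2 + 4^2 + 5^2
print(sum(n ** 2 for n in range(1, 6)))  # 55

# Second example: partial sums of sum (1/2)^n creep up towards 1
for N in (5, 10, 20):
    print(N, sum(0.5 ** n for n in range(1, N + 1)))
# 5 0.96875
# 10 0.9990234375
# 20 0.9999990463256836
```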
Infinite series behaving badly
An infinite series is in many ways a different beast to a finite series. Possibly the clearest difference is conceptual; a finite series can be computed fairly easily, in principle at least, simply by adding so many numbers together and then reading the value off your calculator. This is not possible with an infinite series: there is no final term, and no opportunity to hit a final '$=$' button on the calculator, as the sum goes on forever. This is not always a problem, however. Some series are known as 'convergent', which is to say that their running totals approach, ever more closely, a single fixed number, and the series can meaningfully be said to equal that number.$^3$ There are many tests for convergence, but the details need not concern us here. We have already seen a convergent series in equation \ref{eq:Zeno}, but another example of a convergent series is
\begin{equation}
\label{eq:euler}
e=\sum^{\infty}_{n=0}\frac{1}{n!}=1+1+\frac{1}{2}+\frac{1}{6}+...
\end{equation}
where $!$ is the factorial operator.$^4$ As you can see, this series converges to $e$, a truly marvellous number which happens to be my favourite mathematical constant, but an explanation of that fact would require another post entirely.
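For anyone who would like to watch that convergence happen, a couple of lines of Python will do it (a minimal sketch; `math.factorial` plays the role of the $!$ operator defined in note 4):

```python
import math

# Partial sums of sum 1/n! home in on e remarkably quickly
for N in (2, 5, 10):
    print(N, sum(1 / math.factorial(n) for n in range(N + 1)))
# 2 2.5
# 5 2.7166666666666663
# 10 2.7182818011463845
print(math.e)  # 2.718281828459045
```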
If a series is not convergent, however, then we call it divergent,$^5$ and that is where the trouble starts. A straightforward example is the harmonic series
\begin{equation}
\label{eq:harmonic}
\sum^{\infty}_{n=1}\frac{1}{n}=1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+...
\end{equation}
which despite its similarity to equation \ref{eq:Zeno} increases indefinitely and does not approach any particular number. This is less obvious than it may first appear (the terms themselves shrink towards zero, after all), but one classic way to see it is to group the terms as $1+\frac{1}{2}+\left(\frac{1}{3}+\frac{1}{4}\right)+\left(\frac{1}{5}+\frac{1}{6}+\frac{1}{7}+\frac{1}{8}\right)+...$; each bracket adds up to more than $\frac{1}{2}$, and there are infinitely many brackets, so the sum grows without bound. This kind of reasoning is based around partial sums, which are truncations of infinite series. For the harmonic series the partial sum is given by
\begin{equation} \label{eq:Hn}
H_n=\sum^{n}_{k=1}\frac{1}{k}=1+\frac{1}{2}+\frac{1}{3}+...+\frac{1}{n},
\end{equation}
where the $n^{\text{th}}$ partial sum $H_n$ is known as the $n^{\text{th}}$ harmonic number. (Note that here we use $n$ to label the partial sum and so the role of the index of summation is taken over by $k$ to avoid confusion, although we could have chosen $k$ as our label and kept $n$ the index of summation if we wished; it makes no difference). Partial sums are finite series by design and so just like equation \ref{eq:square5} there is a final term after which we can hit the metaphorical '$=$' button. If we do so after 2 terms we find $H_2=3/2$, after 3 terms $H_3=11/6$, 4 terms $H_4=25/12$ and so on.$^6$
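Those partial sums are easy to reproduce exactly on a computer. Here is a small Python sketch of my own using exact rational arithmetic (it is built on the recurrence from note 6):

```python
from fractions import Fraction

def harmonic(n):
    """The n-th harmonic number, built term by term via H_n = H_{n-1} + 1/n."""
    H = Fraction(0)
    for k in range(1, n + 1):
        H += Fraction(1, k)
    return H

print(harmonic(2), harmonic(3), harmonic(4))  # 3/2 11/6 25/12
print(float(harmonic(1000)))  # ~7.4855: still growing, just very slowly
```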
Now let us consider an altogether different beast known as Grandi's series. Grandi's series is given as
\begin{equation} \label{eq:Grandi}
\sum^{\infty}_{n=0}(-1)^n=1-1+1-1+1-1...
\end{equation}
Unlike the harmonic series, the partial sums of Grandi's series do not seem to trend in any particular direction, but rather alternate between two 'accumulation points' at 1 and 0. This peculiarity will cause us some trouble, but to see why first we will make a brief foray into the physical sciences.
A brief foray into the physical sciences
Suppose we have two thin, neutral, conductive plates placed parallel to each other, very close together, in a vacuum. According to classical physics (and everyday intuition) absolutely nothing will happen. However, this is not what we observe; in fact, the two plates will experience an attractive force pulling them together. This is known as the Casimir effect, and it can only be understood in terms of quantum mechanics. The Casimir effect is typically expressed in terms of quantum electrodynamics, but there is nothing inherently electrodynamic about the effect, and so one can consider many analogous scenarios with equivalent Casimir effects. In order to greatly simplify our derivation, I will choose to do just that.
One thing it is essential to be aware of is that in quantum mechanics a vacuum is not 'empty' in the sense that there is nothing at all there; so far as we understand, such an emptiness cannot exist in the physical universe. This possibility is precluded both on experimental grounds and on theoretical grounds by the uncertainty principle.$^7$ Instead we understand the universe to be filled with quantised fields (mathematically speaking, a field assigns some value(s) to every point in space(time); a quantised field is one where the range of possible values is restricted to some discrete set), with each particle being a localised excitation in the energy of a field: photons (light particles) are excitations in the electromagnetic field, electrons are excitations in the electron field, and so on. The rough procedure$^8$ for quantising a field is to treat it as a quantum harmonic oscillator (QHO) at every point in space (one could crudely picture this as an infinite system of connected balls and springs), which naturally results in quantised, discrete energy levels. A 'true' vacuum is therefore the ground state (lowest energy state) of every quantum field. We know that the ground state energy cannot be 0, as that would be exactly the kind of emptiness that cannot exist. Rather, it takes on a value of
\begin{equation} \label{eq:energy}
E_0=\frac{\hbar\omega}{2},
\end{equation}
the ground state of the QHO, where $\hbar$ is the reduced Planck constant and $\omega$ is the angular frequency of the oscillator.
With all this in mind, let's return to the Casimir effect, considering a 1+1-dimensional massless scalar field to simplify and clarify the example. The plates impose what are called 'boundary conditions': they restrict the frequencies that waves in between the plates can take, in this case to standing waves.$^9$ The equation for a standing wave in 1+1 dimensions is
\begin{equation} \label{eq:wave}
\psi_n(x,t)=e^{-i\omega_nt}\sin{\left(\frac{n\pi x}{a}\right)}
\end{equation}
where $a$ is the width of the cavity between the plates, $n$ is a natural number (a positive whole number) and $\omega_n$ is the angular frequency given by
\begin{equation} \label{eq:omega}
\omega_n=\frac{n\pi c}{a}
\end{equation}
where $c$ is the wave speed. Now, we know that the ground state energy for a QHO is given by equation \ref{eq:energy} and we know that in between the plates only standing waves can exist, so we can only have waves with angular frequency $\omega_n=n\pi c/a$. If we wish to find the vacuum energy between the plates, it seems clear then that all we need to do is sum over the possible ground state energies, giving
\begin{equation} \label{eq:vacuum}
E=\frac{\hbar}{2}\sum^{\infty}_{n=1}\omega_n=\frac{\hbar\pi c}{2a}\sum^{\infty}_{n=1}n.
\end{equation}
Here we run into a problem. If you've been paying attention, you'll notice that
\begin{equation} \label{eq:natural}
\sum^{\infty}_{n=1}n=1+2+3+4+...
\end{equation}
absolutely, positively does not converge at all, and yet here it appears in a physics equation relating to a very real and decidedly measurable, finite effect. How can this be? We haven't made a mistake, but we have overlooked one crucial fact: no matter what our plates are made of, they cannot confine arbitrarily high-energy modes of the field; those modes will always be able to escape. So what we need now is to somehow take account of that fact, and in doing so assign a finite value to equation \ref{eq:vacuum} and rescue our derivation.
Putting divergent series to work
So what we seek is a way of attaching a meaningful finite value to divergent series. Let's take a look at Grandi's series again and see if we can come up with anything consistent. We can try cancelling off pairs of terms to give
\begin{align} \label{eq:gpair1}
\sum^{\infty}_{n=0}(-1)^n&=(1-1)+(1-1)+(1-1)+...\nonumber\\
&=0+0+0+...\nonumber\\
&=0
\end{align}
but we can just as easily choose different pairings to give
\begin{align} \label{eq:gpair2}
\sum^{\infty}_{n=0}(-1)^n&=1+(-1+1)+(-1+1)+...\nonumber\\
&=1+0+0+...\nonumber\\
&=1
\end{align}
which is certainly not consistent. We can try re-ordering the series to bring all the $+1$s to the front, but this gives
\begin{align} \label{eq:inf+1}
\sum^{\infty}_{n=0}(-1)^n=1+1+1+...-1-1-1...
\end{align}
As there are an infinite number of $+1$s we never reach the $-1$s and the series approaches $+\infty$. Trying the same process by arranging all the $-1$s to the front will in the same way cause the series to approach $-\infty$. Rather than find a single consistent way of assigning a number to the series, all we have found is four duds.
The reason these methods are all duds is that operations like reordering terms and cancelling off pairs of terms are, in general, only valid for absolutely convergent series; indeed, the Riemann series theorem mentioned in note 5 shows that even a convergent series can have its value changed by rearrangement if it converges only conditionally. If we try to apply such operations to divergent series, the result is clearly a mess. Let's step back and look at the problem from another angle. We want to assign a number to the series; presumably we should be able to manipulate that number algebraically. If we call the series $S$ then after some algebraic juggling we find
\begin{align} \label{eq:S}
S&=1-1+1-1+1-1+...\nonumber\\
1-S&=1-(1-1+1-1+1-1+...)\nonumber\\
&=1-1+1-1+1-1+...\nonumber\\
&=S\nonumber\\
1&=2S\nonumber\\
\Rightarrow S&=1/2
\end{align}
Furthermore, we can consider Grandi's series as an example of the infinite geometric series
\begin{align} \label{eq:geometricG}
\sum_{k=0}^{\infty}ar^k=a+ar+ar^2+ar^3+...
\end{align}
where $a=1$ and $r=-1$. Even though Grandi's series is divergent, equation \ref{eq:geometricG} is convergent for $|r|<1$ and in that case
\begin{align} \label{eq:geometricC}
\sum_{k=0}^{\infty}ar^k=\frac{a}{1-r}.
\end{align}
If we substitute $a=1$ and $r=-1$ into equation \ref{eq:geometricC} then we again find $S=1/2$. Neither of these constitutes solid proof in and of itself, but they are highly suggestive. The tool we are looking for is the Cesàro sum.$^{10}$ A series is Cesàro summable when the mean value of its partial sums tends to a given value. For a convergent series the Cesàro sum always equals the number the series converges to, and the Cesàro sum is defined for many divergent series too, including Grandi's series. The partial sums of Grandi's series are $1,0,1,0,...$ and so the terms in the Cesàro sequence (the running means of those partial sums) are $1, 1/2, 2/3, 1/2, 3/5, 1/2, 4/7,...$, which clearly converges to $1/2$ in the limit.$^{11}$ In some sense this is quite a satisfying result, as our Cesàro sum lies exactly in between the two accumulation points, serving as a kind of average value.
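Cesàro summation is mechanical enough to play with directly. Here is a small Python sketch of my own (a toy illustration using exact fractions) that computes the Cesàro sequence of Grandi's series:

```python
from fractions import Fraction
from itertools import accumulate

def cesaro_sequence(terms):
    """Running means of the partial sums of `terms`."""
    means, running_total = [], Fraction(0)
    for i, partial in enumerate(accumulate(terms), start=1):
        running_total += partial
        means.append(running_total / i)
    return means

# Grandi's series: terms +1, -1, +1, -1, ...
grandi = [Fraction((-1) ** n) for n in range(14)]
print(*cesaro_sequence(grandi))
# 1 1/2 2/3 1/2 3/5 1/2 4/7 1/2 5/9 1/2 6/11 1/2 7/13 1/2
```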
Having fun with zeta function regularisation
The partial sums of equation \ref{eq:natural} are the triangular numbers $T_n$ (so named because they give the numbers of objects that can be arranged into equilateral triangles): $1, 3, 6, 10, 15,...$ Calculating the Cesàro sequence of equation \ref{eq:natural}, we find it goes $1, 2, 10/3, 5, 7,...$, growing without bound. Once again, just as we stop to bask in our moment of triumph, we find our job isn't quite yet done: equation \ref{eq:natural} is not Cesàro summable. We must find another, more sophisticated method for attaching a number to that series.
Let us consider the series
\begin{align} \label{eq:dirichlet}
D(s)=\sum^{\infty}_{n=1}n^{-s}, \text{ Re}(s)>1
\end{align}
where $s$ is a complex number and $\text{Re}(s)$ denotes the real part of $s$. For $s=-1$ this series would be exactly the same as equation \ref{eq:natural}, but the series is not defined for $s=-1$.$^{12}$ However, we saw with equation \ref{eq:geometricC} that applying the convergent-case formula to Grandi's series gave the value of $1/2$, which turned out to be the right one; perhaps we could do something similar here? As it would happen, we can, but first I would implore you to, in the great words of John Arnold, "Hold on to your butts".
In the domain $\text{Re}(s)>1$, $\zeta(s)=D(s)$, where $\zeta(s)$ is known as the Riemann zeta function.$^{13}$ Unlike $D(s)$, $\zeta(s)$ is defined over the entire complex plane (bar a single troublesome point at $s=1$) and is known as an analytic continuation of $D(s)$. Analytic continuation is a wonderfully useful (and perplexing) tool of complex analysis whereby the domain of an analytic ('well-behaved') function can be extended. While this may not seem like a big deal, it raises the question of whether a function can be continued arbitrarily; if our original function is only defined on some small domain and we wish to extend that domain, what is to stop us giving it such-and-such a value in the extended domain instead of some other value? As it would happen, the identity theorem states (very roughly) that any two holomorphic functions (all complex analytic functions are holomorphic) that agree on some small patch of a connected domain (more precisely, on any subset containing a limit point) must be equal over the entire domain, and thus there is only one way to analytically continue a function.
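None of this is merely formal; the continuation can be evaluated numerically. The Python library mpmath implements $\zeta(s)$ over its full domain, so we can check it against $D(s)$ where the series converges and then evaluate it where the series does not (a quick sketch, assuming mpmath is installed):

```python
from mpmath import mp, zeta

mp.dps = 15
# Where Re(s) > 1, the zeta function agrees with the series D(s):
print(zeta(2))                                 # 1.64493406684823 (= pi^2/6)
print(sum(n ** -2 for n in range(1, 100000)))  # about 1.644924..., close behind
# Where D(s) diverges, zeta(s) is perfectly finite:
print(zeta(-1))                                # -0.0833333333333333 (= -1/12)
```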
We wish to know what $\zeta(-1)$ is so we can assign that value to $D(-1)$ through the magic of analytic continuation. For negative whole numbers $n<0$,
\begin{equation} \label{eq:negzeta} \zeta(n)=-\frac{B_{1-n}}{1-n}
\end{equation}
where $B_{1-n}$ is the '$1-n$'$^{\text{th}}$ Bernoulli number.$^{14}$ In the case of $n=-1$ we have
\begin{equation} \label{eq:-1zeta}
\zeta(-1)=-\frac{B_2}{2}=-\frac{1}{12}.
\end{equation}
Thus we can assign to the divergent series $1+2+3+4+...$ the value of $-1/12$. If this strikes you as strange or even suspicious then I applaud your scepticism; there is indeed something plainly odd about assigning a small, negative fraction to a series of ever-increasing positive whole numbers. This is not at all like the case of Grandi's series, where our value lay neatly between the two accumulation points. But before we throw our hands up in despair, recall our motivation for this investigation, the Casimir effect. What happens if we use our value of $-1/12$ there?
As it would happen, to do so is to use a technique known as zeta function regularisation. We replace a divergent series with a 'regulator' in the form of a zeta function (although other regulators exist, each with different strengths and weaknesses) and in doing so remove unphysical infinities from our theory. If we have regularised correctly, then by the time we reach our final result the regulator will have disappeared; it is nothing more than a 'trick' for calculating the correct value, and so it should not still appear at the last step. Applying it to equation \ref{eq:vacuum} gives
\begin{equation} \label{eq:zetanorm}
E=\frac{\hbar\pi c}{2a}\sum^{\infty}_{n=1}n=\frac{\hbar\pi c}{2a}\zeta(-1)=-\frac{\hbar\pi c}{24a}.
\end{equation}
The force between the two plates is given by the negative gradient of the energy:
\begin{equation} \label{eq:force}
F=-\frac{\partial E}{\partial a}=-\frac{\partial}{\partial a}\left(-\frac{\hbar\pi c}{24a}\right)=-\frac{\hbar\pi c}{24a^2}
\end{equation}
which, lo and behold, is exactly the right result.
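As a final sanity check, the $-1/12$ is not an artefact of the zeta function in particular; it also falls out of cruder regulators. Suppose we damp the divergent sum as $\sum_{n}n\,e^{-\epsilon n}$, mimicking the fact that the plates cannot confine arbitrarily high frequencies. The standard expansion $\sum_{n=1}^{\infty}n\,e^{-\epsilon n}=\frac{1}{\epsilon^{2}}-\frac{1}{12}+O(\epsilon^{2})$ then says that, once the divergent $1/\epsilon^{2}$ piece is set aside, the finite remainder is exactly our $-1/12$. A quick Python sketch of my own bears this out:

```python
import math

def regulated_sum(eps):
    """Sum of n * exp(-eps * n): finite for eps > 0, divergent as eps -> 0."""
    N = int(50 / eps)  # beyond this point the terms are utterly negligible
    return sum(n * math.exp(-eps * n) for n in range(1, N + 1))

for eps in (0.5, 0.1, 0.02):
    finite_part = regulated_sum(eps) - 1 / eps ** 2
    print(eps, finite_part)
# 0.5  -0.0823...
# 0.1  -0.08329...
# 0.02 -0.083331...  -> heading for -1/12 = -0.08333...
```

In a fuller treatment the divergent $1/\epsilon^{2}$ piece is disposed of by comparing against the vacuum energy in the absence of plates, leaving the $-1/12$ to do the physical work; but that is a story for another post.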
Final thoughts
The prompt for my writing this lengthy explanation of infinite series, divergences, regularisation and so on was a minor flurry on the Internet a little while ago over a somewhat dodgy derivation of the result $1+2+3+4+...=-1/12$. In order to get this result, a number of those forbidden-for-divergent-series operations were used for simplicity's sake, but in doing so I felt the important subtlety between a convergent series equalling a number and a divergent series being assigned a value (albeit in a rigorous way) was lost, and that is an important distinction to make. You cannot keep adding $1+2+3+4+...$ and then, through the magic of infinity, come up with a $-1/12$ at the end; that series will always diverge and will always approach $+\infty$. But, as we have seen, we can rigorously assign the value of $-1/12$ to it for the purposes of removing infinities from our calculations using, in this case, the Riemann zeta function.
During the conception of this post I did ponder a question which continues to interest me, though. We saw from the example of Grandi's series that there are some mathematical operations and manipulations which would be fine in 'normal' mathematics but which suddenly become verboten in the specific context of a divergent series. The question is: is this a fundamental property of the mathematics in question, that is to say, of the underlying patterns and structures, or is it one emergent from notational limits? Is a rearrangement of terms in a divergent series actually fundamentally different to a rearrangement of terms in a convergent series, or is it the same thing manifesting different results in different contexts? I am not sure this question can be answered sensibly, but for my money I am reminded of the old dichotomy in the philosophy of mathematics which has yet to find an answer: Is mathematics discovered or invented? Now there is truly some food for thought.
Notes
$1$. There are other ways to use sigma notation. For example, equation \ref{eq:natural} can also be represented as $\sum_{n\geq 1}n$ ($n$ being a whole number is implied by the discrete sum being used instead of the continuous integral) or $\sum_{n\in\mathbb{N}}n$. Other less common examples of sigma notation are $\sum_{p\text{ prime}}\frac{1}{p}$, which is the sum of the reciprocals of all prime numbers, and $\sum_{d \vert n}d^x$, which is the divisor function, where '$d \vert n$' means $d$ divides $n$ exactly.
$2$. The sum shown here has the interesting property of being a series representation of the number 1, or in the given notation, $\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n=1$. For any readers who are passingly familiar with Ancient Greek philosophy, this fact can be viewed as a solution to Zeno's dichotomy paradox, albeit one which ignores some nuances which I will address at a later date when I cover supertasks, one of the many intersections of philosophy and mathematics.
$3$. Formally, there exists a limit $S$ such that for any (arbitrarily small) number $\epsilon>0$ there is a number $N$ such that for $n>N$, $|S_n-S|<\epsilon$ where $S_n$ is the $n^{\text{th}}$ partial sum. Informally, there exists a number $S$ such that for an arbitrarily large $n$ the partial sum $S_n$ will be arbitrarily close to $S$.
$4$. The factorial operator is defined as $n!=n(n-1)(n-2)...1$, or in words, $n!$ (read '$n$ factorial') is given by the multiplication of $n$ by all the whole numbers less than $n$ going down to $1$. As an example, $5!=5\cdot4\cdot3\cdot2\cdot1=120$.
$5$. I will not delve here into the depths of conditional and absolute convergence, almost convergence, and so on. Suffice it to say that there are a great many interesting infinite series with a great many interesting convergence-related properties other than the simple ones shown here. If you are especially interested in the topic, I recommend investigating the Riemann series theorem for a very interesting and surprising property of conditionally convergent series.
$6$. The harmonic series is deceptively interesting and there are many, many different and varied ways of calculating the harmonic numbers. One example straight out of equation \ref{eq:harmonic} is the recurrence relation (an equation which gives one term in a sequence in terms of a previous one) $H_n=H_{n-1}+1/n$. I encourage you to investigate others and see where it leads you!
$7$. The value of a field and the value of the derivative of the field at a given point in space cannot both be known to arbitrary accuracy; the better one is known, the less well the other can be. Though the uncertainty principle is often raised as a weird and wonderful result of quantum mechanics, it is in fact a feature of any wave theory and is linked intimately with Fourier transformations, although a thorough demonstration of how this is so is sadly beyond the scope of this note.
$8$. The complexities of quantum field theory should by no means be underestimated; what I am presenting here is an extraordinarily simplified version that, while instructive, would not necessarily be very useful in practice.
$9$. Standing waves are waves with nodes (points of zero displacement) at the endpoints. An example would be a plucked guitar string, which is restricted from moving at the bridge and nut. This restriction ensures the only possible wavelengths are given by $\lambda_n=2a/n$ for length $a$, where $n$ is a natural number. Using the wave relationship $c=\nu_n\lambda_n$ we find equivalently $\nu_n=nc/2a$ (or equation \ref{eq:omega}, as $\omega=2\pi\nu$), which gives the frequency $\nu_n$ of the $n^{\text{th}}$ harmonic.
$10$. Or rather, one of the tools, as we could equally have chosen the Abel sum, the Borel sum, the $1/x$ series method, or a number of others. Cesàro summation is far from the only rigorous way of dealing with Grandi's series, but what is important is that it gives a value of 1/2, as do the methods I listed above—this consistency is a big hint that we have picked the right number to assign to the series.
$11$. It is worth noting that if we 'dilute' the series by adding in $+0$s we change the value of the Cesàro sum (although the summability is not affected). This illuminates yet another mathematical manipulation which would be fine for a convergent series but is not for a divergent series.
$12$. Precisely because it would become divergent.
$13$. If $e$ is my favourite number then $\zeta(s)$ is surely my favourite function, but in exactly the same way I could not possibly hope to explain why except in another post devoted to it exclusively.
$14$. For fear of overwhelming you with yet more beautiful mathematics in an already over-long post I will avoid the temptation of discussing the Bernoulli numbers, although as ever I encourage the interested reader to investigate for themselves!