So what is Heisenberg uncertainty? Simply put, it is a restriction on the accuracy of simultaneous measurements of 'observables' (measurable quantities). The prototypical example is position and momentum; Heisenberg uncertainty states that the position and momentum of a particle$^2$ cannot be known simultaneously to arbitrary precision. The mathematical statement of the position-momentum uncertainty principle is
\begin{equation}
\Delta x\Delta p\leq\frac{\hbar}{2}
\end{equation}
where $\hbar$ is the reduced Planck constant and $\Delta x$, $\Delta p$ represent (in some sense)$^3$ the uncertainty in $x$ and $p$ respectively. Effectively, the better you know position, the less well you know momentum, and vice versa. This is not a specifically experimental limitation, but a fundamental theoretical one. This sort of restriction, on first viewing, indeed seems very strange and certainly counter-intuitive. I will attempt to convince you that not only is it not necessarily strange, but expected.
What is important to remember about QM is that wave mechanics is a central theme. Particles are represented by wave-functions, which are complex solutions to the Schrödinger equation,$^4$ and this wave-nature contributes to a good deal of the quantum weirdness we are familiar with (an example is shown in a recent blog post of mine which relies on superposition and destructive interference of photon waves). With this in mind, let's take a look at some wave-functions.
For illustrative purposes, we will work in one spatial dimension and free space (zero potential everywhere) and only consider time-independent wave-functions. The simplest example of such a wave-function is the plane wave, which takes the form
\begin{equation}
\psi(x)=Ae^{ikx}\equiv Ae^{ipx/\hbar}.
\end{equation}
Here $A$ is the complex-valued amplitude. The amplitude in this case is not important, because any wave-function must be normalisable (as the probability distribution function, which must integrate to 1 over all space to preserve conservation of probability, is given by $|\psi(x)|^2$, known as the Born rule) and so $A$ will need to be scaled anyway. For those who are unfamiliar with complex exponential form, the waviness is more explicit in the less compact form $\exp{(ikx)}\equiv\cos{(kx)}+i\sin{(kx)}$. The $k$ is the wave-number (or wave-vector in higher dimensions), and this term appears naturally in most of the mathematics I'm presenting in this post. For this reason I will include the version of the equations with $k$ alongside the version with the more physically immediate momentum $p$ (the conversion is simply $p=\hbar k$).
Because the Schrödinger equation is linear, sums of solutions are themselves solutions (this property is known as the superposition principle). That means we can have wave-functions of the form
\begin{equation}
\psi(x)=\sum_{m=0}^{n}A_me^{ip_mx/\hbar}
\end{equation}
for any arbitrary $n$ (finite or infinite). Here the scaling of the $A$ values is still not important because of normalisation, but the relative magnitude of them is, as this determines the relative probability weightings according to the Born rule. However, because we have wavelength $\lambda_m=2\pi/k_m\equiv2\pi\hbar/p_m$, we can see that the above formulation does not capture the full number of possible modes; only integer multiples of the $m=0$ mode are captured. In free space all modes are permissible and so we can take the continuum limit (let the sum turn into an integral):
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{\psi}(k) e^{ikx}\mathrm{d}k\equiv\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\tilde{\psi}(p) e^{ipx/\hbar}\mathrm{d}p.
\end{equation}
Because we have moved away from integer indexing, the discrete set of amplitudes $A_m$ is replaced by the continuous function that is suggestively symbolled $\tilde{\psi}$. The function ranges over $p$ because we are integrating over all possible modes/wavelengths/momenta—in a sense $p$ takes over the role of index in the integral from $m$ in the summation. The factor of $1/\sqrt{2\pi}$ is a matter of convention and the factor of $1/\sqrt{\hbar}$ comes from the change of $k$ to $p$.
There is more to $\tilde{\psi}$ than meets the eye. Not only is it the amplitude function for the integral, but it's actually the wave-function itself, except not in physical space like $\psi$ but in momentum space.$^5$ In the context of QM this is known as the momentum space representation of the wave-function, but more broadly the mathematical construct is known as the Fourier transform,$^6$ and Fourier transforms occur very frequently in all manner of physical theories involving waves, be they QM, acoustics, optics, crystallography, signal analysis and so on.
So how do we determine the form of $\tilde{\psi}$? As it turns out, perhaps unsurprisingly, the Fourier transform is invertible, and so we find that
\begin{equation}
\tilde{\psi}(p)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\psi(x) e^{-ipx/\hbar}\mathrm{d}x.
\end{equation}
All well and good, but how do we make sense of it? Well, let's consider a limiting case. We can select out a single mode by using a Dirac delta such that $\tilde{\psi}(p)=\delta(p-p_0)$. This Dirac delta is zero everywhere except at $p=p_0$, where it is undefined but the area under the Dirac delta is always normalised to $1$. Inserting the delta into equation (4) yields$^7$
\begin{equation}
\psi(x)=\frac{1}{\sqrt{2\pi\hbar}}\int_{-\infty}^{\infty}\delta(p-p_0)e^{ipx/\hbar}\mathrm{d}p=\frac{e^{ip_0x/\hbar}}{\sqrt{2\pi\hbar}},
\end{equation}
which is, outside a numerical factor, effectively a complex exponential in $x$, or equivalently, a flat (complex-valued) wave across all space (applying the Born rule to a plane wave yields a probability distribution of $|e^{iz}|^2=\text{const.}$). So when the momentum is maximally well-defined (put into a single mode) the position is maximally poorly-defined (the wave-function is spread evenly across all space). By the invertibility of the Fourier transform, we can expect the vice versa case to hold (a maximally defined position, i.e., Dirac delta in $x$, will result in a maximally poorly-defined momentum, i.e., an even wave across all momentum space). This relationships lies at the heart of the Heisenberg uncertainty principle.
As hinted to at that start of the post, position and momentum are not the only observables which obey Heisenberg uncertainty. The next most common pairing$^8$ is energy and time,$^9$ although uncertainty relationships can be generated more generally by taking the derivative of the (classical) action (and quantising). For example, momentum is the derivative of the action with respect to position, energy is the derivative of the action with respect to time, and so on.
Hopefully I have demonstrated that Heisenberg uncertainty is not so strange as might have first appeared. Rather than an arbitrary restriction on how accurately we can know certain measurable quantities, it is in fact a basic and unavoidable feature of any linear wave theory, of which quantum mechanics is only one example, albeit of a more fundamental and therefore perhaps more intuitively challenging sort than most.
Notes
$1$. The Heisenberg uncertainty principle is so ubiquitous in quantum physics that it is frequently referred to simply as 'the uncertainty principle'. Out of habit, however, I tend to use the less common 'Heisenberg uncertainty' to differentiate it from other, admittedly much less common uncertainty relations. As far as I know, any of these uses are considered acceptable.$2$. I hesitate to use the word 'particle', as it is important when talking about these quantum concepts to be very clear about what one means. A better term might have been 'quantum' or 'wave-particle', as the wave-nature of QM is central to the discussion of Heisenberg uncertainty. However, despite the slightly misleading connotations, 'particle' is by far the most commonly used term and so I will use it also.
$3$. We could, for example, take the standard deviation in $x$ and $p$, $\sigma_x$ and $\sigma_p$, as these will be represented by continuous distributions.
$4$. In this post I will use the Schrödinger representation of QM as this makes explicit the wave-nature of the wave-function. However, there are many, many representations of QM, each with their own advantages (and disadvantages) when it comes to analysing real systems, but importantly they are all exactly equivalent and so this wave-nature I have been emphasising is intrinsic to all of them. In that sense I could just as well have chosen any representation and this blog post would otherwise have been identical, although perhaps not as easy to understand.
$5$. If you don't know what momentum space is, the most important thing to understand is that there are mathematical spaces other than the space(time) we are familiar with. Let's consider the flat 3-dimensional "position" space (3-space) of ordinary life, with $x$-, $y$- and $z$-directions. Suppose we have an object at the coordinates $x=1$, $y=0$ and $z=0$. This can be represented by a vector in the 3-space going to the point $(1,0,0)$, thus describing the position. Now suppose the 3-space we are looking at is part of a 4-space that includes time, except we are going to set the time to some instant and freeze it there.
Let's say at that instant the object at $(1,0,0)$ is travelling with a momentum of $0$ in the $x$-direction, $1$ in the $y$-direction and $0$ in the $z$-direction (in arbitrary units). We could then construct a "momentum" 3-space (or 4-space including time) with directions $p_x$, $p_y$ and $p_z$ and at that instant of time the vector corresponding to the object would be at the coordinates $(0,1,0)$. So for any $n$-dimensional position space it's easy to see there is a corresponding $n$-dimensional momentum space. In fact, we can define a $2n$-dimensional space known as the phase space by combining the position and momentum spaces, and that space will describe all possible states of a physical system.
$6$. I want to stress that strictly speaking the momentum representation of the wave-function is not the Fourier transform of the position representation. The Fourier transform dual to position is the wave-vector (or wave-number in 1-D) and not the momentum, although given one is a scalar multiple of the other I feel like we can be a little loose in our communication in this one respect.
$7$. In evaluating equation (6) we have made use of the so-called sifting property of the Dirac delta, where $\int_{-\infty}^{\infty}\delta(x-a)f(x)\mathrm{d}x=f(a)$. This property is analogous to the Kronecker delta for sums but used instead of integrals and is arguably its most useful feature.
$8$. These pairs of variables are typically referred to in physics as 'conjugate variables', although in this context we can also call them Fourier transform duals. It is important to remember that we would be less inclined to refer to them as such if we were working in, for example, the Heisenberg representation of QM where the Heisenberg uncertainty principle arises more directly out of the non-commutativity of Hermitian operator matrices. This is only because the Fourier transforms are implicit in that representation; they are still there in some sense due to the equivalence of representations of QM as discussed in Note 4, but are not nearly as obvious.
$9$. The energy-time Heisenberg uncertainty principle is given mathematically as $\Delta E\Delta t=\hbar/2$. This is analogous to position-momentum uncertainty in the sense that the Fourier dual of position is wave-number $k=p/\hbar$ and not momentum directly; the Fourier dual of time is technically angular frequency $\omega=E/\hbar$ and not energy directly. As in the position-momentum case, however, the difference is only a scalar factor of $\hbar$ and so we can speak reasonably loosely with some impunity.
 

 
