RevK®'s ramblings: MATHS

Showing posts with label MATHS. Show all posts

2024-01-13

Distance of point to line in four dimensions (or more)

Why?

I'm writing the blog because searching has not helped me on this. Why do I need to know? well the answer is I am applying a Ramer–Douglas–Peucker algorithm to GPS tracking data.

You can look it up, but it is pretty simple to explain. Imagine you have a set of points starting from A and going to B - in my case it is GPS log data, so points on a map, every second, so lots of points. You want to reduce how many points you have without losing significant information.

Ideally all the intermediate points are on a straight line A to B, and can all be removed leaving just A and B. I practice you allow a tolerance, an allowable deviation from the line. If all of the intermediate points are within that tolerance you can delete them all. If not, you find the most deviant point, call it C. C is effectively a corner point. You apply the algorithm on A to C and C to B recursively.

In the end you have a set of points, which could just be A and B, or A, C, and B, or more. If you join the dots with straight lines you can be sure all of the removed points lay within your tolerance of that line, and so be sure any detail, that is more than the tolerance allowed, has not been missed.

I use this years ago on GPS data.

Before

After

Point to line

The crux of the algorithm is working our the deviation from the line, what wikipedia helpfully just refers to as perpendicularDistance().

Two dimensions

In two dimensions this is simple, and searching finds an algorithm. I used this on latitude and longitude. But this causes problems.

Latitude and longitude are not a flat plane. This is not likely to be a real issue unless tracking a plane (pun intended) or a ship over a long distance.
Latitude and longitude are in degrees not metres, so working out your distance is not easy. A degree North is not the same distance as a degree West and changes depending one where you are. You need trigonometry to compensate and convert to distance units.

Three dimensions

Of course, we also have altitude, and so seems sensible to include that - why not. Algorithms for a point to a line in 3D are also easy to find, but there seem to be several different approaches. At least altitude is already in metres.

However, the GPS can provide not just lat/lon/alt, but also ECEF data. This is simple Cartesian X/Y/Z co-ordinates from centre of the Earth. This avoids any need to adjust from degrees, and makes it simple.

Four dimensions

I have to push it one step further, don't I? If you travel in a straight line, but stop for a while, you would remove all points. By including time as a 4th dimension we create a corner when speed changes notably or you stop for a while.

It does mean you have to scale time to a distance, but that can be a simple parameter in the algorithm.

However, there is a snag - finding an algorithm for point to line in four dimensions is not simple. The only ones I could find explain it in vector maths - and even with an A in A-level maths, it was a long time ago and I really did not get on with vector maths. Nobody seems to give me a simple algebraic answer which is driving me mad.

Making it simple

So, back to basics, I am interested in three points, and regardless of how many dimensions that is, they can be said to lie on a flat two dimensional plane. This means working out the deviation is a simple matter of considering a triangle.

So I want to find h. To do this I need distances a, b, and c. But that is easy to find as it is simple Pythagoras, regardless of how many dimensions. For two dimensions it is √((Ax-Bx)²+(Ay-By)²). For three dimensions it is √((Ax-Bx)²+(Ay-By)²+(Az-Bz)²), and so on.

Knowing a, b, and c I can use Heron's formula to find area...

s = ½(a+b+c)
𝔸 = √(s(s-a)(s-b)(s-c))

Then use the formula for area from base and height

𝔸 = ½hb
h = 2𝔸/b
h = 2√(s(s-a)(s-b)(s-c))/b

So it can be done with any number of dimensions.

Faster

One of the considerations is performance when coding for a microcontroller. They can do integer add/subtract very fast, multiply reasonably fast, division slower, and floating point even slower. As it happens the ESP32 has hardware single precision floating point, which helps.

So can I simplify this?

Thankfully wikipedia helpfully expands Heron's formula to:

𝔸 = ¼√(4a²b²-(a²+b²-c²)²)

Which gives us:

h = 2(¼√(4a²b²-(a²+b²-c²)²)/b)
h = ½√(4a²b²-(a²+b²-c²)²)/b
h² = (4a²b²-(a²+b²-c²)²)/4b²

Working our h² is useful as ultimately I am just comparing this distance so no need to apply a square root. I can square my reference threshold.

Why is this helpful? Well, all of the parameters are a², b², and c². This means that I don't need to use a square root to work out a, b, and c in the first place. The whole lot is simple addition, subtraction, multiplication, and just one division.

Turning back

There is another problem with all of the perpendicular distance algorithms, including this one. The all work out a distance of a point to an infinite line which is defined as going through two points.

In this case point C is way off the line A-B, in fact it is c off it, but h is small. If h is within the tolerance then it would be removed leaving only A-B.

To solve this I can use the fact that if c²-b² is > a² the C must be beyond A and I should use a² as the distance².

Similarly if a²-b² > c² then C is beyond B, meaning I should use c² as the distance².

Fortunately I have a², b², and c² already, so is a simple test.

Of course, what if you have a straight line A-B-C-D and travel A-B-C-B-C-D? No points will be "outside" the A-D range... Well the use of time as a 4th dimension catches that - which means the first case would also have been caught... Even so, this simple test on end points is worth doing I think.

C what I mean...

inline float dist2 (fix_t * A, fix_t * B)

{

float X = x (A) - x (B);

float Y = y (A) - y (B);

float Z = z (A) - z (B);

float T = t (A) - t (B);

return X * X + Y * Y + Z * Z + T * T;

}

float b2 = dist2 (A, B);

fix_t *m = NULL;

float best = 0;

for (fix_t * C = A->next; C && C != B; C = C->next)

{

float h2 = 0;

float a2 = dist2 (A, C);

if (b2 == 0)

h2 = a2; // A/B same, so distance from A

else

{

float c2 = dist2 (B, C);

if (c2 - b2 >= a2)

h2 = a2; // Off end of A

else if (a2 - b2 >= c2)

h2 = c2; // Off end of B

else

h2 = (4 * a2 * b2 - (a2 + b2 - c2) * (a2 + b2 - c2)) / (b2 * 4);

}

if (m && h2 <= best)

continue;

best = h2;

m = C;

}

2020-03-08

Big number maths

One of my first C programmes ever, at uni, was a calculator. I had done a lot of code in BASIC, Z80, 6502 and other languages before, but C I learned at uni.

It was a strange exercise for me as I was teamed with someone else, and basically gave up on him. To me a calculator, i.e. something that could parse 1+2*3 and get 7, and understand brackets, was inherently simple. The code was a couple of pages at most and I had added all sorts of extras over and above the basic +, -, *, / binary operators. It handle brackets and operator precedence, and even had a table of operators to allow easy expansion. My "partner" made code that was dozens of pages and every test that involved extra brackets or some unexpected sequence broke it and he had two add extra code for edge cases. Oddly I once encountered a compiler that must have been written in such a way as adding extra brackets, e.g. 1+(2*3), actually broke it and made wrong code. Scary that stuff like exists.

My code has a simple loop to process: prefixes or "("s, operand, postfixes or ")"s, operator, in a loop, ending after last operand. It stacked operands, and operators (after processing the stack for any higher operators before adding), and that was it. It allowed unary pre and post operators, and binary operators. Simple and easy to understand, IMHO.

Of course, it used the normal C code to parse and output numbers, storing internally as floats (I think).

However, having played with mechanical calculators, one of the things I have meant to code one day and had not got around to for over 30 years, is a decimal calculator library... That is until this weekend.

Basically, one tool I have written, and my friends use a lot, includes an "eval" function to evaluate a simple sum for them. It is used in loads of places (as it embeds maths in a back end for a web page). It used integer (long long) types (64 bit) and shifted by number of decimal places it saw, giving it around 18 significant digits. Sadly it broke if you went over that (my bad) and they had some silly case of some numbers plus a fraction which had rather too many decimal places. So they asked me to fix.

A simple fix was limit the size, and then limit size and apply bankers rounding at the limit.

But I decided this may be the time to finally do that decimal maths code. I googled, and there are a few decimal libraries. There are also some arbitrary precision binary libraries. To my surprise the latest GCC has decimal types, where data is stored and processed in decimal. I can only assume we have decimal maths in some processors even, these days. This is good for accounting where the rounding errors you can get are bad (essentially binary does not have a way to represent 0.1 or 0.01 without recurring digits). Sadly GCC may have it, but clib does not yet (scanf and printf), but will some time, and I can see that being useful. Even so, _Decimal128 does have limits on number of digits. There are libraries in other languages, but I wanted something for C.

So I made a C string decimal library, and put it here on GitHub for all to use.

It is a library and command line, and includes functions to add, subtract, multiple, divide, and compare, as well as a simple evaluation function that parses basic sums, just like my uni project calculator.

The key thing is that it works on C text strings for numbers. A simple sum (I learned at school, and now out of date) was 366.2564*86164.091=31558149.7789324. It was numbers from an encyclopaedia in the library. I remember because I had to do the maths manually as no calculator I had would do it to full precision. Even google calculator truncates it a bit. That was my first test, and it worked. Yes, I also learned π to 50 places (and now only remember 25).

So, for add, subtract, and multiple, it is simple to ensure you have as many digits as needed. Sadly division can go on for ever, so I had to include a limit of decimal places, and a rounding rule, and remainder option.

I included rounding on limit on division, and also a separate simple rounding function - obviously actually applied at whatever digit you set. Default is banker's rounding :-

Round towards 0 (i.e. -3.9 rounds to -3)
Round away from 0 (i.e. -3.9 rounds to -4)
Round towards -ve (i.e. -3.9 rounds to -4, 4.2 rounds to 4)
Round towards +ve (i.e -3.9 rounds to -3, 4.2 rounds to 5)
Simple rounding (i.e 2.4 rounds to 2, 2.5 rounds to 3)
Banker's rounding (i.e. 2.4 rounds to 2, 2.5 rounds to 2, 2.6 rounds to 3, 3.5 rounds to 4)

I went further though, and made the calculator work on rational numbers, that means all of the maths is done with numerator and denominator, and only finally at the end does the division get done and rounding applied. This means that 1000/7*7 is 1000, unlike bc which makes it 994. minor optimisations for adding/subtracting with same denominator, and anything with denominator of 1 or a power of 10, making it simpler. For anything without division the maths does not even need a final divide.

I had a few silly bugs, one that fooled me is that zero is a special case, which I store as a magnitude of 0 and no digits, which is fine, but comparing numbers I started by comparing magnitude before comparing digits, so 0 looked like a bigger number than 0.1 as 0 was 0*10^0 and 0.1 was 1^10-1 and magnitude 0 is greater than magnitude -1. I had to add checks for comparison with 0.

However, overall, I am quite chuffed. Heck, I even asked it to work out 1e1000000000+1 and it just worked, that is a billion significant digits!

This is pure coding fun though. But may be useful - feel free to use it.

P.S. Someone asked about Banker's rounding, and I was sure I had blogged once, but cannot find it. Unlike that which I was taught at school, and every Casio (and other) calculator I had, rounding is not as simple as you think. I used to assume that a residual of exactly 0.5, or above, rounded up, and below 0.5 was down. That is what my calculator did. However, this creates a bias. Ideally you want to round an exact 0.5 residual down half the time and up half the time to remove bias. This could be random, but that is bad as you get results which are not reproducible. One simple way, bankers rounding, is to round 0.5 residual to nearest even number. Hence 0.5 rounds to 0, 1.5 rounds to 2, 2.5 rounds to 2, 3.5 rounds to 4, and so on.

2018-01-16

I did take a course in economics.

I did do economics at university - I failed, as it happens, but I also did Maths at O, and A level. I can do "adding up", honest.

Having borrowed one of these things to press reset buttons and remove SIM cards from iPhones, off my son, and then lost it, I thought I would get another. They are very useful and I am amazed that Victorinox don't have one as a standard tool on a pen knife.

So I googled, and Amazon offer them... And just to check I understand this, it is from 19p each but actually for one you pay 50p, for two you pay £11.37, and for three you pay £19.48

OK, I know I failed economics, but I think this one makes no sense to anyone.

So I did a bit more googling, and found someone selling for a more reasonable price, 3p.

Of course, buying something for 3p is a tad crazy, as postage will easily cover a lot more than that, so how does the price break down work on that I wonder...

Seriously, this is 2p at 50 units and 1p at 500 units. I only need one, maybe two so I have one too, but seriously?!

Given lowest postage option was like £4.35 I decided I may as well buy 500, I mean, really, I don't need 500 but, WTF.

It quoted £5 for 500, as you expect, at 1p each. But they it gets complicated.

The actual total for the 500 items was then shown as £7.38 so 1.46p each, hmmm, OK... But the invoice listed them as 2p each so £10 total in the itemisation.

In all, with postage and card surcharge (non EU), I have spent £12.14 which works out 2.42p each. I should have gone for 1000 clearly.

Next question is what I do with 500 pokey things!

2017-07-31

Maths test

This is an invoice from Viking (Office Depot International (UK) Ltd). I am at a loss as to how it is worked out.

The items are simple, there are £58.25 of items with 20% VAT and £53.98 of items with zero VAT. There is also a phantom £2.26 "Protection Plus", which I have no clue about and no idea if VAT applies or not.

The items are not clear whether they are quoting VAT inclusive or exclusive, as neither way do they add to the stated NET total of £102.23.

VAT of £11.06 is not right on the VAT listed items, whether they are VAT inclusive or exclusive and whether the £2.26 has VAT or not or whether it is VAT inclusive or exclusive.

Basically, I cannot find a way to make it add up at all. Any maths or accounting genius out there that can explain it to me?

P.S. Their customer service number is an expensive 0844 number which is a problem under current legislation!

RevK^®'s ramblings