Showing posts with label gnuplot 4.4. Show all posts
Showing posts with label gnuplot 4.4. Show all posts

Saturday, 12 June 2010

Ministry of Sillier Walks

In my last post, I have shown how we can define or read an array of number from a file. Having constructed the array, the question naturally arises: can we use it for something else, to do something that we could not do otherwise. The answer is yes, although in the present post, there will not be anything that I haven't discussed here or there. We have only got to put the pieces together, and with a little bit of work, we can produce three-dimensional column stacked histograms at ease. I will use the same datafile, but I repeat it here:

"" France Germany Japan Nauru
Defense 9163 4857 2648 9437
Agriculture 3547 5378 1831 1948
Education 7722 7445 731 9822
Industry 4837 147 3449 6111
"Silly walks" 3441 7297 308 7386

Then, let us see, how we are going to make that histogram! For the sake of example, we will draw four cylinders, which will be striped according to the values that the columns in the data file take. We could do two things here. Provided that we have already read the values into an array, we could draw 4 times 5 cylinders of various size and colour. Alternatively, we could draw 4 cylinders, and colour them with stripes that we take from the palette. In this case, we have to find some way of defining the proper palette. Both methods have advantages and disadvantages. The advantage of the first one is that it is faster, because the cylinders are monocolour, which means that we need only two isosamples. On the other hand, we have to have "nested" for loops (which we would just simply write out). The advantage of the second method is that we can get away with one for loop, but we need many isosamples, because the cylinders contain many colour, though the transition between the colours is required to be abrupt. But since we do not know where the boundary between two colours is, we would need many isosamples. Consequently, it will be slow.

When defining our array, we will employ the trick from the last post, and we will do something very similar, when we determine our palette. In addition, we will also have to calculate the sum in the columns, for the cylinders' height should be proportional to that. If, on the other hand, we opt for re-scaling the cylinders to the same height, we, again, need the sum. With these preliminary remarks, the first version of our script could look like this

reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'   
cylinder = 'cylinder.dat'

set ticslevel 0   
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4 
row = 5 
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
                SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f", c-0.1, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7), x
ff(x, c) = (array(x, c), x)

plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0)))

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)
         
set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))) with pm3d

The first couple of lines set up the graph, and they are trivial, as is the plot to the 'cylinder.dat'. The definition of g(x,a) should be familiar from the last post, and h(x,a) is nothing but the Heaviside function. We have to deal with a matrix, therefore, the string ARRAY begins with "b(x,y) = 0", which will, in due course, become the definition of a two-variable function. SUM, that will bring s(x,a) to life, also appears to define a two-variable function, but this is only apparent: s(x,a) is exactly as much of a two-variable function as is g(x,a). A mere convenience, nothing more. We also begin the definition of a palette, but we stop short of its completion: that will be done during the first plot.

In the array(x,c) function, we read in the numbers from the file, and at the same time, we also calculate the sums in the columns. This function is really similar to that from the last post. We also define a function for filling up the palette. For now, this function writes random colours into the palette at every integer.

Having defined the functions, we "plot" our data file, so as to read in the numbers. We plot the second column of the same file again, in order to prepare our palette. Note that this latter plot is really nothing but a strange way of creating a for loop: we do something as many times as there are elements in a column. In both plots, by applying the every keyword, we skip the first line, which is the column header.

At this point we are almost done: the only remaining thing is the evaluation of our new definitions, and the actual plot. The setting of the xrange and yrange is necessary only in order to ensure that the cylinders are cylinders, not elliptical.

Our first script results in the graph below:
 

This is OK for a start, but could we improve it a bit? With some work, we could. For one thing, we could add the sum at the top of each cylinder. This can easily be done by augmenting the last plot with the line
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels

We can also give a title to each cylinder by reading out the values in the first row. This is done by adding
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels centre

Finally, we can easily add a legend to the figure. All we have to do is to read out the first column of our data file, and draw 5 cylinders with the appropriate colour. We can achieve this by invoking
file using (0):(0):(2e4-$0*5e3):1 w labels right, \
for [i=1:row] cylinder using ($1+5):($2-5.0):($3*2e3+i*5e3):(i-1) with pm3d
in the last plot. With these modifications, the complete script would look like this
reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'   
cylinder = 'cylinder.dat'

set ticslevel 0   
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4 
row = 5 
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
                SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f", c-0.1, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7), x
ff(x, c) = (array(x, c), x)

plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0)))

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)
         
set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))) with pm3d, \
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels, \              
file using (0):(0):(2e4-$0*5e3):1 w labels right, \
for [i=1:row] cylinder using ($1+5):($2-5.0):($3*2e3+i*5e3):(i-1) with pm3d, \  
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels centre  
and would the following figure:

Well, this is sort of OK, but what if we still do not like it? There are two things that we could easily implement, and would change the character of our graph completely. One is that we can give the cylinders a true 3D lookout, by adding phongs to them. The other one is that we could remove the legends, and add it to one of the cylinders.

So, let us see what we could do in the way of phonging. This is really simple: if we think about it, the phong is nothing but a white spot on our graph, where the saturation of the colours increases towards the centre of the spot. Therefore, all we have to do is to insert white into the palette, but to do it in a way that white saturates all little cylinders. We could, then, modify our palette function as

pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f, %.1f 1 1 1", \
      c-0.01, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7, c+0.99), x)
and add a function that changes the saturation as

colour(x,y) = 0.25*exp(-(x-0.7)**2/0.2-(y+0.7)**2/0.2)
If we look at the palette function, the random numbers will be between 0-0.7, and the function colour(x,y) will add 0.25 to those numbers at the centre of the spot, which, in this particular case, will be at 45 degrees with respect to the x axis. When plotting, we have to add this function to our cylinders.

As for the labels, we might want to place them at the centre of each coloured cylinder in the last cylinder, representing Nauru. That is, we use the first column, and add the labels from that at half of the height of the cylinders whose size is read from the fifth column. We could use this function
lab(x) = (sm = sm + x, sm-0.5*x)
The value of sm is updated when a new value is read from the fifth column, and the return value of the function is just the cumulative sum minus half of the last value. Note that since we use sm, which was also utilised in array(x,c), we will have to re-set its value to zero before we use it.

With these modifications, the complete script reads as
reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'
cylinder = 'cylinder.dat'

set ticslevel 0
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4
row = 5
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0, -0.5 1 1 1"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
  SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f, %.1f 1 1 1", \
      c-0.01, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7, c+0.99), x)
ff(x, c) = (array(x, c), x)

colour(x,y) = 0.25*exp(-(x-0.7)**2/0.2-(y+0.7)**2/0.2)
lab(x) = (sm = sm + x, sm-0.5*x)
 
plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0))) 

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)

set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))+colour($1,$2)) with pm3d, \
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels, \
sm = 0, file using (3*col+4):(1):(lab($5)):1 w labels left, \
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels right rotate by 30

I think I cannot add more to this figure, we have explored and exhausted all possibilities here. Next time I will come back to an older topic from this blog, and show how that can be done in an elegant way, applying the new functionalities of gnuplot 4.4.
Cheers,
Zoltán

Monday, 26 April 2010

Bending the arrows - "delaying" the plot

The other day, I would have needed a couple of curved arrows on my plot, so I started to work out a method to get what I wanted. This, however, turned out to be rather interesting, so I thought that I would share the details with you.

First, we should just define what I mean by a curved arrow. Perhaps, the easiest way to define it is to show a plot, similar to this


In gnuplot, when one wants an arrow, one can invoke the following command:
set arrow from 0,0 to 1,1
or something similar. This will produce a straight arrow from (0,0) to (1,1). But what if we wanted to have an arrow, which is not straight. Well, in this case, we set a very short arrow, and draw a curve separately. The key to this is to set the arrow in such a way that it is tangential to the curve at the end point. It is easy to see that the following script would just do that

reset
unset key
eps = 0.001

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1
cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)
f(x) = 0.5+(x-1)*(x-1.2)*(x-1.4)

x1 = 0.5
x3 = 1.95
new_x = x1 + (1.0-eps)*(x3-x1)
set arrow from new_x, f(new_x) to x3,f(x3) as 1
plot [0:3] sin(x) with point ps 1 pt 6, f(x)*cut(x,x1,new_x) with line lt -1

First, we define an arrow style that we will use later. The arrow will be 0.03 screen sizes big, and the two angles determining the shape of the head are 15, and 45 degrees, respectively. Finally, we stipulate that the arrow be black, i.e. linetype -1. Then we define a window function, cut, which depends on the two end points, x1, and x3 (the reason for 3 will become clear soon), and the curve, f(x). In our plot, beyond what we actually want to plot, we will also plot f(x), but only between x1, and new_x, where new_x is a bit off with respect to the second end point. The degree of "bitness" is given by eps, which was defined at the beginning. However, before we actually plot anything, we have got to set the arrow, between new_x, f(new_x), and x3, f(x3). This construction ensures that the arrow is tangential to the curve.
At this point, we are ready to plot, which we actually execute in the next, and last line.

What we have created is great, but there are problems: first, we have to define our function, f(x), beforehand, we have to set the arrows by hand, and we also have to add the appropriate lines to our plot command. Quite tedious. There has got to be a better way!

For the say of example, let us suppose that we want a curved arrow that, say, connects (0,0) and (1,1) via a parabola that passes through the point (0.5, 0.25). If we are really pressed for it, we could do the following: First, we have to figure out the parameters of our parabola. In this case, it is quite easy, for it is nothing but x*x. Then we would draw a parabola between (0,0), and (0.99, 0.9801), and then draw an arrow from (0.99, 0.9801) to (1,1).

First, let us see, how we figure out the parameters of our parabola! We have two end points, and a "control" point, i.e., we have to solve the following set of equations
y1 = a*x1*x1 + b*x1 + c
y2 = a*x2*x2 + b*x2 + c
y3 = a*x3*x3 + b*x3 + c
for the unknown a, b, and c. You can convince yourself that the following will do

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)
a = A(x1,y1,x2,y2,x3,y3)
b = B(x1,y1,x2,y2,x3,y3)
c = C(x1,y1,x2,y2,x3,y3)

We have done most of the hard work, the only thing that remains is how we "automate" this whole machinery, i.e., what do we do, if we have several arrows that we want to set. Again, as so many times in the past, we will utilise this new notion of function definition: the fact that a function is not only a x -> f(x) mapping, but this mapping, and a set of possibly unrelated instructions. What we will do is to define a "function" that sets our arrows, and, as the supplementary instruction, augments the plot command accordingly. First, let us take the following function definition


arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))
and try to understand what it does! For a start, it takes 6 arguments, which are nothing but the coordinates of the end points, and the control point. Then, it defines new_x, which we have already seen in the first example. In the next step, based on the 6 input arguments, calculates the three parameters of our parabola, and in the next line, adds the plot of this parabola to a string called PLOT. When adding to PLOT, we simply use the sprintf function. In the last line, we concatenate a string called ARROW, and another one, produced by another sprintf. It is easy to see that this sprintf returns the definition of an arrow between new_x, f(new_x), and x3, f(x3). We should also note that this line is the last line, which consequently means that whatever happens here is returned.

At this point we are really done, we only have to "populate" our plot. The full script takes on the form
reset
unset key
eps = 0.01

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

ARROW = ""
PLOT = "p [0:3] sin(x) w p ps 1 pt 6"
arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

ARROW = arrow(0,0,1,1.5,pi/2,1.03)
ARROW = arrow(0,0,1,0.3,pi/2,0.97)
eval(ARROW)
eval(PLOT)
which would result in the graph shown here:


Now it is clear what was PLOT: it is nothing, but the actual plot that we want to have. This is the string to which we concatenate our parabolae, one by one, every time we define a new arrow. After we defined all our arrows, we have two strings, ARROW, and PLOT. As such, they are no good, they will become instructions only when we evaluate them. That is what we do in the last two lines.

I would like to point out that my main reason for posting this was not that it can be used for creating curved arrows, but that this method is quite general. First, we can add to the plot, if that is needed, without having to keep track of all the tiny details. Second, the set command can be "fooled" by using the sprintf function. With the help of the string augmentation and the eval command, we can actually use parameters in our set instruction very efficiently.

Well, this is for today. I am waiting for suggestions as to what we should discuss next time. Cheers,
Zoltán

Tuesday, 16 March 2010

Bubble plots

Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or roll down on this page), the idea was to trick gnuplot into plotting our data file twice, but in a way that each point was plotted twice in succession. Now, what if we plotted more times? There was really nothing special about the number 2, so there is no reason why we could not do this. But if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into a bubble graph, like this


So, let us see how the machinery works!

reset
plot 'new_bubble1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_bubble2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_bubble3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

rem(x,n) = x - n*(x/n)
size(x,n) = 3*(1-0.8*rem(x,n)/n)
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))
blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)
green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

posx(X,x,n) = X + 0.03*rem(x,n)/n
posy(Y,x,n) = Y + 0.03*rem(x,n)/n

unset key
set border back
level = 40
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
Again, the first three plots are there for determining the sample size, and nothing more. We, thus, start out with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning values between 0 and 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours are successively whiter as the remainder increases. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessary a tragedy, see below.)

Then we decide to have 40 colour levels (we could have anything up to 255, although it might be a bit time consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, move the circles a bit, and set the colours to whiter shades. That is all.

Now, what happens, if we take too big a value for the shift? This, actually, might lead to interesting effects, as shown in this graph, where droplets represent the data points.




After having seen the simplest implementation, we should ask whether it is possible to add some decorations. E.g., whether it is possible to add a thin black edge to the symbols. It is relatively simple, as the following script shows. We only have to re-define some of our functions as follows
size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))
blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))
green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)
posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)
All these functions do is to check whether we are plotting the first round, and if so, set the colour to black. There is a small difference in the shifts, for we do not move the circles, if they are in the first or the second round. The reason is obvious, as is the result

OK, so we can plot bubbles, with or without black circumference, but we would also like to add a legend. Well, that is simple, in fact, nothing could be simpler. Just add the following the following three lines to our code

set label 1 'Red bubbles' at 9,6 left
set label 2 'Blue bubbles' at 9,5 left
set label 3 'Green bubbles' at 9,4 left
and the following six
for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
and we are done! All we do here is to plot our data files in a silly way: we plot a single point at (8.5,6), (8.5,5), and (8.5,4). The plotting of the data file does not happen in this sense, we use it for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!

Defining new symbols

Some time ago, I showed a method with which we could add a "frame" to a symbol. If you recall, what we did was to plot everything twice, and in order to duplicate our data set, we used a simple gawk script. Now, there is another way of doing this, one which does not rely on the gawk script, in fact, on any external script. I will discuss this method today. The gist of the trick is discussed in the old post, therefore, you are encouraged to cast, at least, a cursory glance at that, if you haven't yet done it.

As I have already pointed out, we had to duplicate our data set. To be more accurate, we haven't got to duplicate anything, we have simply got to plot the data twice. Now, the difficulty is that is we do this in a primitive way, issuing the plot command twice, and taking the same data set, the points might overlap, and leads to some undesired results. So, the task is to plot the data set twice, but to plot each plot twice, and not the data set as a whole. For this, we will use the for loop introduced in gnuplot 4.4, and the 'every' keyword. To cut a long story short, I give my script here, and discuss it afterwards.
reset 
plot 'new_symbol1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_symbol2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_symbol3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)
size(n) = 2 - parity(n)*0.4
colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

unset key
set border back
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0) ,\
for [n=0:2*blue_n+1] 'new_symbol2.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 9 ps size(n) lc rgb colour(n,100,100,255) ,\
for [n=0:2*green_n+1] 'new_symbol3.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 5 ps size(n) lc rgb colour(n,0,150,0)
Then, let us see what we have here! The first 6 lines are only to retrieve the number of data points in our data sets. If you know this from somewhere else, you can skip these, with the caveat that 'red_n', 'blue_n', and 'green_n' should still be defined somewhere.

Next we define three functions, the first of which determines the parity of an integer, returning 1, if the number is odd, and 0, if it is even. The second function returns a number, depending on the parity of its argument. Surprising as it is, this function will determine the size if the symbol, when we plot. Finally, the third function returns a string, which is equal to the colour given by the triplet (r,g,b), if the first argument, 'n', is odd, and black, if the first argument is even. At this point, it should be clear that we could have defined a function that returns a different colour for even numbers.

We are done with everything, but the plotting, so let us do that! As you see, for each data set, we step through the numbers, but not once, but twice: first plotting in black, and second, plotting with some decent colour. At the same time, we change the symbol size, so that the black symbols are always a bit bigger, than the red, blue, or green. Once all three plots have been called, the following graph will appear:

We can see that the symbols overlap each others, as they should. Now, what about the keys, should we need them? Well, that requires some handwork, but it is not hard, actually. The following self-explanatory script should do
set label 1 'Red symbols' at 1.3, 8 left
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=0, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=1, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0)
1 8
e
1 8
e

and this produces the following figure

Thursday, 25 February 2010

Parametric plot from a file II.

As I promised yesterday, we will take a closer look at the pie chart, once more, and see how we can utilise what we have learnt recently. I should point out here, that this is not the only way of plotting a pie from a file. If you feel like building your gnuplot from source, you can check out either the CVS tree, or the patch tracker, where you can find a patch that makes it possible to plot slices. You can see a demo here. But we will try a different route here.

First, here is our data file (it could be anything, really)
1 1 Dolphins
2 1 Whales
2 0 Sharks
3 0 Penguins
4 1 Kiwis
5 0 Tux

and here is our script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; 
eval(PLOT)

There is really nothing that we haven't discussed before: we set a couple of things at the beginning, but most importantly, the macro, and sum, ssum, and n. Then we define a string, PLOT, and two functions. One is to sum the values in our file (we need this, so that we can scale the full range of angles to two pi), and another one, that writes our PLOT command for later use. Note that the first plot in PLOT is actually empty, we plot 1/0. This seems a bit silly, doesn't it? Well, it does, but there is a good reason: successive plots must be separated by a comma, and if we have an empty plot at the very beginning, then we can put the commas before the plots, not after, and in this way, we needn't keep track of which plot we are actually processing. Remember, yesterday we used a separate counter, and an if statement, to determine, whether we need the comma, or not. This we can avoid here.
Next we call the two dummy plots, and finally, we evaluate our PLOT string. Oh, no! At the very end, we marvel in awe at the figure that we produced.

So far, so good, but what if we wanted to add labels, e.g., the value of the slice? That is really easy. All we have to do is to define a function that produces the label. Here is our updated script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"
LABEL = ""

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, sprintf("%2.f", x)), \
sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; set border 0; unset tics; unset colorbox;
set size 0.6,1
eval(LABEL)
eval(PLOT)
We have an addition string, LABEL, which we initialise with the value "". Then we define a function that prints "set label ..." with the proper positions, and finally, we insert this function in the definition of f(x). Of course, once we called f(x) in the plot, we have to evaluate the string LABEL. So, this is what we get

This script can trivially be modified to print strings that are stored in our file. Watch just the following two lines
...
lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3)), \
sum = sum+x, n = n + 1, x)
...
and we are done. Here is the new pie

Now, you might wonder why we had three columns in our data file, if we didn't want to use it. Well, perhaps, we wanted, just haven't got time till now. So, what could we do with those ones and zeros in the second column? We will make the pie explode! It is really simple, we have to modify two lines in our last script

...
g(x,y,n,dx,dy) = \
sprintf(", \
%.2f+u*cos((%.2f+%.2f*v)/ssum), \
%.2f+u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", 0.2*dx, x, y, 0.2*dy, x, y, n)

lab(alpha, x, r) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, (1.25+0.2*r)*cos(alpha), (1.25+0.2*r)*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n, \
$2*cos(2*pi*sum/ssum+pi*x/ssum), \
$2*sin(2*pi*sum/ssum+pi*x/ssum)), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3), $2), \
sum = sum+x, n = n + 1, x)
...
And here is the pie, when exploded

I should mention here that if you are not happy with the colours, it is really easy to help it: all we have to do is to modify the colour palette, using whatever colour combinations. We have covered a lot of material today. Till next time,
Gnuplotter

Wednesday, 10 February 2010

The map, the inline function, and the macro

Some time ago, on the 26th of July, to be more accurate, I showed how somewhat decent-looking maps can be created with gnuplot. With the wisdom of hindsight, that was a rather ugly hack, I must say. Even worse, it seems that it is not quite fail-safe. At least, I have obtained reports complaining about it. Could we, then, do better? Could we, perhaps, throw out that disgusting gawk script, with all the hassle that comes with it? Could we, possibly, manage the whole affair in gnuplot? Sure, we could. And here is how, just keep reading!

On the 17th of January, we saw that in the new version of gnuplot, functions take on a funny property, namely, they can contain algebraic statements not related to the return value. We also saw that this feature can be used to perform searches of some sort: as we "plot" a file, and step through the numbers in the file, we can assign values to variables, provided that some conditions are fulfilled. It is easy to see that in this way, we can determine the minimum or the maximum of a data file, e.g. But we can do much more than that.

We should also recall from that old post on the map what the contour file looks like. In case you have forgotten, here is a small section of it
# Contour 0, label:        2
-0.391812  3.63636  2
...
-0.959596  3.50978  2

-0.959596  3.50978  2
...
-0.391812  3.63636  2


# Contour 1, label:      1.5
-1.20098  4.51515  1.5
-1.16162  4.54423  1.5
-1.15982  4.54545  1.5
...

What we have to realise is the following: first, contours lines belonging to the same level are not necessarily contiguous (this is quite obvious, for there is no reason why they should be), and if there is a discontinuity, it manifests itself in a single blank line in the contour file, and second, contour lines belonging to different levels are separated by two blank lines. So, in the data file above, there is a blank between the lines -0.959596 3.50978 2, and
-0.959596 3.50978 2, and there are two blanks between -0.391812 3.63636 2, and # Contour 1, label: 1.5. By the way, the third column is the value of that particular contour line.

This observation has at least one important consequence: we can decide which contour line we want to plot, simply by using the index keyword. You might recall, that indexing the data file pulls out one data block, which is defined by a chunk of data flanked by two blank lines.

Now, what about the labels, and the white space that they need? Well, the white space is quite easy: what we will plot is not the contour line, but a function, which returns an undefined value at the place of the white space, e.g., this one (whatever eps and xtoy mean)
f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
Normally we would plot the contour lines as
plot 'contour.dat' using 1:2 with lines
but instead of this, now we will use this
plot 'contour.dat' using 1:(f($1,$2)) with lines
This will leave out those points which are too close to (x0, y0). And the labels? Well, that is not difficult either. Take this function
lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")
and this plot
plot 'contour.dat' using 1:2:(lab($1,$2)) with labels
This will put the labels at (x0, y0), and even better, we haven't got to set the labels by hand, they are taken from the data file.

So, we have seen how we can plot the contour, leave out some white space, and then put a label at that position. The only remaining question is how we determine where the label should be. And this is where we come back to our inline functions. For the sake of example, let us take this function and the accompanying plot

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
plot 'contour.dat' using 1:(g($1,$2))
What on Earth does this plot do? The plot itself does absolutely nothing: it is always 1/0. However, while we are doing this, we set the value of x0, and y0, if the two arguments are not too close to the edge of the plot. This latter condition is needed, otherwise labels could fall on the border, which doesn't look particularly nice.

By now, we have all the bits and pieces, we have only got to put them together. Let us get down to business, then!

I will split the script into two: the first produces the dummy data, while the second does the actual plotting. So, first, the data production.
reset
filename = "cont.dat"
xi = -5; xa = 0; yi = 2; ya = 5;
xl = xi + 0.1*(xa - xi); xh = xa - 0.1*(xa-xi);
yl = yi + 0.1*(ya - yi); yh = ya - 0.1*(ya-yi);
xtoy = (xa-xi) / (ya-yi)
set xrange [xi:xa]
set yrange [yi:ya]
set isosample 100, 100
set table 'test.dat'
splot sin(1.3*x)*cos(.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
unset table
set cont base
set cntrparam level incremental -3, 0.5, 3
unset surf
set table filename
splot sin(1.3*x)*cos(0.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
unset table

What we should pay attention to here is the definition of a handful of variables at the very beginning. Some are already obvious, like xi, xa and the like, and some will become clear in the second part. Now, the plotting takes place here

reset
unset key
set macro
set xrange [xi:xa]
set yrange [yi:ya]

set tics out nomirror
set palette rgbformulae 33,13,10
eps = 0.05

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")

ZERO = "x0 = xi - (xa-xi), y0 = yi - (ya-yi), b = b+1"
SEARCH = "filename index b using 1:(g($1,$2))"
PLOT = "filename index b using 1:(f($1,$2)) with lines lt -1 lw 1"
LABEL = "filename index b using 1:2:(lab($1,$2)) with labels"

b = 0
plot 'test.dat' with image, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL

A bit convoluted, isn't it? OK, we will walk through the script, line by line.

First is the range setting, and then ticks go to the outside, just for aesthetic reasons. We also define eps here, which determines how much "white space" we have for the labels. Then we define the three functions that we discussed above. We have already seen eps, and the meaning of it, but what about xtoy? Despite its name, this is not something to play with, rather the ratio of x to y, or more precisely, the ratio of the xrange to the yrange. This is needed, if the two ranges are of different order of magnitude, e.g., if xrange is something like [0:1], while yrange is [0:1000]. But this ratio is automatically calculated at the beginning, you haven't got to worry about it.

After this, we define 4 macros. These are abbreviations for longer chunks of code, and make life really easier. The idea is that when confronted with a macro, gnuplot expands it as a string, and then acts accordingly. In my opinion, if written properly, macros can make the script rather readable.

The first of the macros, ZERO, is needed, because in our SEARCH macro, which is nothing but a call to the function g(x,y), if the condition is not satisfied for a particular data block, then x0, y0 wouldn't be updated, therefore, the label would end up at the wrong position. At the same time, ZERO also increments the value of b, which determines which data block we are actually plotting. b is used in the indexing in the macros SEARCH, PLOT, and LABEL. We have already mentioned SEARCH, PLOT plots the contour with the white space at the position given by x0, and y0 (this is calculated in the SEARCH macro), and finally, LABEL places the value of the contour line at that position.

At this point, we have defined everything, all that is left is plotting. We do it 13 times, because our zrange, or the contour lines were given between -3, and 3, with steps of 0.5. In this particular case, there are only 10 contour lines, and gnuplot will complain that the last 3 data blocks are empty, but this is not an error, only a warning. Shouldn't we look at the figure, perhaps? But of course! Here it is:



The only thing that I should like to point out is that the white space is made for a particular contour line, but there is no guarantee that, if the contour lines are too close to each other, the label does not cover a neighbouring contour line. If that happens, I would simply suggest to increase the contour spacing by incrementing the parameter in the set cntrparam line.

I hope that this method proves better, than the other one, and that it will be easier to use. In the next post, I will re-visit the inline functions, and show a nifty trick with them. Cheers,
Gnuplotter

Sunday, 17 January 2010

Further new features in gnuplot 4.4

Today I would like to discuss another new feature in gnuplot 4.4. This will be the notion of "inline" data manipulation. I don't really know what the proper name would be for this feature, so I will just show what it is.

Normally, when one plots a function or file, the command has the following structure
plot 'foo' using 1:2 with line, f(x) with line

That was the old syntax. In the new version of gnuplot, we can insert arithmetic expressions in the plot command as follows

f(x) = a*sin(x)
plot a = 1.0, f(x), a = 2.0, f(x)

Now, this has some implications. First, one has to be a bit careful, because the arithmetic expressions are separated from the actual function by a comma. However, the 'for' loop that we discussed a week ago, reads statements up to the comma, and then returns to the beginning of the statement. In other words,
plot for [i=1:10] a = i, f(x)
will evaluate the expression a = i ten times, and then plots f(x). At that point, the value of 'a' will be 10, therefore, we have only one plot, and that will be 10*sin(x).

The second implication is that the notion of a function has completely changed. What we do in a plot command now is no longer a mapping of the form
x -> f(x)
but rather, the evaluation of a set of instructions, one of which is the above-mentioned mapping. But the crucial point here is that the mapping is not the only allowed statement. The upshot is that "functions" have become a set of operations, and the following statement is completely legal
f(x) = (a = a+1.0, a*sin(x))
a = 0.0
plot for [i=1:10] f(x)

(It is a complete different question, whether this plot makes any sense...)

What we should notice is the fact that now a function can have the form of
f(x) = (statement1, statement2, statement3, return value)
and when the function is called, statement1, statement2, statement3 are evaluated, and 'return value' is returned. We should not underestimate the significance of this! Many things can be done with this. I will show a few of them below.

The first thing I would like to dive into is calculating some statistics of a file. Let us see how this works!

reset
set table 'inline.dat'
plot sin(x)
unset table
num = 0
sum = 0.0
sumsq = 0.0

f(x) = (num = num+1, sum = sum+x, sumsq = sumsq+x*x, x)
plot 'inline.dat' using 1:(f($2))

print num, sum, sumsq
which will print out
100 -1.77635683940025e-15 0.295958848441
We expected this, for the number of samples is 100 by default, and the sum should be 0 in this case.

So, what about finding the minimum and its position in a data file? This is quite easy. All we have to do is to modify our function definition, and insert a statement that determines whether a value is minimal or not.

reset
set table 'inline.dat'
plot sin(x)
unset table

num = 0
min = 1000.0
min_pos = 0
min_pos_x = 0

f(x,y) = ((min > y ? (min = y, min_pos_x = x, min_pos = num) : 1), num = num+1,  y)
plot 'inline.dat' using 1:(f($1, $2))

print num, min, min_pos_x, min_pos
which prints

100 -0.999385 4.74747 73
i.e., the minimum is at the 73rd record (we count from 0), at x = 4.74747, and its value is -0.999385. Note that instead of an 'if' statement, we use the ternary operator to decide whether min, min_pos_x, and min_pos should be updated.

The implementation of calculating the standard deviation, e.g., should be trivial:
sum = 0.0
sumsq = 0.0
f(x) = (num = num+1, sum = sum + x, sumsq = sumsq + x*x, x)
plot 'inline.dat' using 1:(f($1))

print num, sqrt(sumsq/num - (sum/num)*(sum/num))
We have thus seen how the "inline" arithmetic can be used for calculating quantities, e.g., various moments, minima/maxima and their respective positions. These involve the sequential summing or inspection of the data set. But this trick with the function definition can be used for back-referencing, too. This is what we will discuss next.

The trick is to use a construct similar to this

backshift(x) = (prev = pres, pres = x, prev)

which will store the last but one value in the variable 'prev', and return it. That is, the following code shift the whole curve to the right by one
reset
set table 'inline.dat'
plot sin(x)
unset table

pres = 0.0

backshift(x) = (prev = pres, pres = x, ($0 > 0 ? prev : pres))
plot 'inline.dat' using 1:(backshift($2)) with line, '' u 1:2 with line
(In cases like this, we always have to decide what to do with the first/last data record. In this particular case, I opted for duplicating the first record, - this is what happens in the ternary operator - but this is not the only possibility.) If, for some reason, you have to shift the curve by more, you do the same thing, but multiple times. E.g., the following code shifts by 3 places.

backshift(x) = (prev1 = prev2, prev2 = prev3, prev3 = x, prev1)

Once we have this option of back-referencing, we should ask the question what it can be used for. I show two examples for this.
The first example is drawing arrows along a line given by the data set. Drawing arrows one by one is done by using
set arrow from x1,y1 to x2,y2
but we have to use a different method, if we want to plot the arrows from a file. Incidentally, there is a plotting style, 'with vectors', that works as
plot 'foo' using 1:2:3:4 with vectors
where the first two columns specify the coordinates of the beginning, and the second two columns specify the relative coordinates of the vectors. So, it works on four columns. What should we do, if we want to plot vectors from the points in a file. Well, we use the back shift that we defined above. Our script is as follows:
reset
unset key
set sample 30
set table 'arrowplot.dat'
plot [0:3] sin(x)+0.2*rand(0)
unset table

px = NaN
py = NaN
dx(x) = (xd = x-px, px = ($0 > 0 ? x : 1/0), xd)
dy(y) = (yd = y-py, py = ($0 > 0 ? y : 1/0), yd)

plot 'arrowplot.dat' using (px):(py):(dx($1)):(dy($2)) with vector
which results in the following figure:

Note that we used the ternary operator to get rid of the very first data point. This is needed, because the arrows connect two points, that is, there will be one less arrow, than data points.

In the second example, we will turn this around. In my post in last August, plotting the recession, I showed how the background of a plot can be changed, based on whether the the curve is dropping, or increasing. Let us take the following script
reset
set sample 20
set table 'inline.dat'
plot [0:10] exp(-x)+1.0+rand(0)
unset table

unset key

px = 0
py = 1000
dx(x) = (xd = x-px, px = x, xd)
dyn(y) = (yd = y-py, py = y, (yd < 0 ? yd : 1/0))
dyp(y) = (yd = y-py, py = y, (yd >= 0 ? yd : 1/0))

plot 'inline.dat' using (px):(py):(dx($1)):(dyp($2)) with vector nohead lt 1 lw 3, \
px = 0, py = 0, '' using (px):(py):(dx($1)):(dyn($2)) with vector nohead lt 3 lw 3
which creates the following graph
First we produce some data; old trick. Then we take our difference functions, in this case, three of them. The first one is identical to that in the previous script. The second and the third are identical, except that the second returns a sensible value, if and only, if the slope is negative, while the third one returns 1/0, if the slope is negative. Then we just plot our data, making sure that we re-initialise px, and py before the second plot. Simple.

Another utilisation of the back reference can be found on gnuplot's web site, under running averages.

Next time I will try to go a bit further, and demonstrate some other uses of the inline data processing.
Cheers,
Gnuplotter

Sunday, 10 January 2010

Plot iterations and pseudo-files

As I promised some time ago, I will discuss some of the new features in gnuplot 4.4. The first one that I would like to show is the concept of iteration in the plot command, and the concept of certain pseudo-files. If you have ever had to script the creation of your plots, you will appreciate these features.

First, let us see what the for loop looks like in the plot command. There are forms of it: once one can loop through integers, while in the other case, one can step through a string of words. So, the iteration either looks like

plot for [var = start : end {:increment}]

or

plot for [var in "some string of words"]

After this introduction, let us see how these can be used in real life. The first example that I will show is that of waterfall plots, i.e., when a couple of curves are plotted on the same graph, and they are shifted vertically. This is common practice, when one has several spectra, and wants to show the effect of some parameter on the spectra.

reset
f(x,a) = exp(-(x-a)*(x-a)/(1+a*0.5))+0.05*rand(0)
title(n) = sprintf("column %d", n)
set table 'iter.dat'
plot [0:20] '+' using (f($1,1)):(f($1,2)):(f($1,3)):(f($1,4)):(f($1,5)):(f($1,6)) w xyerror
unset table

set yrange [0:15]
plot for [i=1:6] 'iter.dat' u 0:(column(i)+2*i) w l lw 1.5 t title(i)

I would like to walk through the script line by line, for there is something unusual in almost each line. So, after re-setting the gnuplot session, we define a function, which will be a Gaussian, whose centre and width is determined by the parameter 'a'. We then plot this function to a file, 'iter.dat', and do it 6 times, and each time with a different parameter, so that the Gaussian is shifted, and becomes broadened. Note, however, that we do this by plotting a special file, '+'. This was introduced in gnuplot 4.4, and the purpose of this special file is that by invoking this, one can use the standard plot modifiers even with functions. I.e., we can specify 'using' for a function. The importance of this is that many plot styles require several columns, and we could not use those plot styles with functions without the '+' pseudo-file. Consider the following example

reset
unset colorbox
unset key
set xrange [0:10]
set cbrange [0:1]
plot '+' using ($1):(sin($1)):(0.5*(1.0+sin($1))) with lines lw 3 lc palette, \
'+' using ($1):(sin($1)+2):($1/10.0) with lines lw 3 lc palette
which produces the following graph

We can thus colour our curve by specifying the colour in the third column of the pseudo-file. Of course, this is only one possibility, and there are many more. If one wants to plot 3D graphs, then the pseudo-file becomes '++', but the concept is the same: the two variables are denoted by $1 and $2, and the function is calculated on the grid determined by the number of samples and the corresponding data range.

Now, back to the iteration loop! We produce 6 columns of data by plotting '+' by invoking a plot style that requires 6 columns. In this case, it is the xyerrorbars. Having created some data, we plot each column, but we call plot only once: the iteration loop does the rest. In each plot, the curve is shifted upwards, and the title is taken from the column number. For specifying the title, we use the function that we defined earlier: it takes an integer, and returns a string. At the end of the day, we have this graph

This was an example, when we plot various columns from the same file. We can also use the iteration to plot different files. When doing so, there are two options available. One is that we simply specify the file names in a string, as below

reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file using 1:2 with lines
which will plot files 'first', 'second', 'third', 'fourth', and 'fifth'. At this point, note that 'file' is a string, i.e., we can manipulate it as a string. E.g., if we wanted to, instead of 'first', 'second', etc., plot 'first.dat', 'second.dat', and so on, we would do this as
reset
filenames = "first second third fourth fifth"
plot for [file in filenames] file."dat" using 1:2 with lines

The second option is, if the data files are numbered, e.g., if we have 'file_1.dat', 'file_2.dat', and so on, we can use the iteration over integers as follows
reset
filename(n) = sprintf("file_%d", n)
plot for [i=1:10] filename(i) using 1:2 with lines
which will plot the second column versus the first column of 'file_1.dat' through 'file_10.dat'.