Saturday, October 20, 2018

Scala Numerical Performance with Scala Native and Graal

This post is a follow-on to my earlier post looking at the performance of different approaches to writing an n-body simulation using Scala. That post focused on the style of Scala code used and how that impacted the performance. This post uses that same code, so you should refer to it to see what the different types of simulations are doing.


Comparing Runtimes

For someone who is interested in doing numerical simulations, Scala and the JVM might seem like odd choices, but the benefits of having a good language that is highly expressive and maintainable can be significant, even for numerical work. In addition to the impact of programming style and the optimizations done by the compiler that produces JVM bytecode, performance can also be significantly impacted by the choice of runtime environment.

Historically, there wasn't much in the way of choice. Sun, then later Oracle, made the only JVM that was in even reasonably broad usage, and enough effort was put into making the hotspot optimizer work that it was generally a good choice. Today, however, there are a few more options. If you are running Linux, odds are good that you have OpenJDK by default and would have to specifically download and install the Oracle version if you want it. In addition, Oracle has recently been working on Graal, a new virtual environment for both JVM and non-JVM languages. Part of the argument for Graal was that the old C2 hotspot compiler, written in C++, had simply because too brittle and it was hard to add new optimizations. Graal is being built fresh from the ground up using Java, and many new types of analysis are included in it. While I have seen benchmarks indicating the Graal, though young, is already a faster option for many Scala workloads, I wasn't certain if that would be the case for numerical work. This is at least in part due to the fact that one of the Graal talks this last summer at Scala Days mentioned that Graal was not yet emitting SIMD instructions for numerical computations.

In addition, the newest addition to the list of supported environments for Scala is Scala Native. This project uses LLVM to compile Scala source to native executables. One of the main motivators for this right now is using Scala for batch processing because native executables don't suffer from the startup times of bringing up the JVM. This project is still in beta, but I wanted to see if it might be able to produce executables with good numerical performance as well.

For these benchmarks, the Scala code was compiled with -opt:_ and run on each JVM with no additional options. I am using a different machine from my earlier post, which explains the significant runtime differences between this post and the earlier one using a similar JVM. The following table gives timing results for the five approaches using the five different runtimes.

EnvironmentStyleAverage Time [s]Stdev [s]
Oracle JDK 8-191Value Class0.3940.012
Mutable Class0.6830.015
Immutable Class0.8090.010
Functional 14.2460.439
Functional 21.7230.027
Oracle JDK 11Value Class0.3780.006
Mutable Class0.6900.012
Immutable Class0.9400.083
Functional 14.4200.059
Functional 21.5890.021
OpenJDK 10Value Class0.3880.008
Mutable Class0.7150.006
Immutable Class0.8920.013
Functional 14.4050.039
Functional 21.6890.013
GraalVM 1.0.0-rc7, Java 1.8Value Class0.3770.003
Mutable Class0.3960.003
Immutable Class0.6940.108
Functional 14.0540.151
Functional 20.7930.016
Scala Native 0.3.8Value Class2.6030.185
Mutable Class1.0280.020
Immutable Class2.5950.020
Functional 116.841.39
Functional 25.2320.655

Looking at the first three runtimes, we notice that there is very little difference between Oracle and OpenJDK over various Java versions from 8 to 11. In all three, the value class approach is fastest by far followed by the version with the mutable classes, then the immutable classes with functional approaches being slowest by a fair margin.

Things get more interesting when we look at Graal. The performance of the value class version is roughly the same as for the other JVMs, but every other version runs significantly faster under Graal than in the others. The ordering stays the same as to which approaches are fastest, but the magnitude of how much slower each version is than the value class version changes dramatically. Under Graal, the mutable class version is almost as fast as the value class version instead of being almost a factor of two slower. Most impressive is that the second functional version is only a factor of two slower than the value class version instead of being four times slower. This is significant for two reasons. One is that it is the version written in the most idiomatic Scala style. The other is that this version literally does twice as much work in terms of distance calculations as the other versions. That means that we really can't expect it to do any better than being 2x as slow. The fact that it takes nearly twice as long to run means that using Graal there isn't a significant overhead to the functional approach the way there is using the older JVMs.

At the end of the table, we have the results for Scala Native. Unfortunately, it is clear that Scala Native is not yet ready for running with performance-critical numerical code. One result that stands out is that the value class version is not the fastest. Indeed, it runs at a speed roughly equal to the immutable class and 2.5x slower than the mutable class. I assume that this means that the value class optimizations have not yet been implemented in Scala Native. As to why even the mutable class version is more than 2x slower than Graal and at least 50% slower than the other VMs is a bit puzzling to me as I did the timing using a release build. I expect that this is something that will improve over time. Scala Native is still in the very early stages, and there is a lot of room for the project to grow.


Comparison to C++

As before, I also ran a test comparing these Scala results to C++ code compiled with the GNU compiler using the -Ofast flag. This uses a simpler test with the value class technique. You can see in the table below that the Scala code is performing about 15% slower than C++ in all of the environments except Scala Native, which is several times slower. Given the results above indicating that Scala Native isn't nicely optimizing value classes yet, this result for Scala Native isn't surprising.

EnvironmentAverage Time [s]
g++3.29
Oracle JDK 8-1913.88
Oracle JDK 113.77
OpenJDK 103.82
GraalVM 1.0.0-rc73.82
Scala Native21.6


Conclusions

For me, there are two main takeaway messages from this. The first is that while Scala Native holds the longterm potential to give Scala a higher performance platform for running computationally intensive jobs, it isn't there yet. A lot more work needs to go into optimization to get it to reach its potential of competing with other natively compiled languages. I firmly believe that it can get there and is moving in that direction, but it isn't ready yet.

On the other hand, these results indicate to me that if you are programming Scala, you should strongly consider using Graal, even if you are doing numeric work. Based on presentations at Scala Days 2018 in New York, I know that this is the case for non-numeric codes, but at the time Graal wasn't emitting SIMD instructions, so it wasn't clear if it would compare well to the old C2 hot-spot optimizer. These results show that regardless of style, Graal is at least as performant as the other JVM options and that in some cases it is much faster. Perhaps most significantly, the functional 2 style, which is written in a much more idiomatic style for Scala, is more than 2x as fast with Graal was with the other JVMs. I should also note that Graal still allows me to run graphical applications like my SwifVis2 plotting package, so there isn't any loss of overall functionality.

Going forward, I want to test the performance of more complex n-body simulations using trees and also look at multithreaded performance to see how Graal compares for those. Scala Native is still only single threaded for pure Scala code, so it will likely be left out of those tests.

GraalVM native images are another feature that I would really like to explore, but there are some challenges in building them from Scala code that I did take the time to overcome for this post.

Monday, August 6, 2018

Why is JavaFX so slow? SwiftVis2 and Speed Problems with JavaFX 2D Canvas

For the last year or so I've been working on a plotting package called SwiftVis2. There are a number of goals to this project, but a big one is to provide a solid plotting package for Scala that can be used with Spark projects in Scala and which can draw plots with a large number of points. My own personal use case for this is plotting ring simulations often involving millions of particles, each of which is a point in a scatter chart. The figure below shows an example of this made with SwiftVis2 using the Swing/Java2D renderer with 8.9 million particles.

SwiftVis2 plot of ring simulation using Swing renderer.

This particular use case pretty much precludes a lot of browser-based plotting libraries that seem to be popular these days as converting 8.9 million data points to JSON for plotting in JavaScript simply isn't a feasible thing to do for both memory and speed reasons.

As the name SwiftVis2 implies, there is an earlier SwiftVis plotting program. It is a GUI based program written in Java. One of the things that excited me about the upgrade was the ability to use JavaFX instead of Java2D. My understanding is that one of the main reasons for building JavaFX new from the ground up was to take better advantage of graphics cards, and I was really hoping that a JavaFX based rendering engine would outperform one based on the older Java2D library.

I didn't really test this until I was writing up a paper on SwiftVis2 for CSCE'18 and I did performance tests against some other plotting packages. In particular, I compared to Breeze-Viz, which is a Scala wrapper for JFreeChart. JFreeChart uses Swing and Java2D, so I was really hoping that SwiftVis2 would be faster. At the time, I only had a renderer that used JavaFX in SwiftVis2, and I was really disappointed when Breeze-Viz turned out to run roughly twice as fast on a plot like the one above. Tests of NSLP, a different Scala plotting package using Swing/Java2D, showed that it also ran at roughly twice the speed of SwiftVis2 using JavaFX.

Some searching on the internet showed me that there were known issues with drawing too many elements at once on a JavaFX canvas because of the queuing. So I enhanced my renderer to batch the drawing. This has a nice side effect that users can see the plot draw incrementally, so they know their program isn't frozen, but it didn't help the overall speed at all.

Since SwifVis2 was written to allow multiple types of renderers, I went ahead and wrote a Swing renderer that uses Java2D, just to see what the performance was like. The results, shown in the following table, were pretty astounding to me. Note that these times were for drawing plots like the one above. It is also worth noting that upgrading my graphics driver improves the performance for JavaFX more than it did for the Swing based libraries, but even with new drivers, JavaFX is still slower.


PackageRender Time for 8.9 million Points
Breeze-Viz80 secs
SwiftVis2 with JavaFX108 secs
SwiftVis2 with Swing/Java2D13 secs

Keep in mind here that the two SwiftVis2 options are running the exact same code for everything except the final drawing as the only difference is which subclass of my Renderer trait is being used. While I do feel a certain amount of happiness in the fact that SwiftVis2 using Swing is significantly faster than the Breeze-Viz wrapper for JFreeChart, I'm still astounded that JavaFX is nearly 10x slower than Swing/Java2D.

Not only is JavaFX slower, it does an inferior job of antialiasing when drawing circles that are sub-pixel in size. The following figure shows the plot created using the same data and the JavaFX renderer. The higher saturation is obvious. The rendering with Swing/Java2D is the more accurate of the two.

Ring simulation plot made using the JavaFX renderer. Note that the points are sub-pixel and this renderer over-saturates the drawing.
To me, this seems like a serious failing on the part of JavaFX. JavaFX is actually harder to use than Swing, and I always forgave that based on the assumption that special handling was needed to work with the GPU, but that the tradeoff would be significantly improved performance. I can only hope that whatever ails JavaFX in this regard can be fixed and that at some point in the future, JavaFX rendering can match the performance promise that comes with completely rebuilding a GUI/graphics library from the ground up.

Also, in case anyone is wondering why I'm bothering to create yet another plotting package, I will note that SwiftVis2 looks significantly better than JFreeChart, especially in the default behavior. For comparison, the figure below shows the output of the most basic invocation of Breeze-Viz for this plot, which is comparable to the above plots for SwiftVis2. Even if the range on the y-axis is adjusted, it still generally doesn't look as good as the SwiftVis2 output, especially in regards to axis labels. This is the reason I never really got into using JFreeChart in the past. SwiftVis2 is still in the early stages of development, but it already does a better job with my primary use cases.

This is the same figure as those above made with default settings using Breeze-Viz instead of SwiftVis2.
You can find the code for my basic performance tests on GitHub. The data file is not on GitHub because it is rather large. You can find a copy of it on my web space.

Sunday, June 24, 2018

Supporting Scala Adoption in Colleges

Scala Days North America just finished, and one of the big surprises for me was how many talks focused on teaching Scala. The first day had roughly one talk on teaching Scala in every time block.


The topic of increasing the use of Scala in colleges also came up in the final panel (Presentation Video).

I think it is pretty clear that the companies using Scala feel a need to get more developers who either know the language or can transition into it quickly and easily. Large companies, like Twitter, can run their own training programs, but that is out of reach for most companies. What most companies need is for more colleges to at least introduce Scala and the idea of functional programming in their curricula.

The Current State of Affairs

The current reality of college CS education is that imperative OO is king. For roughly two decades, Java dominated the landscape of college CS courses, especially at the introductory level. In the last ~5 years, Python has made significant inroads, again, especially at the introductory level. Let's be honest, that is really a move in the wrong direction for those who want graduates that are prepared to work with Scala because not only is Python not generally taught in a functional way, the dynamic typing doesn't prepare students to think in the type-oriented manner that is ideal for Scala programming. (I have to note that this move to Python has been predicated on the idea that Python is a simpler language to learn, but an analysis of a large dataset of student exercise solutions for zyBooks indicates this might not actually be the case.)

It is also true that some aspects of functional programming, namely the use of higher-order functions, has moved into pretty much all major programming languages. However, it isn't at all clear that these concepts are currently appearing in college curricula. Part of the reason for this is that in nearly all cases, features added to languages late are more complex than those that were part of the original development. Yes, Java 8 has lambdas and you can use map and filter on Streams, but there is a lot of overhead, both syntactically and cognitively, that makes it much harder to teach to students in Java than it is in Scala. That overhead means professors are less likely to "get to it" during any particular course.

The bottom line is that the vast majority of students coming out of colleges today have never heard of Scala, nor have they ever been taught to think about problems in a functional way.

What Can You Do?

If you work for a company that uses Scala, it would probably benefit you greatly if this state of affairs could be changed so that you can hire kids straight out of college who are ready to move into your development environment. The question is, how do you do this? I have a few suggestions.

First, contact the CS departments at local colleges and/or your alma mater. Tell them the importance of Scala and functional programming to your organization and that you would love to hire graduates with certain skills. This has to be done in the right way though, so let's break it down a bit.

Who Should You Talk To?

My guess is that a lot of people will immediately wonder who they should be reaching out to, but I'm pretty sure it doesn't matter, as long as it is a faculty member in Computer Science. You can always start with the chair and ask them if there is someone else in the department that would be better to talk to, but no matter who you start with, you'll probably wind up being directed to the most applicable person.

What Types Of Schools?

I know that it will be tempting to focus on big schools with the idea that you can get more bang for the buck if you do something with the local state school that graduates 300+ CS majors a year. The problem here is that many of the stereotypes for big organizations not being agile apply to colleges as well. The bigger the school, the harder it likely is to create change. Smaller schools might not graduate as many students, but they are probably more adaptable and open to change. Departments with <10 faculty members might not crank out hundreds of majors, but they can probably produce a few very good ones each year that you would love to have on your team.

This doesn't mean you skip on the big schools. If you can convince your local state school, the payoffs are huge. I would just urge you to cast a wide net. Any school with a CS department is worth reaching out to.

How To Start The Conversation

It probably isn't the best idea to just go to a college and tell them that they are doing things wrong and that you have ideas on how they can do it better. One way to start a conversation would be to say that you are interested in talking about pedagogy and their curriculum to get a better idea of what they do and how they do it. However, an even better idea would be to initiate the conversation by offering to do something for them.

Most colleges will have some type of venue for outside entities to give talks to their majors. Ask if they have a seminar/colloquium that you might be able to speak at. I am the current organizer for the CS Colloquium at Trinity and I would love to have speakers from industry offer to come to give interesting talks to our majors (hint, hint). I'm quite certain that I'm not alone. This gives you a great venue to say all the things that I mention in the next section, both to faculty and students, efficiently.

If you have a local Scala Meetup, you might see if they would be willing to send out an invitation to their majors for that as well.

What Should You Say?

This is where things are a bit more tricky. Most colleges do not view themselves as vocational training. They don't just put things in their curricula because companies would like it, though they do feel some pressure to make sure their graduates have the skills needed to find employment. College curricula focus on fundamental concepts, not tools because we know that technology is ever-changing, and our students are going to have to continue learning throughout their careers. We have to build the foundation for that learning. So the key is to show them how the tools that you are using represent the paradigms of the future of the industry.

With this in mind, your goal isn't to convince them to teach Scala, at least not directly. There is a reason that your company uses Scala, and my guess is that you believe that the reasons your company chose Scala are generally indicative of the way that a lot of the industry needs to move. It might be that you see that data keeps growing in importance and size and that processing that data in a way that scales is critical for the future. You know that using a functional approach is key to writing these types of applications both today and in the future. You know that while the framework you are currently using probably won't be dominant in 10 years, the ideas behind how it works and how you use it to solve real problems will still be folded into the frameworks of the future.

I would say that your first goal is to establish that in the highly parallel and distributed environment we live in, the functional paradigm has moved from an academic play area into an essential real-world skill. You might also tell them why the pragmatic approach of Scala makes sense for bringing the functional paradigm into practical usage. Highlight how it impacts your use case and how you see that being a use case that will only grow with time.

Going back to the fact that colleges want to teach things that are foundational, I would point out how many other languages are following in the footsteps of Scala. This is key because it makes it clear that learning Scala isn't just learning one language. Instead, it is opening the door to where most older languages are evolving as well as where many new languages are going. Knowing Scala helps future-proof students in this regard, and Scala isn't just sitting still and getting older. It is a dynamic language with creators who are working to keep it on the cutting else of language development while maintaining an eye to the practical aspects of what makes a language usable in industry.

If you get to the point where they are convinced but aren't certain how to put Scala and functional programming into their curricula, point them to academics who have been using Scala for teaching. I would personally welcome all such inquiries. Just tell them to write to [email protected]. In the US both Konstantin Läufer and George K. Thiruvathukal at Loyola University Chicago have experience in teaching with Scala and have an interest in spreading the word. So does Brian Howard at DePauw University. While Martin has the most experience of anyone teaching Scala, personally, I'd rather he focus his time on Scala 3 development. Also, while Heather might be a freshly minted academic at CMU, she is definitely a Scala expert and her experience with the MOOCs means that she has taught more people Scala than almost anyone in the world.

Going Further

Giving a talk every so often is a nice way to let schools know what types of technologies you use and why you see them being important in the future. As such, it can provide a subtle way to influence the curriculum. However, if you are willing and able to put in a bit more time, you can have a more explicit impact on what is being taught by offering to teach a course as an adjunct professor the way Austin is doing at UT Dallas and the way Twitter does at CU Boulder.

The challenge with this approach is that not everyone is a naturally gifted teacher and it will be a fairly significant time sink. Technically, whoever does it will get paid, just not that much. (Teaching one class as an adjunct pays well under $10,000 dollars and is often as low as $3000.) The real question is whether your employer is willing and able to let you do this. I will note that for various reasons, a developer doing this who doesn't have a Master's degree will need to be reasonably senior to make it work at a lot of Universities.

The upside of teaching a course is that you can probably get into an upper division course where you have fairly complete control over the curriculum and you can teach exactly the topics that you would most like new hires to arrive with. Due to growing enrollments, the vast majority of colleges are likely quite open to having an adjunct help out, and many will be open to an interesting upper division offering as they can probably move faculty around to cover other courses. (At Trinity, we are looking for an adjunct for next fall and while they are currently scheduled for an introductory course, if someone wanted to come in and teach my Big Data course using Scala, I could certainly switch to the introductory course to make things work.) Of course, if you want to teach CS1/CS2 with Scala, I've got a bunch of materials you can use to help organize the class.

You might also see if the department has an advisory board. Having a seat on such a board will give your company insight into the inner workings of the department and how they think about the field while also giving you a venue to talk about what you value and where you see the future of the field going.

In The Meantime

Until other colleges catch on to the value of Scala, remember that at Trinity University we are teaching all of our students both Scala and how to think functionally, so let me know when you are looking for interns or junior devs.

Sunday, August 20, 2017

Want to make America Great? Go learn something new.

"Make America Great Again" is a slogan that all Americans are inevitably familiar with these days. Whether they love hearing it, or roll their eyes at it, everyone has heard it many time. While I am not opposed to the argument that America is currently great, the reality is that it doesn't feel very great for a lot of people because of the way our economy has changed. The US stock market has actually been on a very nice climb since around 2009. Nothing really special has been happening in 2017 other than perhaps we have a President who likes to claim credit for everything positive, even if he didn't have anything to do with it. The problem is that the gains from those stocks and the corporate profits that have driven them there aren't trickling down to a large fraction of the population.

Unfortunately, the "plan" from the Trump administration on how to fix things in America was largely to try to roll us back to the 1950s. He promised to bring back jobs in things like coal mining and manufacturing. Of course, those promises are all empty. You can't just roll back the clock and proclaim that the economy is going to work the way that it did decades ago. Things have changed. Technology has changed them.

There are a variety of reasons that lots of people feel left behind in the current US economy, but the one that really stands out to me is the skills gap. Technology has changed the economy, but not enough people have stayed up with technology to keep themselves relevant in the current job market. The manufacturing sector is an area often sited for people losing jobs, but the reality is that US manufacturing output is at an all time high. The thing is, the factories don't need as many workers, and the worker that they need are ones with higher level skills. This was highlighted in the article "In US, factory jobs are high-tech, but the workers are not".

As that article mentions, what US employees need is training, but US employers aren't prone to pay for it. Neither is the Republican leadership we currently have running most states as well as the federal government. There are some serious challenges with training everyone as well. A number of these were outlined in "Technology is setting us up for a training crisis".

The thing is, we really have two problems. The one we are feeling right now is that labor is losing out to capital in terms of employer spending, so money isn't trickling down, even when stock prices and company profits are really high. The other problem is that the lack of skilled people is setting up a really bad dynamic where the top companies become superstars. This problem was highlighted in the MIT Technology Review article, "It Pays to Be Smart". That article had the much more informative subtitle of , "Superstar companies are dominating the economy by exploiting a growing gap in digital competencies".

This article addresses the fact that for a number of years now, economists have worried about a stall in the growth of productivity. This is significant, productivity growth has been the key driver of economic growth and the increase in societal wealth for about as long as humans have had organized civilization. This slowing of productivity has also seemed very counter intuitive because during this time digital technology has grown by leaps and bounds and has changed our lives in many very fundamental ways. What "It Pays to Be Smart" discusses is a different analysis that looks only at the top performing companies in each segment of the economy. It turns out that the top companies still have great productivity growth. All the other companies are lagging behind though, and putting a drag on the economy. Why is that? Well, the top companies are doing a really good job of harnessing digital technology, while everyone else struggles to do so in an effective way.

Why is it that the top companies can use technology efficiently while most other companies struggle? To my mind the answer is once again the skills gap. In this case, a shortage of people with sufficient skills to really harness technology in a way that improves productivity. To put it another way, the small companies suck at technology because there are only so many people with the skills, and the big companies have the money to attract or buy out that small population of people. There are certainly other factors like the growth in the importance of data and the fact that large pools of data are worth a lot more than small pools of data (some of these are discussed by the Economist in "The world's most valuable resource is no longer oil, but data"), but it seems to me that the real challenge for small businesses that want to succeed is that they have to somehow attract and retain good people with high end skills who can bring the efficiencies of modern digital technology to bear. This would be a lot easier to do if there were more such people.

Based on this reasoning, the fundamental problem holding back our economy today isn't anything related to foreign competition or immigrants, it is a lack of people with the right skills to drive the economy forward. So if you believe that America has lost its greatness, and you want to actually do something the make America great again, go out and learn something new. Thanks to the multitude of online learning sites (see links below for some), it has never been easier or cheaper to pick up new skills. It does take effort, but if you really want to improve America, sitting around complaining about the loss of the jobs of old isn't going to do it. Putting in the effort to learn new skills yourself is the activity that I truly believe will have the greatest long term benefit, and unlike many other things, your learning is something you have control over.

Online Learning Sites

I'm listing some of the big ones here. If you think I missed one, let me know in the comments and I'll add it.

Sunday, June 18, 2017

How will technological unemployment impact birthrate?

This blog post is a short exploration of a thought that hit me about how birth rates might be impacted by the changes in employment caused by future technological change. I consider technological unemployment to be a significant issue as AI and robotics continue to become more capable. I'm not going to support that belief in this post, as that is many posts worth of material. I will assume that to be the case here, and briefly explore one impact of it that I hadn't thought of until recently.

Birth rates have been decreasing in the developed world so much that in many countries, the death rate is larger than the birth rate. According to the CIA World Fact book, there were 26 nations in 2014 for which this was the case. Japan (-1.8 net) and Germany (-3.1 net) are the ones most commonly mentioned, but there are many others. A nice treatment of this can be found at https://ourworldindata.org/fertility/. There are a variety of reasons for this. Historically, the first was probably lower infant mortality rates. The more significant ones to me though are the ones that center on the increase in women's rights. As women gain more control over their their reproductive rights, they generally choose to have fewer children. In developed nations we are also seeing a decline in marriage rates and more women entering the workforce. Both of these tend to further reduce the birth rate. It is the last factor mentioned that I want to focus on, as it is the one most related to technological unemployment.

It is worth taking a few sentences to explain why these thoughts popped into my head. I often debate the future impacts of technology with a friend who we will just call by his last name, McLane. He doesn't believe that technological unemployment will be an issue, and one of the many reasons he has sited is articles about there not being enough people to work in countries that have low population growth. He is correct that there are a number of countries providing incentives for people to have kids because they are worried about not having enough people to care for the elderly and keep the economy moving forward. Given the variety of reasons for lower fertility rates, I can indeed imagine a day when the global population goes into a tailspin simply because there are no kids born.

So let's assume that McLane is wrong and we get technological unemployment and have to do something, possibly a basic income, so that a large fraction of the population can live a life with dignity is a world where they can't do anything to earn a living wage because everything they have the skills to do can be done better and cheaper by machines. What happens to the birth rate at that point?

I'm going to by optimistic and assume that in such a world, women have at least equal power in society relative to men. I also assume that women maintain general control over their reproductive rights, and generally have all the freedoms of men. (I can actually imagine women having more power at that time as there are indications that technological unemployment so far has hit men harder than women, so if more women have jobs than men for any period of time, they could be the gender with greater social power.) Still, you get a different dynamic when people aren't chasing careers. Right now both men and women often put much of their personal lives on hold until they are well established in a career. If we have a social structure where machines are doing so much of the work that people live comfortable lives without having careers, that changes.

One of the standard arguments that I hear "against" technological employment is that people get lots of meaning from their jobs. I use quotes, because that isn't really an argument for why it won't happen, just something that could cause problems if it does. Personally, I think that people can find meaning in lots of other things, like hobbies or families and friends, if they don't need to work to live a comfortable life. Let's be honest, lots of people today have rather meaningless jobs and they already get much more of the meaning for their lives from other activities.

Given that, I can see a scenario where the birth rate goes up, and people decide to spend a lot more of their time raising children and focusing on family life. Of course, I can also imagine a world where everyone is just watching things on their screens, they have little physical contact with other humans, and the birth rate continues to sink. I'm wondering what other people think? If technology takes away the need to chase a career, will birth rates in the developed world start to rise again, or will we be so far into a situation where we enjoy our technology that the idea of having kids and a family will continue to decline?

Wednesday, April 19, 2017

Does Big Data Make Centralized Control the Better Option?

I grew up in the waning years of the Cold War. I remember doing a project in High School on the possibility and repercussions of nuclear war. I got to see the fall of the Berlin Wall and many other events that went along with the crumbling of the world's other superpower as their system of government, communism, proved to be inferior to the combination of democracy and free market capitalism.

One of the standard arguments I recall for why communism failed was that it was inefficient and wasteful, particularly when it came to the allocation of resources. A standard example was that local farmers knew better when to plant and harvest than the central government, but that in the USSR they had to do those things when the federal government told them to. By contrast, in the US, farmers would make decisions based on their experience to try to maximize their yields. They did that because that was how they made money. A year of bad yields would be an economic hardship in the best case and possibly lead to losing the farm.

A thought that struck me recently is that this particular argument against communism might not apply anymore. The reason is that we have moved into the era of "big data". In fact, I wonder if centralized control might not have the advantage these days, assuming that centralized control does things in an optimal way.

To illustrate this, consider the farming example I just gave. The local farmers have a lot of experience they can fall back on, but these days I'm guessing that even better decisions for allocating resources could be made using a data set that included crop yields across an entire nation for many years, combined with detailed weather data during that same time and other potentially relevant information.

Consider this article about a company that uses predictive AI to place orders in advance to improve shipping efficiency and reduce the number of returns. It is just one of many examples of how computer systems can now make decisions much better than humans are capable of because they can pull in much larger quantities of data.

Of course, the key here isn't centralized control, and I'm sure that many people would argue that centralized control still fails because it lacks the motivation to do well that individuals have. In that regard, this isn't a call to switch to communism. Instead, the key here is the data, and this is an area where government can play a role. Even better than one big centralized decision making process is a system where everyone has access to all the relevant data, and they can all try out various ways to process it to make optimal decisions.

In that regard, I think that government could play a role by making data available and helping different groups to make their data accessible and consistently formatted so that it can be more broadly used. This doesn't just apply to crops with data on weather, planting dates, harvesting dates, and yields by location, this could also be useful for a lot of data related to health including air and water quality and potentially consumption habits. I don't have a full mental image of what exactly this looks like in a broad sense, and I can clearly see challenges related to privacy issues. Still, I feel that we need to push for making more data generally available so that individuals and companies can utilize it to make better decisions. Regardless of where the control comes from, the way forward for efficient decision making is clearly availability of useful data.

Wednesday, January 4, 2017

Performance of N-Body Simulation Styles in Scala

I spend a lot of time doing N-body simulations. My working simulation code is written in C++. The reason for that is simple, speed. It is a serious number crunching code and even written in C++ runs can take months to complete. I would love to do these simulations in some other language that I enjoyed writing code in more, but I'm not willing to give up all that much speed in order to make that happen. A number of years ago a student and I explored the performance of a Java version of the code, and we found that we could get performance within 10-20% of the C++ version, but doing so required doing a number of things in a way that made the Java almost as challenging to work with as the C++. These days, I'd really love to be able to convert my code to Scala, as I find it much more enjoyable to code in, but I'd have to be certain that I can get performance close that what I'm getting in C++. My guess is that in the end I'll wind up updating my C++ code to use features of C++11, 14, and 17, but I still like to explore the possibilities of using a different language.

There are lots of ways that you can write N-body simulations in Scala. Some of them take advantage of the expressivity of the language to produce shorter code that is easier to read and generally less bug prone. However, there are reasons to believe that those approaches might also tend to be slower. For a while I've been meaning to explore different approaches to see how much of an impact it really has on the speed of execution. While there aren't all that many people who do N-body simulations, my expectation is that the results of these performance tests will apply to many other numerically intensive applications.

Array of Mutable Classes

In C++, one represents particles with a struct and makes sequences of these using arrays, vectors, or valarrays. Everything is generally mutable. This first version of the code in Scala is written to mirror the C++ code. Of course, there is a huge difference in the memory model. When you make a sequence of a simple struct in C++, the entire memory for all elements is laid out in a single contiguous block. The way this code is written in Scala, the array is an array of references to instances of MutableBody and each of those contains two references to instances of MVect3. This matters because caching is vitally important on modern computers and contiguous memory accesses have much better caching performance. In this simple example, odds are that because all of the objects are allocated in order at the beginning of the program, they wind up being contiguous in the heap, but there are still multiple levels of references that have to be followed, which are likely to impose some performance cost.

This code really does look similar to what the C++ code for this type of thing looks like, unless you happen to have a library set up for SIMD vector operations or expression templates for the 3-D vectors. Unfortunately, that isn't a good thing. Not only is this code verbose, I can speak from experience that it is bug prone. It is just too easy to have one of those x, y, or z fields typed wrong and the resulting error is very difficult to diagnose and track down.

Array of Immutable Classes

Given the capabilities of Scala, a much "better" way to write this is to use immutable classes and add symbolic methods to a 3-D vector class so that you can simplify the math expressions and make them less error prone. Of course, this change means that you are doing a lot of memory allocation making small objects for all of the 3-D vectors.

Clearly this code is shorter and more readable. It is still using the same basic approach with a mutable array for storage, but the ability to use mathematical operators on vectors improves the code significantly. It is worth noting that in the mutable version you can make operations like +=, but given the math that is being done, they aren't all that useful unless you are making temporary values to store scaled versions and such.

Arrays with Value Class Wrapper

Now we are going to get a bit more interesting and produce a version that tries to give us contiguous memory without hurting program readability and without making lots of temporaries. Basically, this is an attempt to make a version of the code that has the best chance of being speed competitive with the C++ code while still being reasonably readable. The real key areas of this code are lines 5-29 where we make a number of arrays of doubles and then define a value class. The value class is a newer feature of the Scala language that is stack allocated and has no more overhead than a primitive, while allowing you to have nice OO syntax. Instances of it are stored as just primitives on the stack and the methods wind up being static methods in the JVM, so there is no memory or speed overhead of object allocation. In order to use a value class, we have to put the whole thing in an object declaration and not a class declaration. Value classes can't go inside of other classes because then they would be non-static inner classes and that would mean they have the overhead of keeping a reference to their enclosing instance. They could go at the top level, outside of all other declarations, but then they wouldn't have access to the arrays. Putting both the arrays and the value class in an object declaration gives us what we need here.

Unfortunately, value classes are limited in Scala because they aren't really supported by the JVM at this time. The main limitation is that they can only have a single primitive value in them. There are plans to add value types to Java and the JVM. Once that happens, which probably will be Java 10, then you could make classes with three doubles in them that are stack allocated and arrays of them would be memory contiguous, giving you behavior much more similar to C++. Until that time, code like that above can give you a reasonable approximation.

Purely Functional Approaches

Of course, the real key to Scala is the functional aspects, something that none of the above examples took advantage of. The version that used immutable classes still mutated an array that stored references to those objects. I tried two different approaches to doing things functionally that are shown in the code below. The first approach is pretty much just like our immutable class version, but it uses a Vector and the updated method. Even without speed tests, we can feel confident that this is not an efficient approach, but it is pretty much required if we want to do the minimum number of distance calculations where each pair is only visited once. The second version, called forSim2, does twice as many distance calculations, but this allows it to be written in a much more succinct manner because we produce a full sequence with the accelerations to each particle without every mutating them. As it happens, this is the approach that we need to take if we parallelize the code to prevent race conditions. I will explore that more in a future blog post.

This code uses the ImmutableBody and the immutable Vect3 class shown earlier. That makes the code more compact than the non-mutable versions. Those unfamiliar with fold methods might find forSim2 difficult to read, but in general a fold can be used to replace a loop that would accumulate a value in a mutable way. Note that like the previous code, this is put in an object, but for completely different reasons. This code isn't stateful, so there is no reason to have a class. The state is created by the initBodies method and then passed into the forSim methods, which return modified values. This code never stores the state locally. This is a significant difference from the earlier code and should probably be expected for a functional approach.

Timing Results

So how do these different versions perform? We tested all of them using Scala 2.12.1 both with no optimization flags and with -opt:l:classpath -opt:closure-invocations -opt:simplify-jumps -opt:copy-propagation -opt:redundant-casts -opt:box-unbox -opt:nullness-tracking -opt:inline-global. The timing results are shown in the following table.

OptimizationStyleAverage Time [s]rms [s]
NoneValue Class0.7040.004
Mutable Class0.9040.006
Immutable Class1.2660.020
Functional 18.330.09
Functional 22.3430.048
OptimizedValue Class0.6790.001
Mutable Class0.6820.001
Immutable Class1.1910.026
Functional 18.550.11
Functional 22.3250.023

It is good to see that in many ways, my intuitions as to which versions would be fastest proved correct. Without optimization, the version that uses a value class and contiguous blocks of memory in arrays is clearly the fastest, followed by the mutable classes, then the immutable classes, then the purely functional approaches at the end. Note that the second functional approach takes nearly twice as long as the immutable class version. Since it is doing twice as many distance calculations, this indicates that the overhead of being functional is small and the speed difference is basically due to reorganizing the math. I will note again that this reorganization is also required for parallelization, so I would speculate that in parallelized versions, the second functional approach will be comparable to the immutable class version. The first functional approach is a clear loser. Calling updated so frequently on the Vector type is clearly inefficient. I will note though that I tried changing Vector to Array, and the resulting code was so slow for the first functional approach that I never saw it complete. On the other hand, using an array, even if it isn't mutated, seemed to slightly improve performance for the second functional approach.

It is also interesting to note that optimization benefited the mutable class version significantly, making it nearly comparable to the value class version.

Comparison to C++

We will close out this post with a comparison to C++. After all, we can't know if it is reasonable to use Scala for numerical workloads if we don't explore how it compares. I wrote a version of the code in C++ that was basically an adaptation of the mutable class version using a std::valarray for storage. I compiled it using g++ with the -Ofast compiler flag. I also made a separate application in Scala that only ran the value class version and had both the C++ and the Scala time 1000 steps of integration. The result was that the C++ generally completed in 6.22 seconds and the Scala completed in 6.88. Both were consistent across multiple runs. So the final result is a 10% advantage for C++. That isn't a huge advantage, and my guess is that most applications would be happy to live with that for the advantages that they get in coding in Scala over C++ in terms of developer productivity. Granted, that was with g++ I expect that using the Intel compiler or some other highly optimizing, and not free or open source, compiler will produce even faster executables for C++.

For me, the bottom line is that if you hear people say that Scala (or Java/JVM) are slow, there are good odds that they haven't really tested it compared to other languages/platforms. For straight number crunching applications, feel free to point them to this blog post to show them that the difference in speed really doesn't have to be all that large. My guess is that using macros, it would also be possible to create Scala code that has the readability of the immutable Vect3 class with symbolic methods, but without the speed cost for allocating lots of temporary values. This would be akin to expression templates for that purpose in C++. Maybe I'll take a stab at creating that code and write a blog about it as well. I also look forward to the Java 10 JVM adding value types, as I believe that they have the potential to significantly improve the performance of numerical code in all JVM languages.

Other Code

Here is the code for C++ and the timing in Scala in case anyone wants to be able to run everything.

C++ Code

Scala Timing