Software Development in CMSI

The K computer has been available for general use since last September. Besides the development of algorithms, the development of community code is also essential for making effective use of this leading edge computer. In this issue we look at the current state of software development in the field of computational materials science, introduce initiatives already under way within CMSI to develop open source programs, and present the order-N first-principles software CONQUEST, which is being developed under an international joint project, with the aim of defining the direction that future software development should take.

Improved computer performance and advances in algorithms

In the 66 years since the emergence of the world’s first general-purpose electronic computer, ENIAC, in 1946, the computational performance of leading edge supercomputers has improved by a factor of more than 50 trillion. Since the TOP500 list (http://top500.org/) started in 1993, performance has improved by a factor of about 270,000 in just 19 years. Thanks to this overwhelming progress, it is becoming possible to conduct highly accurate simulations of large-scale systems that were unthinkable in the past, such as first-principles calculations on systems of one million atoms, or molecular dynamics simulations of systems of ten million atoms. In condensed matter physics, molecular science, and materials science, which are the research fields of CMSI, various other approaches have also come into use, including quantum chemical calculations, the Monte Carlo method, exact diagonalization of matrices, and multiscale and hybrid approaches that combine several methods. One of the main characteristics of simulations in “computational materials science,” which covers these fields, is the focus on the equilibrium or steady state of systems with strong correlations, rather than on the temporal evolution of a system from a specific initial state. When researchers do consider nonequilibrium states, very long simulations are required, spanning timescales from femtoseconds through picoseconds to nanoseconds. Furthermore, the systems to be simulated are not necessarily three-dimensional. In simulations of strongly correlated quantum many-body systems, non-local operations are often used to incorporate the effects of quantum correlation accurately and efficiently, or to reach the equilibrium state in as few iterations as possible. Consequently, huge computational and network communication capacity is required.
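To put the growth figures quoted above in perspective, the short Python sketch below converts each overall improvement factor into an average doubling time. It assumes steady exponential growth over the whole period, which is a simplification, and the function name `doubling_time_months` is introduced here purely for illustration.

```python
import math


def doubling_time_months(growth_factor: float, years: float) -> float:
    """Average time in months for performance to double.

    Assumes steady exponential growth over the whole period (a simplification).
    """
    doublings = math.log2(growth_factor)  # number of times performance doubled
    return years * 12.0 / doublings


# Figures quoted in the text: a factor of 50 trillion over 66 years (since ENIAC)
# and a factor of about 270,000 over 19 years (since the first TOP500 list).
since_eniac = doubling_time_months(50e12, 66)
since_top500 = doubling_time_months(270_000, 19)

print(f"Since ENIAC:  one doubling every {since_eniac:.1f} months")
print(f"Since TOP500: one doubling every {since_top500:.1f} months")
```

Under this assumption, performance has doubled on average roughly every 17 months since ENIAC and roughly every 13 months since the TOP500 list began.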

To make full use of leading edge computers like the K computer and to advance next generation materials science, improving the performance of computers themselves is not sufficient. In efficiently solving the fundamental equations of materials science, advances in computational science theory, in other words algorithms, have played an extremely important role. Representative examples are the fast Fourier transform (FFT), the Monte Carlo method, and other means of quickly solving given equations. Also worth mentioning are approximation techniques that reduce the computational cost without sacrificing accuracy, such as the fast multipole method (Torrent No.2), the divide-and-conquer method (Torrent No.5), and the order-N method discussed in this issue, as well as methods designed for modern computer architectures, such as the real-space method (Torrent No.3).
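As a small illustration of how much an algorithm alone can matter, the Python sketch below compares a direct evaluation of the discrete Fourier transform, which needs on the order of N² operations, with a textbook recursive radix-2 FFT, which needs on the order of N log N. This is a teaching sketch using only the standard library, not the implementation used in any CMSI code; the test signal is an arbitrary choice.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N).

    The input length must be a power of two.
    """
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


# Arbitrary test signal of length 64 (a power of two).
signal = [complex(j % 5, 0) for j in range(64)]
max_diff = max(abs(a - b) for a, b in zip(naive_dft(signal), fft(signal)))
print(f"largest difference between the two results: {max_diff:.2e}")
```

Both functions compute the same transform up to rounding error. For N = 2^20 samples, N² is about 10^12 while N log2 N is about 2 × 10^7, which is why the FFT is often cited as an algorithmic advance comparable to many generations of hardware improvement.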

The current state of computational materials science

In the field of computational materials science, new methods are being proposed daily, and researchers are constantly testing them in applications. It is not unusual for a single researcher or a very small group to pursue the whole cycle of identifying an effective model that explains a phenomenon seen in an actual material, developing algorithms based on various ideas, programming, conducting simulations, analyzing the results, and feeding the findings back into the effective model. In fields close to experiments and applications, gigantic application programs such as GAUSSIAN for quantum chemistry calculations (see the column on page 5) are widely used, but this is the exception, and many research groups use their own individual programs. When users within a group move on, for example when graduate students graduate, the very existence of these programs is often forgotten.

The existence of many individual codes also means that the input parameters used for calculations and the output of simulation results are frequently written in different formats. As a result, it is difficult to save and search data in a standardized way. This is also becoming a significant obstacle to analysis using tools developed by other researchers and to coupled calculations (hybrid calculations combining two or more different application programs).

Meanwhile, the cost of developing and upgrading applications is beginning to increase. Examples include hybrid parallelization to exploit the massively parallel computers and multicore architectures that are currently mainstream, cache tuning, and optimization of communications for a specific network topology. To make effective use of large-scale computers such as the K computer, it is becoming imperative to undertake software development in an organized fashion that goes beyond individuals. Furthermore, cooperation with specialists in computer science and numerical computation, in addition to those in computational science, is becoming increasingly important.
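To make the point about differing input and output formats concrete, the Python sketch below shows one possible way to store the parameters and results of runs from different codes in a common, searchable record. The layout, the field names, the code names `code_a` and `code_b`, and all numerical values are hypothetical placeholders chosen for illustration; they do not describe any format adopted by CMSI.

```python
import json


def make_record(code_name, parameters, results):
    """Bundle the inputs and outputs of one run into a plain dictionary.

    The three-field layout is a hypothetical example, not an existing standard.
    """
    return {"code": code_name, "parameters": parameters, "results": results}


def find_runs(records, **conditions):
    """Return the records whose parameters match every given key/value pair."""
    return [
        r for r in records
        if all(r["parameters"].get(k) == v for k, v in conditions.items())
    ]


# Placeholder runs from two imaginary programs (all values are made up).
records = [
    make_record("code_a",
                {"method": "monte_carlo", "n_sites": 64, "temperature": 1.0},
                {"energy": -1.25}),
    make_record("code_b",
                {"method": "exact_diag", "n_sites": 16, "temperature": 0.0},
                {"energy": -1.31}),
]

# Serialize to JSON text, read it back, and search by parameter.
text = json.dumps(records, indent=2, sort_keys=True)
restored = json.loads(text)
matches = find_runs(restored, method="monte_carlo")
print([r["code"] for r in matches])
```

Once every program writes records in one agreed layout, the same few lines of search code work regardless of which program produced the data, which is the kind of interoperability that many individual formats prevent.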

What does “publishing software” mean?

The Division Researcher program run by CMSI differs from the ordinary doctoral research fellow program (see Torrent No.3). Rather than staying within a specific research group, participants have the mission of promoting the field of computational materials science as a whole, in other words, developing advanced component technology and establishing the basic technology crucial for next-generation computational materials science. As such, it is an unprecedented and unique program. Likewise, in order to foster the programs themselves across the field as a whole, it is necessary to promote the publishing of software.

The open source programs referred to here are programs whose source code users can download, for example from the web, under comparatively permissive terms of use (license). Users can compile and execute the programs themselves, and publish the results of simulations. Open source programs (in the narrow sense) do not include programs that require a joint research agreement, or that are distributed only in binary format. Although the terms of use differ slightly from program to program, users may be able to alter the source code or incorporate their own programs to suit the calculation required, or they may be able to modify the code and redistribute it. Well known examples are the Linux OS, the Emacs editor, and the GCC compiler.

There are various types of users in the field of computational materials science. Theorists may use the programs as platforms for verifying new theories and approximations, or for trying out new ideas, while experimental researchers and researchers in corporations will use them for interpreting experimental results and modeling real materials. For computational science experts, open source software should be useful for verifying the output of their own programs, for benchmarking computing speed, and as modules for coupled calculations. The requirements for open source software therefore differ according to the use to which it is put. They include ease of use, richness of features, speed, accuracy, and so on (Fig.1).

The necessity of community code

Software where the source code is simply made available on the web cannot be called an open source program. In order for it to be used by a wide range of users, first the code must work correctly, and it must have an easy-to-use user interface such as a GUI, as well as documentation in Japanese or English such as manuals for installing and using it, and tutorials for typical users. In addition, user support using email, bulletin boards and so on, and regular courses for users are essential. It is also necessary to respond to requests for bug fixes and new features on an ongoing basis.

To date there is little open source software from Japan in the field of computational materials science. The main reasons for this are the trouble and cost involved in making software freely available. In order to lower the barriers to providing and maintaining open source programs, CMSI is working continuously to provide tools for creating sites for managing source code and providing programs and documentation, as well as improving courses for users and the website which explains the applications available (known as “AppliCafé”). At the same time, we are working to improve the skills of Division Researchers.

Conversely, what is the point of making software open source, and what are the benefits for developers? Releasing programs allows developers to receive feedback from lots of people, allowing them to make their programs more reliable. By accepting code from collaborators, it is also possible to enhance the software by adding new functions. In addition, communication with users frequently leads to the development of new algorithms, new joint research, and the creation of communities. Even if a superior algorithm is developed, there is almost no chance for that particular approach to be evaluated unless it is implemented in the form of software.

This trend has been particularly pronounced recently in the field of quantum chemistry calculation, but in future it is likely to spread to computational science as a whole. Ideally, code for new approaches to calculation will be made available as open source, and at the same time, the program will grow within the community as basic software for implementing new techniques. If this sort of “community code” can be fostered, it will greatly benefit not only the developers but also the whole community. Providing open source programs is certainly not just a service for other people. It is nothing short of leaving a record of your achievements (science) in the form of software.

In Japan, the K computer became available for public use only at the end of September 2012, but feasibility studies toward the construction of next generation exa-scale supercomputers have already started. We will provide application programs critical to the field of computational materials science that are tuned for massively parallel architectures, and will work proactively with researchers in the computer science field to design the next generation of supercomputers accordingly. Without question, it will become increasingly important to pursue the consolidation of community code for computational materials science in order to establish a productive cycle.

(CMSI public relations subcommittee representative: Synge Todo, the Institute for Solid State Physics, the University of Tokyo)