Grid Computing: Potential Opportunities For Pakistan: Paper No. 674
Ali
and developing grid applications. Adoption of grid computing is growing in both the public and private sectors. In short, grid computing can answer many of the questions that scientific development faces today because of a lack of resources, and grid initiatives and adoption can have a strong positive economic impact on Pakistan, along with increased information access and data sharing.
I.
INTRODUCTION
Ever since the creation of the universe, man has been eager to solve its mysteries. Man's restless nature has driven him to solve the unsolved, unfold the folded, and conquer the unknown. This eagerness and restlessness led to what is today termed technological development and advancement. From the Stone Age to the present era, man has made great leaps in science and technology. What once seemed impossible has now been achieved or is about to be achieved. The abacus, once considered the most efficient and promising machine, is now rarely used anywhere. Charles Babbage's Difference Engine and Analytical Engine were the first in the field of machine development. They may well have been the first, but they were by no means the last: history has witnessed many developments since, including the Mark I (72 accumulators, each holding a 23-digit decimal number), ENIAC (20 accumulators, each holding 10 decimal digits, capable of 5000 arithmetic operations per second), EDVAC (which could store 1024 44-bit words of data), ATLAS, TITAN, and so on. At present, the machines that promise to fulfill the needs of the modern world are supercomputers. With computing machines entering every activity of daily life, the world is moving towards an era in which everybody will require access to all resources, everywhere and all the time, in order to carry out their tasks efficiently. One such example is CMS (Compact Muon Solenoid) and LHC (Large Hadron Collider [2]) at CERN (European Organization for Nuclear Research), Switzerland. CMS and LHC carry out high energy physics experiments that produce petabytes of output every day. Meeting this demand requires almost unlimited storage and computational resources. To address this and similar problems, a new wave of computing termed Grid Computing is emerging.
Grid computing is the coordinated use of a large number of distributed servers and storage systems acting as one computer. With the grid, a business no longer needs to worry about spikes in demand and excess capacity costs: computing power is available when it is needed. Grid computing delivers a higher quality of service at a lower cost with increased flexibility, and is among the best ways to virtualize and provide IT resources. Grids are built with low-cost modular components; no additional hardware is required to realize grid environments. Grid computing thus enables, using existing resources, the high performance computing environments that supercomputers promise. Given the early stage of this technology, some experts postulate that Grid technology will arrive in three waves: first in academic research communities, then in corporations, which is beginning to happen now, and ultimately in a third wave in which the technology will coalesce to create a processing network analogous to the Web, called simply the Grid.
II.
TYPES OF GRID
Grid computing can be used in a variety of ways to address various kinds of application requirements. Often, grids are categorized by the type of solutions that they best address. The three primary types of grids are summarized below.
Computational Grids
Computational grids aim at merging and sharing computational resources. These resources can be used in several ways. A job or application can be executed remotely on the grid. A job can be logically partitioned and its modules run in parallel on the grid. Finally, when a single job must be run more than once to achieve the desired results, the runs can be executed in parallel on multiple computational grid nodes.
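The partitioning approach described above can be illustrated with a minimal sketch. It assumes a job that can be split into independent chunks; the names partition, run_on_node, and run_partitioned_job are hypothetical, and local threads merely stand in for remote grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split a list into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_on_node(chunk):
    """Stand-in for remote execution on one node: sum of squares of a chunk."""
    return sum(x * x for x in chunk)


def run_partitioned_job(items, nodes=4):
    """Partition the job, run the modules in parallel, merge partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(run_on_node, partition(items, nodes)))
    return sum(partials)
```

A real grid would dispatch each chunk to a separate machine through its execution service; the merge step at the end corresponds to gathering partial results into the final result.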
Data Grids
Data grids, as the name suggests, aim at enabling the sharing of storage systems. Each machine in the grid provides some storage space to be used on the grid. An individual database or file can span several storage devices and machines. These resources appear to the user as one huge virtual storage device. Provision of storage space is managed by agreeing on policies between the grid nodes and grid management. Data grids can store huge amounts of data that could not be stored on a single machine. One example is the distributed databases used by CMS at CERN.
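How a single file can span several machines yet appear as one virtual device can be sketched as follows. This is an illustration under simple assumptions (fixed-size blocks, round-robin placement, in-memory dictionaries standing in for storage nodes); the function names stripe and reassemble are hypothetical and do not describe any particular grid middleware.

```python
def stripe(data, nodes, block_size=4):
    """Split data into blocks and place them round-robin across nodes.

    Returns the per-node storage and a catalogue recording where each
    block lives, which is what lets the grid present one virtual file.
    """
    storage = {node: {} for node in nodes}
    catalogue = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        node = nodes[index % len(nodes)]
        storage[node][index] = data[offset:offset + block_size]
        catalogue.append((node, index))
    return storage, catalogue


def reassemble(storage, catalogue):
    """Read the blocks back in catalogue order to recover the file."""
    return b"".join(storage[node][index] for node, index in catalogue)
```

The catalogue plays the role of the metadata that a data grid would keep so that users never need to know which machine holds which block.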
Scavenging Grids
A scavenging grid is most commonly used with large numbers of desktop machines. Machines are scavenged for available CPU cycles and other resources. Owners of the desktop machines are usually given control over when their resources are available to participate in the grid.
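The selection of machines for scavenging can be sketched as a simple filter. The Desktop record, the owner_allows flag, and the 20% load threshold are assumptions made for illustration only; actual scavenging systems define their own availability policies.

```python
from dataclasses import dataclass


@dataclass
class Desktop:
    name: str
    cpu_load: float      # fraction of CPU in use by the owner, 0.0 to 1.0
    owner_allows: bool   # owner's opt-in for grid participation right now


def scavengeable(machines, max_load=0.2):
    """Return names of machines that are idle enough and opted in."""
    return [m.name for m in machines
            if m.owner_allows and m.cpu_load <= max_load]
```

The owner_allows check reflects the point that owners keep control over when their resources participate, independently of how idle the machine is.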
III.
COMPONENTS OF GRID
In every form, the purpose of a grid is to enable sharing of existing resources; therefore, a grid consists of some basic components (shown in Figure 1) that are similar in every grid-enabled environment. Some of the main components are explained below:
Grid Computing jobs are evaluated on the basis of their requirements and then allocated to the appropriate resources for execution (using the Execution Service). This involves complex workflow management and data movement on a regular basis. Scheduling demands some basic tasks, such as ensuring resource reservation before jobs are scheduled and managing resources while jobs are running. Schedulers and planners also estimate the best possible runtime for jobs, manage agreements and policies for resource sharing (using the Replica Management Service), gather partial results to form the final result (using the Data Collection Service), and reschedule jobs in case of unsatisfactory results or failures. Collection of the scattered data required for a specific job is also managed by schedulers using the Replica Management Service (shown in Figure 1). In some frameworks, monitoring the status of a job while it executes is also the responsibility of the scheduler.
Job Estimators
Job Estimators are used to predict the resource consumption of a job so that other services, particularly the Steering Service and the Scheduler, know in advance how much time and how many resources a particular action might take. The Steering Service uses estimators to give users feedback about the estimated runtime, queue time, and transfer time of their jobs. Similarly, the Grid Scheduler contacts estimators while selecting the optimum site for job execution. The main types of estimators are:
Runtime Estimator
The runtime estimator computes the estimated runtime of an input task (the atomic component of a job) at a single execution site; this estimate is then used by the Steering Service and the Scheduler when selecting the optimum site for executing that task.
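The way a scheduler might combine estimates to pick a site can be sketched as below. The estimation method shown (the mean of past runtimes of similar tasks at a site) is an assumption chosen for simplicity, not the method of any specific framework, and the names estimate_runtime and select_site are hypothetical.

```python
from statistics import mean


def estimate_runtime(history, default=60.0):
    """Estimate runtime in seconds as the mean of past runtimes at a site.

    Falls back to `default` when the site has no history (an assumption).
    """
    return mean(history) if history else default


def select_site(sites):
    """Pick the site with the lowest queue + transfer + runtime estimate.

    `sites` maps a site name to a dict with the keys 'history',
    'queue_time', and 'transfer_time' (all times in seconds).
    """
    def total(name):
        site = sites[name]
        return (site["queue_time"] + site["transfer_time"]
                + estimate_runtime(site["history"]))
    return min(sites, key=total)
```

Summing the three estimates mirrors the runtime, queue time, and transfer time feedback that the Steering Service reports to users.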
status feedback to other services in Interactive Grid Analysis Environment while operating in close interaction with execution service.
Emerging Web services and Web service portal standards have played a significant role in portal development. JClarens, for example, is a Java-based framework for hosting Web services (XML-RPC [5], SOAP [6]) and Grid services. JClarens provides web services for an interactive analysis environment to dynamically access and analyze the tremendous amount of data scattered across various locations. The use of XML-RPC based Web services makes JClarens a language-neutral server and has demonstrated interoperability with its Python variant. Grid portals place strong emphasis on security and virtual organization management (VOM). This provides a common platform to support the development of a larger, more flexible framework, with the future aim of integrating it with a loosely coupled, decentralized, and autonomous framework for the Grid enabled Analysis Environment (GAE) shown in Figure 1.
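The language neutrality of XML-RPC can be illustrated with a small sketch using only the Python standard library. This is not the JClarens API: the job_status method and its return fields are hypothetical, and the server below is a generic XML-RPC endpoint running on the local machine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def job_status(job_id):
    """Hypothetical service method; a real portal would query the grid."""
    return {"job_id": job_id, "state": "RUNNING"}


def start_server():
    """Start an XML-RPC server on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(job_status)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_status(server, job_id):
    """Call the remote method over HTTP, as any XML-RPC client could."""
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.job_status(job_id)
```

Because the request and response travel as XML over HTTP, the same call could be issued by a client written in Java or any other language with an XML-RPC library.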
IV.
APPLICATIONS OF GRID
Grid computing supports a wide variety of applications. As applications grow more complex and resource consuming, and users demand more efficient and reliable data access, grid computing emerges as a solution. The following are some of the main domains in which grid computing can prove beneficial [7].
Grid Computing efforts have recognized that these challenges include the analysis, movement, caching, and mining of huge amounts of data. In addition to the complexity of processing data, there are further requirements surrounding data security, secure data access, secure storage, privacy, and highly flexible integration. Another area that requires attention is the querying of nonstandard data formats and access to data assets across complex global networks. These requirements of the life sciences call for a Grid Computing infrastructure that manages data storage and provides access to the data while performing complex analysis on it. Grid Computing systems can provide a common infrastructure for data access and, at the same time, secure access mechanisms while the data is processed. Today, the life sciences use Grid Computing systems to execute sequence comparison algorithms and to enable molecular modeling using the securely collected data. This gives the life sciences sector access to world-class information analysis, with faster response times and more accurate results.
Research Collaboration
Research-oriented organizations and universities engaged in advanced research collaboration must analyze tremendous amounts of data. Some examples of such projects are subatomic particle and high energy physics experiments, remote sensing sources for earth simulation and modeling, and analysis of the human genome sequence. The virtual organizations engaged in these research collaborations generate petabytes of data and require tremendous amounts of storage space and thousands of computing processors (for example, the CMS [1] and LHC projects at CERN, Switzerland). Researchers in these fields must share data, computational processors, and hardware instrumentation such as telescopes and advanced testing equipment. Most of these resources relate to data-intensive processing and are widely dispersed over a large geographical area. Grid Computing provides mechanisms for resource sharing by forming one or more virtual organizations with specific sharing capabilities. Such virtual organizations are constituted to resolve specific research problems with a wide range of participants from different regions of the world. The formation of dynamic virtual organizations makes it possible to add and remove participants dynamically, to manage the on-demand sharing of resources, and to provision a common, integrated, and secure framework for data interchange and access.
Grid computing systems provide a wide range of capabilities that address the above kinds of analysis and modeling activities. These advanced solutions also provide complex job schedulers and resource managers to deal with computing power requirements. This enables automobile manufacturers, for example, to shorten analysis and design times while minimizing both capital and operational expenditures.
Collaborative Games
Some collaborative Grid Computing efforts use emerging technologies to support online games through on-demand provisioning of computation-intensive resources, such as computers and storage networks. These resources are selected based on requirements, often involving aspects such as the volume of traffic and the number of players, rather than being centralized servers and other fixed resources. Such on-demand games offer a flexible approach with a reduced up-front cost for hardware and software. We can imagine these games using more computing resources as the number of concurrent players increases and fewer as it decreases. Grid computing gaming environments are capable of supporting such virtualized environments for collaborative gaming.
Data Rendering
Data rendering is becoming a major part of game and visual development. As more powerful graphics emerge, large storage and computational resources are required to render such data. Grid computing addresses this need by providing enormous storage and computational resources together with parallel processing.
Government
Grid Computing environments in government focus on providing coordinated access to massive amounts of data held across various government agencies. This provides faster access to the data needed to solve critical problems, such as emergency situations, as well as for routine activities. These environments support more efficient decision making with less turnaround time. Grid Computing enables the creation of virtual organizations that include many participants from various governmental agencies (e.g., state and federal, local or country). This is necessary in order to provide the data needed for government functions in real time while analyzing the data to identify solutions to the specific problems being addressed. The formation of virtual organizations, and of their security elements, is most challenging because of the high levels of security in government and its very complex requirements.
V. 1.
Grid computing can simplify the way students and faculty members access education and computing resources. It promises to enable the exchange of information on research, scientific, and education projects. The Grid can be used to store the data of a large number of universities and research organizations, providing easy and quick access to all potential users. Universities will be connected to a common virtual hub that automatically finds the appropriate application resources, from life sciences research to video courses and e-learning. Grid computing will remove the tedious manual processes to which students and researchers have become accustomed, because it pools the computing power of hundreds of servers, both existing and new, over a network to run programs more reliably; this will save on development costs. Currently, universities in Pakistan develop applications that are incompatible from campus to campus and can be shared across the university network only on a limited basis. This results in time-consuming and costly duplication of development effort. With the grid, the universities can organize the vast computational and informational resources of the entire higher education system into a centralized, Internet-based hub that performs a wide range of complex tasks quickly. Consider the China Grid project [8], undertaken by IBM and China's Ministry of Education. It uses grid technology to enable universities across the country to collaborate on research, scientific, and education projects. It is one of the world's largest implementations of grid computing: it takes untapped application services, data, and computing resources from different computing systems and makes them available where and when they are needed, resulting in a single, virtual system. The China Education and Research Grid, described as the most ambitious grid project by a government to date, will link more than 200,000 students and faculty members at nearly a hundred universities across China when the project is completed. The grid will be capable of more than six teraflops (six trillion calculations per second), and eventually of more than 15 trillion calculations per second.
2.
There has been exponential growth of data in engineering and the life sciences, which has created a need to manage this enormous amount of data in order to extract knowledge for new discoveries. This increases the collaborative challenge of dealing with redundant data sets, information lifecycle management, bandwidth issues, and shared resources. Grid computing allows for more efficient virtualization and provisioning of available resources. It enables virtual teams to collaborate on complex tasks beyond organizational boundaries, over secure networks. In Pakistan, grid computing has applications in drug discovery and treatment. Instead of each research organization holding its own data, a grid can be shared by the National Institute of Health (NIH), Agha Khan Medical College, research laboratories, pharmaceutical companies, KRL, and other such organizations. Grid technology can be used in understanding and analyzing complex biological and genetic structures and in exploring the chemical and molecular sciences. Pakistan is not fully developed in terms of medical resources and personnel. There are many areas where either there is no medical facility or the available facility lacks resources. Using grid computing, we can set up nodes at remote stations that provide people with information and awareness and keep a check on the demand and supply of drugs in that area. Furthermore, to enhance medical analysis, give doctors access to patients' records, and enable doctors to share their views and ideas with their peers, we can set up grid networks across hospitals and research centers. This will improve doctors' efficiency and increase the probability of proper diagnosis and treatment.
3.
Government Sector
Government organizations all over Pakistan have their own databases of records. A single grid shared by all government organizations, with nationwide data repositories, can provide coordinated access to the massive amounts of data held across agencies such as NADRA, FIA, and newspaper agencies. This will result in more efficient decision making with less turnaround time and will reduce data redundancy. The creation of virtual organizations will enable coordination of participants from various governmental agencies in order to provide the data needed for government functions in real time.
3.1
NADRA mainly deals with the registration projects of Pakistan, such as National ID Cards, Computerized Passports, and the Car Theft Control Project. Whatever the title of the project, if it deals with nationwide registration it will always require a large amount of storage. This requirement is not static; it continues to rise as the number of registered entities grows. All projects at NADRA therefore require not only huge storage resources but also efficient and reliable management of the stored data. With the existing database models and computing resources, there can be scenarios in which either of the two fails. Grid computing addresses this issue with its built-in capability to store distributed data. The grid not only provides the storage resources but also offers efficient, secure, and reliable data management.
3.2
Law enforcement and intelligence agencies play an important role in maintaining peace and security in a society and in identifying potential threats to them. With the ever-increasing population and the growing number of criminals, it has become difficult for law enforcement agencies to keep a check on every possible threat, or even to identify all of them. Efficient, reliable, and redundant record storage can greatly improve the efficiency of law enforcement agencies. Data grids can maintain the records of all people, while computational grids provide efficient analysis, speedy search, intelligent heuristics, and reliable results from the data grids.
3.3
Newspapers, print media, and electronic media provide large amounts of information and records every day. These records vanish with time; even when they are maintained, retrieving information when required is time consuming and laborious. If maintained properly, these records can be of great value to the country as a historical record, as a guide to future decisions, as cultural heritage, and more.
3.4
At present, a common citizen in Pakistan has to go through a lot of trouble to obtain information or get work done through a government organization. The information provided by these institutions is very limited and often incomplete. People working in an organization are often unaware of other departments in the same organization. This results in confusion and wasted time for even a small piece of work. To avoid this wasted effort, a grid can be designed to deliver local government information to citizens, with broad community impact. It will improve the visibility of government information and services to citizens while improving customer service at a reduced cost.
4.
Agriculture
Agriculture is the main occupation in Pakistan and a major source of earnings. There is a need to computerize this sector and educate farmers. An agriculture grid can be used to aggregate, share, and disseminate information of importance and interest to farmers, agricultural workers, and officials in ways that enhance overall agricultural development. This would facilitate a technological transformation consisting of changes in material inputs, complementary farming techniques, storage technology, and research findings provided by experts, R&D, supply, and marketing agencies. It will involve active networking and distributed knowledge management involving agricultural scientists, students, farmers, farm workers, marketing, the agro-processing industry, administrators, and decision makers. A similar project, KISSAN (Karshaka Information Systems, Services And Networking) [9], has been undertaken by the Department of Agriculture (Govt. of Kerala, India), Kerala Agriculture University, and IIITM-K. It is a farmer-centric, integrated, distributed information system consisting of advanced data warehouses, a query management system, GIS-based information, land utilization patterns, soil information, crop management, weather forecasting, etc. It will empower the various agencies (Krishi Bhavans, Farm Information Bureau, Plantations, Disaster and Distress Management Programmes, Agro-Institutions, and education, both formal and non-formal) to participate and have a coherent view of the total information system in ways that add substantial value and thereby enhance agricultural productivity. In Pakistan specifically, some of the potential projects include: data grid nodes in each agricultural sector providing statistical data, expected output, means of transport, agricultural storage, etc. for the sector; monitoring nodes that monitor these grid nodes to analyze and report the status of crops, raise alarms about potential threats, and provide necessary guidance; and statistical nodes that gather data from all nodes, perform statistical analysis for the whole country, and maintain history for future analysis.
In short, grids can offer a wide variety of applications for the agricultural sector that will help improve agricultural output and hence benefit the country's economy.
5.
Financial trading requires fast, accurate decision making in order to optimize investment profiles and manage the associated risks. Grid computing can radically accelerate these analytical processes, reaching end results far more rapidly than conventional environments. Advantages of distributed environments for financial analysis include the ability to run several trade scenarios in parallel, to divide applications into independent tasks that execute concurrently, and to apply additional processing power through clusters or faster processors. All the stock exchanges across Pakistan can therefore form part of a grid. This will provide real-time access to current and historical market data and faster response times to queries from market shareholders and international investors. It will also be useful to the Ministry of Finance for complex financial modeling. Business and finance is therefore one industry in which grids offer a wide variety of applications. In recent years, business models and financial analysis have grown enormously in complexity. As precise and accurate analyses have become necessary, their execution has become more resource consuming and complex. Grid computing offers a variety of solutions to the business and finance industry through both data and computational grids. Since most business and finance models depend on historical data and many input factors, it is easy to predict that single computers, with their limited storage and processing capabilities, will soon fail to execute these analyses efficiently, or in some cases at all. Furthermore, as the demand for storage increases, so will the demand for processing resources. Grid computing, in a single computing paradigm, provides a solution to all these problems. To provide a brief overview, the following are some of the potential domains:
Storage resources all around the country (stock exchanges, finance ministries, business research centers, etc.) can combine to form a single data grid, which will enable:
o Unlimited storage resources
o Maximum available data for history-based business models
o Data redundancy
o Data sharing
Computational resources all around the country (stock exchanges, finance ministries, business research centers, etc.) can combine to form a single computational grid, which will enable:
o Efficient execution of complex models and algorithms
o Collaborative ideas, discussions and results
o Data sharing
o Parallel execution of complex business models
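Running several trade scenarios in parallel, as mentioned above, can be sketched as follows. The scenario model (a uniform percentage shock applied to all prices) and the names portfolio_value and run_scenarios are assumptions for illustration; local threads stand in for computational grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def portfolio_value(holdings, prices):
    """Value of a portfolio given quantity per symbol and price per symbol."""
    return sum(qty * prices[symbol] for symbol, qty in holdings.items())


def run_scenarios(holdings, base_prices, shocks, workers=4):
    """Evaluate each price-shock scenario independently and concurrently.

    Returns a dict mapping each shock (e.g. -0.5 for a 50% fall) to the
    resulting portfolio value.
    """
    def evaluate(shock):
        shocked = {s: p * (1 + shock) for s, p in base_prices.items()}
        return shock, portfolio_value(holdings, shocked)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(evaluate, shocks))
```

Because each scenario depends only on its own inputs, the scenarios can be distributed across nodes with no coordination beyond collecting the results.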
6.
Computer graphics, computer animation, animated movies, and computer games are industries that have evolved recently but developed rapidly. Computer animation and animated movies have become an essential part of the television and film industries. Today, hardly any movie or advertisement is made without some sort of animation. These animations require a lot of computational resources before they can be viewed. The data forming them is immense and needs to be rendered to build a movie with the best graphic options. The demand for resources depends on the length of the movie, the complexity of the graphics, and the encoding schemes used, but in general the required resources are beyond the scope of personal computers, or, where personal computers are capable, rendering takes a long time. Grids tackle this problem by offering developers a mechanism through which they can submit an animation rendering job to the grid and let it manage all execution and storage. Gaming is another industry that has seen a great deal of development in recent years. The new trend is online or network gaming, in which geographically scattered players play with each other over a network. Because of their heavy graphics, most games require these networks to offer high data rates and resources. Here too, grid computing can help by providing such high-resource nodes. If developed, the gaming industry can generate a lot of revenue for the country.
7.
This collaborative Grid can allow hospitals to share information, ultimately improving health care for patients through collaboration among medical professionals. Currently, medical professionals in Pakistan are not much accustomed to computers and their advantages, and still rely on paperwork. Training is therefore needed in order to develop this infrastructure. The Grid can connect hospitals all over Pakistan, especially in rural areas where there is a shortage of experienced doctors. Doctors and other staff in these remote areas can collaborate easily and share resources through the Grid. This type of grid has already been proposed by IBM [10] for the Economic Development Grid Project being implemented across the USA. IBM is helping communities in the U.S. and globally to expand economic development in innovative ways by embracing Grid computing and open-standards-based technologies. Communities are able to harness their immense unused computational power, increase collaboration between governments, educational institutions, and businesses, and use new and existing technologies in innovative ways by providing access to information, IT resources, and computing capabilities that were previously unavailable.
VI.
CONCLUSION
To summarize, Grid Computing has the potential to solve the problems that arise from a lack of computational resources and the performance issues faced by applications that demand resources on a large scale. Grid computing tools and grid-enabled environments are in their early phases, but given the rate at which development is progressing, the time is not far away when grid-enabled resource sharing will become a worldwide standard for high performance computing, as the WWW (World Wide Web) is for information. Many countries across the world have taken grid computing initiatives for their data and computational needs as well as for collaboration in research and academia. Furthermore, the grid offers a wide range of research domains that promise to become key technologies of the future. Grid computing is already providing tremendous benefit to public and private institutions worldwide; it is therefore time that Pakistan joined hands with the rest of the world by adopting and contributing actively to grid technology.
VII. REFERENCES
1. Compact Muon Solenoid Outreach Activities (http://cmsinfo.cern.ch/)
2. Large Hadron Collider (http://lhc-new-homepage.web.cern.ch/lhc-new-homepage/)
3. Conrad Steenberg et al, The Clarens Web Services Architecture, in Proceedings of Conference of High Energy Physics, 2004, paper no. MONT008.
4. Conrad Steenberg et al, The Clarens Web Services Architecture, in Proceedings of Conference of High Energy Physics, 2004, paper no. TUCT005.
5. XML RPC (www.xmlrpc.com/)
6. SOAP (http://ws.apache.org/soap/)
7. Grid Computing, Joshy Joseph, Craig Fellenstein, Prentice Hall, Apr 16, 2004
8. China Grid Project (http://www-1.ibm.com/grid/grid_press/pr_1013.shtml)
9. KISSAN Grid (www.keralakarshakan.net)
10. Grid Today, Daily News and Information for the Global Grid Community, May 18, 2005 (http://news.taborcommunications.com/msgget.jsp?mid=384710)