Vol.4 No.1 2011
Research paper : Development and release of a spectral database for organic compounds (T. Saito et al.) −43− Synthesiology - English edition Vol.4 No.1 (2011)

It is frequently emphasized in the paper that the database is “open to the public free of charge”. I think that releasing useful information free of charge, so that it can be used widely as a public good, is a crucial role for a public research institute. As a paper for Synthesiology, it would benefit from an independent chapter discussing these points.

Answer (Takeshi Saito)
Regarding the access log, we added a figure that summarizes the countries of the accessing users and, for domestic users, domains such as “.ac” and “.co”.
We agree that it is beneficial to discuss the topic of “open to the public free of charge”. We created subchapter 4.2 and discussed the meaning of free services there.

3 Persons and expenses for the database

Question (Akira Ono)
I would like to ask about the cost of developing and releasing the AIST spectral database for organic compounds. Could you roughly estimate the costs and human resources spent on each of the following: hardware and software development, sample purchases, acquisition of spectra, data maintenance and quality control, and user support?

Answer (Takeshi Saito)
Between 2001 and 2007, two researchers were in charge of the strategy for operating and developing the database and of evaluating the spectral data. Four contract staff were employed to acquire MS, IR and NMR spectral data and to maintain the chemical dictionary data. Each person was also responsible for maintaining the disclosed data. All the processes for disclosing the data through the Web were maintained by the system engineers (SE) of the Research Information Data Base (RIO-DB) of AIST.
The estimated total workload per year, counted in researchers, was 0.25 person for constructing the database system, 0.25 person for the spectral measurements, 0.8 person for quality assurance, and 0.25 person for data maintenance and user support. As for the budget, roughly 200,000 yen for the database hardware, 1.5 million yen for software construction, 250,000 yen for obtaining chemical compounds, 1.8 million yen for consumables and instrument maintenance, and 700,000 yen for data maintenance were spent each year. In addition, we asked the SE to do a great deal of work, but we cannot estimate the cost of that work.

4 Balance of comprehensiveness, reliability and urgency

Question (Akira Ono)
(1) The paper describes the importance of balancing the comprehensiveness and the reliability of data when constructing a database. I understood that the primary objective of this database was to compile and offer standard spectral data that help identify widely used compounds. I also understood that you adopted a policy of limiting information and measurements on compounds to the range that your group (AIST) could grasp and control. My understanding is that you prioritized the reliability of the data over comprehensiveness, so that a delay in achieving comprehensiveness was considered unavoidable (in other words, a policy of “time would solve the problem of comprehensiveness”). Thirty years after the start of this activity, the database has reached a sufficient number of spectra (30,000 compounds). Is this understanding correct?

(2) I think that spectral data for special compounds such as pesticides and deleterious substances are urgently needed by society. It seems important to me to construct a spectral database for these compounds and release the data to the public. Is there any organization in the world that releases such data?
I would like to ask whether the current situation of such databases is satisfactory to the users.

(3) If it is not satisfactory, the current policy of AIST may not be fast enough to cover a large number of spectra in a short period of time. I think spectral information on pesticides and deleterious substances needs to be covered more comprehensively and rapidly, even at the cost of some reliability. I would like to ask what the authors think about this point.

Answer (Takeshi Saito)
(1) It is true that, because we gave priority to reliability over comprehensiveness, it was not possible to increase the quantity of data rapidly. As a result of compiling data actively over a long period of time, the database now contains more than 100,000 spectra of more than 30,000 compounds. We think that the widely used compounds have been covered by now.
For NMR specifically, increasing the volume of data and speeding up the data release had become difficult, because the workload had almost reached the limit of our human resources and instrumentation. We not only acquired the spectra but also assigned them before release. For spectral data other than NMR, we think another obstacle to comprehensiveness was a budget too limited to collect enough compounds for data acquisition.

(2) A mass spectral database of medicines, poisons, pesticides, and contaminants is offered by John Wiley & Sons as a CD-ROM set and in book form, and an IR spectral database of pesticides and environmental materials is offered by Bio-Rad. I do not think that a database of spectral data for compounds classified as deleterious substances exists, because that classification is based on Japanese law. We believe that, although they do not use such a classification, many databases include such compounds among their entries.
However, since we think the situation is not satisfactory to the users, this database will continue to collect such spectral data.

(3) With our current resources, there is a limit to how much faster we can compile urgent data merely by lowering the reliability of the spectra. To achieve this, we think it would be useful to set up a project that gives priority to acquiring, evaluating and releasing the spectra of highly urgent compounds. Another way is to collect spectral data from people all over the world through an open data submission system. To make this possible, we have to establish at least a standard spectral data format, data evaluation criteria, and a data submission protocol for our database. With instructions covering our requirements, we should be able to collect spectral data of a certain quality much more quickly.

5 Digital data format and copyright

Question (Akira Ono)
I understood that all data are managed digitally at the development site, but are converted into an analog format for release to the public, so that Web users cannot access the digital data. Is this understanding correct?
I also understood that users cannot access the digital data because the spectral data acquired by AIST are copyrighted, and that a third party who wishes to use the data has to pay a royalty. Is this understanding correct?

Answer (Takeshi Saito)
If “digital data” in the question means “data consisting of coordinate point information” and “analog data” means “GIF image data”, then your understanding is correct. The main reason that users cannot reach the digital data is not the rights or copyright royalties that we might receive, but the protection of the copyright. This is based on the protection of SDBS from unjust