Vol.4 No.1 2011
Research paper: Development and release of a spectral database for organic compounds (T. Saito et al.), Synthesiology - English edition Vol.4 No.1 (2011), p. 38

data released through the Web. Many more spectra have not yet been released to the public. The total number of bottles of chemical reagents exceeds 39,000.

Among these, more than 10,000 reagents have been offered free of charge by Tokyo Chemical Industry Co. Ltd., which has supplied more chemicals than any other source. Therefore, although the selection of chemical reagents has partially followed this company's policy for developing reagents, it has indirectly reflected our users' needs. When research and development departments develop a new material by chemical synthesis or other methods, the starting material is in many cases a commercial chemical reagent. Thus, the support the company has given us has been valuable.

Since 2001, our strategy for spectral collection has focused on pesticides and deleterious substances. Collecting extensive spectral information on regulated chemicals is an important function of public research institutes like AIST. Thus the number of spectra collected for such substances has been slowly increasing. Recently, concern for food safety has been increasing, which heightens the need for such information. It is important that our strategy focus on collecting the spectral data of pesticides and regulated chemicals.

3.2 Selection of visual data form (digital data)

The most important decision on the data format was made at the early stage of this database. Although it is not surprising now, this database has collected all spectral information in digital coordinate format on a computer since the activity started. In the 1970s, spectral information was more often collected in data book format.
Although it was recognized that digitizing spectral data would make them easier to handle, limited computer memory prevented this. Because of these limitations, digitizing the data often resulted in a loss of information. As a result, analog data recorded on paper were still in the majority[1] at that time. For example, NMR data were composed of several tens of thousands of data points. Digitizing such data about thirty years ago must have been a big decision because of the limitations in disk and memory capacity. Achieving such a system would have been extremely difficult without the mainframe computer operated at the former Agency of Industrial Science and Technology at that time. Under these conditions, managing the spectral database required not only concentrating on accumulating spectral data but also finding a creative way to minimize the number of data points. This system was the world's first 1H NMR spectral database with digital coordinate data of the collected spectra[8]. We compressed the data size by collecting only the data that represented peak areas. For 13C NMR, the peak positions, their intensities and their widths at half height were recorded. From these data, all spectra were reconstructed on the assumption that every peak is a Lorentzian function. For IR and Raman, coordinate data of the spectral points were collected. For MS, the mass numbers and the signal intensities were collected. For ESR, each point of the spectral data was digitized. Some of the data were reconstructed from paper records by using a curve reader. For 1H NMR, spectra can be simulated by using chemical shifts and spin-spin couplings[9]. Since AIST was established, all digital data, including the peaks and noise, have been collected for 13C NMR and 1H NMR. Users can therefore judge the strength of a peak signal relative to the noise level.
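
The 13C NMR reconstruction described above, in which each peak is stored as a position, an intensity and a width at half height and then redrawn as a Lorentzian, can be illustrated with a short Python sketch. It is only a minimal illustration and not the database's actual code: the names `Peak`, `lorentzian` and `reconstruct_spectrum`, the axis range and the peak values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    position_ppm: float  # peak position (chemical shift)
    intensity: float     # peak height at the maximum
    fwhm_ppm: float      # full width at half height


def lorentzian(x, peak):
    """Lorentzian line whose maximum equals `intensity` and whose
    full width at half height equals `fwhm_ppm`."""
    half_width = peak.fwhm_ppm / 2.0
    return peak.intensity * half_width ** 2 / (
        (x - peak.position_ppm) ** 2 + half_width ** 2
    )


def reconstruct_spectrum(peaks, start_ppm, stop_ppm, n_points):
    """Return (axis, intensities), summing one Lorentzian per stored peak."""
    step = (stop_ppm - start_ppm) / (n_points - 1)
    axis = [start_ppm + i * step for i in range(n_points)]
    spectrum = [sum(lorentzian(x, p) for p in peaks) for x in axis]
    return axis, spectrum


# Hypothetical peak table; the values are invented for illustration only.
peaks = [Peak(77.0, 100.0, 0.02), Peak(128.5, 60.0, 0.03)]
axis, spectrum = reconstruct_spectrum(peaks, 0.0, 200.0, 20001)
```

In this sketch the two stored peaks occupy six numbers, while the redrawn spectrum has 20,001 points, which shows why a peak table was attractive when disk and memory capacity were limited. The price is that noise and any non-Lorentzian line shape are not preserved.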
In 1997, the former Agency of Industrial Science and Technology opened this database to the public through the Web. If the data had not all been collected digitally, converting the old analog data would have caused problems, and many data might have had to be recollected.

3.3 Balance between quality and quantity of data; stick to the high quality data

The spectral database consists of data acquired, evaluated and compiled in our institute, with some exceptions among the ESR and 1H NMR spectra. This is the most reliable way to maintain the quality of the spectral data. On the other hand, it limits the number of spectra that can be accumulated. Covering a wide variety of data is one of the important elements of a database. How to balance the two concepts of data collection, i.e. quantity and quality of spectral data, is a matter of serious argument. Our first decision was to adopt a strategy of collecting reliable standard data. On this basis, the quantity of data would increase as data accumulated over a long period of time.

The criteria for maintaining the quality and reliability of spectral data and for accumulating data were established. For example, tetramethylsilane (TMS) was used not only as a chemical shift standard for NMR spectra; its line width was also used as a criterion of spectral resolution. When the TMS peak was sharper than the criterion, the resolution of the spectrum was judged to be good even if the peaks from the compound were poorly resolved. The poor resolution was then attributed to the nature of the compound, not to bad experimental conditions. For IR, the criteria were no interference noise, no water peaks, and no surge in the baseline.
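
The TMS line-width criterion can be sketched as a simple check: estimate the full width at half maximum of the TMS peak and compare it with a threshold. The following Python is a minimal sketch under stated assumptions. The function names `fwhm` and `tms_resolution_ok`, the synthetic line and the 0.5 Hz threshold are hypothetical; the paper does not give the actual criterion value or the procedure used.

```python
def fwhm(axis, intensities):
    """Estimate the full width at half maximum of the tallest peak.

    The half-height crossing on each side is located by linear
    interpolation between neighbouring samples.
    """
    top = max(range(len(intensities)), key=lambda i: intensities[i])
    half = intensities[top] / 2.0
    if half <= 0:
        raise ValueError("no positive peak found")

    # Walk outwards from the maximum until the signal is at or below half height.
    left = top
    while left > 0 and intensities[left] > half:
        left -= 1
    right = top
    while right < len(intensities) - 1 and intensities[right] > half:
        right += 1
    if intensities[left] > half or intensities[right] > half:
        raise ValueError("peak does not drop below half height in this window")

    def crossing(i, j):
        # Axis value where the straight line through samples i and j equals `half`.
        slope = (intensities[j] - intensities[i]) / (axis[j] - axis[i])
        return axis[i] + (half - intensities[i]) / slope

    return abs(crossing(right - 1, right) - crossing(left, left + 1))


def tms_resolution_ok(axis_hz, intensities, max_linewidth_hz):
    """True when the TMS line is at least as sharp as the chosen criterion."""
    return fwhm(axis_hz, intensities) <= max_linewidth_hz


# Synthetic TMS line: a Lorentzian with a 0.4 Hz width, sampled every 0.05 Hz.
axis_hz = [-5.0 + 0.05 * i for i in range(201)]
tms = [1.0 / (1.0 + (x / 0.2) ** 2) for x in axis_hz]

MAX_LINEWIDTH_HZ = 0.5  # hypothetical threshold, not a value from the paper
width = fwhm(axis_hz, tms)
accepted = tms_resolution_ok(axis_hz, tms, MAX_LINEWIDTH_HZ)
```

Checking the reference line rather than the sample peaks separates instrument performance from the behaviour of the compound: a spectrum with broad compound peaks still passes as long as the TMS line is sharp.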
The criterion for evaluating each type of spectral data was established by the researchers in charge of that spectrum.

3.4 Policy of data registration

This database compiles only unique spectral data. In other words, when several spectra of a compound have been acquired under identical conditions, only the spectrum of the best quality is compiled and released to the public. For MS, a direct sample injection method was adopted for the measurement. Therefore, each compound had a unique