Selection and Application of Spectral Data Preprocessing Strategy
SUN Jia-hao, ZHANG Wei, SHI Jian-qin, LI Yan-kun
Department of Environmental Science and Engineering, North China Electric Power University, Baoding, Hebei 071003, China
Abstract Spectral preprocessing is a key step in building models for spectral measurement, and many preprocessing methods are now available. This review groups common methods by purpose and effect into scatter correction, scaling, smoothing, baseline correction, and noise filtering, and covers the origin, development, theory, characteristics, and practical application examples of each group. In light of the problems that arise when these methods are applied, it then discusses the principles, development, and characteristics of ensemble preprocessing strategies and analyzes their research trends and prospects, to provide a basis and reference for selecting spectral preprocessing methods and strategies.
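As a rough illustration of two of the categories named in the abstract, the sketch below chains a scatter-correction step and a smoothing step in plain Python. The abstract does not name specific algorithms, so the choices here are assumptions: standard normal variate (SNV) stands in for scatter correction, a centered moving average stands in for smoothing, and the function names `snv`, `moving_average`, and `preprocess`, the window size, and the order of the two steps are all hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale ONE spectrum by its own
    mean and sample standard deviation (assumed scatter-correction step)."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(x - m) / s for x in spectrum]


def moving_average(spectrum, window=3):
    """Centered moving-average smoothing (assumed smoothing step).
    Near the edges the window shrinks to the points that exist."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    return [
        mean(spectrum[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def preprocess(spectrum, window=3):
    """Hypothetical two-step pipeline: SNV first, then smoothing.
    The order is an illustrative choice, not a recommendation."""
    return moving_average(snv(spectrum), window)


if __name__ == "__main__":
    raw = [0.10, 0.12, 0.35, 0.30, 0.15, 0.11]  # made-up absorbance values
    print(preprocess(raw))
```

Because SNV rescales each spectrum by its own statistics, two spectra that differ only by a constant offset and a positive multiplicative factor map to the same corrected spectrum, which is the kind of effect the scatter-correction category targets. Which methods to combine, and in which order, is the selection problem the review addresses.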
Received: 29 December 2022
Published: 22 August 2023