Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry data sets

This is the paper.

Background:

Retention time (RT) alignment of GC-MS and LC-MS data falls broadly into two categories:
Peak-based algorithms: very sensitive to the preceding peak detection step. Peaks are detected by how well they match criteria such as the shape of a peak model or the SNR (signal-to-noise ratio).
Raw data-based algorithms: run on binned MS data, referred to as a uniform matrix. DTW (Dynamic Time Warping) is performed on this matrix using pairwise similarities between signal maps or between binned mass spectra. Using many mass traces demands more computing resources and may increase the influence of noise.
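To make the "uniform matrix" idea concrete, here is a minimal sketch of binning scans onto a fixed m/z grid. The helper names (`bin_spectrum`, `uniform_matrix`), the unit bin width, and the toy scans are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def bin_spectrum(masses, intensities, mz_min, mz_max, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins (hypothetical helper)."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    bins = [0.0] * n_bins
    for mz, intensity in zip(masses, intensities):
        if mz_min <= mz < mz_max:
            # Clamp to guard against floating-point rounding at the upper edge.
            idx = min(int((mz - mz_min) // bin_width), n_bins - 1)
            bins[idx] += intensity
    return bins

def uniform_matrix(scans, mz_min, mz_max, bin_width=1.0):
    """One row per scan, one column per m/z bin, so all rows share a length."""
    return [bin_spectrum(m, i, mz_min, mz_max, bin_width) for m, i in scans]

# Toy data: each scan is (list of m/z values, list of intensities).
scans = [
    ([50.2, 51.7, 53.1], [10.0, 5.0, 2.0]),
    ([50.9, 52.4], [7.0, 3.0]),
]
matrix = uniform_matrix(scans, mz_min=50.0, mz_max=54.0)
```

Because every row has the same number of bins, any two scans can be compared directly with a vector similarity, which is what the pairwise similarities used by DTW require.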

Methods:

Best hIts Peak Assignment and Cluster Extension (BIPACE)

${f(p,q) = s(p,q) \exp\left(-\frac{(t_p-t_q)^{2}}{2D^{2}}\right), \quad s(p,q) = \cos(\angle(i_p,i_q))}$
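The similarity above can be sketched in a few lines of Python. This assumes that t_p and t_q are the retention times of peaks p and q, that i_p and i_q are their (binned) mass-spectral intensity vectors, and that D is a retention time tolerance, so the exponential acts as a Gaussian penalty on the RT difference. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two intensity vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen here for empty spectra (an assumption).
    return dot / (norm_u * norm_v)

def peak_similarity(t_p, i_p, t_q, i_q, D):
    """f(p, q) = s(p, q) * exp(-(t_p - t_q)^2 / (2 D^2))."""
    s = cosine_similarity(i_p, i_q)
    return s * math.exp(-((t_p - t_q) ** 2) / (2.0 * D ** 2))

# Toy values: identical spectra, retention times exactly one D apart.
f_same_time = peak_similarity(100.0, [1.0, 2.0, 3.0], 100.0, [1.0, 2.0, 3.0], D=2.0)
f_shifted = peak_similarity(100.0, [1.0, 2.0, 3.0], 102.0, [1.0, 2.0, 3.0], D=2.0)
f_orthogonal = peak_similarity(100.0, [1.0, 0.0], 100.0, [0.0, 1.0], D=2.0)
```

Two peaks with identical spectra score close to 1 when they elute at the same time, and the score decays smoothly as their retention times drift apart relative to D.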

CEnter-star Multiple Alignment by Pairwise Partitioned Dynamic Time Warping (CEMAPP-DTW)

Pairwise DTW is a global alignment of two series A=(a1,a2,…,aM) and B=(b1,b2,…,bN), where ai, bj∈R^L
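To compute the optimal alignment, an (M+1)×(N+1) matrix Q is prepared.
Q[i, j] stores the optimal similarity value for the alignment of (a1,a2,…,ai) and (b1,b2,…,bj).
An alignment path through the matrix can then be constructed.
The optimal path maximizes the sum of pairwise similarities.
That is, DTW(A,B) := max_P Σ pi∈P Q(pi), where P ranges over the set of paths.
The computation starts at i=j=0 and, at each cell, takes whichever of the vertical, horizontal, or diagonal moves increases the similarity value the most; the value reached at Q(M,N) is DTW(A,B).
To give the path some flexibility, weights are assigned to the vertical, horizontal, and diagonal directions.
In symmetric DTW, these weights are also used to effectively reduce the overfitting problem.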
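The recurrence described above can be sketched as a small dynamic program. This is a minimal illustration, not the paper's CEMAPP-DTW implementation: it leaves out the partitioning and center-star steps. The function name, the weight parameter names, the dot-product similarity, and the convention that "vertical" advances in A while "horizontal" advances in B are all assumptions.

```python
def dtw_similarity(A, B, sim, w_diag=1.0, w_vert=1.0, w_horiz=1.0):
    """Similarity-maximizing DTW; returns (score, path of 0-based index pairs)."""
    M, N = len(A), len(B)
    if M == 0 or N == 0:
        raise ValueError("both series must be non-empty")
    neg_inf = float("-inf")
    # Q[i][j]: best cumulative similarity aligning A[:i] with B[:j].
    Q = [[neg_inf] * (N + 1) for _ in range(M + 1)]
    back = [[None] * (N + 1) for _ in range(M + 1)]
    Q[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = sim(A[i - 1], B[j - 1])
            candidates = [
                (Q[i - 1][j - 1] + w_diag * s, (i - 1, j - 1)),  # diagonal
                (Q[i - 1][j] + w_vert * s, (i - 1, j)),          # vertical
                (Q[i][j - 1] + w_horiz * s, (i, j - 1)),         # horizontal
            ]
            Q[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace the optimal path back from Q[M][N] to Q[0][0].
    path = []
    i, j = M, N
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    path.reverse()
    return Q[M][N], path

def dot(u, v):
    """Plain dot product, used here as a stand-in pairwise similarity."""
    return sum(a * b for a, b in zip(u, v))

# Toy series in R^3; identical series should align along the diagonal.
A = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
B = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
score, path = dtw_similarity(A, B, dot, w_diag=2.0, w_vert=1.0, w_horiz=1.0)
```

With non-negative similarities and equal weights, longer paths collect more terms, so weighting diagonal moves more heavily than vertical and horizontal ones is one way to discourage excessive stretching and compression.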