As discussed above, the time series are converted into Gram time-domain images, and these Gram time-domain images are used as the input matrices of convolutional neural networks for classification. To address the problems of complex computation and slow training speed of convolutional neural networks, we propose a method based on the Toeplitz matrix product to replace the convolution operation of the convolution layer, and we introduce the idea of the triplet network into the loss function to improve the efficiency and accuracy of classification.
Convolution based on Toeplitz matrix multiplication
The convolution operation based on the Toeplitz matrix product is shown in Fig. 3. In Fig. 3, the dark blue square represents the convolution kernel, and the light blue square represents the matrix to be convolved. The convolution kernel is 2 × 2, the matrix to be convolved is 3 × 3, and the step size is 1. Traditional convolution is shown in the upper part of Fig. 3: the convolution kernel moves over the matrix to be convolved with a step size of 1, and four traversals of the entire matrix are required. At each position, the elements of the convolution kernel and the overlapping part of the matrix are multiplied and accumulated, and the obtained value is the local convolution result at the corresponding position. Since traditional convolution must traverse the whole image, its computational complexity is high.
As shown in the lower part of Fig. 3, in the convolution based on the Toeplitz matrix product, each local patch obtained as the convolution kernel traverses the 3 × 3 matrix is expanded in row order into a 1 × 9 row vector, and the four row vectors form a large matrix H. The matrix to be convolved is then expanded in row order into a 9 × 1 column vector X. The product of the large matrix H constructed from the convolution kernel and the column vector X constructed from the matrix to be convolved effectively replaces the convolution computation. Specifically, the convolution kernel matrix H consists of six small matrices: the matrix in the red box, the matrix in the yellow box, and the zero matrices in the two white parts.
In Fig. 3, the matrix in the red box conforms to the definition of a Toeplitz matrix. Similarly, the matrix in the yellow box and the zero matrices are Toeplitz matrices. Therefore, the constructed convolution kernel matrix is a large Toeplitz matrix composed of several small Toeplitz matrices. The Toeplitz matrix product is used to replace the traditional convolution operation: the convolution kernel is directly constructed into the convolution kernel matrix without traversing the image step by step, and the product of the two matrices is computed, which reduces the computational complexity.
Definition 2 (Toeplitz matrix): A matrix in which the elements on each diagonal from the top left to the bottom right are identical is a Toeplitz matrix; it has the property \(A_{i,j} = A_{i + 1,j + 1} = a_{i - j}\). Mathematically,
$$ A = \left( {\begin{array}{*{20}c} {a_{0} } & {a_{ - 1} } & {a_{ - 2} } & \cdots & \cdots & {a_{ - (n - 1)} } \\ {a_{1} } & {a_{0} } & {a_{ - 1} } & \ddots & & \vdots \\ {a_{2} } & {a_{1} } & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & {a_{ - 1} } & {a_{ - 2} } \\ \vdots & & \ddots & {a_{1} } & {a_{0} } & {a_{ - 1} } \\ {a_{n - 1} } & \cdots & \cdots & {a_{2} } & {a_{1} } & {a_{0} } \\ \end{array} } \right) $$
(9)
Toeplitz convolution kernel matrix construction
To replace the convolution calculation with the Toeplitz matrix multiplication, the convolution kernel matrix H is constructed as the Toeplitz convolution kernel matrix \(H_{t}\). Given any convolution kernel matrix as follows:
$$ H = \left( {\begin{array}{*{20}c} {h_{11} } & {h_{12} } & \cdots & {h_{1D} } \\ {h_{21} } & {h_{22} } & \cdots & {h_{2D} } \\ \vdots & \vdots & \ddots & \vdots \\ {h_{C1} } & {h_{C2} } & \cdots & {h_{CD} } \\ \end{array} } \right) $$
(10)
The construction steps of the corresponding Toeplitz convolution kernel matrix are as follows:

(1)
A small Toeplitz matrix is generated from each row of the convolution kernel matrix. Since the size of the convolution kernel matrix is C × D, the convolution kernel matrix H is divided into C Toeplitz matrices: \(H_{0}, H_{1}, H_{2}, H_{3}, \ldots, H_{C - 1}\), where \(H_{0}\) is built by zero-interpolation starting from the element \(h_{11}\) in the first row and first column of H. The number of inserted zeros is the number of columns of the convolution kernel matrix H minus 1, and the interpolation result is taken as the first row of \(H_{0}\). Then \(h_{12}\) is interpolated into the second row according to the properties of the Toeplitz matrix, and so on until \(2D - 1\) rows are formed and the construction of \(H_{0}\) is completed. By analogy, \(H_{i}\) is the matrix obtained by interpolating the elements of the \((i + 1)\)-th row of H. For example, if the convolution kernel matrix is \(H = \left[ {\begin{array}{*{20}c} 1 & 2 \\ 3 & 4 \\ \end{array} } \right]\), then H is divided into two matrices \(H_{0} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ 2 & 1 \\ 0 & 2 \\ \end{array} } \right]\) and \(H_{1} = \left[ {\begin{array}{*{20}c} 3 & 0 \\ 4 & 3 \\ 0 & 4 \\ \end{array} } \right]\).

(2)
The small Toeplitz matrices obtained in Step (1) are assembled into a large Toeplitz matrix:
$$ H_{t} = \left( {\begin{array}{*{20}c} {H_{0} } & 0 & \cdots & 0 & 0 \\ {H_{1} } & {H_{0} } & \ddots & \vdots & \vdots \\ {H_{2} } & {H_{1} } & \ddots & 0 & 0 \\ \vdots & {H_{2} } & \ddots & {H_{0} } & 0 \\ {H_{C - 2} } & \vdots & \ddots & {H_{1} } & {H_{0} } \\ {H_{C - 1} } & {H_{C - 2} } & \ddots & {H_{2} } & {H_{1} } \\ 0 & {H_{C - 1} } & {H_{C - 2} } & \vdots & {H_{2} } \\ 0 & 0 & {H_{C - 1} } & {H_{C - 2} } & \vdots \\ \vdots & \vdots & \ddots & {H_{C - 1} } & {H_{C - 2} } \\ 0 & 0 & 0 & \cdots & {H_{C - 1} } \\ \end{array} } \right) $$
(11)
For the example in Step (1), \(H_{t} = \left[ {\begin{array}{*{20}c} {H_{0} } & 0 \\ {H_{1} } & {H_{0} } \\ 0 & {H_{1} } \\ \end{array} } \right]\) is obtained by Eq. (11), where 0 represents a 3 × 2 zero matrix.
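Steps (1) and (2) can be sketched in NumPy. The helper names `toeplitz_block` and `build_Ht` are ours, not the paper's, and the sketch assumes (as in the example above) a C × D kernel applied under full convolution to an A × B input whose width B sets the block width.

```python
import numpy as np

def toeplitz_block(row, width):
    """Step (1): expand one kernel row into a (width + D - 1) x width
    Toeplitz block by placing the row down successive diagonals."""
    D = len(row)
    block = np.zeros((width + D - 1, width))
    for j in range(width):
        block[j:j + D, j] = row
    return block

def build_Ht(H, A, B):
    """Step (2) / Eq. (11): stack the C small blocks into the large
    block-Toeplitz matrix; block (r, c) holds H_{r-c}, zero elsewhere."""
    C, _ = H.shape
    blocks = [toeplitz_block(H[i], B) for i in range(C)]
    zero = np.zeros_like(blocks[0])
    return np.block([[blocks[r - c] if 0 <= r - c < C else zero
                      for c in range(A)] for r in range(A + C - 1)])

H = np.array([[1, 2], [3, 4]])
Ht = build_Ht(H, A=2, B=2)   # the 9 x 4 matrix of the example
```

Applied to the example kernel, the top-left block of `Ht` is exactly the \(H_{0}\) shown above and the bottom-left block is the 3 × 2 zero matrix.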
Toeplitz matrix convolution
After the Toeplitz convolution kernel matrix is obtained as described in the "Toeplitz convolution kernel matrix construction" section, traditional convolution can be replaced by the Toeplitz matrix multiplication using Eq. (12).
$$ X*H = H_{t} \times X_{T} $$
(12)
where \(X = \left( {\begin{array}{*{20}c} {x_{11} } & {x_{12} } & \cdots & {x_{1B} } \\ {x_{21} } & {x_{22} } & \cdots & {x_{2B} } \\ \vdots & \vdots & \ddots & \vdots \\ {x_{A1} } & {x_{A2} } & \cdots & {x_{AB} } \\ \end{array} } \right)\) denotes the matrix to be convolved, \(H = \left( {\begin{array}{*{20}c} {h_{11} } & {h_{12} } & \cdots & {h_{1D} } \\ {h_{21} } & {h_{22} } & \cdots & {h_{2D} } \\ \vdots & \vdots & \ddots & \vdots \\ {h_{C1} } & {h_{C2} } & \cdots & {h_{CD} } \\ \end{array} } \right)\) denotes the convolution kernel, \(H_{t}\) is the Toeplitz convolution kernel matrix of the "Toeplitz convolution kernel matrix construction" section, and \(X_{T}\) is the column vector obtained by arranging all the elements of X in row order. The full convolution method is used: the matrix to be convolved is padded with zeros, and the result retains all the data after convolution. The number of rows of the convolution result matrix is M = A + C − 1, and the number of columns is N = B + D − 1.
For example, when \(X = \left[ {\begin{array}{*{20}c} 5 & 6 \\ 7 & 8 \\ \end{array} } \right]\), then \(X_{T} = \left[ {\begin{array}{*{20}c} 5 & 6 & 7 & 8 \\ \end{array} } \right]^{T}\), and the result of the convolution calculation is \(X*H = \left[ {\begin{array}{*{20}c} 5 & 6 \\ 7 & 8 \\ \end{array} } \right]*\left[ {\begin{array}{*{20}c} 1 & 2 \\ 3 & 4 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 5 & {16} & {12} \\ {22} & {60} & {40} \\ {21} & {52} & {32} \\ \end{array} } \right]\). The result of the convolution operation based on the Toeplitz matrix is
\(H_{t} \times X_{T} = \left[ {\begin{array}{*{20}c} {H_{0} } & 0 \\ {H_{1} } & {H_{0} } \\ 0 & {H_{1} } \\ \end{array} } \right] \times \left[ {\begin{array}{*{20}c} 5 & 6 & 7 & 8 \\ \end{array} } \right]^{T} = \left[ {\begin{array}{*{20}c} 5 & {16} & {12} & {22} & {60} & {40} & {21} & {52} & {32} \\ \end{array} } \right]^{T}\).
Then the calculated column vector is reshaped into a 3 × 3 matrix according to M = A + C − 1 = 3 and N = B + D − 1 = 3, which is the same as the result of the convolution calculation.
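The worked example can be checked end to end. This NumPy sketch assembles the example's \(H_{t}\), performs the matrix-vector product, reshapes, and compares against a directly computed full 2-D convolution; the variable names are ours.

```python
import numpy as np

# H_t of the worked example, assembled from the small blocks H_0 and H_1.
H0 = np.array([[1, 0], [2, 1], [0, 2]])
H1 = np.array([[3, 0], [4, 3], [0, 4]])
Z = np.zeros((3, 2))
Ht = np.block([[H0, Z], [H1, H0], [Z, H1]])   # 9 x 4

X = np.array([[5, 6], [7, 8]])
H = np.array([[1, 2], [3, 4]])

# Toeplitz route: one matrix-vector product, reshaped to M x N = 3 x 3.
result = (Ht @ X.reshape(-1)).reshape(3, 3)

# Direct full convolution for comparison: scatter X[i, j] * H at (i, j).
direct = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        direct[i:i + 2, j:j + 2] += X[i, j] * H
```

Both routes yield the same 3 × 3 matrix given in the example.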
We use the Toeplitz matrix multiplication to effectively replace the convolution operation. In terms of time complexity, let the input time-domain image size be A × B and the convolution kernel size be C × D. The convolution operation requires the convolution kernel to traverse the time-domain image continuously and performs approximately A × B × C × D multiplications.
When the Toeplitz matrix multiplication is used, only one matrix multiplication needs to be calculated. As can be seen from Fig. 3, each row of the matrix contains many zeros that do not need to be computed. Thus, the actual number of multiplications per row is C × D, the number of rows equals the number of kernel traversal positions, and roughly A × B × C × D multiplications are performed. Therefore, for a single calculation, the amount of computation of the two methods is roughly the same. However, each time a new time-domain image is input, traditional convolution involves many shift operations, which greatly increases the calculation time.
Although constructing the Toeplitz matrix takes some time, the Toeplitz matrix multiplication only needs to construct the corresponding Toeplitz matrix once for a given convolution kernel; it can then directly perform the matrix multiplication on all input time-domain images to obtain the convolution results. In this way, for datasets with large training and test sets, the convolution operation time is greatly reduced.
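The one-time-construction point can be illustrated directly: once the \(H_{t}\) of the worked example is fixed, a whole batch of flattened 2 × 2 images is convolved with a single matrix product. The batch contents here are arbitrary illustrations.

```python
import numpy as np

# H_t from the worked example (kernel [[1, 2], [3, 4]]), constructed once.
H0 = np.array([[1, 0], [2, 1], [0, 2]])
H1 = np.array([[3, 0], [4, 3], [0, 4]])
Z = np.zeros((3, 2))
Ht = np.block([[H0, Z], [H1, H0], [Z, H1]])        # 9 x 4

# An illustrative batch of 2 x 2 input images.
batch = np.array([[[5, 6], [7, 8]],
                  [[1, 0], [0, 1]],
                  [[2, 2], [2, 2]]])

# Flatten each image row-wise into a column; one product convolves all.
Xb = batch.reshape(len(batch), -1).T               # 4 x batch_size
out = (Ht @ Xb).T.reshape(len(batch), 3, 3)        # per-image 3 x 3 results
```

Each column of the product reshapes into the 3 × 3 full-convolution result of the corresponding image, with no per-image traversal.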
TCNN model classification
When a CNN model is used for classification, its fully connected layers perform the convergence operations, and a loss function is required. In this paper, the Triplet network is introduced into the loss function, and the TCNN model is proposed.
Let the sample set of m samples be \(\left\{ {\left( {x^{(1)} ,y^{(1)} } \right),\left( {x^{(2)} ,y^{(2)} } \right), \ldots ,\left( {x^{(m)} ,y^{(m)} } \right)} \right\}\), where there are n classes among the samples and \(y^{(i)}\) represents the expected output of \(x^{(i)}\). The loss function of CNNs is shown in Eq. (13):
$$ R\left( {\omega ,b} \right) = \frac{1}{m}\sum\limits_{i = 1}^{m} {\left( {\frac{1}{2}\left\| {p_{\omega ,b} \left( {x^{(i)} } \right) - y^{(i)} } \right\|^{2} } \right)} $$
(13)
where \(\omega\) is the weight of each neuron, \(b\) is the bias, and \(p_{\omega ,b} \left( {x^{(i)} } \right)\) is the actual output of the sample. The CNN model continuously adjusts the parameters \(\omega\) and \(b\) through training to minimize \(R\left( {\omega ,b} \right)\). Equation (13) is the squared loss function of the traditional convolutional neural network model, which only considers the category of the image itself and does not consider the differences between categories. Therefore, we improve it below.
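Eq. (13) can be written out as a short NumPy function; `preds` stands for the network outputs \(p_{\omega ,b}(x^{(i)})\) and `targets` for the expected outputs \(y^{(i)}\) (one-hot rows), both illustrative values rather than outputs of the paper's network.

```python
import numpy as np

def squared_loss(preds, targets):
    """Eq. (13): (1/m) * sum_i (1/2) * ||p(x_i) - y_i||^2."""
    m = preds.shape[0]
    return np.sum(0.5 * np.sum((preds - targets) ** 2, axis=1)) / m

preds = np.array([[0.8, 0.2], [0.3, 0.7]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = squared_loss(preds, targets)   # (0.04 + 0.09) / 2 = 0.065
```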
The CNN uses the gradient descent method to adjust the parameters \(\omega\) and \(b\) that minimize \(R(\omega ,b)\), as shown in Eqs. (14) and (15):
$$ \omega_{ij} = \omega_{ij} - a\frac{\partial }{{\partial \omega_{ij} }}R\left( {\omega ,b} \right) $$
(14)
$$ b_{ij} = b_{ij} - a\frac{\partial }{{\partial b_{ij} }}R\left( {\omega ,b} \right) $$
(15)
where a is the learning rate and \(R\left( {\omega ,b} \right)\) is the CNN loss function. Equations (14) and (15) are used to update the values of the network parameters \(\omega\) and \(b\) by the gradient descent method. In other words, the values of \(\omega\) and \(b\) are obtained when the derivative of the loss function reaches 0.
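The updates of Eqs. (14) and (15) amount to the standard rule of subtracting the learning rate times the partial derivative. A minimal sketch on a single linear neuron (our illustrative model and data, not the paper's network):

```python
import numpy as np

# Illustrative data: the target relation is y = 2*x, fit by weight w and bias b.
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
w, b, a = 0.0, 0.0, 0.05          # a is the learning rate

for _ in range(2000):
    p = w * x + b                 # actual output p_{w,b}(x)
    grad_w = np.mean((p - y) * x) # dR/dw for the squared loss of Eq. (13)
    grad_b = np.mean(p - y)       # dR/db
    w -= a * grad_w               # Eq. (14)
    b -= a * grad_b               # Eq. (15)
```

After training, w approaches 2 and b approaches 0, where the derivative of the loss vanishes.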
To improve the classification accuracy, the Triplet network is introduced into the CNN loss function as a constraint, and a TCNN model based on the Triplet loss function is proposed. The idea of the TCNN model is to input three time-domain images at a time, two of which belong to the same class and one to another class. Through training, the TCNN model obtains the features of the time-domain images, together with the feature difference function \(L_{1}\) of two time-domain images from the same class and the feature difference function \(L_{2}\) of two time-domain images from different classes. Then \(L_{1}\) and \(L_{2}\) are used to adjust the parameters of the TCNN model. \(L_{1}\) and \(L_{2}\) are shown in Eqs. (16) and (17), respectively:
$$ L_{1} = \frac{1}{2}\left\| {p_{\omega ,b}^{(l_{1})} - p_{\omega ,b}^{(l_{2})} } \right\|^{2} $$
(16)
$$ L_{2} = \frac{1}{2}\min \left\| {n_{\omega ,b}^{(l)} - p_{\omega ,b}^{(l_{i})} } \right\|^{2} ,\quad (i = 1,2) $$
(17)
where \(p_{\omega ,b}^{(l_{i})}\) is the output value for the same class and \(n_{\omega ,b}^{(l)}\) is the output value for the different class. The image feature difference functions are combined in the adjustment function of Eq. (18):
$$ L_{T} = \max (0,L_{1} - L_{2} + \gamma ) $$
(18)
where \(\gamma\) represents the minimum margin between the within-class and between-class difference functions (set to 0.1 in this paper). In our experiments, a comparison was carried out by varying \(\gamma\) over 0.01, 0.05, 0.1, 0.2, and 0.5; the value 0.1 gave the best experimental result. In each backward iteration, \(L_{T}\) gradually approaches zero. As shown in Fig. 4, when the feature difference function \(L_{1}\) of images of the same class is greater than the feature difference function \(L_{2}\) of images of different classes minus the parameter \(\gamma\), \(L_{T}\) is greater than zero, and the model is adjusted backward to make \(L_{1}\) smaller and \(L_{2}\) larger. Reference ^{21} has verified that the Triplet loss function can make samples of the same kind close to each other and samples of different kinds far from each other.
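The hinge of Eq. (18) is a one-liner; the \(L_{1}\), \(L_{2}\) values below are illustrative numbers, not measured feature differences.

```python
def triplet_hinge(L1, L2, gamma=0.1):
    """Eq. (18): zero when the within-class difference L1 is already at
    least gamma below the between-class difference L2."""
    return max(0.0, L1 - L2 + gamma)

# L1 well below L2 - gamma: no adjustment needed.
ok = triplet_hinge(L1=0.2, L2=0.5)     # 0.0
# L1 too close to L2: positive loss pushes L1 down and L2 up.
bad = triplet_hinge(L1=0.4, L2=0.45)   # 0.05
```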
In Fig. 4, A and P belong to the same class, while N does not belong to the same class as A and P. Before the adjustment, the distance between A and P is greater than that between A and N, and the difference function \(L_{T}\) is greater than zero; thus, the model parameters need to be adjusted backward. After the adjustment, the distance between A and N becomes larger, while the distance between A and P becomes smaller.
According to Eqs. (16) and (17), in each backward iteration, \(L_{1}\) makes the feature difference within the same class smaller, while \(L_{2}\) makes the feature difference between different classes larger. On this basis, a Triplet loss function is proposed, as shown in Eq. (19):
$$ L\left( {\omega ,b} \right) = R\left( {\omega ,b} \right) + \alpha L_{1} - \beta L_{2} $$
(19)
where \(R\left( {\omega ,b} \right)\) denotes the CNN squared loss function, and \(\alpha\) and \(\beta\) are weight coefficients greater than zero. In the experiments, we tested values of \(\alpha\) (0.1, 0.01, 0.3, 0.4, etc.) and \(\beta\) (0.9, 0.99, 0.7, 0.6, etc.). After several experiments, the values \(\alpha = 0.4\) and \(\beta = 0.6\) gave the best results. \(L_{1}\) is the feature difference function within the same class, and \(L_{2}\) is the feature difference function between different classes. Therefore, the new residual of each layer in the backpropagation algorithm is as follows:
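Eq. (19) combines the three terms with the experimentally chosen weights; the R, \(L_{1}\), \(L_{2}\) values in the usage line are illustrative.

```python
def tcnn_loss(R, L1, L2, alpha=0.4, beta=0.6):
    """Eq. (19): squared loss, plus a within-class pull, minus a
    between-class push, with the paper's best weights as defaults."""
    return R + alpha * L1 - beta * L2

loss = tcnn_loss(R=0.30, L1=0.20, L2=0.50)   # 0.30 + 0.08 - 0.30 = 0.08
```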
$$ \omega_{ij} = \omega_{ij} - a\frac{\partial }{{\partial \omega_{ij} }}L\left( {\omega ,b} \right) $$
(20)
$$ b_{ij} = b_{ij} - a\frac{\partial }{{\partial b_{ij} }}L\left( {\omega ,b} \right) $$
(21)
The TCNN model based on the Triplet network adds the within-class feature difference function and the between-class feature difference function to the loss function, which helps the model extract more strongly differing features more quickly during the weight adjustment process. The partial derivative of \(L\left( {\omega ,b} \right)\) allows the backpropagation residual calculation to obtain the new parameters \(\omega\) and \(b\). Each iteration is more inclined toward the direction of gradient descent, which makes the model converge faster and improves the classification efficiency.
The TCNN model structure used in this paper is: a 5 × 5 convolution with 128 neurons in the first layer, a 5 × 5 convolution with 128 neurons in the second layer, a 2 × 2 max pooling layer in the third layer, a 3 × 3 convolution with 256 neurons in the fourth layer, a 3 × 3 convolution with 256 neurons in the fifth layer, a 2 × 2 max pooling layer in the sixth layer, and a fully connected layer with 1024 neurons in the seventh layer. The loss function is the Triplet-based loss function, the activation function is the sigmoid function, and the range of the function is (0, 1). Figure 5 shows the model structure. The TCNN time series classification algorithm is shown in Algorithm 1.
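The layer stack can be sanity-checked by tracking feature-map sizes. This plain-Python sketch assumes unpadded ("valid") convolutions with stride 1, non-overlapping pooling, and an illustrative 32 × 32 input; none of these settings are stated in the paper.

```python
def conv_out(size, k):    # valid convolution, stride 1
    return size - k + 1

def pool_out(size, k=2):  # non-overlapping max pooling
    return size // k

h = w = 32                                       # assumed input image size
h, w, c = conv_out(h, 5), conv_out(w, 5), 128    # layer 1: 5x5 conv, 128
h, w = conv_out(h, 5), conv_out(w, 5)            # layer 2: 5x5 conv, 128
h, w = pool_out(h), pool_out(w)                  # layer 3: 2x2 max pool
h, w, c = conv_out(h, 3), conv_out(w, 3), 256    # layer 4: 3x3 conv, 256
h, w = conv_out(h, 3), conv_out(w, 3)            # layer 5: 3x3 conv, 256
h, w = pool_out(h), pool_out(w)                  # layer 6: 2x2 max pool
flat = h * w * c                                 # flattened input to the
                                                 # 1024-neuron dense layer
```

Under these assumptions the feature maps shrink 32 → 28 → 24 → 12 → 10 → 8 → 4, so the fully connected layer sees a 4 × 4 × 256 = 4096-dimensional vector.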