Model design company explains the preparation work before production

2019-07-13

Throughout the process, the model design company's motivation and direction matter a great deal when designing a model. This covers defining the requirements and the problem, building a mathematical model of the problem, checking how the data relate to the problem being solved, exploring possible solutions, and making sure the objectives are achievable and can be evaluated.

Data preparation for experiments

Enough data: A model design company first needs sufficient data, and this matters at two levels. The first is the features of the data, which determine whether the design objective can be achieved at all. The features should have some "causal relationship" to the target, and their distribution should show "directedness". The second is the amount of data, which should be as large as possible: a DNN needs a lot of data, and a model easily overfits on a small data set. If conditions permit, it is worth trying to expand (augment) the original data set.
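
As an illustration of expanding a data set, here is a minimal sketch that doubles a set of images by adding horizontally flipped copies. The function name `augment_with_flips` and the array layout (N, height, width) are assumptions made for this example, not something the original text specifies, and flipping only makes sense when the label does not change under a mirror image.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Return the original images plus horizontally flipped copies.

    Assumes `images` has shape (N, height, width) and that a mirrored
    image keeps the same label (an assumption for this sketch).
    """
    flipped = images[:, :, ::-1]                   # reverse the width axis
    all_images = np.concatenate([images, flipped])
    all_labels = np.concatenate([labels, labels])  # labels are reused as-is
    return all_images, all_labels
```

Other transformations (small shifts, crops, added noise) follow the same pattern, but which ones are safe depends on the task.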

Data pre-processing: Data pre-processing is a headache for many people in the industry, and different scenarios call for different solutions. Here are several common approaches. The first is mean subtraction (zero-centering): subtract the mean, computed over all the data, from the original data, so that every dimension of the input is centered at 0. After mean subtraction the dimensions may still sit on different scales, which makes them hard to compare, so normalization is applied: each dimension is divided by the standard deviation of the data in that dimension. There is also the PCA/whitening method, which is well suited to image processing. Adjacent pixels in an image are very similar, so the input features are highly correlated and training does not converge easily. PCA removes the correlation between these adjacent features and helps training converge faster.
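
The three pre-processing steps described above can be sketched with NumPy as follows. This is a minimal sketch, assuming the data is a 2-D array with one sample per row and one feature per column; the function names and the small `eps` constants (added to avoid division by zero) are choices made for this example.

```python
import numpy as np

def center(X):
    """Mean subtraction: each feature (column) ends up with mean 0."""
    return X - X.mean(axis=0)

def standardize(X, eps=1e-8):
    """Center, then divide each feature by its standard deviation."""
    return center(X) / (X.std(axis=0) + eps)

def pca_whiten(X, eps=1e-5):
    """PCA whitening: decorrelate the features and give each unit variance."""
    Xc = center(X)
    cov = Xc.T @ Xc / Xc.shape[0]   # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)    # columns of U: principal directions
    X_rot = Xc @ U                  # rotate into the PCA basis (decorrelated)
    return X_rot / np.sqrt(S + eps) # rescale every component to unit variance
```

In practice the mean, standard deviation, and PCA basis are usually computed on the training set only and then reused unchanged for validation and test data.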

Shuffle of data: Each epoch is split into many batches. If nothing is done, these batches are the same from one epoch to the next, but ideally each epoch should see different batches. Therefore, if conditions permit, the data should be shuffled (randomized) once per epoch so that the batches differ.
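
A minimal sketch of per-epoch shuffling, assuming the inputs and labels are NumPy arrays with samples along the first axis. The helper name `iterate_minibatches` is hypothetical; most deep learning frameworks ship their own data loaders with an equivalent shuffle option.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (inputs, labels) batches in a fresh random order.

    Call this once per epoch: each call draws a new permutation,
    so the batches differ from one epoch to the next.
    """
    n = X.shape[0]
    order = rng.permutation(n)             # new random sample order
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]               # inputs and labels stay paired
```

A training loop would create one generator, for example `rng = np.random.default_rng(0)`, and then call `iterate_minibatches(X, y, batch_size, rng)` inside the loop over epochs. The last batch of an epoch may be smaller than `batch_size` when the data set size is not a multiple of it.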