AIS Essay Template

The Abstract Regression Classification (ARC) system is a machine learning tool for white box Feature Discovery. The basic process uses deep learning to build algebra networks of new features. Each layer in the resulting Algebra Network is a new feature with measured predictive power against the target variable. Each new feature is a human readable, white box, simple algebra formula, all linked together in a human readable network.

There are many areas of data analysis wherein pure target prediction, with a black box predictive tool, is insufficient to accomplish the mission goals. Stock trading systems are most definitely one of these domains. To demonstrate why feature discovery is so important for stock trading systems, imagine a black box system trading the oil market in 2014. Ever since the 1970s, OPEC had stepped in to support oil prices as they fell to cyclic lows. If one had a white box feature discovery tool, it would have been obvious that OPEC support of oil prices was an important feature of the oil market predictions. In late 2014, in an effort to attack the US shale producers, OPEC announced that it would no longer support the oil market - resulting in a deep crash in oil prices. Anyone who realised the importance of OPEC market support would have known to stop trading. Black box prediction systems just kept on trading oil as if nothing important had happened, resulting in unprecedented trading losses. In stock market trading systems it is insufficient to know what a prediction is. It is also important to know why the prediction was made and what data features were the drivers of the prediction.

As a side product of its main feature discovery task, the ARC system performs an ongoing machine learning task as follows.

There are many tools for performing regression-classification, such as neural nets, decision trees, decision forests, etc. Most of these predictive tools are black box (i.e. they do not produce human understandable information as to how they arrived at their predictions). While the ARC system delivers predictive accuracy competitive with all of these tools, ARC's main task is feature discovery: to produce a prediction AND to produce a human readable, easy to follow explanation of how the prediction was computed. This easy to read explanation comes in the form of a set of new features. Each new feature is a simple, easy to read algebra formula, which extends the original data, making it more predictive. The ARC system started development in 2009, inspired by our participation in the Genetic Programming Theory and Practice conferences. Over the years ARC has evolved to be a powerful system for discovering the hidden features which drive prediction in user data.

ARC final models are highly accurate computer programs with each deep learning feature delivered as a simple arithmetic expression. These easily understood programs are deliverable in many popular languages such as Lisp, JavaScript, KNIME, Excel, etc. No exotic hardware is required to run the final models. The translated ARC network models are easily embedded in any existing application package. ARC deep learning network models are the ultimate in white box hidden feature extraction. Once trained, ARC deep learning networks produce simple, readable Lisp code that can be translated to JavaScript, Java, KNIME, and even Excel for model execution. It's not just exceptional accuracy, but exceptional clarity which makes ARC a must have tool for the data scientist - especially those building stock market trading systems.

The ARC system learns without requiring complicated parameter settings. Just present the separate training and testing data and let ARC start learning all by itself. ARC includes formula evolution, optimal parameter evolution, and optimal search strategy evolution all internal to the learning process. ARC's academic publication history includes a series of search strategies mathematically designed to provide extreme accuracy on disparate data sets. During the learning process, ARC varies its search strategies in response to the data dynamics. A trained ARC deep learning network can always be used to resume learning from its last stopping point. Life-long learning is the hallmark of the ARC system.

ARC provides the data scientist with an exceptional tool for providing deeper insights into the hidden features which drive predictability in the user data being examined.

White Box Features

Deep learning neural networks have produced some notable successes in several fields. Inspired by the deep learning successes with neural nets, we extend our Abstract Regression Classification (ARC) system to evolve deep learning networks of algebraic expressions. The new enhanced system is used to train algebra networks on our ten theoretical classification problems with good performance advances. It would appear that the advantages of deep networks are not limited to neurons alone, but that these advantages, at least at some meaningful performance level, also extend to deep learning networks of more general algebraic expressions.

The problems we are attempting to solve herein are described by the simple matrix equation in (E0) where Y is a numeric vector of N elements and X is a numeric matrix of N rows and M columns, Hy is an optimized function on X and error is the term to be minimized. A perfect score would be where error = 0.
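Read literally from the definitions just given, (E0) presumably has the following form. This is a reconstruction from the surrounding description, not a quotation of the original equation.

```latex
Y = H_y(X) + \mathrm{error} \qquad \text{(E0)}
```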

Some very impressive increases, in predictive accuracy and training ease, have been achieved by combining Deep Learning concepts with Genetic Programming. ARC networks take deep learning, once the exclusive domain of neural networks, to a very impressive level of both accuracy and ease of training.

ARC Deep Learning Networks are "evolved". ARC Networks are similar to feed-forward Neural Networks in some ways and different in other ways. Like feed-forward neural networks, ARC networks accept multiple inputs which feed into the first "layer". The outputs of each layer feed into the next layer and the next layer until the final layer becomes the output layer. Like feed-forward neural networks, ARC networks can have an unlimited number of "layers".

One difference between neural networks and ARC networks is the technology used to evolve them. Neural networks are evolved using several different connectionist algorithms including backpropagation, counterpropagation, RProp, and others. ARC networks are evolved using Genetic Programming which is also an evolutionary technology but one of a very different nature. The differences in the fundamental evolutionary technologies used to grow each type of network lead to another very important difference between ARC networks and neural networks - the output formulas from each "layer".

Both feed-forward neural network layer outputs, and ARC network layer outputs are simple mathematical "weighted-sums" like the simple formulas used in polynomials, multiple regression, linear discriminant analysis, and/or support vectors. A typical example of a neural net layer output weighted-sum formula would be:
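As an illustration only, a weighted-sum layer output with a tanh activation has the general shape shown here. The symbols w1..wm (weights), x1..xm (inputs), and m (number of inputs) are placeholders chosen for this sketch, not values from any trained network.

```latex
y = \tanh\left(w_1 x_1 + w_2 x_2 + \cdots + w_m x_m\right)
```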

As one can easily see, each input feature is multiplied only by its "weight". The sum of all the weighted inputs is then run through an "activation function" - in this typical case the hypertangent function (tanh). Neural nets were inspired by the way neurons in the brain signal each other to produce computations. The difference between neural network layers and ARC network layers is in the RESTRICTIONS on the weighted-sum formulas produced as output from each layer. For instance, let us assume that we have training data for a simple regression problem whose solution is the following simple mathematical weighted-sum formula:

The neural network to "solve" this problem will contain several layers of many intertwined tanh() formulas until this regression formula's behavior is "simulated", with high accuracy, by the multiple neuron-like formulas from each layer. Unfortunately, while neural net formulas are brain inspired, they are also verbose. It takes a great many interwoven tanh() formulas for neural networks to simulate most regression-classification problems.

Therein lies the difference between neural network layer output formulas and ARC network layer output formulas. Each ARC network output formula is a general mathematical weighted-sum formula (not restricted to any brain-inspired or biological-inspired format). So a typical ARC network for the above regression problem would be a single layer whose output formula looks very familiar:

Yes, exactly like the original version of the regression problem. ARC network layer output formulas are "general" weighted-sum formulas without any restrictions. ARC network layer output formulas are much less verbose than neural network formulas, and each layer output formula is "human readable".

Deep Learning Algebra Networks

ARC Brief Background

By way of providing some background, our Abstract Regression Classification (ARC) system has been under research and development since 2004. ARC has been heavily industrialized and requires no genetic programming specific input parameters. Only the names of the training and testing data files and the nature of the target variable (numeric, binary, or nary) need be specified. The selection of the fitness method, running of multiple genetic programming runs with different random number seeds, splicing the different runs together to form a layered network, determining when the system is finished learning, etc., all of these tasks are hidden from the user by the ARC system planning module. Only a single ARC training run is needed per problem, and the ARC system is guaranteed to converge on the best solution it can find in the finite time and computation resources allotted.

The ARC planning module is based around the Regression Query Language (RQL), which is an SQL inspired search language for specifying genetic programming symbolic regression and classification runs. The RQL language is briefly described herein and can be used to set in motion single island or multiple island genetic programming runs with age-layered, Pareto, elitist, and many other GP methodologies. The RQL language is quite sophisticated and [2] describes an RQL specification which is conjectured to be absolutely accurate on certain scientific problems. The ARC planning module currently contains a library of numerous predefined RQL searches known to be effective for specific problems. The planning module applies its library of known RQL searches, based upon its own heuristic and statistical analysis of the data to be optimized. Human intervention is not required.

Each ARC training run hides thousands of separate genetic programming runs from the user. An internal log is produced detailing the planning module's decisions and the histories of each of the multiple genetic programming runs. A typical problem requires hundreds of hours of ARC training. For instance, running on a dedicated Dell laptop purchased in Dec 2019 with an Intel i9 chip, 128 GB of RAM, and 16 CPU cores, to train a regression on 23,198 training rows of securities historical data with 81 features per row, ARC required 295 CPU core hours of training time and over 1.53 days of elapsed time to produce a proprietary 28-layer ARC network model which exceeded management's expectations in live trading market performance.

Linear Regression

A classic statistical problem is to try to determine the relationship between two random variables X and Y. For example, we might consider height and weight of a sample of adults. Linear regression attempts to explain this relationship with a straight line fit to the data. The linear regression model postulates that Y = a + bX + e. Where the "error" e is a random variable with mean zero. The coefficients a and b are determined by the condition that the sum of the square residuals is as small as possible.

For instance if X = #(1 2 3 4) and Y = #(4 5 6 7), then a = 3.0, b = 1.0 and e = 0.0.

If X = #(1 2 3 4) and Y = #(2 4 6 8), then a = 0.0, b = 2.0 and e = 0.0.

Also if X = #(1 2 3 4) and Y = #(1.5 4 6.2 8), then a = -0.5, b = 2.17 and e = 0.123 (here e is reported as the sum of the squared residuals).
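The three small examples above can be checked with a few lines of Python. This is a minimal sketch of ordinary least squares for a single predictor; the function name fit_line is a label chosen here, and e is computed as the sum of the squared residuals, which is an interpretation that matches the 0.123 figure.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + b*X; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the ratio of the co-variation of X and Y to the variation of X.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of the squared residuals around the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


for ys in ([4, 5, 6, 7], [2, 4, 6, 8], [1.5, 4, 6.2, 8]):
    a, b, sse = fit_line([1, 2, 3, 4], ys)
    print(f"a = {a:.2f}, b = {b:.2f}, e = {sse:.3f}")
```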

Multiple Linear Regression. The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. For example, a real estate agent might record for each listing the size of the house (in square feet), the number of bedrooms, the average income in the respective neighborhood according to census data, and a subjective rating of appeal of the house. Once this information has been compiled for various houses it would be interesting to see whether and how these measures relate to the price for which a house is sold. For example, one might learn that the number of bedrooms is a better predictor of the price for which a house sells in a particular neighborhood than how "pretty" the house is (subjective rating). One may also detect "outliers," that is, houses that should really sell for more, given their location and characteristics.

Personnel professionals customarily use multiple regression procedures to determine equitable compensation. One can determine a number of factors or dimensions such as "amount of responsibility" (Resp) or "number of people to supervise" (No_Super) that one believes to contribute to the value of a job. The personnel analyst then usually conducts a salary survey among comparable companies in the market, recording the salaries and respective characteristics (i.e., values on dimensions) for different positions. This information can be used in a multiple regression analysis to build a regression equation of the form:

Salary = .5*Resp + .8*No_Super

Once this so-called regression line has been determined, the analyst can now easily construct a graph of the expected (predicted) salaries and the actual salaries of job incumbents in his or her company. Thus, the analyst is able to determine which position is underpaid (below the regression line) or overpaid (above the regression line), or paid equitably.

In the social and natural sciences multiple regression procedures are very widely used in research. In general, multiple regression allows the researcher to ask (and hopefully answer) the general question "what is the best predictor of ...". For example, educational researchers might want to learn what are the best predictors of success in high-school. Psychologists may want to determine which personality variable best predicts social adjustment. Sociologists may want to find out which of the multiple social indicators best predict whether or not a new immigrant group will adapt and be absorbed into society.

The general computational problem that needs to be solved in multiple regression analysis is to fit a straight line to a number of points.

Least Squares. In our problem, we have an independent or X variable, and a dependent or Y variable. These variables may, for example, represent IQ (intelligence as measured by a test) and school achievement (grade point average; GPA), respectively. Each point in the plot represents one student, that is, the respective student's IQ and GPA. The goal of linear regression procedures is to fit a line through the points. Specifically, the program will compute a line so that the squared deviations of the observed points from that line are minimized. Thus, this general procedure is sometimes also referred to as least squares estimation.

The Regression Equation. A line in a two dimensional or two-variable space is defined by the equation Y=a+b*X; in full text: the Y variable can be expressed in terms of a constant (a) and a slope (b) times the X variable. The constant is also referred to as the intercept, and the slope as the regression coefficient or B coefficient. For example, GPA may best be predicted as 1+.02*IQ. Thus, knowing that a student has an IQ of 130 would lead us to predict that her GPA would be 3.6 (since, 1+.02*130=3.6). In the multivariate case, when there is more than one independent variable, the regression line cannot be visualized in the two dimensional space, but can be computed just as easily. For example, if in addition to IQ we had additional predictors of achievement (e.g., Motivation, Self-discipline) we could construct a linear equation containing all those variables. In general then, multiple regression procedures will estimate a linear equation of the form: Y = a + b1*X1 + b2*X2 + ... + bp*Xp
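Evaluating such an equation is a one-line weighted sum. The sketch below uses a hypothetical helper named predict and reproduces the GPA figure from the single-predictor example; only the intercept 1 and the coefficient .02 come from the text.

```python
def predict(a, b, x):
    """Evaluate Y = a + b1*X1 + ... + bp*Xp for one observation x."""
    return a + sum(bi * xi for bi, xi in zip(b, x))


# Single-predictor case from the text: GPA = 1 + .02*IQ with IQ = 130.
gpa = predict(1.0, [0.02], [130])
print(round(gpa, 2))
```

With more predictors the call is unchanged: the lists b and x simply grow to length p.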

Unique Prediction and Partial Correlation. Note that in this equation, the regression coefficients (or B coefficients) represent the independent contributions of each independent variable to the prediction of the dependent variable. Another way to express this fact is to say that, for example, variable X1 is correlated with the Y variable, after controlling for all other independent variables. This type of correlation is also referred to as a partial correlation (this term was first used by Yule, 1907). Perhaps the following example will clarify this issue. One would probably find a significant negative correlation between hair length and height in the population (i.e., short people have longer hair). At first this may seem odd; however, if we were to add the variable Gender into the multiple regression equation, this correlation would probably disappear. This is because women, on the average, have longer hair than men; they also are shorter on the average than men. Thus, after we remove this gender difference by entering Gender into the equation, the relationship between hair length and height disappears because hair length does not make any unique contribution to the prediction of height, above and beyond what it shares in the prediction with variable Gender. Put another way, after controlling for the variable Gender, the partial correlation between hair length and height is zero.

Predicted and Residual Scores. The regression line expresses the best prediction of the dependent variable (Y), given the independent variables (X). However, nature is rarely (if ever) perfectly predictable, and usually there is substantial variation of the observed points around the fitted regression line (as in the scatterplot shown earlier). The deviation of a particular point from the regression line (its predicted value) is called the residual value.

Residual Variance and R-square. The smaller the variability of the residual values around the regression line relative to the overall variability, the better is our prediction. For example, if there is no relationship between the X and Y variables, then the ratio of the residual variability of the Y variable to the original variance is equal to 1.0. If X and Y are perfectly related then there is no residual variance and the ratio of variance would be 0.0. In most cases, the ratio would fall somewhere between these extremes, that is, between 0.0 and 1.0. 1.0 minus this ratio is referred to as R-square or the coefficient of determination. This value is immediately interpretable in the following manner. If we have an R-square of 0.4 then we know that the variability of the Y values around the regression line is 1-0.4 times the original variance; in other words we have explained 40% of the original variability, and are left with 60% residual variability. Ideally, we would like to explain most if not all of the original variability. The R-square value is an indicator of how well the model fits the data (e.g., an R-square close to 1.0 indicates that we have accounted for almost all of the variability with the variables specified in the model).
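The definition above translates directly into code. As a minimal sketch, the function below (the name r_square is chosen here) computes 1.0 minus the ratio of residual variability to original variability, and applies it to the X = #(1 2 3 4), Y = #(1.5 4 6.2 8) data from the Linear Regression section, using the fitted values a = -0.5 and b = 2.17.

```python
def r_square(ys, predicted):
    """1.0 minus the ratio of residual variability to original variability."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return 1.0 - residual / total


ys = [1.5, 4, 6.2, 8]
predicted = [-0.5 + 2.17 * x for x in [1, 2, 3, 4]]
print(round(r_square(ys, predicted), 4))
```

A model that always predicts the mean of Y scores 0.0 under this definition, and a model that reproduces Y exactly scores 1.0.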

Interpreting the Correlation Coefficient R. Customarily, the degree to which two or more predictors (independent or X variables) are related to the dependent (Y) variable is expressed in the correlation coefficient R, which is the square root of R-square. In multiple regression, R can assume values between 0 and 1. To interpret the direction of the relationship between variables, one looks at the signs (plus or minus) of the regression or B coefficients. If a B coefficient is positive, then the relationship of this variable with the dependent variable is positive (e.g., the greater the IQ the better the grade point average); if the B coefficient is negative then the relationship is negative (e.g., the lower the class size the better the average test scores). Of course, if the B coefficient is equal to 0 then there is no relationship between the variables.

NonLinear Regression

Nonlinear regression in statistics is the problem of fitting a model y = f(x,P) + e to multidimensional x, y data, where f is a nonlinear function of x with parameters P, and where the "error" e is a random variable with mean zero. In general, there is no algebraic expression for the best-fitting parameters P, as there is in linear regression. Usually numerical optimization algorithms are applied to determine the best-fitting parameters. There may be many local maxima of the goodness of fit, again in contrast to linear regression, in which there is usually a unique global maximum of the goodness of fit. To determine which maximum is to be located using numerical optimization, guess values of parameters are used. Some nonlinear regression problems can be linearized if the exact solution to the guess-regression equation can be found.

For example:

If we take a logarithm of y = A*exp(B*x) and cast it as a linear regression, it will look like log(y) = log(A) + B*x, a usual linear regression problem of optimizing parameters log(A) and B, the exact solution of which is well known.
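A minimal sketch of this linearization follows. The function name fit_exponential and the sample values A = 2 and B = 0.5 are assumptions made for illustration; the data are noise-free, so the fit recovers the parameters up to floating point error.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A*exp(B*x) by least squares on log(y) = log(A) + B*x."""
    logs = [math.log(y) for y in ys]  # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                 # slope of the linearized problem is B
    log_a = mean_l - b * mean_x   # intercept of the linearized problem is log(A)
    return math.exp(log_a), b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free data, A = 2, B = 0.5
A, B = fit_exponential(xs, ys)
print(A, B)
```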

However, performing such a linearization may bias some data towards being more "relevant" than others, which may not be a desired effect. More complex problems, such as transcendental regression are optimized by more complex algorithms. Other nonlinear regressions may have several goodness of fit maxima, and will require the scientist to input guess values for the optimized parameters.

Nonlinear regression fits a mathematical model to your data, and therefore requires that you choose a model. What is a model? A mathematical model is a simple description of a physical, chemical or biological state or process. Using a model can help you think about chemical and physiological processes or mechanisms, enabling you to design better experiments and make sense of the results. Your model must be expressed as a mathematical function. You can express the model as a single algebraic equation. You can express the model as a set of differential equations, or you can write an equation in a manner that lets you have different models for different portions of your data.

Choosing a model for NonLinear regression obviously requires some understanding of the problem data and some preference for one choice of model over another.

Neural Net Regression

Neural Net regression is the problem of training a neural net model y = Nf(x,S,Ch,Wh,Co,Wo) + e on multidimensional M-Vectors x in X and real numbers y in Y, such that the trained numeric coefficients, Ch, Wh, Co and Wo, optimize the least squares error component, e which is a random variable with mean zero. Normally, a neural net is a complex learning machine receiving M dimensional inputs x, containing hidden layer coefficients Ch and Wh, also containing output coefficients Co and Wo, producing hidden layer internal signals S, and producing one real number output signal y.

A standard neural net, Nf, is defined by M, the number of input dimensions, and by K, the number of hidden layers. There are 1 thru M inputs, 0 thru K layers (with 1 thru M internal signals produced for each layer), and one real number output signal. The number of components inside the neural net learning machine is complex, and they are as follows.

A standard Neural Net operates by generating internal signals which propagate up through each layer until the final output signal is produced. The number and composition of these signals inside the neural net is complex, and they are as follows.

Normally each internal signal, S[k][m], is either a weighted function of the outputs from the layer below or an original input element x[m]. All of the neural nets, which we will consider in this document, are fully connected neural nets, receiving M input signals, with zero or more hidden layers, and have one real number output signal.
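The signal propagation just described can be sketched as a short forward pass. The text does not specify how Ch, Wh, Co and Wo are laid out, so this sketch assumes that Ch and Co hold constant terms, that Wh and Wo hold weights, that hidden signals use a tanh activation, and that the output is a plain weighted sum. All of these are assumptions, not the documented AIS layout.

```python
import math


def forward(x, Ch, Wh, Co, Wo):
    """Forward pass of a fully connected net with len(Wh) hidden layers.

    Assumed layout (not specified in the text):
      Ch[k][m]    constant term of signal m in hidden layer k+1
      Wh[k][m][j] weight on signal j of the layer below
      Co, Wo      constant term and weights of the single output
    """
    signals = list(x)  # layer 0: the original input elements x[m]
    for consts, weights in zip(Ch, Wh):
        # Each new signal is a weighted function of the layer below.
        signals = [
            math.tanh(c + sum(w * s for w, s in zip(row, signals)))
            for c, row in zip(consts, weights)
        ]
    return Co + sum(w * s for w, s in zip(Wo, signals))


# Zero hidden layers: the net reduces to a plain weighted sum of the inputs.
y0 = forward([1.0, 2.0], Ch=[], Wh=[], Co=0.5, Wo=[2.0, -1.0])

# One hidden layer with M = 2 internal signals.
y1 = forward([1.0, 2.0],
             Ch=[[0.0, 0.0]],
             Wh=[[[1.0, 0.0], [0.0, 1.0]]],
             Co=0.0, Wo=[1.0, 1.0])
print(y0, y1)
```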

Symbolic Regression

Symbolic regression is the induction of mathematical expressions from data. This is called symbolic regression (first mentioned by Koza in 1992), to emphasize the fact that the object of search is a symbolic description of a model, not just discovering a set of optimized coefficients in a prespecified model. This is in sharp contrast with other methods of regression, including NonLinear regression, where a specific model is assumed and often only the complexity of this model can be varied.

For example:

If we start with a trigonometric function y = 3.56*cos((21.456*x1)-log(x2)) and create a training data set we might produce a three column by 1000 row matrix. The first column is the independent variable X1. The second column is the independent variable X2, and the third column is the dependent variable Y. The number of training rows, in this case 1000, is arbitrary; but, the larger the number of rows, the better for training purposes.

When we present the training data set, created above, to a symbolic regression machine, we will know that the Y column can be derived from the X columns using the model: y = 3.56*cos((21.456*x1)-log(x2)). However, the symbolic regression machine will not know this fact. We want the symbolic regression machine to discover this model relationship on its own without any further hints other than the training data set.

In the case of a symbolic regression machine, what is a model? For our purposes, using Analytic Information Server, a symbolic regression machine model is an AIS Lambda which inputs a number vector, x, and outputs a single number, y. Furthermore we express all symbolic regression models in the Estimator language. If our hypothetical symbolic regression machine were to discover the relationship hidden in the training data correctly, it would return a model regression Lambda as follows: function(x) {y = 3.56*cos((21.456*x[0])-log(x[1]))}. This would be a perfect score for our hypothetical symbolic regression machine.
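The setup of this example can be sketched in Python: build the three column training matrix from the known model, then score a candidate model against it. The sampling ranges for x1 and x2 are assumptions (x2 is kept positive so that log(x2) is defined), and the names make_training_set, squared_error and discovered are chosen for illustration.

```python
import math
import random


def target(x1, x2):
    return 3.56 * math.cos((21.456 * x1) - math.log(x2))


def make_training_set(rows=1000, seed=0):
    """Return rows of [x1, x2, y]; x2 is kept positive so log(x2) is defined."""
    rng = random.Random(seed)
    data = []
    for _ in range(rows):
        x1 = rng.uniform(0.0, 10.0)
        x2 = rng.uniform(0.1, 10.0)
        data.append([x1, x2, target(x1, x2)])
    return data


def squared_error(model, data):
    """Sum of squared differences between model(x) and the Y column."""
    return sum((model(row[:2]) - row[2]) ** 2 for row in data)


def discovered(x):
    # Python rendering of the model Lambda quoted in the text above.
    return 3.56 * math.cos((21.456 * x[0]) - math.log(x[1]))


training = make_training_set()
print(len(training), squared_error(discovered, training))
```

A symbolic regression machine would only ever see the training rows; the scoring function is what tells it how close a candidate expression has come.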

Grammatical Swarm Evolution

Grammatical Swarm Evolution is a just-in-time algorithm that can evolve computer programs, rulesets, or more generally sentences in any language. Rulesets could be as diverse as a regression model or a trading system for a financial market. Rather than representing the programs as syntax trees, as in Genetic Programming, a linear genome representation is used in conjunction with a grammar. Each individual genome is a vector of integer codons each of which contains the information to select production rules from a grammar. The mapping from the genotype (genome) to the phenotype (computer program) is accomplished by reading the genome, from first element to last element, and using each integer codon to select a grammar rule. Selected grammar rules are used to incrementally build a grammatically correct computer program. The mapping process has been engineered so that it will always terminate, either at or before the end of the genome, with a valid program.
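The genotype to phenotype mapping can be sketched as follows. The toy grammar, the use of the codon modulo the number of rules to pick a production (the usual grammatical evolution convention), and the fallback to the first rule once the genome is exhausted are all assumptions made for this illustration; the text does not state how ARC guarantees termination.

```python
# Toy grammar: the first rule of each nonterminal is non-recursive.
GRAMMAR = {
    "<expr>": [["<var>"], ["<expr>", "<op>", "<expr>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x0"], ["x1"]],
}


def map_genome(genome, start="<expr>"):
    """Map a vector of integer codons to a sentence of GRAMMAR."""
    symbols = [start]  # sentential form, expanded left to right
    out = []
    codons = iter(genome)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            out.append(sym)  # terminal symbol: emit it
            continue
        rules = GRAMMAR[sym]
        codon = next(codons, None)
        # Assumption: when the genome is exhausted, fall back to the first
        # (non-recursive) rule so the mapping always terminates.
        choice = rules[codon % len(rules)] if codon is not None else rules[0]
        symbols = list(choice) + symbols
    return " ".join(out)


print(map_genome([1, 0, 1, 1, 0, 0]))
```

Reading the genome [1, 0, 1, 1, 0, 0] from first element to last selects, in order, the rules that expand the start symbol into the sentence x1 * x0.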

For the evolutionary component of its algorithm, Grammatical Swarm Evolution, turns to a technique called Particle Swarm optimization (PSO). The PSO algorithm was introduced by Kennedy and Eberhart in 1995. In PSO, a swarm of particles, which encode solutions to the problem of interest, move around in an n-dimensional search space in an attempt to uncover better solutions. Each of the particles has two associated properties, a current position and a velocity. Each particle has a memory of the best location in the search space that it has found so far (pbest), and knows the best location found to date by all the particles in the population (gbest). At each step of the algorithm, particles move from their current location by applying a velocity vector to their current position vector. The magnitude and direction of their velocity at each step is influenced by their velocity in the previous iteration of the algorithm, simulating momentum, and the location of the particle relative to the location of its pbest and the gbest. Therefore, at each step, the size and direction of each particle's move is a function of its own history, and the social influence of its peer group.
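The particle update described above can be sketched in a few lines. This is a basic continuous-valued swarm minimizing a simple test function; the inertia weight, the two acceleration coefficients, the swarm size and the search range are assumed values chosen for illustration, not settings taken from ARC.

```python
import random


def pso(fitness, dim, particles=20, steps=100, seed=0,
        inertia=0.7, c_personal=1.5, c_social=1.5):
    """Minimize `fitness` with a basic particle swarm; returns (gbest, value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    best = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for _ in range(steps):
        for i in range(particles):
            for d in range(dim):
                # momentum + pull toward own best + pull toward swarm best
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (pbest[i][d] - pos[i][d])
                             + c_social * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            value = fitness(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def sphere(p):
    return sum(v * v for v in p)


best_pos, best_val = pso(sphere, dim=3)
print(best_val)
```

Because pbest and gbest are only ever replaced by strictly better locations, the best value found can never get worse from one step to the next.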

Therefore, for grammatical purposes, the genome is seen as a vector of integer codons; however, for evolutionary purposes, the genome is viewed as a discrete binary bit vector for use as a location in an n-dimensional search space. The evolutionary search mechanism is performed by the discrete binary version of PSO.

Combining All Of These into a Tool

The Abstract Regression-Classification Lambda (ARC) is a learning machine which learns to select and score individuals from a universe of individuals over time. Over a series of discrete time steps, a universe of individuals is collected for each timestep. The individuals are things such as Stocks, People, Cities, etc. The discrete time steps are weeks, days, seconds, years, microseconds, etc.

Each individual followed by the system is given a unique identifier which remains unique across all time periods studied (no two individuals ever have the same identifier). Furthermore, each time period studied is given a unique ascending integer index (i.e. week 1, week 2, etc.). So, for a series of time periods, historical information about groups of individuals is collected for each time period. The historical information collected for each individual for each time period is stored in a Number Vector and includes: the time period index; the unique identifier of the individual; and other numeric information about the individual pertinent to the current investigation. Finally, each individual in each time period is given a numeric "score" which determines the value of the individual in that time period.

During training, the ARC is given historical information for time periods 0 through T for all individuals. The ARC is also given the "score" values for each individual in each training time period from 0 through T. During training the ARC attempts to "learn" any patterns in the available historical data. The machine (ARC) is free to discover static as well as time varying patterns.

During forward prediction, the ARC is given new information for time period T+1 for all individuals. The ARC is NOT given the "score" values for each individual in the new time period T+1. During prediction the ARC attempts to use any patterns it has learned to select and score the individuals, from the universe of individuals, seen in time period T+1. Once the machine scores the individuals, in the new time period, the accuracy of the machine is determined by: (a) the least squares error on the scored individuals in time period T+1; and (b) by the "order preserving" property of the estimated scores in time period T+1 (the degree to which the estimated scores preserve the natural sort ordering of the actual scores). Order preservation is a simple idea where if the estimated score for individual x is less than the estimated score for individual y, then the actual score for individual x should also be less than the actual score for individual y in time period T+1. Normally these two measures should be coincident -- especially if the least squares error is excellent. However, in cases where there is insufficient information in the training data, they may not be coincident. If a tradeoff is required, the Abstract Regression-Classification prefers that at least natural ordering be preserved.
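The two accuracy measures can be illustrated with a small sketch. The pairwise agreement score below is one hypothetical way to quantify order preservation and is not claimed to be the measure ARC uses internally; the function names and sample scores are made up for illustration.

```python
from itertools import combinations


def least_squares_error(estimated, actual):
    """Sum of squared differences between estimated and actual scores."""
    return sum((e - a) ** 2 for e, a in zip(estimated, actual))


def order_preservation(estimated, actual):
    """Fraction of pairs whose estimated ordering matches the actual ordering."""
    agree = total = 0
    for i, j in combinations(range(len(estimated)), 2):
        if estimated[i] == estimated[j]:
            continue  # no ordering is claimed for tied estimates
        lo, hi = (i, j) if estimated[i] < estimated[j] else (j, i)
        total += 1
        if actual[lo] < actual[hi]:
            agree += 1
    return agree / total if total else 1.0


estimated = [0.1, 0.2, 0.3]
actual = [10, 30, 20]
print(least_squares_error(estimated, actual))
print(order_preservation(estimated, actual))
```

In this made-up case the estimated scores are far from the actual scores, so the least squares error is large, yet two of the three pairs are still ranked correctly.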

A time series set of vectors, X, together with a set, Y, of scores for the vectors in X are used to train a learning machine. There is also a testing set, TX and TY, of vectors similar to those in X and Y but for the testing time period (a time period not present in X or Y). After training, the machine is presented with the testing set TX and attempts to estimate TY. The learning machine returns a vector, EY, of estimates for TY. Each element in EY contains a numeric estimate for the corresponding value in TY. The learning machine attempts to: (a) minimize the least squared error between EY and TY; and (b) as much as possible, have the natural ordering of EY be predictive of the natural ordering of TY.
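The two accuracy measures described above can be illustrated with a short Python sketch. This is not ARC code. The function names are hypothetical, the error is assumed to be the mean of the squared differences (ARC's exact normalization is not stated here), and order preservation is assumed to be measured as the fraction of pairs of individuals whose estimated ordering matches their actual ordering.

```python
from itertools import combinations


def least_squares_error(ey, ty):
    """Mean squared difference between estimates EY and actual scores TY.

    Assumption: a plain mean of squared errors; ARC may normalize differently.
    """
    if len(ey) != len(ty) or not ey:
        raise ValueError("EY and TY must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(ey, ty)) / len(ey)


def _sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def order_preservation(ey, ty):
    """Fraction of pairs whose estimated ordering agrees with the actual ordering.

    Pairs tied in the estimates make no ordering claim and are skipped.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(ey)), 2):
        estimated = _sign(ey[i] - ey[j])
        if estimated == 0:
            continue
        total += 1
        if estimated == _sign(ty[i] - ty[j]):
            agree += 1
    return agree / total if total else 1.0


# Hypothetical scores: the estimates are far off in magnitude,
# yet they rank the four individuals exactly as the actual scores do.
TY = [1.0, 2.0, 3.0, 4.0]
EY = [10.0, 20.0, 30.0, 40.0]
print(least_squares_error(EY, TY))  # large error
print(order_preservation(EY, TY))   # perfect ordering
```

With these made-up values the squared error is large (607.5) while the order preservation is 1.0, which is the tradeoff case where the ordering is still useful even though the estimated scores are wrong.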

The order preservation mission of the Abstract Regression-Classification is an important distinguishing feature between this learning machine and general regression learning machines. The ARC is trying to fit a function to the testing data, using least squares error; but, the ARC is also trying to predict the natural order of the individuals in the testing data.

In many instances the Abstract Regression-Classification may not have an exact estimate for the scores of the individuals in the testing data. However, if the learning machine is able to predict the natural ordering of individuals in the testing data, then the machine has been partially successful even if its estimated scores are incorrect.

Let X be a set of vectors such as the numeric vector x = #(num| xtime x1 x2 x3 ... xm), and let Y be a numeric vector of "score" values. Let both X and Y be of length N.

Furthermore, let the first prefix element, xtime, of every vector in X contain a non-negative integer value indicating some time span of integral length. For example, if the time span were weeks, a value of 1 would indicate week one, a value of 10 would indicate week ten, etc. (i.e. the vectors contained in X are time sequenced).
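As an illustration of the time sequencing, the following Python sketch separates a set of vectors into the training periods (0 through T) and the forward prediction period (T+1) using the xtime prefix element. The function name and the sample rows are hypothetical, and the sketch is independent of ARC itself.

```python
def split_by_time(x_rows, y_scores, last_training_period):
    """Split time-sequenced vectors into training and testing sets.

    Rows with xtime <= T go to training; rows with xtime == T + 1 go to
    testing. The first element of every row is assumed to be xtime.
    """
    if len(x_rows) != len(y_scores):
        raise ValueError("X and Y must both be of length N")
    train_x, train_y, test_x, test_y = [], [], [], []
    for row, score in zip(x_rows, y_scores):
        xtime = row[0]
        if xtime <= last_training_period:
            train_x.append(row)
            train_y.append(score)
        elif xtime == last_training_period + 1:
            test_x.append(row)
            test_y.append(score)
    return train_x, train_y, test_x, test_y


# Hypothetical data: each row is (xtime, x1, x2); scores are made up.
X = [(0, 1.5, 2.0), (0, 0.5, 1.0), (1, 1.6, 2.1), (1, 0.4, 0.9), (2, 1.7, 2.2)]
Y = [10.0, 4.0, 11.0, 3.5, 12.0]

train_x, train_y, test_x, test_y = split_by_time(X, Y, last_training_period=1)
print(len(train_x), len(test_x))
```

In this sketch the testing scores are returned only so that accuracy can be measured afterwards; they would not be shown to the learning machine during prediction.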

arcnetLearn

The arcnetLearn function is the main machine learning statement for training Deep Learning Algebra Networks. Calling the arcnetLearn function, with the following arguments, will evolve an ARC Deep Learning Network to best fit the training data you supply. Evolving an ARC Network is a one shot process. Unlike deep learning neural networks, ARC Network training always converges on a best solution in finite time (on a 2019 laptop computer, roughly several days to one week running in the background, even for the toughest problems). When you notice, from the ARC console log displays, that the system is in backwardation and no better solutions are forthcoming, it is time to stop the learning process. Running multiple arcnetLearn training runs will not improve the final solution. One arcnetLearn training run per problem is all you need.

Type: Function

Arguments come in pairs starting with argument name followed by argument value as follows:

Syntax: (arc.arcnetLearn name:"testName" train:"trainingFileName" test:"testingFileName" target:"target" forks:forks epochs:epochs ...)

name:	testName	Select a unique and "meaningful" test name. ARC produces several output files when evolving the ARC Network. These output files will appear in your working directory and they are as follows: "testName_ArcNet.sl" - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code. "testName_ArcNet.js" - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code. "testName_ArcNet.db" - This is the binary version of the actual ARC Algebra Network. "testName_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network. "testName_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
train:	trainingFileName	The name of the training file to be supplied for this ARC network training run. The training data MUST be preprocessed before input to ARC. There can be ONLY NUMERIC data. Columns containing labels should have the labels translated to contiguous integers starting with zero (0) and with NO gaps. For instance California = 0, Arizona = 1, Nevada = 2, etc. Keep the California, Arizona, and Nevada labels in a separate dictionary file for yourself. Leave only the 0, 1, 2 integers in the training data supplied to ARC. For best training results, the range of all numeric data should be within [+-10e15]. The number of columns times the number of rows (row-columns) MUST be less than 50 million. ARC will accept tab delimited or comma delimited (.csv) training data. If the file suffix is NOT .csv, ARC will assume that the training data is tab delimited.
test:	testingFileName	The name of the testing file to be supplied for this ARC network training run. The testing data MUST be preprocessed before input to ARC. There can be ONLY NUMERIC data. Columns containing labels should have the labels translated to contiguous integers starting with zero (0) and with NO gaps. For instance California = 0, Arizona = 1, Nevada = 2, etc. Keep the California, Arizona, and Nevada labels in a separate dictionary file for yourself. Leave only the 0, 1, 2 integers in the testing data supplied to ARC. For best training results, the range of all numeric data should be within [+-10e15]. The number of row-columns for the testing data MUST be less than 50 million. ARC will accept tab delimited or comma delimited (.csv) testing data. If the file suffix is NOT (.csv), ARC will assume that the testing data is tab delimited. Note: It is NOT necessary to divide the supervised data into separate training and testing data sets. If both training and testing files have the same name, ARC will perform the separation internally (often better than any human could).
target:	target	If ("numeric"), the right most target column is numeric with inherent sort order (preferably in the range [-10e15 to +10e15]). If ("capgains"), the right most target column is numeric with inherent sort order in the range from [-100% to 1000000000%]. If ("binary"), the right most target column is the integers 0 and 1 without inherent sort order where classification errors should be proportionately distributed among all classes. If ("nary"), the right most target column is discrete contiguous integers (0, 1, thru N) without inherent sort order where classification errors should be proportionately distributed among all classes. If ("neven"), the right most target column is discrete contiguous integers (0, 1, thru N) without inherent sort order where classification errors need not be proportionately distributed among all classes.
forks:	forks	The computer resources to be used in training. Must be a number whose integer part, from 1 to N, indicates the number of separate ARC machine learning runs to run simultaneously in each training epoch. Each of these must run in a separate process on a separate core SYNCHRONOUSLY. If a multiple core computer is not available, this should be forks = 1. If there is a fraction present (e.g. 4.2) then the fraction represents the number of ASYNCHRONOUS retrograde repeat runs to be attempted for each epoch. For instance, assuming that you feel 12 cores are necessary to solve this problem AND your computer has at least 12 hyperthreaded cores, then forks = 12. However, what if your computer only has 4 hyperthreaded cores? You still need to apply 12 cores per epoch (the problem hasn't gotten easier just because resources are lacking), so forks = 4.2, indicating that each epoch will start with 4 synchronously running cores, then the epoch will proceed with ANOTHER 4 cores training synchronously, followed by ANOTHER 4 cores training synchronously. It is not as fast nor as accurate as running 12 cores, but 4 then 4 then 4 is all you can apply if only 4 cores are available.
epochs:	epochs	The number of epochs to be applied for this training. Each ARC epoch MAY produce a new layer in the ARC Network if the epoch produces an improvement over the previous layer. If no improvement is produced, ARC will proceed to the next epoch and try again, leaving the ARC network unaffected. Training can be halted at any time and restarted at any time. At some point the epochs will go into backwardation and simply not be able to produce any further improvements in the model. When that happens, it is time to quit further training. ARC simply cannot achieve a better fit with current technology. ARC network training is a one shot process guaranteed to converge in finite time. Running multiple arcnetLearn training runs will not improve the final solution. One arcnetLearn training run per problem is all you need.
rowID:	rowID	(Optional) If (true) the left most column is a row ID (must be ONLY NUMERIC) and will NOT be included in the training.
net:	arcnetFileName	(Optional) If this is the first attempt at training this ARC network then (#void or "") - (default ""); OR the name of the ARC network file produced by previous training attempts ("testName_ArcNet.sl"); OR the name of the ARC network definition file for complex user defined networks ("testName_ArcDef.sl").
rql:	rqlCommand	(Optional) The RQL search command by which the learning is to be guided (default #void). rqlCommand shortcut values: RQL = "ArcNet" - Selects at random either ArcNetMin, ArcNetMax, or ArcNetMux. RQL = "ArcNetFix" - Selects a fixed width complex RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetFix: target). RQL = "ArcNetFnd" - Selects a greedy RQL search for feature selection, as in the following: (arc.arcnetRQLDemo ArcNetFnd: target). RQL = "ArcNetLib" - Selects a complex RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetLib: target). RQL = "ArcNetMax" - Selects a complex RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetMax: target). RQL = "ArcNetMin" - Selects a simple RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetMin: target). RQL = "ArcNetMix" - Selects a simple RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetMix: target). RQL = "ArcNetMux" - Selects a complex RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetMux: target). RQL = "ArcNetSux" - Selects a complex RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetSux: target). RQL = "ArcNetMath" - Selects a very complex scientific math RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetMath: target). RQL = "ArcNetSMath" - Selects a very complex scientific math RQL search, including hidden layer features, as in the following: (arc.arcnetRQLDemo ArcNetSMath: target). RQL = "ArcIn" - Selects at random either ArcInMin, ArcInMax, or ArcInMux. RQL = "ArcInFix" - Selects a fixed width complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInFix: target). RQL = "ArcInLib" - Selects a complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInLib: target). RQL = "ArcInMax" - Selects a complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInMax: target). RQL = "ArcInMin" - Selects a simple RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInMin: target). RQL = "ArcInMix" - Selects a complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInMix: target). RQL = "ArcInMux" - Selects a complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInMux: target). RQL = "ArcInSux" - Selects a complex RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInSux: target). RQL = "ArcInMath" - Selects a very complex scientific math RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInMath: target). RQL = "ArcInSMath" - Selects a very complex scientific math RQL search, excluding hidden layer features, as in the following: (arc.arcnetRQLDemo ArcInSMath: target). RQL = ..else.. - Returns all other text as a user specified RQL search (for RQL programmers only).
functions:	functionList	(Optional) The list of functions to which learning is to be restricted i.e. "noop,+,-,abs,inv,sin,cos,tan". functionList shortcut values: functionList = #void - Selects the default function list, as in the following: (arc.generateFunctionList #void). functionList = "" - Selects the default function list, as in the following: (arc.generateFunctionList ""). functionList = "all" - Selects the default function list, as in the following: (arc.generateFunctionList "all"). functionList = "default" - Selects the default function list, as in the following: (arc.generateFunctionList "default"). functionList = "math" - Selects the major scientific function list, as in the following: (arc.generateFunctionList "math"). functionList = "nomath" - Selects the minor scientific function list, as in the following: (arc.generateFunctionList "nomath"). functionList = "safe" - Selects the safe function list, as in the following: (arc.generateFunctionList "safe"). functionList = "excel" - Selects the excel function list (which can be run in Excel), as in the following: (arc.generateFunctionList "excel"). functionList = "best" - Selects the best function list, as in the following: (arc.generateFunctionList "best"). functionList = "core" - Selects the core function list, as in the following: (arc.generateFunctionList "core"). functionList = "unary" - Selects the unary function list, as in the following: (arc.generateFunctionList "unary"). functionList = ..else.. - The exact list of functions to use, as in the following: "noop,+,-,cos,tan".
features:	featureList	(Optional) The list of features to which learning is to be restricted i.e. "x1,x4,x22,x56" - (default ""). Note: the wild card $vvi$ will be replaced with the Champion Feature list in the previous arcnetLayer record. Note: the wild card $HL$ will be replaced with the previous arcnetLayer name.
stop:	stop	(Optional) The minimum testing score at which to halt further learning - (default 0.0).
fitness:	fitness	(Optional) The fitness for this training run (cep, ecep, nmae, nlse, mad, user, or usec) - default (nlse).
mainSW:	mainSW	(Optional) The main fork switch (false) for all asynchronous forks and (true) for the main fork - (default true).
depth:	depth	(Optional) The minimum RQL expression depth of each weighted formula basis function - (default 4).
edepth:	edepth	(Optional) The extra RQL expression depth of each weighted formula basis function - (default 0).
bases:	bases	(Optional) The minimum number of RQL weighted formula basis functions in each ArcNet layer (default 5).
ebases:	ebases	(Optional) The extra RQL weighted formula basis functions in each ArcNet layer (default 0).
regmax:	regmax	(Optional) The multiple regression maximum training block size (default 2000).
boost:	boostTYP	(Optional) The boosting strategy to use in each ArcNet layer (best, worst, support, random, none) - (default random).
boostsize:	boostSIZE	(Optional) The boosting training set size to use in each ArcNet layer (default 10000).
maxgens:	maxgens	(Optional) The ArcNet maximum generations to run for each arcnetLearn epoch. - default (3000).
mingens:	mingens	(Optional) The ArcNet minimum generations to run (without fitness improvement) for each arcnetLearn epoch. - default (30).
weights:	weights	(Optional) The switch to turn on weight evolution in Regression, and Discriminant Analysis (makes weights more accurate but probabilistic as each learning run gets different weights). - default (false).
types:	userTypesFileName	(Optional) The file name of the user defined type rule by which the learning is to be restricted (default #void).
Returns	Always returns true unless an error occurs.

Notes: In general (other than the optional arcnetFileName argument), it is best NOT to use any of the optional arguments unless you are an ARC developer and understand what you are doing in detail.
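The preprocessing rules given for the train: and test: arguments (numeric data only, labels translated to contiguous integers starting at zero, fewer than 50 million row-columns, delimiter chosen by file suffix) can be checked before a file is handed to ARC. The following Python sketch is one possible way to do that outside of ARC. All function names are hypothetical, and assigning label codes in order of first appearance is an assumption (any contiguous assignment without gaps satisfies the stated rule).

```python
MAX_ROW_COLUMNS = 50_000_000  # documented limit: rows times columns must be below this


def encode_labels(values):
    """Translate labels to contiguous integers starting at 0 with no gaps.

    Returns the integer codes plus the label dictionary to keep for yourself.
    Assumption: codes are assigned in order of first appearance.
    """
    dictionary = {}
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, dictionary


def within_row_column_limit(row_count, column_count):
    """True when rows times columns is under the documented 50 million limit."""
    return row_count * column_count < MAX_ROW_COLUMNS


def delimiter_for(file_name):
    """Comma for a .csv suffix; any other suffix is treated as tab delimited."""
    return "," if file_name.endswith(".csv") else "\t"


states = ["California", "Arizona", "Nevada", "Arizona", "California"]
codes, dictionary = encode_labels(states)
print(codes)
print(dictionary)
print(within_row_column_limit(1_000_000, 40))
print(repr(delimiter_for("TestCaseC08_Train.csv")))
```

The dictionary returned by encode_labels plays the role of the separate dictionary file mentioned above: only the integer codes go into the data supplied to ARC.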

Example

        ;; Run a single ARC training run on a classification test case.
        ;; Since no ArcNet file name is specified, a default ArcNet file will be automatically created.
        ;; Since only one fork, no repeats, and only one epoch are specified, this will be a network of only one layer (so not really a full network).
        ;; During the training run, console output will be visible.
        ;; When training is complete, these output files will appear in the working directory:
        ;; "TestCaseC08_ArcNet.sl"  - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code.
        ;; "TestCaseC08_ArcNet.js"  - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code.
        ;; "TestCaseC08_ArcNet.db"  - This is the binary version of the actual ARC Algebra Network.
        ;; "TestCaseC08_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network.
        ;; "TestCaseC08_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
        ;;
        ;;               testName           trainingFileName               testingFileName               target        forks     epochs
        (arc.arcnetLearn name:"TestCaseC08" train:"TestCaseC08_Train.csv"  test:"TestCaseC08_Test.csv"   target:nary:  forks:1.0 epochs:1)

Example

        ;; Run a single ARC training run on a regression test case.
        ;; Since no ArcNet file name is specified, a default ArcNet file will be automatically created.
        ;; Since only one fork, no repeats, and only one epoch are specified, this will be a network of only one layer (so not really a full network).
        ;; During the training run, console output will be visible.
        ;; When training is complete, these output files will appear in the working directory:
        ;; "TestCaseT05_ArcNet.sl"  - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code.
        ;; "TestCaseT05_ArcNet.js"  - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code.
        ;; "TestCaseT05_ArcNet.db"  - This is the binary version of the actual ARC Algebra Network.
        ;; "TestCaseT05_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network.
        ;; "TestCaseT05_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
        ;;
        ;;               testName           trainingFileName               testingFileName               target          forks      epochs
        (arc.arcnetLearn name:"TestCaseT05" train:"TestCaseT05_Train.csv"  test:"TestCaseT05_Test.csv"   target:numeric: forks:1.0  epochs:1)

Example

        ;; Train another four layers on a previous regression test case.
        ;; We specify the ArcNet file name produced in the previous training run.
        ;; Since only five epochs are specified, this will be a network of five layers (including the original layer).
        ;; During the training run, console output will be visible.
        ;; When training is complete, these output files will appear in the working directory:
        ;; "TestCaseT05_ArcNet.sl"  - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code.
        ;; "TestCaseT05_ArcNet.js"  - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code.
        ;; "TestCaseT05_ArcNet.db"  - This is the binary version of the actual ARC Algebra Network.
        ;; "TestCaseT05_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network.
        ;; "TestCaseT05_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
        ;;
        ;;               testName           trainingFileName               testingFileName              target          forks       epochs    ArcNetFile
        (arc.arcnetLearn name:"TestCaseT05" train:"TestCaseT05_Train.csv"  test:"TestCaseT05_Test.csv"  target:numeric: forks:1.0   epochs:5  net:"TestCaseT05_ArcNet.sl")

Example

        ;; Run a medium ARC training run on a classification test case on a computer with at least 8 cores available.
        ;; Since no ArcNet file name is specified, a default ArcNet file will be automatically created.
        ;; Since 8 forks, no repeats, and only 10 epochs are specified, this will be a network of ten layers (each layer will have 40 algebra neuron factories learning).
        ;; During the training run, console output will be visible.
        ;; When training is complete, these output files will appear in the working directory:
        ;; "TestCaseC08_ArcNet.sl"  - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code.
        ;; "TestCaseC08_ArcNet.js"  - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code.
        ;; "TestCaseC08_ArcNet.db"  - This is the binary version of the actual ARC Algebra Network.
        ;; "TestCaseC08_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network.
        ;; "TestCaseC08_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
        ;;
        ;;               testName           trainingFileName               testingFileName               target       forks     epochs
        (arc.arcnetLearn name:"TestCaseC08" train:"TestCaseC08_Train.csv"  test:"TestCaseC08_Test.csv"   target:nary: forks:8.0 epochs:10)

Example

        ;; Run a large ARC training run on a classification test case on a computer with no more than 10 cores available.
        ;; Since no ArcNet file name is specified, a default ArcNet file will be automatically created.
        ;; Since we need each layer to have 200 algebra neuron factories learning, we will need to have 3 synchronous repeats.
        ;; Since 10 forks, 3 repeats, and 20 epochs are specified, this will be a network of 20 layers (each layer will have 200 algebra neuron factories learning).
        ;; During the training run, console output will be visible.
        ;; When training is complete, these output files will appear in the working directory:
        ;; "TestCaseC08_ArcNet.sl"  - This is the ascii version of the actual ARC Algebra Network in AIS Lisp source code.
        ;; "TestCaseC08_ArcNet.js"  - This is the executable version of the actual ARC Algebra Network in AIS javaScript source code.
        ;; "TestCaseC08_ArcNet.db"  - This is the binary version of the actual ARC Algebra Network.
        ;; "TestCaseC08_ArcNet.txt" - This is the human-readable text summary of the actual ARC Algebra Network.
        ;; "TestCaseC08_Output.csv" - This is the comma delimited output file with all output columns appended on the right.
        ;;
        ;;               testName           trainingFileName               testingFileName               target       forks       epochs
        (arc.arcnetLearn name:"TestCaseC08" train:"TestCaseC08_Train.csv"  test:"TestCaseC08_Test.csv"   target:nary: forks:10.3  epochs:20)
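The arithmetic behind the forks values in these examples can be sketched in Python. This is not ARC code and the function names are hypothetical. It assumes the fraction is a single digit giving the number of repeat runs, as in the 4.2 case described for the forks: argument. It also assumes, purely as an inference from the example comments (8 forks giving 40 factories, 10 forks with 3 repeats giving 200), that the number of algebra neuron factories per layer equals the total runs per epoch times the default bases value of 5. The source does not state that formula.

```python
DEFAULT_BASES = 5  # documented default of the bases: argument


def split_forks(forks):
    """Split a forks value such as 4.2 into (synchronous forks, repeat runs).

    Assumption: the repeat count is a single fractional digit.
    """
    whole = int(forks)
    repeats = round((forks - whole) * 10)
    return whole, repeats


def runs_per_epoch(forks):
    """Total runs applied per epoch: the first pass plus each repeat pass."""
    whole, repeats = split_forks(forks)
    return whole * (1 + repeats)


def factories_per_layer(forks, bases=DEFAULT_BASES):
    """Inferred from the example comments, not stated by the documentation."""
    return runs_per_epoch(forks) * bases


for value in (1.0, 4.2, 8.0, 10.3):
    print(value, split_forks(value), runs_per_epoch(value), factories_per_layer(value))
```

Under these assumptions, forks:4.2 applies 12 runs per epoch (4 then 4 then 4), forks:8.0 gives 40 factories per layer, and forks:10.3 gives 200, matching the figures quoted in the comments.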

ARC Read Me File

The ARC system is currently undergoing heavy development. The ARC read me file provides the latest notes on changes and enhancements.
