Welcome to the Abstract Regression|Classification (ARC) README notes for ARC version 20230724
;; READ ME
;; *************************************************************************************************************************************
;;
;; Name: arcnetLearn
;;
;; Summary:
;; ARC (Abstract Regression-Classification) is a tool for automated hidden feature discovery. As such, ARC automates
;; the hidden feature discovery which is a part of every data scientist's job description. Viewed in this manner,
;; every data science prediction project begins with a (spreadsheet-like) data table with a set of well-known, independent,
;; easily obtained, highly visible input features (i.e. columns in the data table), and a single dependent target feature
;; which is to be predicted from the highly visible input features. Normally, the data scientist will select one or more
;; prediction tools (i.e. neural nets, decision trees, etc.), train them on the data, and obtain a prediction from each tool
;; with varying degrees of predictive accuracy.
;;
;; It is ARC's thesis that for every set of easily obtained, highly visible input features, there is a network of hidden features,
;; derivable via simple, easily understood computations on the visible features, such that these hidden features enhance predictive
;; accuracy for ALL of the chosen prediction tools.
;; In the case of ARC, the data scientist applies ARC to the data table. It is ARC's job to automatically examine the data, automatically
;; discover the network of hidden features, and add each hidden feature (as another column) to the data set. This is all accomplished
;; without human intervention. Once the ARC run is complete, the data scientist can then apply the chosen predictive tools (i.e. neural
;; nets, decision trees, etc.), train them on the enhanced data, and obtain far more accurate predictions from ALL of the chosen tools.
;;
;; Background:
;; Some very impressive increases in hidden feature discovery, predictive accuracy, and training ease have been achieved
;; by combining Deep Learning concepts with Genetic Programming. ARC networks take deep learning, once the exclusive domain
;; of neural networks, to a very impressive level of both accuracy and ease of training.
;;
;; ARC Deep Learning Networks are "evolved". ARC Networks are similar to feed-forward Neural Networks in some ways and different
;; in other ways. Like feed-forward neural networks, ARC networks accept multiple inputs which feed into the first "layer".
;; The outputs of each layer feed into the next layer, and the next layer, until the final layer becomes the output layer.
;; Like feed-forward neural networks, ARC networks can have an unlimited number of "layers". The big difference is that in
;; ARC networks each "layer" is a Hidden Feature easily derived from a set of simple computations on the previous layers.
;;
;; Another difference between neural networks and ARC networks is the technology used to evolve them. Neural networks are trained
;; using several different connectionist algorithms including backpropagation, counterpropagation, RProp, and others. ARC
;; networks are evolved using Genetic Programming, which is also an evolutionary technology but one of a very different nature.
;; The differences in the fundamental evolutionary technologies used to grow each type of network lead to another very
;; important difference between ARC networks and neural networks - the output formulas from each "layer".
;;
;; Both feed-forward neural network layer outputs and ARC network layer outputs are simple mathematical "weighted sums" like the
;; simple formulas used in polynomials, multiple regression, linear discriminant analysis, and/or support vectors. A typical
;; example of a neural net layer output weighted-sum formula would be:
;;
;; y = tanh(sum(23.5 + (-4.234*input1) + (129.924*input2) + ... + (-5.8*inputN)))
;;
;; As one can easily see, each input feature is multiplied only by its "weight". The sum of all the weighted inputs is then
;; run through an "activation function" - in this typical case the hyperbolic tangent function (tanh). Neural nets were inspired
;; by the way neurons in the brain signal each other to produce computations. The difference between neural network layers
;; and ARC network layers is in the RESTRICTIONS on the weighted-sum formulas produced as output from each layer. For instance,
;; let us assume that we have training data for a simple regression problem whose solution is the following simple mathematical
;; weighted-sum formula:
;;
;; y = sum(-89.572 + (.678*(square(-23.8*input2)/cos(56.24*input3))))
;;
;; The neural network to "solve" this problem will contain several layers of many intertwined tanh() formulas until this
;; regression formula's behavior is "simulated", with high accuracy, by the multiple neuron-like formulas from each layer.
;; Unfortunately, while neural net formulas are brain inspired, they are also verbose. It takes a great many interwoven
;; tanh() formulas for neural networks to simulate most regression-classification problems.
;;
;; Therein lies the difference between neural network layer output formulas and ARC network feature output formulas. Each ARC network
;; output formula is a general mathematical weighted-sum formula (not restricted to any brain-inspired or biological-inspired format).
;; So a typical ARC network for the above regression problem would be a single feature whose output formula looks very familiar:
;;
;; y = sum(-89.572 + (.678*(square(-23.8*input2)/cos(56.24*input3))))
;;
;; Yes, exactly like the original version of the regression problem. ARC network feature output formulas are "general" weighted-sum
;; formulas without any restrictions. ARC network feature output formulas are much less verbose than neural network formulas, and each
;; feature output formula is "human readable".
;;
;; There are three types of algebra neurons which can be trained/evolved/grown by ARC. These are regress(...) for numeric targets,
;; logit(...) for binary targets, and lda(...) for nary targets. Each of these neurons accepts a series of possibly nonlinear inputs
;; and may be capped by an optional user-specified activation function.
;;
;; Calling the arcnetLearn function, with the following arguments, will evolve an ARC Deep Learning Network to best fit the training
;; data you supply. Evolving an ARC Network is a one-shot process. Unlike deep learning neural networks, ARC Network training always
;; converges on a best solution in finite time (on 2019-era laptop computers, approximately several days to one week running in the
;; background, even for the toughest problems). When you notice, from the ARC console log displays, that the system is in backwardation
;; and no better solutions are forthcoming, it is time to stop the learning process. Running multiple arcnetLearn training runs will
;; not improve the final solution. One arcnetLearn training run per problem is all you need.
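;;
;; For instance, a minimal training call (a sketch only - the test and file names here are hypothetical) might look like:
;;
;;     (arc.arcnetLearn name:"MyTest" train:"MyTest_Train.csv" target:numeric: cpu:1 epochs:10)
;;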
;;
;; Args:
;; Note: Arguments come in pairs starting with argument name followed by argument value as follows:
;;
;; inputs: name:testName          - The prefix for all output files for this deep learning training-testing run.
;;         train:trainingFileName - The name of the training file.
;;         target:target          - The target column description (numeric, capgains, binary, nary, or neven).
;;         cpu:cpus               - The number of cpus to use for training - each cpu creates a synchronous training run called a "fork" (defaults to 1).
;;         repeat:repeats         - The number of repeats to use for each training - each repeat creates a serial training run called a "repeat" (defaults to 1).
;;         epochs:epochs          - The number of epochs before training halts.
;;         rowID:rowID            - (Optional) Does the data have row ids in the first column (true, false) (default false).
;;         net:arcnetFileName     - (Optional) The existing ARC network file name whose training is to be continued (true iff a network file is to be constructed from the TestName) (default is #void).
;;         rql:rqlCommand         - (Optional) The RQL search command by which the learning is to be guided (default #void).
;;         functions:functionList - (Optional) The list of functions to which learning is to be restricted, e.g. "+,-,*,/,maximum,minimum" (default is "all").
;;         features:featureList   - (Optional) The list of features to which learning is to be restricted, e.g. "x1,x4,x22,x56" (default is "all").
;;         stop:stop              - (Optional) The maximum testing score at which to halt further learning (default 0.0).
;;         fitness:fitness        - (Optional) The fitness for this training run (cep, ecep, roc, roca, mad, nlse, nmae, user, or usec) (default #void, in which case arcnetLearn matches the fitness optimally with the target choice).
;;         mainSW:mainSW          - (Optional) The main fork switch, which is false for asynchronous out-of-process forks (default true).
;;         depth:depth            - (Optional) The minimum depth of each ArcNet node (default 3).
;;         edepth:edepth          - (Optional) The extra depth of each ArcNet node (default 0).
;;         bases:bases            - (Optional) The minimum number of expressions in each ArcNet feature (default is maximum 50 for nary, 250 for numeric, 250 for capgains, and 250 for binary).
;;         ebases:ebases          - (Optional) The extra number of nodes in each ArcNet feature (default 0).
;;         regmax:size            - (Optional) The multiple regression maximum training block size (default 2000).
;;         boost:option           - (Optional) The boosting option ("none", "best", "worst", "support", "random" - default "random").
;;         boostsize:size         - (Optional) The boosting training set size (default 5000).
;;         quick:boolean          - (Optional) The ArcNet run standard strategy in quick mode switch (default false).
;;         maxhrs:count           - (Optional) The ArcNet maximum number of hours to run for each epoch (will halt evolution even if maximum generations have not been reached) (default 100).
;;         maxgens:count          - (Optional) The ArcNet maximum generations to run for each epoch (default 25).
;;         mingens:count          - (Optional) The ArcNet minimum generations to run (without fitness improvement) for each epoch (default 25).
;;         weights:weights        - (Optional) The switch to turn on weight evolution in Regression and Discriminant Analysis (makes weights more accurate but probabilistic, as each learning run gets different weights) (default true).
;;         wtmin:minvalue         - (Optional) The minimum absolute value below which a weight, AND its associated expression, will be eliminated from the expression.
;;         wtmaxgens:count        - (Optional) The maximum number of generations to train weights for each WFF candidate during training (default 100).
;;         wtendgens:count        - (Optional) The maximum number of generations to train weights for each WFF candidate during the final End Of Run period (default 5000).
;;         wtmingens:count        - (Optional) The minimum generations to train weights (without fitness improvement) for each WFF candidate (default 5000).
;;         wtrepeats:count        - (Optional) The maximum number of repetitions to train weights for each WFF candidate (default 4).
;;         act:"function"         - (Optional) The neuron activation function to cap all regress(), logit(), and lda() trained algebra neurons (default none).
;;         types:userTypesFileName - (Optional) The file name of the user defined type rules by which the learning is to be restricted (default #void).
;; Return: testingScore           - Always returns the final ArcNet testing score.
;;
;;
;; Argument Explanations and Notes:
;;
;; testName          Select a unique and "meaningful" test name. ARC produces several output files when evolving the ARC Network.
;;                   These output files will appear in your working directory and they are as follows:
;;
;;                   "testName_ArcNet.sl"      - The ascii version of the actual ARC Network in AIS Lisp source code.
;;                   "testName_ArcNet.js"      - The executable version of the actual ARC Network in AIS JavaScript source code.
;;                   "testName_ArcNet.db"      - The binary version of the actual ARC Network.
;;                   "testName_ArcNet.txt"     - A human-readable text summary of the actual ARC Network.
;;                   "testName_Champions.sl"   - An executable Directory of the current expressions and their scores after the latest training generation.
;;                   "testName_BestOfBreed.sl" - An executable Directory of the current expressions and their scores after the latest training epoch.
;;                   "testName_Output.csv"     - The comma delimited output file with all output columns appended on the right.
;;
;; trainingFileName  The name of the training file to be supplied for this ARC network training run. The training data MUST be
;;                   preprocessed before input to ARC. There can be ONLY NUMERIC data. Columns containing labels should have the labels
;;                   translated to contiguous integers starting with zero (0) and with NO gaps. For instance California = 0, Arizona = 1,
;;                   Nevada = 2, etc. Keep the California, Arizona, and Nevada labels in a separate dictionary file for yourself. Leave
;;                   only the 0, 1, 2 integers in the training data supplied to ARC. For best training results, the range of all numeric
;;                   data should be within [-10e15 to +10e15]. The maximum number of columns is 1000. The number of columns times the
;;                   number of rows (rows*columns) MUST be less than 250 million. ARC will accept tab delimited or comma delimited (.csv)
;;                   training data. If the file suffix is NOT .csv, ARC will assume that the training data is tab delimited.
;;
;; rowID             If (true) the leftmost column is a row ID (must be ONLY NUMERIC) and will NOT be included in the training.
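;;
;;                   For illustration only (a hypothetical file - the values are made up), a comma delimited training file with a
;;                   numeric row ID, two numeric features, a state label encoded as California = 0, Arizona = 1, Nevada = 2, and a
;;                   numeric target in the rightmost column might look like:
;;
;;                   1,12.5,0.03,0,118.2
;;                   2,11.9,0.07,1,95.6
;;                   3,13.1,0.01,2,130.4
;;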
;; target            If (numeric), the rightmost target column is numeric with inherent sort order (preferably in the range [-10e15 to +10e15]).
;;                   If (capgains), the rightmost target column is numeric with inherent sort order in the range from [-100% to 1000000000%].
;;                   If (binary), the rightmost target column is the integers 0 and 1 without inherent sort order, where classify errors should be proportionately distributed among all classes.
;;                   If (nary), the rightmost target column is discrete contiguous integers (0, 1, thru N) without inherent sort order, where classify errors should be proportionately distributed among all classes.
;;                   If (neven), the rightmost target column is discrete contiguous integers (0, 1, thru N) without inherent sort order, where classify errors need not be proportionately distributed among all classes.
;;                   For nary and neven targets the integers must run from 0 to N (WITH NO GAPS) for some finite N (preferably where N is 10 or less, but N can be as high as 100).
;;
;; net               The existing ARC network file name whose training is to be continued (default is #void).
;;                   If the net argument is void, then training will begin from scratch at epoch 0 and a new ARC network file will be created from scratch, using the TestName argument, as in "TestName_ArcNet.sl".
;;                   If the net argument is true, then training will be continued from the following constructed ARC network name - "TestName_ArcNet.sl".
;;                   If the net argument is a file name, then training will be continued from the specified ARC network file.
;;
;; cpu               The computer cpu resources to be used in training. Must be an integer from 1 to N indicating the number of separate
;;                   CPU machine learning runs to run simultaneously in each training epoch. Each of these will run on a separate
;;                   cpu SYNCHRONOUSLY and is called a "fork". If a multiple cpu computer is not available, this should be cpu = 1 (the default).
;;                   Note: ARC uses Genetic Programming (GP) for machine learning. GP is a statistical technology which learns with increasing
;;                   accuracy as the number of machine learning "runs" increases. The number of runs per training epoch is the number of cpu
;;                   "forks" allocated times the number of repeats (see the repeats argument). The number of cpu allocations should be less
;;                   than the number of "threads" available on your workstation's cpu (some threads will need to be reserved to run the operating system).
;;                   Memory requirements are 10G RAM to run the slimmed down Console version of Arc, while 14G RAM is required to run the
;;                   Developer's IDE version of Arc. Each additional "fork" will require one cpu "thread" and 6G RAM. Therefore ...
;;                   On a workstation with 64G RAM and with 14 threads available running the slimmed down Console version of Arc, one can reasonably run with cpu:10.
;;                   On a workstation with 128G RAM and with 32 threads available running the slimmed down Console version of Arc, one can reasonably run with cpu:20.
;;                   On a workstation with 256G RAM and with 56 threads available running the slimmed down Console version of Arc, one can reasonably run with cpu:42.
;;
;; repeats           The number of machine learning repetitions to run SERIALLY in each training epoch (default is 1).
;;                   Note: ARC uses Genetic Programming (GP) for machine learning. GP is a statistical technology which learns with increasing
;;                   accuracy as the number of machine learning "runs" increases. The number of runs per training epoch is the number of cpu
;;                   "forks" allocated times the number of repeats.
;;                   For instance, assuming that you feel 50 training runs are necessary to solve this problem with high accuracy AND
;;                   your computer has at least 50 cpus available, then cpu = 50 will suffice. However, what if your computer only has 10 cpus?
;;                   You still need to apply 50 runs per epoch (the problem hasn't gotten easier just because resources are lacking). In this case
;;                   cpu = 10 and repeats = 5. Each epoch will start with 10 synchronous runs, each on their own cpu. Then the training will
;;                   repeat 5 times, for a total of 50 runs per training epoch. It is not as fast as running 50 cpus; but it gets the job done
;;                   with the resources available.
;;
;; epochs            The number of epochs to be applied for this training.
;;                   Each ARC epoch MAY produce a new feature in the ARC Network if the epoch produces an improvement over the previous
;;                   feature. If no improvement is produced, ARC will proceed to the next epoch and try again, leaving the ARC network
;;                   unaffected. Training can be halted at any time and restarted at any time.
;;                   At some point the epochs will go into backwardation and simply not be able to produce any further improvements in
;;                   the model. When that happens, it is time to quit further training. ARC simply cannot achieve a better fit with current
;;                   technology. ARC network training is a one-shot process guaranteed to converge in finite time. Running multiple
;;                   arcnetLearn training runs will not improve the final solution. One arcnetLearn training run per problem is all you need.
;;
;; arcnetFileName    (Optional) The name of the ARC network file produced by previous training attempts ("testName_ArcNet.sl") (default #void or "");
;;                   OTHERWISE if TRUE, then a default network file is to be constructed and no training is to take place.
;;
;; rqlCommand        (Optional) The RQL search command by which the learning is to be guided (default ArcNetMin).
;;                   rql shortcut values:
;;                   RQL = "ArcNet"      - Selects a full coverage RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNet: target epoch depth bases features).
;;                   RQL = "ArcNetFix"   - Selects a fixed width complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetFix: target epoch depth bases features).
;;                   RQL = "ArcNetFnd"   - Selects a greedy RQL search for feature selection, as in the following: (arc.arcnetRQLDemo ArcNetFnd: target epoch depth bases features).
;;                   RQL = "ArcNetFull"  - Selects a full coverage RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetFull: target epoch depth bases features).
;;                   RQL = "ArcNetLib"   - Selects a complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetLib: target epoch depth bases features).
;;                   RQL = "ArcNetMax"   - Selects a complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetMax: target epoch depth bases features).
;;                   RQL = "ArcNetMin"   - Selects a simple RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetMin: target epoch depth bases features).
;;                   RQL = "ArcNetMix"   - Alternates randomly between ArcNetMin, ArcNetMax, and ArcNetMux, as in the following: (arc.arcnetRQLDemo ArcNetMix: target epoch depth bases features).
;;                   RQL = "ArcNetMux"   - Selects a complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetMux: target epoch depth bases features).
;;                   RQL = "ArcNetMxx"   - Selects a complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetMxx: target epoch depth bases features).
;;                   RQL = "ArcNetSux"   - Selects a complex RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetSux: target epoch depth bases features).
;;                   RQL = "ArcNetMath"  - Selects a very complex scientific math RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetMath: target epoch depth bases features).
;;                   RQL = "ArcNetSMath" - Selects a very complex scientific math RQL search, including hidden features, as in the following: (arc.arcnetRQLDemo ArcNetSMath: target epoch depth bases features).
;;                   RQL = "ArcENet"     - Selects an elastic net search, as in the following: (arc.arcnetRQLDemo ArcENet: target epoch depth bases features).
;; RQL = "ArcIn" - Selects a full coverage RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcIn: target epoch depth bases features). ;; RQL = "ArcInFix" - Selects a fixed width complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcNetFix: target epoch depth bases features). ;; RQL = "ArcInFull" - Selects a full coverage RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInFull: target epoch depth bases features). ;; RQL = "ArcInLib" - Selects a complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInLib: target epoch depth bases features). ;; RQL = "ArcInMax" - Selects a complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInMax: target epoch depth bases features). ;; RQL = "ArcInMin" - Selects a simple RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInMin: target epoch depth bases features). ;; RQL = "ArcInMix" - Alternates randomly between ArcInMin, ArcInMax, and ArcInMux as in the following: (arc.arcnetRQLDemo ArcInMix: target epoch depth bases features). ;; RQL = "ArcInMux" - Selects a complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInMux: target epoch depth bases features). ;; RQL = "ArcInMxx" - Selects a complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInMxx: target epoch depth bases features). ;; RQL = "ArcInSux" - Selects a complex RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInSux: target epoch depth bases features). ;; RQL = "ArcInMath" - Selects a very complex scientific math RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInMath: target epoch depth bases features). ;; RQL = "ArcInSMath" - Selects a very complex scientific math RQL search, excluding hidden features, as in the following: (arc.arcnetRQLDemo ArcInSMath: target epoch depth bases features). ;; RQL = ..else.. - Treats all other text as a user specified RQL search (for RQL programmers only). ;; ;; functionList (Optional)The list of functions to which learning is to be restricted i.e. "+,-,*,/,maximum,minimum" OR "noop,*,abs,inv,sin,cos,tan" - (default "all"). ;; functionList shorcut values: ;; functionList = #void - Selects the default function list, as in the following: (arc.generateFunctionList #void). ;; functionList = "" - Selects the default function list, as in the following: (arc.generateFunctionList ""). ;; functionList = "all" - Selects the default function list, as in the following: (arc.generateFunctionList "all"). ;; functionList = "default" - Selects the default function list, as in the following: (arc.generateFunctionList "default"). ;; functionList = "math" - Selects the major scientific function list, as in the following: (arc.generateFunctionList "math"). ;; functionList = "nomath" - Selects the minor scientific function list, as in the following: (arc.generateFunctionList "nomath"). ;; functionList = "safe" - Selects the safe function list, as in the following: (arc.generateFunctionList "safe"). ;; functionList = "fast" - Selects the fast function list, as in the following: (arc.generateFunctionList "fast"). ;; functionList = "excel" - Selects the excel function list (which can be run in Excel), as in the following: (arc.generateFunctionList "excel"). 
;; functionList = "best" - Selects the best function list, as in the following: (arc.generateFunctionList "best"). ;; functionList = "core" - Selects the core function list, as in the following: (arc.generateFunctionList "core"). ;; functionList = "unary" - Selects the unary function list, as in the following: (arc.generateFunctionList "unary"). ;; functionList = ..else.. - The exact list of functions to use, as in the following: "noop,+,-,cos,tan". ;; ;; featureList (Optional)The list of features to which learning is to be restricted i.e. "x1,x4,x22,x56" - (default "all"). ;; Note: the wild card $vvi$ will be replaced with the Champion Feature list in the previous arcnetLayer record. ;; Note: the wild card $HL$ will be replaced with the previous arcnetLayer name. ;; ;; stop (Optional)The maximum testing score at which to halt further learning - (default 0.0). ;; ;; fitness (Optional)The fitness for this training run (cep, ecep, roc, roca, nmae, nlse, mad, user, or usec) - default (#void). ;; ;; mainSW (Optional)The main fork switch (false) for all asynchronous forks and (true) for the main fork - (default true). ;; ;; depth (Optional)The minimum RQL expression depth of each weighted formula basis function - (default 3). ;; ;; edepth (Optional)The extra RQL expression depth of each weighted formula basis function - (default 0). ;; ;; bases (Optional)The minimum number RQL weighted formula basis functions in each ArcNet feature (default is maximum 50 for nary, 250 for numeric, capgains, and binary). ;; ;; ebases (Optional)The extra RQL weighted formula basis functions in each ArcNet feature (default 0). ;; ;; regmax (Optional)The multiple regression maximum training block size (default 2000). ;; ;; boost (Optional)The boosting option ("none", "best", "worst", "support", "random" - default "random"). ;; ;; boostsize (Optional)The boosting training set size (default 5000). ;; ;; quick (Optional)The ArcNet run standard strategy in quicl mode switch - default (false). ;; ;; maxhrs:count (Optional)The ArcNet maximum number of hours to run for each epoch (will halt evolution even if maximum generations have not been reached) - default (100). ;; ;; maxgens (Optional)The ArcNet maximum generations to run for each arcnetLearn epoch. - default (25). ;; ;; mingens (Optional)The ArcNet minimum generations to run (without fitness improvement) for arcnetLearn epoch. - default (25). ;; ;; weights (Optional)The switch to turn on weight evolution in Regression, and Discriminant Analysis (makes weights more accurate but probabalistic as each learning run gets different weights) - default (true). ;; ;; wtmin (Optional)The minimum absolute value below which a weight, AND its associated expression, will be eliminated from the expression. ;; ;; wtmaxgens (Optional)The maximum number of generations to train weights for each WFF candidate. - default (100). ;; ;; wtendgens (Optional)The maximum number of generations to train weights for each WFF candidate during the final End Of Run training period. - default (5000). ;; ;; wtmingens (Optional)The minimum generations to train weights (without fitness improvement) for each WFF candidate. - default (5000). ;; ;; wtrepeats (Optional)The maximum number of repititions to train weights for each WFF candidate. - default (4). ;; ;; act (Optional)The neuron activation function to cap all regress(), logit(), and lda() trained algebra neurons. - default (none). 
;;
;; ***********************************************
;; ***********************************************
;; Notes for Beginner ARC Programmers:
;; ***********************************************
;; ***********************************************
;;
;; OVERVIEW
;;
;; The Algebraic Regression Classification (ARC) tool is an evolutionary machine learning tool for evolving complex white box algebraic formulas which seek to "explain" or "predict"
;; a single "target" variable which is said to be "dependent" upon a row of other variables called "features". For every single target variable there is a row of features which
;; hopefully can be used to predict the target variable using the formula evolved by the ARC tool.
;;
;; ARC is "trained" by inputting a .csv file of numbers with rows of features in the leftmost columns and a target variable in the rightmost column. The first column may or may not be
;; a row identifier not used in the training. The maximum number of feature columns allowed is 1000. The maximum number of rows times columns allowed is 250,000,000. Each feature should
;; be a number in the range [-10e15 to 10e15]. The target may be "numeric" floating point numbers in the range [-10e15 to 10e15], or "capgains" floating point numbers in the range
;; [-100% to 1000000000%], or "binary" integer only values of [0 or 1], or "nary" integer only with values in the range [0 1 2 3 ...].
;;
;; ARC uses the rows of input features to find patterns which explain each associated target feature and evolves a complex white box algebraic formula which predicts the target variable
;; from the input features. Each training run is given a "TestName" by the user. The evolved algebraic formula is output in several ways.
;;
;; TestName_ArcNet.js   An AIS JavaScript source file containing the evolved formula as a stand alone AIS JavaScript program.
;; TestName_ArcNet.sl   An AIS Lisp source file containing the evolved formula as a stand alone AIS Lisp Network file.
;; TestName_Output.csv  A .csv file containing the original input features, the original target variable, and the predicted target variable in each row.
;; TestName_ArcNet.txt  An ASCII text file containing an analysis of the ARC training run and the evolved formula.
;;
;; ARC is an evolutionary machine learning system employing Genetic Programming and Swarm Intelligence to evolve its formulas. ARC is inherently probabilistic, meaning each training run
;; evolves a different answer. Getting optimal results from your ARC training run requires repeated training runs and collecting the best result from all of the repeated training runs.
;; ARC allows "forks" for repeating training runs on multiple CPUs within your computer simultaneously. If your computer has only one CPU, ARC allows "repeats" for repeating training runs
;; on your computer's single CPU serially. After running all of the forks and all of the repeats, ARC will output the best formula discovered. More runs will increase the training accuracy.
;;
;; REPEATS, FORKS, & ISLANDS
;;
;; The following ARC command runs twenty runs on the standard test suite file "TestCaseT45". It will utilize 20 CPUs on your current computer. Each fork will run simultaneously,
;; requiring approximately 8G of memory per fork, and the whole process will require approximately 160G of main memory.
;;
;; (arc.arcnetLearn name:"TestCaseT45" train:"TestCaseT45_Train.csv" target:numeric: cpu:20 repeats:1 epochs:1 bases:25 depth:4)
;;
;; Alternatively, supposing your current computer only has 5 CPUs and 50G of main memory available, the following ARC command runs four repeats on the standard test suite file "TestCaseT45".
;; It will utilize 5 CPUs on your current computer plus 40G of main memory, and will take 4 times as long as the previous command.
;;
;; (arc.arcnetLearn name:"TestCaseT45" train:"TestCaseT45_Train.csv" target:numeric: cpu:5 repeats:4 epochs:1 bases:25 depth:4)
;;
;;
;; EPOCHS, LAYERS & NETWORKS
;;
;; The following ARC command runs twenty forks on the standard test suite file "TestCaseT45" but builds an algebra network of up to 10 layers. It will utilize 20 CPUs on your current computer.
;; Each fork will run simultaneously, and the whole process will require about 160G of main memory. Since there are 10 epochs requested, it will take 10 times as long as the first
;; command but will be more accurate - potentially.
;;
;; (arc.arcnetLearn name:"TestCaseT45" train:"TestCaseT45_Train.csv" target:numeric: cpu:20 epochs:10 bases:25 depth:4)
;;
;; After completing each epoch, the above command will save the trained network up to the completed epoch (network feature). So if the computer goes down or needs to be halted, one can
;; resume training at any epoch boundary with the following command.
;;
;; (arc.arcnetLearn name:"TestCaseT45" train:"TestCaseT45_Train.csv" target:numeric: cpu:20 epochs:10 bases:25 depth:4 net:"TestCaseT45_ArcNet.sl")
;;
;; If one prefers to vary the number of bases in each algebra network feature to increase the network variation, the following command will vary the bases between 5 and 25 at random for
;; each feature, and will vary the node depth between 2 and 5 for each feature.
;;
;; (arc.arcnetLearn name:"TestCaseT45" train:"TestCaseT45_Train.csv" target:numeric: cpu:20 epochs:10 bases:5 ebases:20 depth:2 edepth:3 rql:"ArcNet")
;;
;; ARC can be used to run a trained Arc network file ("testName_ArcNet.sl") to predict the output column on an input data set of the same format as the original training and testing data,
;; as accomplished with the arcnetRun function.
;;
;; summary: Runs a trained ARC network file on the specified input data file and produces a predictive output file.
;;
;; Args:
;; Note: Arguments come in pairs starting with argument name followed by argument value as follows:
;;
;; inputs: net:arcnetFileName - The existing ARC network file name whose training is to be used to predict.
;;         in:inputFileName   - The name of the input file from which predictions are to be made.
;;         out:outputFileName - The name of the output file containing the predictions.
;; Return: true               - Always returns true.
;; Example:
;; (arc.arcnetRun net:"MyTrained_ArcNet.sl" in:"inputFileName.csv" out:"outputFileName.csv")
;;
;; ARC currently defaults to simple feed-forward ARC networks which it writes to the Lisp source file ("testName_ArcNet.sl").
;; More complex convoluted networks are supported by editing the ("testName_ArcNet.sl") before starting the machine learning run.
;; A default Arc Network file ("testName_ArcNet.sl"), for later editing, can be obtained with the following function:
;;
;; (arc.arcnetDefineNet testName trainingFileName target fitness epochs depth edepth bases ebases boost functionList featureList)
;;
;; ***********************************************
;; ***********************************************
;; Notes for Advanced RQL Programmers:
;; ***********************************************
;; ***********************************************
;;
;; =========================================
;; Regression Query Language (RQL) Overview:
;; =========================================
;;
;; Regression Query Language (RQL) is a high level Symbolic Regression search language, and consists of one or more search clauses
;; which together make up a symbolic regression request. There are three possible formats for a search clause as follows:
;;
;; search book['search','search',...,'search']
;;
;; search lib[name:search]
;;
;; search island trainer(expression) where {config(...) fitness(nlse) op(noop,+,*) ...etc...}
;;
;; There can be one and only one search clause of the format
;;
;; search book['search','search',...,'search']
;;
;; and this creates a book of queued RQL search specifications for use whenever there is space to create a new independent search island.
;;
;;
;; There can be multiple search clauses of the format
;;
;; search lib[name:search]
;;
;; Each of these defines a named search specification to be added to the RQL Library for use whenever there is a demand for the specific
;; named RQL search.
;;
;; *Notes for lib[] clauses -- lib[name:search]
;; Library clauses are island search specifications which become active anytime an epoch, pareto, or smart breeder completes its current search and requires a new search by name.
;; The Library search specifications MUST contain a where clause.
;; The Library search clause may contain substitution characters such as $c0$, $v1$, $f2$, $w1$, $w[n,m]$, $a$, $b$, which replace the values of those variables into the search string.
;; The Library search clause may contain substitution characters such as %c0%, %w1%, %a%, %b%, which replace the signs ("+" or "-") of those variables into the search string.
;;
;;
;; There can be multiple search clauses of the format
;;
;; search island trainer(expression) where {config(...) fitness(nlse) op(noop,+,*) ...etc...}
;;
;; Each of these represents an independent evolutionary "island" in which a separate symbolic regression search is performed.
;;
;; It is assumed that the best (most fit) champions, ever seen, from each independent search island will be accumulated into a national
;; champion island which holds the final list of champions from which the best champion will become the answer to the entire search process.
;;
;; Every search island must have a goal. The search goal is composed of a trainer declaration enclosing an abstract expression list of
;; some kind. The search goal specifies the regression|classification trainer to be used and the abstract expression lists to be searched.
;; If time permitted, ARC would perform the specified strategy on every single concrete expression implied from the abstract expression
;; list, returning the most fit concrete expression, for the specified trainer, as the answer to the search.
;; Of course, for most search goals there is not enough time in the universe to brute force search every possibility.
;; So ARC uses its special search heuristics to search as many possibilities as time permits, returning the best candidate as the answer to the search.
;;
;; For example, a common goal is regress(universal(3,1,t)) which searches all single (1) regression champions from all possible basis
;; functions of depth (3) where the terminals (t) are either variables (containing features) or abstract constants (containing real numbers).
;;
;; Another search goal example might be lda(f0(v0,f1(v1,c0))) which searches for all possible linear discriminants on a function with two
;; arguments where the second argument is also a function with two arguments, the second of which is a constant. The abstract function
;; variables f0 thru fk are meant to contain one concrete function unless otherwise constrained. The abstract feature variables
;; v0 thru vj are meant to contain one concrete feature from the set x0 thru xm unless otherwise constrained. The abstract constant
;; variables c0 thru ci are meant to contain one real number unless otherwise constrained.
;;
;; The constraints, located anywhere after the where keyword, are in the form of limitations on variable and function variable coverage
;; such as f0(cos,sin,tan,tanh) or v0(x0,x3,x10) or c0(3.45).
;;
;;
;; ==============================
;; Very Brief RQL Syntax Outline:
;; ==============================
;;
;; *comments
;; // ...comments ...                             ;; Optional comment lines begin with the double slash characters
;;
;; *context advisories
;; context fitness(....)                          ;; Optional keyword advisory indicating the default fitness for the entire search (default is first fitness mentioned in any island).
;; context stop(....)                             ;; Optional keyword advisory indicating the default epoch stop for the entire search (default is 50).
;; context regmax(....)                           ;; Optional keyword advisory indicating the maximum regression data row size for the entire search (default is 2000).
;;
;; *search RQL Play Book syntax
;; search book['search','search',...,'search']    ;; Defines the play book of alternate island search specifications which MUST include where clauses in each play book search specification
;;
;; *search RQL Library syntax
;; search lib[name:search]                        ;; Defines a named RQL search specification in the RQL Library which MUST include where clauses in each library search specification
;;
;; *search island syntax
;; search island trainer(..expressions..) where {..constraints..};  ;; Must be next clause after search keyword (or first clause if search keyword omitted)
;; search trainer(..expressions..) where {..constraints..};         ;; Must be next clause after search keyword (or first clause if search keyword omitted)
;; island trainer(..expressions..) where {..constraints..};         ;; Must be next clause after search keyword (or first clause if search keyword omitted)
;;
;; *General modeling (not a neuron) search trainer declarations (General experimental algebra expression regression|classification formulas)
;; model(...)   ;; General abstract as-is basis function evaluation and regression or classification fitness scoring
;;
;; *Regression neuron search trainer declarations (Specialized algebra regression neurons will be capped by the user specified activation function)
;; regress(...) ;; Single, multiple, and ridge regression neuron declaration (with computed axis alpha and beta constants)
;;
;; *Classification neuron search trainer declarations (Specialized algebra classification neurons will be capped by the user specified activation function)
;; logit(...)   ;; Logit regression binary classification neuron declaration (axis alpha and multiple coefficient constants computed for the single multivariate discriminant function)
;; lda(...)     ;; Multivariate nonlinear discriminant function multi-class classification neuron declaration (axis alpha and multiple coefficient constants computed for the single multivariate discriminant function)
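;;
;; For example (a sketch assembled from the clauses documented in this README), a complete minimal search island pairing the
;; regress() trainer with a universal() goal and a simple where clause might read:
;;
;; search regress(universal(3,1,t))
;; where {fitness(nlse) op(noop,+,-,*,/) vv(x0,x1,x2)}
;;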
;;
;; ...some goal expression syntax examples as follows
;; regress(f0(v0,c0))                                       ;; Single regression with hand coded goal formula basis function expression
;; regress(f0(v0,c0),f1(v1,c1))                             ;; Multiple regression with hand coded goal formula basis function expressions
;; regress(univariate(node-depth,base-functions,v|t|x|xn))  ;; Single regression with machine generated univariate goal formula basis function expression
;; regress(universal(node-depth,base-functions,v|t|x|xn))   ;; Single/Multiple regression with machine generated universal goal formula basis function expression
;; regress(weighted(node-depth,base-functions,n|h|s))       ;; Single/Multiple regression with machine generated percent weighted goal formula basis function expression
;; model(f0(v0,c0))                                         ;; Single regression or classification with hand coded goal formula basis function expression (with no axis constant)
;; lda(f0(v0,c0),f1(v1,c1),f2(v2,c2))                       ;; Multi-class nonlinear classification neuron declaration (axis alpha and beta constants computed for each discriminant function).
;; logit(f0(v0,v1))                                         ;; Binary classification between two categories (0,1) using two non-linear discriminant functions.
;;
;;
;; *expressions
;; x0 thru xm                                 ;; Concrete features (independent variables in the regression)
;; y                                          ;; Concrete feature (dependent variable in the regression)
;; 39.261                                     ;; Concrete real number constant
;; +,-,*,^-                                   ;; Concrete arithmetic binary operators
;; /,^/                                       ;; Modified arithmetic binary operators
;; <,<=,==,!=,>=,>                            ;; Concrete relational binary operators
;; abs,square,cube,quart,exp,curoot           ;; Concrete arithmetic unary function operators
;; inv,ln,sqroot,quroot                       ;; Modified arithmetic unary function operators
;; noop,binary,sig,sign,argmax                ;; Concrete special function operators
;; cos,sin,tan,tanh                           ;; Concrete trigonometric unary function operators
;; average,maximum,minimum,product,summarize  ;; Concrete aggregate function operators
;; psqrt,psquare,pcube,pquart                 ;; Modified binary polynomial operators
;; lif,lor,land                               ;; Special logical operators
;; ? :                                        ;; (Deprecated) Conditional operators with syntax of ((v0 (minimum v0 x2 v2)
;; :                                          ;; (Deprecated) Concrete special right pass through function operator
;; v0 thru vj                                 ;; Abstract feature variables (each variable contains one of x0 thru xm unless otherwise constrained)
;; f0 thru fk                                 ;; Abstract function variables (each variable contains one of the concrete functions unless otherwise constrained)
;; r0 thru rn                                 ;; Abstract relational function variables (each variable contains one of the concrete relational functions unless otherwise constrained)
;; c0 thru ci                                 ;; Abstract constant variables (each variable contains one concrete real number unless otherwise constrained)
;; t0 thru tl                                 ;; Abstract term variables (each variable contains v or c unless otherwise constrained)
;;
;; *Where clause
;; where {}                                   ;; Each where clause initiates its own island population within a search clause
;; champion(strategy,popsize,poolsize,geneticOps,constantOps,reduce,finalWeight,mvlMinWeight,burstOps)  ;; Defines National island specifications
;; champion(noop,100)                         ;; Defines the number of champion survivors (default is 10).
;; config(breeder,strategy,popsize,poolsize,serialOps,geneticOps,constantOps,epochGens,burstOps,evolveMutateConstants,burstConstants,epochRandomOps,randomOps)
;;                                            ;; Island specifications, default ==> config(smart,standard,10,25,0,10,25) (Note evolveMutateConstants = "nc" if NO constants are to be serialized or evolved outside of burst mode, and "ec" if burst constants ARE to be serialized and evolved after burst) (Note burstConstants = "nc" if NO constants are to be burst optimized, and "bc" if ALL constants are to be optimized in burst mode)
;; fitness(fitnessChoice,maxErr)              ;; Defines the fitness measure for the search in the specified island, default ==> nlse
;; op(noop,+,-,*,/)                           ;; Defines function choice specifications, default ==> op(noop,+,-,*,/,^-,^/,abs,binary,cube,exp,inv,ln,quart,sig,sign,sqroot,square,cos,sin,tan,tanh,average,maximum,minimum,product,summarize,pcube,psqrt,psquare,pquart)
;; rop(<,>=,!=)                               ;; Defines relational function choice specifications, default ==> rop(<,<=,==,!=,>=,>)
;; vv(x3,x4,x10)                              ;; Defines variable choice specifications
;; vvi(x3,x4,x10)                             ;; Defines abstract variable initialization specifications
;; f0(+,cos,square)                           ;; Constrains choices for a specific function slot
;; ffi(+,cos,square)                          ;; Defines abstract function initialization specifications
;; r0(==,<=)                                  ;; Constrains choices for a specific relational function slot
;; v0(x1,x3,x5)                               ;; Constrains choices for a specific variable slot
;; bc(c1,c5,c6)                               ;; Constrains which constant genes receive burst evolution at birth
;; cc(-125.0,125.0,.01,-1.0,1.0,1.3)          ;; Constrains all unspecified constants to the specified low and high value range, serial search increment, random initialize low and high range, and initial value (low,high,serialInc,initLow,initHigh,initValue). WARNING: each entry must be an Integer or Number constant or numeric JavaScript expression.
;; c0(-125.0,125.0,.01,-1.0,1.0,1.3)          ;; Constrains the specific constant to the specified low and high value range, serial search increment, random initialize low and high range, and initial value (low,high,serialInc,initLow,initHigh,initValue). WARNING: each entry must be an Integer or Number constant or numeric JavaScript expression.
;; c0(3.159)                                  ;; Constrains the specific constant to the specified initial value with default low and high value range, serial search increment, and random initialize low and high range. WARNING: each entry must be an Integer or Number constant or numeric JavaScript expression.
;; cci(23.5,-2.9,106.0)                       ;; Defines abstract constant initialization specifications
;; eb(b1,b5,b6)                               ;; Constrains which basis functions evolve
;; ef(f1,f5,f6,b2)                            ;; Constrains which function genes evolve (b2 turns on all genes in basis function 2)
;; ev(v1,v5,v6,b2)                            ;; Constrains which variable genes evolve (b2 turns on all genes in basis function 2)
;; ec(c1,c5,c6,b2)                            ;; Constrains which constant genes evolve (b2 turns on all genes in basis function 2)
;; et(t1,t5,t6,b2)                            ;; Constrains which term genes evolve (b2 turns on all genes in basis function 2)
;; cat(x1,x3,x5)                              ;; Declares the specified concrete features to be categorical independent variables
;; tti(0,1,1,1)                               ;; Defines abstract term initialization specifications
;; delay(gen)                                 ;; Causes the breeder to delay processing until AFTER the specified generation (allows gradual evolutionary load initialization)
;; link(label,libKey)                         ;; Causes the breeder, every new generation, to receive the top champion from the specified islands 'label', and convert that champion according to the specified library template 'libKey' - templates MAY contain a where clause.
;; isolate(true)                              ;; If true, champions from this island are NOT to be promoted to the national island
;; forest(cart)                               ;; If present, all genome basis functions generated by the cart() macro will be optimized, prior to scoring, by the cart splitting algorithm, NO categoric variables, NO global cart initialization.
;; forest(cart,cat)                           ;; If present, all genome basis functions generated by the cart() macro will be optimized, prior to scoring, by the cart splitting algorithm, YES categoric variables, NO global cart initialization.
;; forest(cart,cat,init)                      ;; If present, all genome basis functions generated by the cart() macro will be optimized, prior to scoring, by the cart splitting algorithm, YES categoric variables, YES global cart initialization.
;; kernel(kernelID)                           ;; The kernel ID of the support vector machine kernel for this island (binary bipolar composite cosine cube euclid exp linear log poly quart quint radialBasisKernel sigmoid sine square tan tanh)
;; name(label,label,...,label)                ;; The current island is given the specified names (islands may have multiple names and names need not be unique)
;; reduce(true)                               ;; If true, final champions from the current island are subjected to reduction mutation
;; reset(false)                               ;; If false, champions from the current island are NOT erased BEFORE the new play book search is started
;; type(typeName,'feature:expression',typeName,'feature',...)  ;; Declares and forces type checking for all valid candidates i.e. type(widget,'x1;x2;x3;(widget+widget);(widget-widget);(widget*widget);(widget/widget)',screw,'x11;x12;x13;(screw+screw);(screw-screw);(screw*screw);(screw/screw)')
;; type('filename')                           ;; Declares and forces type checking, with the type rules loaded from the named file, for all valid candidates i.e. type(widget,'x1;x2;x3;(widget+widget);(widget-widget);(widget*widget);(widget/widget)',screw,'x11;x12;x13;(screw+screw);(screw-screw);(screw*screw);(screw/screw)')
;; accept(typeName)                           ;; Declares the top level types which are acceptable to the typing system OR, if missing, all valid types are acceptable at the top level.
;; onfinal('search','search',...,'search')    ;; Defines the onfinal conversion search templates to initiate on this Lambda just before the final island closing is performed - search templates must NOT contain a where clause.
;; ...onfinal special search: 'set(name,..text..)'  ;; Defines the onfinal conversion special search template 'set(name,..text..)' which adds the specified ..text.. under the specified name to the RQL Library - this library search MAY contain any useful text.
;; ...onfinal special search: 'play(name)'          ;; Defines the onfinal conversion special search template 'play(name)' which adds the named RQL Library search to the play book - this library search MAY contain a where clause.
;; ...onfinal special search: 'log(...text...)'     ;; Defines the onfinal conversion special search template 'log(...text...)' which displays the text, after wild card substitution, to the ARC search log.
;; ...onfinal special search: 'out(filename)'       ;; Defines the onfinal conversion special search template 'out(filename)' which writes out the transformed champion basis function values and the dependent variable under the file names "filename_Train.csv" and "filename_Test.csv".
;; onscore(score,maxgens,'search',...,'search')     ;; Defines the onscore search templates to initiate in this island if the specified fitness score is achieved by the current search - 'search' templates need NOT be included, BUT if 'search' templates are included then they MUST contain a where clause.
;; stepwise(maxgens,'search',...,'search')          ;; Defines the stepwise search templates to initiate in this island after the specified maximum generations or when the island terminates the current search - search templates MAY contain a where clause.
;; weight(0.0,1.0,1.0,0.0)                    ;; Defines the weights to use with the wcep fitness measure, i.e. weight(Est0_Act0,Est0_Act1,Est1_Act0,Est1_Act1)
;; weight(0,.0001,.01,1.0,.001,.001)          ;; Defines the weights to use with the gini fitness measure (the 0 class is good); the other five are the weights for gini, ks, ecep, badGoodratio, and goodBadratio.
;; weight(1,.0001,.01,1.0,.001,.001)          ;; Defines the weights to use with the gini fitness measure (the 1 class is good); the other five are the weights for gini, ks, ecep, badGoodratio, and goodBadratio.
;; seed('filename')                           ;; Defines a seed file (from a previous ARC training run) whose champions are to initiate this island at the start of the training run
;; gp(5,concrete)                             ;; (Deprecated) Defines genetic programming specifications (concrete, constants, features, or abstract)
;; ut(5)                                      ;; (Deprecated) Defines universal programming specifications
;;
;; *User Specified Types in the where clause
;;
;; RQL supports the declaration of user specified type rules. Violation of the user specified type rules prevents the
;; candidate Lambda from being scored.
;;
;; For example, consider the following ARC estimator type decoration, with the specified user defined type rules ...
;;
;; search regress(universal(1,1,t))
;; where {fitness(nmae)
;;        config(pareto,standard,256,25,00,10,10,100,10,ec,bc,10,2)
;;        champion(standard,10,25,5,5,reduce,0.0000000001,0.0000000001)
;;        op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<=,>=,lif,lor,land)
;;        onscore(0.0,160)
;;        type(
;;             widget,'x1:x2:x3:(widget+widget):(widget-widget):(widget*widget):(widget/widget):max(widget,widget)',
;;             widget,'(widget+Number):(widget-Number):(widget*Number):(widget/Number):max(widget,Number)',
;;             widget,'(Number+widget):(Number-widget):(Number*widget):(Number/widget):max(Number,widget)',
;;             screw,'x0:x4:x5:x6:x7:(screw+screw):(screw-screw):(screw*screw):(screw/screw):max(screw,screw)',
;;             screw,'(screw+Number):(screw-Number):(screw*Number):(screw/Number):max(screw,Number)',
;;             screw,'(Number+screw):(Number-screw):(Number*screw):(Number/screw):max(Number,screw)',
;;             screwsPerWidget,'(screw/widget):(screwsPerWidget+screwsPerWidget):(screwsPerWidget-screwsPerWidget)',
;;             screwsPerWidget,'(screwsPerWidget*screwsPerWidget):(screwsPerWidget/screwsPerWidget):max(screwsPerWidget,screwsPerWidget)',
;;             Number,'(Number+Number):(Number-Number):(Number*Number):(Number/Number):max(Number,Number)'
;;             )
;;        accept(widget,screwsPerWidget)
;;        }
;;
;; For instance, the above user specified type rules will prevent any candidate containing (x1*x4) anywhere in its
;; estimator formula from being scored. The reason is that (x1*x4) will return a type result of (widget*screw) and there is no rule
;; allowing a widget to be multiplied by a screw. So the Lambda will not be scored.
;;
;; The predefined type Number is used for real constants, and rules for Number must be included if constants are to mix with any user type.
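;;
;; Conversely (continuing the same hypothetical rules), a candidate containing (x0/x1) WOULD be scored: x0 is declared a screw and
;; x1 a widget, so (x0/x1) returns the type result (screw/widget), which matches the screwsPerWidget rule, and screwsPerWidget is
;; listed in the accept clause.
;;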
;;
;; Type rule shortcuts allow a single rule to cover many similar operators:
;;
;; Abstract or Concrete constants (i.e. c3 or 45.23)  := Number
;; Abstract or Concrete features (i.e. x2 or v1)      := the type specified for the concrete feature
;; Returns false if ANY argument is false             := lda logit model regress
;;
;; The following operator type combinations look for the following type rules:
;; Operator Type Combo          Type Rules Looked For
;; ------------------------     -------------------------------------------
;; (type1+type2)                := (type1+type2) OR (type2+type1)
;; (type1-type2)                := (type1+type2) OR (type2+type1)
;; (type1^-type2)               := (type1+type2) OR (type2+type1)
;; (type1*type2)                := (type1*type2) OR (type2*type1)
;; (type1/type2)                := (type1/type2) OR (type2^/type1)
;; (type1^/type2)               := (type2/type1) OR (type1^/type2)
;; (type1==type2)               := (type1==type2) OR (type2==type1)
;; (type1<=type2)               := (type1==type2) OR (type2==type1)
;; (type1!=type2)               := (type1==type2) OR (type2==type1)
;; (type1>type2)                := (type1==type2) OR (type2==type1)
;; abs(type)                    := abs(type)
;; average(type1,type2)         := (type1+type2) OR (type2+type1)
;; binary(type)                 := binary(type)
;; cos(type)                    := sin(type)
;; cube(type)                   := (type*type)
;; curoot(type)                 := (type*type)
;; exp(type)                    := exp(type)
;; inv(type)                    := inv(type)
;; land(type1,type2)            := (type1==type2) OR (type2==type1)
;; lif(type1,type2,type3)       := lif(type1,type2,type3)
;; ln(type)                     := ln(type)
;; lor(type1,type2)             := (type1==type2) OR (type2==type1)
;; maximum(type1,type2)         := maximum(type1,type2) OR maximum(type2,type1)
;; minimum(type1,type2)         := maximum(type1,type2) OR maximum(type2,type1)
;; pcube(type1,type2)           := (type1*type2) OR (type2*type1)
;; power(type1,type2)           := (type1*type2) OR (type2*type1)
;; pquart(type1,type2)          := (type1*type2) OR (type2*type1)
;; product(type1,type2)         := (type1*type2) OR (type2*type1)
;; psquare(type1,type2)         := (type1*type2) OR (type2*type1)
;; psqrt(type1,type2)           := (type1*type2) OR (type2*type1)
;; quart(type)                  := (type*type)
;; quroot(type)                 := (type*type)
;; sig(type)                    := sig(type)
;; sin(type)                    := sin(type)
;; square(type)                 := (type*type)
;; sqroot(type)                 := (type*type)
;; summarize(type1,type2)       := (type1+type2) OR (type2+type1)
;; tan(type)                    := sin(type)
;; tanh(type)                   := sin(type)
;;
;; The accept clause tells the type system that only widget and screwsPerWidget types will be accepted at the top level.
;; If NO accept clause is specified then all valid types are accepted at the top level.
;; An accept clause without a type clause has no effect whatsoever.
;;
;; *Notes for fitness clauses -- fitness(fitnessChoice)
;; The available fitness choices are as follows:
;; cep   ;; Compute the M-Class classification percentage of FAILED exact matches (Classification Error Percent - 0% is perfect).
;; ecep  ;; Compute the M-Class classification percentage of equally distributed FAILED exact matches (square root of average of squares of error percent for each class).
;; mad   ;; Compute the Market Accounting Device error for an estimator wff over the specified training data set.
;; nlse  ;; Compute the regression normalized least squared error for an estimator wff over the specified training data set (default).
;; nmae  ;; Compute the regression normalized mean absolute error for an estimator wff over the specified training data set.
;; roc   ;; Compute the receiver operating characteristic (ROC) for an estimator wff over the specified training data set (Good is class 1, Bad is class 0).
;; roca
;;      Compute the receiver operating characteristic (ROCA) for an estimator wff over the specified training data set (good is class 0, bad is class 1).
;; user
;;      Compute the user supplied regression statistic (USER) for an estimator wff over the specified training data set.
;; usec
;;      Compute the user supplied classification statistic (USER) for an estimator wff over the specified training data set.
;;
;; *Notes for config clauses -- config(breeder,strategy,popsize,poolsize,serialOps,geneticOps,constantOps,epochGens,burstOps,evolveMutateConstants,burstConstants,epochRandomOps,randomOps)
;; Each island represents a separate independent island search request
;; The final answer is the best champion from any of the independent islands
;; The available breeders are as follows:
;; aged
;;      Breeder for age layered for one epoch (until all serially possible iterations are complete) then halt search within its island population
;;      -- recognizes link, playbook, onfinal, stepwise, and onscore events plus reduce mutations.
;; collect
;;      Breeder for collecting champions with no renewal of the island population (the National island always uses the collect breeder).
;; cyclic
;;      Breeder for elite champions for one epoch (until "epochGens" generations without improvement) then renewal of its island population
;;      and cyclic repeat evolution -- does NOT recognize link, playbook, stepwise, or onscore events.
;; epoch
;;      Breeder for elite champions for one epoch (until "epochGens" generations without improvement) then halt search within its island population
;;      -- recognizes playbook but NOT link, stepwise, or onscore events (resets pop size ONLY after end of generation).
;; elite
;;      Breeder for elite champions with no renewal of the island population -- does NOT recognize link, playbook, stepwise, or onscore events.
;; noop
;;      Breeder for no operation with no renewal of the island population (Empty islands always use the noop breeder - does recognize playbook).
;; pareto
;;      Breeder for pareto front for one epoch (until all serially possible iterations are complete) then halt search within its island population
;;      -- recognizes link, playbook, onfinal, stepwise, and onscore events plus reduce mutations.
;; smart
;;      Breeder for elite champions for one epoch (until all serially possible iterations are complete) then halt search within its island population
;;      -- recognizes link, playbook, stepwise, onfinal, and onscore events plus reduce mutations (resets pop size IMMEDIATELY after every insert).
;; The available strategies are as follows:
;; noop
;;      Strategy population operator with no population operators of any kind (used by all empty islands).
;; burst
;;      Strategy population operator which incorporates reasonable levels of serial, mutation, crossover, constant swarm, and limits the number of candidates randomly initialized at each generation.
;; deep
;;      Strategy population operator which uses myPopulationNotes and myPopulationState to apply heavy levels of mutation, crossover, discrete differential, serial, greedy, and constant swarm.
;; semantic
;;      Strategy population operator which incorporates reasonable levels of serial, mutation, semantic crossover, and constant swarm.
;; serial
;;      Strategy population operator which incorporates serial search, and swarm intelligence.
;; standard
;;      Strategy population operator which incorporates reasonable levels of serial, mutation, crossover, and constant swarm.
;; swarm
;;      Strategy population operator which incorporates concentrated swarm only.
;; The other config integer arguments are as follows:
;; popsize
;;      The maximum count of Lambda survivors in the island after trimming.
;; poolsize
;;      The maximum count of constants inside each survivor Lambda after trimming.
;; serialOps
;;      The count of serial iterative population operations in each generation.
;; geneticOps
;;      The count of genetic algorithm population operations in each generation.
;; constantOps
;;      The count of swarm algorithm constant pool operations in each generation.
;; epochGens
;;      The count of generations without improvement which define the end of an Epoch (for breeders cyclic, epoch, pareto, and smart only).
;; burstOps
;;      The count of constant operations to apply to burst constants.
;; evolveMutateConstants
;;      If "nc" then burst constants are NOT to be evolved after burst mode during main generation evolution, otherwise if "ec" or "" then burst constants are to be evolved after burst mode during main generation evolution.
;; burstConstants
;;      If "bc" then ALL constants are burst constants, otherwise if "cc" or "" then no constants are to be burst constants.
;; epochRandomOps
;;      The count of random population operations at the start of each island epoch.
;; randomOps
;;      The count of random population operations at the start of each island generation.
;;
;; Note: The search clause may contain substitute characters which replace the values of those variables into the search string, such as:
;;
;; $lib(name)$  Replaced with the text in the named library entry.
;; $poly$       Replaced with the list of significant contributing basis functions in the population, example: x0, (x45/x23), cos(x10), square(x4*x45)
;; $BB$         Replaced with the list of unique basis functions in the population, example: x0, (x45/x23), cos(x10), square(x4*x45)
;; $FF$         Replaced with the list of unique abstract functions in the population, example: cos, square, +, /, tanh
;; $VV$         Replaced with the list of unique abstract features in the population, example: x0, x45, x10, x4, x45
;; $B0$         Replaced with the expression of the specified basis function, examples: $B0$ $B10$ $B9$ $B5$
;; $K0$         Replaced with the expression of the specified basis function (without regression weights), examples: $K0$ $K10$ $K9$ $K5$
;; $c0$         Replaced with the signed real value of the specified abstract constant, examples: $c0$ $c10$ $c9$ $c5$
;; $v1$         Replaced with the feature name value of the specified abstract feature, examples: $v0$ $v10$ $v9$ $v5$
;; $f2$         Replaced with the function name value of the specified abstract function, examples: $f0$ $f10$ $f9$ $f5$
;; $w1$         Replaced with the signed real value of the specified real weight, examples: $w0$ $w10$ $w9$ $w5$
;; $w[n,m]$     Replaced with the signed real value of the specified real weight, examples: $w[0,2]$ $w[10,3]$ $w[9,5]$
;; $a$          Replaced with the signed real value of the axis constant
;; $b$          Replaced with the signed real value of the slope constant
;; %c0%         Replaced with the sign ("+" or "-") of the specified abstract constant, examples: %c0% %c10% %c9% %c5%
;; %w1%         Replaced with the sign ("+" or "-") of the specified real weight, examples: %w0% %w10% %w9% %w5%
;; %a%          Replaced with the sign ("+" or "-") of the axis constant
;; %b%          Replaced with the sign ("+" or "-") of the slope constant
;; $cci$        Replaced with the constants in the champion lambda CC vector.
;; $vvi$        Replaced with the features in the champion lambda VV vector.
;; $ffi$        Replaced with the functions in the champion lambda FF vector.
;; $tti$        Replaced with the terms in the champion lambda TT vector.
;;
;; $.wildcard.$  Replaced with $wildcard$ so that wild cards can be enclosed within RQL Library searches.
;; %.wildcard.%  Replaced with %wildcard% so that wild cards can be enclosed within RQL Library searches.
;;
;; *Notes for link() clauses -- link(name,libKey)
;; A link event causes the specified library search 'libKey' in the link clause to undergo wild card substitution and to be compiled and run in the island.
;; The link clause search specifications MAY contain a where clause.
;; The old island template's WHERE clause is used if the new template does not have a WHERE clause.
;; The link event occurs whenever any island with the specified name achieves a new local best fitness champion.
;; The link search clause may contain substitute characters such as, $BB$, $FF$, $VV$, $K0$, $B0$, $c0$, $v1$, $f2$, $w1$, $w[n,m]$, $a$, $b$ which replace the values of those variables into the search string
;; The link search clause may contain substitute characters such as, %c0%, %w1%, %a%, %b% which replace the signs ("+" or "-") of those variables into the search string
;; When the link search is initiated, the previous search will have left a best scoring champion in the island which is used in wild card value substitution.
;; Note: wild card substitutions are only possible with Reg, Avg, Mdl, Mvl, and Lda champion styles.
;;
;; *Notes for onfinal() clauses -- onfinal('search','search',...,'search')
;; An onfinal event causes ALL searches in the onfinal clause to undergo wild card substitution and to be compiled and run in the island.
;; The old island template's WHERE clause is used.
;; onfinal search clauses must NOT include a WHERE clause of their own.
;; All onfinal searches are executed simultaneously in the same island AND they must resolve to concrete searches.
;; No abstract or evolutionary searches are allowed in the onfinal clause.
;;
;; UNLESS a 'set(name,..text..)' directive is used, which adds the specified ..text.. under the specified name to the RQL Library - this library search MAY contain any useful text.
;; UNLESS a 'play(name)' directive is used, then the named RQL Library search is added to the play book and may contain wild cards and a where clause.
;; UNLESS a 'log(...text...)' directive is used, then the text, after wild card substitution, is written to the ARC search log.
;; UNLESS an 'out(filename)' directive is used, which writes out the transformed champion basis function values and the dependent variable under the file names "filename_Train.csv" and "filename_Test.csv".
;;
;; The onfinal event occurs whenever the island halts execution.
;; Only one onfinal event will occur and that event executes ALL onfinal specified searches.
;; The onfinal search clause may contain substitute characters such as, $BB$, $FF$, $VV$, $K0$, $B0$, $c0$, $v1$, $f2$, $w1$, $w[n,m]$, $a$, $b$ which replace the values of those variables into the search string
;; The onfinal search clause may contain substitute characters such as, %c0%, %w1%, %a%, %b% which replace the signs ("+" or "-") of those variables into the search string
;; When the onfinal search is initiated, the previous search will have left a best scoring champion in the island which is used in wild card value substitution.
;; Note: wild card substitutions are only possible with Reg, Avg, Mdl, Mvl, and Lda champion styles.
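;;
;; For instance, a minimal sketch of an onfinal clause (the search template and the file label "myRun" here are purely
;; illustrative) which logs the champion basis functions and writes the transformed champion data out to the files
;; "myRun_Train.csv" and "myRun_Test.csv" might look as follows:
;;
;; ==>search regress(universal(1,5,t)) where {fitness(nmae) onfinal('log(final bases: $BB$)','out(myRun)')}
;;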
;;
;; *Notes for onscore() clauses -- onscore(score,maxgens,'search',...,'search')
;; The onscore event causes ALL searches in the onscore clause to undergo wild card substitution and
;; to be compiled and run in the island. The old island template's where clause is NOT used.
;; All onscore search clauses MUST include a where clause of their own.
;; The first onscore search becomes the template for the newly restarted island.
;; The onscore searches should NOT be concrete searches.
;; Since all onscore searches are executed simultaneously in the same island,
;; it is the responsibility of the RQL programmer to ensure that the where clauses
;; in each onscore search are compatible.
;; The onscore event occurs whenever the island reaches the specified number of generations OR
;; whenever the island achieves a best score equal to or less than the specified score OR
;; whenever the island halts execution.
;; Only one onscore event will occur and that event executes ALL onscore specified searches.
;; The onscore search clause may contain substitute characters such as, $BB$, $FF$, $VV$, $K0$, $B0$, $c0$, $v1$, $f2$, $w1$, $w[n,m]$, $a$, $b$ which replace the values of those variables into the search string
;; The onscore search clause may contain substitute characters such as, %c0%, %w1%, %a%, %b% which replace the signs ("+" or "-") of those variables into the search string
;; When the onscore search is initiated, the previous search will have left a best scoring champion in the island which is used in wild card value substitution.
;; Note: wild card substitutions are only possible with Reg, Avg, Mdl, Mvl, and Lda champion styles.
;; For example:
;; ==>search regress(v1/(v0+c0)) where{ config(smart,standard,100,50,1000,100,50) op(noop,:,+,-,*,/) onscore(.001,1000,'regress(($v1+c0)/($v0+$c0)) where{ config(epoch,standard,100,50,1000,100,50) op(noop,:,+,-,*,/)}')}
;; Assume that the champion is y = regress(x2/(x4+3.54293748))
;; Then by replacing the substitution wild cards $v1, $v0, and $c0, the new search specification will be as follows
;; ==>search regress((x2+c0)/(x4+3.54293748)) where{ config(epoch,standard,100,50,1000,100,50) op(noop,:,+,-,*,/)}
;;
;; *Notes for stepwise() clauses -- stepwise(maxgens,'search',...,'search')
;; A stepwise event causes the NEXT search in the stepwise clause to undergo wild card substitution and to be compiled and run in the island.
;; Stepwise search clauses MAY include a WHERE clause of their own - but its stepwise decoration is ignored.
;; If the stepwise search contains NO WHERE clause, the old island template's WHERE clause is used.
;; The stepwise event occurs whenever the island reaches the specified number of generations OR whenever the island halts execution.
;; A stepwise event will occur as long as there are remaining stepwise searches which have not been executed.
;; The stepwise search clause may contain substitute characters such as, $BB$, $FF$, $VV$, $K0$, $B0$, $c0$, $v1$, $f2$, $w1$, $w[n,m]$, $a$, $b$ which replace the values of those variables into the search string
;; The stepwise search clause may contain substitute characters such as, %c0%, %w1%, %a%, %b% which replace the signs ("+" or "-") of those variables into the search string
;; When the stepwise search is initiated, the previous search will have left a best scoring champion in the island which is used in wild card value substitution.
;; Note: wild card substitutions are only possible with Reg, Avg, Mdl, Mvl, and Lda champion styles.
;; For example:
;; ==>search regress(f0(v0,v1)) where{config(smart,standard,100,50,1000,100,0) op(noop,inv,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln,+,-,*,/) stepwise(1000,'regress($B0$,f0(v0,v1)) where{config(smart,standard,100,50,1000,100,0) op(noop,inv,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln,+,-,*,/)}')}
;; Assume that the champion is y = regress(x2/x4)
;; Then by replacing the substitution wild card $B0$, the new search specification will be as follows
;; ==>search regress(x2/x4,f0(v0,v1)) where{config(smart,standard,100,50,1000,100,0) op(noop,inv,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln,+,-,*,/)}
;;
;;
;; **********************************************
;; **********************************************
;; Speed of Training Versus Accuracy of Modeling:
;; **********************************************
;; **********************************************
;;
;; The fastest training (but with the least accurate modeling) is to set the coefficient weight training argument off (weights:false) when starting arc.arcnetLearn AND
;; to make sure that the RQL command is concrete (i.e. contains no abstract constants, terms, functions, or variables - c0, t0, f0, v0 etc.).
;;
;; The next fastest training (with a little more accurate modeling) is to use the default coefficient weight training arguments when starting arc.arcnetLearn AND
;; to make sure that the RQL command is concrete (i.e. contains no abstract constants, terms, functions, or variables - c0, t0, f0, v0 etc.).
;;
;; The next fastest training (with even more accurate modeling) is to use the default coefficient weight training arguments when starting arc.arcnetLearn AND
;; to make sure that the RQL command contains no abstract constants, terms, or functions - c0, t0, f0, etc.
;;
;; The next fastest training (with yet more accurate modeling) is to use the default coefficient weight training arguments when starting arc.arcnetLearn AND
;; to make sure that the RQL command contains no abstract constants, or terms - c0, t0, etc.
;;
;; The slowest training (with the most accurate modeling) is to set the coefficient weight training argument (wtmaxgens:5000) when starting arc.arcnetLearn AND
;; to make sure that the RQL command contains no abstract constants, or terms - c0, t0, etc.
;;
;; As a practical matter, to keep training time within reason, if the RQL command contains abstract constants (i.e. c0, etc), then the coefficient weight training argument
;; should be set off (weights:false) when starting arc.arcnetLearn.
;;
;; ***********************************************
;; ***********************************************
;; Notes with examples of RQL search commands:
;; ***********************************************
;; ***********************************************
;;
;; More examples of RQL search commands may be found in the following ARC child functions
;;
;; (arc.arcnetRQL)
;; (arc.help)
;; (arc.generateRQL)
;;
;; generateRQL ==> Auto-generated RQL Searches
;;
;; The arc.generateRQL function is used to generate predefined RQL, rather than building
;; from scratch by hand, for ease of use. The format of this function call and its arguments
;; are as follows.
;;
;; generateRQL argument order:
;; (setq rql (arc.generateRQL keyWord fitness bases depth term features classes reduce operators delayTime seedFileName fieldList typeRules))
;;      Generates predefined RQL search source code.
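;;
;; For instance, two sketch invocations of arc.generateRQL following the argument order above (the argument values here
;; are illustrative only; see the keyWord notes below and the training examples further on):
;;
;; Generate a linear multiple regression search over 25 features (depth, term, classes, and operators are ignored for linear:, as noted below):
;; (setq rql (arc.generateRQL linear: nmae: 25 1 t: 25 0 true best: 10 "" "" ""))
;;
;; Generate a two class lda classification search:
;; (setq rql (arc.generateRQL lda: ecep: 10 3 v: 25 2 false all: 10 "" "" ""))
;;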
;;
;; generateRQL keyWord regression values: for multinomial regression: the currently available keyWords are as follows:
;;
;; linear:
;;      Generates a linear multiple regression RQL search - whose bases are equal to the number of features.
;;      This can easily be used on data which is 100 features or less in width, and should be the first attempt
;;      at analyzing any new data set of medium or less width. The bases and features arguments should be
;;      the same and should equal the actual features in the training data. The depth, classes, term,
;;      and operators arguments are all ignored for this generateRQL option, but the reduce argument is important.
;; fsregress:
;;      Generates a complex multiple-island RQL search for cross correlations with feature selection and user supplied operators.
;;      The term, operators, bases, depth, and features arguments are all important with the default
;;      operators = op(noop,*). This search CAN accept a seed argument generated by a previous identical search.
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;      This should be the second attempt at analyzing any new data set of any width. If the features are 100
;;      or less, one should first try bases equal to features and depth = 1 which with operators = op(noop,*) tries
;;      all possible pairwise cross correlations. With depth = 2 and operators = op(noop,*), all possible triple
;;      cross correlations are attempted. With depth = 1 and operators = op(noop,square,cube,quart), a polynomial
;;      study of the data is attempted. By setting operators = "all" and depth = N, one gets the most general
;;      nonlinear multiple regression study of the data but unfortunately these studies often overfit.
;;      However with operators = "safe" and depth = N, one gets a general nonlinear multiple regression study
;;      of the data which almost never overfits.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear multiple regression study of the
;;      data which usually does not overfit.
;;
;;      The reduce argument is crucial as it determines whether non-contributing basis functions will be dropped.
;;
;; correlate:
;;      Generates a complex multiple-island RQL search for cross correlations of features with user supplied operators.
;;      The term, operators, bases, depth, and features arguments are all important with the default
;;      operators = op(noop,*). This search CAN accept a seed argument generated by a previous identical search.
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;      This should be the second attempt at analyzing any new data set of any width. If the features are 100
;;      or less, one should first try bases equal to features and depth = 1 which with operators = op(noop,*) tries
;;      all possible pairwise cross correlations. With depth = 2 and operators = op(noop,*), all possible triple
;;      cross correlations are attempted. With depth = 1 and operators = op(noop,square,cube,quart), a polynomial
;;      study of the data is attempted. By setting operators = "all" and depth = N, one gets the most general
;;      nonlinear multiple regression study of the data but unfortunately these studies often overfit.
;;      However with operators = "safe" and depth = N, one gets a general nonlinear multiple regression study
;;      of the data which almost never overfits.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear multiple regression study of the
;;      data which usually does not overfit.
;;
;;      The reduce argument is crucial as it determines whether non-contributing basis functions will be dropped.
;;
;;      If features <= 100 then it is best to set bases = features for a complete covering multiple regression
;;      study. If features > 100, then bases can be set to 25 or so and various subsets of features will be
;;      selected from the entire set of features and a general multiple regression search is performed.
;; regress:
;;      Generates a simple single-island RQL search for cross correlations of features with user supplied operators.
;;      The term, operators, bases, depth, and features arguments are all important with the default
;;      operators = op(noop,*). This search CAN accept a seed argument generated by a previous identical search.
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;      This should be the second attempt at analyzing any new data set of any width. If the features are 100
;;      or less, one should first try bases equal to features and depth = 1 which with operators = op(noop,*) tries
;;      all possible pairwise cross correlations. With depth = 2 and operators = op(noop,*), all possible triple
;;      cross correlations are attempted. With depth = 1 and operators = op(noop,square,cube,quart), a polynomial
;;      study of the data is attempted. By setting operators = "all" and depth = N, one gets the most general
;;      nonlinear multiple regression study of the data but unfortunately these studies often overfit.
;;      However with operators = "safe" and depth = N, one gets a general nonlinear multiple regression study
;;      of the data which almost never overfits.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear multiple regression study of the
;;      data which usually does not overfit.
;;
;;      The reduce argument is crucial as it determines whether non-contributing basis functions will be dropped.
;;
;;      If features <= 100 then it is best to set bases = features for a complete covering multiple regression
;;      study. If features > 100, then bases can be set to 25 or so and various subsets of features will be
;;      selected from the entire set of features and a general multiple regression search is performed.
;; extreme:
;;      Generates a complex RQL search for Absolute Accuracy (see academic papers). This search is extremely
;;      accurate within its specified range and should be the third attempt at analyzing any new data set.
;;      It is the subject of several academic articles and is accurate within the following range...
;;            U2(1)[50]      Extremely accurate
;;            U1(25)[150]    Mostly extremely accurate
;;            U.25(5)[250]   Mostly extremely accurate
;;            V2(5)[250]     Sometimes accurate
;;      The depth, features, bases, term, and operator arguments are used with default operators = "best",
;;      and gens = -1 must be used for absolute accuracy.
;; aged:
;;      (Legacy) Generates a standard age layered regression search of the data set. The bases = 1 is forced. The term,
;;      depth, features, and operators arguments are crucial with default operators = "best". This is not a very
;;      accurate search BUT it matches the standard GP community's age layered search and so may be attempted
;;      on some data sets.
;; pareto:
;;      (Legacy) Generates a standard pareto front regression search of the data set. The bases = 1 is forced. The term,
;;      depth, features, and operators arguments are crucial with default operators = "best". This is not a very
;;      accurate search BUT it matches the standard GP community's pareto front search and so may be attempted
;;      on some data sets.
;; baseline:
;;      (Legacy) Generates the standard baseline regression search of the data set used in several of our academic papers.
;;      The term, depth, features, reduce, and operators arguments are crucial with default operators = "best".
;;      This is not a very accurate search BUT it matches the baseline search from our papers and so may be attempted
;;      on some data sets.
;;      The reduce argument is crucial as it determines whether non-contributing basis functions will be dropped.
;; arcnet:
;;      Generates a simple single-island RQL search with fast GLM multivariate regression functions.
;;      This search is used to train each feature in ARC Network deep learning.
;;      This search accepts ONLY the nmae, mad, user, and nlse fitness measures for regression.
;;      The depth, bases, features, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      By setting operators = "all", bases = 5, and depth = 4, one gets a general nonlinear discriminant analysis
;;      study of the data which works well in training ARC Network layers during deep learning.
;;
;;
;; generateRQL keyWord classification values: for Binary and Multiclass classification: The currently available keyWords are as follows:
;;
;; classify:
;;      Generates a multi-class classifier complex multiple-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions. This search CAN accept a seed argument generated by a previous identical search.
;;      This should be the second strategy attempted on any data set.
;;      The depth, bases, features, classes, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      If the data set features < 100, then bases = features should be tried first. If bases < features then
;;      a subset of the entire feature set is selected and a linear discriminant analysis search is attempted. In all
;;      cases, if bases = features, then term is set to "x" otherwise term is set to "v".
;;
;;      If the features are 100 or less, one should first set bases equal to features and depth = 0 which tries
;;      a standard linear discriminant analysis of the data. The second attempt on any dataset should be
;;      setting depth = 2 and operators = "safe", which tries a nonlinear discriminant analysis of the data set
;;      which almost never overfits. By setting operators = "all" and depth = 2, one gets a general
;;      nonlinear discriminant analysis study of the data but unfortunately these studies often overfit.
;;
;;      The next attempt on any dataset should be setting depth = N and operators = "safe", which tries
;;      a nonlinear discriminant analysis of the data set which almost never overfits.
;;
;;      By setting operators = "all" and depth = N, one gets a general nonlinear discriminant analysis
;;      study of the data but unfortunately these studies often overfit.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear discriminant analysis study
;;      of the data which usually does not overfit.
;; fsclass:
;;      Generates a multi-class classifier complex multiple-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions, and an advanced initial feature discovery step. A seed is not accepted.
;;      This should be a major strategy attempted on any data set.
;;      The depth, bases, features, classes, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      If the data set features < 100, then bases = features should be tried first. If bases < features then
;;      a subset of the entire feature set is selected and a linear discriminant analysis search is attempted. In all
;;      cases, if bases = features, then term is set to "x" otherwise term is set to "v".
;;
;;      If the features are 100 or less, one should first set bases equal to features and depth = 0 which tries
;;      a standard linear discriminant analysis of the data. The second attempt on any dataset should be
;;      setting depth = 2 and operators = "safe", which tries a nonlinear discriminant analysis of the data set
;;      which almost never overfits. By setting operators = "all" and depth = 2, one gets a general
;;      nonlinear discriminant analysis study of the data but unfortunately these studies often overfit.
;;
;;      The next attempt on any dataset should be setting depth = N and operators = "safe", which tries
;;      a nonlinear discriminant analysis of the data set which almost never overfits.
;;
;;      By setting operators = "all" and depth = N, one gets a general nonlinear discriminant analysis
;;      study of the data but unfortunately these studies often overfit.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear discriminant analysis study
;;      of the data which usually does not overfit.
;; fbclass:
;;      Generates a multi-class classifier complex multiple-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions, and an advanced initial feature discovery step. A seed is not accepted.
;;      This should be a major strategy attempted on any data set.
;;      The depth, bases, features, classes, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      If the data set features < 100, then bases = features should be tried first. If bases < features then
;;      a subset of the entire feature set is selected and a linear discriminant analysis search is attempted. In all
;;      cases, if bases = features, then term is set to "x" otherwise term is set to "v".
;;
;;      If the features are 100 or less, one should first set bases equal to features and depth = 0 which tries
;;      a standard linear discriminant analysis of the data. The second attempt on any dataset should be
;;      setting depth = 2 and operators = "safe", which tries a nonlinear discriminant analysis of the data set
;;      which almost never overfits. By setting operators = "all" and depth = 2, one gets a general
;;      nonlinear discriminant analysis study of the data but unfortunately these studies often overfit.
;;
;;      The next attempt on any dataset should be setting depth = N and operators = "safe", which tries
;;      a nonlinear discriminant analysis of the data set which almost never overfits.
;;
;;      By setting operators = "all" and depth = N, one gets a general nonlinear discriminant analysis
;;      study of the data but unfortunately these studies often overfit.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear discriminant analysis study
;;      of the data which usually does not overfit.
;; mclass:
;;      Generates a multi-class classifier complex multiple-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions. This search CAN accept a seed argument generated by a previous identical search.
;;      This should be the second strategy attempted on any data set.
;;      The depth, bases, features, classes, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      If the data set features < 100, then bases = features should be tried first. If bases < features then
;;      a subset of the entire feature set is selected and a linear discriminant analysis search is attempted. In all
;;      cases, if bases = features, then term is set to "x" otherwise term is set to "v".
;;
;;      If the features are 100 or less, one should first set bases equal to features and depth = 0 which tries
;;      a standard linear discriminant analysis of the data. The second attempt on any dataset should be
;;      setting depth = 2 and operators = "safe", which tries a nonlinear discriminant analysis of the data set
;;      which almost never overfits. By setting operators = "all" and depth = 2, one gets a general
;;      nonlinear discriminant analysis study of the data but unfortunately these studies often overfit.
;;
;;      The next attempt on any dataset should be setting depth = N and operators = "safe", which tries
;;      a nonlinear discriminant analysis of the data set which almost never overfits.
;;
;;      By setting operators = "all" and depth = N, one gets a general nonlinear discriminant analysis
;;      study of the data but unfortunately these studies often overfit.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear discriminant analysis study
;;      of the data which usually does not overfit.
;; lda:
;;      Generates a multi-class classifier simple single-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions. This search CAN accept a seed argument generated by a previous identical search.
;;      This should be the third strategy attempted on any data set.
;;      The depth, bases, features, classes, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      If the data set features < 100, then bases = features should be tried first. If bases < features then
;;      a subset of the entire feature set is selected and a linear discriminant analysis search is attempted. In all
;;      cases, if bases = features, then term is set to "x" otherwise term is set to "v".
;;
;;      If the features are 100 or less, one should first set bases equal to features and depth = 0 which tries
;;      a standard linear discriminant analysis of the data. The second attempt on any dataset should be
;;      setting depth = 2 and operators = "safe", which tries a nonlinear discriminant analysis of the data set
;;      which almost never overfits. By setting operators = "all" and depth = 2, one gets a general
;;      nonlinear discriminant analysis study of the data but unfortunately these studies often overfit.
;;
;;      The next attempt on any dataset should be setting depth = N and operators = "safe", which tries
;;      a nonlinear discriminant analysis of the data set which almost never overfits.
;;
;;      By setting operators = "all" and depth = N, one gets a general nonlinear discriminant analysis
;;      study of the data but unfortunately these studies often overfit.
;;
;;      Finally with operators = "best" and depth = N, one gets a general nonlinear discriminant analysis study
;;      of the data which usually does not overfit.
;; arcnet:
;;      Generates a multi-class classifier simple single-island RQL search with fast GLM Linear Discriminant
;;      Analysis functions. This search is used to train each feature in ARC Network deep learning.
;;      This search accepts ONLY the cep, usec, roc, and roca fitness measures for classification.
;;      The depth, bases, features, and operators are essential with the default operators = "best".
;;      This RQL search will accept a user specified type rules file name in the typeRules argument.
;;
;;      By setting operators = "all", bases = 5, and depth = 4, one gets a general nonlinear discriminant analysis
;;      study of the data which works well in training ARC Network layers during deep learning.
;;
;;
;; generateRQL fitness values:
;;      cep   =  where {fitness(cep) }
;;      ecep  =  where {fitness(ecep) }
;;      mad   =  where {fitness(mad) }
;;      nlse  =  where {fitness(nlse) }
;;      nmae  =  where {fitness(nmae) }
;;      roc   =  where {fitness(roc) }
;;      roca  =  where {fitness(roca) }
;;      user  =  where {fitness(user) }
;;      usec  =  where {fitness(usec) }
;;
;; generateRQL reduce values:
;;      If reduce = "reduce" then the RQL search requests an onfinal polynomial reduction.
;;
;; generateRQL oper values:
;;      operators = "fast"    - "op(noop,abs,binary,bipolar,cos,cube,inv,land,lor,psqrt,psquare,pcube,pquart,quart,sign,sin,sqroot,square,tan,+,-,*,/,^/,^-,<,<=,==,!=,>,>=)".
;;      operators = "safe"    - "op(noop,abs,binary,bipolar,cube,inv,square,sign,+,-,*,/,maximum,minimum,<=,>=,lif,lor,land)".
;;      operators = "excel"   - "op(noop,abs,binary,bipolar,cube,inv,square,sign,+,-,*,/,maximum,minimum,<=,>=,lif,lor,land)".
;;      operators = "best"    - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,cos,sin,tan,tanh,binary,bipolar,sig,sign,+,-,*,/,maximum,minimum,<=,>=,lif,lor,land)".
;;      operators = "core"    - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,cos,sin,tan,tanh,binary,bipolar,sig,sign,+,-,*,/,maximum,minimum)".
;;      operators = "unary"   - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,cos,sin,tan,tanh,binary,bipolar,sig,sign)".
;;      operators = "all"     - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,binary,bipolar,sign,sig,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<,<=,==,!=,>=,>,lif,lor,land,psqrt,psquare,pcube,pquart)".
;;      operators = "default" - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,binary,bipolar,sign,sig,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<,<=,==,!=,>=,>,lif,lor,land,psqrt,psquare,pcube,pquart)".
;;      operators = ""        - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,binary,bipolar,sign,sig,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<,<=,==,!=,>=,>,lif,lor,land,psqrt,psquare,pcube,pquart)".
;;      operators = #void     - "op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,binary,bipolar,sign,sig,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<,<=,==,!=,>=,>,lif,lor,land,psqrt,psquare,pcube,pquart)".
;;
;; generateRQL fieldList values:
;;      fieldList = "all", "", or #void - allows ALL fields to be used as input in the search.
;;      fieldList = "vv(x0,x10,x35)"    - allows only those fields listed to be used as input in the search.
;;      fieldList = "25"                - allows ALL fields to be used as input in the search AND specifies the field extraction to search for the 25 best fields.
;;      fieldList = "-25"               - disallows the first field (RowID?) to be used as input in the search AND specifies the field extraction to search for the 25 best fields.
;;
;;
;; Generating RQL formulas with code generator macros
;;
;; These RQL code generation macros can be placed anywhere inside an RQL statement
;; and they will be replaced by generated RQL code. This allows the RQL programmer
;; to specify complicated RQL formulas far more easily than writing every RQL formula
;; out painstakingly by hand.
;; Here follow a number of examples.
;; Note: placing the @ symbol after the left paren tells each of these RQL macros
;;       NOT to reset the constant|variable|function counters.
;;       For example:
;;            axis(0,0,n)  - Reset the constant|variable|function counters
;;            axis(@0,0,n) - Do NOT reset the constant|variable|function counters before macro generation.
;; Note: placing the $ symbol after the left paren tells each of these RQL macros
;;       to restore the constant|variable|function counters after macro generation.
;;       For example:
;;            axis(0,0,n)  - Reset the constant|variable|function counters before macro generation
;;            axis($0,0,n) - Restore the constant|variable|function counters after macro generation
;;
;; axis(node-depth,base-functions,n|h|s)
;;      Generates a sequence of explicitly axis-weighted basis functions as follows.
;;      axis(0,0,n) ==> (c0+(c1*
;;      axis(0,1,n) ==> (c0+(c1*v0))
;;      axis(0,1,h) ==> tanh(c0+(c1*v0))
;;      axis(0,2,h) ==> tanh(c0+(c1*v0)),tanh(c2+(c3*v1))
;;      axis(2,3,s) ==> sig(c0+(c1*f0(f1(v0,v1),f2(v2,v3)))),sig(c2+(c3*f3(f4(v4,v5),f5(v6,v7)))),sig(c4+(c5*f6(f7(v8,v9),f8(v10,v11))))
;;
;; cart(bases,node-depth,tree-depth,split,unary)
;;      Generates a CART nonlinear regression|classification model, used with the forest(cart) WHERE clause.
;;      cart(1,0,2,m,sig)  ==> lif(c6>v2,lif(c4>v0,c0,c1),lif(c5>v1,c2,c3))
;;      cart(2,0,2,m,noop) ==> lif(c6>v2,lif(c4>v0,c0,c1),lif(c5>v1,c2,c3)),lif(c13>v5,lif(c11>v3,c7,c8),lif(c12>v4,c9,c10))
;;      cart(2,0,2,m,sig)  ==> lif(c6>sig(v2),lif(c4>sig(v0),c0,c1),lif(c5>sig(v1),c2,c3)),lif(c13>sig(v5),lif(c11>sig(v3),c7,c8),lif(c12>sig(v4),c9,c10))
;;      cart(1,1,2,s,tanh) ==> lif(c6>tanh(f0(v0,v1)),lif(c4>tanh(f0(v0,v1)),c0,c1),lif(c5>tanh(f0(v0,v1)),c2,c3))
;;      NOTES: The split argument is either "s" for only a single split formula for all split points, or
;;             "m" for different split formulas for each split point.
;;             The unary argument is the name of any valid unary function: (i.e. noop, cos, tanh, sig, etc.).
;;      WARNING: The forest(cart) decoration invalidates all cc() and ec() constant constraints restricting the
;;               abstract constants in the cart() macro generated tree structures.
;;
;; sparse(node-depth,base-functions,v|x)
;;      Generates a sequence of possible sparse basis functions f0(Bf,0.0) where {f0(noop,*)} as follows.
;;      sparse(0,7,x) ==> f0(x0,0.0) , f1(x1,0.0) , f2(x2,0.0) , f3(x3,0.0) , f4(x4,0.0) , f5(x5,0.0) , f6(x6,0.0) where {f0..f6(noop,*)}
;;      sparse(1,3,x) ==> f0(f4(x0,v0),0.0) , f1(f5(x1,v1),0.0) , f2(f6(x2,v2),0.0) where {f0..f2(noop,*)}
;;      sparse(0,1,v) ==> f0(v0,0.0) where {f0(noop,*)}
;;      sparse(2,4,v) ==> f0(f4(f5(v0,v1),f6(v2,v3)),0.0) , f1(f7(f8(v4,v5),f9(v6,v7)),0.0) , f2(f10(f11(v8,v9),f12(v10,v11)),0.0) , f3(f13(f14(v12,v13),f15(v14,v15)),0.0) where {f0..f3(noop,*)}
;;
;; net(node-depth,inputs,outputs,x|v,n|h|s)
;;      Generates a sequence of outputs expressed as explicitly weighted nonlinear sums as follows.
;;      net(0,2,1,x,n) ==> summarize(c0,(c1*x0),(c2*x1))
;;      net(0,2,1,v,s) ==> sig(summarize(c0,(c1*v0),(c2*v1)))
;;      net(0,3,2,x,h) ==> tanh(summarize(c0,(c1*x0),(c2*x1),(c3*x2))) , tanh(summarize(c4,(c5*x0),(c6*x1),(c7*x2)))
;;      net(1,3,2,v,s) ==> sig(summarize(c0,(c1*f0(v0,v1)),(c2*f1(v2,v3)),(c3*f2(v4,v5)))) , sig(summarize(c4,(c5*f0(v0,v1)),(c6*f1(v2,v3)),(c7*f2(v4,v5))))
;;
;; elastic('featureList','functionList')
;;      Generates an elastic net style sequence of basis functions with listed features enclosed within the specified function list of operators as follows.
;;      elastic('x0,x1','sig,square,-,*') ==> sig(x0),sig(x1),square(x0),square(x1),(x0-x1),(x1-x0),(x0*x0),(x0*x1),(x1*x1)
;;      elastic('v0,v1,v2','noop,sig,square,*,maximum') ==> v0,v1,v2,sig(v0),sig(v1),sig(v2),square(v0),square(v1),square(v2),(v0*v0),(v0*v1),(v0*v2),(v1*v1),(v1*v2),(v2*v2),maximum(v0,v1),maximum(v0,v2),maximum(v1,v2)
;;
;; unary(node-depth,base-functions,v|t|x,unary)
;;      Generates a sequence of universal basis functions enclosed within the specified unary function as follows.
;;      unary(0,7,x,sig)  ==> sig(x0) , sig(x1) , sig(x2) , sig(x3) , sig(x4) , sig(x5) , sig(x6)
;;      unary(1,3,x,tanh) ==> tanh(f0(x0,v0)) , tanh(f1(x1,v1)) , tanh(f2(x2,v2))
;;      unary(0,1,v,sig)  ==> sig(v0)
;;      unary(0,2,t,tanh) ==> tanh(t0) , tanh(t1)
;;      unary(1,3,t,sig)  ==> sig(f0(t0,t1)) , sig(f1(t2,t3)) , sig(f2(t4,t5))
;;      unary(2,4,v,sig)  ==> sig(f0(f1(v0,v1),f2(v2,v3))) , sig(f3(f4(v4,v5),f5(v6,v7))) , sig(f6(f7(v8,v9),f8(v10,v11))) , sig(f9(f10(v12,v13),f11(v14,v15)))
;;
;; univariate(node-depth,base-functions,v|t|x)
;;      Generates a single sum of explicitly weighted basis functions as follows.
;;      univariate(0,7,x) ==> (c0 + (c1*x0) + (c2*x1) + (c3*x2) + (c4*x3) + (c5*x4) + (c6*x5) + (c7*x6) )
;;      univariate(1,3,x) ==> (c0 + (c1*f0(x0,v0)) + (c2*f1(x1,v1)) + (c3*f2(x2,v2)) )
;;      univariate(0,1,v) ==> (c0 + (c1*v0) )
;;      univariate(0,2,t) ==> (c0 + (c1*t2) + (c3*t4) )
;;      univariate(1,3,t) ==> (c0 + (c1*f0(t2,t3)) + (c4*f1(t5,t6)) + (c7*f2(t8,t9)) )
;;      univariate(2,4,v) ==> (c0 + (c1*f0(f1(v0,v1),f2(v2,v3))) + (c2*f3(f4(v4,v5),f5(v6,v7))) + (c3*f6(f7(v8,v9),f8(v10,v11))) + (c4*f9(f10(v12,v13),f11(v14,v15))) )
;;
;; universal(node-depth,base-functions,v|t|x|xn)
;;      Generates a sequence of unweighted basis functions as follows.
;;      universal(0,7,x) ==> x0 , x1 , x2 , x3 , x4 , x5 , x6
;;      universal(1,3,x) ==> f0(x0,v0) , f1(x1,v1) , f2(x2,v2)
;;      universal(0,1,v) ==> v0
;;      universal(0,2,t) ==> t0 , t1
;;      universal(1,3,t) ==> f0(t0,t1) , f1(t2,t3) , f2(t4,t5)
;;      universal(2,4,v) ==> f0(f1(v0,v1),f2(v2,v3)) , f3(f4(v4,v5),f5(v6,v7)) , f6(f7(v8,v9),f8(v10,v11)) , f9(f10(v12,v13),f11(v14,v15))
;;      universal(0,1,x23) ==> x23
;;      universal(2,1,x201) ==> f0(f1(x201,v0),f2(v1,v2))
;;
;; uniweighted(node-depth,base-functions,n|h|s)
;;      Generates a single sum of explicitly weighted basis functions as follows.
;;      uniweighted(0,2,n) ==> ((c0*v0) + (c1*v1))
;;      uniweighted(0,2,h) ==> (tanh(c0*v0) + tanh(c1*v1))
;;      uniweighted(1,3,s) ==> (sig(c0*f0(v0,v1)) + sig(c1*f1(v2,v3)) + sig(c2*f2(v4,v5)))
;;      uniweighted(2,3,h) ==> (tanh(c0*f0(f1(v0,v1),f2(v2,v3))) + tanh(c1*f3(f4(v4,v5),f5(v6,v7))) + tanh(c2*f6(f7(v8,v9),f8(v10,v11))))
;;
;; weighted(node-depth,base-functions,n|h|s)
;;      Generates a sequence of explicitly weighted basis functions as follows: "weighted(2,3,n)" ==> "c0*f0(f1(v0,v1),f2(v2,v3)) , c1*f3(f4(v4,v5),f5(v6,v7)) , c2*f6(f7(v8,v9),f8(v10,v11))"
;;      weighted(0,2,n) ==> (c0*v0) , (c1*v1)
;;      weighted(0,2,h) ==> tanh(c0*v0) , tanh(c1*v1)
;;      weighted(1,3,s) ==> sig(c0*f0(v0,v1)) , sig(c1*f1(v2,v3)) , sig(c2*f2(v4,v5))
;;      weighted(2,3,h) ==> tanh(c0*f0(f1(v0,v1),f2(v2,v3))) , tanh(c1*f3(f4(v4,v5),f5(v6,v7))) , tanh(c2*f6(f7(v8,v9),f8(v10,v11)))
;;
;; Controlling the variable and function counters with RQL macros:
;;      Syntax: universal(depth,basis,t|v|x|xn)    - Reset the constant|variable|function counters before macro generation
;;      Syntax: universal(@depth,basis,t|v|x|xn)   - Do NOT reset the constant|variable|function counters before macro generation
;;      Syntax: universal($depth,basis,t|v|x|xn)   - Reset the constant|variable|function counters before macro generation and restore after macro generation
;;      Syntax: universal(@$depth,basis,t|v|x|xn)  - Do NOT reset the constant|variable|function counters before macro generation and restore after macro generation
;;      Syntax: universal($@depth,basis,t|v|x|xn)  - Do NOT reset the constant|variable|function counters before macro generation and restore after macro generation
;;      Syntax: univariate(depth,basis,t|v|x|xn)   - Reset the constant|variable|function counters
;;      Syntax: univariate(@depth,basis,t|v|x|xn)  - Do NOT reset the constant|variable|function counters
;;      Syntax: univariate($depth,basis,t|v|x|xn)  - Reset the constant|variable|function counters before macro generation and restore after macro generation
;;      Syntax: univariate(@$depth,basis,t|v|x|xn) - Do NOT reset the constant|variable|function counters before macro generation and restore after macro generation
;;      Syntax: univariate($@depth,basis,t|v|x|xn) - Do NOT reset the constant|variable|function counters before macro generation and restore after macro generation
;;
;; Hint: The RQL programmer can test the code generator macro expansion for any macro with the following lisp test expression.
;;      (arc.rqlMacros "weighted(0,2,n)")
;;      Substitute the RQL code generator macro whose expansion is desired.
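;;
;; For instance, testing two of the macro expansions documented above (the expected outputs shown here simply
;; restate the expansion examples listed earlier):
;;      (arc.rqlMacros "universal(1,3,x)")  ==> f0(x0,v0) , f1(x1,v1) , f2(x2,v2)
;;      (arc.rqlMacros "weighted(0,2,h)")   ==> tanh(c0*v0) , tanh(c1*v1)
;;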
;; ;; ;; ************************************************************************************************************************************* ;; ======================================== ;; ======================================== ;; ARC Internal Developer API EXAMPLES ;; Note: For use by ARC Software Developers ;; ======================================== ;; ======================================== ;; ======================================== ;; ARC RQL Individual Run Training EXAMPLES ;; ======================================== ;; ****************************** ;; Production classification training run ;; Label Fitness Features RowID TypeRules Hours TrainingFileName TestingFileName (arc.classify "TestCaseC06" ecep: 1000 false "" 10.0 "GPTP2017TestCase06_Train.csv" "GPTP2017TestCase06_Test.csv") ;; ****************************** ;; Production regression training run ;; Label Fitness Features RowID TypeRules Hours TrainingFileName TestingFileName (arc.regress "FICOSkewed" nmae: 78 false "" 10.0 "FICO_Extract03_Skewed_Train.csv" "FICO_Extract03_ALL_Test.csv") ;; ****************************** ;; Production scientific regression training run ;; Label Features RowID Hours TrainingFileName TestingFileName (arc.science "TestCaseT05" 25 false 24.0 "TestCaseT05_Train.csv" "TestCaseT05_Test.csv") ;; ================================================= ;; ARC RQL Individual PRODUCTION Prediction EXAMPLES ;; ================================================= ;; ****************************** ;; Production classification prediction run ;; command SeedFileName TestingFileName (arc.runPredict "class=2" "TestAmlAll_Leukemia_arcbin.db" "TestAmlAll_Leukemia_Test.xls") ;; ;; ****************************** ;; Production regression prediction run ;; command SeedFileName TestingFileName (arc.runPredict "regress" "TestBigRegress_arcbin.db" "TestBigRegress_Test.xls") ;; ============================== ;; Research Training EXAMPLES ;; ============================== ;; Two styles of production deep learning classification runs with named output tab file ;; command Bases Depth Learning Hours Stop TrainingFileName TestingFileName OutputFileName (arc.runLearn "class=2" 25 5 deep: 2.0 .0001 "TestAmlAll_Leukemia_Train.xls" "TestAmlAll_Leukemia_Test.xls" "TestAmlAll_Leukemia_Output.csv") ;; Genrate RQL = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules Hours Stop Label TrainingFileName TestingFileName OutputFileName (arc.runTrain (arc.generateRQL lda: ecep: 10 3 v: 25 2 false all: 10 "" "" "" ) 10 .001 Test: "LendingClub_train.csv" "LendingClub_test.csv" "LendingClub_output.csv") ;; ======================== ;; REGRESSION TEST PROBLEMS ;; ======================== ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL correlate: nlse: 25 2 t: F 0 true best: 10 "" "" "" )) (arc.run rql model:"linearRegression" G E R F "model(92.25 + (53.53*x0) + (88.26*x1) + (42.11*x2) + (29.0*x3) + (93.6*x4) + (67.88*x5) + (35.87*x6) + (43.21*x7) + (97.6*x8) + (40.98*x9) + (90.35*x10) + (92.35*x11) + (29.67*x12) + (50.39*x13) + (42.3*x14) + (24.12*x15) + (33.64*x16) + (29.02*x17) + (75.11*x18) + (97.81*x19) + (75.62*x20) + (47.55*x21) + (39.14*x22) + (22.94*x23) + (38.46*x24));" noise) (arc.run rql model:"cubicRegression" G E R F "model(50.63 + 
(63.6*cube(x0)) + (66.54*cube(x1)) + (32.95*cube(x2)) + (4.87*cube(x3)) + (46.49*cube(x4)) + (62.85*cube(x5)) + (90.45*cube(x6)) + (63.28*cube(x7)) + (42.15*cube(x8)) + (73.03*cube(x9)) + (92.2*cube(x10)) + (77.99*cube(x11)) + (56.67*cube(x12)) + (72.51*cube(x13)) + (49.77*cube(x14)) + (56.94*cube(x15)) + (54.76*cube(x16)) + (23.11*cube(x17)) + (56.03*cube(x18)) + (51.98*cube(x19)) + (11.71*cube(x20)) + (33.82*cube(x21)) + (46.25*cube(x22)) + (32.98*cube(x23)) + (36.06*cube(x24)));" noise) (arc.run rql model:"crossCorrelation" G E R F "model(-9.16 + (-9.16*x4*x0) + (-19.56*x0*x1) + (21.87*x1*x2) + (-17.48*x2*x3) + (38.81*x3*x4) + (3.1*x4*x5) + (59.81*x5*x6) + (93.1*x6*x7) + (.81*x7*x8) + (9.21*x8*x9) + (-5.81*x9*x10) + (-.01*x10*x11) + (4.21*x11*x12) + (68.81*x12*x13) + (-8.81*x13*x14) + (2.11*x14*x15) + (-7.11*x15*x16) + (-.91*x16*x17) + (20.0*x17*x18) + (1.81*x18*x19) + (9.71*x19*x20) + (8.1*x20*x21) + (6.1*x21*x22) + (18.51*x22*x23) + (7.1*x23*x24));" noise) (arc.run rql model:"elipsoid" G E R F "model(0.0 + (1*square(x0)) + (2*square(x1)) + (3*square(x2)) + (4*square(x3)) + (5*square(x4)) + (6*square(x5)) + (7*square(x6)) + (8*square(x7)) + (9*square(x8)) + (10*square(x9)) + (11*square(x10)) + (12*square(x11)) + (13*square(x12)) + (14*square(x13)) + (15*square(x14)) + (16*square(x15)) + (17*square(x16)) + (18*square(x17)) + (19*square(x18)) + (20*square(x19)) + (21*square(x20)) + (22*square(x21)) + (23*square(x22)) + (24*square(x23)) + (25*square(x24)));" noise) (arc.run rql model:"hiddenModel" G E R F "model(1.57 + (2.13*sin(x2)));" noise) (arc.run rql model:"cyclicSeries" G E R F "model(65.86 + (79.4*sin(x0)) + (45.88*cos(x1)) + (2.13*tan(x2)) + (4.6*sin(x3)) + (61.47*cos(x4)) + (30.64*tan(x5)) + (51.95*sin(x6)) + (47.83*cos(x7)) + (4.21*tan(x8)) + (37.84*sin(x9)) + (62.57*cos(x10)) + (4.68*tan(x11)) + (32.65*sin(x12)) + (86.89*cos(x13)) + (84.79*tan(x14)) + (31.72*sin(x15)) + (90.4*cos(x16)) + (93.57*tan(x17)) + (42.18*sin(x18)) + (47.91*cos(x19)) + (41.48*tan(x20)) + (39.47*sin(x21)) + (48.44*cos(x22)) + (34.75*tan(x23)) + (56.7*sin(x24)));" noise) (arc.run rql model:"mixedRegression" G E R F "model(1.57 + (1.57*x0) + (-39.34*sin(x1)) + (2.13*x2) + (46.59*(x3/x2)) + (11.54*x4) + (30.64*ln(x5)) + (51.95*abs(x6)) + (47.83*(x7*x3)) + (4.21*quart(x8)) + (37.84*x9) + (62.57*square(x10)) + (4.68*sqroot(x11)) + (32.65*(x12/x3)) + (86.89*x13) + (84.79*tan(x14)) + (31.72*cube(x15)) + (90.4*(x16*x4)) + (93.57*(x17/x16)) + (42.18*sin(x18)) + (47.91*cos(x19)) + (41.48*ln(x20)) + (39.47*square(x21)) + (48.44*x22) + (34.75*(x23*x20)) + (56.7*x24));" noise) (arc.run rql model:"squareRoot" G E R F "model(50.63 + (63.6*sqroot(x0)) + (66.54*sqroot(x1)) + (32.95*sqroot(x2)) + (4.87*sqroot(x3)) + (46.49*sqroot(x4)) + (62.85*sqroot(x5)) + (90.45*sqroot(x6)) + (63.28*sqroot(x7)) + (42.15*sqroot(x8)) + (73.03*sqroot(x9)) + (92.2*sqroot(x10)) + (77.99*sqroot(x11)) + (56.67*sqroot(x12)) + (72.51*sqroot(x13)) + (49.77*sqroot(x14)) + (56.94*sqroot(x15)) + (54.76*sqroot(x16)) + (23.11*sqroot(x17)) + (56.03*sqroot(x18)) + (51.98*sqroot(x19)) + (11.71*sqroot(x20)) + (33.82*sqroot(x21)) + (46.25*sqroot(x22)) + (32.98*sqroot(x23)) + (36.06*sqroot(x24)));" noise) ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 5)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL correlate: nlse: 5 3 t: F 0 true default: 10 
"" "" "" )) (arc.run rql model:"linearRegression" G E R F "model(1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4));" noise) (arc.run rql model:"cubicRegression" G E R F "model(1.57 + (1.57*cube(x0)) + (-39.34*cube(x1)) + (2.13*cube(x2)) + (46.59*cube(x3)) + (11.54*cube(x4)));" noise) (arc.run rql model:"crossCorrelation" G E R F "model(-9.16 + (-9.16*x0*x0*x0) + (-19.56*x0*x1*x1) + (21.87*x0*x1*x2) + (-17.48*x1*x2*x3) + (38.81*x2*x3*x4));" noise) (arc.run rql model:"elipsoid" G E R F "model(0.0 + (1.0*square(x0)) + (2.0*square(x1)) + (3.0*square(x2)) + (4.0*square(x3)) + (5.0*square(x4)));" noise) (arc.run rql model:"hiddenModel" G E R F "model(1.57 + (2.13*sin(x2)));" noise) (arc.run rql model:"cyclicSeries" G E R F "model(14.65 + (14.65*sin(x0)) + (-6.73*cos(x1)) + (-18.35*tan(x2)) + (-40.32*sin(x3)) + (-4.43*cos(x4)));" noise) (arc.run rql model:"hyperTangent" G E R F "model(1.57 + (1.57*tanh(cube(x0))) + (-39.34*tanh(cube(x1))) + (2.13*tanh(cube(x2))) + (46.59*tanh(cube(x3))) + (11.54*tanh(cube(x4))));" noise) (arc.run rql model:"squareRoots" G E R F "model(1.23 + (1.23*sqroot(x0*x0)) + (-9.16*sqroot(x0*x1)) + (11.27*sqroot(x1*x2)) + (7.42*sqroot(x2*x3)) + (8.21*sqroot(x3*x4)));" noise) (arc.run rql model:"nistMisrala" G E R F "model(213.80940889-(213.80940889*exp(-0.54723748542*x0)));" noise) (arc.run rql model:"tangentRatios" G E R F "model(-23.4+(1.0*(x0/tan(x1)))+(2.0*(tanh(x2)/x1))+(3.0*(x2/tan(x3)))+(4.0*(tanh(x4)/x3))+(5.0*(x4/tan(x1))));" noise) ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 5)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL correlate: nlse: 5 2 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"maxima" G E R F "model(-9.16 + (-9.16*maximum(x0,x4)) + (-19.56*minimum(x0,x1)*x2) + (21.87*maximum(x1,x2)) + (-17.48*minimum(x2,x3)*x1) + (38.81*maximum(x3,x4)));" noise) ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 5)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist TypeRules (setq rql (arc.generateRQL correlate: nlse: 5 2 t: F 0 true default: 10 "" "" " )) (arc.run rql model:"linearRegression" G E R F "model(1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4));" noise) (arc.run rql model:"cubicRegression" G E R F "model(1.57 + (1.57*cube(x0)) + (-39.34*cube(x1)) + (2.13*cube(x2)) + (46.59*cube(x3)) + (11.54*cube(x4)));" noise) (arc.run rql model:"crossCorrelation" G E R F "model(-9.16 + (-9.16*x0*x0*x0) + (-19.56*x0*x1*x1) + (21.87*x0*x1*x2) + (-17.48*x1*x2*x3) + (38.81*x2*x3*x4));" noise) (arc.run rql model:"elipsoid" G E R F "model(0.0 + (0.0*square(x0)) + (1.0*square(x1)) + (2.0*square(x2)) + (3.0*square(x3)) + (4.0*square(x4)));" noise) (arc.run rql model:"hiddenModel" G E R F "model(1.57 + (2.13*sin(x2)));" noise) (arc.run rql model:"cyclicSeries" G E R F "model(14.65 + (14.65*sin(x0)) + (-6.73*cos(x1)) + (-18.35*tan(x2)) + (-40.32*sin(x3)) + (-4.43*cos(x4)));" noise) (arc.run rql model:"hyperTangent" G E R F "model(1.57 + (1.57*tanh(cube(x0))) + (-39.34*tanh(cube(x1))) + (2.13*tanh(cube(x2))) + (46.59*tanh(cube(x3))) + (11.54*tanh(cube(x4))));" noise) (arc.run rql model:"squareRoot" G E R F "model(1.23 + (1.23*sqroot(x0*x0*x0)) + 
(-9.16*sqroot(x0*x1*x1)) + (11.27*sqroot(x0*x1*x2)) + (7.42*sqroot(x1*x2*x3)) + (8.21*sqroot(x2*x3*x4)));" noise)
(arc.run rql model:"nistMisra1a" G E R F "model(213.80940889-(213.80940889*exp(-0.54723748542*x0)));" noise)
(arc.run rql model:"transRatios" G E R F "model((tan(x0)/tan(x1))*(tan(x2)/tan(x3)));" noise)
(arc.run rql model:"cosineOfCube" G E R F "model(6.87+cos(7.23*cube(x0)));" noise)
(arc.run rql model:"trigonometric" G E R F "model(-9.16 + cos(-9.16*x0*x0*x0) + sin(22.19*x0*x1*x1) + cos(1.07*x0*x1*x2) + sin(-17.48*x1*x2*x3) + cos(18.81*x2*x3*x4));" noise)
(arc.run rql model:"mixedModels" G E R F "model( if (abs(ninteger(x0 % 4.0)) == 0.0) {1.57 + (1.57*ln(.000001+abs(x0))) + (-39.34*ln(.000001+abs(x1))) + (2.13*ln(.000001+abs(x2))) + (46.59*ln(.000001+abs(x3))) + (11.54*ln(.000001+abs(x4)))} else if (abs(ninteger(x0 % 4.0)) == 1.0) {1.57 + (1.57*x0*x0) + (-39.34*x1*x1) + (2.13*x2*x2) + (46.59*x3*x3) + (11.54*x4*x4)} else if (abs(ninteger(x0 % 4.0)) == 2.0) {1.57 + (1.57*sin(x0)) + (-39.34*sin(x1)) + (2.13*sin(x2)) + (46.59*sin(x3)) + (11.54*sin(x4))} else if (abs(ninteger(x0 % 4.0)) == 3.0) {1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4)});" noise)
(arc.run rql model:"ratioRegression" G E R F "model( if (abs(ninteger(x0 % 4.0)) == 0.0) {1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4)} else if (abs(ninteger(x0 % 4.0)) == 1.0) {1.57 + (1.57*x0*x0) + (-39.34*x1*x1) + (2.13*x2*x2) + (46.59*x3*x3) + (11.54*x4*x4)} else if (abs(ninteger(x0 % 4.0)) == 2.0) {1.57 + (1.57*sin(x0)) + (-39.34*sin(x1)) + (2.13*sin(x2)) + (46.59*sin(x3)) + (11.54*sin(x4))} else if (abs(ninteger(x0 % 4.0)) == 3.0) {1.57 + (1.57*ln(.000001+abs(x0))) + (-39.34*ln(.000001+abs(x1))) + (2.13*ln(.000001+abs(x2))) + (46.59*ln(.000001+abs(x3))) + (11.54*ln(.000001+abs(x4)))});" noise)
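;; Note: the mixedModels and ratioRegression targets above are conditional weighted-sums where
;; abs(ninteger(x0 % 4.0)) selects one of four sub-models per row. A hedged Python rendering of
;; mixedModels (it assumes ninteger rounds to the nearest integer; the edge where x0 % 4.0 rounds
;; up to 4, which the original chain leaves unspecified, is folded into the final branch here):
;;
;;   import math
;;
;;   W = [1.57, -39.34, 2.13, 46.59, 11.54]
;;
;;   def mixed_models(x):
;;       regime = abs(round(x[0] % 4.0))   # assumed ninteger == round-to-nearest
;;       if regime == 0:                   # logarithmic regime
;;           return 1.57 + sum(w * math.log(1e-6 + abs(v)) for w, v in zip(W, x))
;;       if regime == 1:                   # quadratic regime
;;           return 1.57 + sum(w * v * v for w, v in zip(W, x))
;;       if regime == 2:                   # sinusoidal regime
;;           return 1.57 + sum(w * math.sin(v) for w, v in zip(W, x))
;;       return 1.57 + sum(w * v for w, v in zip(W, x))   # linear regime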
model:"TestCaseT15" G E R F "model(-12.3 + (2.13*cos(x2*13.526)));" noise) (arc.run rql model:"TestCaseT16" G E R F "model(-12.3 + (2.13*tan(95.629/x2)));" noise) (arc.run rql model:"TestCaseT17" G E R F "model(-28.3 + (92.13*tanh(x2*x4)));" noise) (arc.run rql model:"TestCaseT18" G E R F "model(-222.13 + (-0.13*tanh(x2/x4)));" noise) (arc.run rql model:"TestCaseT19" G E R F "model(-2.3 + (-6.13*sin(x2)*x3));" noise) (arc.run rql model:"TestCaseT20" G E R F "model(-2.36 + (28.413*ln(x2)/x3));" noise) (arc.run rql model:"TestCaseT21" G E R F "model(21.234 + (30.13*cos(x2)*tan(x4)));" noise) (arc.run rql model:"TestCaseT22" G E R F "model(-2.3 + (41.93*cos(x2)/tan(x4)));" noise) (arc.run rql model:"TestCaseT23" G E R F "model(.913 + (62.13*ln(x2)/square(x4)));" noise) ;; Narrow problems in U1(3)[25] ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL extreme: nlse: 5 2 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"TestCaseT24" G E R F "model(13.3 + (80.23*x2) + (1.13*x3));" noise) (arc.run rql model:"TestCaseT25" G E R F "model(18.163 + (95.173/x2) + (1.13/x3));" noise) (arc.run rql model:"TestCaseT26" G E R F "model(22.3 + (62.13*x2) + (9.23*sin(x3)));" noise) (arc.run rql model:"TestCaseT27" G E R F "model(93.43 + (71.13*tanh(x3)) + (41.13*sin(x3)));" noise) (arc.run rql model:"TestCaseT28" G E R F "model(36.1 + (3.13*x2) + (1.13*x3) + (2.19*x0));" noise) ;; Wide problems in U1(5)[25] ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL extreme: nlse: 5 2 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"TestCaseT29" G E R F "model(-9.16 + (-9.16*x24*x0) + (-19.56*x20*x21) + (21.87*x24*x2) + (-17.48*x22*x23) + (38.81*x23*x24));" noise) (arc.run rql model:"TestCaseT30" G E R F "model(-9.16 + (-9.16*x24/x0) + (-19.56*x20/x21) + (21.87*x24/x2) + (-17.48*x22/x23) + (38.81*x23/x24));" noise) ;; Wide problems in F13(5)[3000] ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 5000)(setq F 3000)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL extreme: nlse: 5 2 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"TestCaseT31" G E R F "model(50.63 + (63.6*cube(x0)) + (66.54*cube(x1)) + (32.95*cube(x2)) + (4.87*cube(x3)) + (46.49*cube(x4)));" noise) (arc.run rql model:"TestCaseT32" G E R F "model(-9.16 + (-9.16*square(x0)) + (-19.56*ln(x123)) + (21.87*exp(x254)) + (-17.48*x3) + (38.81*x878));" noise) (arc.run rql model:"TestCaseT33" G E R F "model(0.0 + (1*square(x0)) + (2*square(x1)) + (3*square(x2)) + (4*square(x3)) + (5*square(x4)));" noise) (arc.run rql model:"TestCaseT34" G E R F "model(65.86 + (79.4*sin(x0)) + (45.88*cos(x1)) + (2.13*tan(x2)) + (4.6*sin(x3)) + (61.47*cos(x4)));" noise) (arc.run rql model:"TestCaseT35" G E R F "model(1.57 + (1.57/x923) + (-39.34*sin(x1)) + (2.13*x2) + (46.59*cos(x932)) + (11.54*x4));" noise) (arc.run rql model:"TestCaseT36" G E R F 
"model(50.63 + (63.6*sqroot(x0)) + (66.54*sqroot(x1)) + (32.95*sqroot(x2)) + (4.87*sqroot(x3)) + (46.49*sqroot(x4)));" noise) (arc.run rql model:"TestCaseT37" G E R F "model(92.25 + (53.53*square(2.3*x0)) + (88.26*cos(x1)) + (42.11/x4) + (29.0*cube(x3)) + (93.6*tanh(x4)));" noise) ;; Broad problems in U1(5)[150] ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 150)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL extreme: nlse: 5 1 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"TestCaseT38" G E R F "model(-9.16 + (-9.16*x124*x0) + (-19.56*x120*x21) + (21.87*x24*x26) + (-17.48*x122*x23) + (38.81*x123*x24));" noise) (arc.run rql model:"TestCaseT39" G E R F "model(-9.16 + (-9.16*x124/x0) + (-19.56*x20/x92) + (21.87*x102/x2) + (-17.48*x22/x143) + (38.81*x23/x149));" noise) (arc.run rql model:"TestCaseT40" G E R F "model(-9.16 + (-9.16*cos(x0)) + (-19.56*x20/x21) + (21.87*square(x125)) + (-17.48*x22/x23) + (38.81*tanh(x24)));" noise) ;; Dense problems in U1(25)[25] ;; Regress sampling 10% verbose Rows Features Hours Error Noise (arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%) ;; Genrate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules (setq rql (arc.generateRQL extreme: nlse: 25 1 t: F 0 true default: 10 "" "" "" )) (arc.run rql model:"TestCaseT41" G E R F "model(50.63 + (63.6*cube(x0)) + (66.54*square(x1)) + (32.95*quart(x2)) + (4.87*cube(x3)) + (46.49*square(x4)) + (62.85*quart(x5)) + (90.45*cube(x6)) + (63.28*square(x7)) + (42.15*quart(x8)) + (73.03*cube(x9)) + (92.2*square(x10)) + (77.99*quart(x11)) + (56.67*cube(x12)) + (72.51*square(x13)) + (49.77*quart(x14)) + (56.94*cube(x15)) + (54.76*square(x16)) + (23.11*quart(x17)) + (56.03*cube(x18)) + (51.98*square(x19)) + (11.71*quart(x20)) + (33.82*cube(x21)) + (46.25*square(x22)) + (32.98*quart(x23)) + (36.06*cube(x24)));" noise) (arc.run rql model:"TestCaseT42" G E R F "model(-9.16 + (-9.16*x4*x0) + (-19.56*x0*x1) + (21.87*x1*x2) + (-17.48*x2*x3) + (38.81*x3*x4) + (3.1*x4*x5) + (59.81*x5*x6) + (93.1*x6*x7) + (.81*x7*x8) + (9.21*x8*x9) + (-5.81*x9*x10) + (-.01*x10*x11) + (4.21*x11*x12) + (68.81*x12*x13) + (-8.81*x13*x14) + (2.11*x14*x15) + (-7.11*x15*x16) + (-.91*x16*x17) + (20.0*x17*x18) + (1.81*x18*x19) + (9.71*x19*x20) + (8.1*x20*x21) + (6.1*x21*x22) + (18.51*x22*x23) + (7.1*x23*x24));" noise) (arc.run rql model:"TestCaseT43" G E R F "model(0.0 + (1*square(x0)) + (2*square(x1)) + (3*square(x2)) + (4*square(x3)) + (5*square(x4)) + (6*square(x5)) + (7*square(x6)) + (8*square(x7)) + (9*square(x8)) + (10*square(x9)) + (11*square(x10)) + (12*square(x11)) + (13*square(x12)) + (14*square(x13)) + (15*square(x14)) + (16*square(x15)) + (17*square(x16)) + (18*square(x17)) + (19*square(x18)) + (20*square(x19)) + (21*square(x20)) + (22*square(x21)) + (23*square(x22)) + (24*square(x23)) + (25*square(x24)));" noise) (arc.run rql model:"TestCaseT44" G E R F "model(65.86 + (79.4*sin(x0)) + (45.88*cos(x1)) + (2.13*tan(x2)) + (4.6*sin(x3)) + (61.47*cos(x4)) + (30.64*tan(x5)) + (51.95*sin(x6)) + (47.83*cos(x7)) + (4.21*tan(x8)) + (37.84*sin(x9)) + (62.57*cos(x10)) + (4.68*tan(x11)) + (32.65*sin(x12)) + (86.89*cos(x13)) + (84.79*tan(x14)) + (31.72*sin(x15)) + (90.4*cos(x16)) + (93.57*tan(x17)) + (42.18*sin(x18)) + (47.91*cos(x19)) + 
(41.48*tan(x20)) + (39.47*sin(x21)) + (48.44*cos(x22)) + (34.75*tan(x23)) + (56.7*sin(x24)));" noise)
(arc.run rql model:"TestCaseT45" G E R F "model(1.57 + (1.57*x0) + (-39.34*sin(x1)) + (2.13*x2) + (46.59*(x3/x2)) + (11.54*x4) + (30.64*ln(x5)) + (51.95*abs(x6)) + (47.83*(x7*x3)) + (4.21*quart(x8)) + (37.84*x9) + (62.57*square(x10)) + (4.68*sqroot(x11)) + (32.65*(x12/x3)) + (86.89*x13) + (84.79*tan(x14)) + (31.72*cube(x15)) + (90.4*(x16*x4)) + (93.57*(x17/x16)) + (42.18*sin(x18)) + (47.91*cos(x19)) + (41.48*ln(x20)) + (39.47*square(x21)) + (48.44*x22) + (34.75*(x23*x20)) + (56.7*x24));" noise)
;; Example Test Problems: (used in GPTP IX paper and extended)
;; Regress sampling 10% verbose   Rows Features Hours Error Noise
(arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%)
;; Generate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules
(setq rql (arc.generateRQL extreme: nlse: 5 2 t: F 0 true default: 10 "" "" "" ))
(arc.run rql model:"TestCaseP01" G E R F "model(1.57 + (24.3*x3));" noise)
(arc.run rql model:"TestCaseP02" G E R F "model(0.23 + (14.2*((x3+x1)/(3.0*x4))));" noise)
(arc.run rql model:"TestCaseP03" G E R F "model(-5.41 + (4.9*(((x3-x0)+(x1/x4))/(3*x4))));" noise)
(arc.run rql model:"TestCaseP04" G E R F "model(-2.3 + (0.13*sin(x2)));" noise)
(arc.run rql model:"TestCaseP05" G E R F "model(3.0 + (2.13*ln(x4)));" noise)
(arc.run rql model:"TestCaseP06" G E R F "model(1.3 + (0.13*sqroot(x0)));" noise)
(arc.run rql model:"TestCaseP07" G E R F "model(213.80940889 - (213.80940889*exp(-0.547*x0)));" noise)
(arc.run rql model:"TestCaseP08" G E R F "model(6.87 + (11*sqroot(7.23*x0*x4)));" noise)
(arc.run rql model:"TestCaseP09" G E R F "model(((sqroot(x0)/ln(x1))*(exp(x2)/square(x3))));" noise)
(arc.run rql model:"TestCaseP10" G E R F "model(0.81 + (24.3* ( ((2.0*x1)+(3.0*square(x2)))/((4.0*cube(x3))+(5.0*quart(x4))) )));" noise)
(arc.run rql model:"TestCaseP11" G E R F "model(6.87 + (11*cos(7.23*cube(x0))));" noise)
(arc.run rql model:"TestCaseP12" G E R F "model(2.0 - (2.1*(cos(9.8*x0)*sin(1.3*x4))));" noise)
(arc.run rql model:"TestCaseP13" G E R F "model(32.0 - (3.0*((tan(x0)/tan(x1))*(tan(x2)/tan(x3)))));" noise)
(arc.run rql model:"TestCaseP14" G E R F "model(22.0 - (4.2*((cos(x0)-tan(x1))*(tanh(x2)/sin(x3)))));" noise)
(arc.run rql model:"TestCaseP15" G E R F "model(12.0 - (6.0*((tan(x0)/exp(x1))*(ln(x2)-tan(x3)))));" noise)
(arc.run rql model:"TestCaseP16" G E R F "model(2.0 - (2.1*(sqroot(9.8*x0)/cube(1.3*x4))));" noise)
(arc.run rql model:"TestCaseP17" G E R F "model(9.19 + (11*sqroot(x0*x3*x4)));" noise)
(arc.run rql model:"TestCaseP18" G E R F "model(3.80940889 - (21.89*cos(x3-0.237)));" noise)
(arc.run rql model:"TestCaseP19" G E R F "model(408.89 - (8.0940889*tanh(47.854/x0)));" noise)
(arc.run rql model:"TestCaseP20" G E R F "model(3.8089 - (13.0*quart(37.542-x0)));" noise)
(arc.run rql model:"TestCaseP21" G E R F "model(.0940889 - (2.0*sin(x3*x0)));" noise)
(arc.run rql model:"TestCaseP22" G E R F "model(9.40889 - (6.0*tan(x3/x0)));" noise)
(arc.run rql model:"TestCaseP23" G E R F "model(0.23 + (14.2*((x2+5.521)/(3.519-x4))));" noise)
(arc.run rql model:"TestCaseP24" G E R F "model(2.3 + (1.2*((x3*x1)/(3.0/x4))));" noise)
(arc.run rql model:"TestCaseP25" G E R F "model(0.03 + (9.2*((x3/x1)-(3.0*x4))));" noise)
(arc.run rql model:"TestCaseP26" G E R F "model(1.53 + (4.2*((x3+x1)/(x0*x4))));" noise)
(arc.run rql model:"TestCaseP27" G E R F "model(1.53 +
(4.2*((x3+x1)*(22.1-x0))));" noise)
(arc.run rql model:"TestCaseP28" G E R F "model(0.81 + (24.3* ( (3.0*square(x0))/(4.0*cube(x3)) )));" noise)
(arc.run rql model:"TestCaseP29" G E R F "model(0.23 + (9.2*((x2+15.1)*(23.5-x4))));" noise)
(arc.run rql model:"TestCaseP30" G E R F "model(0.23 + (44.3*((x2+2.21)-(2.15-x4))));" noise)
(arc.run rql model:"TestCaseP31" G E R F "model(-6.93 + (0.5*sqroot(x0)) + (2.0*cube(inv(x1))) + (4.0*quart(x2)));" noise)
(arc.run rql model:"TestCaseP32" G E R F "model(0.03 + (9.3*((x2+2.21)/cos(x4))));" noise)
(arc.run rql model:"TestCaseP33" G E R F "model(5.22 + (-4.1*((x2-x1)/tan(x0))));" noise)
(arc.run rql model:"TestCaseP34" G E R F "model(1.53 + (4.4*((x3+x1)/(x0-x4))));" noise)
(arc.run rql model:"TestCaseP35" G E R F "model(1.53 + (4.2*(x0*x1*x2)) + (-2.6*(x1*x3*x4)) + (3.5*(x0*x2*x3)));" noise)
(arc.run rql model:"TestCaseP36" G E R F "model(0.23 + (34.12*(x0*x2)) + (-9.451*(x4/x1)) + (16.61*cos(x3)));" noise)
(arc.run rql model:"TestCaseP37" G E R F "model(0.23 + (11.2*cube(x2)) + (-9.451*square(x1)) + (16.61*quart(x3)));" noise)
(arc.run rql model:"TestCaseP38" G E R F "model(0.23 + (-16.13*sin(x2)) + (-21.451*cos(x1)) + (1.61*tanh(x3)));" noise)
(arc.run rql model:"TestCaseP39" G E R F "model(32.0 - (3.0*((tan(x0)/29.431)*(tanh(x2)/sin(x3)))));" noise)
(arc.run rql model:"TestCaseP40" G E R F "model(32.0 - (3.0*((cos(x0)*sin(x1))/(tanh(x2)-tan(x3)))));" noise)
(arc.run rql model:"TestCaseP41" G E R F "model(-12.3 + (2.13*exp(x2-23.526)));" noise)
(arc.run rql model:"TestCaseP42" G E R F "model(26.921 + (94.12*(exp(sin(x0)))));" noise)
;; Example Test Problems: (used in GPTP VIII paper but expanded to 3 basis functions)(noise 0%)
;; Regress sampling 10% verbose   Rows Features Hours Error Noise
(arc.setOptions regress: true 10%)(setq R 10000)(setq F 25)(setq G -1)(setq E .0000001)(setq noise 0%)
;; Generate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules
(setq rql (arc.generateRQL pareto: nlse: 3 2 t: F 0 true default: 10 "" "" "" ))
(arc.run rql model:"linearRegressionP03" G E R F "model(2.92 + (-2.8*x0) + (3.1*x3) + (4.6*x4) );" noise)
(arc.run rql model:"cubicRegressionP03" G E R F "model(2.92 + (-20.1*cube(x0)) + (13.19*cube(x3)) + (14.26*cube(x4)) );" noise)
(arc.run rql model:"crossCorrelationP03" G E R F "model(2.92 + (-4.1*(x0*x1*x2)) + (3.9*(x0*x3*x4)) + (4.6*(x1*x3*x4)) );" noise)
(arc.run rql model:"ellipsoidP03" G E R F "model(2.92 + (1.0*(x0*x0)) + (2.0*(x1*x1)) + (3.0*(x2*x2)) );" noise)
(arc.run rql model:"hiddenModelP031" G E R F "model(1.57 + (2.13*sin(x2)));" noise)
(arc.run rql model:"cyclicSeriesP03" G E R F "model(3.25 + (14.65*sin(x0)) + (-6.73*cos(x1)) + (-18.35*tan(x2)) );" noise)
(arc.run rql model:"hyperTangentP03" G E R F "model(-1.57 + (1.57*tanh(x0*x0*x0)) + (-39.34*tanh(x1*x1*x1)) + (2.13*tanh(x2*x2*x2)) );" noise)
(arc.run rql model:"squareRootP03" G E R F "model(1.23 + (1.23*sqroot(x0*x3)) + (-9.16*sqroot(x1*x2)) + (11.27*sqroot(x3*x4)) );" noise)
(arc.run rql model:"cosineP03" G E R F "model(0.23 + (14.2*cos(99.521+x1)));" noise)
(arc.run rql model:"sineP03" G E R F "model(0.23 + (14.2*sin(99.521/x1)));" noise)
(arc.run rql model:"hypertangentP03" G E R F "model(0.23 + (14.2*tanh(99.521*x1)));" noise)
(arc.run rql model:"exponentP03" G E R F "model(0.23 + (14.2*exp(x2+x1)));" noise)
;; Example Test Problems: (with user-entered maximum run times)
(arc.setOptions regress: true 10%)
(setq fitness nlse:)(setq T t:)(setq B 5)(setq noise 0%)
(arc.run "regress(
(c0*f0(f1(v0))),(c1*f0(f1(v1))),(c2*f0(f1(v2))),(c3*f0(f1(v3))),(c4*f0(f1(v4))) );" model: "hyperTangent" 200 .01 10000 5 "model(1.57 + (1.57*tanh(cube(x0))) + (-39.34*tanh(cube(x1))) + (2.13*tanh(cube(x2))) + (46.59*tanh(cube(x3))) + (11.54*tanh(cube(x4))));" noise)
(arc.run "regress( (c0*f0(f1(t1,t2),f2(t3,t4))),(c5*f3(f4(t6,t7),f5(t8,t9))),(c10*f6(f7(t11,t12),f8(t13,t14))),(c15*f9(f10(t16,t17),f11(t18,t19))),(c20*f12(f13(t21,t22),f14(t23,t24))) );" fitness T B model: "hyperTangent" 200 .01 10000 5 "model(1.57 + (1.57*tanh(cube(x0))) + (-39.34*tanh(cube(x1))) + (2.13*tanh(cube(x2))) + (46.59*tanh(cube(x3))) + (11.54*tanh(cube(x4))));" noise)
(arc.run "regress( (c0*f0(f1(v0,v1),f2(v2,v3))),(c5*f3(f4(v4,v5),f5(v6,v7))),(c10*f6(f7(v8,v9),f8(v10,v11))),(c15*f9(f10(v12,v13),f11(v14,v15))),(c20*f12(f13(v16,v17),f14(v18,v19))) );" fitness T B model: "hyperTangent" 200 .01 10000 5 "model(1.57 + (1.57*tanh(cube(x0))) + (-39.34*tanh(cube(x1))) + (2.13*tanh(cube(x2))) + (46.59*tanh(cube(x3))) + (11.54*tanh(cube(x4))));" noise)
;; Example Test Problems: (with user-entered goals)
(arc.setOptions regress: true 10%)
(setq fitness nlse:)(setq T t:)(setq B 5)(setq noise 0%)
(arc.run "regress(universal(3,5,v)) where { op(noop,+,-,*,/,square,cube,cos,sin,tan,tanh) ff(noop) ef() ev() f0(noop) v0(x0) v8(x1) v16(x2) v24(x3) v32(x4)};" model: "linearRegression" 200 .01 10000 5 "model(1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4));" noise)
(arc.run "regress(universal(3,5,v)) where { op(noop,+,-,*,/,square,cube,cos,sin,tan,tanh) ff(noop) ef() ev() f0(cube) f7(cube) f14(cube) f21(cube) f28(cube) v0(x0) v8(x1) v16(x2) v24(x3) v32(x4)};" model: "cubicRegression" 200 .01 10000 5 "model(1.57 + (1.57*cube(x0)) + (-39.34*cube(x1)) + (2.13*cube(x2)) + (46.59*cube(x3)) + (11.54*cube(x4)));" noise)
(arc.run "regress(universal(0,5,x));" model: "linearRegression" 200 .01 10000 5 "model(1.57 + (1.57*x0) + (-39.34*x1) + (2.13*x2) + (46.59*x3) + (11.54*x4));" noise)
(arc.run "regress(universal(3,1,t));" model: "cubicRegression" 200 .01 10000 5 "model(1.57 + (1.57*cube(x0)) + (-39.34*cube(x1)) + (2.13*cube(x2)) + (46.59*cube(x3)) + (11.54*cube(x4)));" noise)
;; Example Test Problem: (with multiple search clauses)
(arc.setOptions regress: true 10%)
(setq fitness nlse:)(setq T t:)(setq D 1)(setq B 5)(setq noise 0%)
(setq rql "search regress(universal(3,1,t)) where {op(noop,+,-,*,/,square,cube,cos,sin,tan,tanh) config(smart,standard,10,25,10,10)}
search regress((tan(x1)/c0)+(tan(x3)/tan(x4))) where {op(noop,+,-,*,/,square,cube,cos,sin,tan,tanh) config(smart,standard,10,25,10,10)}
search book['regress(universal(3,5,t)) where { f0(cos,sin,tan,tanh) f1(*) f2(*) f3(*) ef(f0) ev(v0,v2,v3) ec(c1) et() config(smart,standard,1000,100,50,1000) op(noop,+,-,*,/,abs,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln)}',
'regress(universal(3,5,t)) where { f0(cos,sin,tan,tanh) f1(*) f2(*) f3(noop,*) ef(f0) ec(c1) et() eb(b0) config(smart,standard,100,100,50,100) op(noop,+,-,*,/,abs,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln)}',
'regress(universal(3,5,t)) where { f0(noop,*,/) f1(cos,sin,tan,tanh) f2(*) f4(cos,sin,tan,tanh) f5(*) ef(f0,f1,f4) ev(v1,v5) ec(c0,c4) et() config(smart,standard,1000,100,50,1000) op(noop,+,-,*,/,abs,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln)}',
'regress(universal(3,5,t)) where { f0(+,-,*,/) f1(+,-,*,/) f2(cos,sin,tan,tanh) f3(cos,sin,tan,tanh) f4(+,-,*,/) f5(cos,sin,tan,tanh) f6(cos,sin,tan,tanh) ev(v0,v2,v4,v6) ec() et() eb(b0) config(smart,standard,1000,100,100,1000)
op(noop,:,+,-,*,/,square,sqroot,cube,cos,sin,tan,tanh,ln,exp)}',
'regress(universal(3,5,t)) where { f0(noop,*,+,/) f1(noop,+) f2(*,psqrt,psquare,pcube,pquart) f3(*,psqrt,psquare,pcube,pquart) f4(noop,+) f5(*,psqrt,psquare,pcube,pquart) f6(*,psqrt,psquare,pcube,pquart) ec(c0,c2,c4,c6) ev(v1,v3,v5,v7) et() eb(b0) config(smart,standard,1000,100,100,100) op(noop,+,-,*,/,abs,cos,sin,tan,tanh,sqroot,square,cube,quart,exp,ln,psqrt,psquare,pcube,pquart)}',
'regress(universal(3,5,t)) where { config(smart,standard,1000,256,200,1000) op(noop,:,+,-,*,/) }' ];")
(arc.run rql model:"ratioTest" 20 .05 10000 5 "regress((tan(x1)/2.84)+(tan(x3)/tan(x4)));" noise)
;; Example Test Problem: (using constant constraints for weights regression)
(arc.setOptions regress: true 100%)
(setq fitness nlse:)(setq T t:)(setq D 1)(setq B 5)(setq R 50000)(setq F 5)(setq G 10)(setq E .0001)(setq noise 0%)
(setq rql (append "model(c0*x0,c1*x1,c2*x2,c3*x3,c4*x4)" _eol
   " where {fitness(nlse) config(smart,standard,10,25,0,0,200)" _eol
   " c0(0.0,1.0,.1) c1(0.0,1.0,.1) c2(0.0,1.0,.1) c3(0.0,1.0,.1) c4(0.0,1.0,.1)}"))
(arc.run rql model:"weightRegression" G E R F "model(0.53*x0,0.26*x1,0.11*x2,0.90*x3,.6*x4);" noise)
;; Example Test Problem: (using sigmoid and nlse for non-linear regression)
(arc.setOptions regress: true 100%)
(setq fitness nlse:)(setq B 5)(setq T t:)(setq R 10000)(setq F 5)(setq G 10)(setq D 1)(setq E .0001)(setq noise 0%)
(setq rql (append "model(sig((c0*x0)+(c1*x1)+(c2*x2)+(c3*x3)+(c4*x4)))" _eol
   " where {fitness(nlse) config(smart,standard,1,25,200,0,200)" _eol
   " c0(0.0,1.0,.1) c1(0.0,1.0,.1) c2(0.0,1.0,.1) c3(0.0,1.0,.1) c4(0.0,1.0,.1)}"))
(arc.run rql model:"sigmoidRegression" G E R F "model(sig((0.53*x0)+(0.26*x1)+(0.11*x2)+(0.90*x3)+(.6*x4)));" noise)
;; Example Test Problems: (using import from external data files)
(arc.setOptions regress: true 10%)
(setq D 2)(setq B 8)(setq T v:)(setq fitness nlse:)(setq F 8)(setq G -1)(setq C 0)(setq RD true)(setq OP #void)(setq E .0000000001)(setq noise 0%)
(setq rql (arc.generateRQL extreme: fitness B D T F C RD OP))
(arc.run rql import: "TestSmallRegress" G E "TestSmallRegress_Train.xls" "TestSmallRegress_Test.xls" "TestSmallRegress_Output.xls")
;; Example Test Problems: (using import from external data files)
(arc.setOptions regress: true 10%)
(setq D 2)(setq B 8)(setq T v:)(setq fitness nlse:)(setq F 8)(setq G -1)(setq C 0)(setq RD true)(setq OP #void)(setq E .0000000001)(setq noise 0%)
(setq rql (arc.generateRQL extreme: fitness B D T F C RD OP))
(arc.run rql import: "TestBigRegress" G E "TestBigRegress_In.xls")
;; Example Test Problem: (using single correlated linear regression)
(arc.setOptions regress: true 100%)
(setq D 0)(setq B 67)(setq T v:)(setq fitness nmae:)(setq F 67)(setq G 1)(setq C 0)(setq RD true)(setq OP "best")(setq E .0001)(setq noise 0%)
(setq rql (arc.generateRQL linear: fitness B D T F C RD OP))
(arc.run rql import: "CreditLinear" G E "Test_Credit_Train.xls" "Test_Credit_Test.xls" "Test_Credit_Output.xls")
;; Example Test Problem: (using Koza-style Pareto front symbolic regression)
(arc.setOptions regress: true 100%)
(setq D 10)(setq B 1)(setq T v:)(setq fitness nmae:)(setq F 8)(setq G 100)(setq C 0)(setq RD false)(setq OP #void)(setq E .0001)(setq noise 0%)
(setq rql (arc.generateRQL pareto: fitness B D T F C RD OP))
(arc.run rql import: "KozaPareto" G E "TestBigRegress_Train.xls" "TestBigRegress_Test.xls" "TestBigRegress_Output.xls")
;; Example Test Problem: (using Hornby-style age-layered symbolic regression)
(arc.setOptions regress: true 100%)
(setq D 10)(setq B 1)(setq T v:)(setq fitness nmae:)(setq F 8)(setq G 100)(setq C 0)(setq RD false)(setq OP #void)(setq E .0001)(setq noise 0%)
(setq rql (arc.generateRQL aged: fitness B D T F C RD OP))
(arc.run rql import: "HornbyAgeLayered" G E "TestBigRegress_Train.xls" "TestBigRegress_Test.xls" "TestBigRegress_Output.xls")
;; Example Test Problem: (using baseline-style age Pareto front symbolic regression)
(arc.setOptions regress: true 100%)
(setq D 10)(setq B 5)(setq T v:)(setq fitness nmae:)(setq F 8)(setq G 100)(setq C 0)(setq RD true)(setq OP #void)(setq E .0001)(setq noise 0%)
(setq rql (arc.generateRQL baseline: fitness B D T F C RD OP))
(arc.run rql import: "Baseline" G E "TestBigRegress_Train.xls" "TestBigRegress_Test.xls" "TestBigRegress_Output.xls")
;; ============================
;; CLASSIFICATION TEST PROBLEMS
;; ============================
;; Example Multi-class Classification Test Problems: (used for a future GPTP paper studying extreme accuracy in discriminant analysis problems)
;; Note: Each set of multi-class classification test cases has multiple comparative RQL search strategies
;; Classify sampling 10% verbose   Rows Features Hours Error Noise
(arc.setOptions classify: true 10%)(setq R 1000)(setq F 25)(setq G 30)(setq E .0000001)(setq noise 0%)
;; Generate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules
(setq rql (arc.generateRQL lda: ecep: 5 1 v: F 5 false default: 10 "" "" "" ))
(arc.run rql model:"TestCaseC01" G E R F "argmax((1.57*x0),(-39.34*x1),(2.13*x2),(46.59*x3),(11.54*x4));" noise)
(arc.run rql model:"TestCaseC02" G E R F "argmax((1.57*x0),(-39.34*x1),(9.0*x1)+(2.13*x2),(-10.0*x0)+(46.59*x3),(11.54*x4));" noise)
(arc.run rql model:"TestCaseC03" G E R F "model((x0<=0.0?(x4<=-2.0?0.0:1.0):(x2>=1.24?(x1>1.0?2.0:3.0):4.0)));" noise)
(arc.run rql model:"TestCaseC04" G E R F "argmax((1.57*cos(x0)),(-39.34*sin(x1)),(2.13*square(x2)),(46.59*minimum(x3,x0)),(11.54*(x2/x4)));" noise)
;; Sample problems in deep multinomial classification
;; Generate RQL search = Style Fitness Bases Depth Term Features Classes Reduce Operators Delay Seed Fieldlist typeRules
(setq rql (arc.generateRQL lda: ecep: 5 1 v: F 5 false default: 10 "" "" "" ))
(arc.run rql model:"TestCaseC05" G E R F (append "argmax(summarize(-3.57*x0,23.5*x1,-32.5*x2,23.5*x3,3.5*x4),"
   "summarize(23.57*x0,3.5*x1,2.5*x2,-23.5*x3,1.5*x4),"
   "summarize(-11.57*x0,23.5*x1,2.5*x2,23.5*x3,6.0*x4),"
   "summarize(-10.57*x0,2.3*x1,-32.5*x2,23.5*x3,5.2*x4),"
   "summarize(22.0*x0,1.9*x1,-12.5*x2,23.5*x3,23.5*x4));") noise)
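;; Note: in the argmax(...) targets above (e.g. TestCaseC01), each argument is one class's
;; discriminant score, and the predicted class is the index of the largest score. A hedged
;; Python rendering of the TestCaseC01 ground truth:
;;
;;   def test_case_c01(x):
;;       # each argument of argmax(...) is one class's discriminant score
;;       scores = (1.57*x[0], -39.34*x[1], 2.13*x[2], 46.59*x[3], 11.54*x[4])
;;       return max(range(5), key=lambda k: scores[k])   # predicted class label, 0..4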
"model((x14<=x21?(x5<=tanh(x4/x6)?0.0:1.0):(x18<=(x3*x7)?2.0:3.0)));" noise) (arc.run rql model:"TestCaseC10" G E R F "model((x9<=sin(x2/x19)?(x22<=curoot(x4)?0.0:1.0):(x19>=tanh(minimum(x24,x3))?2.0:3.0)));" noise) ;; Example Multinomial Classification Industry|Academia Problems: (used for a future GPTP paper studying extreme accuracy in CART and TREE problems) ;; Note: Each set of multi-class classification test cases has mutiple comparitive RQL search strategies (arc.setOptions classify: true 100%) (setq fitness ecep:)(setq D 2)(setq R 1000)(setq T v:)(setq RD false)(setq OP "default")(setq G 30)(setq E .0000000001)(setq noise 0%) (setq F 4)(setq B 4)(setq C 2)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestIris" G E "TestIris_Train.xls" "TestIris_Test.xls" "TestIris_NetNL_Output.xls") (setq F 5000)(setq B 25)(setq C 5)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestClassify" G E "TestClassify_Train.xls" "TestClassify_Test.xls" "TestClassifyLDA_Output.xls") (setq F 7129)(setq B 25)(setq C 2)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestAmlAllLeukemia" G E "TestAmlAll_Leukemia_Train.xls" "TestAmlAll_Leukemia_Test.xls" "TestAmlAll_Leukemia_Output.xls") (setq F 16)(setq B 16)(setq C 2)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestBank" G E "Test_Bank_Train.xls" "Test_Bank_Test.xls" "Test_Bank_Output.xls") (setq F 5)((setq B 5)setq C 7)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "Test_Volatility" G E "Test_Volatility_Train.xls" "Test_Volatility_Test.xls" "Test_Volatility_Output.xls") (setq F 5)(setq B 5)(setq C 2)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "Test_VolatilityB" G E "Test_VolatilityB_Train.xls" "Test_VolatilityB_Test.xls" "Test_VolatilityB_Output.xls") (setq F 13)(setq B 13)(setq C 2)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestHeart" G E "TestHeart_Train.xls" "TestHeart_Test.xls" "TestHeart_Output.xls") (setq F 13)(setq B 13)(setq C 11)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestUCI_Vowels" G E "TestUCI_Vowels_Train.xls" "TestUCI_Vowels_Test.xls" "TestUCI_Vowels_Output.xls") (setq F 8)(setq B 8)(setq C 3)(setq rql (arc.generateRQL class: fitness B D T F C RD OP))(arc.run rql import: "TestYeast" G E "TestYeast_Train.xls" "TestYeast_Test.xls" "TestYeast_Output.xls") ;; Example Test Problem: (using export to create classification test files) ;; RootFileName TrainRows TestRows Features xConvert yConvert Noise Model (arc.makeTrainTestFiles ExportTestClassify: 1000 1000 5000 none: none: 0% "argmax((1.57*x0),(-39.34*x1),(2.13*x2),(46.59*x3),(11.54*x4));") ;; ================================== ;; USER SPECIFIED TYPES TEST PROBLEMS ;; ================================== ;; Example Test Problem: (using Koza style pareto front symbolic regression) (arc.setOptions regress: true 100%) (setq D 10)(setq B 1)(setq T v:)(setq fitness nmae:)(setq F 8)(setq G 10)(setq C 0)(setq RD false)(setq OP "op(noop,+,-,*,/)")(setq E .0001)(setq noise 0%) (setq rql (arc.generateRQL pareto: fitness B D T F C RD OP)) (setq rql "search regress(universal(1,1,v)) where {fitness(nmae) config(pareto,standard,256,25,00,10,10,100,10,ec,bc,10,2) champion(standard,10,25,5,5,reduce,0.0000000001,0.0000000001) 
;; ==================================
;; USER SPECIFIED TYPES TEST PROBLEMS
;; ==================================
;; Example Test Problem: (using Koza-style Pareto front symbolic regression)
(arc.setOptions regress: true 100%)
(setq D 10)(setq B 1)(setq T v:)(setq fitness nmae:)(setq F 8)(setq G 10)(setq C 0)(setq RD false)(setq OP "op(noop,+,-,*,/)")(setq E .0001)(setq noise 0%)
(setq rql (arc.generateRQL pareto: fitness B D T F C RD OP))
(setq rql "search regress(universal(1,1,v)) where {fitness(nmae) config(pareto,standard,256,25,00,10,10,100,10,ec,bc,10,2) champion(standard,10,25,5,5,reduce,0.0000000001,0.0000000001)
op(noop,inv,abs,sqroot,square,cube,curoot,quart,quroot,exp,ln,cos,sin,tan,tanh,+,-,*,/,maximum,minimum,<=,>=,lif,lor,land)
onscore(0.0,160)
type( widget,'x1:x2:x3:(widget+widget):(widget-widget):(widget*widget):(widget/widget):max(widget,widget)',
      widget,'(widget+Number):(widget-Number):(widget*Number):(widget/Number):max(widget,Number)',
      widget,'(Number+widget):(Number-widget):(Number*widget):(Number/widget):max(Number,widget)',
      screw,'x0:x4:x5:x6:x7:(screw+screw):(screw-screw):(screw*screw):(screw/screw):max(screw,screw)',
      screw,'(screw+Number):(screw-Number):(screw*Number):(screw/Number):max(screw,Number)',
      screw,'(Number+screw):(Number-screw):(Number*screw):(Number/screw):max(Number,screw)',
      screwsPerWidget,'(screw/widget):(screwsPerWidget+screwsPerWidget):(screwsPerWidget-screwsPerWidget)',
      screwsPerWidget,'(screwsPerWidget*screwsPerWidget):(screwsPerWidget/screwsPerWidget):max(screwsPerWidget,screwsPerWidget)',
      Number,'(Number+Number):(Number-Number):(Number*Number):(Number/Number):max(Number,Number)' ) }")
(arc.run rql import: "KozaPareto" G E "TestBigRegress_Train.xls" "TestBigRegress_Test.xls" "TestBigRegress_Output.xls")
true
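;; Note: the type(...) rules above act as a typed grammar. Every production names the type it
;; yields and the sub-expressions it may be built from, so the search can only compose
;; type-consistent candidates (for example, a screw divided by a widget yields a screwsPerWidget).
;; A rough Python sketch of that idea, using a simplified subset of the rules; this illustrates
;; typed expression generation in general, not ARC's internals:
;;
;;   import random
;;
;;   RULES = {
;;       "screwsPerWidget": ["(screw/widget)", "(screwsPerWidget+screwsPerWidget)"],
;;       "widget": ["x1", "x2", "x3", "(widget+widget)", "(widget*Number)"],
;;       "screw": ["x0", "x4", "(screw+screw)", "(screw*Number)"],
;;       "Number": ["1.0", "2.0", "(Number+Number)", "(Number*Number)"],
;;   }
;;
;;   # longest names first, so 'widget' never matches inside 'screwsPerWidget'
;;   TYPE_NAMES = sorted(RULES, key=len, reverse=True)
;;
;;   def grow(type_name, depth=3):
;;       options = RULES[type_name]
;;       if depth <= 0:  # force terminals (productions mentioning no type names)
;;           terminals = [p for p in options if not any(t in p for t in TYPE_NAMES)]
;;           options = terminals or options[:1]
;;       prod = random.choice(options)
;;       for t in TYPE_NAMES:              # expand each type token recursively
;;           while t in prod:
;;               prod = prod.replace(t, grow(t, depth - 1), 1)
;;       return prod
;;
;;   print(grow("screwsPerWidget"))        # e.g. ((x0+x4)/(x2*1.0))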