General

What if the answer to my question is not found in this FAQ?
Please direct your question to [email protected]
What are the charges for the INFXL deep learning foundry service?

This online service is free to try.

The price of a deep net trained through this automated service is USD 999.

Charges for projects that cannot be handled through this online service vary with the size and complexity of the dataset and will be communicated to you for approval before the start of the project.

We provide limited support through email at no cost to you. Our consultants will be pleased to provide you with detailed support on data selection and preprocessing at a rate of USD 200 per hour. Consulting charges will be communicated to you for approval in advance.

Please contact us at [email protected] with the details of your project and we will be happy to provide you with a quote.

How do you ensure privacy on this portal?
You can ensure your privacy by following the steps below:
  1. Connect to the portal through a VPN.
  2. Use a newly created Gmail address as your username.
  3. Use random filenames for the files that you upload.

If you pay with a credit card, please note that we process the payment through a third-party payment processor, Stripe. The only information about you that the payment processor shares with INFXL is the email address that you enter with your credit card details.

What is the 'COST' that you have mentioned in the C code file?
COST is an indicator of the number of assignment and addition ops that the network has to do during a single forward pass. A lower COST means faster operation, lower energy consumption, and lower memory requirement.
Can you give us an idea of the power consumption?
We can quote some numbers for the INFXL deep net trained on the Human Activity Recognition dataset. The forward pass on a Cortex-M4 consumes 1 µW. On a low-power FPGA, the forward pass runs about 10x faster and consumes around 200 nW.

Training Process

Why do you have limits on file sizes? What if I want to train with a bigger dataset?
We have tested our deep learning engine on much bigger datasets, with tens of thousands of inputs and GiBs of data. We have placed a limit on the file size for the free-to-try online service so as not to overburden our servers.
Why a 24-hour turnaround time? Can you build deep nets faster?
The exact turnaround time depends on the size of the dataset and the complexity of the underlying phenomenon. The 24-hour figure is simply an estimate that gives us enough time to handle the expected load on our servers.
Are you using any regularization techniques while training the INFXL deep net?
We use several rather strong proprietary regularization heuristics to limit the complexity of the INFXL deep net. The heuristics that we normally use include early stopping, but for the sake of simplicity early stopping is not used in this iteration of our cloud offering.

Training Data Pre-processing

Why do all features and labels have to be mapped to integers?
The inputs, outputs, and all data paths within the INFXL deep net are 8 bits wide. This design choice has led to reduced power consumption, faster speeds, and reduced memory needs.
Do you require balanced datasets?
We do not require balanced datasets. However, for best results, the training data should include an equal representation of all classes.
What should be included in and excluded from the features and labels files?
Do not include column labels or record (or row) numbers. Make sure that no values are missing. These files must consist of digits, minus signs (for negative values such as -127), commas, and newline characters only.
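
The rules above can be checked before uploading. The following is a minimal sketch, not part of the INFXL service: the function name `check_csv_text` is hypothetical, and it assumes that a minus sign is permitted (values such as -127 appear elsewhere in this FAQ), that every value lies in [-127, 127], and that all rows have the same number of columns.

```python
import re

# Assumption: a valid line is a comma-separated list of integers, each
# optionally preceded by a minus sign (needed for values such as -127).
_ROW = re.compile(r"-?\d+(,-?\d+)*")


def check_csv_text(text):
    """Return a list of problems found in the text of a features or labels file."""
    problems = []
    rows = text.split("\n")
    if rows and rows[-1] == "":
        rows.pop()  # tolerate a single trailing newline
    width = None
    for number, row in enumerate(rows, start=1):
        # Headers, missing values ("1,,3") and stray characters all fail here.
        if not _ROW.fullmatch(row):
            problems.append(f"line {number}: not a comma-separated list of integers")
            continue
        values = [int(v) for v in row.split(",")]
        if any(v < -127 or v > 127 for v in values):
            problems.append(f"line {number}: value outside [-127, 127]")
        if width is None:
            width = len(values)  # the first valid row fixes the column count
        elif len(values) != width:
            problems.append(f"line {number}: expected {width} columns, found {len(values)}")
    return problems
```

An empty list means that no problem was found; a header row or a missing value is reported with its line number.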
How do I pre-process binary columns in the features files?
Map 'false' values to -127 and 'true' to 127.
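
As an illustration of this mapping, here is a minimal sketch; the function name `encode_binary` and the sample column are hypothetical, and the input is assumed to be already available as Python booleans.

```python
def encode_binary(flag):
    """Map a boolean feature to 127 ('true') or -127 ('false')."""
    return 127 if flag else -127


# Hypothetical sample column of boolean values.
column = [True, False, False, True]
encoded = [encode_binary(v) for v in column]
print(encoded)  # [127, -127, -127, 127]
```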
How do I pre-process continuous-valued columns in the features files?
Continuous-valued columns must be mapped to integers in the range [-127, 127]. For example, if the continuous values are in the range [0, 1], map them using int(round(254 * value - 127)).
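
The formula above covers values in [0, 1]. For a column with a different range, one possible approach is to rescale to [0, 1] first and then apply the same formula. The sketch below does this; the function name `scale_continuous`, the `lo` and `hi` parameters, and the clamping of out-of-range values are assumptions made for illustration, not requirements stated by INFXL.

```python
def scale_continuous(value, lo=0.0, hi=1.0):
    """Map a value from [lo, hi] to an integer in [-127, 127]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Rescale to [0, 1]; with the defaults this leaves the value unchanged.
    unit = (value - lo) / (hi - lo)
    # Assumption: values outside [lo, hi] are clamped to the nearest bound.
    unit = min(1.0, max(0.0, unit))
    # Same formula as in the FAQ answer.
    return int(round(254 * unit - 127))


print(scale_continuous(0.0))          # -127
print(scale_continuous(1.0))          # 127
print(scale_continuous(50, 0, 100))   # 0
```

In practice `lo` and `hi` would typically be taken from the training data and reused unchanged for any data presented to the trained network later.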
How do I pre-process categorical columns in the features files?
Non-binary categorical columns should be exploded into several columns, with each of the new columns representing a single category. For example, a column for highest educational qualification {HS, Bach, Mast, PhD} should be replaced by four columns, one each for 'HS', 'Bach', 'Mast', and 'PhD'. In any feature vector, exactly one of these columns must be 'true', while the remaining columns must be 'false'.
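
The educational-qualification example can be sketched as follows. The function name `one_hot` is hypothetical, and the sketch assumes that the new columns are encoded like any other binary column, with 'true' as 127 and 'false' as -127.

```python
def one_hot(value, categories):
    """Replace one categorical value by one column per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # Exactly one column is 'true' (127); all others are 'false' (-127).
    return [127 if value == c else -127 for c in categories]


QUALIFICATIONS = ["HS", "Bach", "Mast", "PhD"]
print(one_hot("Mast", QUALIFICATIONS))  # [-127, -127, 127, -127]
```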
How do I pre-process the labels file?
Map 'false' values to -127 and 'true' to 127. The number of columns must match the number of classes: two columns for a two-class problem, three for a three-class problem, and so on.
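
A labels file in this layout could be generated along the following lines. This is a sketch under assumptions: the function names `label_row` and `labels_file_text` are hypothetical, classes are identified by zero-based indices, column order follows class order, and each record belongs to exactly one class.

```python
def label_row(class_index, num_classes):
    """Return one labels row: 127 in the true class column, -127 elsewhere."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    return [127 if i == class_index else -127 for i in range(num_classes)]


def labels_file_text(class_indices, num_classes):
    """Build the labels file content, one comma-separated row per record."""
    lines = []
    for c in class_indices:
        lines.append(",".join(str(v) for v in label_row(c, num_classes)))
    return "".join(line + "\n" for line in lines)


# Three records of a three-class problem, belonging to classes 0, 2 and 1.
print(labels_file_text([0, 2, 1], 3), end="")
# 127,-127,-127
# -127,-127,127
# -127,127,-127
```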