Build a C50 model.

Parameters:

See dedicated page for more information.
C50 models are included for completeness and for educational purposes. In pretty much all situations, you will prefer to use a CRT model.
In theory, C50 sounds like an attractive model: it can have any number of branches and appears to give models with higher accuracy.
In practice, C50 models are much slower and more resource-hungry: you will need to select a sample just to build a simple model on 200,000 rows. C50 also has a bad tendency to overfit the data, producing models that look accurate on the training data but are unreliable on new data.
The C5.0 algorithm is an extension of the C4.5 algorithm. It is a classification algorithm that can be applied to large data sets, and it improves on C4.5 in both speed and memory usage. A C5.0 model works by splitting the sample on the field that provides the maximum information gain. Each subset produced by a split is then split again, usually on a different field, and the process continues until the subsets cannot be split any further. Finally, the lowest-level splits are examined, and the subsets that do not contribute significantly to the model are pruned.
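The splitting procedure described above can be illustrated with a minimal Python sketch. This is not the actual C5.0 implementation: it assumes categorical fields only, uses plain information gain, and leaves out the final pruning step. The names `entropy`, `information_gain` and `build_tree`, and the toy data, are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, field):
    """Entropy reduction obtained by partitioning the sample on `field`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[field], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, fields):
    """Recursively split on the field with the largest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the subset is pure or no field is left to split on.
    if len(set(labels)) == 1 or not fields:
        return majority
    best = max(fields, key=lambda f: information_gain(rows, labels, f))
    # Stop as well when even the best field brings no gain.
    if information_gain(rows, labels, best) <= 0:
        return majority
    remaining = [f for f in fields if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)


# Toy sample (hypothetical data): "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
```

On this toy sample the first split is made on `outlook`, because it is the field with the largest information gain, and both resulting subsets are pure, so the recursion stops there.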
Gain estimates the improvement produced by splitting the sample S on an attribute A.
The gain ratio criterion then chooses, from among the tests with at least average gain, the test that maximizes the gain ratio:

Gain Ratio(A) = Gain(A) / P(A)
(International Journal of Engineering Research & Technology (IJERT) Vol. 1 Issue 4, June - 2012 ISSN: 2278-0181)
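As a rough illustration of the gain ratio, here is a minimal Python sketch. The text does not define P(A); the sketch assumes it is the split information of attribute A (the entropy of the partition that A induces on the sample), as in the usual C4.5 formulation. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Gain(A) / P(A), with P(A) assumed to be the split information of A."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Gain(A): class entropy before the split minus weighted entropy after it.
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    # P(A), assumed here: entropy of the attribute's own value distribution.
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0


# Toy sample (hypothetical data).
labels = ["yes", "yes", "no", "no"]
colour = ["red", "red", "blue", "blue"]  # two values, separates the classes
row_id = ["a", "b", "c", "d"]            # one distinct value per row

print(gain_ratio(colour, labels))
print(gain_ratio(row_id, labels))
```

Both attributes separate the classes perfectly and therefore have the same gain, but `row_id` does so only by creating one branch per row. Dividing by the split information penalizes such many-valued attributes, so `colour` ends up with the higher gain ratio.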
Parameters:

