% Description of the German credit dataset.
%
% 1. Title: German Credit data
%
% 2. Source Information
%
% Professor Dr. Hans Hofmann
% Institut für Statistik und Ökonometrie
% Universität Hamburg
% FB Wirtschaftswissenschaften
% Von-Melle-Park 5
% 2000 Hamburg 13
%
% 3. Number of Instances: 1000
%
% Two datasets are provided. The original dataset, in the form provided
% by Prof. ...
The two classes are “bad” and “good”. The German credit data is an unbalanced dataset, because most of the individuals are classified as having good credit. A classification model should therefore achieve higher accuracy than a trivial classifier that labels every individual as “good”.
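The baseline in that comparison is the accuracy of always predicting the most frequent class. Weka's ZeroR classifier does exactly this for a nominal class, and the baseline accuracy equals that class's share of the data. The following is a minimal stdlib-only Java sketch. The class name `BaselineAccuracy` and the toy labels are hypothetical and are not taken from the real credit data:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: accuracy of a classifier that always predicts the most frequent class. */
public class BaselineAccuracy {

    /** Returns the share of labels equal to the most frequent label. */
    public static double majorityAccuracy(String[] labels) {
        if (labels.length == 0) {
            throw new IllegalArgumentException("need at least one label");
        }
        Map<String, Integer> counts = new HashMap<>();
        int best = 0;
        for (String label : labels) {
            // merge() returns the updated count for this label
            int c = counts.merge(label, 1, Integer::sum);
            best = Math.max(best, c);
        }
        // Predicting the majority label for every instance is correct exactly `best` times.
        return (double) best / labels.length;
    }

    public static void main(String[] args) {
        // Toy labels for illustration only, not the real German credit data.
        String[] labels = {"good", "good", "good", "bad"};
        System.out.println(majorityAccuracy(labels)); // prints 0.75
    }
}
```

A model trained on unbalanced data is only useful if its accuracy, ideally measured with cross-validation, beats this number.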
Here are some small programs that try to show the versatility of the Weka data mining/machine learning system and what it can do. I will not explain everything (in fact, I will not explain very much at all). At the Weka site http://www.cs.waikato.ac.nz/~ml/weka/index.html you can read more about the system and download it.
Also see the Weka MOOCs:
- Advanced Data Mining with Weka
Here you can see some of the algorithms at work on different data sets (you can also provide one of your own in ARFF format).
Source files: WekaApplet1.java, Weka1.java
Applet using (some of) the options of the J48 algorithm.
Source files: WekaJ48Applet.java, WekaJ48.java
Very simple text classification applets.
Source files: TextClassifierApplet.java, TextClassifier.java
- Copy the file ExpandFreqField.java to the Weka directory
weka/filters/unsupervised/instance
- Add the following line to the file
weka/gui/GenericObjectEditor.props
together with the other filters.unsupervised.instance filters:
weka.filters.unsupervised.instance.ExpandFreqField,
(don't forget the trailing ',').
- Compile the Java file
- Start Weka Explorer
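The page does not say what ExpandFreqField does. Judging only by its name, it probably expands a data set that has a frequency (count) attribute into individual instances, so a row with count 3 becomes three identical rows. The following is a hypothetical stdlib-only Java sketch of that idea. It is not the actual ExpandFreqField.java and does not use the Weka filter API. The class name `FreqExpansion` and the toy rows are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not the real ExpandFreqField): repeat each row by its count column. */
public class FreqExpansion {

    /**
     * Returns a new list in which every row appears as many times as its
     * frequency column says, with the frequency column itself dropped.
     */
    public static List<String[]> expand(List<String[]> rows, int freqIndex) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            int count = Integer.parseInt(row[freqIndex].trim());
            if (count < 0) {
                throw new IllegalArgumentException("negative frequency: " + count);
            }
            // Copy every column except the frequency column.
            String[] copy = new String[row.length - 1];
            for (int i = 0, j = 0; i < row.length; i++) {
                if (i != freqIndex) {
                    copy[j++] = row[i];
                }
            }
            // Emit `count` independent copies of the reduced row.
            for (int k = 0; k < count; k++) {
                out.add(copy.clone());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy rows: class, survived, frequency (illustrative values only).
        List<String[]> rows = List.of(
                new String[] {"first", "yes", "2"},
                new String[] {"crew", "no", "1"});
        // prints [first, yes], [first, yes], [crew, no] on separate lines
        for (String[] r : expand(rows, 2)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Dropping the count column after expansion is a design assumption. Keeping it would let a learner use the original count as an ordinary attribute, which is usually not wanted.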
Some of my other pages about Weka
Also see the following pages on my site that mention Weka:
- If you can read Swedish (or are courageous), you may look at my data mining presentation pages, where some of the basic principles and algorithms of machine learning and data mining are explained.
- The badge problem, an analysis of a (recreational) data set using Weka.
- My Data Mining, Machine Learning etc page.
ARFF data files
Weka normally reads data in the ARFF file format, which uses special tags to mark the different parts of the file (mostly attribute names, attribute types, attribute values, and the data). Here is a list of some ARFF files you can use. Many of them are standard data sets often used in the machine learning community. Most are available from the Weka site, and many are also described and downloadable from http://www.ics.uci.edu/~mlearn/MLRepository.html.
Click a link in the list below to see what that data set looks like. Please note that some files are quite big, and some algorithms will take a long time on them (often a very long time!). The number in parentheses is the file size in bytes. Some of the files contain quite good comments describing the data set. Others have no explanation at all (I probably converted them myself from some other source).
One more thing: the class attribute (i.e. the attribute we want to learn) must be the last one.
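The ARFF layout described above can be illustrated with a small example: `@relation` names the data set, each `@attribute` line gives a name and a type, and the rows follow `@data`. The following is a deliberately minimal, hypothetical Java reader. The class name `TinyArff` and the toy data are assumptions, and this is not Weka's own loader. It handles only `@relation`, `@attribute`, `@data`, `%` comments and dense comma-separated rows. It does not handle quoting, sparse rows or date attributes. Following the convention on this page, it treats the last attribute as the class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, deliberately minimal ARFF reader for illustration only. */
public class TinyArff {
    public final String relation;
    public final List<String> attributes = new ArrayList<>();
    public final List<String[]> data = new ArrayList<>();

    public TinyArff(String text) {
        String rel = null;
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Dense row: comma-separated values, one per attribute.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                data.add(values);
            } else if (lower.startsWith("@relation")) {
                rel = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // The name is the first whitespace-separated token after the tag.
                String rest = line.substring("@attribute".length()).trim();
                attributes.add(rest.split("\\s+")[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        relation = rel;
    }

    /** Following the convention on this page, the class is the last attribute. */
    public String classAttribute() {
        return attributes.get(attributes.size() - 1);
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
                "% toy data set, not one of the files listed below",
                "@relation toy-weather",
                "@attribute outlook {sunny, overcast, rainy}",
                "@attribute windy {TRUE, FALSE}",
                "@attribute play {yes, no}",
                "@data",
                "sunny,FALSE,no",
                "overcast,TRUE,yes");
        TinyArff t = new TinyArff(arff);
        // prints: toy-weather: [outlook, windy, play], class = play, 2 instances
        System.out.println(t.relation + ": " + t.attributes + ", class = " + t.classAttribute()
                + ", " + t.data.size() + " instances");
    }
}
```

In real Weka code you would not write your own parser. The usual approach is to load the file with `weka.core.converters.ConverterUtils.DataSource` and then call `data.setClassIndex(data.numAttributes() - 1)` to mark the last attribute as the class.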
- http://www.hakank.org/weka/zoo2_x.arff (6296)
- http://www.hakank.org/weka/golf.arff (383)
- http://www.hakank.org/weka/cpu.arff (6936)
- http://www.hakank.org/weka/sunburn.arff (573)
- http://www.hakank.org/weka/wine.arff (13790)
- http://www.hakank.org/weka/iris_discretized.arff (12390)
- http://www.hakank.org/weka/shape.arff (296)
- http://www.hakank.org/weka/titanic.arff (42322)
- http://www.hakank.org/weka/disease.arff (457)
- http://www.hakank.org/weka/labor_discretized.arff (9595)
- http://www.hakank.org/weka/zoo.arff (9408)
- http://www.hakank.org/weka/monk3.arff (1944)
- http://www.hakank.org/weka/monk2.arff (2602)
- http://www.hakank.org/weka/monk1.arff (1972)
- http://www.hakank.org/weka/credit.arff (23254)
- http://www.hakank.org/weka/contact-lenses.arff (2890)
- http://www.hakank.org/weka/iris.arff (7486)
- http://www.hakank.org/weka/labor.arff (8255)
- http://www.hakank.org/weka/weather.arff (489)
- http://www.hakank.org/weka/weather.nominal.arff (587)
- http://www.hakank.org/weka/BC.arff (25063)
- http://www.hakank.org/weka/G2.arff (8125)
- http://www.hakank.org/weka/GL.arff (10504)
- http://www.hakank.org/weka/HD.arff (22564)
- http://www.hakank.org/weka/HE.arff (8639)
- http://www.hakank.org/weka/HO.arff (29907)
- http://www.hakank.org/weka/IR.arff (4919)
- http://www.hakank.org/weka/LA.arff (4817)
- http://www.hakank.org/weka/LY.arff (11150)
- http://www.hakank.org/weka/SO.arff (8068)
- http://www.hakank.org/weka/V1.arff (31252)
- http://www.hakank.org/weka/VO.arff (33016)
- http://www.hakank.org/weka/auto93.arff (13617)
- http://www.hakank.org/weka/tic-tac-toe.arff (26569)
- http://www.hakank.org/weka/prnn_virus3.arff (6657), from 'Pattern Recognition and Neural Networks' by B.D. Ripley
- http://www.hakank.org/weka/prnn_viruses.arff (7672), from 'Pattern Recognition and Neural Networks' by B.D. Ripley
- http://www.hakank.org/weka/badges_plain.arff (11262) (see my analysis of this data set here)
- http://www.hakank.org/weka/badges2.arff (21295) (see my analysis of this data set here)
The following data sets are quite large:
- http://www.hakank.org/weka/spambase.arff (700661)
- http://www.hakank.org/weka/spambase_real.arff (700659)
- http://www.hakank.org/weka/ticdata_categ.arff (1012920) (Caravan data)
- http://www.hakank.org/weka/exper1.arff (106047)
- http://www.hakank.org/weka/soybean.arff (202935)
- http://www.hakank.org/weka/CH.arff (483568)
- http://www.hakank.org/weka/HY.arff (336201)
- http://www.hakank.org/weka/MU.arff (743765)
- http://www.hakank.org/weka/SE.arff (337512)
- http://www.hakank.org/weka/kropt.arff (532550)
ARFF versions of DASL data
DASL - The Data and Story Library is a great collection of data sets, with background stories and some analysis. For ARFF versions of these data sets, see ARFF versions of DASL data sets.
Related pages:
- My Eureqa page: Eureqa is a great tool for symbolic regression
- My JGAP page: I have written my own symbolic regression program using JGAP (Java)
Created by Hakan Kjellerstrand hakank@gmail.com