Data Mining & Machine Learning

Nice data mining tutorials can be found at:
1. Data mining with R (provided by Luis Torgo): http://www.liaad.up.pt/~ltorgo/DataMiningWithR/
2. AI Access: http://www.aiaccess.net/

A comprehensive LaTeX symbol list can be found here.

Machine learning vs. statistics: a nice comparison of the different terminologies used, along with the advantages and disadvantages of each. George Mason University: Tutorial

Text mining resources:

1. Solr: an Apache-based search engine along similar lines to SRS.
Link 1: http://www.slideshare.net/teofili/apache-solr-crash-course

Link 2: http://yonik.com/solr/getting-started/

Data Wrangling in R

Data Cleaning and Processing Rules:

*. Eyeball the data to check for any weird characters.

*. Remove white spaces from each column of the matrix.

*. Remove empty lines by sorting the data.

*. Remove data that is already clean and does not require processing.

*. Consistently use either JSON or XML format.

*. If the data does not fit in memory, use explicit file I/O so that the progress of the process can be monitored.

*. If using R, routinely use high-performance libraries such as foreach.

*. Data management is better done in SQL, for reproducibility.

*. Raise the priority of long-running processes with “nice” or “renice” (raising a priority normally requires superuser privileges).

*. Use tools such as json2csv, csvkit, xml2json, xmlstarlet for data manipulation in UNIX.

*. For visualization, use R as the de facto standard.
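
A few of the cleaning rules above (stripping white space from each column, dropping empty lines, and reading through explicit file I/O with progress reporting) can be sketched in a few lines. This is only a minimal illustration in Python, not a recommended pipeline: the function name clean_rows, the report_every parameter and the sample data are all made up for the example, and empty rows are filtered directly rather than removed by sorting.

```python
import csv
import io
import sys


def clean_rows(lines, report_every=1000, log=sys.stderr):
    """Yield cleaned rows from an iterable of CSV lines (hypothetical helper).

    Rows are streamed one at a time, so the input never has to fit in memory.
    """
    reader = csv.reader(lines)
    kept = 0
    for seen, row in enumerate(reader, start=1):
        # Strip surrounding white space from every column.
        cells = [cell.strip() for cell in row]
        # Keep only rows that still contain something after stripping.
        if any(cells):
            kept += 1
            yield cells
        # Report progress every `report_every` input rows.
        if seen % report_every == 0:
            print(f"read {seen} rows, kept {kept}", file=log)


# Made-up sample data; a real run would pass an open file object instead.
sample = io.StringIO(" id , name \n\n 1 ,  Ada \n , \n2,Grace  \n")
cleaned = list(clean_rows(sample))
print(cleaned)  # [['id', 'name'], ['1', 'Ada'], ['2', 'Grace']]
```

For a file on disk, the same function can be fed with open(path, newline="") in place of the in-memory sample; because it is a generator, rows can be written out as they are cleaned.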

Programming Resources:
SHELL:

*. SED 

*. 10 Easy steps to understanding SQL
