Data Mining & Machine Learning

A nice data mining tutorial provided by Luis Torgo can be found at:

1. Data Mining with R:

2. AI Access:

A comprehensive LaTeX symbol list can be found here.

Machine Learning vs. Statistics: a nice comparison of the terminology each field uses, along with their advantages and disadvantages. George Mason University: Tutorial

Text mining resources:

1. Solr: an Apache-based search engine along similar lines to SRS.
Link 1:

Link 2:
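For the Solr entry above, here is a minimal sketch of how a client builds a query against Solr's standard /select request handler over HTTP. It assumes a local Solr instance on the default port 8983 and a core named "articles" (a hypothetical name). Only the URL construction runs offline; the actual fetch is left commented out because it needs a live server.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally on its default port 8983.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core: str, query: str, rows: int = 10) -> str:
    """Build a search URL for Solr's /select handler on the given core."""
    # q = the query string, rows = max hits returned, wt = response format
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # "articles" is a hypothetical core name used only for illustration.
    url = build_select_url("articles", "text:mining", rows=5)
    print(url)
    # Fetching requires a running Solr instance:
    # import json, urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     docs = json.load(resp)["response"]["docs"]
```

The query parameters are URL-encoded, so field-qualified queries such as `text:mining` and multi-word queries are passed through safely.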

Data Wrangling in R

Data Cleaning and Processing Rules:

*. Eyeball the data to check for any weird characters.

*. Trim stray whitespace from each column of the matrix.

*. Remove empty lines; sorting the data groups them together so they are easy to remove.

*. Remove data that does not require processing (i.e., data that is already clean).

*. Use one format consistently, either JSON or XML.

*. If the data does not fit in memory, process it in chunks with explicit file I/O so you can monitor the progress of the job.

*. If using R, use high-performance libraries such as foreach whenever possible.

*. Data management is better done in SQL, for reproducibility.

*. Raise the priority of long-running processes with “nice” (at launch) or “renice” (while running); raising priority (a negative niceness value) requires superuser privileges.

*. Use command-line tools such as json2csv, csvkit, xml2json, and xmlstarlet for data manipulation on UNIX.

*. For visualization, use R as the de facto standard.
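Several of the rules above (trimming whitespace in each column, dropping empty lines, and streaming data that does not fit in memory while reporting progress) can be combined into a single pass. Below is a minimal sketch using Python's standard csv module, assuming comma-separated input. It filters blank rows during the pass instead of sorting them together as the list suggests, which keeps the original row order and works on streamed data. The function name, the report interval, and the command-line usage are illustrative choices, not taken from any existing tool.

```python
import csv
import io
import sys
from typing import TextIO


def clean_csv_stream(src: TextIO, dst: TextIO,
                     report_every: int = 100_000,
                     log: TextIO = sys.stderr) -> int:
    """Stream CSV rows from src to dst without loading the file into memory.

    Trims whitespace in every field, skips rows that are empty after
    trimming, and logs progress every `report_every` input rows.
    Returns the number of rows written.
    """
    writer = csv.writer(dst)
    written = 0
    for row_no, row in enumerate(csv.reader(src), start=1):
        trimmed = [field.strip() for field in row]
        if any(trimmed):  # blank or whitespace-only rows are dropped
            writer.writerow(trimmed)
            written += 1
        if row_no % report_every == 0:
            print(f"processed {row_no} rows, kept {written}", file=log)
    return written


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Hypothetical usage: python clean.py input.csv output.csv
        with open(sys.argv[1], newline="") as src, \
             open(sys.argv[2], "w", newline="") as dst:
            kept = clean_csv_stream(src, dst)
        print(f"done, kept {kept} rows", file=sys.stderr)
    else:
        # No file arguments: run on a small in-memory sample instead.
        out = io.StringIO()
        clean_csv_stream(io.StringIO("a , b\n\n c,d \n"), out)
        print(out.getvalue(), end="")
```

Files are opened with `newline=""`, as the csv module expects, so quoted fields containing line breaks are handled correctly.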

Programming Resources:

*. SED 

*. 10 Easy Steps to Understanding SQL
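The earlier rule that data management is better done in SQL for reproducibility can be illustrated with Python's built-in sqlite3 module. In this minimal sketch, every cleaning and summarizing step is a SQL statement kept in the script, so rerunning it on the same raw data always reproduces the same result. The table names, column names, and cleaning criteria are hypothetical, and an in-memory database stands in for a real one.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical schema for raw input rows.
SCHEMA_SQL = "CREATE TABLE raw_measurements (sample TEXT, value REAL);"

# Cleaning step: trim sample names, drop blank names and missing values.
CLEAN_SQL = """
CREATE TABLE clean_measurements AS
SELECT TRIM(sample) AS sample, value
FROM raw_measurements
WHERE TRIM(sample) <> '' AND value IS NOT NULL;
"""

# Summary step: row count and mean value per sample.
SUMMARY_SQL = """
SELECT sample, COUNT(*) AS n, AVG(value) AS mean_value
FROM clean_measurements
GROUP BY sample
ORDER BY sample;
"""


def run_pipeline(raw_rows: Iterable[Tuple[str, Optional[float]]]) -> List[tuple]:
    """Load raw rows, apply the SQL cleaning step, and return the summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA_SQL)
        conn.executemany("INSERT INTO raw_measurements VALUES (?, ?)", raw_rows)
        conn.executescript(CLEAN_SQL)
        return conn.execute(SUMMARY_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    raw = [(" a", 1.0), ("a ", 3.0), ("  ", 5.0), ("b", None), ("b", 2.0)]
    for row in run_pipeline(raw):
        print(row)
```

Because the transformations live in SQL rather than in manual edits, the same script can be rerun or reviewed later, which is the reproducibility benefit the rule refers to.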
