A nice data mining tutorial provided by Luis Torgo can be found at:
1. Data mining with R : http://www.liaad.up.pt/~ltorgo/DataMiningWithR/
2. AI Access : http://www.aiaccess.net/
Machine learning vs Statistics : A nice comparison of the different terminologies used, along with their advantages and disadvantages.
George Mason University : Tutorial
Text mining resources:
1. Solr : an Apache-based search engine along similar lines to SRS.
Link 1: http://www.slideshare.net/teofili/apache-solr-crash-course
Link 2 : http://yonik.com/solr/getting-started/
Data Wrangling in R
Data Cleaning and Processing Rules:
*. Eyeball the data to check for any weird characters.
*. Remove white spaces from each column of the matrix.
*. Remove empty lines by sorting the data.
*. Remove the data that does not require processing (clean data)
*. Consistently use either JSON or XML format.
*. If the data does not fit in memory, use explicit file I/O so that the progress of the process can be monitored.
*. If using R, consistently use high-performance libraries such as foreach.
*. Data management is better done in SQL, for reproducibility reasons.
*. Set higher priorities for processes using “nice” or “renice” (raising a priority requires superuser privileges).
*. Use tools such as json2csv, csvkit, xml2json and xmlstarlet for data manipulation on UNIX.
*. For visualization, use R as the de facto standard.
*. 10 Easy steps to understanding SQL
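A few of the rules above (trimming white space in each column, dropping empty lines, and reading the file explicitly so that progress can be monitored) can be sketched in a few lines. This is only a minimal illustration, written in Python for brevity rather than R; the function name `clean_rows`, the `report_every` interval and the sample data are all made up for the example, and it drops empty rows by filtering them rather than by sorting.

```python
import csv
import io
import sys


def clean_rows(lines, report_every=100_000, log=sys.stderr):
    """Yield cleaned CSV rows one at a time, so the whole file never sits in memory.

    `lines` is any iterable of text lines, e.g. an open file handle.
    `report_every` and `log` are illustrative parameters, not a standard API.
    """
    reader = csv.reader(lines)
    for count, row in enumerate(reader, start=1):
        # Report progress at a fixed interval for long-running jobs.
        if count % report_every == 0:
            print(f"{count} rows read", file=log)
        # Trim leading and trailing white space in every column.
        cells = [cell.strip() for cell in row]
        # Skip rows that are empty once trimmed.
        if not any(cells):
            continue
        yield cells


# Made-up sample input with stray spaces and empty lines.
sample = io.StringIO("id, name \n\n 1 ,  Ann\n , \n2,Bob  \n")
cleaned = list(clean_rows(sample))
print(cleaned)
```

For a real file, the same generator could be fed an open file handle, e.g. `with open(path, newline="") as fh: ...`, and the cleaned rows written straight out with `csv.writer`, so only one row is held in memory at a time.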
Data Mining & Machine Learning