Sunday, 11 August 2013

How can I split a huge dataset into bins and average each bin, or aggregate the data by date, in R?


I'm trying to develop a program that makes big datasets easy to visualize in
graphs. The idea is that I can input a huge dataset and output a line graph
in which I can actually see the trends.
Here is my idea. (Please let me know if an algorithm like this is already
built into R or available in a package; I realize this is a very basic,
'primitive' way of aggregating data. I also don't want to use sample(),
because I am specifically looking for trends in the data. I realize there
will always be a trade-off between accuracy and ease of representation
here.)
Let's say I have a standard CSV dataset of 10,000 rows of numeric data
(columns representing variables). I want to create a resulting dataset that
splits this huge dataset into 20-30 bins, each bin being a single data point
equal to the average of a fixed number of data points in the big dataset.
For example, with 10 bins, each bin would be the average of 1,000 data
points.
Here is my code:
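library(ggplot2)  # provides the diamonds example dataset

# Average every numeric column of a data frame
average <- function(dataf)
{
  numericData <- dataf[, sapply(dataf, is.numeric)]
  # *** this is where the binning should happen (see below)
  colMeans(numericData, na.rm = TRUE)
}
real <- average(diamonds)  # one overall average so far, no bins yet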
*** I do not know what to do here. This is where I want to split
numericData into a certain number of bins and average the data in each
bin.
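A minimal sketch of the binning step in base R, assuming the rows are already in the order you want to plot (for example sorted by date) and that each bin should hold consecutive rows. bin_means is a hypothetical helper, not a built-in function; the grouping itself is done by base R's aggregate().

```r
# Split the rows of a data frame into `nbins` consecutive, (nearly) equal-size
# bins and return the mean of every numeric column within each bin.
# bin_means is an illustrative helper name, not part of base R.
bin_means <- function(dataf, nbins) {
  n <- nrow(dataf)
  # Bin number (1..nbins) for each row, in the original row order;
  # e.g. 10,000 rows and 10 bins puts rows 1-1000 in bin 1, and so on.
  bin <- ceiling(seq_len(n) * nbins / n)
  # Keep only the numeric columns
  num <- dataf[, vapply(dataf, is.numeric, logical(1)), drop = FALSE]
  # Extra arguments after FUN (here na.rm) are passed on to mean()
  aggregate(num, by = list(bin = bin), FUN = mean, na.rm = TRUE)
}

# Example: 10,000 rows of noisy data averaged into 20 bins of 500 rows
set.seed(42)
dat <- data.frame(x = 1:10000, y = cumsum(rnorm(10000)), label = "a")
binned <- bin_means(dat, 20)
head(binned)
```

To see the trend, the result can be drawn with plot(binned$x, binned$y, type = "l"). Computing the bin as ceiling(i * nbins / n) keeps bin sizes within one row of each other when the row count is not an exact multiple of the number of bins.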
On another important note, most of the datasets I input will have Date
variables (which is why I mentioned a line graph). Averaging with mean()
makes sense for the numeric columns, but how should I handle a time column?
What I have in mind: since the date column is in YYYY-MM-DD format, I could
aggregate the days and graph the data by month. In that case I would not
even have to worry about averaging the other columns! How can I do this?
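One way to aggregate by month in base R, sketched under the assumption that the date column is either a Date or a "YYYY-MM-DD" string and that a monthly mean is the summary you want for each numeric column. monthly_means and its date_col argument are hypothetical names; format() with "%Y-%m" and aggregate() are standard base R.

```r
# Collapse daily rows into one row per month, averaging every numeric column.
# monthly_means and date_col are illustrative names, not built-ins.
monthly_means <- function(dataf, date_col = "date") {
  # Accepts either a Date column or "YYYY-MM-DD" strings
  dates <- as.Date(dataf[[date_col]])
  month <- format(dates, "%Y-%m")  # e.g. "2013-08"
  # Date and character columns are not numeric, so they are dropped here
  num <- dataf[, vapply(dataf, is.numeric, logical(1)), drop = FALSE]
  aggregate(num, by = list(month = month), FUN = mean, na.rm = TRUE)
}

# Example: two years of daily values reduced to 24 monthly points
days <- seq(as.Date("2012-01-01"), as.Date("2013-12-31"), by = "day")
set.seed(1)
daily <- data.frame(date = days, value = cumsum(rnorm(length(days))))
monthly <- monthly_means(daily)
head(monthly)
```

For a line graph, each month can be placed on its first day, e.g. plot(as.Date(paste0(monthly$month, "-01")), monthly$value, type = "l"). Grouping on the "%Y-%m" string keeps the same month of different years apart.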
Thanks for any input, and sorry for the long post; I felt I needed to
provide all the necessary information.
