Learning Data

Hello World with NodeCount

This is my first step towards learning data. I will be learning by running experiments on data. The intention is to learn:

  • Hadoop with Java
  • Hadoop with Ruby
  • Ruby
  • Big Data Analysis
  • Big Data visualization

The first step was to get a dataset. I got my first dataset from SNAP and ran my first node count using the Hadoop streaming API.

You may find the source code for the mapper and reducer in my GitHub project.
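For readers who do not want to click through, here is a minimal sketch of what a streaming mapper and reducer for this kind of job might look like. It is not the code from the GitHub project. It assumes the SNAP file is an edge list with `#` comment lines and whitespace-separated "from to" pairs, and it assumes "node count" means the number of outgoing edges per source node. The method names `map_edges` and `reduce_counts` are hypothetical.

```ruby
# Mapper step (hypothetical name): emit "node<TAB>1" for the source node of
# every edge. Assumes SNAP edge-list lines of the form "from to", with
# comment lines starting with "#".
def map_edges(input, out = $stdout)
  input.each do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    from, _to = line.split
    out << "#{from}\t1\n"
  end
end

# Reducer step (hypothetical name): Hadoop streaming hands the reducer its
# input sorted by key, so all lines for one node arrive together. Sum them
# and emit "count<TAB>node", the (count, node) layout used in this post.
def reduce_counts(input, out = $stdout)
  current = nil
  total = 0
  input.each do |line|
    node, count = line.chomp.split("\t")
    next if node.nil?
    if node != current
      out << "#{total}\t#{current}\n" unless current.nil?
      current = node
      total = 0
    end
    total += count.to_i
  end
  out << "#{total}\t#{current}\n" unless current.nil?
end
```

In this sketch the mapper file would end with `map_edges($stdin)` and the reducer file with `reduce_counts($stdin)`. Both methods only need an object that responds to `each` for input and `<<` for output, which makes them easy to try on a few lines before submitting a job.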

The command that I used to run the mapper and reducer was:

$HADOOP_STREAMING -mapper 'ruby <mapper_file>' -reducer 'ruby <reducer_file>' -file <mapper_file> -file <reducer_file> -input '<path_where_snap_data_is_extracted>' -output <output_dir_must_not_exist>

Once the process is done, it will create a file at <output_dir>/part-00000 if you have just one reducer (in my case it was one). Otherwise you will get one part-xxxxx file per reducer.

This produced 739455 data points of the form (count, node).

I struggled a lot to visualize this many points and posted questions on Stack Overflow and Quora.

I finally used R and ggplot2 to plot them, but I still can't see all of the points, so I will keep working to get most of them onto the graph, if not all.

I formatted the output file so that I could create a chart from it. As the first line of my part-00000 file, I added the following header:

count <put a tab> node
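Adding the header by hand works for one file. With several reducers, the same step could be scripted. The sketch below is only an illustration: it assumes the job output is available on the local filesystem, and the method name `write_with_header` and the target file name are hypothetical.

```ruby
# Hypothetical helper: combine every part-* file in a streaming output
# directory into one tab-separated file that starts with a
# "count<TAB>node" header line. Returns the number of part files merged.
def write_with_header(output_dir, target_path)
  parts = Dir.glob(File.join(output_dir, "part-*")).sort
  File.open(target_path, "w") do |target|
    target.puts "count\tnode"
    parts.each do |part|
      # Copy line by line so a large part file is never held in memory.
      File.foreach(part) { |line| target.puts line.chomp }
    end
  end
  parts.length
end
```

A call such as `write_with_header("<output_dir>", "nodecount.tsv")` would leave the original part files untouched and produce a single file that `read.table` can load with `header=TRUE`.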

Then in R:

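install.packages("ggplot2")  # only needed once
library(ggplot2)

# read the tab-separated file; its first line is the count/node header
p <- read.table("<path_to_part-00000_file>", header=TRUE, sep="\t")
d <- ggplot(p, aes(count, node))

png("nodeCount.png")
# a fixed color goes outside aes(); inside aes() the string 'red' is treated as data
print(d + geom_point(color="red"))
dev.off()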

and I got a nice little chart.

Analysis:

  • most nodes are connected to more or less the same number of nodes
  • there is a rare node with the highest connectivity, connected to something like 435 other nodes
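Both observations can also be checked without plotting every point. The sketch below tallies how many nodes share each count, which shrinks hundreds of thousands of (count, node) rows to one row per distinct count. It assumes the tab-separated "count, node" layout described earlier, and the method name `count_distribution` is hypothetical.

```ruby
# Hypothetical helper: given "count<TAB>node" lines, return a hash that maps
# each count to the number of nodes having that count, sorted by count.
def count_distribution(lines)
  dist = Hash.new(0)
  lines.each do |line|
    count, node = line.chomp.split("\t")
    # Skip blank lines and the "count<TAB>node" header row.
    next if node.nil? || count !~ /\A\d+\z/
    dist[count.to_i] += 1
  end
  dist.sort.to_h
end
```

For example, `count_distribution(File.foreach("<path_to_part-00000_file>"))` returns the distribution, and calling `.keys.max` on the result gives the largest count in the file. The resulting table is small enough to plot as a bar chart or to read directly.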

It would be interesting to learn what else can be analyzed with such data, so if you know, let me know and I will try it.

    • #ruby
    • #hadoop
  • 1 year ago