-
0. What definition of "unique" will I use to measure song "uniqueness"?
2/10/2015 7:16 PM
There are several ways to define a measure of "uniqueness" for music: lyrical uniqueness, melodic uniqueness, and idea or concept uniqueness (e.g., is a song about love or triumph?). Perhaps a few more... The complete measure of a song's uniqueness is probably a combination of all of those things, but it's too expensive to develop instrumentation for measuring all of them. Instead, I will use "unique words per song" as my uniqueness measure.
This approach certainly has its flaws, but it's probably not far from the truth. After all, a chorus lyric tends to repeat in tandem with the chorus melody and the chorus concept/idea (the lyric supplies the idea). So while crude, I'll assume that "unique words per song" is a fair enough proxy for the thing I seek to measure here.
-
1. Can multiple dygraphs [r] be stitched together in one browser window to make a sweet infographic?
2/10/2015 7:30 PM
Theoretically yes. If dygraphs is creating JavaScript to render its graphs (which it is), then I would think those same elements and canvases could be reformatted, in a copy/paste-and-tweak fashion, to fit multiple dygraph canvases on one webpage.
-
2. What doesn't a master want (in an apprentice)?
2/10/2015 7:42 PM
Someone who...
- ... is going to quit
- ... doesn't have the ability to learn independently
- ... doesn't have a desire to know about the subject
- ... doesn't want to teach people (part of learning)
- ... is vague
- ... only tells them things they already know.
- ... will accomplish nothing with entrusted information and make little impact.
- ... is a douchebag - nobody wants to work with a douchebag!
My in...
If I can show a master that I am the opposite of the above, I will almost certainly be allowed apprenticeship (accepted into grad school).
UPDATE 1: Maybe
-
3. Where can I get lots of reliable music lyrics?
2/11/2015 6:49 PM
Google provides some lyrics in search as of the week of Dec 22, 2014.
LyricFind licenses lyrics to third parties like Pandora, SoundHound, Shazam and others.
AZ Lyrics is like the Wikipedia for lyrics - crowdsourced (and open to vandalism?).
The MillionSongDatabase points to musiXmatch for lyrics.
musiXmatch seems to source its lyric data from a combination of community and direct publisher agreements but does not provide full lyric content for free or in the dataset produced for MillionSongDatabase.
Conclusion: No single (known) source offers free, full, direct-from-source (artist/publisher) lyrics. Perhaps the only way to obtain full lyric data is to scrape two or three lyric aggregation sites and have them cross-referenced/edited for accuracy and, most importantly, disallowed from public or private distribution.
UPDATE 1: Or perhaps LyricWikia is the Wikipedia of lyrics.
-
4. What is this project's scope?
2/11/2015 6:54 PM
Desired: To analyze lyric and duration data for every predominantly English (as in not Gangnam Style) song on the Billboard Hot 100 since its inception.
Stretch Goal Scope: To be determined.
-
5. When was the Billboard Hot 100 started?
2/11/2015 6:56 PM
August 4, 1958.
As of the issue for the week ending February 21, 2015, the Hot 100 has had 1,041 different number-one hits.
-
6. How does Billboard track who's listening to radio when?
2/11/2015 7:05 PM
Word of mouth from DJs, radio hosts and the like, coupled with hand-filled surveys and sales data from music stores and music networks, formed 'popularity' indicators for a majority of Billboard's existence. 'Popularity' sampling from consumer surveys by Arbitron (a consumer research company that became Nielsen Audio) evolved into wearable technologies such as the Portable People Meter that could be given to consumers to wear for one to two years. People are paid to wear this little 'audio spy' for months while it identifies what song, if any, they're listening to.
These audio spies are still used today along with audience metrics from digital sources such as Internet radio to enhance 'popularity' estimates.
In summary, although Billboard's collection methodologies have changed many times since its inception in 1958 and its 'popularity' indexes have no doubt been biased to particular metrics at any given time (and indeed have failed to represent widely popular songs from certain artists), its ability to fulfill the basic function of gauging a song's relative popularity is adequate for this project - it's a zeitgeist.
-
7. Where can I find the name of every song ever on the Billboard Hot 100?
2/11/2015 7:35 PM
Wikipedia: 1041/57 = 18 songs a year or, spun another way, is like singing the same song for about 2.8 weeks... On second thought - that sounds about right.
-
8. Can I use contact info of the last person to edit the Billboard Hot 100 wiki to find someone who can provide the songs list?
2/11/2015 9:43 PM
Hello Ericorbit,
I happened upon the Billboard Hot 100 page you contribute to regularly and was wondering what the source is for your 'different number-one hits' figure. In your revision between '2015-01-01T16:54:54' and '2015-01-07T20:08:07' this figure changed from 1,040 to 1,041. Beyond manually (programmatically) counting entries in https://en.wikipedia.org/wiki/List_of_Billboard_number-one_singles how do you happen across this figure? Minor Googling doesn't reveal a straight answer. Help?
Thank you for your time ~~~~
PS: I'm asking because I'm putting together a song lyric database that I'll analyze with [R] to learn neat things about American music listenership. People who help me get early access to results in two months (yes - that's a music nerd bribe).
-
9. How do I scrape wiki lists of historic billboard song ratings/titles?
2/11/2015 10:32 PM
Year-end ranks reported on Wikipedia are listed differently in 1958 from those between 1959 and 2014... 2015 will not be available till 2016.
1958... although lists are available as early as 1956
Remaining Wiki URLs have the syntax http://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_YYYY, where the article year YYYY ranges from 1959 to 2014
install.packages("XML")
The following [r] code gets most of the way...
library(XML)
# 1958 lives under a differently named article than later years
theurl <- "http://en.wikipedia.org/wiki/Billboard_year-end_top_100_singles_of_1958"
tables <- readHTMLTable(theurl)
# keep columns 2 and 3 of the first table on the page
write.csv(tables[[1]][2:3], file = "Billboard Hot100 1958.csv", quote = FALSE)
baseurl <- "http://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_"
for (yr in 1959:2014) {
  # Wikipedia markups such as "This article has multiple issues" will break this loop
  stringyr <- as.character(yr)
  theurl <- paste0(baseurl, stringyr)
  tables <- readHTMLTable(theurl)
  fileName <- paste0("Billboard Hot100 ", stringyr, ".csv")
  write.csv(tables[[1]][2:3], file = fileName, quote = FALSE)
}
-
10. How long would it take to listen to all the Billboard songs?
2/11/2015 10:46 PM
Figure a low average estimate of 3:00 and a high average estimate of 4:30 per song.
Between two and threeish days.
Low: 3.00 * 1041 = 3123 minutes -> 52 hours
High: 4.50 * 1041 = 4685 minutes -> 78 hours
-
11. Should song relationships be plotted as a function of 'Ranking date' or 'Song release date'?
2/11/2015 11:04 PM
Popularity, after all, is temporal and time does matter. Since a goal of this study is to assess musical appetite (do Americans prefer broccoli or pizza... simple 'repetitive choruses' or complicated, dense verse?) and the capacity for a song to become popular, ranking date will be sought first, with release date as the fallback.
-
12. How long does it take for a song to become ranked (popular)?
2/11/2015 11:14 PM
Compare release date to rank date when data becomes available.
-
13. Is it worth it to subscribe to Billboard magazine to (more) easily get this data?
2/11/2015 11:47 PM
Almost certainly. But I'd have to feed the machine that is Billboard. Then again, publishing this research might feed the machine.
ANSWER: Unnecessary to subscribe to Billboard magazine.
-
14. How do you search reminders in Evernote?
2/11/2015 11:57 PM
remindertime:*
Also use the reminder tab.
-
15. Why doesn't the "Scraping html tables into R data frames using the XML package" StackOverflow answer work?
2/12/2015 12:29 AM
This answer uses http in the example. Wikipedia will load in https depending on your browser, Wikipedia itself or who knows what. Removing the 's' from 'https' in the following code made it work:
theurl <- "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_1959"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
Reminder: Add this note to StackOverflow when I earn 50 reputation.
-
16. How do I extract a desired list among several in an [r] table?
2/12/2015 12:43 AM
Try tables[[1]]
As it turns out, the answer was also in the StackOverflow answer that recommended readHTMLTable().
-
17. How do you get table dimensions in [r]?
2/12/2015 12:47 AM
dim(x)
Examples:
x <- 1:12 ; dim(x) <- c(3,4)
x
-
18. What is an [R] table list type?
2/12/2015 12:52 AM
The type of a variable can be found using the typeof() function.
A data frame is a way to take many vectors of different types and store them in the same variable. The vectors can be of all different types. For example, a data frame may contain many lists, and each list might be a list of factors, strings, or numbers.
-
19. How do you read output from readHTMLTable [r]?
2/12/2015 12:58 AM
Tables downloaded in r by readHTMLTable() can be accessed by using double brackets, i.e.
tables = readHTMLTable(u)
tables[[1]]
As it turns out, the answer was also in the StackOverflow answer that recommended readHTMLTable().
-
20. How do I save a table to file [r]?
2/12/2015 1:05 AM
Use write.csv():
write.csv(x, file = "foo.csv")
and the inverse, reading, operation:
read.csv("foo.csv", row.names = 1)
Remove double quotes with quote=FALSE as in:
write.csv(tables[[1]][2:3], file = "foo.csv", quote=FALSE)
-
21. Should double quotes introduced from wikipedia download be removed?
2/12/2015 9:29 PM
Perhaps not - since double quotes are used to make exact Google searches. Won't add very much read time since the population is small.... After additional deliberation - YES! Remove the crap out of them.
UPDATE 1: Don't remove double quotes! Any string that contains a comma unenclosed by double quotes will erroneously add columns when written as a csv. DUHH
-
22. How should I handle/represent purely instrumental hits?
2/12/2015 9:32 PM
Options:
1) According to the assumption "if it's Billboard ranked... it's popular enough", instrumental song counts should be reported but perhaps linked to a separate chart.
2) ...
-
23. How do I make for loops in [r]?
2/12/2015 9:38 PM
foo = seq(1, 100, by=2)
foo.squared = NULL
for (i in 1:50) {
  foo.squared[i] = foo[i]^2
}
-
24. How are substrings manipulated in [r]?
2/12/2015 9:44 PM
Concatenate strings using paste() to introduce spaces between concatenations and paste0() to concatenate without space.
-
25. How are numbers converted to strings in [r]?
2/12/2015 9:50 PM
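A minimal sketch for the two notes above, assuming the same pattern the scraping loop in note 9 uses: as.character() turns the number into a string, paste() joins with spaces and paste0() joins without. The variable names are only illustrative.

```r
yr <- 1959

# numeric -> character
stringyr <- as.character(yr)

# paste() inserts a single space between the pieces
spaced <- paste("Billboard", "Hot100", stringyr)

# paste0() inserts nothing between the pieces
file_name <- paste0("Billboard Hot100 ", stringyr, ".csv")

print(spaced)
print(file_name)
```

paste0() would also coerce the number on its own, so the explicit as.character() call is mostly there for readability.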
-
26. What is the [r] equivalent to MATLAB 'try'?
2/12/2015 10:55 PM
result = tryCatch({expr}, warning = function(w) {warning-handler-code}, error = function(e) {error-handler-code}, finally = {cleanup-code})
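A small runnable sketch of the tryCatch() template above. The function name safe_log is made up for illustration; it returns NA whenever log() warns or fails.

```r
# Hypothetical helper: returns NA instead of warning or stopping
safe_log <- function(x) {
  tryCatch({
    log(x)
  }, warning = function(w) {
    NA_real_            # e.g. log(-1) warns that NaNs were produced
  }, error = function(e) {
    NA_real_            # e.g. log("a") is an error for non-numeric input
  }, finally = {
    invisible(NULL)     # cleanup code would go here; runs in every case
  })
}

print(safe_log(1))
print(safe_log(-1))
print(safe_log("a"))
```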
-
27. Is there some simple utility... wrapper I can use for [r] syntax highlighting on my website?
2/12/2015 11:04 PM
Something like Crayon Syntax Highlighter but for js.
-
28. How do you count unique words in a string [r]?
2/12/2015 11:14 PM
Try the command unique().
Also get into the habit of first searching for command help in [r] by typing ?'the-command' or ??'the-command' ... just like in MATLAB.
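One way unique() could be applied to a lyric string, as a rough sketch: lower-case the text, strip punctuation, split on spaces and count the distinct words. The function name and the cleaning rules are my assumptions, not a settled method for the project.

```r
# Hypothetical helper: counts distinct words in one lyric string
count_unique_words <- function(lyric) {
  words <- tolower(lyric)
  words <- gsub("[^a-z' ]", " ", words)   # anything but letters, apostrophes, spaces -> space
  words <- strsplit(words, " +")[[1]]     # split on runs of spaces
  words <- words[words != ""]             # drop an empty leading piece, if any
  length(unique(words))
}

lyric <- "La la la, I love you. La la la, you love me!"
n_unique <- count_unique_words(lyric)
print(n_unique)
```

Here the distinct words are la, i, love, you and me, so repeated chorus words only count once.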
-
29. What types of questions could this project's unstructured data answer?
2/13/2015 4:49 PM
Maintaining that the extent of 'unstructured' data I will harvest is lyrics, perhaps it would be cool to distill emotion/theme/genre categorizations from songs... That is... to answer: does American popular opinion prefer...
- Love songs over breakup songs?
- Triumph songs over anger songs?
- Selfish over selfless songs?
-
30. How to scrape Google of lyrics?
2/13/2015 10:47 PM
According to TechCrunch, song lyrics are now being provided at the top of Google search results pages.
UPDATE 1: These can be scraped in part using techniques described in "What is a simple RCurl example I can use to begin learning how to use RCurl?"
-
31. What portion of historic Billboard Hot 100 songs does the Million Song Dataset contain?
2/13/2015 10:59 PM
It appears neither the Million Song Dataset nor the musiXmatch data set it references offers full lyric data, since doing so is copyright blocked... so this question doesn't matter.
-
32. How do you make 3D graphs in [r]?
2/14/2015 3:51 PM
-
33. Did I overlook wikipedia's total Billboard Hot 100 song count reference?
2/14/2015 4:43 PM
In reference to 'When was the Billboard Hot 100 started?', as far as I can tell, the total song count to date of 1041 IS unreferenced.
-
34. What is Plot.ly?
2/14/2015 4:54 PM
It's awesome - an online plotting tool with libraries for plotting in popular technical tools such as [r], MATLAB and Python to enhance, share and back up plots.
-
35. Where did this guy get his data from and how is my work different from his?
2/14/2015 5:04 PM
While looking into the 3D capabilities of Plot.ly I stumbled upon this graph of Average Song Length since 1945ish. I have a problem with this chart: the data source is a complete mystery. I have no clear way of deducing what songs were considered.
Apparently he, Rhett Allain, writes for Wired Science.
I like his error bars.
Forget about pursuing his source... how will my work be different?
- I'll be more transparent
- I'll attempt to bin averages across more relevant time intervals (annual seems as abstract an interval as 213 days - is one better than the other?)
- 3D Plot
UPDATE 1: Rhett Allain produced his graph in relation to the WIRED article he wrote - "Why Are Songs on the Radio About the Same Length?"
-
36. How do I get Evernote to default the cursor of a new note to the Title field?
2/14/2015 5:23 PM
Apparently this option is not configurable. However, one can press F2 on a note to switch focus to the Title field. Alternatively, the following AutoHotkey code binds the creation of a new note to the keyboard sending of the 'F2' key all in one go (tested/works).
#IfWinActive, ahk_class ENMainFrame
^n::
SendInput ^n
Sleep 100
SendInput {F2}
Return
-
37. What kind of data is available in the Million Song Dataset?
2/14/2015 7:50 PM
A lot of tune and harmony type data as well as metadata like artist name, song duration and year.
-
38. What is the Echo Nest?
2/14/2015 8:25 PM
A "music intelligence platform [that] synthesizes billions of data points and transforms it into musical understanding..."
Engadget has noted they are "The song-picking puppet master pulling the playlist strings behind iHeartRadio, Spotify and Nokia's music services."
-
39. Uhhh... What?
2/14/2015 8:32 PM
-
40. What are stemmed/unstemmed words?
2/14/2015 9:23 PM
The goal of stemming is that many related words are mapped onto the same one. For instance, 'victory' and 'victories' both map to the stem 'victori' and ought to be counted as the same word.
UPDATE 1: The following SO post might help mapping endeavors: "R: replace characters using gsub, how to create a function?"
Perhaps chartr() is of use - "Translate characters in character vectors, in particular from upper to lower case or vice versa."
-
41. What are musiXmatch's Terms of Use?
2/14/2015 9:54 PM
EULA found here
Important points:
6 You will not copy any part of Musixmatch or any Third Party Applications and/or Sites or make commercial use of, rent, lease, loan, sell, publish, license, sublicense, distribute, assign or otherwise transfer any part of Musixmatch to any person.
6.6 Any content provided by us as part of Musixmatch, including but not limited to ringtones, lyrics, artist information and downloads contains copyrighted material, trademarks and other proprietary rights belonging to us and our licensors. All right, title and interest in and to such content vests in us and our licensors. You are granted a limited, revocable, non-exclusive licence to display that content as part of the Services solely for your personal use. Except as expressly authorized by us, you may not copy, modify, translate, reproduce, distribute, publish, broadcast, perform, display, sell, assign, lease or sub-license that content, in whole or in part.
-
42. What are AZLyrics Terms of Use?
2/14/2015 10:02 PM
Important points: Nothing notable; however, AZ Lyrics is 'powered' by musixmatch.
In reference to "What doesn't a master want (in an apprentice)?": people (in general) also tend not to like to work with criminals... respect the Digital Millennium Copyright Act.
-
43. Does the scraped Wikipedia Billboard Hot 100 data have any duplicates?
2/14/2015 10:17 PM
-
44. How do you list a size summary of table of lists in [r]?
2/14/2015 10:23 PM
Use summary()
Sometimes [r] is just plain easy. Then again... everything is easy once you know how to do it.
-
45. Does Ryan Tedder mostly only write lyrics?
2/14/2015 10:48 PM
Ryan Tedder has written a ton of hit songs.
-
46. Who has written (or co-written) the highest number of Billboard Hot 100 songs?
2/14/2015 11:15 PM
Let the numbers speak. To be answered later.
-
47. What exactly does getURL in [r] get?
2/14/2015 11:23 PM
The getURL and getURLContent functions from the RCurl package are used to retrieve the source of a webpage.
-
48. How to get length of lists in list [r]?
2/15/2015 12:24 AM
This question was asked in hopes of answering another question - how do I only download the billboard tables from Wikipedia using
tables <- readHTMLTable(theurl)
As it turns out, the longest list (whose length is 3) in the list of lists 'tables' is the list of interest. Use
tables[sapply(tables, length) == 3]
to return the desired ranking information. Recommended from StackOverflow.
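To see why the length filter works, here is an offline sketch with a made-up stand-in for the readHTMLTable() result (the element names and contents are invented). The length of a data frame is its number of columns, so only the three-column ranking table survives.

```r
# Invented stand-in for the list that readHTMLTable() returns
tables <- list(
  navbox  = data.frame(a = 1:2),
  ranking = data.frame(Rank = 1:3,
                       Title = c("A", "B", "C"),
                       Artist = c("X", "Y", "Z")),
  footer  = data.frame(a = 1, b = 2)
)

# length() of a data frame is its column count
col_counts <- sapply(tables, length)
print(col_counts)

# keep only the tables with exactly three columns
wanted <- tables[col_counts == 3]
ranking <- wanted[[1]]
print(ranking)
```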
-
49. What is the most common Billboard Hot 100 phrase?
2/17/2015 1:13 PM
To be determined
-
50. What's more powerful: Lyrics or Tune?
2/17/2015 1:15 PM
To be determined
-
51. What would a sentiment analysis of the Billboard Hot 100 say about American emotion?
2/17/2015 1:13 PM
Keep this question for later
-
52. How do you save table without row names [r]?
2/21/2015 12:32 PM
Use write.table() with row.names = FALSE
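A quick check of what row.names = FALSE does to the file, sketched with a throwaway data frame written to a temporary file (both invented for the example):

```r
songs <- data.frame(Rank = 1:2, Title = c("Song A", "Song B"))

# write to a temporary file so nothing is left behind in the working directory
out_file <- tempfile(fileext = ".csv")
write.table(songs, out_file, sep = ",", row.names = FALSE)

# the first field on each line is now Rank rather than a row name
file_lines <- readLines(out_file)
print(file_lines)
```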
-
53. How do you append multiple CSV files together [r]?
2/21/2015 12:54 PM
There are a ton of ways of doing this. This forum post lists a couple.
Since my analysis will mix and match these rows (songs) at some point, I ought to append year information.
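One of the many ways, sketched under the assumption that the per-year files follow the "Billboard Hot100 YEAR.csv" naming used earlier. The three tiny files below are fabricated in a temporary folder just so the example runs; the Year column is added while reading.

```r
# Fabricate three small per-year files in a temporary folder
csv_dir <- tempfile("billboard")
dir.create(csv_dir)
years <- 1959:1961
for (yr in years) {
  df <- data.frame(Rank = 1:2, Title = paste("Song", 1:2, yr))
  write.csv(df, file.path(csv_dir, paste0("Billboard Hot100 ", yr, ".csv")),
            row.names = FALSE)
}

# Read one year's file and tag every row with its year
read_year <- function(yr) {
  path <- file.path(csv_dir, paste0("Billboard Hot100 ", yr, ".csv"))
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$Year <- yr
  df
}

# Stack all years into one data frame
master <- do.call(rbind, lapply(years, read_year))
print(master)
```

Reading everything into memory and calling rbind once is fine at this scale (a few thousand rows).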
-
54. How do you make a simple column vector with name in [r]?
2/21/2015 1:05 PM
A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors n, s, b.
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b) # df is a data frame
-
55. How to make a vector of all the same number [r]?
2/21/2015 1:24 PM
From "R: generate a repeating sequence based on vector":
rep(1995,20)
Output:
> [1] 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995
-
56. Do [r] matrices only support numbers?
2/21/2015 1:48 PM
No, they support characters or numbers (and perhaps other types), but only one type at a time. To mix various data types use data.frame.
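A short illustration of the one-type-at-a-time rule, using toy values: mixing numbers and strings in matrix() silently coerces everything to character, while data.frame keeps each column's own type.

```r
# all integers: stays integer
num_mat <- matrix(1:4, nrow = 2)

# numbers mixed with strings: everything becomes character
mixed_mat <- matrix(c(1, "a", 2, "b"), nrow = 2)

print(typeof(num_mat))
print(typeof(mixed_mat))

# a data frame keeps one type per column
df <- data.frame(n = c(1, 2), s = c("a", "b"), stringsAsFactors = FALSE)
col_classes <- sapply(df, class)
print(col_classes)
```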
-
57. How do I remove double quotation marks from data.frame [r]?
2/21/2015 1:52 PM
Various online message boards for this question entertain answers that have to do with how [r] makes output more human readable by including imaginary quotation marks.
This is the one answer I could get to actually remove quotation marks in my data set. Note that it is not marked as the answer by the asker.
library(stringr)
library(plyr)
del <- colwise(function(x) str_replace_all(x, '\"', ""))
x <- del(x)
Return to this question to add a comment when I get 50 SO rep.
-
58. What is a simple RCurl example I can use to begin learning how to use RCurl?
2/21/2015 8:42 PM
This example from R Function of the Day
URL <- "http://www.ebay.com/sch/ctg/Big-Bang-Theory-Complete-Fourth-Season-DVD-2011-3-Disc-Set-/103149230?LH_Auction=1&_dmpt=US_DVD_HD_DVD_Blu_ray&_pcategid=617&_pcatid=1&_refkw=big+bang+theory+season+4&_trkparms=65%253A12%257C66%253A4%257C39%253A1%257C72%253A5841&_trksid=p3286.c0.m14"
-
59. How do I know if I have a particular library installed in [r]?
2/21/2015 9:28 PM
installed.packages()
Check for a specific package:
a <- installed.packages()
packages <- a[,1]
is.element("boot", packages)
-
60. Where can I find RCurl documentation?
2/24/2015 9:33 PM
Can be used to make concurrent webpage requests.
Can process web pages in pieces [chunks] as they become available, keeping memory overhead low.
Example 5.1 doesn't work because the link http://www.omegahat.org/RCurl/exampleStock.dat no longer exists.
Appears to have been written sometime in 2006 or 2007.
-
61. What's the simplest website that reports my IP?
2/24/2015 9:56 PM
This one told me the wrong IP... - http://www.whatsmyip.org/
I cannot find my IP in the page source code of:
UPDATE!! IP address readily available in page source at http://ipchicken.com/
UPDATE 2 - This is the simplest: http://ifconfig.me/ip
-
62. How to parse xml data in [r]?
3/2/2015 7:11 PM
Wait a second - if I'm hoping to parse webpages then I should use an HTML parser... referring to htmlParse(), the cousin of the XML package's xmlParse().
Here's a little tutorial on parsing a webpage using htmlTreeParse():
URL <- "http://www.ipchicken.com/"
doc = htmlTreeParse(URL)
-
63. How to return subset of character from a string [r]?
3/2/2015 8:11 PM
substr(x, start, stop), where start and stop are the indices of the first and last characters to keep
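A few substr() calls on one of the song titles from this project, as a small sketch; indices are 1-based and both ends are inclusive.

```r
title <- "Theme from A Summer Place"

first_word <- substr(title, 1, 5)      # characters 1 through 5
middle_word <- substr(title, 14, 19)   # characters 14 through 19

# nchar() gives the position of the last character
last_word <- substr(title, nchar(title) - 4, nchar(title))

print(first_word)
print(middle_word)
print(last_word)
```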
-
64. How do you make an infinite loop in [r]?
3/4/2015 7:39 PM
while(TRUE){print(1)}
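When the loop should eventually stop on its own, the same while(TRUE) shape can carry an exit condition. A minimal sketch with an arbitrary counter:

```r
counter <- 0
while (TRUE) {
  counter <- counter + 1
  # leave the otherwise endless loop once the condition is met
  if (counter >= 3) break
}
print(counter)
```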
-
65. How do you terminate a process (or an infinite loop) in [r]?
3/4/2015 7:42 PM
Hit the keyboard ESCAPE button.
-
66. How do you make a delay in [r]?
3/4/2015 7:44 PM
StackOverflow post: tcltk can be used for fancy delaying of processes while permitting other things to happen... overkill for what I need.
Use Sys.sleep(number_of_seconds)
Note that functions in [r] are case sensitive... Sys.sleep() is different from sys.sleep().
-
67. How does one use RCurl to execute and process https in [r]?
3/4/2015 8:35 PM
When I execute
getURL("https://sourceforge.net")
I get the following error:
Error in function (type, msg, asError = TRUE) :
SSL certificate problem: unable to get local issuer certificate
As StackOverflow (SSL verification causes RCurl and httr to break - on a website that should be legit) suggests, add
.opts = list(ssl.verifypeer = FALSE)
to the getURL request like so:
getURL("https://sourceforge.net",.opts = list(ssl.verifypeer = FALSE) )
UPDATE 1: Although these did not help answer the question above, the following posts may help solve future RCurl obstacles
-
68. Is it just me or is Seven Little Girls (Sitting in the Back Seat) a strange song?
3/4/2015 10:12 PM
Why are seven "little" girls kissin' and uh' huggin Fred...in the backseat?
-
69. How are target pages accessed?
3/4/2015 10:19 PM
Primary Site - as it pertains to artist and song names:
Apostrophes are turned into a single dash: That's=That-s
Spaces are turned into a single dash: Hello There=Hello-There
Commas are removed
Parentheses are removed
No consecutive dashes
No ampersands
Periods are turned into a single dash (this breaks the URL if it produces a dash as last character): T.I.=T-I-
No plus signs
No exclamation marks
"featuring" is turned into "feat"
Dollar signs are turned into single dash: Ke$ha=Ke-ha
No forward slashes (/)
It should look like www.blaI'mNotSinglingOutSourcebla.com/lyrics/Artist-name-separated-by-dashes/Song-name-separated-by-dashes
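The rules above can be sketched as a pair of small functions. The names make_slug and lyric_path are made up, and I am assuming that "no ampersands / plus signs / exclamation marks / forward slashes" means those characters are simply dropped, which the notes do not state outright.

```r
# Hypothetical helper: turn an artist or song name into its URL form
make_slug <- function(x) {
  x <- gsub("featuring", "feat", x, fixed = TRUE)
  x <- gsub("[,()&+!/]", "", x)   # assumption: these characters are removed outright
  x <- gsub("['. $]", "-", x)     # apostrophes, periods, spaces, dollar signs -> dash
  x <- gsub("-+", "-", x)         # no consecutive dashes
  x
}

# Hypothetical helper: path portion after the site name
lyric_path <- function(artist, song) {
  paste("lyrics", make_slug(artist), make_slug(song), sep = "/")
}

print(make_slug("T.I."))
print(lyric_path("Paul Revere & the Raiders", "Let Me"))
```

Note that "T.I." still ends in a dash here, which is the URL-breaking case flagged in the rules; trimming a trailing dash would need one more gsub() if that turns out to be the right fix.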
-
70. How do I generate most effective scrape queries (a.k.a. land on desired page often)?
3/4/2015 10:58 PM
1. Reverse engineer accession formatting and force queries to ask the Internet appropriately and hopefully land on (a) target... Landing uncertain.
2. Programmatically feed my unformatted queries to an Internet search engine and use the best match's link as target. Landing is certain though possibly incorrect.
Missing the target at some point is certain. Remediation is as follows:
For option 1:
Throw flag for every artist/song combo that misses landing.
Condense flagged songs into error list that can be triaged either programmatically or manually.
For option 2:
There is a greater chance of scraping the wrong song with this option. Easy remediation dubious.
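The option 1 remediation (flag every miss, collect an error list) might look roughly like this. Everything here is a stand-in: scrape_all and fake_fetch are invented names, the fetcher is a dummy that fails on purpose instead of touching the network, and the pause between requests is an assumed parameter.

```r
# Hypothetical driver: try every URL, keep hits, flag misses for triage
scrape_all <- function(urls, fetch, delay_seconds = 2) {
  lyrics <- list()
  errors <- character(0)
  for (u in urls) {
    # a failed fetch becomes NULL instead of stopping the whole run
    result <- tryCatch(fetch(u), error = function(e) NULL)
    if (is.null(result)) {
      errors <- c(errors, u)     # flag the miss
    } else {
      lyrics[[u]] <- result
    }
    Sys.sleep(delay_seconds)     # assumed politeness delay between requests
  }
  list(lyrics = lyrics, errors = errors)
}

# Dummy fetcher standing in for a real page download
fake_fetch <- function(u) {
  if (grepl("missing", u)) stop("page not found")
  paste("lyrics for", u)
}

urls <- c("lyrics/A/one", "lyrics/B/missing", "lyrics/C/three")
out <- scrape_all(urls, fake_fetch, delay_seconds = 0)
print(out$errors)
```

Swapping fake_fetch for a real download function would leave the flagging logic unchanged.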
-
71. What needs to be done to scrape lyric data?
3/5/2015 8:01 PM
1. Condense Wikipedia's Billboard Hot100 between 1958 and 2014 into one list
1.5 Check for duplicates in the above list (notify article keepers if any)
2. Make copy of said list
3. Create URLs according to access format from this list
4. Feed URLs to crawler (at 0.5 Hz), which will determine which HTML node contains lyric data.
If URL is inaccessible
  Throw ERROR flag and push failed song/artist information onto ERROR list
Else
  Save lyric data to file according to the following format: YEAR-RANK.txt
5. Inspect ERROR list and attempt to obtain lyric data by manual or semi-automated means
Note: At this point programmatically leveraging existing online search engines to find remaining lyrics may be required.
If lyrics unavailable from primary source
  Explore alternate sources
GOAL: Get more than 90% of the historic Billboard Hot100 song lyrics. I refuse to believe that lyrics for a popular song less than 60 years old cannot be found online.
-
72. How do I append a data.frame to a data.frame in [r]?
3/5/2015 9:52 PM
both.matrices <- rbind(matrix1, matrix2) ... courtesy of this poorly worded StackOverflow post
-
73. How do I pre-allocate memory for data.frames in [r]?
3/5/2015 10:08 PM
The excellent SO post, "Growing a data.frame in a memory-efficient manner", suggests using data.table and set()
install.packages('data.table')
library(data.table) # quickest way to learn the features is to type example(data.table) and study the output.
Update1: I don't have time to get fancy with runtime minimization. Append to file will get the job done.
initFile<-read.csv("Billboard Hot100 1958.csv", header = TRUE, sep = ",")
write.table(initFile,
"Master Billboard.csv",
sep = ",",
dec = ".",
qmethod = "double",
row.names=FALSE)
for (year in 1959:2014) {
  fileName <- paste0("Billboard Hot100 ", year, ".csv")
  nextFile <- read.csv(fileName, header = TRUE, sep = ",")
  # append each year below the 1958 rows, without repeating the header
  write.table(nextFile,
              "Master Billboard.csv",
              sep = ",",
              dec = ".",
              qmethod = "double",
              row.names = FALSE,
              append = TRUE,
              col.names = FALSE)
}
-
74. Why does my Master Billboard file have 5701 instead of 5700 songs in it?
3/5/2015 11:37 PM
-
75. What songs data does Billboard.com show for older ranks?
3/5/2015 11:45 PM
For example, go to http://www.billboard.com/archive/charts/1969/hot-100. The only songs shown occupied the No. 1 spot. However, a list of the weekly Hot 100 is available for each week listed on this page. Unfortunately this does not give me the synthesized annual Hot 100.
-
76. What is the true 100th ranked song of the 1969 Billboard Hot 100?
3/5/2015 11:59 PM
Tracking down the true last rank for 1969 is not worth my time. One song represents 1/5700*100 = 0.02% of my data set. Therefore I will assume the first listed 100 rank, a.k.a. "Sweet Cream Ladies" by The Box Tops, is the true least popular song of the 1969 Billboard Hot 100.
"Let Me" by Paul Revere & the Raiders is hereby omitted from the Master Billboard list.
-
77. Can I tell the Wikipedia folks that they have an error in their Billboard Hot 100 1969 entry?
3/6/2015 12:27 AM
Yes - Here's the talk page
Hi Ericorbit,
After crawling through all the billboard songs I found that year 1969 ( https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_1969 ) has two 100 rankings. I tried finding out which of one of the two songs are actually the 100th spot but couldn't from Billboard.com (without paying I guess). I figure you or someone else close by probably has the source and could fix it in a jiffy so I'm simply blowing the whistle here.
Take care, [[User:UnclassicallyTrained|UnclassicallyTrained]] ([[User talk:UnclassicallyTrained|talk]]) 05:21, 6 March 2015 (UTC)
UPDATE 1: Ericorbit replied
Hmmmm, interesting. I checked to see who created that article and unfortunately he has not been active since 2014. I do know that in the past Billboard did have songs tied, both in weekly or yearly charts, although I didn't realize it happened as recently as '69. Both billboard.com and billboard.biz have year-end rankings going back to only 2002. I'll have to research around to see what I find. - eo (talk) 13:17, 6 March 2015 (UTC)
-
78. How do you make and call a function in [r]?
3/6/2015 7:03 PM
Taking inspiration from User-written Functions
Create the function:
myfunction <- function(arg1) {
  print(arg1)
  print(arg1*2)
}
Call the function:
myfunction(3)
-
79. How do I access elements from a data.frame like a pro in [r]?
3/6/2015 7:26 PM
Study the examples in ?Extract.data.frame (in [r]).
-
80. What the hell is a factor in [r]?
3/6/2015 7:30 PM
Classifying something as a factor in [r] designates that variable, be it numeric or string, as a value with additional properties that become useful for higher-complexity statistical modeling. Also dubbed 'category' and 'enumerated' types, factors carry 'level' and 'label' information, which somehow makes them better to use for generating graphics and improves memory management along the way.
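A tiny sketch of what a factor stores, using made-up duplicate titles: the distinct strings live once as levels, and each element is just an integer code pointing at a level.

```r
titles <- factor(c("Theme from A Summer Place",
                   "Young Emotions",
                   "Theme from A Summer Place"))

title_levels <- levels(titles)     # the distinct strings, stored once
title_codes <- as.integer(titles)  # one integer code per element
storage <- typeof(titles)          # factors are stored as integers

print(title_levels)
print(title_codes)
print(storage)

# converting back to plain text
first_title <- as.character(titles[1])
print(first_title)
```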
-
81. Why does accessing a single element in my data.frame give me more than one element [r]?
3/6/2015 8:20 PM
Or an even better question, why is typeof(initFile$Title[1]) an integer? I think both questions may benefit from similar answers. Take the following scenario:
I load my data into r:
initFile<-read.csv("Billboard Hot100 1960.csv", header = TRUE, sep = ",")
Then, typing in the following:
initFile$Title[1]
yields this output:
[1] Theme from A Summer Place
100 Levels: A Million to One ... Young Emotions
I'm only interested in the string directly to the right of the [1] row identifier.
After a little more inspection it appears that initFile$Title[1] is not a string after all:
typeof(initFile$Title[1])
[1] "integer"
Finally, forcing the element to character gives me what I was after in the first place.
as.character(initFile$Title[[1]])
[1] "Theme from A Summer Place"
Why is it that initFile$Title[1] doesn't return a single character element in the first place? Other languages like MATLAB seem to be more succinct in their element access routines - is there a better way to access this information that doesn't require reminding [r] that it's looking at characters?
ANSWER - UPDATE 1: Use stringsAsFactors=FALSE
Check out my total newb question on SO - "Subsetting by [ ] returns a plurality of elements when only a single element was expected [R]. Why?"
-
82. How do you replace characters in a string in [r]?
3/6/2015 9:45 PM
-
83. How do you replace apostrophes using gsub [r]?
3/6/2015 10:12 PM
Surround the apostrophe with double quotes:
gsub("'", "-" , "kjd'jloalkjkj" )
-
84. How do you replace parentheses using gsub [r]?
3/6/2015 10:26 PM
They must be escaped with a double backslash like so: \\) or \\(
UPDATE 1: The same goes for periods ".", plus signs "+" and dollar signs "$"
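A few concrete cases of the escaping rule, sketched with names from this project. fixed = TRUE is an alternative that treats the pattern as plain text and so needs no backslashes.

```r
title <- "Seven Little Girls (Sitting in the Back Seat)"

# escaped parentheses are matched literally
no_parens <- gsub("\\(|\\)", "", title)

# an unescaped period matches ANY character...
unescaped <- gsub(".", "-", "T.I.")
# ...so either escape it or switch off regular expressions
escaped <- gsub("\\.", "-", "T.I.")
fixed_dot <- gsub(".", "-", "T.I.", fixed = TRUE)

dollar <- gsub("\\$", "-", "Ke$ha")
plus <- gsub("\\+", "", "A+B")

print(no_parens)
print(unescaped)
print(escaped)
print(dollar)
```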
-
85. How do you get [r] to understand accented letters such as the e in Beyoncé?
3/6/2015 10:54 PM
It appears [r] has the facilities to accept "uncommon" (anything other than alphanumeric) characters. Use the keyboard combination ALT+2+5+4 to get a black square like this ¦... It works!
Therefore RCurl must have dropped the desired Unicode encodings when I scraped the Hot100 list from Wikipedia.
UPDATE 1: The above assumption was wrong and the true reason is disclosed here.
-
86. How long is downloading all of the Billboard song lyrics going to take?
3/6/2015 10:56 PM
Min: 2 seconds * 5700 songs = 3.17 hours
Max: 4 seconds * 5700 songs = 6.33 hours
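The same estimate as a quick calculation, in case the song count or the per-song delay changes (variable names are arbitrary):

```r
n_songs <- 5700
seconds_per_song <- c(min = 2, max = 4)

# total seconds divided by 3600 seconds per hour
hours <- n_songs * seconds_per_song / 3600
rounded_hours <- round(hours, 2)
print(rounded_hours)
```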
-
87. What is the e in Beyoncé called?
3/7/2015 12:51 PM
e-acute (accent) is a letter of the Latin alphabet, not to be confused with È, which is e-grave.
-
88. Is readHTMLTable() or write.csv() dropping acute letters [r]?
3/7/2015 12:56 PM
It appears that converting the output of readHTMLTable() to a data frame drops acute letters... It is not established whether write.csv() will drop these encodings once I fix the data.frame issue.
UPDATE 1: Converting the output of readHTMLTable() to a data frame is not what is dropping acute letters... It appears that RStudio's data view window drops these, but in fact acute letters are preserved in memory, as is evident from inspection of the same in the console.
ANSWER: That means that write.csv() must be what is dropping acute letters.
UPDATE 2: write.csv preserves encoding. So does [r]'s read.csv routine. The problem is there is no problem and I just spent half an hour chasing the ghost documented in UPDATE 1 of this note. Three cheers for RStudio's crappy data view!
-
89Are preprocessing routines in [r] robust?3/7/2015 1:26 PMIn his book, R Cookbook, Paul Teetor of Chicago R Users Group says the following:
Several of my Statistical Analysis System (SAS) friends are disappointed with the input facilities of R. They point out that SAS has an elaborate set of commands for reading and parsing input files in many formats. R does not, and this leads them to conclude that R is not ready for real work. After all, if it can’t read your data, what good is it?
I think they do not understand the design philosophy behind R, which is based on a statistical package called S. The authors of S worked at Bell Labs and were steeped in the Unix design philosophy. A keystone of that philosophy is the idea of modular tools. Programs in Unix are not large, monolithic programs that try to do everything. Instead, they are smaller, specialized tools that each do one thing well. The Unix user joins the programs together like building blocks, creating systems from the components.
R does statistics and graphics well. Very well, in fact. It is superior in that way to many commercial packages.
R is not a great tool for preprocessing data files, however. The authors of S assumed you would perform that munging with some other tool: perl, awk, sed, cut, paste, whatever floats your boat. Why should they duplicate that capability? If your data is difficult to access or difficult to parse, consider using an outboard tool to preprocess the data before loading it into R. Let R do what R does best.
So although [r] conceivably could have community members improving preprocessing routines as you read this, raw data may be better managed by other software.
-
90How do I get write.csv() to preserve character encodings [r]?3/7/2015 1:55 PMDon't need to - It does it already (that is, at least, for ASCII).
-
91How do you access the last character in a string [r]?3/7/2015 3:33 PM
nchar(a) can be used to determine the length of a string. Combine it with substr() to access the last character of some string.
Example:
a <- "dlkflk-"
substr(a, nchar(a), nchar(a))
-
92Should I continue investing time trying to fix known (but uncommon) break cases?3/7/2015 3:54 PM
Take for example the artist name T.I.
According to the currently implemented formatting rules, T.I. converts to "T-I-", where the last dash breaks URL access. This is an uncommon case meriting temporary oversight.
I will deal with this when it gets into my URL ERROR list.
Here's a potentially great resource for dealing with string manipulation I will undoubtedly need help with later
-
93How do I check for duplicate Artist & Song Title entries on Wikipedia's Billboard Hot100 list[r]?3/7/2015 4:21 PM
vec <- c("a", "b", "c", "c", "c")
vec[duplicated(vec) | duplicated(vec, fromLast = TRUE)]
## [1] "c" "c" "c"
master <- read.csv("Master Billboard.csv", stringsAsFactors = FALSE)
titleArtist <- paste(master$Title, master$Artist)
duplicatesIndex <- which(duplicated(titleArtist) | duplicated(titleArtist, fromLast = TRUE))
duplicateList <- master[duplicatesIndex, ]
-
94How do you, row-wise, append string vectors... merge string columns[r]?3/7/2015 4:28 PMApparently paste() can do this too. R - concatenate row-wise across specific columns of dataframe
-
95Wikipedia's Billboard Hot 100 (total) has almost 200 duplicate Song/Artist entries. Is this a problem?3/7/2015 4:55 PM
Any useful analysis uses high fidelity measurement. If 200 of these songs are in fact duplicates that would mean 200/5700*100=3.51% of the data is compromised.
UPDATE 1: Reappearing on the Billboard Hot 100 does not necessarily constitute a data entry error or Wikipedia vandalism. After all, Billboard is a zeitgeist of American musical taste. If for some reason we all began listening to a popular song of the 1960s today, that song would appear on the 2015 ranking. Take for example Bohemian Rhapsody by Queen, which topped the charts in 1976 and came back in 1992 (this very example, however, could be a demonstration of vandalism - at this time I don't know this to be true beyond Wikipedia sources).
Let's assume for a moment that about 200 songs are indeed erroneous Wikipedia entries and not merely reappearances. Considering various qualitative truths about this data set's origins, namely that it is safeguarded by a group of dedicated Wikipedia moderators and that various pages are semi-protected from flagrant vandalism, it is probable that the number of errors associated with the data is less than 200. It is therefore likely that, thus far, my data set is more than 96% accurate.
In the interest of completing this project by March 25th 2015, I consider this data set one of relatively high fidelity and will proceed to scrape lyric data accordingly.
NOTE - Some ways of checking fidelity of Wikipedia entries I will not explore (*assuming that all entries are actually songs):
- Determine initial release dates of songs and flag any entries appearing earlier in time.
- Work with Wikipedia moderators to track integrity of rankings.
- Manually scour the Internet for correct data on dubious entries - ain't nobody got time for that.
-
96How will I manage duplicate entries?3/7/2015 8:02 PM
If it takes about 6 seconds to download each lyric file then 198 duplicates would add about 20 minutes to the download.
1. Flag only 2nd duplicate index (do not need to worry about triple appearances since they don't occur)
2. Make dummy file whose contents read "DUPLICATE , TITLENAME, ARTISTNAME"
3. Decide how I want to incorporate duplicates in the analysis later.
UPDATE 1: In hindsight it would have been better to do the above *AND* save said duplicates in a different directory from unique lyrics (or just ignore them altogether).
UPDATE 2: I wish I had managed these in a more defined way earlier - I would've saved time. The way to manage these and all song files is to store all song files, regardless of category, in the same folder. Then, according to labeling contained in a "master navigator" file, access files according to conditionals which either enable or ignore the reading of certain files on the fly as needed.
-
97How do you find the index of the largest number in a list [r]?3/7/2015 9:55 PM
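This note was left without an answer. One common base R approach, offered here as a sketch with made-up values, is which.max(), which returns the index of the first maximum. If ties matter, which() combined with max() returns every matching index.

```r
# Made-up scores with a tie for the largest value
scores <- c(12, 47, 3, 47, 8)

# Index of the first occurrence of the maximum
first_max <- which.max(scores)

# Indices of all occurrences of the maximum
all_max <- which(scores == max(scores))

print(first_max)
print(all_max)
```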
-
98How are if, else statements formatted in [r]?3/7/2015 11:15 PM
There is also the vectorized ifelse statement with simple example here
if (condition == TRUE) {
  x <- TRUE
} else {
  x <- FALSE
}
UPDATE 1:
if (client == 'private') {
  tot.price <- net.price * 1.12 # 12% VAT
} else {
  if (client == 'public') {
    tot.price <- net.price * 1.06 # 6% VAT
  } else {
    tot.price <- net.price * 1 # 0% VAT
  }
}
The above implementation of 'if, else if' falls short of a true switch statement.
For switching, use the following:
if (type == "mean") 1 else if (type == "median") 2 else if (type == "trimmed") 3
...or switch()
UPDATE 2:
It appears that what is crossed out above also is not a true switching statement in the sense that if the first statement is executed, the second two are not. That's bogus.
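For comparison, here is a minimal sketch of the switch() alternative mentioned above, plus the vectorized ifelse(). The helper name summarise_by_type and the sample values are hypothetical, chosen only to illustrate the syntax.

```r
# Hypothetical helper: pick a summary function by name using switch()
summarise_by_type <- function(x, type) {
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = 0.1),
         stop("unknown type: ", type))  # unnamed final argument acts as the default
}

x <- c(1, 2, 3, 4, 100)  # made-up values
by_mean   <- summarise_by_type(x, "mean")
by_median <- summarise_by_type(x, "median")

# ifelse() is the vectorized form: one decision per element of x
size <- ifelse(x > 10, "big", "small")

print(by_mean)
print(by_median)
print(size)
```

Only the matching branch of switch() is evaluated, which is the same "first match wins" behavior as the if / else if chain.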
-
99How do you set RCurl to follow redirect URLs?3/7/2015 11:59 PM
getURL(u, .opts = curlOptions(followlocation = TRUE))
If a call to .opts= already exists, i.e.
getURL(url = 'https://www.bla.cm', curl = my.handle, .opts = list(ssl.verifypeer = FALSE))
add the followlocation setting like so:
getURL(url = 'https://www.bla.cm', curl = my.handle, .opts = list(ssl.verifypeer = FALSE, followlocation = TRUE))
-
100How do you test if characters are in a string in [r]?3/8/2015 12:49 AM
chars <- "test"
value <- "es"
grepl(value, chars)
-
101Is / a forward slash or a back slash?3/8/2015 1:44 AM/ is a Forward Slash
-
102How much memory will the historic Billboard Hot 100 consume?3/8/2015 2:33 AM
Max: 3 kB/file... 5700 * 3 kB = 17.1 MB
-
103How long will scraping the historic Billboard Hot 100 lyrics take?3/8/2015 2:43 AM13 minutes/100 songs... 5700/100*13=741 minutes or 12.4 hours. I hope this works the first time.
-
104Why is my scraping routine hanging after about 30 minutes computer idle time?3/8/2015 12:07 PM
Power management settings had been misconfigured - the hard disk had been set to turn OFF after 20 minutes. After enabling most "DO NOT SLEEP/TURN OFF" settings, the routine ran for about 4.5 hours until failing with the following message (NOTE: this message was returned only after suspending the socket connection, i.e. closing the browser window):
Error in function (type, msg, asError = TRUE) :
transfer closed with outstanding read data remaining
-
105Why did my transfer close with 'outstanding read data remaining' after scraping for four hours?3/8/2015 12:42 PMCould have been a number of things:
-Socket (enabled through live web browser) could have become disrupted by client (local) timeout.
-R Studio sucks
-I'm an idiot (most likely)
It could've been a simple RCurl issue possibly avoidable through try-catch statements. Attempts to try-catch things failed and proved to waste more time than just skipping the error prone loop and manually overseeing progress.
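For reference, the general shape of a try-catch wrapper around a download loop can be sketched as below. This is not the routine that was actually run: fetch_lyrics() is a hypothetical stand-in for the real RCurl call, rigged to fail on one made-up URL so the error path is visible.

```r
# Hypothetical stand-in for the real download call (e.g. RCurl's getURL)
fetch_lyrics <- function(url) {
  if (grepl("bad", url, fixed = TRUE)) {
    stop("transfer closed with outstanding read data remaining")
  }
  paste("lyrics from", url)
}

# Made-up URLs; the second one triggers the simulated failure
urls <- c("http://example.com/good-1",
          "http://example.com/bad-2",
          "http://example.com/good-3")

# Record NA on failure so a single error does not stop the whole loop
results <- vapply(urls, function(u) {
  tryCatch(fetch_lyrics(u),
           error = function(e) {
             message("skipping ", u, ": ", conditionMessage(e))
             NA_character_
           })
}, character(1), USE.NAMES = FALSE)

# URLs to retry later
failed <- urls[is.na(results)]
print(failed)
```

Collecting the failed URLs in one vector makes a second pass over only the misses straightforward.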
UPDATE 1: The web client may be abdicating its responsibility to maintain socket connections after about 2.9 hours. This is an uncertain statement.
-
106What does a basic website require?3/8/2015 3:33 PM
Domain registration
Hosting plan
Basic files (uploaded via hosting service FTP or similar protocol):
A basically good website will have:
CSS and/or JavaScript files either containing or referencing other files with pictures of cats or text
If they contain references to other files... those files
Some type of tracking files to log use (usership) - Google Analytics
-
107Is usership a word?3/8/2015 3:37 PMMaybe not really
-
108Will I make a mobile version of my website?3/8/2015 3:40 PMProbably not before April 1 2015
-
109Can I mock a website locally then push it to the wild?3/8/2015 3:41 PMYes although testing analytics stuff may be tricky. Hmmm
-
110How do you link between web pages in html?3/8/2015 5:20 PM
page1.html
<!DOCTYPE html>
<html>
<head>
  <title>Page 1 to ground control</title>
</head>
<body>
  This is page 1. <a href="page2.html" title="to page 2">What is going on on page 2?</a>
</body>
</html>
page2.html
<!DOCTYPE html>
<html>
<head>
  <title>Page 2 :)</title>
</head>
<body>
  This is a page 2. <a href="page1.html" title="to page 1">Want to go back to page 1? Click here</a>
</body>
</html>
-
111Making web pages seems easy enough but managing all those files can get tedious - DreamWeaver or Muse?3/8/2015 5:26 PM
Muse is for noob hobbyists hoping to make a site with little or no coding experience... Creates bloated projects that don't follow web standards.
DreamWeaver is for pros hoping to make commercial/enterprise level web sites.
-
112What are good alternatives to DreamWeaver and are they worth my time?3/8/2015 5:33 PM
-
113What is WYSIWYG?3/8/2015 5:41 PM
-
114Dreamweaver CC or Dreamweaver CS6?3/8/2015 5:52 PMCC is the creative cloud version and upgrade to the standalone CS6. I'm assuming, perhaps incorrectly, the creative cloud 30 day trial has no limitations.
-
115What are the most common (important) html tags?3/8/2015 6:40 PM
You Only Need 10 HTML Tags - There are about 100 in total
Tag | Name | Example
<h1> - <h6> | Heading | <h2>Headings Are Great Fun</h2>
<p> | Paragraph | <p>This is my first. I hope you like it.</p>
<i> | Italic | <i>italicized word</i>
<b> | Bold | <b>bold word</b>
<a> | Anchor (hyperlink) | <a href="http://www.google.com">Link to Google</a>
<ul> & <li> | Unordered List & List Item | <ul><li>Apples</li><li>Bananas</li><li>Pears</li></ul>
<blockquote> | Blockquote | <blockquote>“To be or not to be, that is the question.” - Person who said this</blockquote>
<hr> | Horizontal Rule (draw line) | <hr />
<img> | Image | <img src="myimage.jpg" />
<div> | Division (used for dividing web page content into containers) | <div> other tags here </div>
-
116Reference for other common HTML tags?3/8/2015 7:05 PM
-
117How many lyrics did the first scrape miss?3/9/2015 5:01 PM
summary(trueErrorList$HitTarget)
FALSE   TRUE   NA's
  182   5315      5
 3.3%  96.6%   0.1%
The first scrape missed 3.4% of the lyrics or 187 songs.
-
118I've made my first scrape of lyric data. What's next?3/9/2015 5:23 PM
1. Determine percentage of lyrics missing
2. Get remaining lyrics.
If less than 5% missing, spend 2 hours getting remainder. Worry about any others missing after core analysis is developed.
Else, spend 4 hours getting as many lyrics as you can or until less than 5% missing.
3. Manually inspect a few lyrics
**Write the following routines such that, when the analysis is done, and remaining lyrics are compiled, re-performing the complete analysis is simply a matter of running a script (or multiple back to back).**
4. Generate compiled lyric file. NEVER DISTRIBUTE
5. Make a duplicate of the above file to work from. The Composite Billboard Hot 100 Lyrics is formally dubbed the MLL (Master Lyric List).
6. Generate alphabetically ordered list of unique words in MLL.
7. Find or develop stemming routine to match misspelled, slang, or similarly mutated words with their English counterpart. Do same for other observed languages.
7a. Document stemming rules
8. Create "modified MLL" - the MLL edited to only include word roots in lyrics (parents of stemmed words)... a sort of under-glorified spell-check.
Apply "under-glorified spell-check" to each individual lyric file and save edited counterparts to new file.
9. Perform "Total Unique Words", "Average Total Unique Words/Song", "Total vocabulary", and similar analysis.
....
Chart this project's EverNote progress (doing this well may be App worthy).
Test the following:
Hypothesis 1
Lyrically, the most popular songs in America have become less unique (more repetitive) over time.
Hypothesis 2
Not really a hypothesis but it would be cool to analyze song themes and plot distributions. Possible themes include:
Love - Deeply in Love
Love - Breakups
Lust
Drugs
Partying
Travel
A million other things
Hypothesis 3
The most common lyrical noun or verb of the most popular songs in America is "love"
The above may be crappy sounding research questions but they serve as the beginning of deeper, likely more interesting analysis.
UPDATE 1: Alternate Tests
Hypothesis 4
The average popular song in America is written at lower than an eighth grade reading level (as measured by readability indices such as the Gunning Fog).
-
119Do utilities that plot EverNote use (i.e. productivity) already exist?3/9/2015 6:13 PM
I'm talking about a view similar to commit history in Git repositories - do plotters/viewers for this exist?
Doesn't look like it. Holy poop there's money here.
UPDATE 1: Not many exist but some, like EverNote Analytics, already do a decent job at graphing EverNote use.
-
120What is the Gunning Fog Index?3/9/2015 6:17 PM
Came across the following when answering the previous question. Source
Gunning Fog Index: A weighted average of the number of words per sentence, and the number of long words per word. An interpretation is that the text can be understood by someone who left full-time education at a later age than the index.
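The "weighted average" above is usually written as 0.4 * (words per sentence + 100 * long words per word). A minimal sketch of that commonly cited formula follows. The function name gunning_fog and the counts are made up for illustration, and the hard part (deciding which words are "long", normally three or more syllables) is assumed to have been done already.

```r
# Gunning Fog index from pre-computed counts.
# Assumes the caller has already counted words, sentences and long (complex) words.
gunning_fog <- function(n_words, n_sentences, n_complex_words) {
  0.4 * ((n_words / n_sentences) + 100 * (n_complex_words / n_words))
}

# Made-up counts: 100 words in 5 sentences, 10 of them long
fog <- gunning_fog(n_words = 100, n_sentences = 5, n_complex_words = 10)
print(fog)
```

With these counts the average sentence has 20 words and 10% of words are long, so the index is 0.4 * (20 + 10) = 12.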
-
121What is the SMOG index?3/9/2015 6:29 PMPer Wikipedia, SMOG is the acronym derived from Simple Measure of Gobbledygook. The formula for calculating the SMOG grade was developed by G. Harry McLaughlin in 1969 as a more accurate and more easily calculated substitute for the Gunning fog index.
Definition:Count the words of three or more syllables in three 10-sentence samples, estimate the count's square root (from the nearest perfect square), and add 3.
Approximation Formula: -
122What is the Flesch–Kincaid Reading Ease?3/9/2015 6:47 PM
Per Wikipedia, an index similar in purpose to SMOG and Gunning Fog.
Flesch–Kincaid Reading Ease:
Score | Notes
90.0–100.0 | easily understood by an average 11-year-old student
60.0–70.0 | easily understood by 13- to 15-year-old students
0.0–30.0 | best understood by university graduates
Flesch–Kincaid Grade Level: The number of years of education generally required to understand a text, relevant when the formula results in a number greater than 10.
-
123How do you quickly bin (get number counts) of data in a matrix [r]?3/9/2015 8:26 PMUse summary()
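A small sketch of the counting idea, using a made-up logical vector. summary() reports counts for logical and factor data; table() is a base R alternative that counts the distinct values of any vector (a matrix can be flattened first with as.vector()), and prop.table() turns those counts into shares.

```r
# Made-up hit/miss flags, including one missing value
hits <- c(TRUE, FALSE, TRUE, TRUE, NA, TRUE)

# summary() on a logical vector reports FALSE / TRUE / NA counts
print(summary(hits))

# table() gives the same counts and works on any vector
counts <- table(hits, useNA = "ifany")

# Shares in percent, rounded to one decimal
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```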
-
124Remove rows based on condition [r]?3/9/2015 9:02 PMd<-d[!(d$A=="B" & d$E==0),]
-
125How do you round in [r]?3/9/2015 9:05 PMround(x, digits = 0)
-
126Did Big Sean's song Dance A$$, ranked in 2012, begin with the word ass?3/10/2015 8:16 PMYes. In fact the first 15 words were "ass."
-
127How effective have my scraping routines been?3/10/2015 9:22 PM
First Pass: Collected more than 95% of items (of 5700)
Second Pass:
Third Pass: Might just do this by hand...
Duplicates + Regular + Instrumental = 5700 files
UPDATE 1: After taking a closer look at my scraped lyrics files last week I noticed that 253 or so files were oddly formatted - they had less than 4 lines of text. A closer look proved they were indeed mispopulated and required further, more manual labor to find and correct. Thus, aided by the same scraper pointing to a different source, most of these files were auto populated although all were manually inspected for errors. The result is that it is now known that 71 Billboard hits were instrumental, not 3.
The new lineup is as follows (Click here for category definitions)
Duplicate: 198
Regular: 5430
Instrumental: 71
Incomprehensibly Foreign: 1 (PSY's Gangnam Style)
198+5430+71+1=5700
UPDATE 2: The landscape of the lyric category distribution continues to change as preprocessing (quality control) continues. The following are songs that require re-categorization:
1958 64 Will Glahe - Liechtensteiner Polka Lyrics is Incomprehensibly Foreign (German)
New totals: 198+5429+71+2=5700
-
128How many pages should the website have?3/10/2015 10:03 PM
- HOME - Motivation
- 'XXX' Questions (from this journal)
- Analysis (Results)
- Source Code - Link to Git
- An About Me page
-
129What should the Home page look like?3/10/2015 10:08 PM
-
130Color scheme - should I match the university?3/10/2015 10:09 PM
No, choose styling that reflects me
Green hues
Minimalist
UPDATE 1: Make drawings/icons look like stencil sketches if applicable - like this EverNote elephant
-
131Will the target audience have color blind observers?3/10/2015 10:11 PMI don't have time to design a website with robust visual ergonomics. Consider Revisiting This.
-
132What should my intro be?3/10/2015 10:13 PMOption 1:Hi, my name is Andrew Agostini and I'm trying to get into your school. I made this website to impress you. I hope it works.
-
133Will admissions officers reading this think I may have posted certain questions unintentionally?3/10/2015 10:15 PMProbably. If you're reading this question know that I'm documenting my mind here (as much as is reasonably practical). This is as much a journal as it is intended to be an art piece peering into the intricacies of problem solving in one bloke. That's a hope anyhow.
-
134What [r] routines/packages exist that can help stem words?3/10/2015 10:32 PM
Potential Candidates:
UPDATE 1: SnowballC cannot be used for this study since it maps 'winter' and 'winterize' only to the word 'winter'.
I should almost just open the Master lyric file in MS Word and run spell check. This is going to take forever!!!!
-
135Is stemming actually what I want to use to edit the master lyric file?3/10/2015 10:48 PM
Running the following example from SnowballC makes me think I can't use it for this project:
library(SnowballC)
wordStem(c("win", "winning", "winner"))
#Output -> [1] "win" "win" "winner"
wordStem(c("win", "winning", "winter"))
#Output -> [1] "win" "win" "winter"
wordStem(c("winter", "winterize", "poop", "pooping"))
#Output -> [1] "winter" "winter" "poop" "poop"
Where I would classify 'win' and 'winning' as two different words SnowballC thinks they're one. It appears that 'stemming' is more about compaction of language by grouping of lexemes than it is mapping phonetic or slang words to their 'pure' language counterpart. Stemming the English dictionary, for example, would result in fewer unique words than would finding the unique word count of the same dictionary after only running a spell check on it.
Hence, this question evolves into "Which interpretation of a unique word makes the most sense for this project?"
-
136What is a lexeme (from lexicon)?3/10/2015 10:54 PMA basic lexical unit of a language, consisting of one word or several words, considered as an abstract unit, and applied to a family of words related by form or meaning.
-
137Which interpretation of a unique word makes the most sense for this project?3/10/2015 11:02 PM
The stemming interpretation of unique words is not acceptable for truly unique word count generation.
In this project "winter", "winterize", "poop", and "pooping" are considered four unique words (not two). Distinctions will continue to be relatively unambiguous among words such as "Gangsta" and "Gangster" (which would map to one word - Gangster) but will undoubtedly become increasingly ambiguous among misspelled and slang words.
I will strive to have the final mapping of these words publicly available.
-
138What [r] routines/packages exist that calculate SMOG and other reading level indices?3/10/2015 11:25 PM
SMOG:
Flesch-Kincaid:
Gunning Fog Index:
[r] can do most of this with the koRpus Package!!
Use readability()
-
139Can I use the koRpus Package to make verb and noun and adjective and other part of speech counts?3/10/2015 11:36 PM
Perhaps not on its own but definitely with the TreeTagger it recommends.
-
140Do [r] spell check routines exist?3/11/2015 5:59 PM
Yes: aspell-utils or aspell, which uses the GNU Aspell spell checker which, per its webpage, claims to be one of the best spell checkers available.
-
141How does GNU Aspell work?3/11/2015 6:11 PM
How it all works:
In a nutshell, it determines which words sound like other words per some character error distance, known as the Levenshtein distance. It will then suggest 'corrections' in descending order of deviation.
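To get a feel for the Levenshtein distance itself (not for Aspell's internals, which this sketch does not reproduce), base R's adist() computes it directly. The candidate words below are made up for illustration.

```r
# Made-up candidate spellings compared against the intended word "don't"
candidates <- c("dont", "don't", "donut", "donald")

# adist() (utils package, loaded by default) returns edit distances as a matrix;
# drop() flattens the 1 x 4 result into a plain vector
distance <- drop(adist("don't", candidates))
names(distance) <- candidates

# Closest candidates first
ranked <- sort(distance)
print(ranked)
```

Each unit of distance is one inserted, deleted, or substituted character, so "dont" and "donut" are both one edit away from "don't" while "donald" is three.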
-
142Can I dynamically teach GNU Aspell new associations for really poorly spelled words?3/11/2015 6:11 PM
Take for instance the following excerpt from the unique words list:
don't
Don't
DON'T
don't'
don'tcha
Don'tcha
don'tchu
don'ts
Don'tstop
donald
What's the quickest way I could correct these into appropriate words?
-
143I run aspell("duncan") in the [r] console and get an error - what do I do?3/11/2015 6:33 PM
> aspell("duncan")
Error in aspell("duncan") : No suitable spell-checker program found
Tried installing the full Windows installer from here... didn't help
WAIT!!! 1st generate alphabetically ordered list of unique words in MLL then work on this if needed.
UPDATE 1: Downloaded the official English dictionary from Available Aspell Dictionaries
Need to download Aspell for Windows. OMG the last release was in 2002!
-
144How will I combine all lyrics from 5700 files (some including non-lyric data) into one master lyric file?3/11/2015 7:06 PM
Some files are reappearances of earlier songs whose contents are merely:
DUPLICATE , TITLENAME, ARTISTNAME
Skipping/ignoring these files may have been easier if each file name began with a number starting at 1 and ascending to 5700.
UPDATE 1: One can import all files of a specific type from a directory using
temp = list.files(pattern="\\.txt$") # pattern is a regular expression, not a wildcard
myfiles = lapply(temp, read.delim)
then remove unwanted files according to their contents.
-
145Can [r] move files to a different folder?3/11/2015 8:24 PMMaybe... using file.rename(). Not sure if this actually moves the file or just creates a new one after deleting the old one according to this SO post
UPDATE 1:Yes, follow the above SO link. -
146Was the No.1 song in 1958 actually 'Volare', a non-English song, by Domenico Modugno?3/11/2015 9:10 PMWikipedia says it was. Since I'm hard pressed to believe Wikipedia is running a conspiracy to mislead the world about Billboard rankings I believe it
-
147How do you read actual text (not column data in a .txt) into [r]?3/11/2015 9:27 PM
UPDATE 1: Scan example:
scan("test text.txt", what = character())
-
148How are alphabetized lists made in [r]?3/11/2015 9:35 PMUsing order():
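A short sketch of the order() approach, with made-up words. order() returns the positions that would sort the vector, so indexing with it reorders the original values. Lowercasing inside order() is an assumption made here so that capitalized words are not sorted separately from lowercase ones.

```r
# Made-up words with mixed case
words <- c("love", "Baby", "yeah", "baby", "oh")

# order() gives sorted positions; tolower() makes the ordering case-insensitive
alphabetical <- words[order(tolower(words))]

# For a deduplicated, lowercase vocabulary, sort(unique(...)) is enough
vocabulary <- sort(unique(tolower(words)))

print(alphabetical)
print(vocabulary)
```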
-
149How to merge all text (lyric) files quickly?3/11/2015 9:38 PM
Open CMD
Make current directory the one with all the .txt files to merge
cd "C:\Users\User\Desktop\files"
Type the following
for %f in (*.txt) do type "%f" >> output.txt
UPDATE 1: DON'T DO THE ABOVE (it will merge all files twice)!
Instead, open CMD (does not work in PowerShell)
Make current directory the one with all the .txt files to merge
cd "C:\Users\User\Desktop\files"
copy *.txt mergedFile.txt
-
150Which lyric files are corrupt (need to be re-downloaded)?3/11/2015 9:57 PM
1985 2 Like a Virgin Madonna
2002 54 Missy Elliott Work It
2005 93 Linkin Park Numb Encore ft Jay-Z
2012 47 Gangnam Style - (This one will be omitted from the analysis since I don't know the language)
UPDATE 1: -
151What are Unicode (or whatever it is) words that need to be recoded to proper English?3/11/2015 10:24 PM
<U+0085><U+04F0>[<U+07F8>(fade to end) " "
<U+0092> "\'"
â<U+0080><U+0099> "\'"
â<U+0080> ""
<U+009C> ""
Â<U+0091> ""
<U+1E57> "p"
Transcribed by Leon Sanchez from the Brenda Lee Story CD ""
From the Movie "Doctor Zivago"
-
152I thought PSY's Gangnam Style was really popular - why is it only ranked 47th in 2012?3/11/2015 11:23 PM
In addition to being ranked 47th in 2012 it also ranked 55th in 2013, indicating
A) It was a really popular song
B) Timing the release of a potentially popular song could help rank it better on year end charts. Its near mirror positions about the chart's median indicate this single could have been in the top 10 or even No. 1 of a year end chart had it been released sooner or later... which would've made it even more memorable.
-
153Are my lyrics pure lyrics at this time [2015.03.11 11:59PM]?3/11/2015 11:58 PM
Definitely not. Several if not many files have header information that needs to be removed.
For example, Pitbull's (Chris Brown's) International Love has useless header info.
UPDATE 1: Several headers, when present, contain the word "Miscellaneous". One way to "quickly" identify files with headers may be to search for this word.
Header Keywords:
Miscellaneous
Footer Keywords:
Read more:
Random Keywords:
2X
chorus
[
:
Chorus
fade out
sallysally@usa.net
instrumental
Transcribed by
jbacres@iquest.net
(break)
------ instrumental break ------
ronhontz@worldnet.att.net
Note:
@
Billboard position
peak Billboard
Words and Music by
girlfriendvideo@webtv.net
intro:
SPOKEN:
Awcantor@aol.com
Childuf60s@aol.com
bridge
repeat
Míša Pekarová & @SShano33
Turnpike Tom Music
?
cuz
-
154What's the easiest way to remove punctuation from a text file [r]?3/12/2015 8:55 PMHitting Control+F in Notepad++ and replacing all punctuation for blank space is an option but there's bound to be something that already works in [r].
x <- "I like %$@to*&, chew;: gum, but don't like|}{[] bubble@#^)( gum!?"
gsub("[^[:alnum:][:space:]']", "", x)
[1] "I like to chew gum but don't like bubble gum"
-
155What are the differences between MLL versions?3/12/2015 9:16 PM
Master Lyric List v1: Raw lyrics dumped into single file since correcting bad individual lyric files.
Master Lyric List v2: Same as above but stripped of all untranslated ASCII and UNICODE.
Master Lyric List v3: Same as above but stripped of all punctuation except apostrophes.
******
MLL Bag of Words v1.txt: Not quite a bag of words
MLL Bag of Words v2.txt: Still not quite a bag of words but alphabetically ordered. All punctuation except apostrophes removed
MLL Bag of Words v3.txt: An alphabetically ordered bag of words with mixed case
MLL Bag of Words v4.txt: An alphabetically ordered bag of words with all lower case
-
156Why aren't some lines imported using scan("file.txt", character(0)) deconstructed into words [r]?3/12/2015 9:42 PMNot sure but at minimum had to do with remnant punctuation (other than apostrophes)
-
157Would making 'two passes' of word separation magically separate all the lines that remained intact?3/12/2015 9:55 PM
It totally did!!
Step 1: Read the MLL line by line, remove all punctuation that isn't an apostrophe, then save to file
Result: Reduced MLL from 257,736 lines to 52,263 lines and words
UPDATE 1:
Step 2: Read output of Step 1, reduce its sentences into words (which didn't completely work) then keep unique words.
Result: Further reduced to 46,778 lines and words
Step 3: Read output of Step 2 then repeat Step 2
Result: Further reduced to 34,785 words (no lines)
Step 4: Dance on your desk!
Step 5: Ignore case (make all lowercase) then find unique words of output
Result: Further reduced to 27,745 words (no lines)
Step 6:
-
158Ways text still needs to be reduced?3/12/2015 10:16 PM
1: Unique words are case agnostic so ignore case
2: Map multiple misspellings and slang to a single word
3: Handle numeric characters
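Items 1 and 3 above can be sketched in a few lines of base R. The word list is made up, and "handling" numeric characters is interpreted here as dropping any token that contains a digit, which is only one possible choice.

```r
# Made-up slice of a word list with mixed case and numeric tokens
words <- c("Don't", "don't", "DON'T", "2X", "love", "Love", "1999", "baby")

# 1: ignore case
lowered <- tolower(words)

# 3: drop tokens containing digits (an assumption about what "handle" means)
no_numeric <- lowered[!grepl("[0-9]", lowered)]

# Unique words, alphabetically ordered
vocab <- sort(unique(no_numeric))
print(vocab)
```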
-
159What is this font?3/13/2015 5:43 PM
-
160What file format does EverNote save its files in?3/13/2015 5:55 PM
Natively, EverNote saves information in various database formats, none of which is straightforward to access directly.
-
161How can one plot EverNote Note creation dates?3/13/2015 6:17 PMPer Jing Conan Wang's The Personal Analytics of My Evernotes, it is possible to get Evernote data through export commands in the official client.
One can even export all the notes of a particular notebook to HTML or three other formats as in the image below.
The options button permits adding note details such as creation date, modification date and more. Although Intra-note formatting appears perfectly translated to HTML, appearance of note titles and method of dividing notes can be made more appealing by aid of a reformatting script. -
162What is the ENEX file format?3/13/2015 6:35 PMAn EverNote XML derivative. Appears to be less useful to me than the HTML.
-
163How should EverNote productivity be communicated?3/13/2015 7:18 PMAs a cumulative sum of work since the project's inception. Accumulation better demonstrates the gravity of the endeavor, in contrast to the total note counts graphed as a function of day or month at The Personal Analytics of My Evernotes by Wang
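The cumulative view can be sketched as follows. The creation dates are made up. In practice they would come from the exported note details.

```r
# Made-up note creation dates (one entry per note)
created <- as.Date(c("2015-02-10", "2015-02-10", "2015-02-11",
                     "2015-03-06", "2015-03-06", "2015-03-06"))

# Notes created per day, then the running total since the first note
per_day    <- table(created)
days       <- as.Date(names(per_day))
cumulative <- cumsum(as.vector(per_day))

print(data.frame(day = days, total_notes = cumulative))
```

Passing days and cumulative to plot() with type = "s" would draw the running total as a step chart, which never decreases and so shows accumulated work rather than day-to-day activity.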
-
164How do I install Aspell on Windows 7?3/13/2015 10:04 PM
Setup for 64-bit Windows 7 appeared to be what I needed but it's focused on setting up Emacs. Instead, I am following instructions from Enable aspell spellchecker in windows 7 for notepad++
UPDATE 1: Installation of Aspell is proving a waste of time. Aspell is merely the first of three programs, the others being hunspell and ispell, that the [r] routine aspell() checks at runtime.
UPDATE 2: I have vanquished the Aspell installation problem - check out my StackOverflow post! The primary reason to try to install Aspell is (supposedly) because it is faster than Hunspell.
Step 1. Download Aspell Win32 from following link (Yes, it's ancient)
Step 2. Install Aspell Win32 in the AppData Roaming folder (it may be convenient to create a SpellCheckers parent folder beforehand)
Step 3. Add Aspell Win32 to the Windows PATH. Learn how to add variables to your system path. In our example, type the following
C:\Users\UrPC\AppData\Roaming\SpellCheckers\Aspell\bin\
where UrPC is the name of your PC.
Step 4. Confirm this environment variable edit by clicking OK several times
Step 5. Download Aspell dictionary from following link (I don't believe newer versions will work but who knows)
Step 6. Run the Aspell dictionary 'installer' which will unpack itself to a folder called TmpInstall in the same directory.
Step 7. Open the TmpInstall folder and run setup-Aspell-en-0.50-2.exe. It should auto detect where your Aspell Win32 installation is based on the system path we added in Step 3. If not, proceed to Step 8.
Step 8. Open Windows PowerShell and type the word
aspell
You should see command information vomit down your prompt. If you don't, you may have entered the PATH information incorrectly (and need to fix it) or need to restart Windows. If after doing this you still get an error, consider the Appendix.
Appendix: If Aspell was previously uninstalled you may need to delete its orphaned registry key. Run regedit.exe and search for aspell. Delete any aspell key associated with an uninstalled aspell directory. Be very careful not to unintentionally change anything other than the aspell key in question as deleting something by accident could send you to Windows Hell.
-
165How do I install Hunspell on Windows 7?3/14/2015 11:09 AM
Per Help-GNU-EMACS: This is not an installation in the sense that one runs an installer.
Step 1: Download the Windows Hunspell zip from http://sourceforge.net/projects/ezwinports/files
Step 2: Extract the contents of the zip file to any directory. I chose the new 'hunspell' folder I made in 'SpellCheckers' in AppData.
Step 3: Add hunspell to the Windows PATH
a. Start --> Control Panel --> System --> Advanced system settings --> "Advanced" tab --> Environment Variables
b. In the "System variables" window select "Path" and click "Edit..."
c. A window "Edit System Variable" will pop up showing the current value of "Path". At the end of that string add
{folder for hunspell.exe}
which on my system is:
C:\Users\YOUR_PC\AppData\Roaming\SpellCheckers\hunspell\bin\
d. Click your way out: OK --> OK --> OK --> close the control panel window.
Step 4: If needed, add additional dictionaries at:
C:\Users\YOUR_PC\AppData\Roaming\SpellCheckers\hunspell\share\hunspell
-
166Can I get a crash course in spell checking in [r]?3/14/2015 12:05 PM
From Watch Your Words
src <- "http://svn.R-project.org/R/trunk/src/library/stats/man/lm.Rd"
f <- "lm.Rd"
download.file(src, f)
a <- aspell(f, "Rd")
print(subset(a, Original %in% c("ANOVA", "regressor")), verbose=TRUE)
UPDATE 1: The above link is actually Watch Your Spelling
-
167I've installed a spell checker, now what?3/14/2015 1:14 PM
aspell(file) returns a list of the words it believes are misspelled in file, along with the location of each suspected error and multiple suggested corrections. To minimize the overhead of running aspell() with the Hunspell executable* on my word list I ought to:
1. Further minimize the file to be corrected by removing words that appear more than once due to capitalization, i.e. Hello, hello, heLLo don't need to be counted three times. Ignore caps!
2. Run aspell() on the output of 1.
3. Train aspell(), that is, create a dictionary that it will use to auto-correct misspelled words in the future... OR... create a hash table that maps only the misspelled words to
4. Use the new dictionary to systematically correct each of the 5500+ unique lyric files (save to new file)
5. Run the unique word count analysis on each of the corrected files from 4.
6. Plot and summarize results
*The Hunspell executable is slower than the Aspell executable (which I cannot get working on Windows 7) due to its more powerful
UPDATE 1: Steps 1 and 2 produce a spelling error vector 12,323 elements long (of 27,745 words). How do I correct 12,323 words in less than 10 hours?
-
168How do I edit the windows registry?3/14/2015 3:43 PM
-
169How to lower case everything in [r]?3/14/2015 4:27 PM
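Base R's tolower() handles this. A small sketch; the sample vector is invented, and the unique() step is the case-folding de-duplication described in question 167:

```r
words <- c("Hello", "hello", "heLLo", "World")  # made-up sample words

lowered <- tolower(words)        # lower-case every element
unique_words <- unique(lowered)  # collapse case-only duplicates

print(unique_words)
```

toupper() is the mirror image if upper case is ever needed.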
-
170In general, how are aspell and hunspell loading my PC?3/14/2015 8:54 PM
a <- aspell("MLL Bag of Words v3.txt",program="aspell")
b <- aspell("MLL Bag of Words v3.txt",program="hunspell")
-
171Does EverNote preserve links to other notes when exported to html?3/14/2015 9:23 PMSurprisingly yes!
-
172How do I correct 12,323 words in less than 10 hours?3/14/2015 9:35 PM
Or, more generally, what problems need to be solved to rectify all Billboard Hot 100 lyric files?
Problem 1. Several files have supplemental header information, such as artist and release date, that needs to be removed. This data is often enclosed in parentheses.
Solution 1. Inspect and tag all files that have parentheses. Then remove the extraneous information manually (or by a better process, if one becomes evident).
Problem 2. Make Unicode conversion as described in this question.
Solution 2. Use gsub() to replace all instances of certain Unicode characters.
Problem 3. Many misspelled words are a product of people screaming with their keyboards, i.e.
Yo Akon = Yoooooooooo Aaaaakooon!
Ah = aaaaaaaaaaaaah OR aaaaah OR aaaahhhhhhhhhh
Using MS Word's spellcheck is not possible since there is no clear way to programmatically feed all 5500+ lyric files to that engine for correction.
Solution 3. Assuming that one is operating on only misspelled words... an algorithm as follows:
1. If word is known, mark Known=TRUE
else, mark Known=FALSE
Sub-algo 1a. If the only difference between the suggested respelling and the word is case, accept the suggestion.
Sub-algo 2a. Identify all words with apostrophes after word.
Sub-algo 2b. Substitute the letter g for apostrophe
Sub-algo 3a. Identify all words with apostrophes before word.
Sub-algo 3b. Remove apostrophe
Sub-algo 4a. Identify all words that are strung togetherlikethis
Sub-algo 4b. Separate such words using the SplitWords() function
Sub-algo 5a. Identify all words with repeated letters. Perhaps ignore some repetitions such as 'pp'
Sub-algo 5b. Eliminate all repeated letters.
2. Validate output of Sub-algos with dictionary cross check.
if reduced word is a known word, consider said word correct
else, separate error to bin for further analysis.
3. Repeat 1-2 using a different sub-algo each pass until no word is marked unknown.
4. If word is found to be spelled correctly but is not in dictionary, add said word to main dictionary.
Note that spell checking a corpus thousands of words long in this way will always retain residual error, and that further reduction of this error requires significant expense (time/money). An automated abridged spell check of this nature is therefore useful considering its noise reduction per effort ratio.
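A few of the sub-algos above can be sketched in base R. This is only an illustration under stated assumptions: the function names, the toy dictionary and the sample words are all invented here, and a real run would cross-check against the aspell() dictionary instead.

```r
# Sub-algo 2: a trailing apostrophe usually stands for a dropped "g" (dreamin' -> dreaming).
fix_trailing_apostrophe <- function(w) sub("'$", "g", w)

# Sub-algo 3: drop a leading apostrophe ('cause -> cause).
fix_leading_apostrophe <- function(w) sub("^'", "", w)

# Sub-algo 5: collapse any run of a repeated letter to a single letter.
collapse_repeats <- function(w) gsub("([[:alpha:]])\\1+", "\\1", w, perl = TRUE)

# Step 2: accept a candidate only if the dictionary knows it; otherwise return NA
# so the word can go to the bin for further analysis.
correct_word <- function(word, dictionary) {
  w <- tolower(word)
  if (w %in% dictionary) return(w)
  candidates <- c(fix_trailing_apostrophe(w),
                  fix_leading_apostrophe(w),
                  collapse_repeats(w))
  hit <- candidates[candidates %in% dictionary]
  if (length(hit) > 0) hit[1] else NA_character_
}

dictionary <- c("ah", "dreaming", "cause", "yo")  # toy dictionary
misspelled <- c("aaaaaaaaaaaaah", "dreamin'", "'cause", "Yoooooooooo", "zzzxq")
corrected <- vapply(misspelled, correct_word, character(1),
                    dictionary = dictionary, USE.NAMES = FALSE)
print(data.frame(misspelled, corrected))
```

Collapsing every repeated letter also mangles legitimate doubles (happy becomes hapy), which is why the dictionary cross check in step 2 has to gate each suggestion.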
Don't lose sight of the fact that the ultimate desire is to obtain a function that corrects misspelled words in place.
-
173Will this paper 'Automatic Generation of English Respellings' help me answer the previous question?3/14/2015 9:44 PMUnfortunately, the Automatic Generation of English Respellings only reinforces my perception that correcting the MLL's misspellings will be an arduous task.
-
174What is imputation?3/15/2015 12:43 AMIn statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation"
This vocabulary question courtesy of 'An introduction to data cleaning with R' -
175Are there tools that provide better text processing and cleaning than [r]?3/15/2015 1:00 AM
R is designed for statistical analysis. Although it has many tools for analyzing text, perhaps better ones are available in Python.
-
176How to determine dictionaries available to aspell()?3/15/2015 5:17 PMAfter having added the Spanish dictionary from aspell.net, it would be nice to confirm that [r] detects it.
ANSWER:Couldn't find specific code to easily list that but used aspell("test bag of words.txt", control = c("--master=es")) and confirmed with output. -
177How are different dictionaries added to aspell() at runtime [r]?3/15/2015 5:56 PMUse the "-extra-dicts=SOME_LANGUAGE" control option.
aspell("test bag of words.txt", control = c("--master=en_US","-extra-dicts=es"))
-
178How to set aspell() to ignore case [r]?3/15/2015 5:58 PMUse the ignore-case control option.
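In aspell() the option travels through the control argument like the other flags in this notebook. A sketch, assuming the command-line spellings --ignore-case and --sug-mode from the Aspell manual (not verified on this Windows setup); the helper name aspell_control is invented:

```r
# Hypothetical helper: build the character vector handed to aspell(control = ...).
aspell_control <- function(master = "en_US", ignore_case = TRUE, sug_mode = NULL) {
  flags <- paste0("--master=", master)
  if (ignore_case) flags <- c(flags, "--ignore-case")
  if (!is.null(sug_mode)) flags <- c(flags, paste0("--sug-mode=", sug_mode))
  flags
}

ctrl <- aspell_control(ignore_case = TRUE, sug_mode = "ultra")
print(ctrl)

# Only call aspell() when the executable and the word list are actually there.
if (nzchar(Sys.which("aspell")) && file.exists("test bag of words.txt")) {
  a <- utils::aspell("test bag of words.txt", control = ctrl, program = "aspell")
  print(head(a))
}
```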
-
179Which aspell() suggestion mode is preferred?3/15/2015 6:02 PM
Four suggestion modes exist:
- ultra
- This method will use the fastest method available to come up with decent suggestions. This currently means that it will look for soundslikes within one edit distance apart without doing any typo analysis. It is slower than Ispell by a factor of 1.5 to 2 when a single word list is used. Its speed is only slightly affected by the size of the word list, if at all, but it is strongly affected by the number of word lists used. In this mode Aspell gets about 87% of the words from my small test kernel of misspelled words.
- fast
- This method is like ultra except that it also performs typo analysis unless it is turned off by setting the keyboard to none. The typo analysis brings words which are likely to be due to typos to the beginning of the list but slows things down by a factor of about two. This mode should get around the same number of words that the ultra method does.
- normal
- This method looks for soundslikes within two edit distances apart and performs typo analysis unless it is turned off. It is around 10 times slower than fast mode with the English word list but returns better suggestions. Its speed is directly proportional to the size of the word list. This mode gets 93% of the words.
- bad-spellers
- This method also looks for soundslikes within two edit distances apart but is more tailored for the bad speller, whereas fast or normal are more tailored to strike a good balance between typos and true misspellings. This mode never performs typo analysis and returns a huge number of words for the really bad spellers who can't seem to get the spelling anything close to what it should be. If the misspelled word looks anything like the correct spelling it is bound to be found somewhere on the list of 100 or more suggestions. This mode gets 98% of the words.
Answer: Use ultra for testing sub-algos and normal when creating/making the final word list corrections.
Use control option "sug-mode= "
Default mode currently unknown.
-
180aspell() doesn't seem to want to use multiple dictionaries at the same time, why?3/15/2015 6:21 PM
Unknown. I was under the impression that the following would work:
aspell("test bag of words.txt", control = c("--master=en_US","-extra-dicts=es"))
Perhaps it's actually another command. At any rate, the Spanish dictionary absolutely does work when it is the master.
-
181It appears there are extended functionality wrappers to aspell() - how do I get them?3/15/2015 7:29 PM
Several neat functions for enhancing aspell() are available from OmegaHat, though not in Windows flavor. Duncan Temple Lang invites anyone who would like the Windows binaries to mail him. I shall:
aspell() Windows Binary - A reply to your 10 year old invitation
Hello Dr. Temple Lang
We've never met and probably never will. I, however, have had the benefit of being able to use your documentation of aspell() and other help files to work my way through several coding uncertainties - thank you! I write to you now wondering specifically about your package aspell()'s functionality on Windows machines and whether procuring a Windows binary of the package at install.packages("Aspell", repos = "http://www.omegahat.org/R"), as you mentioned you may supply if one were to request it from you, is possible. I realize it's been a while since various portions of your aspell() documentation have been updated, and I understand if new life directions have led to its abandonment. I have been considering using Python to handle text preprocessing needs and, if you cannot provide a Windows binary of your aspell wrapper, would you be able to suggest an alternative?
Have a good one,
Andrew Agostini
UPDATES: None
-
182How to match substrings according to wild card [r]?3/15/2015 9:52 PM
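One way to do this in base R is to turn a shell-style wildcard into a regular expression with glob2rx() and match with grepl(). A minimal sketch; the file names are sample values in the style of the lyric files, not a real directory listing:

```r
files <- c("1958 6.txt", "1958 33.txt", "2014 90.txt", "notes.csv")  # sample names

# glob2rx() (utils) turns a shell-style wildcard into a regular expression.
pattern <- glob2rx("1958*.txt")
matches_1958 <- files[grepl(pattern, files)]
print(matches_1958)

# The same idea written directly as a regular expression.
txt_files <- grep("\\.txt$", files, value = TRUE)
print(txt_files)
```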
-
183Why doesn't correctionRules() work?3/16/2015 6:16 PMYou got to install it dummy:
install.packages("deducorrect")
library(deducorrect) -
184What is deducorrect()?3/16/2015 6:17 PMA collection of methods for automated data cleaning where all actions are logged. Think deductive correction, deductive imputation, and deterministic correction.
-
185Is the deducorrect workflow going to work well for mapping corrections to (many) misspelled words?3/16/2015 7:01 PM
In the few examples of it at work in An introduction to data cleaning with R, it is not apparent that it will readily remedy my problems.
Note: this is a powerful package to be used in the future.
-
186Should I begin correcting misspellings anew - Should I start from scratch?3/16/2015 7:05 PM
Yes. Several preprocessing steps I took in an earlier question, such as using Notepad++ to Find & Replace certain texts, are not reproducible for determining final word counts in the final analysis of the corrected 5500+ lyrics. Thus, everything I accomplished in Notepad++ will need to be emulated programmatically in R.
This is what I get for being eager... the lazy man works twice.
-
187What song lyrics need to be found?3/16/2015 7:11 PM
[1] "2014 80" "2014 81" "2014 82" "2014 83" "2014 84" "2014 85" "2014 86" "2014 87" "2014 88" "2014 89"
[11] "2014 90" "2014 91" "2014 92" "2014 93" "2014 94" "2014 95" "2014 96" "2014 97" "2014 98" "2014 99"
[21] "2014 100"
Awesome!! Apparently I didn't check to see if my scraper finished like it was supposed to.
UPDATE 1: Turns out I'm not as much of an idiot as I thought. The following are the YEAR/RANK names that are missing:
1969 90 Instrumental
1977 21 Instrumental
1977 67 Smoke from a Distant Fire Sanford-Townsend Band
1979 39 Heaven Knows Donna Summer and Brooklyn Dreams
1980 43 Instrumental
1981 90 You've Lost That Lovin' Feelin' Hall & Oates
1985 92 Born in the U.S.A. Bruce Springsteen
1986 13 Friends and Lovers Gloria Loring and Carl Anderson
1988 86 I Don't Want to Live Without You Foreigner
1995 72 Player's Anthem Junior M.A.F.I.A. featuring The Notorious B.I.G.
1995 96 I Miss You N II U
1995 98 Best Friend Brandy
1995 99 Misery Soul Asylum
2002 08 What's Luv? Fat Joe featuring Ashanti
2009 30 Don't Trust Me 3OH!3
2012 32 Young, Wild & Free Snoop Dogg and Wiz Khalifa featuring Bruno Mars
2012 47 Gangnam Style PSY
2012 80 So Good B.o.B
2013 53 Try Pink
2013 71 22 Taylor Swift
2014 37 Break Free Ariana Grande featuring Zedd
The following will be omitted from this study.
1969 90 Instrumental
1977 21 Instrumental
1980 43 Instrumental
2012 47 Gangnam Style PSY
UPDATE 2: -
188How to get name of every file in a folder?3/16/2015 7:17 PM
- Click Start, point to Programs, and then click MS-DOS Prompt (or Command Prompt in Windows NT).
- At a command prompt, locate the drive that contains the folder whose contents you want to list. For example, if you want to create a text file that contains a list of the contents of a folder on drive C, type the following command at a command prompt, and then press ENTER:
c:
- At a command prompt, locate the folder whose contents you want to list. For example, if you want to create a text file that contains a list of the contents in the Windows folder on drive C, type the following commands at a command prompt, and press ENTER after you type each command:
cd\
cd windows
- Type the following command at a command prompt, and then press ENTER, where filename is the name of the text file that you are creating:
dir > filename.txt
For example, if you want to create a file named Windowsfolderlist.txt, type the following command at a command prompt, and then press ENTER:
dir > windowsfolderlist.txt
NOTE: The text file that you create is located in the folder that you are in when you follow these steps. In the earlier example, the Windowsfolderlist.txt file is located in the Windows folder.
- Use a text editor, such as Notepad++, to view or print this file.
-
189How to get difference of two lists [r]?3/16/2015 7:44 PM
Say Lobs is a matrix of observed elements and Lexp is a matrix of expected elements. Say you would like to know which expected elements are missing from the observed matrix.
The difference between these two matrices is:
Lexp[!(Lobs %in% Lexp)]
UPDATE 1: The above is wrong. Instead, the answer is from this SO post:
Lexp[is.na(match(Lexp,Lobs))]
-
190How does sapply() work [r]?3/16/2015 8:32 PM
http://www.r-bloggers.com/using-apply-sapply-lapply-in-r/
NOTE: Check out rollapply() from the zoo package (which quantmod depends on) for future stock analysis work - it can be used to compare the current observation with the value, say 5 periods, before it (which is really useful for when I port my MATLAB code for this to [r]).
-
191Should I make a list of useful code snippets?3/16/2015 8:59 PM
#Find difference between two matrices named exp and obs
exp[is.na(match(exp,obs)),]
#Make files according to vector list
fileNames<-sapply(exp[is.na(match(exp,obs))], function(x) paste0(x,".txt"))
sapply(fileNames, function(x) write(" ",file=x))
#Scrape
library(RCurl)
library(XML)
options(RCurlOptions = list(proxy = "socks5://127.0.0.1:9150"))
my.handle <- getCurlHandle()
html <- getURL(url=master$Target[song], curl=my.handle, .opts = list(ssl.verifypeer = FALSE,followlocation=TRUE))
#Stage populating a folder with files (50 at a time)
setwd("~/School/Graduate School Admissions/Essay/Dynamic/DataWrangling/Lyrics/Missing")
count<-50
for (i in seq_along(fileNames)){
write(paste0(smallFiles$Title[i]," ",smallFiles$Artist[i]," Lyrics"),file=fileNames[i])
#Pause after every block of count files
if(i%%count==0){
cat ("Press [enter] to continue")
line <- readline()
}
}
#Replace occurrence of a vector of values with new ones in a body of text.
# Add attributes to specific XML/HTML nodes
library(XML)
kbbHTML <- readLines("http://www.kbb.com/used-cars/honda/accord/2014/private-party-value")
kbbInternalTree <- htmlTreeParse(kbbHTML,useInternalNodes=T)
specific.nodes <- getNodeSet(doc = kbbInternalTree, path ="//a[contains(@href,'/honda/accord/')]")
sapply(specific.nodes, function(x) xmlAttrs(x)<-c(Fig_Vodka="Don't mind if I do"))
-
192Is Player's Anthem by Junior M.A.F.I.A politically correct?3/16/2015 11:42 PMMaybe...Niggas uh, bitches ha uh(Niggas) Grab your dick if you love hip-hop(Bitches) Rub your titties if you love Big PoppaGotcha, open off the words I say because"This type of shit it happens everyday"Now who smoke more blunts than a little bit?What are you a idiot?Listen to the lyrics I spit like M1's
-
193Is it possible to collect click (event) data?3/17/2015 11:51 AM
-
194What should I look for in a good web hosting company?3/17/2015 11:32 AMasmallorange.com: My friends use them, they look affordable and sincere. This will be my host!
-
195arvixe seems too good to be true - is it?3/17/2015 11:34 AM
-
196How much does Google analytics cost?3/17/2015 12:04 PM
-
197How to get Google analytics for domains?3/17/2015 11:42 AMSign up for an account
- Find the tracking code snippet for your property. Sign in to your Google Analytics account, and select the Admin tab. ...
- Find your tracking code snippet. ...
- Copy the snippet. ...
- Paste your snippet (unaltered, in its entirety) into every web page you want to track. ...
- Check your setup.
-
198Should I use Google analytics?3/17/2015 2:51 PM
There are certainly reasons for and against. After considering the reasons listed on Tristan Denyer's blog, I will seek an alternative analytics platform (if any at all).
-
199Does my web hosting service need to have something special to use Google analytics?3/17/2015 11:52 AMNope. Just include the right HTML calls to files from Google Analytics.
-
200Does Google analytics slow down websites?3/17/2015 2:19 PM
Any additional code that must be run and then fetches data from anywhere will delay, in some fashion, the user experience. The delays caused by Google analytics are minimal and, if configured correctly, will not adversely affect site use even on its first visit. What's more, the analytics.js (or legacy ga.js) file is cached by your browser for 12 hours before a new one is downloaded, so analytics imposes minimal overhead on your network.
-
201What is a honeypot?3/17/2015 2:52 PMA Honey Pot is a system for identifying spammers and the spambots they use to scrape addresses from your website, via baiting or luring with a 'honeypot' of fake email addresses, so as to be able to block them from future scrape and SPAM ploys
Using the Project Honey Pot system you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email we not only can tell that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it. -
202Is there a way to remove any untranslated Unicode from a text file via [r]?3/17/2015 8:42 PM
Yes. Follow this StackOverflow post for how to remove unusual Unicode characters
UPDATE 1: After further investigation the above, although able to omit certain Unicode characters from the corpus object, doesn't provide a clear way to save these removals to text file. There is, however, a way to do this in Python.
UPDATE 2: gsub() in [r] works perfectly fine, i.e.
gsub("<U+2028>", " ", text)
-
203What is corpus() [r]?3/17/2015 8:49 PM
Corpus() (capital C) is a function from the tm (text mining) package. It can be used along with the inspect() function to quickly compile word statistics from a single document or from several documents.
Another awesome function from the tm package is findAssocs(), which can be used to find word correlations at or above a specified degree of association.
-
204What's one way Python can remove unwanted Unicode from text files?3/17/2015 9:30 PMThis.
-
205Remove "<U+2028>" from text using python?3/17/2015 10:56 PM
-
206How to suppress warning while using try() in [r]?3/18/2015 12:35 AM
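One approach is to wrap the try() call in suppressWarnings(), with silent = TRUE taking care of error messages. A minimal sketch; the risky() function is made up to have something that warns:

```r
# Made-up function that raises a warning and then carries on.
risky <- function(x) {
  warning("something looks off")
  as.numeric(x)
}

# try(silent = TRUE) hides error messages; suppressWarnings() hides warnings.
quiet_value <- suppressWarnings(try(risky("42"), silent = TRUE))
print(quiet_value)

# An error is still captured as a "try-error" object rather than stopping the script.
failed <- suppressWarnings(try(stop("boom"), silent = TRUE))
print(inherits(failed, "try-error"))
```

tryCatch() with a warning handler is an alternative when the warning itself needs to be inspected rather than discarded.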
-
207This is kind of late in the game but... did my scraping algo save empty lyric files?3/18/2015 1:02 AM
Unfortunately yes. The following files have three or fewer lines, indicating they may be empty. Some of these definitely are.
fileNames[which(master$LineCount<4),]
[1] "1958 6.txt" "1958 33.txt" "1958 34.txt" "1958 57.txt" "1958 61.txt" "1958 69.txt" "1958 71.txt" "1958 75.txt"
[9] "1958 79.txt" "1958 89.txt" "1958 94.txt" "1958 96.txt" "1958 97.txt" "1959 22.txt" "1959 36.txt" "1959 37.txt"
[17] "1959 52.txt" "1959 53.txt" "1959 58.txt" "1959 60.txt" "1959 66.txt" "1959 75.txt" "1959 81.txt" "1959 86.txt"
[25] "1959 95.txt" "1959 98.txt" "1960 21.txt" "1960 36.txt" "1960 53.txt" "1960 57.txt" "1960 81.txt" "1960 90.txt"
[33] "1961 11.txt" "1961 15.txt" "1961 17.txt" "1961 27.txt" "1961 29.txt" "1961 41.txt" "1961 44.txt" "1961 54.txt"
[41] "1961 55.txt" "1961 60.txt" "1961 72.txt" "1961 74.txt" "1962 5.txt" "1962 17.txt" "1962 25.txt" "1962 32.txt"
[49] "1962 44.txt" "1962 53.txt" "1962 59.txt" "1962 66.txt" "1962 83.txt" "1962 87.txt" "1962 88.txt" "1962 90.txt"
[57] "1962 92.txt" "1962 100.txt" "1963 1.txt" "1963 2.txt" "1963 5.txt" "1963 8.txt" "1963 22.txt" "1963 25.txt"
[65] "1963 28.txt" "1963 36.txt" "1963 48.txt" "1963 53.txt" "1963 58.txt" "1963 63.txt" "1963 90.txt" "1963 97.txt"
[73] "1964 5.txt" "1964 12.txt" "1964 34.txt" "1964 49.txt" "1964 60.txt" "1964 84.txt" "1964 86.txt" "1964 89.txt"
[81] "1964 90.txt" "1964 99.txt" "1965 1.txt" "1965 11.txt" "1965 15.txt" "1965 16.txt" "1965 18.txt" "1965 19.txt"
[89] "1965 39.txt" "1965 47.txt" "1965 49.txt" "1965 72.txt" "1965 75.txt" "1966 33.txt" "1966 55.txt" "1966 61.txt"
[97] "1966 66.txt" "1966 86.txt" "1966 98.txt" "1967 6.txt" "1967 21.txt" "1967 33.txt" "1967 72.txt" "1967 81.txt"
[105] "1967 83.txt" "1968 14.txt" "1968 21.txt" "1968 43.txt" "1968 52.txt" "1968 65.txt" "1969 15.txt" "1969 47.txt"
[113] "1969 49.txt" "1969 58.txt" "1969 63.txt" "1969 83.txt" "1969 95.txt" "1970 59.txt" "1970 61.txt" "1970 64.txt"
[121] "1970 92.txt" "1971 31.txt" "1971 56.txt" "1971 58.txt" "1971 64.txt" "1971 70.txt" "1971 94.txt" "1971 95.txt"
[129] "1971 99.txt" "1972 2.txt" "1972 17.txt" "1972 22.txt" "1972 28.txt" "1972 43.txt" "1972 51.txt" "1972 70.txt"
[137] "1972 76.txt" "1972 95.txt" "1973 16.txt" "1973 38.txt" "1973 42.txt" "1973 54.txt" "1973 61.txt" "1973 66.txt"
[145] "1973 72.txt" "1974 73.txt" "1974 75.txt" "1974 79.txt" "1974 94.txt" "1974 98.txt" "1974 99.txt" "1975 20.txt"
[153] "1975 45.txt" "1975 51.txt" "1975 58.txt" "1975 60.txt" "1975 66.txt" "1975 85.txt" "1975 88.txt" "1976 10.txt"
[161] "1976 11.txt" "1976 29.txt" "1976 48.txt" "1976 62.txt" "1976 68.txt" "1976 71.txt" "1976 81.txt" "1977 9.txt"
[169] "1977 23.txt" "1977 36.txt" "1977 71.txt" "1977 99.txt" "1978 56.txt" "1978 72.txt" "1978 76.txt" "1978 79.txt"
[177] "1979 15.txt" "1979 19.txt" "1979 41.txt" "1979 67.txt" "1979 80.txt" "1980 20.txt" "1980 88.txt" "1981 7.txt"
[185] "1981 16.txt" "1981 30.txt" "1981 43.txt" "1981 49.txt" "1981 54.txt" "1981 56.txt" "1981 58.txt" "1981 61.txt"
[193] "1981 63.txt" "1981 78.txt" "1981 93.txt" "1982 10.txt" "1982 15.txt" "1982 34.txt" "1982 44.txt" "1982 56.txt"
[201] "1982 66.txt" "1982 73.txt" "1982 81.txt" "1983 7.txt" "1983 28.txt" "1983 38.txt" "1983 39.txt" "1983 40.txt"
[209] "1983 55.txt" "1983 60.txt" "1983 67.txt" "1983 81.txt" "1983 90.txt" "1984 14.txt" "1984 22.txt" "1984 40.txt"
[217] "1984 45.txt" "1984 57.txt" "1984 67.txt" "1984 70.txt" "1984 98.txt" "1984 100.txt" "1985 6.txt" "1985 27.txt"
[225] "1985 37.txt" "1985 51.txt" "1985 59.txt" "1985 67.txt" "1985 75.txt" "1985 82.txt" "1985 83.txt" "1985 99.txt"
[233] "1986 4.txt" "1986 7.txt" "1986 75.txt" "1986 94.txt" "1987 50.txt" "1987 54.txt" "1987 55.txt" "1987 86.txt"
[241] "1988 19.txt" "1988 24.txt" "1988 30.txt" "1988 31.txt" "1988 42.txt" "1988 47.txt" "1988 48.txt" "1988 50.txt"
[249] "1988 72.txt" "1988 75.txt" "1989 1.txt" "1989 11.txt" "1989 47.txt" "1989 53.txt" "1989 54.txt" "1990 14.txt"
[257] "1990 64.txt" "1990 71.txt" "1991 68.txt" "1992 25.txt" "1992 46.txt" "1992 56.txt" "1992 80.txt" "1992 81.txt"
[265] "1992 92.txt" "1992 98.txt" "1993 18.txt" "1993 73.txt" "1993 100.txt" "1994 15.txt" "1994 25.txt" "1994 40.txt"
[273] "1994 54.txt" "1994 66.txt" "1995 4.txt" "1995 61.txt" "1995 66.txt" "1996 25.txt" "1996 65.txt" "1996 66.txt"
[281] "1996 75.txt" "1996 78.txt" "1996 100.txt" "1997 2.txt" "1997 8.txt" "1997 57.txt" "1997 58.txt" "1997 60.txt"
[289] "1997 72.txt" "1997 77.txt" "1997 79.txt" "1998 49.txt" "1998 67.txt" "1998 73.txt" "1998 90.txt" "1999 60.txt"
[297] "1999 81.txt" "1999 91.txt" "2000 26.txt" "2000 67.txt" "2001 33.txt" "2002 35.txt" "2002 83.txt" "2003 9.txt"
[305] "2003 96.txt" "2004 52.txt" "2004 64.txt" "2004 89.txt" "2005 20.txt" "2005 55.txt" "2005 95.txt" "2005 97.txt"
[313] "2005 100.txt" "2006 26.txt" "2006 82.txt" "2007 28.txt" "2008 45.txt" "2008 85.txt" "2008 97.txt" "2009 52.txt"
[321] "2009 98.txt" "2010 6.txt" "2010 11.txt" "2010 63.txt" "2010 65.txt" "2010 81.txt" "2011 55.txt" "2011 77.txt"
[329] "2011 84.txt" "2013 99.txt" "2014 15.txt" "2014 90.txt" -
208How able am I to provide lexical corrections to the lyric corpus?3/18/2015 10:49 AMError-Related Negativities During Spelling Judgments Expose Orthographic Knowledge suggests an answer requiring several vocabulary examinations. This paper may encroach on 'No Shit Sherlock' territory though I can't be certain without further investigation.
Even so, mechanistic word corrections (spell checkers) exist to supplement/enhance a human's spell checking ability, so I ought to proceed with the investigation without reservation about my own ineptitude.
As it pertains to effectiveness in spelling words correctly:
On my own I'm: Average
With a friend we're: Hopefully better than Average
With a machine I'm: Definitely better than with a standard friend
With friends and machines: An unstoppable slayer of poorly spelled words - I'm the Juggernaut Bitch!
-
209What is the etymology of the word 'Investigation'?3/18/2015 10:58 AMinvestigation (n.) early 15c., from Old French investigacion (14c.), from Latin investigationem (nominative investigatio) "a searching into, a searching for," noun of action from past participle stem of investigare "to trace out, search after," from in- "in, into" (see in- (2)) + vestigare "to track, trace," from vestigium "footprint, track" (see vestige).
-
210Is this saxophonist in a Cat or a Deere?3/18/2015 6:55 PM
Duane Eddy's brass is in a Deere for his song Forty Miles of Bad Road.
-
211How does one perform modulus in [r]?3/18/2015 8:03 PM
Use %%, i.e.
> 3 %% 6
[1] 3
> 12 %% 6
[1] 0 -
212How to cause [r] to wait for key press?3/18/2015 8:07 PM
-
213Are do, boo, daa, ray, me, fo... etc. considered words?3/18/2015 8:54 PM
Baby Talk by Jan & Dean uses daa's more as melody than words.
ANSWER: Not sure... need to resolve later
UPDATE 1:
-
214Was Stevie Wonder awesome?3/18/2015 9:49 PM
-
215Why do some lyrics have the term [Incomprehensible] in brackets?3/19/2015 7:58 PMIt's because the word at that moment is not understandable from the song.
-
216Would it be awesome to make a words per track duration investigation?3/19/2015 8:10 PM
Depends on who's asking... but YES! Consider for a moment the great length of the song Tubular Bells, noting its few words. Contrast that with Rap God and I think we may have the beginning of an interesting story to tell. Who knows... maybe not.
-
217What is a unique word - what are the rules on words?3/19/2015 8:34 PM
Any word or words that are spelled out are considered themselves...
M E T H O D O F L O V E = Method of love
Growling/slurring/stuttering a word does not add said word to the English dictionary
Bri-ri-ri-ri-right = Bright
Other examples...
OM M M M M M M M M M M M M M M M M My Lord = Oh My Lord
Shit is shit
sh*t = shit
Remove dashes
Y-M-C-A = YMCA
d'you = do you
(Mmmmmm-mmm) is not a word
beeotch = bitch
fellaz = fellas
Yes-sir-ee = yessiree
Some-a-times = some times
C'mon = come on
sumpin' = something
From Itsy Bitsy Teenie Weenie Yellow Polka Dot Bikini
[(badadup)] will not be considered words
bopopopopopopop
From Dang Me by Roger Miller
surple: the word syrup was intentionally said wrong to rhyme with the word purple
From Gloria Shadows of Knight
This YouTube music video is slightly different in lyrics than the ones available on the first page of Google results for the same
From People Got to Be Free The Rascals
Nat'ral = Natural, 'rrive = arrive,
humpty is a new word.
I Want You Back The Jackson 5
Gimme = Give me
ya is a word
Neither I nor the Internet can agree on what the line after "Day dreamin' and I'm thinkin' of you" actually says. The alternatives are as follows:
Look at my heart floating away
Look at my mind floating away
Look at my love floating away
Look at my love blowing away
Given the context of the song, I will assume the correct lyric is Look at my heart floating away
Tell me something good Rufus
'nuff = enough
Cut the Cake Average White Band
gimme means give me
menage-a-troi = Ménage à troisniggaz = niggersecstacy = ecstasykilla = killerwanna = want toMariot = Marriot
crunk is a word
mackin is a word
Return to Love Rollercoaster Ohio Players Lyrics 1976 301995 43
Waikikiiiiii = Waikiki
caine = cocaine
What does fessin' mean?
inbetweener is a word
Dat = That
Des = this
Da = the (but not always)
dey = they
Iz = is
beasting is a word
thang = thing
Sorta = sort of
-
218Would the 100th ranked song in 2005 sound better with gender reversals?3/19/2015 9:24 PMBoy, give me thatBoy, give me that
Boy, give me that penis
Boy, give me that
Boy, give me that
Boy, give me that penis
Boy, give me that
Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me that
You know you want it
Boy, don't act like you don't want it
Boy, I want it just as bad as you do
And look, see I can tell from this lil vibe
You got me feeling that you dig me
Boo, I'm digging you too
You wanna be one of the chosen few
Then gon jig up in this motherfucker
Maybe me and you can do it big up in this motherfucker
Sit you in a crib where you can chill
Don't have to move a muscle
Give you some be good now you be good
Mommy gon hustle
Come here, let me whisper in your ear
I gotta tell you something
Listening to this song kinda make a nigga want something
Did some daydreaming
Now I'm fiending like I'm on something
Boy, don't hold it from me
'Cuz right now I'll be don strong on ya
I ain't the type to ruin your life
By running game and throwing your dreams
Get in your brain, suit your game
Ease your pain and show you things
Sit you on some leather seats while blowing green
And switching lanes
Boy, stop playing, let me beat it out the frame
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
he five foot seven a hundred and thirty nine pounds
Thirty six, twenty four, thirty eight, pretty fine brown
Bad lil' bra, I ain't seen him in a minute
Since the All-Star game and I'm still tryin' to hit it
Got a baby for this nigga that I used to sell things
He caught a fed case and he ain't leave him no change
he sold all his jewels, he sold all his cars
Now he dancing in the shaker club stripping for the stars
Sliding down the pole slow, drop it to a split
penis popping on a handstand, man, he the shit
he still looking tight though, still built right though
Run my game right and after the club
he might go back to the telly with me
Shake his jelly with me
Let my people bust on his face and his belly with me
I got Lil Webbie with me and he ain't hating
We some players in this bitch, so baby, stop your hesitating
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Now I can tell from your size that that penis is fire
So I'm here and I'm willing to give you whatever it require
For you to lay down on your back and then open your thighs
Long sharp deep and wide, have you rolling your eyes
You a big fine horse, I had no choice but to try it
Look like it's worth a couple G's but ain't some shit I buy
Let me whisper in your ear again, I ain't gon' lie
I might share a lil meal just don't tell nobody
Look you know you want it
Boy, don't act like you don't want it
Boy, you want it just as bad as I do
But check this out
You gon' be wishing that you been gave me your money
By the time I finish rumbling with you
While rhymes get loose, let's take this shit to the room
And you just keep yourself excited til' we get to the room
'Cuz I've been wetted down since I met you
I'm ready to give you the blues
Don't stunt, now take off your shoes
Don't act confused, you know what time it is
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there
Boy, give me that penis
Boy, give me that
Boy, give me, Boy, give me that there -
219How to handle theme songs or songs with few words?3/19/2015 11:16 PMThe Mission Impossible theme song (ranked 66th in 1996) didn't become popular because of its words, nor did Tubular Bells (ranked 79th in 1974). Even so, these songs do have a few words to them. I propose labeling as instrumental any song whose ratio of word count to duration falls between zero and (??some future defined number??).
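A minimal sketch of that rule in JavaScript. The function name, the words-per-second unit, and the threshold used in the usage lines are all assumptions for illustration; the real cutoff is still undefined, so it is left as a required argument.

```javascript
// Hypothetical helper: flags a song as instrumental when its
// words-per-second ratio is at or below a caller-supplied threshold.
// No default threshold is provided because none has been chosen yet.
function isInstrumental(wordCount, durationSeconds, threshold) {
  if (!(durationSeconds > 0)) {
    throw new RangeError("durationSeconds must be a positive number");
  }
  if (!(threshold >= 0)) {
    throw new RangeError("threshold must be a non-negative number");
  }
  const wordsPerSecond = wordCount / durationSeconds;
  return wordsPerSecond <= threshold;
}

// Illustrative numbers only (not measured from the corpus).
const placeholderThreshold = 0.1; // assumed cutoff, in words per second
console.log(isInstrumental(3, 130, placeholderThreshold));   // true
console.log(isInstrumental(250, 210, placeholderThreshold)); // false
```

Keeping the threshold as a parameter means the label can be recomputed for the whole corpus once a cutoff is settled on.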
UPDATE 1: The remaining songs that fall in this category are:
Year Rank Title Artist
1958 8 Tequila The Champs
2014 71 Animals Martin Garrix -
220Do I need to package and present my website as if it were a finished product?3/20/2015 12:15 AMAlthough the presentation of my investigation should be elegant, I think it's fair to describe my pursuit as a work in progress that, although in flux, adequately introduces my work ethic, interests, and humor. Just make it clear that findings need to be refined (source information needs further pre-processing) and will continue to be refined until there is no doubt that a reasonable yet practical approximation of the historical Billboard Hot 100 corpus has been obtained.
-
221Should I back up this EverNote Notebook?3/20/2015 9:14 AMYes, yes, and yes.
UPDATE 1: Done
-
222How to add a carriage return to the end of every text file in a folder [r]?3/20/2015 9:35 AM
-
223What categories (for aid of computer processing) ought the Billboard corpus have?3/20/2015 11:17 AMDuplicate: The song appears on multiple Billboard year-end rankings.
Instrumental:
1. A song void of words spoken in some language.
2. A song which is obviously popular because of its instrumental score AND NOT the few words it may contain. Only a few songs fall in this category.
Incomprehensibly Foreign: I cannot understand the language, nor does its share of the corpus justify learning it. PSY's Gangnam Style, discussed here, falls in this category.
Languages I can navigate:
- English
- Spanish
- Similar Latin languages such as Italian and written French (context always helps)
I once had a ten minute conversation with an Italian couple in the Grand Canyon where I spoke Spanish, they spoke their native tongue, and the three of us understood each other impressively well. # A toast to awesome connections!
Regular: A song which is NOT mostly Instrumental or Incomprehensibly Foreign - a typical radio song. (Needs technical definition) -
224How many songs scraped contain lyric data that belong to another song?3/20/2015 1:28 PMThe answer is uncertain at the moment; however, I believe it is safe to say the number constitutes less than 1% of the corpus. Take for instance the lyrics scraped for San Antonio Rose by Floyd Cramer.
Dime San Antonio
Dime donde vas
Parado de manos
¿Dónde llegarás?
(llegarás, llegarás)
Dime San Antonio
Dime donde vas
Parado de manos
¿Dónde llegarás?
(llegarás, llegarás)
Manual inspection reveals that the above are, in fact, wrong and the correct words are sung here.
UPDATE 1: As it turns out, this particular version of San Antonio Rose doesn't have lyrics - it's Instrumental! The song has a long history of re-recordings that confused not only my web scraper but me. By golly, text mining is anything but straight and simple.
UPDATE 2: This error occurred in manual screening of 300 or so files, indicating that, at that rate, 5.3% of the corpus could be affected (300/5700*100=5.3%), which is quite a bit more than the 1% I was banking on earlier. I am forced to explore ways of cross checking lyric integrity after April 1st. This means that I must post a disclaimer on my findings indicating that whatever I share is an intermediate result not to be relied on fully. Since the larger goal of this project is to communicate my ability to conduct computer enhanced investigations I think, given time constraints and having provided adequate disclaimers, there is little harm in sharing this as a "Work in progress." Although it is certainly preferred to have a complete investigation to share, that day will just have to wait in full knowledge that "Good things take time." -
225What am I going to write about in my entrance essay?3/20/2015 2:25 PMDirective: Get from A to B
Genius
I'm not a genius
But those who are have also been called other things
Thomas Edison - The Wizard of Menlo Park
Einstein - ...
Artists.
Magicians.
Focus on the word Magician - one who performs magic (much like a wizard)
But why call a human being a wizard or magician?
These people did incredible things - inexplicable things.
- List of some awesome smart person stats
- Item 1
- Item 2
- ...
Further describe the art of "magic" and the implications it has on the people who witness it.
Describe the sense of wonder imparted
Describe the awe.... I'm not sure what else. It's all wonder
Any magician of note, disregarding people who are otherwise very intelligent, is memorable because they had or have the capacity to impress.
Return to I'm not a genius
I'm not a genius!
I do, though, recognize the power of the wonder of the works of these genius magicians.
A wonder much like that a four year old explores her world with
A wonder that excites and at times may downright frighten
But a wonder, none the less, that causes a day in the life of the wonderer/wanderer to be interesting, nay, exhilarating, nay - worth living.
When we cease to learn we ought as well be dead
Indeed many if not most successful things are those that accomplish some task while imparting a sense of wonder
B: I'm mesmerized by the seemingly inexhaustible capacity for computers and creative people to jointly cause unprecedented improvement in every facet of human experience. Like a kid, like a 4 year old boy, I want to keep on being amazed by the world and would like to have fun amazing the world too.
UPDATE 1:Following is an alternate possible title/hook... Clears throat for big speech by coughing...
I don't give a shit about Big Data
That is, in the sense that Facebook and Google monitor your every digital interaction in hopes of leveraging knowledge of routines to sell you more. That is, after all, what my friends think about when introduced to my plans to become a data scientist. Didn't you want to be an engineer?, they'll ask. It's the seventh time the stars of this conversation are aligning. Be it with friends or family, everyone I talk to seems to disconnect the notions between what a data scientist does (which to be fair, the name is dumb... all scientists work with data) and what an engineer accomplishes - as if somehow the two practices are mutually exclusive. Well versed, now, I reply
Any difficult problem which requires lots of number crunching is solved or guided to a solution with statistics. Data science is the process of bolstering statistical investigation with the number crunching power of computers. Big Data is useless without Big Computers and Big Brains to control them. The Big Data mining of personality, habits and general Internet usage by Big Companies for profit is a small, perhaps dark, segment of data science I couldn't care less for. On the other hand, there is an almost unlimited number of other applications for Big Data and the scientists who know how to work with it. Take for instance
...Talk about Johns Hopkins professor I met 5 months ago using DNA sequencing to suggest otherwise unknowably effective medications to cancer patients...
amassing genetic information that
...Talk about some archaeologists who were interested in charting migration patterns of Neanderthals vs early farmers leading to the depiction of the story of survival of those who could cultivate their own food...
Big Data is sooo much more than logging cat picture posts.
blablabla more words blablabla
-
226As it pertains to grep(), how does as.logical() transform inputs?3/21/2015 10:55 AM
> as.logical(1)
[1] TRUE
> as.logical(0)
[1] FALSE
> as.logical(4)
[1] TRUE
> as.logical(-1)
[1] TRUE
> as.logical(logical(0))
logical(0)
> grep("Read more:","Read more: lkdsjflkjds fsllj d")>0
[1] TRUE
> grep("Read more:","lkdsjflkjds fsllj d")>0
logical(0)
> as.logical(grep("Read more:","lkdsjflkjds fsllj d"))>0
logical(0)
> as.logical(grep("Read more:","lkdsjflkjds fsllj d")>0)
logical(0) -
227How to skip execution of the current iteration in a for loop [r]?3/21/2015 11:40 AMUse next, which jumps straight to the following iteration.
-
228What is logical(0), integer(0), numeric(0)...?3/21/2015 11:49 AMEach of these denotes a vector of the named type (logical, integer, numeric, etc.) that has zero length. It contains no elements.
-
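229How to handle integer(0) or any other data_type(0) in boolean expressions [r]?3/21/2015 11:55 AMAs suggested by this StackOverflow post, use the function any(). Because any() of a zero-length vector returns FALSE, wrapping the comparison, as in any(grep(pattern, x) > 0), yields a plain TRUE or FALSE that is safe to use in if().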
-
230Is there a way to open files in their default application from [r]?3/21/2015 12:13 PMNot that I can find. Will attempt this in AutoHotKey.
-
231How to quickly open a set of files in their native application using AHK?3/21/2015 12:13 PM
; AutoHotkey Version: 1.x
; Language: English
; Platform: Win9x/NT
; Author: A.N.Other <myemail@nowhere.com>
;
; Script Function:
; Template script (you can customize this template by editing "ShellNew\Template.ahk" in your Windows folder)
;
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
; Write to the array:
ArrayCount = 0
dir:="C:\Users\Artemis\Documents\School\Graduate School Admissions\Essay\Dynamic\DataWrangling\Lyrics"
file = %dir%\error_file.txt
Loop, Read, %file% ; This loop retrieves each line from the file, one at a time.
{
ArrayCount += 1 ; Keep track of how many items are in the array.
Array%ArrayCount% := A_LoopReadLine ; Store this line in the next array element.
}
; Read from the array:
Loop %ArrayCount%
{
; The following line uses the := operator to retrieve an array element:
element := Array%A_Index% ; A_Index is a built-in variable.
; Alternatively, you could use the "% " prefix to make MsgBox or some other command expression-capable:
;MsgBox % "Element number " . A_Index . " is " . Array%A_Index%
fileName := Array%A_Index%
RunWait, %fileName%, %dir%
} -
232How are AHK arrays formed?3/21/2015 12:14 PMThe following is a good example from AHK's help pages
; Write to the array:
ArrayCount = 0
Loop, Read, C:\Guest List.txt ; This loop retrieves each line from the file, one at a time.
{
ArrayCount += 1 ; Keep track of how many items are in the array.
Array%ArrayCount% := A_LoopReadLine ; Store this line in the next array element.
}
; Read from the array:
Loop %ArrayCount%
{
; The following line uses the := operator to retrieve an array element:
element := Array%A_Index% ; A_Index is a built-in variable.
; Alternatively, you could use the "% " prefix to make MsgBox or some other command expression-capable:
MsgBox % "Element number " . A_Index . " is " . Array%A_Index%
}
-
233What's the hotkey for find/replace in Notepad++?3/21/2015 1:27 PMCTRL-H
-
234Is it possible to get AHK to wait until I'm done editing one lyric file before opening another in queue?3/21/2015 8:56 PMYes! Adapted from Gogo, the following will wait until the save window appears before making a noise:
Loop
{
WinWaitActive Save ; makes the active window to be the Last Found
WinWaitNotActive ; waits until the active window changes
SoundBeep, 750, 50
}
By incorporating the above into previous code, one now has an easy way to edit multiple files in a queue while minimizing carpal tunnel! YaaaAY!
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
; Write to the array:
ArrayCount = 0
dir:="C:\Users\Artemis\Documents\School\Graduate School Admissions\Essay\Dynamic\DataWrangling\Lyrics"
file = %dir%\error_file.txt
Loop, Read, %file% ; This loop retrieves each line from the file, one at a time.
{
ArrayCount += 1 ; Keep track of how many items are in the array.
Array%ArrayCount% := A_LoopReadLine ; Store this line in the next array element.
}
; Read from the array:
Loop %ArrayCount%
{
; The following line uses the := operator to retrieve an array element:
element := Array%A_Index% ; A_Index is a built-in variable.
; Alternatively, you could use the "% " prefix to make MsgBox or some other command expression-capable:
;MsgBox % "Element number " . A_Index . " is " . Array%A_Index%
fileName := Array%A_Index%
RunWait, %fileName%, %dir%
WinWaitActive Save ; makes the active window to be the Last Found
WinWaitNotActive ; waits until the active window changes
} -
235How do you spell queu?3/21/2015 8:56 PM
-
236How to detect Window title changes (the by product of closing a document in notepad++) in AHK?3/21/2015 9:03 PMThis but most importantly this AHK forum post suggest that detecting the change of a particular window's title may not be needed, in my case, if I can detect some action that happens at the same time as the saving (done with editing) of an interim lyric file. In simpler terms, detecting the "save as" dialog window instead of window changes is solution enough for the question before last.
Even so, the following code makes a cute noise when switching windows (courtesy of Gogo).
Loop
{
WinWaitActive A ; makes the active window to be the Last Found
WinWaitNotActive ; waits until the active window changes
; ...........
SoundBeep, 750, 50
}
-
237Which songs require editing on account of having error keywords?3/21/2015 11:54 PM
1958 41.txt
1958 59.txt
1958 64.txt
1959 1.txt
1960 9.txt
1960 39.txt
1960 49.txt
1962 79.txt
1963 12.txt
1963 30.txt
1963 46.txt
1964 29.txt
1965 36.txt
1966 94.txt
1967 57.txt
1967 99.txt
1968 2.txt
1968 18.txt
1969 46.txt
1969 60.txt
1971 17.txt
1971 54.txt
1971 67.txt
1972 84.txt
1973 83.txt
1974 10.txt
1974 65.txt
1974 92.txt
1976 25.txt
1978 86.txt
1979 58.txt
1980 42.txt
1981 55.txt
1981 100.txt
1983 76.txt
1984 23.txt
1985 30.txt
1987 43.txt
1987 77.txt
1991 9.txt
1995 20.txt
1996 40.txt
1996 87.txt
1997 86.txt
1998 91.txt
2001 52.txt
2012 19.txt
2012 85.txt
2013 62.txt
2013 75.txt
1958 21.txt
1958 56.txt
1959 81.txt
1959 84.txt
1960 19.txt
1960 98.txt
1961 35.txt
1961 70.txt
1961 79.txt
1961 84.txt
1962 11.txt
1962 35.txt
1962 41.txt
1962 81.txt
1962 85.txt
1964 76.txt
1964 83.txt
1965 2.txt
1965 5.txt
1965 39.txt
1965 93.txt
1966 4.txt
1966 32.txt
1966 42.txt
1966 53.txt
1966 78.txt
1966 80.txt
1968 5.txt
1968 76.txt
1969 3.txt
1969 79.txt
1970 6.txt
1970 28.txt
1970 31.txt
1970 63.txt
1970 71.txt
1971 36.txt
1971 39.txt
1971 49.txt
1971 53.txt
1971 76.txt
1972 45.txt
1972 61.txt
1972 68.txt
1973 20.txt
1973 23.txt
1973 89.txt
1973 98.txt
1974 16.txt
1974 21.txt
1974 33.txt
1974 55.txt
1974 56.txt
1974 99.txt
1975 70.txt
1976 8.txt
1976 26.txt
1976 30.txt
1976 59.txt
1977 63.txt
1977 66.txt
1978 18.txt
1978 70.txt
1979 30.txt
1979 73.txt
1979 85.txt
1980 61.txt
1980 78.txt
1980 96.txt
1981 6.txt
1981 16.txt
1981 89.txt
1981 95.txt
1982 33.txt
1982 57.txt
1982 64.txt
1982 74.txt
1983 23.txt
1983 41.txt
1983 47.txt
1983 91.txt
1984 3.txt
1985 20.txt
1985 63.txt
1986 3.txt
1986 12.txt
1986 18.txt
1986 39.txt
1986 43.txt
1986 58.txt
1986 64.txt
1986 90.txt
1986 93.txt
1987 17.txt
1987 26.txt
1987 37.txt
1987 80.txt
1987 87.txt
1987 89.txt
1988 4.txt
1988 18.txt
1988 22.txt
1988 32.txt
1988 54.txt
1988 66.txt
1988 74.txt
1988 93.txt
1990 8.txt
1990 58.txt
1990 59.txt
1991 5.txt
1991 10.txt
1991 25.txt
1991 27.txt
1991 38.txt
1991 45.txt
1991 58.txt
1991 60.txt
1991 62.txt
1991 67.txt
1991 84.txt
1991 93.txt
1992 11.txt
1992 26.txt
1992 27.txt
1992 31.txt
1992 48.txt
1992 49.txt
1992 63.txt
1992 67.txt
1992 77.txt
1992 84.txt
1992 88.txt
1992 94.txt
1993 24.txt
1993 59.txt
1993 67.txt
1993 74.txt
1993 75.txt
1993 84.txt
1993 98.txt
1994 36.txt
1994 55.txt
1994 59.txt
1994 60.txt
1994 71.txt
1994 81.txt
1994 93.txt
1995 2.txt
1995 10.txt
1995 29.txt
1995 43.txt
1995 80.txt
1995 84.txt
1995 86.txt
1996 46.txt
1996 52.txt
1996 64.txt
1996 86.txt
1996 87.txt
1996 91.txt
1996 97.txt
1997 5.txt
1997 17.txt
1997 24.txt
1997 41.txt
1997 44.txt
1997 61.txt
1997 90.txt
1998 1.txt
1998 16.txt
1998 19.txt
1998 40.txt
1998 45.txt
1998 65.txt
1998 68.txt
1998 75.txt
1998 83.txt
1998 88.txt
1999 39.txt
1999 55.txt
1999 58.txt
1999 61.txt
1999 63.txt
1999 67.txt
1999 71.txt
1999 98.txt
2000 16.txt
2000 18.txt
2000 25.txt
2000 51.txt
2000 82.txt
2000 91.txt
2001 3.txt
2001 16.txt
2001 41.txt
2001 56.txt
2001 58.txt
2001 64.txt
2001 77.txt
2001 80.txt
2001 82.txt
2002 13.txt
2002 30.txt
2002 54.txt
2002 73.txt
2003 38.txt
2003 41.txt
2003 69.txt
2003 89.txt
2004 13.txt
2004 23.txt
2005 30.txt
2005 33.txt
2005 87.txt
2005 98.txt
2006 10.txt
2006 46.txt
2006 51.txt
2006 62.txt
2006 65.txt
2007 12.txt
2007 27.txt
2007 43.txt
2007 56.txt
2008 64.txt
2008 65.txt
2008 66.txt
2009 12.txt
2009 49.txt
2010 50.txt
2010 55.txt
2012 57.txt
2013 77.txt -
238How to create and publish graphs to plotly?3/22/2015 2:58 PMGet set up!
install.packages("devtools")
library("devtools")
install_github("ropensci/plotly")
library(plotly)
Authenticate
Plot something
library(plotly)
library(ggplot2) # provides the diamonds data set and qplot()
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
qplot(carat, price, data=dsamp, colour=clarity)
py <- plotly()
py$ggplotly()
Make graph and publish to JSON or JavaScript. Alternatively one could embed the code.
UPDATE 1:
Although it is awesome and powerful, I will not be using Plot.ly
-
239What is iFrame HTML code?3/22/2015 3:10 PMPlotly's fallback sharing mechanism is to use iFrames.
iFrames are good for several reasons:
1. They can visually show data from other domains without letting it stomp all over your page with unlimited access.
2. If one wants to show a PDF, an opened iframe can let the Adobe Reader plugin show that file.
3. They encapsulate all the data that one is trying to embed, meaning it can't interfere with other code.
There seem to be a lot of people who hate iFrames.
-
240Should I use iFrame or Javascript to share Plotly code?3/22/2015 3:16 PMJavaScript all the way. iFrame is at best a fallback for sharing content.
-
241How is JavaScript integrated into a web page?3/22/2015 4:14 PMIntegrate Javascript from an external source
Scripts can also be placed in external files.
External scripts are practical when the same code is used in many different web pages.
JavaScript files have the file extension .js.
To use an external script, put the name of the script file in the src (source) attribute of the <script> tag:
Example
<!DOCTYPE html><html><body><script src="myScript.js"></script></body></html> -
242What is Node.js?3/22/2015 4:42 PM
-
243How to embed Plotly graph using an iFrame?3/22/2015 4:57 PMAlthough iFrames are generally an unpreferred method... Embedding Interactive Graphs in Blogs and Websites
-
244How to embed a graph from Plotly without iFrames?3/22/2015 5:01 PMdon't care... am using dygraph instead of Plotly
-
245What am I not understanding about finding the maximum value in a dataset in [r]?3/22/2015 8:05 PMNote the following code:
> which.max(words)
[1] 4020
> max(words)
[1] "995"
> words[4020]
[1] "3195"
UPDATE 1: words is not numeric! max() compares character values as strings, so "995" sorts above "3195", while which.max() coerces them to numbers first.
> summary(words)
Length Class Mode
5431 character character
UPDATE 2: Unique count statistics have been thrown off by the following errors
> dmaster[4502,]
Year Rank Title Artist Duplicate Instrumental LineCount uCount
4666 2004 66 One Thing Finger Eleven FALSE FALSE 53 3
> length(unique(scan("2004 66.txt", character(0))))
Read 3 items
[1] 3
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string -
246How to avert Warning message: EOF within quoted string [r]?3/22/2015 9:29 PMAccording to this StackOverflow post, add quote = "" to the scan() call:
length(unique(scan("2004 66.txt", character(0), quote = "")))
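For comparison, the same unique-token count can be sketched in JavaScript, where a plain whitespace split never treats quote characters as delimiters. That is roughly the behaviour quote = "" asks of scan(). The function name and the sample string are hypothetical, and matching is case-sensitive.

```javascript
// Counts distinct whitespace-separated tokens. Quote characters are
// treated as ordinary characters, so an unmatched " or ' cannot swallow
// the rest of the text.
function uniqueTokenCount(text) {
  const tokens = text.split(/\s+/).filter((token) => token.length > 0);
  return new Set(tokens).size;
}

// Made-up fragment containing an unmatched double quote.
const sample = 'one thing "that i know one thing';
console.log(uniqueTokenCount(sample)); // 5
```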
-
247How does having a domain name through one company and hosting through another work?3/22/2015 11:16 PMAlthough I am not hosting through HostGator, they say: You can leave the domain registered where it is and change the name servers to point to the host. This would make it so that the host servers manage your DNS and you can make DNS changes directly from your hosting control panel.
A Small Orange has this to say: In order for people to find your website, you will need to configure DNS and set nameservers. You can either use our nameservers, or you can use third-party name servers. Either way, you would need to configure them so your domain works with your A Small Orange web hosting account.
In NameCheap:
- Need to remove registrar lock
- Need to point NameCheap to a custom DNS server
- Set DNS to
ns1.asoshared.com
ns2.asoshared.com
The above is a result of following the directions provided at A Small Orange but for the company I purchased my domain from (NameCheap):
- Click My Domains
- Click the View Details button directly adjacent to the domain in question
- Untick the Registrar Lock box so that the domain is temporarily unlocked and click Save
- In the nameserver fields, change them to your new host's nameservers
- Tick the Registrar Lock box again so that the domain is locked
- Save your changes
UPDATE 1: A domain name was purchased separately from a hosting company on the recommendation of a friend. In hindsight, being new to all this, it would have been easier to buy the domain name and hosting through one company. -
248What's a quick way to learn how to format my EverNotes for the visual style I'm trying to achieve?3/23/2015 5:37 PMFind a website that implements a similar visual style and adapt its source code.Muaaahaahaaa - I found a site that implements something close to what I want... Figures it'd be Google's.
-
249What is wordle?3/23/2015 5:39 PMWordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text.It's something I might want to use for this.
-
250How to make a collapsible FAQ in html?3/23/2015 5:51 PMCheck out this demo site! This tutorial by jasalguero seems a good place to start figuring this out.
-
251Has somebody already figured out how to turn exported EverNote HTML notebooks into collapsible lists?3/23/2015 5:53 PMIf so, Google doesn't think they belong on the first two pages of the above search query.
-
252What happens when one tries to expand huge lists like the one I will inevitably make by exporting this notebook?3/23/2015 6:07 PMTime will tell.
UPDATE 1: MaterializeCSS and the JavaScript behind it seem to be handling 400+ collapsible items without any page lag.
-
253What is jsFiddle?3/23/2015 6:23 PMA custom environment (based on user selections) to test (or fiddle with) your JavaScript, HTML, and CSS code right inside your browser. - techrepublicThe website:An example:
-
254If I copy and paste the source HTML from the following website, will the collapsible lists work?3/23/2015 6:45 PM
-
255How does Google make its fancy schmancy collapsible lists?3/23/2015 6:46 PMThey worked hard. Otherwise... Unknown.
Hunting for the above answer, though, did guide me to Narayan Prusty's site on the Materialize CSS Framework, which includes a drop down box behavior I really like, reminiscent of the interactivity of menus on a smartphone.
-
256What is CODEPEN?3/23/2015 7:21 PMIt's pretty much like jsFiddle and is awesome too.
-
257What is S.O.L.I.D. Object Oriented Design?3/23/2015 7:33 PMS.O.L.I.D is an acronym for the first five object-oriented design(OOD) principles by Robert C. Martin
- S – Single-responsibility principle
- O – Open-closed principle
- L – Liskov substitution principle
- I – Interface segregation principle
- D – Dependency Inversion Principle
-
258Do I just want to download a free themed HTML and modify it?3/23/2015 7:41 PMThese free HTML templates look awesome but I will make mine semi-from-scratch.
UPDATE 1: Turns out my entire website was built in a text editor. I can't really say a template was used.
-
259How are links within an EverNote note preserved when exported to HTML?3/23/2015 8:01 PMConsider the following code:
Every note has a name as in line 144.
Hyperlinks to other notes are simply href calls to that note's name as in line 154.
-
260How large [Mb] will this notebook be when exported to HTML?3/23/2015 8:06 PMThe size of the HTML (a couple hundred kB) + the size of all the images it references. The folder that holds these images, along with every other file associated with this HTML file, is exported as a 'hidden' folder in the same directory as the HTML file. At 266 notes, this folder currently holds ~1MB in files.
Note how EverNote exports check boxes below:
Answer: A couple megabytes.
UPDATE 1: I guess the folder wasn't hidden... I'm just blind. -
261How are images referenced in HTML exported EverNote notebooks?3/23/2015 8:12 PMConsider the following code (which is the HTML version of a previous note):
The HTML file references images stored locally in a folder called, per line 4437, Admissions Website_files (whose prefix is the name of this local notebook in EverNote).
All of these images will need to be packaged in a folder and sent to my server, and any references to them will need to be updated to work on that server.
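A minimal sketch of that last step in JavaScript, assuming the references are plain relative paths that begin with the export folder name. The function name, the destination path /img/notes, and the file name image001.png are hypothetical; only the folder name Admissions Website_files comes from the export.

```javascript
// Rewrites references that point at the local Evernote export folder so
// they point at a folder on the server instead. This is a blunt string
// replacement, so it would also change the folder name if it appeared in
// ordinary note text followed by a slash.
function rewriteImagePaths(html, localFolder, serverBase) {
  // The export may write the folder name raw or with spaces URL-encoded;
  // handling both is an assumption worth checking against the real file.
  const variants = new Set([localFolder, encodeURI(localFolder)]);
  let result = html;
  for (const variant of variants) {
    result = result.split(variant + "/").join(serverBase + "/");
  }
  return result;
}

const exported = '<img src="Admissions Website_files/image001.png"/>';
console.log(rewriteImagePaths(exported, "Admissions Website_files", "/img/notes"));
// <img src="/img/notes/image001.png"/>
```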
-
262What's the legacy reach of materializecss?3/23/2015 8:40 PMmaterializecss supported browsers:
Chrome 35+, Firefox 31+, Safari 7+, IE 10+
Most current browser versions:
Chrome 40, Firefox 31+, Safari...don't care
-
263Is Materializecss too new to use?3/23/2015 8:43 PMSince Chrome and Firefox are the most popular browsers, and both people and the browsers themselves are pretty good about updating, it should not be problematic for most of my intended audience... I hope.
-
264What is a good analogy that describes the interaction between HTML, CSS, and JavaScript?3/23/2015 9:15 PMIf web design is a puppet show, then HTML is the marionettes, CSS the clothes they are wearing and JavaScript the strings that bring them to life. And who is the Puppet Master you ask? You are.
-
265Can two or more different HTML files request the same resource (i.e. an image)?3/23/2015 9:19 PMYes, absolutely. A web server can serve the same file to many requests at once, and a browser that has already fetched the resource will usually reuse its cached copy.
-
266What is a .eot file?3/23/2015 9:45 PMEmbedded OpenType (EOT) fonts are a compact form of OpenType fonts designed by Microsoft for use as embedded fonts on web pages. These files use the extension " .eot ". They are supported only by Microsoft Internet Explorer, as opposed to competing WOFF files.
-
267What are WOFF files?3/23/2015 9:46 PMWOFF (the Web Open Font Format) is a web font format developed by Mozilla in concert with Type Supply, LettError, and other organizations. It uses a compressed version of the same table-based sfnt structure used by TrueType, OpenType, and Open Font Format, but adds metadata and private-use data structures, including predefined fields allowing foundries and vendors to provide license information if desired.
There are three main benefits to using WOFF:
- The font data is compressed, so sites using WOFF will use less bandwidth and will load faster than if they used equivalent uncompressed TrueType or OpenType files.
- Many font vendors that are unwilling to license their TrueType or OpenType format fonts for use on the web will license WOFF format fonts. This improves availability of fonts to site designers.
- Both proprietary and free-software browser vendors like the WOFF format, so it has the potential of becoming a truly universal, interoperable font format for the web, unlike other current font formats.
-
268How do I view the icons in a .woff file?3/23/2015 9:50 PMDon't really need to. To see what the Material-Design-Icons look like go to the github repository where Google has shared them with the world.
-
269What is the anatomy of a MaterializeCSS collapsible element?3/23/2015 9:59 PM
-
270What is <td> [HTML]?3/23/2015 10:09 PM
<table></table> Creates a table
<tr></tr> Sets off each row in a table
<td></td> Sets off each cell in a row
<th></th> Sets off the table header (a normal cell with bold, centered text) -
271What is a good HTML editor?3/23/2015 10:13 PM
-
272...or, are there any tips for coding HTML in Notepad++?3/23/2015 10:14 PMYes, this youtube video sets up Notepad++ nicely. Note that in new versions of Notepad++ one does not need to download TextFX Character for tag autocompletion as it's now included. Simply enable it in the preferences menu.
-
273Why doesn't XML tools' (in Notepad++) pretty print function work on EverNotes HTML export?3/23/2015 10:56 PMBecause EverNote doesn't nest all of its tags upon export - it just poops them one atop the other
-
274How to expand a collapsible item from link within another collapsible item [js]?3/23/2015 11:42 PMCheck out my SO post, How to expand a collapsible item from link within another collapsible item [js]?
Courtesy of skobaljic...
<!DOCTYPE html>
<html>
<head>
<!--Import materialize.css-->
<link type="text/css" rel="stylesheet" href="css/materialize.min.css" media="screen,projection"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"/>
</head>
<body>
<div>
<ul class="collapsible" data-collapsible="accordion">
<li>
<div class="collapsible-header"><i class="mdi-navigation-chevron-right"></i>First</div>
<div class="collapsible-body">
<p>Hello StackOverflow! SO's da' bomb diggidy!</p>
</div>
</li>
<li>
<div class="collapsible-header"><i class="mdi-navigation-chevron-right"></i>Second</div>
<div class="collapsible-body">
<p>Why is the person who invests your money called a broker?</p>
</div>
</li>
<li>
<div class="collapsible-header"><i class="mdi-navigation-chevron-right"></i>Third</div>
<div class="collapsible-body">
<p>I'd like to <a href="#" data-click=".collapsible .collapsible-header:first">open the First collapsible element</a> in this list.</p>
</div>
</li>
</ul>
</div>
<!--Import jQuery before materialize.js-->
<script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
<script type="text/javascript">
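// When a link carrying data-click is clicked, forward the click to the
// element matched by the selector stored in that attribute.
$('[data-click]').on('click', function (e) {
e.preventDefault(); // keep the href="#" link from jumping to the top of the page
$( $(this).data('click') ).trigger('click');
});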
</script>
<script type="text/javascript" src="js/materialize.min.js"></script>
</body>
</html> -
275How to indent in HTML?3/24/2015 8:04 PMUse <p> tags.
-
276How will I get help answering my SO post?3/24/2015 4:47 PMMake a Facebook post and promise alcohol.
$40 worth of Adult Beverages (or cash if preferred) to the first friend or friend of a friend who solves my JavaScript question before this 4 hour timer beeps: http://bit.ly/1BiUuc2
-
277How will I convert EverNote Notebooks into beautiful web pages?3/24/2015 8:31 PM
1. Focus on formatting/converting 1 note into its beautiful form.
2. Write script to convert every other note.
3. Integrate lessons learned later (i.e. how to link one note to another).
1. Convert one note
<li><div class="collapsible-header"><i class="mdi-navigation-chevron-right"></i>Note Title Goes Here</div><div class="collapsible-body"><p>Note Goes Here</p></div></li>
In EN's exported HTML:
- Every <h1> tag is a note title
- Every <h1> tag is preceded by a <a name="#SomeUniqueNumber"/> tag
- Note titles, note creation dates/tags, and note contents are node siblings, a.k.a. not nested within one another.
2. Script
1. Map all <a> tags to their respective <h1> tags and creation date labels in a hash table.
2. Identify all note divs.
3. Populate the note title in <div class="collapsible-header"><i class="mdi-navigation-chevron-right"></i>Note Title Goes Here</div> according to the hash table.
4.
-
278I need to get jQuery 1.11.0 to make my lists dynamically collapse. Where do I get that?3/24/2015 9:29 PM1.11.2 is available at https://jquery.com/download/
UPDATE 1:It doesn't look like I actually need to get jQuery. The code that initializes Materialize includes jQuery -
279How to load js in HTML?3/24/2015 9:41 PMPer The best way to load external JavaScript, it's as simple as:
<script type="text/javascript" src="http://your.cdn.com/first.js"></script>
Script tags should be placed at the bottom of the page, just before the closing </body> tag. There are many ways to optimize site load times; consult the above link and others for details.
Apparently it's also possible to inline JavaScript within HTML. -
280How to inline JavaScript in HTML?3/24/2015 10:12 PMAs it turns out, several lines of code I've been working with in HTML were JavaScript and I didn't even know it.
JavaScript within HTML is anything introduced by the tag
<script type="text/javascript">
Hence, the lines
<script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
<script type="text/javascript" src="js/materialize.min.js"></script>
which load JavaScript from external files, are actually JavaScript! This is specified by the type="text/javascript" attribute. It all clicked when reading How does inline Javascript (in HTML) work? -
281How to change EverNote's default secondary font?3/24/2015 10:24 PMCode copied into EverNote becomes really ugly (font changes to monospace). I don't think it's possible to configure the 'secondary' font.
-
282How does EverNote choose to convert fonts?3/24/2015 10:25 PMDoesn't matter. No time for these questions!
-
283Are there any keyboard shortcuts for changing font type in EverNote?3/24/2015 10:50 PMDoesn't look like it. Poop. Could always make an AHK script to do this...
UPDATE 1:Work here has been postponed indefinitely.
-
284What does <br/> mean [HTML]?3/24/2015 10:56 PMIt inserts a line break. The <br> tag is an empty tag, which means that it has no end tag.
-
285How to right justify in HTML?3/24/2015 11:12 PM<div align="right">This is some text!</div>
-
286How to pad alignments in HTML?3/24/2015 11:14 PMSimply putting whitespace either before or after text to be padded doesn't work (at least not literal whitespace in code). Instead, one could specify this and other styling properties in CSS. This demo answers the question.
-
287Am I going to make a sub-optimal website?3/24/2015 11:20 PMTime is money and money is time - I have neither and something's gotta give.
-
288What tags does EverNote use in its export?3/24/2015 11:25 PM
</div>
<hr>
<table>
<tr></tr> Sets off each row in a table
<td></td> Sets off each cell in a row
<a>
<br/>
<b>
<i>
<ul>
<li>
<pre>
<span>
<blockquote>
-
289How to position page elements like a pro?3/25/2015 6:20 PM
-
290How to center <divs>?3/25/2015 8:21 PM
-
291Why not just center the <body> tag?3/25/2015 8:33 PM
-
292Why are my EverNote bullets disappearing when I put them in Materialize!!??!!3/25/2015 9:25 PMObserve the highlighted portions of the following image: perhaps it's possible that the unordered list (<ul>) tags within original notes conflict with the collapsible element tags (which are also <ul>) expected by Materialize.
-
293Does the CSS property (if 'property' is the right word) margin pad the vertical space between a div's contents?3/25/2015 9:48 PMNot quite: margin adds space outside a div's border, while padding adds space between the border and the div's contents. The following puts a 10px margin on all four sides of a div container and pads its contents 50px from the left and right edges.
div.note {
margin: 10px;
padding-left: 50px;
padding-right: 50px;
} -
294Should I reformat certain notes in EverNote before publishing this to HTML?3/25/2015 9:53 PMMixing bold and regular font within a single hyperlink in EverNote seems to export with weird behavior. Note this screenshot of the exported HTML and the gaps between the hyperlink:
Consequently, I should remove bold formatting from hyperlink text. -
295How to make images resize to fit in div?3/25/2015 10:13 PMDo not apply an explicit width or height to image styling. Instead, give it:
img {
max-width: 100%;
max-height: 100%;
}
Example: http://jsfiddle.net/xwrvxser/1/
-
296How to customize Materialize card-panels?3/25/2015 10:27 PMBasic modifications can be experimented with here as a result of forking scotch.io's card-panel example. Beyond that one would need to read the materialize manual (if there is one - I would hope there is one) to get fancy.
-
297Are card-panels overkill for displaying a simple time stamp?3/25/2015 10:27 PMI think so. I will not pursue implementing a card-panel.
-
298What do the s and m parts in a card-panel instantiation mean?3/25/2015 10:30 PMThe following created a card panel:
<div class="row">
<div class="col s12 m5">
<div class="card-panel teal">
<span class="white-text">I am a very simple card. I am good at containing small bits of information. I am convenient because I require little markup to use effectively. I am similar to what is called a panel in other frameworks.</span>
</div>
</div>
</div>
Using this CODEPEN, one can deduce that the 'm' number configures the width of the card-panel. In Materialize's 12-column grid, 's' and 'm' are screen-size prefixes: s12 spans all 12 columns on small screens and m5 spans 5 of the 12 columns on medium and larger screens.
-
299Is there a MaterializeCSS manual?3/25/2015 10:41 PMNot that I can find. Poop! This is the best I can come up with: http://materializecss.com
-
300What is the MaterializeCSS color palette?3/25/2015 10:53 PM
-
301What's the modern way of centering text within a div (or paragraph)?3/25/2015 11:23 PM
-
302I've seen various height properties with units of 'em' instead of pixels - what does 'em' mean?3/25/2015 11:35 PMStackOverflow:
When used to specify font sizes, the em unit refers to the font size of the parent element. So, in the previous example, the font size of the H1 element is set to be twice the font size of the BODY element. To find what the font size of the H1 element will be, we need to know the font size of BODY. Because this isn't specified in the style sheet, the browser must find it from somewhere else – a good place to look is in the user's preferences. So, if the user sets the normal font size to 10 points, the size of the H1 element is 20 points. This makes document headlines stand out relative to the surrounding text. Therefore: Always use ems to set font sizes!...
On the other hand, one could set font size dynamically using Materialize's Flow Text Typography
-
303How to permanently set search time frame to 'Last Year' In Google?3/25/2015 11:42 PMWith a plethora of old web development tutorials online it is prudent to filter out the old ladies and only focus on the newest tips!
One cannot set a default Time Range in Google's settings, but a work-around is possible: create a bookmark which forces Google to return only articles that are a year old or younger: https://www.google.com/webhp?tbs=qdr:y
Instead of creating a bookmark I set Firefox to open https://www.google.com/webhp?tbs=qdr:y whenever a new tab is opened. Here's how to do that!
-
304How to make a new tab open to a particular web page in Firefox?3/25/2015 11:54 PMCourtesy of Curtis Parfitt-Ford Mozilla Support:
- In the Location bar, type about:config and press Enter. The "This might void your warranty!" warning page may appear.
- Click I'll be careful, I promise! to continue to the about:config page.
- Type browser.newtab.url in the search box to find it on the list.
- Double-click on browser.newtab.url and type in the URL (https://www.google.com/webhp?tbs=qdr:y) you want to open in a new tab
Restart Firefox and open a new tab to make sure the change was made.
UPDATE 1:This is nice because one needn't log in to Google to use it. -
305What is the <code> tag (It looks like it can do cool things)?3/26/2015 12:19 AMIt marks the text inside it as computer code; browsers display that text in a monospace font by default.
-
306How to quickly make a website in Adobe?3/26/2015 6:31 PMThese aren't the best answers...
This seems like an outdated video but manages to be useful for becoming more familiar with Dreamweaver.
TechnicalCafe shows how to make a multi-page website using Notepad++.
-
307Should I export a new test EverNote notebook to run my [r] script on?3/26/2015 7:41 PMYes, start with a fresh copy. This question took about 30x longer to write than it did to answer.Please Note:The goal here is to show process... however complicated or simple it may be.
-
308How to read XML from file in [r]?3/26/2015 7:50 PMPer A Short Introduction to the XML package for R (also by Duncan Temple Lang - this guy's a programming beast)...
To parse an XML document, you can use xmlInternalTreeParse() or xmlTreeParse() (with useInternalNodes specified as TRUE or FALSE) or xmlEventParse(). If you are dealing with HTML content which is frequently malformed (i.e. nodes not terminated, attributes not quoted, etc.), you can use htmlTreeParse(). You can give these functions the name of a file, a URL (HTTP or FTP) or XML text that you have previously created or read from a file.
Per Tobi Bosede:
xmlfile=xmlParse("fileName.xml")
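A minimal sketch of the parse-then-inspect workflow described above, assuming the XML package is installed. The inline document and its element names (notes, note) are made up for illustration; passing a file name and dropping asText = TRUE should work the same way.

```r
library(XML)

# Hypothetical inline document; swap in a file name to read from disk.
xml_text <- "<notes><note id='1'>first</note><note id='2'>second</note></notes>"

# asText = TRUE tells xmlParse() the argument is XML content, not a path.
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

root_name   <- xmlName(root)   # name of the top-level element
child_count <- xmlSize(root)   # number of child nodes under the root
note_text   <- xpathSApply(doc, "//note", xmlValue)  # text of each <note>
```

Here xmlSize(root) answers the "how many children" question directly, without going through xmlChildren().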
-
309How to determine how many children a node has in [r]?3/26/2015 7:54 PMUse either of the following:
length(xmlChildren(node))
xmlSize(node) -
310What's the difference between xmlInternalTreeParse(), xmlTreeParse() and xmlEventParse() [r]?3/26/2015 8:03 PMNot 100% sureUPDATE 1:Solved my problem and don't need to care.
-
311How to read HTML from file in [r]?3/26/2015 8:11 PM
rawHTML <- paste(readLines("path/to/file.html"), collapse="\n")
-
312Can you create HTML in [r] in a way that's not simply writing text in a loop?3/26/2015 8:23 PMSeems possible. Take a look at the bottom of http://www.omegahat.org/RSXML/shortIntro.pdf
-
313Should I read this article on Functional Programming in [r] later?3/26/2015 8:31 PMHells yes! Check out its outline
Motivation motivates functional programming using a common problem: cleaning and summarising data before serious analysis.
Anonymous functions shows you a side of functions that you might not have known about: you can use functions without giving them a name.
Closures introduces the closure, a function written by another function. A closure can access its own arguments, and variables defined in its parent.
Lists of functions shows how to put functions in a list, and explains why you might care.
Numerical integration concludes the chapter with a case study that uses anonymous functions, closures and lists of functions to build a flexible toolkit for numerical integration.
-
314How to get parent nodes in [r]?3/26/2015 8:37 PM
-
315How to get root nodes in [r]?3/26/2015 8:37 PMUse xpathApply() or xmlRoot() {XML}
-
316How to get a XML node of interest in [r]?3/26/2015 8:47 PMRefer to the Tree/DOM-based parsing section of Duncan's short XML Intro
XPath is an XML technology that provides a language for accessing subsets of an XML tree. It allows us to express things such as "find me all nodes named a" or "find me all nodes name a that have no attribute named b" or "nodes a that have an attribute b equal to 'bob'" or "find me all nodes a which have c as an ancestor node". It has a similar feeling to R's subsetting capabilities and works for trees rather than vectors and data frames. It is also very powerful and efficient. But it takes a little time to learn. Some decent tutorials are available on the Web (e.g. Zvon and w3schools) and there are books that cover this subject, e.g. [XML in a Nutshell], [XPathXPointer].
The XPath functions in the XML package are getNodeSet() and xpathApply(). Basically, you specify the document returned from xmlInternalTreeParse() and the XPath expression to identify the nodes. getNodeSet() returns a list of the matching nodes. xpathApply() is used to apply a function to each of those nodes, e.g. find nodes named "a" anywhere in the tree that have an "href" attribute and get the value of that attribute:
src = xpathApply(doc, "//a[@href]", xmlGetAttr, "href")
Of course, once we have the nodes of interest, we need to be able to extract their information. There are several functions to do this: xmlName() , xmlAttrs() , xmlGetAttr() , xmlChildren() and xmlValue() . xmlName() gets the name of the node/element. xmlAttrs() returns all the attribute name-value pairs as a character vector while xmlGetAttr() is used to query the value of a single attribute with facilities for providing a default value if it is not present and converting it if it is. We tend to use xmlGetAttr() as we typically know which attributes we are looking for. xmlAttrs() is used when doing general/meta- computations.
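A small sketch of getNodeSet() versus the xpathApply() family, assuming the XML package is installed. The HTML fragment is a made-up stand-in for a real page, and xpathSApply() is used here only because it simplifies the result to a vector.

```r
library(XML)

# Hypothetical HTML fragment: two links and one named anchor.
html_text <- paste0(
  "<html><body>",
  "<a href='#top'>Top</a>",
  "<a name='anchor1'></a>",
  "<a href='http://example.com'>Example</a>",
  "</body></html>"
)
doc <- htmlParse(html_text, asText = TRUE)

# getNodeSet() returns the matching nodes themselves.
link_nodes <- getNodeSet(doc, "//a[@href]")

# xpathSApply() applies a function to each match and simplifies the result.
hrefs  <- xpathSApply(doc, "//a[@href]", xmlGetAttr, "href")
labels <- xpathSApply(doc, "//a[@href]", xmlValue)
```

The named anchor is skipped because the [@href] predicate only keeps nodes that carry an href attribute.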
-
317How to parse note titles from EverNote HTML export in [r]?3/26/2015 9:27 PM#Get Note Titles
html.titles<-xpathApply(doc, "//h1", xmlValue) -
318How to parse note title anchors from EverNote HTML export in [r]?3/26/2015 9:33 PM#Get Note Title Anchors
html.tAnchors<-xpathApply(doc, "//a[@name]", xmlGetAttr, "name") -
319How to parse note creation date from EverNote HTML export in [r]?3/26/2015 9:34 PM#Get Note Creation Date (Has text in it)
html.aDates<-xpathApply(doc, "//table[@bgcolor]", xmlValue)
UPDATE 1:Get only date information with (this helps):
html.aDates<-xpathApply(doc, "//table[@bgcolor]/tr/td/i", xmlValue) -
320I assumed note titles, creation dates and anchors were associated 1 to 1 - why do I have an extra anchor?3/26/2015 9:37 PMObserve this variable snapshot from RStudio:
An additional (rogue?) anchor appears to have been introduced to my parsed list. Either that or a note title AND date are somehow getting overlooked by xpathApply(). The mismatch might have been caused by a note that was once dubbed "Conflicting" by EverNote as a result of me "unsharing" it (clicking the unshare button in EverNote) after it was accidentally shared. At any rate, smaller files don't have the problem.
UPDATE 1:Much to my increased confusion, Notepad++ reports 319 hits of the anchor tag... not 320.
UPDATE 2:The difference between the RStudio and Notepad++ lists of "<a name=" occurrences is
diff<-anchs[is.na(match(anchs,truth)),]
> diff
[[1]]
[1] "tex2html11"
In my HTML this means that note 1102 is to blame. Thus it is in a sick twist of fate that the same guy who made the website that I copied the 'error causing text' from is the guy who wrote the [r] package I used to have the luxury of having an error in the first place... -
321Can I compare the Notepad++ anchor hits to those parsed in [r]?3/26/2015 10:07 PMYeah Buddy!
UPDATE 1:This was overkill... I should've made a manual sanity check through the 300 or so element list of anchors in [r] rather than assuming I wouldn't be able to find the errant anchor on account of it being some number indistinguishable from the others. Turns out it was a simple textual item.
-
322I think I've confused my Set Operation nomenclature - what's the truth!?3/26/2015 10:15 PMThis MATLAB document helps clear the water...
It's official - What I've been calling 'Difference' is actually 'Intersection.' NOTE: probably every previous use of the word "Difference" within a 'Set' context is incorrect. Well I feel like a genius.
UPDATE 1:I think I'm confused about being confused... I have several examples of me knowing what I'm doing... here's 1... here's another... -
323Should I improve my method for parsing note title anchors from EverNote HTML export in [r]?3/26/2015 10:56 PMYes. But not this week.
UPDATE 1:Done!
html.titles<-xpathApply(enHTML, "//h1", xmlValue)
-
324Is this Sax/Drum duet by Dave Matthews Band Freakin' Awesome?3/26/2015 11:06 PM
-
325How to get all parent XML div tags [r]?3/26/2015 11:29 PMThe question above is poorly constructed - it does not capture my intent. Instead I am concerned with obtaining the branch structures of all outer most divs within an XML file.
-
326How to parse HTML/XML tags according to NOT conditions in [r]?3/26/2015 11:57 PMHere's how to implement a not condition:
tst<-xpathApply(doc, "//div[not(table[@bgcolor='#D4DDE5'])]", xmlValue)
Check out my SO post. -
327How can I generate awareness of my SO post?3/27/2015 3:18 AMEmail the Chicago R Users group!
Have a bodacious day!
Good day CRUGers!
I have posted a StackOverflow question that could use the wisdom of anyone familiar with XML parsing. The question comes as a byproduct of the need to reformat EverNote output such that it can retain its elegance (or any custom style for that matter) when moved out of the client and into a website. If you're interested, have a spare moment or need something to help you procrastinate, help remove the entropy in my life and head on over to How to parse HTML/XML tags according to NOT conditions in [r].
-
328How to parse intra-note links from EverNote HTML export in [r]?3/27/2015 10:41 AMhtml.iLinks<-xpathApply(doc, "//a[starts-with(@href, '#')]", xmlGetAttr, "href")
-
329What do I need to do to get this website up in 3 days?3/27/2015 11:56 AM
- Finish EverNote converter
- Format 1 page fully - make a template
- Find two MaterializeCSS example sites that I'd like to replicate elements from and make it my own.
- Invest minimal time in styling... Can tweak this later
- Clone and tweak other pages from template
- Make EverNote usage dygraph
- Embed EverNote usage dygraph in HTML
- Refine BillBoard Analysis and export to dygraph
- Embed BillBoard Analysis dygraph in HTML
- Link all pages together by navigation menu
- Push files to server and make sure they work
-
330How to get the branch structure of all outer most note HTML divs in [r]?3/27/2015 1:36 PM
Note the difference in how HTML is interpreted by [r]
UPDATE 1:The XML package can either parse the document into a tree structure of R objects (as with using htmlTreeParse) or into a tree structure of pointers to C-level objects. In the latter case, the parsed structure is maintained as lower-level objects in memory, and is not immediately accessible in R. Indeed, incorrectly accessing the parsed document object can cause R to crash. However, parsing the document into this C-level structure internal to libxml2 permits the use of XPath expressions. For more, do help("xmlParse").
Check out my StackOverflow post linked at this note.
//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]
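To see what that XPath expression selects, here is a rough sketch assuming the XML package is installed. The HTML is a made-up stand-in for an EverNote export (one div holding the date table, one note div with a nested div); the real export's structure may differ.

```r
library(XML)

# Hypothetical stand-in for an export: a date-table div, then a note div
# that itself contains a nested div.
html_text <- paste0(
  "<html><body>",
  "<div><table bgcolor='#D4DDE5'><tr><td><i>3/27/2015</i></td></tr></table></div>",
  "<div>Note body<div>nested block</div></div>",
  "</body></html>"
)
doc <- htmlParse(html_text, asText = TRUE)

# Keep divs that do not hold the date table and are not inside another div.
outer_notes <- getNodeSet(
  doc,
  "//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]"
)

# For comparison: every div in the document, nested or not.
all_divs <- getNodeSet(doc, "//div")
```

Of the three divs, only the outermost note div survives both not() conditions; the nested div is still reachable as a child of that node.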
-
331How to get every row whose name matches div in [r]3/27/2015 2:06 PM
which(row.names(s) == "div") -
332Make xml file in [r]?3/27/2015 2:59 PM
-
333How does newXMLNode() in [r] work?3/27/2015 3:05 PMhttp://www.inside-r.org/packages/cran/XML/docs/newXMLDoc
-
334What is the worst type of failure in [Life]?3/27/2015 4:23 PMAndrew! Why are you asking this right now!?
-
335How to subdivide XMLNode into constituent nodes [r]?3/27/2015 6:26 PMProblem solved and this not answered.
-
336What types of [r] XML elements are there?3/27/2015 6:29 PM
enRoot=XMLNode
cbdiv=XMLInternalElementNode
enHTML=HTMLInternalDocument
enTree=XMLDocumentContent
enXML=XMLInternalElementNode
-
337How to add an XMLNode to an XMLInternalElementNode [r]?3/27/2015 6:32 PMAfter plenty of head banging it is apparent that one cannot add XMLNodes (and Branches for that matter) to XMLInternalElementNode without brute, explicit mapping of every sub-node and leaf of an XMLNode to an XMLInternalElementNode (or tree). Refer to much later note
-
338What is saveXML() [r]?3/27/2015 6:34 PMMethods for writing the representation of an XML tree to a string or file. Originally this was intended to be used only for DOMs (Document Object Models) stored in internal memory created via xmlTree, but methods for XMLNode, XMLInternalNode and XMLOutputStream objects (and others) allow it to be generic for different representations of the XML tree.
saveXML(doc, file=NULL, compression=0, indent=TRUE, prefix = '<?xml version="1.0"?>\n', doctype = NULL, encoding = "", ...)
-
339How to add branch to XML node [r]?3/27/2015 6:39 PMRead earlier note
-
340Are Susan Cain's "High Reactives" statistically likely to be diagnosed ADD?3/27/2015 6:43 PMSmall noises distract me. Eliminating audible distractions by using earplugs helps to boost my focus.
-
341How to index XML nodes in [r]?3/27/2015 6:52 PMThe following are two very good articles for learning the basics of indexing XML in [r].
Indexing can be accomplished in several ways:
xmltop[[1]][[1]][[5]][[2]] #title of first article
xmltop[['PubmedArticle']][['MedlineCitation']][['Article']][['ArticleTitle']]
UPDATE 1: One must also read XPath Syntax. -
342What is xmlDOMApply {XML}[r]?3/27/2015 9:17 PM
This recursively applies the specified function to each node in an XML tree, creating a new tree, parallel to the original input tree. Each element in the new tree is the return value obtained from invoking the specified function on the corresponding element of the original tree. The order in which the function is recursively applied is "bottom-up". In other words, function is first applied to each of the children nodes first and then to the parent node containing the newly computed results for the children.
-
343What is a XML Namespace?3/27/2015 9:50 PMXML Namespace is a mechanism to avoid name conflicts by differentiating elements or attributes within an XML document that may have identical names, but different definitions. We will be covering the basics of namespace, including declaration methods, scope, attribute namespace, and default namespace
-
344What is a URI?3/27/2015 10:01 PMA uniform resource identifier (URI) is a string of characters used to identify a resource. In XML, a URI serves as the name of a namespace.
-
345What is cat() [r]?3/27/2015 10:16 PMConcatenate and Print - Outputs the objects, concatenating the representations. cat performs much less conversion than print.
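A quick base-R sketch of the "much less conversion than print" point. capture.output() is only there so the two outputs can be compared as strings; the vector x is made up.

```r
x <- c("a", "b", "c")

# print() shows R's representation: an index prefix and quoted strings.
printed <- capture.output(print(x))

# cat() writes the bare values separated by `sep`, with no quotes or index.
catted <- capture.output(cat(x, sep = ", "))
```

This is why cat(saveXML(node), "\n") is the usual way to eyeball serialized XML: the markup comes out without quoting or escaped characters.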
-
346Can I use addChildren() to add an XMLNodeSet to an XMLInternalNode[r]?3/27/2015 10:27 PMNo, because the 'kids' argument needs to be a list of children nodes which should be of the same "type" (i.e. internal or R-level nodes) as the node argument. However, they can also be regular strings in which case they are converted to XML text nodes.
addChildren(node, ..., kids = list(...), at = NA, cdata = FALSE, append = TRUE)
removeChildren(node, ..., kids = list(...), free = FALSE)
removeNodes(node, free = rep(FALSE, length(node)))
replaceNodes(oldNode, newNode, ...)
addAttributes(node, ..., .attrs = NULL, suppressNamespaceWarning = getOption("suppressXMLNamespaceWarning", FALSE), append = TRUE)
removeAttributes(node, ..., .attrs = NULL, .namespace = FALSE, .all = (length(list(...)) + length(.attrs)) == 0) -
347Can I convert my XMLNodes to string and then add them to a XMLInternalNode[r]?3/27/2015 10:31 PMIf it is possible, it is not by using paste(), print() or cat().
-
348How to convert XMLNodes to XMLInternalNode() [r]?3/27/2015 11:05 PMThere do not appear to be conversion functions in the base XML package to accomplish this. One needs to understand that the XMLNode type and the XMLInternalNode type are two fundamentally different representations of an XML tree. The first is a list of lists of lists of lists... as deep as an XML branch is long. XMLInternalNodes, on the other hand, are C representations of the same data. XMLNodes could be converted to the other, but that would require traversing and mapping the entire list of lists into an XMLInternalNode. Otherwise, the two are about as compatible as tigers and gorillas.
-
349What are CDATA nodes?3/27/2015 11:08 PMCDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>". -
350What is I() in [r]?3/27/2015 11:15 PMI() (class "AsIs") - Change the class of an object to indicate that it should be treated ‘as is’.
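A short base-R sketch of what "as is" means in practice. The variable names and values are made up; the data.frame() case is just one common place where I() changes behaviour.

```r
# Wrapping an object in I() gives it the class "AsIs".
ids <- I(c("a", "b"))
id_class <- class(ids)

# data.frame() keeps an AsIs list as a single list column instead of
# trying to split it into one column per element.
df <- data.frame(id = 1:2, tags = I(list("x", c("y", "z"))))
```

Without the I() wrapper, data.frame() would try to expand the list into separate columns and fail here because the elements have different lengths.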
-
351Can I use xmlTreeParse() to turn my EverNote HTML into xmlInternalNodes?3/27/2015 11:46 PMMaybe.
-
352Are there such things as the HTMLNode data type in the XML package [r]?3/27/2015 11:51 PMNo.
-
353What is the XML_PARSE_HUGE option in [r] and how do I define it?3/27/2015 11:54 PMhttp://stackoverflow.com/questions/17154308/parse-xml-files-1-megabyte-in-r
xmlTreeParse(i, useInternalNodes = TRUE, options = HUGE)
-
354Is there a more desirable format I should output my EverNote data in?3/28/2015 12:14 AMxml isn't an option for what I'm trying to do... nor is it available as an export option.
ANSWER: No.
-
355If I can't go from xmlNode to xmlInternalNode can I go from xmlInternalNode to xmlNode [r]?3/28/2015 2:08 AMIrrelevant. Source problem solved.
-
356Why does xmlNode convert < to &lt; in [r]?3/28/2015 2:20 AMSome characters, mainly <, >, & and ", are reserved as they are part of the base XML syntax. As such, they cannot appear literally in the content of the nodes, otherwise parsing would fail. So they are replaced by entities, which are a sort of coding of these characters. For example, "<" is coded as "&lt;", "&" is coded as "&amp;", etc.
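A small sketch of that escaping, assuming the XML package is installed. It uses newXMLNode() (an internal node) rather than xmlNode(), and the node name and text are made up.

```r
library(XML)

# Text passed to newXMLNode() is stored as plain text but is escaped
# whenever the node is serialized to markup.
node <- newXMLNode("p", "1 < 2 & 3")

serialized <- saveXML(node)   # markup form, reserved characters as entities
raw_text   <- xmlValue(node)  # text content, with the original characters
```

So the entities only exist in the serialized markup; xmlValue() hands back the original characters.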
-
357How can one work around text conversion that occurs when adding text to a xmlNode [r]?3/28/2015 2:39 AMAfter saving the xmlTree to file, one could replace every entity that is known to stand for a character that should be taken literally. This could only work, however, if one could avoid converting previously existing entity names. Avoidance would almost require converting preexisting entity names to an unlikely string before [r] manipulation... as in &lt; maps to kjjkdpoes. Then one would change the... It's more complicated than it needs to be and I've grown tired of documenting the possibility...
The better solution is to realize I already have [r] list access to every sliver of EverNote data I need. Rather than trying to create [r] xmlNodes, I should just manipulate such lists to grow the tree via no more than simple text concatenation... many of them.
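A rough base-R sketch of the text-concatenation approach. The titles and bodies are made-up placeholders (the real ones would come from the parsed export), and the markup is a trimmed version of the Materialize collapsible template.

```r
# Hypothetical note data; in practice these come from the parsed export.
titles <- c("First note", "Second note")
bodies <- c("<p>Hello</p>", "<p>World</p>")

# One <li> per note; sprintf() is vectorized over titles and bodies.
items <- sprintf(
  paste0(
    "<li><div class=\"collapsible-header\">%s</div>",
    "<div class=\"collapsible-body\">%s</div></li>"
  ),
  titles, bodies
)

# Wrap the items in the list container and join everything into one string.
html <- paste0(
  "<ul class=\"collapsible\" data-collapsible=\"accordion\">\n",
  paste(items, collapse = "\n"),
  "\n</ul>"
)
```

Because the pieces are plain strings, nothing gets re-escaped along the way; writeLines(html, "notes.html") would write the result out (the file name is just an example).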
-
358How to tell [r] everything I'm about to give it is a string?3/28/2015 3:22 AM
-
359How to see text formatting in notepad++?3/28/2015 3:42 AM
-
360How to create batch files in [r]?3/28/2015 3:57 AMTurns out [r] is not great (or known) for creating scripts. This is probably the closest thing I've seen to it being used to dynamically create scripts.
All I really want to do is dynamically generate lots of text that just so happens to be HTML.
-
361How to concatenate a vector of strings in R?3/28/2015 4:35 AM
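One common base-R answer, sketched with made-up strings: paste() with the collapse argument joins the elements of a single vector, whereas sep only separates different arguments.

```r
parts <- c("<li>", "Note Title", "</li>")

# collapse joins the elements of one vector into a single string.
joined <- paste(parts, collapse = "")

# sep only separates *different* arguments; the result stays a vector.
pairs <- paste(c("a", "b"), c("1", "2"), sep = "-")
```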
-
362How to return reserved entities in note titles [r]?3/28/2015 6:17 AMRecall how reserved characters are translated to entities in XML. Several of my note titles have reserved characters in them that need to be handled as entities.
The current form:
html.titles<-xpathApply(enHTML, "//h1", xmlValue)
is allowing reserved characters to be stored.
The function call:
html.titles<-xpathApply(enHTML, "//h1")
returns the title names with reserved characters translated to entities. It also, however, wraps the title in a heading tag that needs to be removed:
<h1>What is the &lt;code&gt; tag? It looks like it can do cool things.</h1>
-
363How to convert XMLNodeSet to list of characters?3/28/2015 6:42 AMIf html.titles is an XMLNodeSet, then:
sapply(html.titles, function(x) saveXML(x,indent=FALSE,prefix=""))
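Putting the last two notes together, here is a sketch (assuming the XML package is installed) that serializes title nodes to strings and then strips the wrapping heading tag. The one-title document is made up, and the gsub() pattern assumes plain <h1> tags with no attributes.

```r
library(XML)

# Hypothetical export with a single title containing reserved characters.
html_text <- "<html><body><h1>What is the &lt;code&gt; tag?</h1></body></html>"
doc <- htmlParse(html_text, asText = TRUE)

title_nodes <- getNodeSet(doc, "//h1")

# Serialize each node to a string; entities are kept in the output.
title_markup <- sapply(title_nodes, saveXML, indent = FALSE, prefix = "")

# Strip the surrounding <h1> tags, leaving the entity-encoded title text.
titles <- gsub("^\\s*<h1>|</h1>\\s*$", "", title_markup)
```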
-
364What is this and why is it in my autogen HTML?3/28/2015 6:55 AM
They always precede a tag.
UPDATE 1:Not sure why they appeared in autogen (converted) EverNote HTML. -
365Why am I getting question mark boxes in my autogen HTML?3/28/2015 7:07 AM
-
366How does spending 3 days on something I wanted done in 6 hours make me feel?3/28/2015 4:02 PM
-
367What is htmlTreeParse [r]?3/28/2015 4:49 PMParses HTML content, including malformed HTML, into a tree of nodes.
-
368What exactly does the 'useInternalNodes' property of xmlTreeParse accomplish?3/28/2015 4:53 PMuseInternalNodes: a logical value indicating whether to call the converter functions with objects of class XMLInternalNode rather than XMLNode. This should make things faster as we do not convert the contents of the internal nodes to R explicit objects. Also, it allows one to access the parent and ancestor nodes. However, since the objects refer to volatile C-level objects, one cannot store these nodes for use in further computations within R. They “disappear” after the processing the XML document is completed. If this argument is TRUE and no handlers are provided, the return value is a reference to the internal C-level document pointer. This can be used to do post-processing via XPath expressions using getNodeSet. This is ignored when parsing an HTML document.
-
369Is there an easier way to modify my EverNote HTML document other than using [r]?3/28/2015 5:14 PMThere's always Python but I should probably stick with the XML package in [r].
-
370What is the 'free' property in addChildren() [r XML Package]?3/28/2015 6:14 PMA logical value indicating whether to free the C-level memory associated with the child nodes that were removed. TRUE means to free that memory. This is only applicable for the internal nodes created with xmlTree and newXMLNode and related functions. It is necessary as automated garbage collection is tricky in this tree-based context spanning both R and C data structures and memory managers.
-
371How does removeChildren() work [r XML Package]?3/28/2015 6:17 PMRun this example:
b = newXMLNode("bob",
namespace = c(r = "http://www.r-project.org",
omg = "http://www.omegahat.org"))
cat(saveXML(b), "\n")
addAttributes(b, a = 1, b = "xyz", "r:version" = "2.4.1", "omg:len" = 3)
cat(saveXML(b), "\n")
removeAttributes(b, "a", "r:version")
cat(saveXML(b), "\n")
removeAttributes(b, .attrs = names(xmlAttrs(b)))
addChildren(b, newXMLNode("el", "Red", "Blue", "Green",
attrs = c(lang ="en")))
k = lapply(letters, newXMLNode)
addChildren(b, kids = k)
cat(saveXML(b), "\n")
removeChildren(b, "a", "b", "c", "z")
# can mix numbers and names
removeChildren(b, 2, "e") # d and e
cat(saveXML(b), "\n")
i = xmlChildren(b)[[5]]
xmlName(i)
# have the identifiers
removeChildren(b, kids = c("m", "n", "q"))
x <- xmlNode("a",
xmlNode("b", "1"),
xmlNode("c", "1"),
"some basic text")
v = removeChildren(x, "b")
# remove c and b
v = removeChildren(x, "c", "b")
# remove the text and "c" leaving just b
v = removeChildren(x, 3, "c") -
372I have extra 'div' tags I need to remove - how do I do this in [r]?3/28/2015 7:09 PMIt appears the div parent tags are a result of the xpathApply function:
html.notes<-xpathApply(enHTML, "//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]")
So the options to remove them are
1. Rewrite the tag to select only the children - should be the easiest and uses the /* operator
html.notes<-xpathApply(enHTML, "//div[not(table[@bgcolor='#D4DDE5'] or ancestor::div)]")
2. Remove them later (because I already have properly formed div tags)
3. Modify them so that they include the right attributes - ease is on par with 1. -
373How happy does solving a problem that stumped me for three days make me feel?3/28/2015 7:48 PM
-
374Why is my converted note 'What would a sentiment analysis of the Billboard Hot 100 say about american emotion' empty?3/28/2015 7:56 PM
-
375Why don't my converted notes have images?3/28/2015 8:00 PM
Probably because I'm using a base EverNote HTML export whose image folder I deleted...
UPDATE 1:Because I've been copying and pasting HTML code from one file into another in a different folder... which has different subdirectories and can't reference the correct image files. -
376How to restore intra-notebook links in a web translated EverNote notebook[r]?3/28/2015 8:08 PMGiven:
All note anchors
html.tAnchors
All intra-notebook anchors
html.iLinks
All note Titles
html.titles
Solution:The index of every title in html.titles that corresponds to an intra-notebook link is the index of every element in html.tAnchors that equals an element in html.iLinks
Hash all note titles and note anchors
Map note title to every intra-notebook anchor
iLinkMap<-which(!is.na(match(html.tAnchors,gsub("#","",html.iLinks))))
Modify all intra-notebook anchors to reflect updated mapping (don't forget to add # to beginning of string names)
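The mapping steps above can be tried on toy data first. This is a minimal base R sketch, assuming made-up anchor names, titles, and links; the real html.tAnchors, html.titles, and html.iLinks come from the EverNote export, and linkTargets and byLinkOrder are names invented for the illustration.

```r
# Hypothetical stand-ins for the vectors extracted from the export
html.tAnchors <- c("801", "802", "803", "804")
html.titles   <- c("Note A", "Note B", "Note C", "Note D")
html.iLinks   <- c("#803", "#801")

# Strip the leading "#" so links can be compared with anchor names
linkTargets <- sub("^#", "", html.iLinks)

# Positions of the anchors that are the target of at least one link
iLinkMap <- which(html.tAnchors %in% linkTargets)

# Title-based replacement hrefs, in anchor order
newILink <- paste0("#", html.titles[iLinkMap])

# The same lookup in link order, one replacement per original link
byLinkOrder <- paste0("#", html.titles[match(linkTargets, html.tAnchors)])

print(newILink)
print(byLinkOrder)
```

Note that iLinkMap follows the order of the anchors, while byLinkOrder follows the order of the links; which one is wanted depends on how the replacement hrefs are written back.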
-
377How to modify XML tags in place using XML package [r]?3/28/2015 8:16 PM#Modify intra-notebook references
iLinkMap<-which(!is.na(match(html.tAnchors,gsub("#","",html.iLinks))))
newILink<-paste0("#",html.titles[iLinkMap])
html.iLinksNodeSet
UPDATE 1:Modifying attributes is proving obnoxious. The alternative is to string replace them with gsub(). This too has its difficulties. -
378What is xmlAttrs {XML} [r]?3/28/2015 8:26 PMPerhaps the answer to my previous question: xmlAttrs()
Usage
xmlAttrs(node, ...)
'xmlAttrs<-'(node, append = TRUE, suppressNamespaceWarning = getOption("suppressXMLNamespaceWarning", FALSE), value)
Arguments
node: The XMLNode object whose attributes are to be extracted.
append: a logical value indicating whether to add the attributes in value to the existing attributes within the XML node, or to replace the set of any existing attributes with this new set, i.e. remove the existing ones and then set the attributes with the contents of value.
...: additional arguments for the specific methods. For XML internal nodes, these are addNamespacePrefix and addNamespaceURLs. These are both logical values and indicate whether to prepend the name of the attribute with the namespace prefix and also whether to return the namespace prefix and URL as a vector in the namespaces attribute.
value: a named character vector giving the new attributes to be added to the node.
suppressNamespaceWarning: see addChildren
-
379Is there such a thing as xmlSetAttr{XML} [r]?3/28/2015 8:35 PMNope - need to set it via another method.
UPDATE 1:I believe one can set attributes with 'xmlAttrs<-'
-
380Why isn't match(html.tAnchors,html.iLinks) producing meaningful results in [r]?3/28/2015 9:42 PMhtml.tAnchors and html.iLinks did not match because the latter has pound symbols in it. Need to remove the pound symbol.
-
381How to add hashtag to titles in [r]?3/28/2015 9:56 PMpaste0("#",html.titles[iLinkMap])
-
382If I can't change node attribute values, can I remove them and add new ones?3/28/2015 10:32 PMYes although one needs to be cautious as to the data types for which doing the following is valid... Using
removeAttributes(node, ..., .attrs = NULL, .namespace = FALSE, .all = (length(list(...)) + length(.attrs)) == 0)
UPDATE 1:
removeAttributes(d)
xmlAttrs(d) <- c(name = "Motor Trend fuel consumption data", author = "Motor Trends")
It is definitely possible to add node attributes. Refer to addAttributes{XML} within the addChildren documentation.
UPDATE 2:I finally did it! Check out the solution for adding attributes to a [r] nodeset. -
383Why are the number of intra-notebook links different from the number of mapped intra-notebook links?3/28/2015 11:07 PM> duplicated(html.iLinks)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE -
384How to resize html.iLinks such that its values correspond to newILink?3/28/2015 11:24 PMhtml.iLinks[which(!duplicated(html.iLinks))]
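To see why the counts differ and what the subsetting does, here is a small base R sketch on invented links (the values below are assumptions, not the notebook's real anchors; dupFlags and uniqueLinks are illustration names).

```r
# Hypothetical intra-notebook links, two of which repeat
html.iLinks <- c("#801", "#802", "#801", "#803", "#802")

# duplicated() flags the second and later occurrences of each value
dupFlags <- duplicated(html.iLinks)

# Keep only the first occurrence of every link
uniqueLinks <- html.iLinks[!dupFlags]

print(dupFlags)
print(uniqueLinks)
```

Indexing with !duplicated(x) gives the same result as unique(x) for a character vector; the explicit flags are handy when the same positions must also be dropped from a parallel vector.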
-
385Why isn't my gsub() function to modify node attribute values working?3/28/2015 11:45 PMSeveral reasons:
The hashtag character needs to be escaped
Don't forget to assign the output of a function to a variable, else nothing useful happens.
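Both points can be checked in a few lines. This is a sketch on invented href values (hrefs and cleaned are hypothetical names); it uses fixed = TRUE so the pattern is matched literally and the question of escaping '#' does not come up.

```r
# Hypothetical href values
hrefs <- c("#801", "#802")

# Calling gsub() without assignment returns a new vector and leaves hrefs untouched
gsub("#", "", hrefs, fixed = TRUE)

# Assign the result to keep it; fixed = TRUE treats "#" as a plain character
cleaned <- gsub("#", "", hrefs, fixed = TRUE)

print(hrefs)
print(cleaned)
```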
-
386How does Matthew Inman copyright his website?3/29/2015 1:53 AM
-
387Should I copyright my website?3/29/2015 12:12 PMI guess. I would like some of my code to be free and open; however, it may be good for the website structure and the (other) content I post to remain mine, so that it must be attributed if someone other than me shares it anywhere off my site.
-
388How to integrate dygraph into website?3/29/2015 12:22 PM
Someone needed to do something like this:
<script type="text/javascript" charset="UTF-8">
$(function() {
  var plotdiv = document.getElementById("plot1");
  new Dygraph(plotdiv,csvdata,{});
  plotdiv = document.getElementById("plot2");
  new Dygraph(plotdiv,csvdata, {});
  return true;
})
</script>
Dygraphs generates graphs from CSV files. The dygraphs library parses this data (including column headers), resizes its container to a reasonable default, calculates appropriate axis ranges and tick marks and draws the graph.
In most applications, it makes more sense to include a CSV file instead. If the second parameter to the constructor doesn't contain a newline, it will be interpreted as the path to a CSV file. The Dygraph will perform an XMLHttpRequest to retrieve this file and display the data when it becomes available. Make sure your CSV file is readable and serving from a place that understands XMLHttpRequest's! In particular, you cannot specify a CSV file using "file:///".
A startup guide is available at http://dygraphs.com/tutorial.html -
389What is bower?3/29/2015 12:26 PMBower is a package manager for Javascript libraries that allows you to define, version, and retrieve your dependencies.
-
390Who is John Lindquist?3/29/2015 12:32 PMA guy who makes to-the-point video tutorials on building websites at Egghead.io
-
391How to install Bower?3/29/2015 12:36 PMBower requires Node and npm and Git.
-
392What is node.js?3/29/2015 12:40 PMAs an asynchronous event driven framework, Node.js is designed to build scalable network applications. In the following "hello world" example, many connections can be handled concurrently. Upon each connection the callback is fired, but if there is no work to be done Node is sleeping.
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, "127.0.0.1");
console.log('Server running at http://127.0.0.1:1337/');
This is in contrast to today's more common concurrency model where OS threads are employed. Thread-based networking is relatively inefficient and very difficult to use. Furthermore, users of Node are free from worries of dead-locking the process—there are no locks. Almost no function in Node directly performs I/O, so the process never blocks. Because nothing blocks, less-than-expert programmers are able to develop scalable systems. -
393Why isn't the command 'npm install -g bower' working?3/29/2015 12:47 PMNeed to run command from cmd... not node.js shell
-
394Why am I getting cmd error 'bower ENOGIT git is not installed or not in the PATH'?3/29/2015 12:50 PM
-
395How to make an EverNote cumulative note sum DyGraph?3/29/2015 2:07 PMAs observed in this note, it is desirable to display EverNote note creation as a cumulative sum.
-
396Can one alter the original creation date of an EverNote note?3/29/2015 2:29 PMThe reason physical logbooks are preferred for professional recording is that they are difficult to tamper with. This makes them useful as admissible evidence in a court of law if one ever needed to defend actions they took or make similar proofs.
So how does EverNote stack up against someone who wanted to alter the way information was stored in EverNote?
At first glance changing the creation date of a note is, as it pertains to vandalism, discouragingly easy. One can simply click on the 'Created' field above every note and change the date. This will reorder the display of said note within the client. I did this once (I promise only once) to reorder my project motivation note so that when I shared it with potential admissions teams, the flow of my thoughts and introduction to my project made more sense.
When creating the cumulative sum note creation graph (which should be at the top of the website this would be read on) I found that the order of the note did not change according to the order that its (modified) date would have put it in. Thus, it appears that EverNote only changes the visual order of a note within a client based on date but does not restructure the note entry database to reflect such a change. This makes intuitive developer sense - it's how most file management systems work.
Exported, however, this is merely HTML which can be restructured in any text editor to reflect any order the creator wants. Thus, such a representation of EverNote data should not be esteemed for evidence to the degree of a physical logbook for tamper resistance.
It may be more difficult to change the entries within the EverNote database itself; however, I do not doubt it can be reordered by anyone with average database management skills.
Disclaimer as this information pertains to this project
After recognizing the confusion that trying to reorder my project motivation note may cause (making it first would clash with the dygraph) I modified its date a second time to return it to its original place in the note date creation scheme. -
397How to plot dates in [r]?3/29/2015 3:12 PMfDates<-strptime(html.Dates, "%m/%d/%Y %I:%M %p")
plot(fDates,1:length(fDates))
Behold hideous formatting!
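Before plotting, it helps to confirm that the strings parsed as intended. This is a minimal sketch on two invented timestamps in the export's format; it assumes an English or C locale (the %p AM/PM marker is locale dependent) and fixes the time zone to UTC purely to keep the result reproducible.

```r
# Hypothetical creation dates in EverNote's export format
html.Dates <- c("2/10/2015 7:16 PM", "3/29/2015 3:12 PM")

# Parse as 12-hour clock with AM/PM; convert the POSIXlt result to POSIXct
fDates <- as.POSIXct(strptime(html.Dates, "%m/%d/%Y %I:%M %p", tz = "UTC"))

# Reformat as 24-hour timestamps to eyeball the conversion
checked <- format(fDates, "%Y-%m-%d %H:%M")
print(checked)
```

Any element that prints as NA signals a string that did not match the format and would silently drop out of the plot.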
-
398How does as.Date{zoo} work [r]?3/29/2015 3:17 PM
-
399How to convert date and time in [r]?3/29/2015 3:20 PM
fDates<-strptime(html.Dates, "%m/%d/%Y %I:%M %p") -
400What's the difference between "POSIXlt" and "POSIXct" [r]?3/29/2015 3:24 PM
Not sure I need to care
UPDATE 1:There are two basic classes of date/times. Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. Class "POSIXlt" is a named list of vectors representing
sec 0–61: seconds.
min 0–59: minutes.
hour 0–23: hours.
mday 1–31: day of the month
mon 0–11: months after the first of the year.
year years since 1900.
wday 0–6 day of the week, starting on Sunday.
yday 0–365: day of the year.
isdst Daylight Saving Time flag. Positive if in force, zero if not, negative if unknown.
zone (Optional.) The abbreviation for the time zone in force at that time: "" if unknown (but "" might also be used for UTC).
gmtoff (Optional.) The offset in seconds from GMT: positive values are East of the meridian. Usually NA if unknown, but 0 could mean unknown.
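The difference is easiest to see by storing one instant both ways. A short base R sketch (the timestamp and the names ct, lt, secs, and parts are chosen for the illustration; UTC keeps it reproducible):

```r
# One instant, stored two ways
ct <- as.POSIXct("2015-03-29 15:24:00", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct is a single number: seconds since 1970-01-01 UTC
secs <- as.numeric(ct)

# POSIXlt exposes calendar fields; note the offsets on mon and year
parts <- c(year = lt$year + 1900, month = lt$mon + 1, day = lt$mday,
           hour = lt$hour, min = lt$min)

print(secs)
print(parts)
```

In practice POSIXct tends to be the convenient choice for data frame columns and time series indexes, while POSIXlt is handy when individual fields such as the hour are needed.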
-
401How to add column to POSIXct?3/29/2015 3:42 PMacum<-data.frame(dates=fDates, cumsum=1:length(fDates))
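A slightly fuller version of the same idea, as a sketch on invented timestamps: sorting first guarantees that the running count rises with time, and seq_along() avoids the 1:length() pitfall on empty input. The dates below are assumptions, not real note data.

```r
# Hypothetical parsed creation times, deliberately out of order
fDates <- as.POSIXct(c("2015-03-29 15:42", "2015-02-10 19:16", "2015-03-28 19:09"),
                     tz = "UTC")

# Sort first so the running count rises with time
fDates <- sort(fDates)

# One row per note: creation time plus the number of notes written so far
acum <- data.frame(dates = fDates, cumsum = seq_along(fDates))
print(acum)
```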
-
402How to fill space under line in dygraph?3/29/2015 3:45 PMThe fillGraph option specifies that y values should be filled vertically
dygraph(nhtemp, main = "New Haven Temperatures") %>%
dyAxis("y", label = "Temp (F)", valueRange = c(40, 60)) %>%
dyOptions(axisLineWidth = 1.5, fillGraph = TRUE, drawGrid = FALSE)
-
403What data type/format of [r] data is dygraph expecting?3/29/2015 3:57 PMDepends on the type of graph - take for instance the Deaths from Lung Disease (UK) example
lungDeaths <- cbind(ldeaths, mdeaths, fdeaths)
lungDeaths is an mts type.
UPDATE 1:Refer to the dygraphs Data Format section of the dygraph manual.
There are five types of input that dygraphs will accept:
CSV data
URL
array (native format)
function
DataTable
UPDATE 2:Per the dygraphs for R Home Page dygraphs automatically plots xts time series objects (or any object convertible to xts) -
404How to convert data.frame to mts [r]?3/29/2015 3:59 PMDon't care - coercing my time series into a mts is not desired
-
405How to create mts in [r]?3/29/2015 4:11 PMmts is a time series object. Unfortunately, this form of time series does not support my particular time series application since it is used to hold data observations which have been uniformly sampled. At any rate, objects like this are produced by ts {stats} (I think)
-
406My data.frame with posixlt data isn't converting to an xts correctly - why?3/29/2015 5:07 PMPart of the answer may be that there is not a 1:1 correlation between notes and note creation dates (because having minute resolution data means that if two notes were created in the same minute then a plot of said data would fail the vertical line test).
-
407How to add column to xts [r]?3/29/2015 5:12 PM
-
408Why does xts(1:length(html.Dates),order.by = fDates) return a time series who has more values than timestamps?3/29/2015 5:28 PMThere is not a 1:1 correlation between notes and note creation dates because a single time stamp can correspond to multiple notes since it's only in minute resolution.
Hence each of the following questions was asked in the same minute.
html.Dates[which(duplicated(html.Dates))]
[[1]]
[1] "2/17/2015 1:13 PM"
[[2]]
[1] "3/11/2015 6:11 PM"
[[3]]
[1] "3/21/2015 12:13 PM"
[[4]]
[1] "3/21/2015 8:56 PM"
[[5]]
[1] "3/25/2015 10:27 PM"
[[6]]
[1] "3/26/2015 8:37 PM"
The offset between the number of unique time stamps and note values is 2 meaning that, per the above output, there were two instances during this project where I asked two questions in the same minute and one instance in which I asked three questions in the same minute. Sheesh
If this is a problem one path to remediation is to assign a bogus yet incrementally correct SECOND value to duplicated time stamps.
UPDATE 1:Clearly I was blind when I wrote the things I did above. Obviously each of those timestamps listed are different.
ANSWER:I DONT KNOOOW
UPDATE 2:Perhaps fears of the Y2K Crash weren't entirely crazy - friggin' time changes screwed me up. Below is an excerpt of timestamps converted to POSIXlt. Note how CST turns to nothing then to CDT.
Solution:Force GMT absolute time referencing by changing fDates<-strptime(html.Dates, "%m/%d/%Y %I:%M %p") to fDates<-strptime(html.Dates, "%m/%d/%Y %I:%M %p",tz = "GMT") -
409How to fix xts duplicate timestamps without losing data in [r]?3/29/2015 5:42 PMTurns out there is a function to handle this in the xts package called make.time.unique()
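The idea behind that helper can be sketched in base R without xts. This is not the package's implementation: nudge_duplicates is a hypothetical name, the timestamps are invented, and the sketch simply shifts the k-th repeat of a timestamp by k * eps seconds. It assumes eps is far smaller than the gap between genuinely different timestamps.

```r
# Hypothetical minute-resolution timestamps; the first three collide
stamps <- as.POSIXct("2015-03-29 17:42", tz = "UTC") + c(0, 0, 0, 60)

# Simplified stand-in for the xts helper: shift the k-th repeat of a
# timestamp by k * eps seconds so every value becomes distinct
nudge_duplicates <- function(x, eps = 1e-6) {
  key <- as.numeric(x)
  repeatIndex <- ave(seq_along(key), key, FUN = seq_along) - 1
  x + eps * repeatIndex
}

uniqueStamps <- nudge_duplicates(stamps)
print(anyDuplicated(as.numeric(uniqueStamps)))
```

Every note keeps its own row, and the shifts stay far below the minute resolution of the original data, so the plotted positions are visually unchanged.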
-
410How to change RStudio format so that one can see microsecond time output?3/29/2015 7:35 PM
-
411How does make.index.unique(x) work [r]?3/29/2015 7:35 PMds <- options(digits.secs=6) # so we can see the change
x <- xts(1:10, as.POSIXct("2011-01-21") + c(1,1,1,2:8)/1e3)
x
make.index.unique(x)
options(ds)
-
412How to modify column of xts data type [r]?3/29/2015 8:23 PMIf a is of type xts then one can modify its first column using:
a[,1]<-1:length(html.Dates)
-
413How to add range selector functionality to dygraph [r]?3/29/2015 8:24 PM
Per the Range Selector documentation add the highlighted argument below
dygraph(nhtemp, main = "New Haven Temperatures") %>%
dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))
-
414What are some of the links that helped solve my dates problem that went undocumented?3/29/2015 8:29 PM
-
415How to extract the first date from a XTS object in [r]?3/29/2015 8:37 PMIf a is your xts object then min(index(a))
-
416How to specify additional properties to a dygraph object [r]?3/29/2015 8:40 PMUse the %>% operator like so:
dygraph(a, main = "Number of Questions I Asked to Learn How To Effectively Communicate my Research") %>%
dyAxis("y", label = "Questions Asked") %>%
dyOptions(axisLineWidth = 1.5, fillGraph = TRUE, drawGrid = TRUE) %>%
dyRangeSelector(dateWindow = c(min(index(a)), max(index(a)))) -
417How to change name of XTS object header [r]?3/29/2015 8:43 PM
names(a)[1]<-"changed name"
-
418How to export/embed a R dygraph to HTML?3/29/2015 8:55 PM
@Jonathan's recommendation at the SO post above causes my lower page content to disappear if it is used within the header above said content.
Use an iFrame
UPDATE 1:Solution:<iframe width="100%" height="480" frameborder="0" seamless="seamless" scrolling="no" src="path/to/dygraph.html" ></iframe> -
419How to embed an html file into another html?3/29/2015 10:04 PM
-
420How to specify div dimensions?3/29/2015 10:09 PM
#your_div_id { width: 855px; height: 100px; margin: 0 auto; }
-
421How to make a temporary "site under construction" countdown page?3/31/2015 11:32 PMOne can use free templates available online. I chose this jQuery Countdown and the following HTML:
<!DOCTYPE html>
<html>
<head>
<title>jquery-countdown plugin test</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<script src="js/jquery.countdown.js"></script>
<link href="css/media.css" rel="stylesheet" type="text/css" />
<script>
$(function(){
$(".digits").countdown({
image: "img/digits.png",
format: "dd:hh:mm:ss",
startTime: "06:23:25:14"
});
});
</script>
</head>
<body>
<div class="wrapper">
<div class="cell">
<div id="holder">
<div class="digits"></div>
</div>
</div>
</div>
</body>
</html> -
422What is a PSD file?3/31/2015 11:49 PMA .PSD file is a layered image file used in Adobe PhotoShop. PSD, which stands for Photoshop Document, is the default format that Photoshop uses for saving data. PSD is a proprietary format that allows the user to work with the image's individual layers even after the file has been saved.
-
423How to separate countdown digits in countdown template?4/1/2015 12:05 AMUse the following to format the counter here
$(function(){
$(".digits").countdown({
image: "img/digits.png",
format: "dd:hh:mm:ss",
startTime: "06:23:25:14"
});
}); -
424How to get a div to fit to contents?4/1/2015 1:03 AM
-
425How to see CSS element borders?4/1/2015 1:32 AMSet the div color:
background-color: #b0e0e6;
-
426Can CSS relative position property values be negative?4/1/2015 1:52 AMYes
-
427How to make background fit size of browser?4/1/2015 2:35 AM
-
428It's April 1st and I really need to go live - what's my website missing?4/1/2015 10:11 PM
Finish formatting AutoGen EverNote HTML
Review notes for mistakes/missing content
Make Home page "Full Billboard" dygraph
Make annual average uniqueness dygraph
Link pages together
Setup Github repo for Share page
Create 'Bag-o-words' download
Make "Motivation" Page
Post "This is an intermediate result - do not consider final" disclaimer on analysis
Perhaps post a countdown to when I believe I will be able to post the final results
-
429Why is an iFrame for a dygraph causing after page content to disappear?4/1/2015 10:53 PMThe iFrame used by @Jonathan in this StackOverflow post causes content further down the page to disappear. Turns out the problem with his iFrame was poor formatting (in ways uncertain). The proper way to format an iFrame for use with dygraphs is documented here.
-
430Is my iFrame content disappearing because the encapsulated content has its own header tag?4/1/2015 10:54 PM
-
431Will plot.ly's iFrame integration suggestion work for properly embedding a dygraph?4/1/2015 11:08 PMDude YES! Check out the answer in this other note
-
432Do I have to put my iFrame in a div for it to work correctly?4/1/2015 11:22 PMNope
-
433How to center an iFrame?4/1/2015 11:27 PMAlign="center" doesn't do the job on its own. Need something else.
-
434How to substitute numbers in place of question icons in MaterializeCSS collapsible list?4/1/2015 11:37 PMA first attempt is of the form:
<div class="collapsible-header"><i class="mdi-navigation">1</i>Third</div>
This, however, displays the number '1' in a font much too large. Need to investigate how to override the external CSS (or modify tags and generate custom CSS) to get the right look.
UPDATE 1:Per the awesome advice of @Dogfalo
HTML
<span class="numbering" >22</span>
CSS
.numbering {
text-align: center;
margin-right: 1rem;
width: 2rem;
display: inline-block;
} -
435How to add attributes to tags selected by xpath?4/2/2015 12:42 AM
Using addAttributes() as noted in this update
UPDATE 1:Here's a solution
The solution above was made open source on StackOverflow here
library(XML)
## download the webpage
kbbHTML <- readLines("http://www.kbb.com/used-cars/honda/accord/2014/private-party-value")
## parse the downloaded document to an XMLInternalDocument
kbbInternalTree <- htmlTreeParse(kbbHTML,useInternalNodes=T)
#kbbInternalTree <- htmlParse(kbbHTML, asText = TRUE) #equally valid parsed content as above
## select nodes matching our XPath expression
specific.nodes <- getNodeSet(doc = kbbInternalTree, path ="//a[contains(@href,'/honda/accord/')]")
sapply(specific.nodes, function(x) xmlAttrs(x)<-c(Fig_Vodka="Don't mind if I do")) -
436What is PixelPerfect?4/2/2015 10:07 PMhttps://hacks.mozilla.org/2015/03/pixel-perfect-2-extension-for-firefox-developer-tools/
-
437What are XPath functions?4/2/2015 10:09 PMFunctions that can be performed via XPath. Here's a list of functions for XPath 2.0
-
438Can one pass internalNodes pointers to the addAttributes function [r,XML]?4/3/2015 10:01 AMNo longer needs answering since the original problem has been solved.
-
439What is catalogLoad{XML}?4/3/2015 10:10 AM
-
440What is the sxslt package [r]?4/3/2015 10:15 AMPer reference in newXMLDoc {XML} the sxslt package
-
441Can the sxslt package be used to modify node attributes [r]?4/3/2015 10:24 AMProbably but it's minimally documented.
-
442Who wrote the book "XML and Web Technologies for Data Sciences with R"?4/3/2015 10:30 AMDeborah Nolan and Duncan Temple Lang
Turns out there's a section in this book that demonstrates, in plain English, how to change the attributes of a node. The result of such knowledge is put to work in this example.
-
443Would this project's workflow have been possible without Duncan Temple Lang?4/3/2015 10:33 AMNo - he developed the [r] packages RCurl and XML which allowed me to do 80% of my work. Standing on the shoulders of giants.
-
444Can one not be a master of {RCurl} or {XML} and be a good Data Scientist?4/3/2015 10:38 AMIf working in [r], probably not.
-
445Can {SVGAnnotation} be used to add attributes to XML nodes?4/3/2015 11:28 AMA reference to {SVGAnnotation} suggests a way of adding attributes to SVG nodes that might be adapted to do the same for XML nodes.
UPDATE 1:Forget doing it with SVGAnnotation - the solution for adding attributes to nodes is here
-
446How to view [r] function source code?4/3/2015 12:46 PM
-
447What is a workflow for automatically specifying which converted notes will appear expanded in my collapsible list?4/3/2015 1:20 PMOne can tag notes in EverNote.
By reserving a special word in one's tagging workflow, one can select, from within EverNote, which notes should be expanded by default.
One would then label all notes of interest with, for example, the tag 'ACTIVE'.
One would export the notebook with tags using EverNote options.
An [r] routine responsible for identifying notes with the 'ACTIVE' tag would add the active class to the collapsible-header. -
448I tried setting several MaterializeCSS collapsible-headers to active - can only one be active at a time?4/3/2015 1:38 PMAs is, only one header can be active at a time with the accordion class. Instead, use
<ul class="collapsible" data-collapsible="expandable">
-
449How to make a pop-out collapsible [MaterializeCSS]?4/3/2015 1:54 PM
<ul class="collapsible popout" data-collapsible="accordion">
-
450Why isn't the collapsible pop-out class working?4/3/2015 2:06 PMNot sure.
-
451Can I make this complicated Dygraph?4/3/2015 3:59 PMIt is desirable to create a graph that contains:
x-axis with rank and respective year labels
Visual divisions between years or decades of the x-axis (may not be supported at this time)
x-axis scroll bar
y-axis with 'uniqueness' rating
Hyperlinked data points for each song to its YouTube or other video
-
452How to make scrollable annotations in dygraph?4/3/2015 4:22 PM
-
453How to make per series highlighting in dygraph?4/3/2015 4:27 PM
-
454How to create a dygraph with multiple x-axis labels?4/3/2015 7:06 PM
I'm working with a lot of rank data that would benefit from a way to simultaneously display its respective year on the x-axis. For example, I want to create the following graph adapted from the dygraph gallery:
Note how the rank information (red arrow) for a particular weekend (green arrow) are both displayed on the x axis.
I know this might not be possible with dygraphs now, at least it wasn't available in these demos, so I guess my follow up question would be are there any plans to make this possible (how about in the [r] {dygraph} package)? Apparently a plotter called flot can do this -
455Are [r] produced dygraphs able to make anything other than time series graphs?4/4/2015 2:21 PMCurrently, per documentation below, no:poop.
-
456What are alternatives for plotting non-time series Billboard rank data?4/4/2015 2:27 PMCan still use Dygraphs, though I will have to use the more mature, non-[r]-specific, direct-to-web approach and make the dygraph in javascript/HTML by first exporting my data to CSV.
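The export step might look like the sketch below. Everything in it is an assumption made for illustration: the ranks data frame and its values are invented, the file name is hypothetical, and encoding year and rank into one numeric x value is just one possible way to give dygraphs the single x column it reads from the first CSV column.

```r
# Hypothetical rank data: chart year, position, and a uniqueness score
ranks <- data.frame(year = c(1992, 1992, 1993),
                    rank = c(1, 2, 1),
                    uniqueness = c(0.41, 0.37, 0.52))

# Build a single numeric x (year plus a fractional offset for rank)
# and keep one y series
out <- data.frame(x = ranks$year + (ranks$rank - 1) / 100,
                  uniqueness = ranks$uniqueness)

# Write a header row, no row names, and no quoting so the file is plain CSV
csvPath <- file.path(tempdir(), "billboard.csv")
write.csv(out, csvPath, row.names = FALSE, quote = FALSE)

print(readLines(csvPath))
```

The resulting file can then be passed by path as the second argument of the Dygraph constructor, as long as the page is served over HTTP rather than opened from "file:///".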
-
457Can I raise awareness about my feature request for multiple x-axis labels in dygraphs?4/4/2015 2:42 PMAbsolutely - I've posted a link to my SO post on the [r] dygraph development team's website.
-
458Did I miss something important in Rhett Allain's plot.ly?4/4/2015 3:14 PMYes.
The focus of this prior note was a graph made by Rhett Allain. It turns out that searching his name along with his plot.ly graph's title leads to his Why Are Songs on the Radio About the Same Length? WIRED article which links to an awesome repository of music information called the MusicBrainz Database. Information contained therein will supplement my analysis well. It doesn't appear to focus on lyric data, if it provides any at all. -
459My SO post might be in danger of someone changing it - should I make a copy of it?4/4/2015 3:42 PMYES.
I'm working with a lot of rank data that would benefit from a way to simultaneously display its respective year on the x-axis. For example, I want to create the following graph [adapted from the dygraph gallery](http://dygraphs.com/gallery/#g/highlighted-weekends):
![enter image description here][1]
Note how the rank information (red arrow) for a particular weekend (green arrow) are both displayed on the x axis.
I know this might not be possible with dygraphs now, at least it wasn't available in [these demos](http://dygraphs.com/tests/), so I guess my follow up question would be are there any plans to make this possible (how about in the [r] {dygraph} package)? Apparently a plotter called [flot](http://sohu.io/questions/1144590/multiple-x-axis-tick-sets-in-flot) can do this.
**UPDATE 1**
If indeed this feature does not exist yet, then the following, although potentially obvious to Dygraph developers, is a thought for accomplishing the task *easily* (perhaps I'm wrong). At first I thought it would be necessary to provide input data of the form shown in Table A
![enter image description here][2]
However such input is a major deviation from the existing Dygraph parser model, which expects one abscissa. This suggests that a modification to the parser to accept a "Dual Label" option, requiring that both labels be contained in a single abscissa element as in Table B, would be easier. Thereafter, with the option specified, the parser would manage CSV as it usually would with the exception that it is now "bin cognizant" and detects division between labels 1 and 2 by use of an acceptable delimiter (in this case a single quotation mark - maybe not the best choice) and divisions between label 1 abscissa elements by name change.
Behind the scenes each point gets its unique x coordinate and the "Dual Label" option causes the dygraph to visually scoot up a couple pixels to accommodate an extra label. Not sure how to handle full zoomed scrolling but simply leaving a label 1 element centered until an adjacent label 1 element comes on screen is an option.
`Dygraphs rule!`
[1]: http://i.stack.imgur.com/89boh.png
[2]: http://i.stack.imgur.com/CLLVl.png
-
460How to correlate which EverNote note is tagged as being 'ACTIVE' [r]?4/4/2015 6:40 PMAs mentioned in How will I convert EverNote Notebooks into beautiful webpages?, note titles, note creation date/tags, and note contents are exported as node siblings. Therefore I do not believe I will be able to use xpath to identify tagged notes in a way that could re-associate them with their respective note at time of conversion (unless this note suggests otherwise).
Instead, a boolean matrix set TRUE for any ordered occurrence of the tag 'ACTIVE' would work. In such a matrix each element would represent the position of a note within an original EverNote HTML export.
Given:Every note tag is included in the <table> tag responsible for being a parent to creation date information, i.e.:
<hr>
<a name="801"/>
<h1>Should I do bla?</h1>
<div>
<table bgcolor="#D4DDE5" border="0">
<tr><td><b>Created:</b></td><td><i>3/7/2015 3:54 PM</i></td></tr>
<tr><td><b>Tags:</b></td><td><i>ACTIVE, Book, String</i></td></tr>
</table>
</div>
Solution:How to select only note tags whose value is 'ACTIVE' in [r]? and Is it possible to select nodes in XPath via 'sibling' conditions? suggest that the preceding-sibling AxisName could be used to access the note anchor in relation to every occurrence of an 'ACTIVE' tag.
UPDATE 1: Modifying the solution to How to select only note divs whose descendant includes the string 'ACTIVE' in [r]? produces this question's solution
html.active<-xpathApply(enHTML, "//div/descendant::i[contains(.,'ACTIVE')]/preceding::a[1]",xmlGetAttr, "name")
XPath is awesome and powerful indeed! -
461Why is my [r] routine for converting EverNote exports returning more 'dates' fields than it should?4/4/2015 7:09 PMInclusion of tags in EverNote HTML export has caused my xpath filter for dates to break:
html.Dates<-xpathApply(enHTML, "//table[@bgcolor]/tr/td/i", xmlValue)
Given:EverNote HTML format,
<table bgcolor="#D4DDE5" border="0">
<tr><td><b>Created:</b></td><td><i>2/10/2015 7:42 PM</i></td></tr>
<tr><td><b>Tags:</b></td><td><i>ACTIVE</i></td></tr>
</table>
Solution:I suppose I want the value of the <i> child of any <table> whose child <tr><td><b> value is 'Created'.
Per suggestion of XPath Examples, the above is overly complicated and html.Dates can be correctly located using:
html.Dates<-xpathApply(enHTML, "//table[@bgcolor]/tr[1]/td/i", xmlValue)
-
462How to correctly use the xpath function 'contains' in [r]?4/4/2015 7:22 PMfn:contains(string1,string2)
Returns true if string1 contains string2, otherwise it returns false
Example: contains('XML','XM')
Result: true -
463How to xpath according to descendant [r]?4/4/2015 7:37 PMI know that ancestor::div locates tags that have an ancestor div, but I don't know how tags with a specific child (perhaps a div) are found.
ANSWER:
-
464Is it possible to select nodes in XPath via 'sibling' conditions?4/4/2015 8:21 PMYes, siblings are specified in XPath Axes by the following-sibling or preceding-sibling AxisName
-
465How to select only note tags whose value is 'ACTIVE' in [r]?4/4/2015 8:36 PMhtml.active<-xpathApply(enHTML, "//table[@bgcolor]/tr[2]/td/i[contains(.,'ACTIVE')]", xmlValue)
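Once the tag strings are in R, a per-note TRUE/FALSE flag can also be built without XPath. This is a base R sketch on invented tag strings (noteTags and isActive are hypothetical names); unlike contains(), it tests for an exact tag, so a tag such as 'INACTIVE' is not picked up by accident.

```r
# Hypothetical tag strings, one per note, in export order
noteTags <- c("ACTIVE, Book, String", "Book", "", "INACTIVE, String", "String, ACTIVE")

# Split each string on commas and test for an exact 'ACTIVE' tag
isActive <- vapply(strsplit(noteTags, ",\\s*"),
                   function(tags) "ACTIVE" %in% tags,
                   logical(1))

print(isActive)
print(which(isActive))
```

which(isActive) then gives the positions of the notes that should be expanded by default.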
-
466How to select only note divs whose descendant <t> includes the string 'ACTIVE' in [r]?4/4/2015 8:56 PMhtml<-xpathApply(enHTML, "//div/descendant::i[contains(.,'ACTIVE')]")
-
467Is it possible to Left Align and Right Align Text on the same line?4/4/2015 9:35 PMIt is desirable to have the 'Question asked date' tag inline yet separated from collapsible question headers. Left Align and Right Align Text on the Same Line has the right idea... but poor execution.
ANSWER:Instead refer to this much more current Align text to the left and right on the same line with CSS
<span style="float:right">pending </span> -
468Why is adding '<span style="float:right">pending </span>' causing my notes to not open?4/4/2015 10:16 PMThe problem was I wasn't running the entire script and an old variable was causing note content to be ignored... or something to that effect.
-
469Why doesn't the <span class="numbering" >22</span> by @Dogfalo work on local tests?4/5/2015 12:19 AMIn How to substitute numbers in place of question icons in MaterializeCSS collapsible list? @Dogfalo's tip to format collapsible numbers works in this fiddle but not in local tests.
Perhaps my version of Materialize needs updating... Update didn't help.
UPDATE 1:I'm a dummy and didn't include the CSS styling from the above fiddle. -
470How to create a dygraph in HTML/javascript?4/5/2015 10:56 AMIn addition to requiring hardcoding of one's data set into HTML or fetching it from a csv as in How to integrate dygraph into website?, dygraphs require javascript.UPDATE 1:
<html>
<head>
<script type="text/javascript" src="dygraph-combined-dev.js"></script>
</head>
<body>
<div id="graphdiv2" style="width:500px; height:300px;"></div>
<script type="text/javascript">
g2 = new Dygraph(
document.getElementById("graphdiv2"),
"temperatures.csv", // path to CSV file
{} // options
);
</script>
</body>
</html>
-
471How to draw a dygraph scatter plot?4/5/2015 11:31 AM
drawPoints: true, strokeWidth: 0.0
-
472How to specify 'maximum zoom in level' [dygraphs]?4/5/2015 11:35 AMBeing able to specify a maximum zoom level, or to force x-axis tick labels (granularity), would each be adequate for preventing graph users from zooming out of the graph's focus.UPDATE 1:This example suggests a means to force x-axis labels.
-
473Are dygraph height and width options specified in js or HTML?4/5/2015 12:16 PM
-
474How do I need to improve my current Billboard Hot 100 Dygraph?4/5/2015 12:46 PM
- Add labels
- Increase plot contrast
- Limit maximum zoom in (or achieve similar functionality such that x-axis number labels do not display decimals)
- Highlight decades
- Pad left and right of graph to better view boundary data
-
475Should I represent Billboard rank information differently?4/5/2015 1:06 PMThe current model represents the data as a scatter plot; however, it may be better to represent it as a series plot, even though such a relation could be misleading. There isn't much of a relationship between rank 1 of the year 1992 and rank 2 of the year 1993 other than ... well... rank. That is to say, any number of variables could influence the rank of an artist's song, and a 100% relation such as the one suggested by a series plot is simply not true. Even so, visualizing this rank data by series is useful in several ways. It could reveal an interesting trend not previously observable. At the very least, series dygraphs are well supported and tell a more compelling story than the simple scatter plots currently producible (per limitations discussed in How to create a dyraph with multiple x-axis labels?).ANSWER: YesUPDATE 1:Perhaps it would be interesting to allow the user to 'turn OFF' series visualization in my dygraph.
-
476How to translate Billboard Hot 100 Data from Scatter to Series Plot format [r]?4/5/2015 4:02 PMNeed to handle omission of instrumental or incomprehensibly foreign data points... mark by NA as highlighted below
years<-tail(master$Year,1)-master$Year[1]+1
series<-matrix(NA, nrow = years, ncol = 100, byrow = TRUE,dimnames=list(1958:2014,1:100))
for(i in 0:(years-1)){
#print(paste0("i ",i))
for(j in 1:100){
#print(paste0("j ",j))
index<-100*i+j # row of master for year offset i, rank j
print(paste0("index ",index))
if(master$Instrumental[index]==FALSE&master$I.Foreign[index]==FALSE){
series[i+1,j]<-master$uCount[index]
} else {
series[i+1,j]<-NA # a real NA; the string "NA" would coerce the whole matrix to character
}
}
} -
477How to handle missing series information in dygraphs?4/5/2015 4:21 PMMay be interested in following formatting options:
-
478Why is my translation from scatter to series plot missing 100 values?4/5/2015 4:54 PMNoob mistake, [r] matrices are not zero based. The following...
years<-tail(master$Year,1)-master$Year[1]+1
series<-matrix(NA, nrow = years, ncol = 100, byrow = TRUE,dimnames=list(1958:2014,1:100))
for(i in 0:(years-1)){
#print(paste0("i ",i))
for(j in 1:100){
#print(paste0("j ",j))
index<-100*i+j
print(paste0("index ",index))
if(master$Instrumental[index]==FALSE&master$I.Foreign[index]==FALSE){
series[i,j]<-master$uCount[index] # wrong: assigning to row 0 silently does nothing
} else {
series[i,j]<-NA
}
}
}
...should have been:
years<-tail(master$Year,1)-master$Year[1]+1
series<-matrix(NA, nrow = years, ncol = 100, byrow = TRUE,dimnames=list(1958:2014,1:100))
for(i in 0:(years-1)){
#print(paste0("i ",i))
for(j in 1:100){
#print(paste0("j ",j))
index<-100*i+j
print(paste0("index ",index))
if(master$Instrumental[index]==FALSE&master$I.Foreign[index]==FALSE){
series[i+1,j]<-master$uCount[index] # rows are 1-based, so shift the year offset by one
} else {
series[i+1,j]<-NA
}
}
} -
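A loop-free sketch of the same scatter-to-series translation, which sidesteps the row-index arithmetic altogether. It relies on the same assumption as the loop (master is sorted by year and then rank, with a fixed number of ranks per year). The function name to_series, its ranks_per_year parameter, and the tiny three-rank data set are hypothetical and only there to make the example runnable.

```r
# Build a year-by-rank matrix of unique word counts, with NA for songs
# flagged as instrumental or foreign.
to_series <- function(master, ranks_per_year = 100) {
  keep <- !master$Instrumental & !master$I.Foreign
  values <- ifelse(keep, master$uCount, NA)  # stays numeric, NA where omitted
  years <- unique(master$Year)
  matrix(values, nrow = length(years), ncol = ranks_per_year, byrow = TRUE,
         dimnames = list(years, seq_len(ranks_per_year)))
}

# Made-up data: two years, three ranks per year.
master <- data.frame(
  Year = rep(c(1958, 1959), each = 3),
  uCount = c(50, 60, 70, 80, 90, 100),
  Instrumental = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  I.Foreign = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
series <- to_series(master, ranks_per_year = 3)
print(series)
```

Filling the matrix with byrow = TRUE places each block of consecutive rows from master into one matrix row, so no manual offset is needed.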
479Why aren't my series showing up connected?4/5/2015 5:29 PMI forgot to restore my strokeWidth to 1.0 (instead of zero).
-
480Series plot is too busy - can it be simplified?4/5/2015 5:45 PMResting on the idea that a series plot is not completely inappropriate for this data set, I think it's possible, and even advisable, to transform it into a scatter/series hybrid plot in which only the series of the closest data point is made visible while all other series remain hidden and appear as plain scatter points. This seems quite doable by modification of code found at this Fiddle.UPDATE 1:Something that nearly accomplishes the above is available here
-
481Why does my scatter-series hybrid dygraph not clear highlighted series when others are selected?4/5/2015 8:00 PMNot sure, but the same behavior is present in the dygraphs examples when one zooms in and then resets the zoom.
-
482What is the dygraph way of deselecting a highlighted graph?4/5/2015 8:09 PM
-
483How to make all series the same color [dygraph]?4/5/2015 8:38 PMMight have to force colors by specifying each one
-
484How to set dygraph labels?4/5/2015 8:57 PMSet it in the options section:{ // options
title: 'Unique Word Count of Every* Song in the Billboard Hot 100 Since 1958',
titleHeight: 32,
ylabel: 'Unique Word Count',
xlabel: 'Year',
labelsDivStyles: {
'text-align': 'right',
'background': 'none'
}, -
485Why isn't the MaterializeCSS tab working?4/5/2015 10:14 PMIt's not meant to be used as a navigation bar. Instead, use navbar
-
486Why isn't the MaterializeCSS modal working?4/5/2015 10:59 PMThe js initialization:
$(document).ready(function(){
// the "href" attribute of .modal-trigger must specify the modal ID that wants to be triggered
$('.modal-trigger').leanModal();
});
needs to occur after loading jquery...duh! -
487How many notes ought to be made 'ACTIVE'?4/6/2015 12:42 AM~1/8 the total note count.
-
488How to present MaterializeCSS Modal on page load?4/6/2015 1:51 AMInclude this js: $('#modal').openModal();
-
489Why does calling a question on my 'Research' page via the # operator cause the page to load without CSS styling?4/6/2015 2:26 AMCalling <a href="research.html/#1280" data-click="true">Question 222</a> from index.html causes CSS styling to disappear
-
490Why isn't the MaterializeCSS navbar reacting to changes in browser size?4/6/2015 2:58 AM
-
491How to upload multiple files to asmallorange at the same time?4/6/2015 3:59 AMUse a 3rd party FTP client - download FileZilla. Connect using:
- The hostname/server/domain you wish to connect to (usually either your domain name or the IP of the server)
- Username
- Password
- Target directory (optional; if you want to open up your website's public directory, use "public_html/" -- that's where all content that is visible to the public is)
-
492How to spell check an entire notebook in EverNote?4/7/2015 1:39 AMThis doesn't seem to be possible - need to do it note by note.
-
493How to spell check EverNote note titles?4/7/2015 8:14 PMSpell check doesn't appear to work on titles. Titles can be semi-programmatically checked by exporting R note titles from the EverNote converter to MS Word or similar, checking them there, and manually correcting them in EverNote. This is lame.
-
494How to change MaterializeCSS stock color?4/7/2015 9:14 PM
<div class="nav-wrapper light-green darken-2">
-
495What is scss?4/7/2015 9:38 PM
-
496How to call Sass code?4/7/2015 9:41 PM
Before you can use Sass, you need to set it up on your project. If you want to just browse here, go ahead, but we recommend you go install Sass first. Go here if you want to learn how to get everything setup.
-
497Is it possible to bold text from CSS?4/7/2015 9:57 PMYes. This makes much more sense than using HTML's <b> in many scenarios.
p.normal {
font-weight: normal;
}
p.thick {
font-weight: bold;
}
p.thicker {
font-weight: 900;
} -
498Why is the dygraph question 'cumulative sum' graph behind by 5 hours?4/7/2015 10:04 PMIt is set to GMT. Typically [r] will adjust an xts object to display its time in the local time zone; however, this requires that [r]'s TZ environment variable be set. Do so with Sys.setenv(TZ="America/Chicago").UPDATE 1:It's using the web server's time zone to translate and report time.
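A small base-R sketch of the five-hour offset: one instant, stored in UTC (GMT), displayed in Central time. The timestamp is only an illustration, and the sketch assumes the system's time zone database knows "America/Chicago".

```r
# One instant, stored in UTC (GMT), as a web server would typically report it.
asked <- as.POSIXct("2015-04-07 22:04:00", tz = "UTC")

# Display the same instant in Central time without changing the stored value.
local_label <- format(asked, "%Y-%m-%d %H:%M", tz = "America/Chicago")
print(local_label)
```

Passing tz to format() changes only the display, so it does not depend on the TZ setting of whichever machine renders the graph.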
-
499How to re-format xts data to ignore (not show) seconds [r]?4/7/2015 10:41 PMNot sure. It is possible to drop time data with the drop.time argument, but there is no ability to specify dropping only seconds, minutes, days, or another time increment.
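Two possible base-R workarounds, sketched on a plain POSIXct vector standing in for an xts index (applying them to a real xts index is an assumption, not something tested here). The first hides seconds only at display time; the second rounds each timestamp down to the start of its minute.

```r
# Hypothetical timestamps standing in for an xts index.
stamps <- as.POSIXct(c("2015-04-07 22:41:37", "2015-04-07 22:42:05"), tz = "UTC")

# Option 1: keep the stored values, omit seconds when formatting for display.
shown <- format(stamps, "%Y-%m-%d %H:%M")

# Option 2: truncate each time to the start of its minute.
floored <- as.POSIXct(trunc(stamps, units = "mins"))

print(shown)
print(format(floored, "%H:%M:%S"))
```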
-
500What is the Watusi?4/8/2015 6:47 PMFrom the song Party Lights by Claudine Clark, the Watusi is a solo dance that enjoyed brief popularity during the early 1960s.
-
501How to extend correct statistics to duplicate song records [r]?4/8/2015 9:31 PMProblem:
Duplicate entries in the master lyric statistic file reference empty lyric files. Because statistics are generated by explicit inspection of lyric files, files marked as duplicates get incorrect statistics. A solution other than outright copy and paste of the original lyric files into their duplicate siblings is desired.
Given:
- Duplicates consist of a first appearance (known as an originator) and then a repeating entry (known as the duplicate).
- Duplicates are marked.
- Originator contains valid lyric statistics.
Solution:A function that accomplishes what I need may exist but I wouldn't know what it is called. It will be quicker to create one.
Pseudo-code
loop{
is duplicate?
what is index of originator?
copy statistic information of originator to duplicate via index.
} -
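The pseudo-code above could be sketched in R as follows. This is an illustration under assumptions: the function name copy_originator_stats, the Artist, Title and Duplicate columns, and the sample rows are all hypothetical; only uCount appears elsewhere in these notes.

```r
# For every row marked as a duplicate, find the first unmarked row with the
# same key columns (the originator) and copy its statistics across.
copy_originator_stats <- function(master, key_cols, stat_cols, dup_col = "Duplicate") {
  keys <- do.call(paste, c(master[key_cols], sep = "\r"))
  for (i in which(master[[dup_col]])) {
    originator <- which(keys == keys[i] & !master[[dup_col]])[1]
    if (!is.na(originator)) {  # leave the row untouched if no originator exists
      master[i, stat_cols] <- master[originator, stat_cols]
    }
  }
  master
}

# Made-up sample: row 3 is a marked duplicate of row 1.
master <- data.frame(
  Artist = c("A", "B", "A"),
  Title = c("x", "y", "x"),
  Duplicate = c(FALSE, FALSE, TRUE),
  uCount = c(120, 80, NA),
  stringsAsFactors = FALSE
)
fixed <- copy_originator_stats(master, c("Artist", "Title"), "uCount")
print(fixed)
```

Matching on key columns avoids storing an originator index in the file, at the cost of assuming that artist and title together identify a song.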
502Why is a hyperlink overflowing into note titles contained on my research page?4/8/2015 11:22 PMThe huge hyperlink below begins and ends in odd places.
Such behavior has been noticed in the past and in relation to textual descriptions of XML or HTML code as in the green box. -
503What should my introduction of this body of work say?4/12/2015 1:39 PMThe graph on the previous page is simple - it's an interactive scatter plot describing non-earth-shattering information. For all the graph's relative simplicity, though, several challenges in data collection, correction, storage, organization, and communication begged solution before it could appear in its current form. As one who appreciates detail, I thought it would be interesting to convey, for neurotic others to inspect, what I knew would be the intricacies of even the simplest of data science investigations. To that end, I have converted and posted below the EverNote journal I kept for this project. Anyone who can follow the first 460 questions below would likely be able to make this website.
Lastly, please know that the voice in the following journal is not one in pursuit of political correctness but simply one in pursuit of correctness. That said, at times the tempo of investigation meant my fingers did not record ideas perfectly, and typos and grammatical mistakes are certain to be present.
Without further ado, welcome to a slice of my investigative brain. -
504Is 'Cut The Cake' by Average White Band similar to 'Sugar' by Maroon 5?4/12/2015 9:05 PMMany songs explore similar themes - nothing out of the ordinary. The ways in which 'Sugar' and 'Cut The Cake' explore themes of sex and desire are interestingly close. Insofar as ideas are concerned... yes, these songs are similar.
-
505Need to re-scrape certain songs from a better source - what needs to be done [r]?4/15/2015 6:31 PMSome downloaded songs have too many errors in them (an example of crowd sourcing gone wrong). Need a way to quickly scrape higher quality versions.
Given:
- List of error songs
- Higher quality lyric repository already used once for collecting lyrics not available from other repositories.
- URI format known.
Solution:
- Recreate URL target list using 'Generate URL Targets v4.R'
- Modify to generate target list from only Year-Rank information available in 'error_file.txt' produced by 'Inspect ScrapedFiles v5.R'
- Re-scrape songs using 'Scrape Ubiquitous Lyric Data v6.R'
-
506Is it possible to specify multiple separators (sep) for [r] function read.csv?4/15/2015 7:17 PMAs explained in this post, it is not possible in R without resorting to string parsing. This is achieved on SO here and alternatively with a custom function here.
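One way the string-parsing route might look, as a sketch rather than the approach from the linked posts: rewrite every accepted separator to a comma, then hand the text to read.csv. The helper name read_multi_sep, its seps pattern, and the sample lines are hypothetical, and the sketch assumes no field contains a comma or one of the separators.

```r
# Normalize several separators to a comma, then parse as ordinary CSV.
read_multi_sep <- function(lines, seps = "[;|]", ...) {
  normalized <- gsub(seps, ",", lines)
  read.csv(text = paste(normalized, collapse = "\n"), ...)
}

# Made-up input mixing ';' and '|' as separators.
raw <- c("year;rank|title", "1958;1|Song A", "1959;2|Song B")
songs <- read_multi_sep(raw, stringsAsFactors = FALSE)
print(songs)
```

For a file on disk, readLines(path) would supply the lines argument.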
-
507How to read only a subset of columns available in a csv using read.csv [r]?4/15/2015 7:42 PMYou can use the colClasses argument to read.csv to select the columns you want. In this case, you can set colClasses to c("NULL", NA, NA):
read.csv(file="result1", sep=" ", colClasses=c("NULL", NA, NA))
More generally, you can use colClasses to specify the particular types of columns; NA means to use the default approach, which is to try to figure out what the column is automatically. See the help page for read.csv for more details. -
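A self-contained illustration of the colClasses trick, reading from an in-memory string instead of a file. The column names and values are invented for the example.

```r
# Three columns in the source; "NULL" drops the first, NA lets R guess the rest.
csv_text <- "year,rank,ucount\n1958,1,102\n1958,2,87"
subset_cols <- read.csv(text = csv_text, colClasses = c("NULL", NA, NA))
print(names(subset_cols))
```

Note that "NULL" is the quoted string, not the NULL object.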
508Is shhhhhhh a word?4/15/2015 9:46 PM
-
509Bruce Willis had a singing career?4/18/2015 1:39 PMApparently yes, his song 'Respect Yourself' peaked on the Billboard charts at No. 5 in 1987.