The R programming language


The R programming language has become one of the standard tools for statistical data analysis and visualization, and is widely used by Google and many others. The language includes extensive support for working with vectors of integers, numerics (doubles), and many other types, but has lacked support for 64-bit integers. Romain Francois has recently uploaded the int64 package to CRAN as well as updated versions of the Rcpp and RProtobuf packages to make use of this package. Inside Google, this is important when interacting with other engineering systems such as Dremel and Protocol Buffers, where our engineers and quantitative analysts often need to read in 64-bit quantities from a datastore and perform statistical analysis inside R.

Romain has taken the approach of storing int64 vectors as S4 objects with a pair of R’s default 32-bit integers to store the high and low-order bits. Almost all of the standard arithmetic operations built into the R language have been extended to work with this new class. The design is such that the necessary bit-artihmetic is done behind the scenes in high-performance C++ code, but the higher-level R functions work transparently. This means, for example, that you can:

• Perform arithmetic operations between 64-bit operands or between int64 objects and integer or numeric types in R.
• Read and write CSV files including 64-bit values by specifying int64 as a colClasses argument to read.csv and write.csv (with int64 version 1.1).
• Load and save 64-bit types with the built-in serialization methods of R.
• Compute summary statistics of int64 vectors, such as max, min, range, sum, and the other standard R functions in the Summary Group Generic

For even higher levels of precision, there is also the venerable and powerful GNU Multiple Precision Arithmetic Library and the R GMP package on CRAN, although Romain’s new int64 package is a better fit for the 64-bit case.

We’ve had to work around the lack of 64-bit integers in R for several years at Google. And after several discussions with Romain, we were very happy to be able to fund his development of this package to solve the problem not just for us, but for the broader open-source community as well.

YouTube: More Ways to Find What You’re Looking For

We’ve got some exciting additions to the list of supported search parameters for YouTube feeds that should make it easier to narrow down your search results to exactly the videos you’re looking for. Each of these search parameters has an accompanying element in a video entry’s metadata, which we’ll cover as well. Here’s a quick rundown:

  • license – This parameter lets you filter search results based on whether they’re Creative Commons licensed (license=cc) or use the standard YouTube license (license=youtube). The default behavior is to return videos regardless of their license in search results. The license for a given video entry is reflected in its element.
  • hd – This one lets you request videos that have high-resolution versions available. If you specify hd (no value is needed), all the videos in your search results will be available for playback in at least 720p, and higher resolutions, like 1080p, might be available, too. If you leave the parameter out, then search results won’t be filtered at all based on resolution. The element corresponds to this search parameter.
  • duration – If you cater to an audience with a short attention span, then this parameter is for you. This parameter lets you filter search results based on video length. To find videos less than 4 minutes long, use duration=short. To find videos that are between 4 and 20 minutes long (inclusive), use duration=medium. Only videos that are longer than 20 minutes will be returning when requesting duration=long. The element in a video entry provides a video’s exact runtime.
  • 3d – Finally, for those of you living in the future who want to find 3D content on YouTube, this aptly-named parameter is for you. Adding 3d (no value is needed) to your searches will ensure that all videos you get back are available for viewing in 3D. Videos that are available in 3D will have a element in them, and that element will contain more detail about the nature of the 3D content in the given video.

Putting it all together, let’s say you want to use the API to find Creative Commons-licensed 3D YouTube videos that are available in resolutions of 720p and above and are longer than 20 minutes.The following request URL will return a feed of such videos:

https://gdata.youtube.com/feeds/api/videos?prettyprint=true&v=2&license=cc&hd&duration=long&3d

Gmail Inbox Feed with .NET and OAuth

Gmail servers support the standard IMAP and POP protocols for email retrieval but sometimes you only need to know whether there are any new messages in your inbox. Using any of the two protocols mentioned above may seem like an overkill in this scenario and that’s why Gmail also exposes a read only feed called Gmail Inbox Feed which you can subscribe to and get the list of unread messages in your inbox.

The Gmail Inbox Feed is easily accessible by pointing your browser to https://mail.google.com/mail/feed/atom and authenticating with your username and password if you are not already logged in.

Using basic authentication to access the inbox feed doesn’t provide a very good user experience if we want delegated access. In that case, we should instead rely on the OAuth authorization standard, which is fully supported by the Gmail Inbox Feed.

OAuth supports two different flows. With 3-legged OAuth, an user can give access to a resource he owns to a third-party application without sharing his credentials. The 2-legged flow, instead, resembles a client-server scenario where the application is granted domain-wide access to a resource.

Let’s write a .NET application that uses 2-legged OAuth to access the Gmail Inbox Feed for a user in the domain and list unread emails. This authorization mechanism also suits Google Apps Marketplace developers who want to add inbox notifications to their applications.

There is no dedicated client library for this task and the Inbox Feed is not based on the Google Data API protocol but we’ll still use the .NET library for Google Data APIs for its OAuth implementation.

Step-by-Step

First, create a new C# project and add a reference to the Google.GData.Client.dll released with the client library. Then add the following using directives to your code:

using System;
using System.Linq;
using System.Net;
using System.Net.Mail;
using System.Xml;
using System.Xml.Linq;
using Google.GData.Client;

The next step is to use 2-legged OAuth to send an authorized GET request to the feed URL. In order to do this, we need our domain’s consumer key and secret and the username of the user account we want to access the inbox feed for.

string CONSUMER_KEY = "mydomain.com";
string CONSUMER_SECRET = "my_consumer_secret";
string TARGET_USER = "test_user";

OAuth2LeggedAuthenticator auth = new OAuth2LeggedAuthenticator("GmailFeedReader", CONSUMER_KEY, CONSUMER_SECRET
, TARGET_USER, CONSUMER_KEY);
HttpWebRequest request = auth.CreateHttpWebRequest("GET", new Uri("https://mail.google.com/mail/feed/atom/"));
HttpWebResponse response = request.GetResponse() as HttpWebResponse;

The response is going to be a standard Atom 0.3 feed, i.e. an xml document that we can load into an XDocument using the standard XmlReader class:

XmlReader reader = XmlReader.Create(response.GetResponseStream());
XDocument xdoc = XDocument.Load(reader);
XNamespace xmlns = "http://purl.org/atom/ns#";

All the parsing can be done with a single LINQ to XML instruction, which iterates the entries and instantiates a new MailMessage object for each email, setting its Subject, Body and From properties with the corresponding values in the feed:

var messages = from entry in xdoc.Descendants(xmlns + "entry")
               from author in entry.Descendants(xmlns + "author")
               select new MailMessage() {
                   Subject = entry.Element(xmlns + "title").Value,
                   Body = entry.Element(xmlns + "summary").Value,
                   From = new MailAddress(
                       author.Element(xmlns + "email").Value,
                       author.Element(xmlns + "name").Value)
               };

At this point, messages will contain a collection of MailMessage instances that we can process or simply dump to the console as in the following snippet:

Console.WriteLine("Number of messages: " + messages.Count());
foreach (MailMessage entry in messages) {
    Console.WriteLine();
    Console.WriteLine("Subject: " + entry.Subject);
    Console.WriteLine("Summary: " + entry.Body);
    Console.WriteLine("Author: " + entry.From);
}

If you have any questions about how to use the Google Data APIs .NET Client Library to access the Gmail Inbox Feed, please post them in the client library discussion group.