Migrating social media data with the Data Transfer Project
New Address
The Data Transfer Project wants to make it easier to move your data between social media sites.
Data portability and transparency are ongoing issues that plague all major social media giants. Who owns the data you post to your social media accounts? Can you get a copy of the data if you ask for it? If you had a copy, what could you do with it?
Many leading social media companies have APIs that let you extract and upload data, but the data formats tend to be dissimilar and proprietary, which means that even if you obtain the data, you can't do much with it unless you are a programmer with plenty of time for personal coding.
Back in 2018, a few leading social media companies pledged to make an effort at addressing this problem. The result is the Data Transfer Project, which was recently rebranded and expanded as the Data Transfer Initiative [1]. The mission of the Data Transfer Project (and Initiative) is to support a common neutral format for social media data as it passes from one platform to another, as well as to provide the tools necessary for transforming data in and out of that format.
In the long term, the goal is for the user to be free from direct involvement in the migration. A user who wishes to move data from one platform (say Twitter) to another platform (say Facebook) will simply choose an option in the Facebook user interface, and the migration will happen automatically. According to the Data Transfer Project developers, the purpose of the project is to "extend data portability beyond a user's ability to download a copy of their data from their service provider ('provider'), to providing the user the ability to initiate direct transfer of their data into and out of any participating provider" [2].
The project was originally started by Google, Apple, Meta, Microsoft, Twitter, and SmugMug (Figure 1), but other companies are invited to join in. The Data Transfer Project is still a work in progress, but some of the code is available on GitHub [3] (Figure 2), and the developers provide API keys for the participating services for those who are interested in testing [4].
How It Works
According to Meta's Engineering Blog [5], the Data Transfer Project consists of three main components:
- a set of shared data models to represent each vertical (i.e., photos, contacts, playlists),
- adapters, which handle the authentication of a user to a service (normally OAuth) and the transformation of data to and from the shared data models (importers and exporters),
- and a task management framework, which puts all the pieces together and handles the life cycle of a transfer job, including job creation and running the transfer.
The data models provide the neutral format needed for transferring data to or from the participating platforms. One of the guiding principles of the project is that vendors should not have to rewrite their APIs. Instead, each vendor provides adapters that transform data from the platform's own format to the neutral format and transform neutral data into the format the platform needs for import. The adapters basically serve as extensions of the API. Without something like DTP, a vendor would have to create a separate migration solution for every other platform. With DTP, the vendor just has to write one adapter to import data from the neutral format and one to export data into the neutral format.
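The adapter idea can be illustrated with a minimal Python sketch. This is not the project's actual code: the `Photo` model, the `ServiceAExporter` and `ServiceBImporter` classes, the native field names, and the `run_transfer` function are all hypothetical names made up for illustration. The sketch only shows how a shared neutral model lets two services exchange data without knowing each other's formats.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Photo:
    """Hypothetical neutral model for the photos vertical."""
    title: str
    url: str


class Exporter(Protocol):
    def export(self) -> list[Photo]: ...


class Importer(Protocol):
    def import_items(self, photos: list[Photo]) -> int: ...


class ServiceAExporter:
    """Turns service A's made-up native records into neutral Photo objects."""

    def __init__(self, native_records: list[dict]):
        self._records = native_records

    def export(self) -> list[Photo]:
        return [Photo(title=r["caption"], url=r["src"]) for r in self._records]


class ServiceBImporter:
    """Turns neutral Photo objects into service B's made-up native records."""

    def __init__(self) -> None:
        self.stored: list[dict] = []

    def import_items(self, photos: list[Photo]) -> int:
        for photo in photos:
            self.stored.append({"name": photo.title, "link": photo.url})
        return len(photos)


def run_transfer(exporter: Exporter, importer: Importer) -> int:
    """Stand-in for the task management layer: export, then import."""
    return importer.import_items(exporter.export())


source = ServiceAExporter([{"caption": "Beach", "src": "https://a.example/1.jpg"}])
target = ServiceBImporter()
count = run_transfer(source, target)
```

In this sketch, neither adapter refers to the other service. Adding a third service would mean writing one more exporter and one more importer rather than a converter for every pair of platforms.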
Of course, you can't migrate the data unless you have access to it. In addition to the data adapters, the project uses authentication adapters that allow the requesting service to access the originating service. According to the DTP documentation, "OAuth is likely to be the choice for most providers, however the DTP is agnostic to the type of authentication."
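What "agnostic to the type of authentication" could mean in practice is sketched below in Python. All names here (`AuthData`, `AuthAdapter`, `FakeOAuthAdapter`, `FakeApiKeyAdapter`, `start_export`) are assumptions invented for this example, and no real OAuth flow or network call takes place. The point is only that the calling code depends on a common interface and not on a particular scheme.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AuthData:
    """Hypothetical credentials handed to an exporter or importer."""
    scheme: str
    secret: str


class AuthAdapter(Protocol):
    def authorize(self, user_id: str) -> AuthData: ...


class FakeOAuthAdapter:
    """Pretends to complete an OAuth flow; no network calls are made."""

    def authorize(self, user_id: str) -> AuthData:
        return AuthData(scheme="oauth2", secret=f"token-for-{user_id}")


class FakeApiKeyAdapter:
    """A different scheme, to show that the caller does not care which is used."""

    def authorize(self, user_id: str) -> AuthData:
        return AuthData(scheme="api-key", secret=f"key-for-{user_id}")


def start_export(auth: AuthAdapter, user_id: str) -> str:
    """Obtains credentials through whichever adapter was supplied."""
    creds = auth.authorize(user_id)
    return f"exporting for {user_id} using {creds.scheme}"
```

Calling `start_export(FakeOAuthAdapter(), "alice")` and `start_export(FakeApiKeyAdapter(), "alice")` runs the same code path with different credentials.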
Why and Why Now?
If you've ever tried to download your information from major providers like Facebook, you'll know that they already have an online tool to download your account data, so you might wonder why the DTP is necessary.
The reason is that the data you download isn't specifically designed to be interoperable with other services. Facebook has addressed this to some extent by allowing you to download your account data in JSON rather than HTML format, which makes it easier to upload to another service [6], but it still cautions that the files are for your personal use (Figure 3).
New regulations like GDPR (General Data Protection Regulation) in Europe also require companies to provide all data they hold on their customers. In the case of websites like Facebook, however, the data download doesn't necessarily include information like location data, facial recognition, links to friends' profiles, and so on.
As Congressman David Cicilline pointed out around the time of the Cambridge Analytica scandal, smaller companies will only become competitive with Facebook if it's easy for existing users to transfer all their data elsewhere, including HTML links to friends' profiles [7].
Open Data and Open Source
In their announcements, both Facebook and Google play up the open source nature of the data transfer tools. There are public, open source extensions that allow the Data Transfer Project to be run on the Google Cloud Platform and Microsoft Azure. But the overall goal of the project is to support open data – not necessarily open source. An open source tool that supports migration to a proprietary format or a closed source service running in the cloud is not an ideal case study in free software. Because the software running on these platforms is proprietary, there's no way to be certain every part of your user data has been copied to your destination platform in a secure way.
Kevin Bankston, Director of the Open Technology Institute, has urged tech companies to go further [8]. Although he is optimistic about companies like Twitter and Facebook that offer downloads of account data in JSON format, he pointed out in 2018 that "Social networks should consider using the Activity Streams 2.0 open standard [9], a particular JSON-based format for exporting social media items. Facebook helped develop the standard at the World Wide Web Consortium, but right now only decentralized social network tools like Mastodon use it." (Since the time of writing, Twitter has also adopted the Activity Streams format [10].)
Bankston also suggests that tech giants could sidestep the data transfer issue by making their platforms more interoperable. He points out that Meta's Developer Policy [11], for example, makes it extremely difficult to create an app that makes full use of the API and replicates Facebook's core functionality like instant messaging.
Facebook also famously dropped support for the open XMPP standard for messaging in 2015 in favor of its own proprietary standards [12]. If Facebook and other platforms were to agree to adopt XMPP (Figure 4), a Facebook user could view contacts, exchange messages, and make video calls with users of other services like Skype or Google Chat without moving their data.
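To give a sense of what an Activity Streams 2.0 export looks like, the following Python sketch wraps a single post in a `Create` activity containing a `Note` object. The `@context` URL and the property names come from the W3C vocabulary; the helper function `to_activity`, the actor URL, and the sample post are made up for illustration and do not reflect how any particular platform builds its exports.

```python
import json


def to_activity(actor_url: str, text: str, published: str) -> dict:
    """Wraps one post in an Activity Streams 2.0 Create activity."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor_url,
        "published": published,
        "object": {
            "type": "Note",
            "attributedTo": actor_url,
            "content": text,
            "published": published,
        },
    }


activity = to_activity(
    "https://social.example/users/alice",  # hypothetical actor URL
    "Hello from my old account",
    "2018-07-20T12:00:00Z",
)
exported = json.dumps(activity, indent=2)
```

Because the result is plain JSON with agreed-upon property names, any service that understands the vocabulary can read the post without knowing which platform produced it.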