We are currently running some migration tests between our Drupal 7 website (https://atlas.cern/) and our Drupal 8 website (atlas-public-d8.web.cern.ch/).
On D7, we use the Views Datasource module to expose a view as JSON. On D8, we then use Feeds, Feeds extensible parsers, Tamper and Feeds Tamper to fetch the data, clean it, and map it to create nodes.
It seems that the pictures we uploaded locally don’t have relative paths; instead we find absolute links such as:
<img alt="Plots or Distributions,Physics,ATLAS" src="https://atlas-public-dev.web.cern.ch/sites/atlas-public-dev.web.cern.ch/files/field/image/ATLAS-chargeasymmetry-figure1.png" style="width: 1892px; height: 1415px;" />
On D7, local pictures are located under /files/files. Our idea was to copy this folder under /files in D8 and then replace the current links with relative links pointing to this folder.
Is this something you would recommend? If so, how would you recommend doing it?
Thank you very much for your help.
It seems these images were uploaded through a file field, but they are used inside the body field, which is why they have absolute paths.
I guess the only way to sort this out is to apply a rewrite on the body field that uses search/replace to adapt the URLs to your needs on the new site.
This module can help: https://www.drupal.org/project/views_regex_rewrite
Another possibility is to export the JSON as a file, perform the search/replace offline, and use the result as input for the feed importer (I reckon you can upload a file directly instead of using a URL?).
Note: in any case, in terms of file management, you should really make sure that the files are imported through file fields and not just copied directly onto the filesystem. That way Drupal manages them, and deletes or exports them when the nodes are deleted, exported, etc.
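A minimal sketch of that offline search/replace, in Python. It assumes the JSON export has been saved to a file and read into a string; the function names, the example record layout and the target prefix are hypothetical, and only the source prefix comes from the image link quoted above. It is an illustration under those assumptions, not a tested part of the migration.

```python
import json


def rewrite_urls(value, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in every string of a decoded JSON value."""
    if isinstance(value, str):
        return value.replace(old_prefix, new_prefix)
    if isinstance(value, list):
        return [rewrite_urls(item, old_prefix, new_prefix) for item in value]
    if isinstance(value, dict):
        return {key: rewrite_urls(item, old_prefix, new_prefix)
                for key, item in value.items()}
    # Numbers, booleans and null are returned unchanged.
    return value


def rewrite_export(json_text, old_prefix, new_prefix):
    """Decode the export, rewrite the URLs, and encode it again for the importer."""
    data = json.loads(json_text)
    rewritten = rewrite_urls(data, old_prefix, new_prefix)
    return json.dumps(rewritten, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical record layout; the real export may be structured differently.
    sample = json.dumps({"nodes": [{"node": {
        "title": "Example",
        "body": '<img src="https://atlas-public-dev.web.cern.ch/sites/'
                'atlas-public-dev.web.cern.ch/files/field/image/example.png" />',
    }}]})
    old = "https://atlas-public-dev.web.cern.ch/sites/atlas-public-dev.web.cern.ch/files/"
    new = "/files/"  # assumption: the relative location chosen on the D8 site
    print(rewrite_export(sample, old, new))
```

The replacement is done on the decoded data rather than on the raw file text because PHP's json_encode escapes forward slashes as \/ by default, so a plain text search for the URL could silently miss every occurrence.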
The part about getting a file field exposed by URL to be imported into a file field by a feed importer is covered in the post I wrote about exporting/importing content from D7 to D8: How To Export Content From D7 And Import It In D8 Using Views Data Export / Feeds Import
I just added an update to it today.
Thank you very much for your answer.
As you noticed, these images were inserted manually in a body field. I’ve tried manually changing the URL of a picture in a JSON file and importing it with Feeds, but I still get the same issue. From what I understood of your post, you don’t recommend manually copying the pictures into the target file system because Drupal won’t keep track of them. How would you make Drupal download the pictures referenced inside a body field?
When cloning a website, it seems that all the links are changed automatically, even in the body fields. Do they somehow rebuild the map that keeps track of the files Drupal has?
There are two different issues here.
First, your ‘hacked’ JSON file gets imported, but the URLs of the images are somehow changed during the import, so you again end up with ‘problematic’ URLs afterwards. Did I understand correctly? Can you post some code or screenshots so I can take a look at what might be going on? Thanks!
Second, the images themselves are not part of any of Drupal’s native file fields; they are only referenced inside body fields.
This is only problematic in the sense that these files are orphaned: Drupal will not delete them when you delete the nodes referencing them, nor include them in an export. Also, as you pointed out, since the URLs are absolute, the cloning mechanism set up as part of the Drupal infrastructure does a ‘magic trick’: when it identifies the source site’s name inside the content, it automatically replaces it with the target site’s name. This is pretty clever, since normally when you clone a site the filesystem is cloned as well.
My recommendation is: create a proper Image field on your target content type, and make the feed importer import the source image into this field instead. You can then still reference it in the body field, but at least the image will be part of that node.
PS: Of course, this can only work if the original content type also uses image fields to store these images. Otherwise, I guess you can only patch the URLs and own the technical debt.
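For the case in the PS, where the images only exist inline in the body, one possible pre-processing step is to pull the image URLs out of the body HTML offline, so they can be written into a separate key of the JSON for the feed importer to map onto an Image field. This was not tried in this thread; the sketch below only shows the extraction, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(body_html):
    """Return the image URLs found in a body field, in document order."""
    collector = ImgSrcCollector()
    collector.feed(body_html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    body = ('<p>Some text</p><img alt="Plots or Distributions,Physics,ATLAS" '
            'src="https://atlas-public-dev.web.cern.ch/sites/'
            'atlas-public-dev.web.cern.ch/files/field/image/'
            'ATLAS-chargeasymmetry-figure1.png" '
            'style="width: 1892px; height: 1415px;" />')
    for url in image_urls(body):
        print(url)
```

The body would still contain the original img tags after such an import, so the URLs inside it would need patching as well.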