Sync

Sync is very useful when you import or modify files directly on the file system, e.g. via FTP. It compares the file info stored in the database (file size, modification time and MD5 checksum) with the actual files on disk.
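
The comparison can be pictured with a short Python sketch. WP-Filebase itself is written in PHP, so this is an illustration and not its code. `FileInfo`, `md5_of`, `read_file_info` and `differs` are hypothetical names, and the sketch assumes modification times are stored as whole seconds.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the file info kept in the database."""
    size: int    # bytes
    mtime: int   # modification time, whole seconds since the epoch
    md5: str     # hex digest


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so a large file never sits in memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def read_file_info(path: str) -> FileInfo:
    st = os.stat(path)
    return FileInfo(size=st.st_size, mtime=int(st.st_mtime), md5=md5_of(path))


def differs(stored: FileInfo, path: str) -> bool:
    """True if the file on disk no longer matches the stored info."""
    return read_file_info(path) != stored
```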

New files are added automatically, including ID3 tag detection and thumbnail generation. For files in sub-directories, Sync creates the category tree: for “folder1/subfolder/file.pdf” it creates the category “subfolder” and its parent category “folder1”.
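
A minimal sketch of how a relative path maps to a category chain, assuming categories are identified by their folder path so that two sub-folders with the same name under different parents stay distinct. `category_chain` is a hypothetical helper, not part of the plugin.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def category_chain(relative_path: str) -> list[tuple[str, str | None]]:
    """(category path, parent category path) pairs for the folders above a file, outermost first."""
    folders = PurePosixPath(relative_path).parent.parts
    chain: list[tuple[str, str | None]] = []
    parent = None
    for depth in range(1, len(folders) + 1):
        path = "/".join(folders[:depth])   # e.g. "folder1", then "folder1/subfolder"
        chain.append((path, parent))
        parent = path
    return chain
```

For “folder1/subfolder/file.pdf” this yields folder1 (no parent) followed by folder1/subfolder (parent folder1); a file in the root folder yields no categories.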

You can find the Sync tool on the WP-Filebase Dashboard:

[Screenshot: where to find the Sync button]

The Mechanism

Sync can be divided into multiple steps:

  1. First, the plugin checks for changed or deleted files. Normal sync only compares file sizes and modification times to detect changes. Hash sync compares MD5 hashes instead, which is more reliable but can take much longer. When a change is detected, the plugin updates the values in the database, including extended file info (ID3 tags).
  2. New files are added to the database. When adding new files, WP-Filebase looks for an image with the same basename in the same directory and uses it as the thumbnail, e.g. document.png for document.pdf. Images named folder.png are used as category icons.
  3. Remote Syncs are executed (WP-Filebase Pro only). See Remote Syncs.
  4. Finally, file tags are refreshed and categories are synchronized (updating file counts and checking for missing folders).
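
Steps 1 and 2 can be sketched in Python as follows. This is an illustration under assumptions, not the plugin’s code: `StoredFile` stands in for a database row, `hash_sync` switches between the two modes described in step 1, and a new file is simply a path that has no record yet.

```python
from __future__ import annotations

import hashlib
import os
from dataclasses import dataclass


@dataclass
class StoredFile:
    """Hypothetical database record for one file."""
    path: str
    size: int
    mtime: int
    md5: str


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(rec: StoredFile, hash_sync: bool) -> str:
    """Step 1: classify a stored file as 'deleted', 'changed' or 'unchanged'."""
    if not os.path.isfile(rec.path):
        return "deleted"
    if hash_sync:
        # Hash sync: read the whole file and compare MD5 (reliable, but slow for big files).
        return "changed" if md5_of(rec.path) != rec.md5 else "unchanged"
    # Normal sync: only size and modification time (cheap, a single stat() call).
    st = os.stat(rec.path)
    if st.st_size != rec.size or int(st.st_mtime) != rec.mtime:
        return "changed"
    return "unchanged"


def find_new(root: str, known: set[str]) -> list[str]:
    """Step 2: files under root that have no database record yet."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in known:
                found.append(path)
    return sorted(found)
```

The case worth noting is a file whose modification time changes while its content does not: normal sync reports it as changed, hash sync does not. Conversely, hash sync has to read every byte of every file, which is why it is so much slower on large collections.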

Thumbnails

You can simply add thumbnails to your files by uploading an image with the same name as the file. For example, mypdf.jpg will be used as the thumbnail for mypdf.pdf. Images with names like document-80x120.jpg are also used as thumbnails. Any image named ‘folder.jpg’, ‘folder.png’, ‘folder.gif’, ‘cover.jpg’, ‘_caticon.jpg’, ‘_caticon.png’ or ‘_caticon.gif’ will be treated as a category icon.
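
The naming rules above can be expressed as a small matching function. It is a sketch only: the set of image extensions, the preference for an exact basename over a sized name such as document-80x120.jpg, and the case-sensitive comparison are assumptions, not documented behaviour.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Assumption: image types considered for thumbnails (the page itself mentions .jpg, .png, .gif).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}

# Names listed on this page as category icons.
CATEGORY_ICON_NAMES = {
    "folder.jpg", "folder.png", "folder.gif", "cover.jpg",
    "_caticon.jpg", "_caticon.png", "_caticon.gif",
}


def is_category_icon(filename: str) -> bool:
    return filename in CATEGORY_ICON_NAMES


def thumbnail_for(file_name: str, images_in_dir: list[str]) -> str | None:
    """Choose a thumbnail for file_name among the images in the same directory."""
    stem = PurePosixPath(file_name).stem
    # Sized variant such as "document-80x120" (assumes a plain ASCII "x" in the file name).
    sized = re.compile(re.escape(stem) + r"-\d+x\d+")
    fallback = None
    for img in sorted(images_in_dir):
        p = PurePosixPath(img)
        if img == file_name or p.suffix.lower() not in IMAGE_EXTS or is_category_icon(p.name):
            continue
        if p.stem == stem:
            return img          # exact basename match, e.g. mypdf.jpg for mypdf.pdf
        if fallback is None and sized.fullmatch(p.stem):
            fallback = img      # sized match, used only if no exact match exists
    return fallback
```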

Troubleshooting

A complete, successful sync should end with the message:

 Filebase successfully synced.

If it does not, something went wrong. In most cases PHP has run out of memory, especially when adding a large number of files. To reduce memory usage, disable ID3 detection (WP-Filebase -> Settings -> Misc).

If you still have problems, append &debug=1 to the URL of the sync page (for example //wpfilebase.com/wp-admin/admin.php?page=wpfilebase_manage&action=sync&debug=1). The plugin will then output backtrace debug info in HTML comments. Right-click the page, choose View Source and scroll to the end of the document to find the last backtrace info. Copy this text and include it in your support request.
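
Two small helpers can save some clicking: one adds `debug=1` to a sync URL, the other pulls the last HTML comment out of a saved page source. Both are hypothetical conveniences. The page only says the backtrace is written into HTML comments, so the sketch assumes ordinary `<!-- ... -->` comments and that the last one on the page is the relevant one.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def with_debug(url: str) -> str:
    """Add debug=1 to a sync page URL, keeping the other query parameters in order."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "debug"]
    query.append(("debug", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def last_html_comment(html: str) -> str | None:
    """Text of the last <!-- ... --> comment in a saved page source, if any."""
    comments = COMMENT_RE.findall(html)
    return comments[-1].strip() if comments else None
```

Save the page source as sync.html and run `print(last_html_comment(open("sync.html", encoding="utf-8").read()))`. If the last comment turns out to come from the theme or WordPress itself, look at the ones before it.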

Batch Sync

WP-Filebase Pro has a batch sync mechanism that partitions the files to be added. The sync is executed over multiple HTTP requests, which reduces the overall memory usage and allows an unlimited number of files to be synced at once.

A normal sync would run out of memory after about 1000 files (with ID3 tag detection enabled; the exact number depends on the server’s operating system and configuration). Batch Sync detects when memory usage becomes critical and starts a new request if needed.
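
The page does not say how Batch Sync measures memory or what counts as critical. The following Python sketch shows only the partitioning idea, under stated assumptions: each file has an estimated memory cost (`cost`, hypothetical), each new request starts again from a fixed baseline, and usage above 80% of the limit (`critical_ratio`, an assumed value) counts as critical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def batches(files: Iterable[str], cost: Callable[[str], int], base_usage: int,
            limit: int, critical_ratio: float = 0.8) -> Iterator[list[str]]:
    """Split files into batches; each batch stands for one HTTP request.

    A new batch starts whenever adding the next file would push the estimated
    usage past critical_ratio * limit. A single file that exceeds the threshold
    on its own still gets a batch of its own rather than being skipped.
    """
    threshold = critical_ratio * limit
    batch: list[str] = []
    usage = base_usage
    for f in files:
        c = cost(f)
        if batch and usage + c > threshold:
            yield batch
            batch, usage = [], base_usage   # fresh request: memory starts from the baseline again
        batch.append(f)
        usage += c
    if batch:
        yield batch
```

With a 60 MiB baseline, a 100 MiB limit and 10 MiB per file, the threshold is 80 MiB, so five files are split into batches of two, two and one.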

18 thoughts on “Sync”

  1. Hi there! I have a question… On one site where I’m testing WP-Filebase to see if it meets my requirements, I had already uploaded all the files, around 126 GBytes or so. But there are not really many files, just 44 in total, each of them huge (mostly videos made from workshops). The problem is that I don’t get any progress from the “sync” feature.

    Even with debug on, all I get is:

    Synchronisation
    Starting sync. Memory usage: 59.8 MiB – Limit: 90.0 MiB

    Checking for file changes… done!

    Searching for new files…
    44 Files found, 36 new.

    and a bar saying 0% complete. It stays that way for hours.

    The debug file just says “INIT SYNC” every time I initiate it. There are no errors in the logs. Although this is on a shared server with limited memory, I see nothing that gives me a clue whether the server is timing out somehow, or hitting a block somewhere without throwing any errors…

    Nevertheless, it has worked for a while: after all, it did manage to find 8 files, tagged them correctly and so forth, and everything works well on the page where the tree view is placed with a shortcode, etc. So I know that the setup (permissions, etc.) is fine. Some of the files that were found on the first run are a couple of GBytes long, so WP-Filebase seems not to have any trouble with huge files (even if there are much bigger ones).

    My problem is that I get no feedback for the remaining 36 files, and no clue if WP-Filebase is actually doing anything or not.

    Short of directly hacking the database to “reveal” those files (something I’m not familiar with and would avoid if at all possible), is there anything else I can do to actually “see” whether WP-Filebase is doing something or not?

    Note that this is for an educational site where students who have missed a workshop or two are able to access a password-protected area where they have access to notes and I wish to give them access to whole videos or audio transcripts.

    1. After several refreshes, testing at different times of the day, I have so far managed to reach 8% 🙂. I turned ID3 detection off to see if that improved things, and separated all files into different folders/categories (to see if that helped). I think that now it’s just a question of being lucky until I get the whole process to complete at least once.

      It would be nice to get a bit more feedback on what’s failing, though.

  2. Some further feedback. To populate the database, WP-Filebase apparently calls md5sum externally. When this is invoked for huge files (several GBytes), it seems to fail with a “broken pipe” error, which is understandable. My hosting provider limits the amount of memory a script can consume, and it’s perfectly reasonable to expect that an application needing GBytes and GBytes of data will be killed sooner or later.

    While MD5 checksums are useful for people to confirm that they have indeed downloaded the correct file (and that it has no errors), would it be possible to turn Sync into a two-step process? The first run would just fill in all the required fields in the database (so at least everything gets properly registered), and the second step would compute the MD5 checksums and do all other work that requires huge amounts of memory or processing time. Ideally, of course, there would also be options to refresh/sync individual files.

    I’m sorry for all my comments here. As I said, I’m aware I’m on a relatively underpowered server, and these kinds of issues will only affect people who, like me, struggle every day with memory and running-time limits. But I’m eager to get this plugin working for my customer and see if they like it, so that I can upgrade to Pro as soon as they approve it. If I can’t get them to test the plugin, I will have no option but to try a different one, which would be a pity, considering that I have already spent so many hours with WP-Filebase struggling to get it working…

  3. All right, I can confirm that the main issue is not the plugin itself, but the running time of md5sum. For files over a few GBytes it takes so long that the pipe connection between the PHP process and the md5sum process times out, and Sync obviously fails.

    A temporary hack is to edit classes/Admin.php and comment out the body of GetFileHash(), returning a fake value instead. This allowed me to import all files in a few seconds (11 seconds, to be precise).

    Obviously it means that the file hash is now useless. My next step is to go through the whole database and manually add the file hashes.

    The issue with the timeouts, and with the exec call hanging Apache/PHP, is a tricky one. From what I’ve gathered, a (possible) solution would be to use proc_open() or the simpler popen() instead. These allow processes to be spawned asynchronously without PHP waiting for them. Of course this requires a big change to the code! Essentially, there would be a loop gathering all filenames and relevant data, and asynchronous processes would be launched in batches, one per file (it’s not a good idea to launch hundreds of them at the same time, of course!), updating the database as soon as the hashes are known. So, in theory at least, the whole database would be populated with the filenames in a few seconds, while the hash processing might take hours or days, but would eventually finish.

    Changing the code for implementing that asynchronous behaviour is utterly beyond me, I’m sorry.

    1. Fabian says:

      Hello Gwyneth,

      Good points. Making the sync implementation asynchronous would be a quite efficient and advanced approach. As you said, it is a tough coding challenge, though: since PHP is not designed for asynchronous tasks, many workarounds would be needed.

      I think splitting the sync into two tasks is the best solution: first populate the database with file names, then run a cron-like task in the background that calculates MD5 hashes and reads ID3 data from the files synchronously.

      I’ve put this on the to-do list; however, I’m quite busy with other features, like bulk upload/edit, so it might take some weeks until I can give it a try.

      Regards
      Fabian

      1. Thanks, Fabian 🙂 That’s all right. As I said, my workaround works well enough for me right now: just edit GetFileHash() to return something fake, and add the MD5 sums manually by computing them on the console. It definitely works for me.

        I thought that a possible solution would be “on-demand” MD5 checksumming, i.e. the file gets MD5’d automatically while it is being uploaded (by Flash/Java/JavaScript running locally, not server-side). While this alternative approach might be a bit tricky, apparently a few people are developing things like that:

        http://stackoverflow.com/questions/4188451/multiple-file-upload-with-md5-check-before-upload

        http://stackoverflow.com/questions/768268/how-to-calculate-md5-hash-of-a-file-using-javascript

        But this is a bit tricky in terms of security, since fake hashes might be uploaded, or something like that, and it’s better to let the server calculate them after the file is fully received.
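
The two-phase idea discussed in this thread (register all files quickly, then compute the hashes later with only a few workers at a time) can be sketched in Python. This is not WP-Filebase code: a plain dict stands in for the database, and Python’s hashlib replaces the external md5sum call, reading files in chunks so memory stays flat.

```python
from __future__ import annotations

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def register_files(root: str, db: dict[str, dict]) -> list[str]:
    """Phase 1: record every new file with its size right away, leaving the hash empty."""
    added = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if path not in db:
                db[path] = {"size": os.path.getsize(path), "md5": None}
                added.append(path)
    return added


def fill_hashes(db: dict[str, dict], max_workers: int = 2) -> None:
    """Phase 2: hash pending files with a bounded number of workers."""
    pending = [p for p, rec in db.items() if rec["md5"] is None]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(md5_of, p): p for p in pending}
        for fut in as_completed(futures):
            db[futures[fut]]["md5"] = fut.result()   # update each record as soon as its hash is known
```

`register_files` finishes after one directory walk, however large the files are; `fill_hashes` can then run in the background or from a scheduled job, and any records it has not reached yet simply keep `md5` set to None.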

  4. Raul Pessoa says:

    Hello,
    is there any way to trigger the sync process more often than once an hour? I would really appreciate it if we could sync more regularly…

    Thank you,
    Raul

    1. Fabian says:

      Hello Raul,

      Next to the Sync button there is a URL for an external cron service. You can find cron job services via Google.

      Regards
      Fabian

  5. Raul Pessoa says:

    Hi Fabian,

    I’m sorry, but I don’t see any URL for an external cron service… We are thinking of purchasing the Pro version, but this would be an important factor in our decision.

    Thank you,
    Raul

    1. Fabian says:

      Hi,
      this is only available in Pro; please check the demo at http://demo.wpfilebase.com/

      Regards

  6. Hi,

    I purchased the Pro version… Is it possible to have the files sync without having to manually push the “Sync Filebase” in the admin area?

    I have added files via FTP and manually synced them so the files display. However, my client needs to be able to add files via FTP and have them appear automatically, without anyone having to press “Sync Filebase” manually. The WP-Filebase admin page says “Last cron sync on October 3, 2013 at 2:49 pm.”, but a file that I added yesterday is not showing up on the client’s website. Please help!

    Thanks,
    tyra.nicole

  7. victor cosentino says:

    Hi Fabian

    I am having problems searching PDFs with the Pro version. There are over 400 documents on my site, almost all of them PDFs. I have Ghostscript 9.10 installed and have run the rescan more than once, but searches with your search widget do not return hits on the PDF contents; only the manually entered metadata is searched. Some of the PDFs are simple scans of hardcopy documents, so I do not expect those to be searchable, but others are PDFs that can be searched in any PDF viewer.

    Other than running the rescan, are there any other steps I need to take to make the plugin’s search widget search these PDFs? In the settings, I have checked the following options that might affect searching: “Search Integration,” “Generate PDF thumbnails,” “Search ID3 Tags,” and “Content Keywords.” In breaking down the problem, I want to make sure Ghostscript is working during the rescan. Is there a way to see whether the rescan has extracted any searchable information from the PDFs? I can’t find any way to see what the rescan did. Any advice or help would be appreciated. Thanks,

    Victor

    1. Fabian says:

      Hi Victor,

      Below the form when editing a file, you’ll find the “File Info Tags” box. There you can see which keywords are used for search.
      These keywords are automatically searched with the widget, no special configuration required.

      Regards
      Fabian

      1. victor cosentino says:

        Hi Fabian,

        Thanks for the information. Using that I was able to determine that the widget is correctly searching the keywords. The problem seems to be that ghostscript is not generating keywords for every pdf. Some of them have text that can be selected in a viewer so the keyword should be extractable, but ghostscript isn’t extracting it. I am still looking into that but I’m not sure what to do if it is a limitation of ghostscript.

        Victor
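
Independently of WP-Filebase, one way to check whether Ghostscript can extract text from a given PDF is to run its txtwrite output device on that file. The sketch below wraps that command in Python; it assumes `gs` is on the PATH and does not claim this is the exact command the plugin runs.

```python
from __future__ import annotations

import shutil
import subprocess


def gs_text_command(pdf_path: str, gs: str = "gs") -> list[str]:
    """Ghostscript command line that writes the PDF's text layer to stdout."""
    return [gs, "-q", "-dSAFER", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=txtwrite", "-sOutputFile=-", pdf_path]


def extract_text(pdf_path: str, gs: str | None = None) -> str | None:
    """Return the extracted text, or None if Ghostscript is missing or fails."""
    gs = gs or shutil.which("gs")   # on Windows the binary is usually named gswin64c
    if gs is None:
        return None
    result = subprocess.run(gs_text_command(pdf_path, gs),
                            capture_output=True, text=True, errors="replace", timeout=300)
    return result.stdout if result.returncode == 0 else None
```

If `extract_text("manual.pdf")` returns only whitespace for a PDF whose text can be selected in a viewer, Ghostscript itself is not finding the text layer, which would point to a Ghostscript limitation for that file rather than to the search widget.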

  8. John Henry says:

    Is it possible to include a file (txt?) on FTP uploads that will contain file information for other items such as image files?

    1. John Henry says:

      Images were a bad example, since they already carry metadata such as EXIF. I guess I should ask about .zip files especially.

  9. Ryan Moody says:

    Can someone please tell me how to enable the “Batch Sync” feature of WP-Filebase Pro?

    Thanks,

    Ryan
