Cleaning up filenames that contain non-ASCII characters

Using a code snippet to clean up filenames.

WordPress is gradually being adjusted so that non-English characters in file names are sanitized (converted to plain English letters) when files are uploaded. The reason for changing these characters is to make sure that the server can read and use the uploaded file.

Take the Norwegian letters Æ, Ø and Å as an example: Æ is changed to AE and Ø is changed to O. Å is not converted yet, but this will be added in WordPress 6.1, coming this fall.
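The conversion is essentially a character-for-character lookup. As a minimal sketch, the example below uses PHP's built-in strtr() with a tiny map covering only the letters mentioned above. The function name transliterate_norwegian() is hypothetical, the lowercase mappings are an assumption, and this is not WordPress's own remove_accents() table, which covers far more characters. Å is left out to match the behaviour described above.

```php
<?php
// Hypothetical sketch: a tiny transliteration map for the Norwegian letters
// discussed above. WordPress's remove_accents() uses a much larger table.
function transliterate_norwegian( string $name ): string {
	$map = array(
		'Æ' => 'AE',
		'Ø' => 'O',
		// Assumption: lowercase letters follow the same pattern.
		'æ' => 'ae',
		'ø' => 'o',
		// Å and å are deliberately not mapped, matching the text above.
	);

	// strtr() matches whole byte sequences, so multibyte UTF-8 keys work as expected.
	return strtr( $name, $map );
}

echo transliterate_norwegian( 'Ærlig Øl.jpg' ), PHP_EOL; // AErlig Ol.jpg
```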

Some examples.

File: ~My WordPress Upload~.jpg
Default WordPress: My-WordPress-Upload.jpg
Code snippet Solution: my-wordpress-upload.jpg

File: ÐÕçument full of $$$.pdf
Default WordPress: ÐÕçument-full-of-.pdf
Code snippet Solution: document-full-of.pdf

File: Really%20Ugly%20Filename-_-That_-_Is_Too Common…..png
Default WordPress: Really-Ugly-Filename-_-That_-_Is_Too-Common…..png
Code snippet Solution: really-ugly-filename-that-is-too-common.png

The sanitize PHP code snippet

Add it to your child theme's functions.php file or to a code snippets plugin.

/**
 * Produces cleaner filenames for uploads
 *
 * @param  string $filename
 * @return string
 */
function wpartisan_sanitize_file_name( $filename ) {

	$sanitized_filename = remove_accents( $filename ); // Convert accented characters to ASCII equivalents

	// Standard replacements
	$invalid = array(
		' '   => '-',
		'%20' => '-',
		'_'   => '-',
	);
	$sanitized_filename = str_replace( array_keys( $invalid ), array_values( $invalid ), $sanitized_filename );

	$sanitized_filename = preg_replace( '/[^A-Za-z0-9.\-]/', '', $sanitized_filename ); // Keep only letters, digits, hyphens and dots
	$sanitized_filename = preg_replace( '/\.(?=.*\.)/', '', $sanitized_filename ); // Remove all but the last dot
	$sanitized_filename = preg_replace( '/-+/', '-', $sanitized_filename ); // Collapse repeated hyphens into one
	$sanitized_filename = str_replace( '-.', '.', $sanitized_filename ); // Remove a hyphen directly before the extension dot
	$sanitized_filename = strtolower( $sanitized_filename ); // Lowercase

	return $sanitized_filename;
}

add_filter( 'sanitize_file_name', 'wpartisan_sanitize_file_name', 10, 1 );
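Because the function relies on WordPress (remove_accents() and add_filter()), it cannot run on its own. The standalone sketch below reproduces the same replacement steps so they can be tried outside WordPress. It swaps remove_accents() for a small hypothetical strtr() map covering only the accented characters in the examples above, and the name clean_filename_demo() is invented for illustration. The inputs are the "Default WordPress" results rather than the original file names, since the sanitize_file_name filter runs on the name WordPress has already sanitized.

```php
<?php
// Standalone re-creation of the snippet's steps, for testing outside WordPress.
// ASSUMPTION: remove_accents() is replaced by a tiny hypothetical map that only
// covers the accented characters used in the article's examples.
function clean_filename_demo( string $filename ): string {
	$accents = array( 'Ð' => 'D', 'Õ' => 'O', 'ç' => 'c', 'Æ' => 'AE', 'Ø' => 'O' );
	$name    = strtr( $filename, $accents );

	// Spaces, encoded spaces and underscores become hyphens.
	$name = str_replace( array( ' ', '%20', '_' ), '-', $name );
	// Drop every byte that is not a letter, digit, hyphen or dot.
	$name = preg_replace( '/[^A-Za-z0-9.\-]/', '', $name );
	// Remove every dot except the last one (the extension separator).
	$name = preg_replace( '/\.(?=.*\.)/', '', $name );
	// Collapse runs of hyphens into a single hyphen.
	$name = preg_replace( '/-+/', '-', $name );
	// Remove a hyphen left directly before the extension dot.
	$name = str_replace( '-.', '.', $name );

	return strtolower( $name );
}

// Inputs are the "Default WordPress" results from the examples above.
foreach ( array(
	'My-WordPress-Upload.jpg',
	'ÐÕçument-full-of-.pdf',
	'Really-Ugly-Filename-_-That_-_Is_Too-Common…..png',
) as $input ) {
	echo clean_filename_demo( $input ), PHP_EOL;
}
// Prints:
// my-wordpress-upload.jpg
// document-full-of.pdf
// really-ugly-filename-that-is-too-common.png
```

Without the /u modifier the character-class regex works byte by byte, so any multibyte character not covered by the map (such as the ellipsis in the third example) is simply stripped.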

Plugins that do a similar job:

https://wordpress.org/plugins/clean-image-filenames/
https://wordpress.org/plugins/sanitize-spanish-filenames/

This tutorial was originally published 2 July 2016.
