Linux – Shell scripts in bash download files from ftp servers

Shell scripts in bash to download files from FTP servers… here is a possible solution to the problem.


I have to write a bash shell script to transfer files from an FTP server:
FTP server: [email protected]
User: user1
Password: pass1

Now in /dir1/dir2 of the ftp server I have folders
of the following form

I have to copy the file “file1.iso” from the latest folder (the last one in the example above).
I also have to check the integrity of the file while copying: if the file is still being uploaded to the server when I start, the copy must not proceed, because it would be incomplete.

This has to run every 4 hours, which can be done by making it a cron job. Please help.
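For the every-4-hours schedule, a crontab entry along these lines would work (the script path shown is hypothetical; substitute your own):

```
# m  h      dom mon dow  command
  0  */4    *   *   *    /tempdir/pvmscript/ftpcopy.sh >> /tempdir/pvmscript/cron.log 2>&1
```

Add the line with `crontab -e`; `*/4` in the hour field runs the job at 00:00, 04:00, 08:00, and so on.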

What I have done so far:
I mounted the FTP server folder on my local machine. To check whether a file is fully uploaded, I sample its size five times at 50-second intervals; if the size stays the same I copy it, otherwise I give up and let the script try again when it reruns 4 hours later.
I maintain a text file, “foldernames.txt”, which contains the names of all the folders from which I have already copied the required files. To detect whether a new folder was added on the server, I check for its name in foldernames.txt.
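The already-copied check can be done with a quiet, fixed-string, whole-line grep, so that one folder name being a prefix of another does not produce a false match (the /tmp path here stands in for the real foldernames.txt location):

```shell
#!/bin/sh
# Stand-in for the real foldernames.txt list of copied folders.
list=/tmp/foldernames.txt
printf '%s\n' build_101 build_102 > "$list"

folder=build_10
# -q quiet, -x whole-line match, -F fixed string (no regex surprises
# from dots or dashes in folder names)
if grep -qxF "$folder" "$list"; then
    echo "already copied: $folder"
else
    echo "new folder: $folder"
fi
```

Here `build_10` is reported as new even though it is a prefix of `build_101`, which a plain `grep $folder` would have matched.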

Everything works fine; the only remaining question is: suppose a file is being downloaded and there is a network failure at that moment — how do I make sure I have fully downloaded the file? I tried using md5sum and cksum, but they take a long time to compute over the mounted folders. Please help.
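Rather than checksumming whole mounted folders, it is usually enough to checksum just the one file after copying and compare source and destination (the paths below are illustrative):

```shell
#!/bin/sh
# Create an illustrative source file and copy it, standing in for
# the real debug.iso transfer.
src=/tmp/src_debug.iso
dst=/tmp/dst_debug.iso
printf 'iso payload' > "$src"
cp "$src" "$dst"

# md5sum prints "<hash>  <name>"; keep only the hash for comparison.
sum_src=$(md5sum "$src" | cut -d ' ' -f 1)
sum_dst=$(md5sum "$dst" | cut -d ' ' -f 1)

if [ "$sum_src" = "$sum_dst" ]; then
    echo "copy verified"
else
    echo "copy corrupt"
fi
```

This keeps the expensive hash computation down to two passes over a single file instead of every file in the tree.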

Here is my script…

#!/bin/bash
# Paths used throughout the script.
log=/tempdir/pvmscript/scriptlog.log
list=/tempdir/pvmscript/foldernames.txt
src=/var/mountpt/pvm-vmware
dest=/tempdir/PVM_Builds

echo " ########### " >> "$log"
date >> "$log"
echo " script is starting " >> "$log"
touch "$list"
# changing the directory to the source location
cd "$src" || exit 1

# check the last five folders of the source location
for folder in $(ls -1 | tail -5); do
    echo " checking for the folder name in list " >> "$log"
    # skip the folder if its name is already in the stored list
    if grep -qxF "$folder" "$list"; then
        echo " the folder $folder is present in the list so will not copy it " >> "$log"
        continue
    fi
    echo " folder $folder is not present in the list so checking if it has the debug.iso file " >> "$log"
    # enter the new folder in the source
    cd "$src/$folder" || continue
    # if the folder has no debug.iso, record it and move on
    if [ ! -f debug.iso ]; then
        echo " it does not have the debug.iso file so leaving the directory " >> "$log"
        echo "$folder" >> "$list"
        continue
    fi
    echo " it has the debug.iso checking if upload is complete " >> "$log"
    # the upload is treated as complete if the size stays constant
    # over five checks taken 50 seconds apart
    size1=$(du -s debug.iso | cut -f 1)
    complete=true
    for x in 1 2 3 4 5; do
        sleep 50
        size2=$(du -s debug.iso | cut -f 1)
        if [ "$size1" -ne "$size2" ]; then
            echo " file is still in the process of being uploaded so exiting, will check after 4 hr " >> "$log"
            complete=false
            break
        fi
    done
    # if the size was constant, copy the file to the destination
    if $complete; then
        echo " upload was complete so copying the debug.iso file " >> "$log"
        cp debug.iso "$dest"/
        echo " writing the folder name to the list of folders which we have copied " >> "$log"
        echo "$folder" >> "$list"
        echo " copying is complete " >> "$log"
    fi
done


Here are some comments and requests for clarification; see the section further down for a possible answer.

(Happy to update this answer as you clarify your question.)

How big are these files?

Can you control when these files are created (for example, database backups)?

It would also help to know more details about these files: their size (MB, GB, TB, PB?) and their source (a db backup, or something else?).

Are your concerns theoretical — a proactive exploration of worst-case scenarios — or have you hit real problems? If the latter, how often do they occur and what are the consequences?

Is your SLA an impractical/unattainable management pipe dream? If so, you must start producing documentation showing that the current system will require X additional resources (personnel, hardware, programming, etc.) to correct its defects.

If the transferred file is a data file created by the source system, one technique is to have the source system create a “flag” file that is sent after the main file has been sent.

It might contain details like these:

  filename : TradeData_2012-04-13.dat
  recCount : 777777
  fileSize : 37604730291
  workOfDate: 2012-04-12
  md5sum    : ....

So your system waits to discover that the flag file has been delivered (you can rely on a standard naming convention for each file you receive, with a standard date stamp embedded in the file name). When the flag file arrives, your script computes each relevant detail for the data file and compares it with the value stored in the flag file.
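A minimal sketch of that comparison, assuming a flag file in the key : value format shown above (the file names and paths are made up for the example):

```shell
#!/bin/sh
# Build a sample data file and its matching flag file so the
# verification step has something to check against.
data=/tmp/TradeData.dat
flag=/tmp/TradeData.flag
printf 'rec1\nrec2\nrec3\n' > "$data"
cat > "$flag" <<EOF
fileSize : $(wc -c < "$data" | tr -d ' ')
recCount : 3
md5sum   : $(md5sum "$data" | cut -d ' ' -f 1)
EOF

# Pull each expected value out of the flag file...
want_size=$(awk -F': *' '/fileSize/ {print $2}' "$flag")
want_recs=$(awk -F': *' '/recCount/ {print $2}' "$flag")
want_md5=$(awk -F': *' '/md5sum/ {print $2}' "$flag")

# ...and compare against what actually arrived.
have_size=$(wc -c < "$data" | tr -d ' ')
have_recs=$(wc -l < "$data" | tr -d ' ')
have_md5=$(md5sum "$data" | cut -d ' ' -f 1)

if [ "$want_size" = "$have_size" ] &&
   [ "$want_recs" = "$have_recs" ] &&
   [ "$want_md5"  = "$have_md5"  ]; then
    echo "file verified, safe to load"
else
    echo "mismatch, do not load"
fi
```

In the real setup the flag file comes from the source system, so any truncation or corruption in transit shows up as a size, record-count, or checksum mismatch.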

If you can’t arrange that level of detail, you can at least test for new files after a generic flag file arrives (for a daily batch, one flag file sent after all the data files are complete), using a set of tests that make sense for your particular situation. Some possibilities:

  • The file must be at least X large
  • The file must have at least N records
  • The file can never be smaller than yesterday’s file
  • And so on
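Those generic tests can be scripted directly; here is a sketch with made-up thresholds, paths, and sample files:

```shell
#!/bin/sh
# Sample "today" and "yesterday" files for the illustration.
today=/tmp/feed_today.dat
yesterday=/tmp/feed_yesterday.dat
printf 'r1\nr2\nr3\nr4\n' > "$today"
printf 'r1\nr2\nr3\n'     > "$yesterday"

min_bytes=5        # "file must be at least X large" (made-up threshold)
min_records=2      # "file must have at least N records"

size_today=$(wc -c < "$today" | tr -d ' ')
size_yday=$(wc -c < "$yesterday" | tr -d ' ')
recs_today=$(wc -l < "$today" | tr -d ' ')

ok=true
[ "$size_today" -ge "$min_bytes" ]   || ok=false  # size floor
[ "$recs_today" -ge "$min_records" ] || ok=false  # record floor
[ "$size_today" -ge "$size_yday" ]   || ok=false  # never smaller than yesterday

if $ok; then
    echo "sanity checks passed"
else
    echo "sanity checks failed"
fi
```

Each check that fails just flips `ok`, so the script can report all the thresholds it uses in one place and the load step only runs when every test passed.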

Then your defense is: “We can’t have full control over these files, but we checked X, Y, and Z, and they passed those tests — that is why we loaded them.”

While rsync may be good, I don’t see how you can be sure it is safe to start loading a file in some of the cases mentioned, since rsync might still be appending more data to the file.

Reading through your script: if you can’t get a detailed flag file from the source, you are on the right track. Glenn Jackman’s solution aims at the same goal with less code. You could put it into a script file of its own and call it from a while loop that exits only when the check succeeds. You will probably also want some kind of notification when a run has already taken, say, 3× the normal time. Trying to cover all the conditions can get very complicated; there are third-party tools that manage file downloads, but we never had the budget to buy them, so I can’t recommend any.
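The retry-until-success idea can be sketched like this; `check_and_copy` stands in for whatever single-attempt transfer script you use (here it deliberately succeeds on the third call so the loop has something to retry), and the limits are made up:

```shell
#!/bin/sh
# Stand-in for the real single-attempt transfer script; it counts
# its calls in a file and succeeds on the third one.
attempt_file=/tmp/attempts
: > "$attempt_file"
check_and_copy() {
    echo x >> "$attempt_file"
    [ "$(wc -l < "$attempt_file" | tr -d ' ')" -ge 3 ]
}

max_tries=5
tries=0
until check_and_copy; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
        # In the real script this is where you would send a
        # notification (mail, log alert) that the download kept failing.
        echo "giving up after $max_tries tries"
        exit 1
    fi
    sleep 1    # the real script would wait much longer between tries
done
echo "transfer succeeded after $((tries + 1)) attempts"
```

The `until` loop retries only while the check fails, and the `max_tries` guard turns a permanent failure into a notification instead of an infinite loop.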


Hope this helps.

P.S. Welcome to StackOverflow (S.O.). Remember to read the FAQ, vote on good questions and answers using the gray triangles, and accept the answer, if any, that solves your problem by clicking the check mark.
