Open a file with more than 1048576 rows in Excel/Openoffice

Hi,

I sometimes work (sort, filter, create pivot tables, etc) with CSV files that have a few million rows.

Since Excel & Open office don't support more than 1048576 rows I have to first upload the CSVs into a database and then work on them - which isn't very convenient as I have to spend extra time on uploading the data. Depending on the file size this sometimes takes an hour or so. ๐Ÿ˜”

I'd like to avoid splitting the CSV into smaller files.

Has anyone faced a similar problem before? Are there any other solutions - perhaps there are tools I'm unaware of?

Also, slightly off topic, why is there a limitation on the number of rows? Earlier versions of office supported about 65k rows, now they've "upgraded" to a little over 1 million rows. ๐Ÿ˜จ

Replies

  • Manish Goyal
    Manish Goyal
    did you check this discussion?

    #-Link-Snipped-#
  • Kaustubh Katdare
    Kaustubh Katdare
    MS Access, perhaps? If that's not a solution; I'd rather setup a local database with phpMyAdmin (XAMPP or MAMP) and simply import all my data into it. It'd make all the operations a breeze. What say?
  • simplycoder
    simplycoder
    If Microsoft stack then perhaps it can be done using SSIS packages.
  • Aashish Joshi
    Aashish Joshi
    I did try importing the csv file "as-is" into Oracle 11g Express Edition using the Application Express UI. That failed miserably - 6 hours and not a single row imported! ๐Ÿ˜”

    Then I had a light-bulb moment! ๐Ÿ˜

    I wrote a small Perl script to read the csv and add the columns that were of interest to me into mysql using "Insert Delayed". This worked like a charm ๐Ÿ˜€
  • Kaustubh Katdare
    Kaustubh Katdare
    Aashish Joshi
    I wrote a small Perl script to read the csv and add the columns that were of interest to me into mysql using "Insert Delayed". This worked like a charm
    Sharing is caring, my friend.
  • Aashish Joshi
    Aashish Joshi
    Here's the perl script I used. Hope it helps someone ๐Ÿ˜€

    It will read a csv line-by-line and and add the values to an Oracle database. With minor modifications the same script can be used for MySQL and other databases as well.

    The script ignores the first line that usually contains the column names.

    use strict;
    use warnings;
    use Text::CSV;
    use DBI;
    use MIME::Lite;
    
    #open the csv file
    open my $fh,"<",'sample.csv';
    
    my $csv = Text::CSV->new({binary => 1, auto_diag => 1, allow_whitespace => 1});
    
    #skip the first line that contains the headers
    my $line = <$fh>;
    
    #connect to the database.
    #depending on the type of DB
    #you're using the connect string
    #will vary. Read the DBI CPAN docs for more info.
    #get the SID from your DBA and
    #enter that instead of XE.
    #for express edition the SID is XE
    my $db_string = 'DBI:Oracle:XE';
    
    #$db_username & $db_password need to be set
    #to real values. if you don't know this
    #ask your DBA or read the documentation!!
    my $db_username = 'scott';
    my $db_password = 'tiger';
    
    my $dbh = DBI->connect($db_string, $db_username, $db_password) or report_error("Couldn't connect to database!",DBI->errstr);
    
    #prepare the insert statement.
    #in the line below each '?' represents
    #a value that will be passed
    #by the script.
    my $sth = $dbh->prepare('INSERT INTO TABLE_NAME (column1,column2,column3) VALUES (?,?,?)') or report_error("Couldn't prepare insert statement!",DBI->errstr);
    
    #now, we read the csv line-by-line
    #and insert selected columns into the database
    #you'll need to identify what columns
    #you have to insert.for the sample csv
    #i'll upload the first, second and fourth colums only
    #note that Perl starts the column numbering
    #from 0.
    
    while ($line = $csv->getline($fh)){
    
        $sth->execute($line->[0],$line->[1],$line->[3]) or report_error("Couldn't add value from CSV to database!","Values: ".$line->[0],$line->[1],$line->[3]."\nDB Error:\n\n".DBI->errstr);
    
    }
    
    #close the csv file opened at the start
    close $fh;
    
    ##################################################################################
    #
    #    THE ERROR REPORTING SUB-ROUTINE IS DEFINED BELOW. IT TAKES 2 VARIABLES:
    #    1. THE FIRST ONE IS THE SUBJECT OF THE EMAIL
    #    2. THE SECOND ONE IS THE BODY.
    #
    #    IT ACCEPTS HTML AS WELL, ALTHOUGH IT SHOULD BE USED SPARINGLY TO ENSURE
    #    YOUR MAIL ISN'T INCORRECTLY MARKED AS SPAM.
    #
    ##################################################################################
    
    sub report_error {
       
        my $err_msg = MIME::Lite->new(
                        From    => 'from@domain.com',
                        To      =>'to1@domain.com,to2@domain.com',
                        Subject =>$_[0],
                        Type    =>'multipart/related'
                        );
        $err_msg->attach(
                        Type => 'text/html',
                        Data => $_[1]
                        );
    
        #enter your email username & password here
        #AuthUser is the username that you use to sign in to email clients
        #domain should be the SMTP domain
        MIME::Lite->send(
                        'smtp',
                        "domain.com",
                        Timeout=>60,
                        AuthUser=>'email_username',
                        AuthPass=>'password'
                        );
        $err_msg->send();
        my $err_time = DateTime->now();
        exit;
    }
    

You are reading an archived discussion.

Related Posts

I am engineering student currently in 6 th sem.I am pursuing Electrical And Electronics Engineering in India. I wish to Do MS in US or Australia, but i am confused...
Researchers from North Carolina State University have developed a new technique for using chains of nanoparticles to manipulate elastic polymers in three dimensions, a process that could be effectively used...
salam alykom ..hi my friends ..why we cant generate voltage more then about 33 kv in power plants and remove the setup transfermers
This question always there in my mind as i am E&TC engineer trying to get entry level in IT(development,testing or any other...) In market specially in pune there are govt...
A team of researchers from University of Cambridge has come up with two new systems that could revolutionise the tech behind driverless cars. Developed on top of deep learning algorithms,...