CrazyEngineers Archive
Old, but evergreen and popular discussions on CrazyEngineers, presented to you in read-only mode.
@Aashish Joshi • 21 Dec, 2015
Hi,
I sometimes work (sort, filter, create pivot tables, etc) with CSV files that have a few million rows.
Since Excel & Open office don't support more than 1048576 rows I have to first upload the CSVs into a database and then work on them - which isn't very convenient as I have to spend extra time on uploading the data. Depending on the file size this sometimes takes an hour or so. 😔
I'd like to avoid splitting the CSV into smaller files.
Has anyone faced a similar problem before? Are there any other solutions - perhaps there are tools I'm unaware of?
Also, slightly off topic, why is there a limitation on the number of rows? Earlier versions of office supported about 65k rows, now they've "upgraded" to a little over 1 million rows. 😨
I sometimes work (sort, filter, create pivot tables, etc) with CSV files that have a few million rows.
Since Excel & Open office don't support more than 1048576 rows I have to first upload the CSVs into a database and then work on them - which isn't very convenient as I have to spend extra time on uploading the data. Depending on the file size this sometimes takes an hour or so. 😔
I'd like to avoid splitting the CSV into smaller files.
Has anyone faced a similar problem before? Are there any other solutions - perhaps there are tools I'm unaware of?
Also, slightly off topic, why is there a limitation on the number of rows? Earlier versions of office supported about 65k rows, now they've "upgraded" to a little over 1 million rows. 😨
@Manish Goyal • 31 Dec, 2015
did you check this discussion?
Excel CSV. file with more than 1,048,576 rows of data
Excel CSV. file with more than 1,048,576 rows of data
@Kaustubh Katdare • 31 Dec, 2015
MS Access, perhaps? If that's not a solution; I'd rather setup a local database with phpMyAdmin (XAMPP or MAMP) and simply import all my data into it. It'd make all the operations a breeze. What say?
@simplycoder • 04 Jan, 2016 • 1 like
If Microsoft stack then perhaps it can be done using SSIS packages.
@Aashish Joshi • 07 Jan, 2016
I did try importing the csv file "as-is" into Oracle 11g Express Edition using the Application Express UI. That failed miserably - 6 hours and not a single row imported! 😔
Then I had a light-bulb moment! 😁
I wrote a small Perl script to read the csv and add the columns that were of interest to me into mysql using "Insert Delayed". This worked like a charm 😀
Then I had a light-bulb moment! 😁
I wrote a small Perl script to read the csv and add the columns that were of interest to me into mysql using "Insert Delayed". This worked like a charm 😀
@Kaustubh Katdare • 08 Jan, 2016
Sharing is caring, my friend.Aashish JoshiI wrote a small Perl script to read the csv and add the columns that were of interest to me into mysql using "Insert Delayed". This worked like a charm
@Aashish Joshi • 12 Jan, 2016
Here's the perl script I used. Hope it helps someone 😀
It will read a csv line-by-line and and add the values to an Oracle database. With minor modifications the same script can be used for MySQL and other databases as well.
The script ignores the first line that usually contains the column names.
It will read a csv line-by-line and and add the values to an Oracle database. With minor modifications the same script can be used for MySQL and other databases as well.
The script ignores the first line that usually contains the column names.
use strict; use warnings; use Text::CSV; use DBI; use MIME::Lite; #open the csv file open my $fh,"<",'sample.csv'; my $csv = Text::CSV->new({binary => 1, auto_diag => 1, allow_whitespace => 1}); #skip the first line that contains the headers my $line = <$fh>; #connect to the database. #depending on the type of DB #you're using the connect string #will vary. Read the DBI CPAN docs for more info. #get the SID from your DBA and #enter that instead of XE. #for express edition the SID is XE my $db_string = 'DBI:Oracle:XE'; #$db_username & $db_password need to be set #to real values. if you don't know this #ask your DBA or read the documentation!! my $db_username = 'scott'; my $db_password = 'tiger'; my $dbh = DBI->connect($db_string, $db_username, $db_password) or report_error("Couldn't connect to database!",DBI->errstr); #prepare the insert statement. #in the line below each '?' represents #a value that will be passed #by the script. my $sth = $dbh->prepare('INSERT INTO TABLE_NAME (column1,column2,column3) VALUES (?,?,?)') or report_error("Couldn't prepare insert statement!",DBI->errstr); #now, we read the csv line-by-line #and insert selected columns into the database #you'll need to identify what columns #you have to insert.for the sample csv #i'll upload the first, second and fourth colums only #note that Perl starts the column numbering #from 0. while ($line = $csv->getline($fh)){ $sth->execute($line->[0],$line->[1],$line->[3]) or report_error("Couldn't add value from CSV to database!","Values: ".$line->[0],$line->[1],$line->[3]."\nDB Error:\n\n".DBI->errstr); } #close the csv file opened at the start close $fh; ################################################################################## # # THE ERROR REPORTING SUB-ROUTINE IS DEFINED BELOW. IT TAKES 2 VARIABLES: # 1. THE FIRST ONE IS THE SUBJECT OF THE EMAIL # 2. THE SECOND ONE IS THE BODY. # # IT ACCEPTS HTML AS WELL, ALTHOUGH IT SHOULD BE USED SPARINGLY TO ENSURE # YOUR MAIL ISN'T INCORRECTLY MARKED AS SPAM. # ################################################################################## sub report_error { my $err_msg = MIME::Lite->new( From => 'from@domain.com', To =>'to1@domain.com,to2@domain.com', Subject =>$_[0], Type =>'multipart/related' ); $err_msg->attach( Type => 'text/html', Data => $_[1] ); #enter your email username & password here #AuthUser is the username that you use to sign in to email clients #domain should be the SMTP domain MIME::Lite->send( 'smtp', "domain.com", Timeout=>60, AuthUser=>'email_username', AuthPass=>'password' ); $err_msg->send(); my $err_time = DateTime->now(); exit; }
6k views
Related Posts
@Harshad Italiya · Mar 25, 2011
DIY: - Do It Yourself
Wireless Data Transmission Wireless Chat using RFM70
RFM70 is a GFSK transceiver module operating in the world wide ISM frequency band at 2400 -...
7.2k views
@Kaustubh Katdare · Apr 1, 2012
I've been hearing the rumors that Tata group companies, Tata Technologies and Tata Consultancy Services (TCS) have a surprise for the candidates recruited in the 2011-2012 campus recruitment season. The...
5.6k views
@Kaustubh Katdare · Aug 1, 2013
Engineers at Indian Institute Of Technology, Madras (Chennai) have developed a refrigerator that can achieve temperatures as low as -200 degree C, using a single stage vapour compression refrigerator that...
3.4k views
@jav · Jan 21, 2009
LET YOUR HEART BE IN PEACE
Hi, Sir, i wanna method for increasing the pressure of the steam, can you guide me in what all ways can i do that,...
4.2k views
@Ankita Katdare · Apr 3, 2015
Gionee's India body has rolled up its sleeves as it gets ready to launch its high-end smartphone Gionee Elife S7 in India at an unveiling event scheduled for April 4th,...
3.4k views