Hi,
I am looking at storing a huge amount of data in a database (approximately 70,000,00 records). Each record consists of:
1. Name
2. Contact number
3. Resume (may vary from 50KB to 150KB)
4. Skill set
When a user (with role-based permissions) searches the database to retrieve the records that have a certain skill set, the search should be as fast as possible and the results should be shown in a view.
Please let me know whether any such modules are available or whether they need to be developed.
For the front-end interface I am looking at Webform. If you have a better suggestion, please recommend it.
Are there any books or links that can guide me toward an efficient solution?
Best Regards,
Austin
Is this data that must be managed through Drupal directly? If so, what level of flexibility do you need with it?
Is this data coming in from an existing 3rd party source? If so, why does it need to live in Drupal? Can you get away with leveraging the existing data source?
Assuming your number is in fact 70 million records (if you have that many resumes I really have to wonder who your recruiter is... <g>), then it really hinges on what "skill set" means. A properly tuned SQL table should handle that many records if you add the right indexes and give it enough RAM, but if skill set is a very complex concept then that may not work. If skill set is an unpredictable structure, you may be better off looking at a NoSQL database such as MongoDB (a document store) or Cassandra.
The degree to which you need to manage that data through Drupal rather than just search it is also a major factor.
So "needs more info".
Also be aware that DrupalCon is this week (the big Drupal dev conference), so most people on the list will be quite busy this week. You may not get a very rapid reply. :-)
--Larry Garfield
On Sunday, March 06, 2011 4:20:47 pm Austin Einter wrote:
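Here is a minimal sketch of Larry's "right indexes" point. It uses Python's built-in sqlite3 module as a stand-in for whatever SQL database the site would actually run on. The table and index names (candidate, candidate_skill, idx_candidate_skill_skill) are hypothetical. The idea is to store one row per (candidate, skill) pair and index that table by skill. A skill lookup can then seek into the index instead of scanning every record.

```python
import sqlite3

# Hypothetical schema: one row per candidate, plus one row per (candidate, skill) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    contact TEXT
);
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL REFERENCES candidate(id),
    skill        TEXT    NOT NULL,
    PRIMARY KEY (candidate_id, skill)
);
-- Skill first, so "WHERE skill = ?" can seek straight to the matching rows.
CREATE INDEX idx_candidate_skill_skill ON candidate_skill (skill, candidate_id);
""")

conn.executemany("INSERT INTO candidate (id, name, contact) VALUES (?, ?, ?)",
                 [(1, "Alice", "555-0100"), (2, "Bob", "555-0101")])
conn.executemany("INSERT INTO candidate_skill (candidate_id, skill) VALUES (?, ?)",
                 [(1, "php"), (1, "drupal"), (2, "voip"), (2, "php")])

query = "SELECT candidate_id FROM candidate_skill WHERE skill = ? ORDER BY candidate_id"
php_ids = [row[0] for row in conn.execute(query, ("php",))]
print(php_ids)  # [1, 2]

# Ask SQLite how it would run the query; the plan should name the skill index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("php",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

EXPLAIN QUERY PLAN is SQLite-specific, and its exact wording varies between versions. MySQL's EXPLAIN serves the same purpose.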
Thanks Larry. Please see my responses inline.
I am not an expert and am still learning, so please bear with me if I ask very basic questions.
Regards,
Austin
On Mon, Mar 7, 2011 at 4:14 AM, Larry Garfield <larry@garfieldtech.com> wrote:
Is this data that must be managed through Drupal directly? If so, what level of flexibility do you need with it?
[Austin] I do not mind if Drupal manages the data directly. My only concern is performance: adding data should be fast and, more importantly, searching should be quick given the huge number of records.
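On fast data entry, one common technique that does not depend on the database is batching. Insert records in groups inside a single transaction rather than committing each row separately. This often avoids most of the per-commit overhead. Below is a minimal sketch, again with sqlite3 standing in for the real database. The candidate table and the add_candidates helper are hypothetical.

```python
import sqlite3

# Hypothetical table; SQLite stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, name TEXT, contact TEXT)")

def add_candidates(conn, rows):
    """Insert many rows in one transaction instead of one commit per row."""
    with conn:  # commits once on success, rolls back if any insert fails
        conn.executemany("INSERT INTO candidate (name, contact) VALUES (?, ?)", rows)

batch = [(f"Candidate {i}", f"555-{i:04d}") for i in range(10_000)]
add_candidates(conn, batch)
print(conn.execute("SELECT COUNT(*) FROM candidate").fetchone()[0])  # 10000
```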
Is this data coming in from an existing 3rd party source? If so, why does it need to live in Drupal? Can you get away with leveraging the existing data source?
[Austin] The data is not from a third party. Over time, I really hope the number of users will increase.
Assuming your number is in fact 70 million records (if you have that many resumes I really have to wonder who your recruiter is... <g>), then it really hinges on what "skill set" means. A properly tuned SQL table should handle that many records if you add the right indexes and give it enough RAM, but if skill set is a very complex concept then that may not work. If skill set is an unpredictable structure, you may be better off looking at a NoSQL database such as MongoDB (a document store) or Cassandra.
[Austin] The skill set is basically comma-separated values, for example "PHP, SQL, Drupal, VoIP, SIP, HTTP". This is the field that will be used when searching records, instead of the resume itself, to improve performance.
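It helps to split the comma-separated string into individual skills rather than search the raw text with something like LIKE '%SQL%'. The reason is that substring matching produces false positives. Below is a minimal sketch of the parsing step. The parse_skills helper and the sample string are hypothetical.

```python
def parse_skills(raw: str) -> set[str]:
    """Split a comma-separated skill string into normalized, de-duplicated tokens."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}

raw = "PHP, MySQL, Drupal, VoIP, SIP, HTTP"

# Substring matching on the raw string gives a false positive:
print("SQL" in raw)                  # True, only because of "MySQL"
# Matching whole tokens does not:
print("sql" in parse_skills(raw))    # False
print("mysql" in parse_skills(raw))  # True
```

Lower-casing also makes "PHP" and "php" match. Synonyms such as "Postgres" and "PostgreSQL" would still need a separate mapping.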
The degree to which you need to manage that data through Drupal rather than just search it is also a major factor.
[Austin] Adding, deleting, and searching data are the major operations. If you can think of any other data operations I might be missing, I would really appreciate it if you could share them.
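When a candidate is deleted, that candidate's skill rows should disappear too. One way to guarantee this at the database level is a foreign key declared with ON DELETE CASCADE. Below is a minimal sqlite3 sketch with hypothetical tables and a hypothetical delete_candidate helper. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and MySQL ignores them on MyISAM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE candidate (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL REFERENCES candidate(id) ON DELETE CASCADE,
    skill        TEXT    NOT NULL,
    PRIMARY KEY (candidate_id, skill)
);
""")

with conn:
    conn.execute("INSERT INTO candidate (id, name) VALUES (1, 'Alice')")
    conn.executemany("INSERT INTO candidate_skill VALUES (1, ?)", [("php",), ("drupal",)])

def delete_candidate(conn, candidate_id):
    """Delete one candidate; the cascade removes that candidate's skill rows as well."""
    with conn:
        conn.execute("DELETE FROM candidate WHERE id = ?", (candidate_id,))

delete_candidate(conn, 1)
remaining_skills = conn.execute("SELECT COUNT(*) FROM candidate_skill").fetchone()[0]
print(remaining_skills)  # 0
```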
Is there any way to compress the resume and store it as a zip file, then unzip it only when somebody actually requests it? This could save a great deal of storage space.
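The compress-on-write, decompress-on-read idea can be sketched with zlib, which implements DEFLATE, the compression method most zip archives use. This is a minimal Python sketch with hypothetical function names. In a Drupal module, the PHP equivalent would presumably use zlib functions such as gzcompress() and gzuncompress().

```python
import zlib

def compress_resume(data: bytes, level: int = 9) -> bytes:
    """Compress resume bytes before they are stored (zlib format, DEFLATE inside)."""
    return zlib.compress(data, level)

def decompress_resume(blob: bytes) -> bytes:
    """Recover the original bytes when someone actually requests the resume."""
    return zlib.decompress(blob)

# Hypothetical plain-text resume; repetitive text like this compresses very well.
resume = ("Experienced PHP and Drupal developer. " * 2000).encode("utf-8")
stored = compress_resume(resume)
print(len(resume), len(stored))             # original vs. stored size in bytes
print(decompress_resume(stored) == resume)  # True
```

Text compresses well, but PDF and .docx uploads are usually already compressed internally (a .docx file is itself a zip archive). The savings on real resumes may therefore be much smaller than on plain text.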
Does even Monster have a resume for 1 out of every five people in the US? I doubt it seriously. I'm guessing that you would have to serve every country in the world to get that many, and then translation would be your biggest problem.
If you use CCK, I believe you can get the skill set indexed for search (at the very least in a custom module). I don't know why Webform would be preferable here.
A custom module could probably invoke PHP's zipping functions to zip the resume going into the database and unzip it coming back out.
I see this over and over: a newbie thinks their first Drupal project should be the next Monster. I am sorry to tell you, but you won't get there. Go and focus on some smaller projects to learn Drupal. If it has to be done now, you need to hire a team, and that's big bucks.
Nancy
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
Thanks Nancy. I wanted to design and implement something similar to Monster that can handle a large number of users. I am not sure yet whether I will ever have that many users, but it is good to have a scalable design as a starting point.
For the last few months I have been looking at Drupal and have gone through a number of books, so I am comfortable with Drupal to some extent. Now I want to learn how to store and retrieve data, and what the most efficient approach is when dealing with a large amount of data.
I just created pages using Webform and CCK. With CCK I see the following:
1. A number of extra fields show up (like Title, Menu settings, Input format, etc.) and I do not want these fields. Please let me know how I can stop them from appearing.
2. I wanted a field for resume upload but could not find a FILE field in CCK. Then I enabled the fileupload module, and with it I can see a file upload field on the page. I could probably look at installing FileField.
3. Webform has a fieldset field for grouping similar fields together. I am not aware of a module that can do this with CCK. I would appreciate it if you could suggest how to keep similar fields in one group.
If I can overcome these issues, I am in favor of CCK.
Assuming I am going to use CCK, and as you mentioned, the skill set can be indexed (using a custom module). Could you kindly provide more information on how this can be done, at least from a logic perspective?
Regards,
Austin
On Mon, Mar 7, 2011 at 7:19 AM, nan wich <nan_wich@bellsouth.net> wrote:
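From a logic perspective, one way to index the skill set works in two steps. First, whenever a record is saved, parse the comma-separated field and rewrite that record's rows in a separate (candidate_id, skill) table. Second, a search for several skills becomes an indexed lookup plus GROUP BY. In a Drupal 6 custom module, the save step would presumably hang off hook_nodeapi() for the insert and update operations. The sketch below is plain Python with sqlite3, though, and index_skills, find_candidates, and the table are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL,
    skill        TEXT    NOT NULL,
    PRIMARY KEY (candidate_id, skill)
);
CREATE INDEX idx_candidate_skill_skill ON candidate_skill (skill, candidate_id);
""")

def index_skills(conn, candidate_id, raw_skills):
    """Replace a candidate's skill rows with the parsed comma-separated list (run on save)."""
    skills = {s.strip().lower() for s in raw_skills.split(",") if s.strip()}
    with conn:
        conn.execute("DELETE FROM candidate_skill WHERE candidate_id = ?", (candidate_id,))
        conn.executemany("INSERT INTO candidate_skill (candidate_id, skill) VALUES (?, ?)",
                         [(candidate_id, s) for s in skills])

def find_candidates(conn, required):
    """Return IDs of candidates that have *all* of the required skills."""
    required = sorted({s.strip().lower() for s in required if s.strip()})
    if not required:
        return []
    placeholders = ", ".join("?" for _ in required)
    sql = (f"SELECT candidate_id FROM candidate_skill WHERE skill IN ({placeholders}) "
           "GROUP BY candidate_id HAVING COUNT(*) = ? ORDER BY candidate_id")
    return [row[0] for row in conn.execute(sql, [*required, len(required)])]

index_skills(conn, 1, "PHP, SQL, Drupal")
index_skills(conn, 2, "VoIP, SIP, PHP")
index_skills(conn, 3, "PHP, Drupal, HTTP")
print(find_candidates(conn, ["php", "Drupal"]))  # [1, 3]
```

COUNT(*) equals the number of required skills only for candidates that have every one of them. This holds because the primary key prevents duplicate (candidate, skill) rows.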
Indeed, a scalable design is an excellent idea. However, one needs to start with a realistic goal. I strongly suspect that 70 million is at least an order of magnitude too high.
Title can be hidden with the Automatic nodetitle module (http://drupal.org/project/auto_nodetitle), which you're probably going to want. All nodes must have a title, and in this case you'll probably want to derive it from the name. Menu settings are controlled by permissions. Input format is controlled by the available formats and who may use them. Again, I will caution you: if you didn't already know those last two, you are thinking too big to succeed right now.
FileField is specifically what you're asking for, but you still need some function to zip the data, if that will help.
CCK comes with Fieldgroup in D6, but D7 requires the Field Group module (http://drupal.org/project/field_group), which is supposed to be better.
Now, to offer another option. One possible drawback to CCK is that all fields will be loaded when the node is loaded, including possibly unzipping the resume. If you code your own node module, you have more control over which pieces are loaded (think performance) at any particular event. There are examples of this in the API site. I believe the examples include search indexing.
In terms of search, and in particular the user experience, you might want to look at Apache Solr search with its faceted drill-down.
By the way, you haven't even mentioned location, which is important in job searching.
Nancy
Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King, Jr.
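Faceted drill-down, as suggested for Apache Solr search, has two parts. The site shows how many of the current results have each value of a field (here, each skill), and the user can click one value to narrow the results. The toy in-memory sketch below only illustrates that idea. Solr computes facet counts itself over its index, and skill_facets and drill_down are hypothetical helpers, not Solr's API.

```python
from collections import Counter

def skill_facets(results):
    """Map each skill to the number of candidates in `results` that list it."""
    counts = Counter()
    for skills in results.values():
        counts.update(set(skills))  # count each candidate at most once per skill
    return dict(counts)

def drill_down(results, skill):
    """Keep only the candidates that have the selected skill."""
    return {cid: skills for cid, skills in results.items() if skill in skills}

# Hypothetical result set: candidate ID -> parsed skills.
results = {
    1: ["php", "drupal"],
    2: ["php", "voip", "sip"],
    3: ["php", "drupal", "http"],
}
print(skill_facets(results)["drupal"])  # 2
narrowed = drill_down(results, "drupal")
print(sorted(narrowed))                 # [1, 3]
print(skill_facets(narrowed)["php"])    # 2
```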
On Sunday, March 06, 2011 10:51:04 pm nan wich wrote:
Now, to offer another option. One possible drawback to CCK is that all fields will be loaded when the node is loaded, including possibly unzipping the resume. If you code your own node module, you have more control over which pieces are loaded (think performance) at any particular event. There are examples of this in the API site. I believe the examples include search indexing.
If you're using FileField (which in D6 you should be), then the resume file won't even be loaded from disk until you try to use it. The rest of the node will be, but if you're just offering the resume file for download, you just provide a link to it and you're done. It gets stored on the file system where it belongs.
--Larry Garfield
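The layout Larry describes keeps the file on disk, while the record only holds a reference to it, so loading the record does not read the resume. Below is a minimal Python sketch of that layout. A sqlite3 database and a temporary directory stand in for the real database and files directory. The save_candidate, load_candidate, and read_resume helpers are hypothetical and are not FileField's API.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: resume bytes live on disk; the DB row stores only the path.
# A temporary directory is used here only for the demo; a real site's is persistent.
files_dir = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, name TEXT, resume_path TEXT)")

def save_candidate(conn, cid, name, resume_bytes):
    """Write the resume to disk and store only its path in the database."""
    path = files_dir / f"resume_{cid}.pdf"
    path.write_bytes(resume_bytes)
    with conn:
        conn.execute("INSERT INTO candidate VALUES (?, ?, ?)", (cid, name, str(path)))

def load_candidate(conn, cid):
    """Load the record only; the resume file itself is not read here."""
    row = conn.execute("SELECT id, name, resume_path FROM candidate WHERE id = ?",
                       (cid,)).fetchone()
    return {"id": row[0], "name": row[1], "resume_path": row[2]}

def read_resume(record):
    """Read the file from disk only when someone actually downloads it."""
    return Path(record["resume_path"]).read_bytes()

save_candidate(conn, 1, "Alice", b"%PDF-1.4 hypothetical resume bytes")
record = load_candidate(conn, 1)
print(record["name"])  # Alice
```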
For fast full-text search (i.e., searching for a needed skill), one of the best solutions is Sphinx or Apache Solr.
On Mon, Mar 7, 2011 at 8:04 AM, Larry Garfield <larry@garfieldtech.com> wrote:
-- With best wishes, Anatoly Belyaev
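Sphinx and Solr are full search servers, but the core data structure behind fast full-text search in both is an inverted index. This is a map from each term to the documents that contain it, so a query intersects small posting sets instead of scanning every resume. The toy Python class below only illustrates that idea. It is not the API of either product and leaves out stemming, ranking, and persistence.

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Toy full-text index: maps each term to the set of document IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lower-case and keep runs of letters, digits, '+' and '#' (so "C++" survives).
        return re.findall(r"[a-z0-9+#]+", text.lower())

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing *every* query term."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

idx = InvertedIndex()
idx.add(1, "Senior PHP developer, Drupal and MySQL.")
idx.add(2, "VoIP engineer: SIP, RTP, some PHP.")
idx.add(3, "Java and C++ developer.")
print(sorted(idx.search("php drupal")))  # [1]
print(sorted(idx.search("PHP")))         # [1, 2]
```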
participants (5)
- Austin Einter
- Belyaev Anatoly
- Kamal Palei
- Larry Garfield
- nan wich